public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Enable GCC support for AMX
@ 2020-07-06  1:58 Hongyu Wang
From: Hongyu Wang @ 2020-07-06  1:58 UTC
  To: gcc-patches, ubizjak

[-- Attachment #1: Type: text/plain, Size: 3933 bytes --]

Hi:

This patch adds support for Intel Advanced Matrix Extensions (AMX),
which will be enabled in GLC.

AMX is a new 64-bit programming paradigm consisting of two
components: a set of 2-dimensional registers (tiles) representing
sub-arrays from a larger 2-dimensional memory image,
and an accelerator able to operate on tiles.

Supported instructions are

AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilestored/tilezero/tilerelease
AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
AMX-BF16:tdpbf16ps

The intrinsics take constant tile register numbers as their input parameters.

For detailed information, please refer to
https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Bootstrapped OK; regression tests on the i386/x86 backend pass.

OK for master?

gcc/ChangeLog

    * common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
    OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
    OPTION_MASK_ISA2_AMX_TILE_UNSET,
    OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
    New macros.
    (ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
    * common/config/i386/i386-cpuinfo.h (processor_features): Add
    FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
    * common/config/i386/cpuinfo.h (XSTATE_TILECFG,
    XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macros.
    (get_available_features): Enable AMX features only if
    their states are supported by OSXSAVE.
    * common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
    for amx-tile, amx-int8, amx-bf16.
    * config.gcc: Add amxtileintrin.h, amxint8intrin.h,
    amxbf16intrin.h to extra headers.
    * config/i386/amxbf16intrin.h: New file.
    * config/i386/amxint8intrin.h: Ditto.
    * config/i386/amxtileintrin.h: Ditto.
    * config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
    New macros.
    * config/i386/i386-c.c (ix86_target_macros_internal): Define
    __AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
    * config/i386/i386-options.c (ix86_target_string): Add
    -mamx-tile, -mamx-int8, -mamx-bf16.
    (ix86_option_override_internal): Handle AMX-TILE,
    AMX-INT8, AMX-BF16.
    * config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
    TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16,
    TARGET_AMX_BF16_P, PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16):
    New macros.
    * config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
    * config/i386/immintrin.h: Include amxtileintrin.h,
    amxint8intrin.h, amxbf16intrin.h.
    * doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
    * doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
    * doc/sourcebuild.texi (Effective-Target Keywords, Other
    hardware attributes): Document amx_int8, amx_tile, amx_bf16.

gcc/testsuite/ChangeLog

    * lib/target-supports.exp (check_effective_target_amx_tile,
    check_effective_target_amx_int8,
    check_effective_target_amx_bf16): New proc.
    * g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
    * g++.dg/other/i386-3.C: Ditto.
    * gcc.target/i386/sse-12.c: Ditto.
    * gcc.target/i386/sse-13.c: Ditto.
    * gcc.target/i386/sse-14.c: Ditto.
    * gcc.target/i386/sse-22.c: Ditto.
    * gcc.target/i386/sse-23.c: Ditto.
    * gcc.target/i386/funcspec-56.inc: Add new target attribute.
    * gcc.target/i386/amxbf16-asmatt-1.c: New test.
    * gcc.target/i386/amxint8-asmatt-1.c: Ditto.
    * gcc.target/i386/amxtile-asmatt-1.c: Ditto.
    * gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
    * gcc.target/i386/amxint8-asmintel-1.c: Ditto.
    * gcc.target/i386/amxtile-asmintel-1.c: Ditto.
    * gcc.target/i386/amxbf16-asmatt-2.c: Ditto.
    * gcc.target/i386/amxint8-asmatt-2.c: Ditto.
    * gcc.target/i386/amxtile-asmatt-2.c: Ditto.
    * gcc.target/i386/amxbf16-asmintel-2.c: Ditto.
    * gcc.target/i386/amxint8-asmintel-2.c: Ditto.
    * gcc.target/i386/amxtile-asmintel-2.c: Ditto.

[-- Attachment #2: 0001-Enable-GCC-support-for-AMX-TILE-AMX-INT8-AMX-BF16.patch --]
[-- Type: text/x-patch, Size: 50002 bytes --]

From 88a81d93c9d896cf67869f450905c2ea2b08be74 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Thu, 25 Jul 2019 16:49:36 +0800
Subject: [PATCH] Enable GCC support for AMX-TILE,AMX-INT8,AMX-BF16.

AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
AMX-BF16:tdpbf16ps

gcc/ChangeLog

	* common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
	OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
	OPTION_MASK_ISA2_AMX_TILE_UNSET,
	OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
	New macros.
	(ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
	* common/config/i386/i386-cpuinfo.h (processor_features): Add
	FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
	* common/config/i386/cpuinfo.h (XSTATE_TILECFG,
	XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macros.
	(get_available_features): Enable AMX features only if
	their states are supported by OSXSAVE.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
	for amx-tile, amx-int8, amx-bf16.
	* config.gcc: Add amxtileintrin.h, amxint8intrin.h,
	amxbf16intrin.h to extra headers.
	* config/i386/amxbf16intrin.h: New file.
	* config/i386/amxint8intrin.h: Ditto.
	* config/i386/amxtileintrin.h: Ditto.
	* config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
	New macros.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
	* config/i386/i386-options.c (ix86_target_string): Add
	-mamx-tile, -mamx-int8, -mamx-bf16.
	(ix86_option_override_internal): Handle AMX-TILE,
	AMX-INT8, AMX-BF16.
	* config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
	TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16,
	TARGET_AMX_BF16_P, PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16):
	New macros.
	* config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* config/i386/immintrin.h: Include amxtileintrin.h,
	amxint8intrin.h, amxbf16intrin.h.
	* doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
	* doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
	* doc/sourcebuild.texi (Effective-Target Keywords, Other
	hardware attributes): Document amx_int8, amx_tile, amx_bf16.

gcc/testsuite/ChangeLog

	* lib/target-supports.exp (check_effective_target_amx_tile,
	check_effective_target_amx_int8,
	check_effective_target_amx_bf16): New proc.
	* g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/sse-12.c: Ditto.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* gcc.target/i386/amxbf16-asmatt-1.c: New test.
	* gcc.target/i386/amxint8-asmatt-1.c: Ditto.
	* gcc.target/i386/amxtile-asmatt-1.c: Ditto.
	* gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
	* gcc.target/i386/amxint8-asmintel-1.c: Ditto.
	* gcc.target/i386/amxtile-asmintel-1.c: Ditto.
	* gcc.target/i386/amxbf16-asmatt-2.c: Ditto.
	* gcc.target/i386/amxint8-asmatt-2.c: Ditto.
	* gcc.target/i386/amxtile-asmatt-2.c: Ditto.
	* gcc.target/i386/amxbf16-asmintel-2.c: Ditto.
	* gcc.target/i386/amxint8-asmintel-2.c: Ditto.
	* gcc.target/i386/amxtile-asmintel-2.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h              | 16 +++++
 gcc/common/config/i386/i386-common.c          | 45 +++++++++++++
 gcc/common/config/i386/i386-cpuinfo.h         |  3 +
 gcc/common/config/i386/i386-isas.h            |  3 +
 gcc/config.gcc                                |  4 +-
 gcc/config/i386/amxbf16intrin.h               | 25 ++++++++
 gcc/config/i386/amxint8intrin.h               | 37 +++++++++++
 gcc/config/i386/amxtileintrin.h               | 63 +++++++++++++++++++
 gcc/config/i386/cpuid.h                       |  3 +
 gcc/config/i386/i386-c.c                      |  7 +++
 gcc/config/i386/i386-options.c                | 20 +++++-
 gcc/config/i386/i386.h                        |  9 +++
 gcc/config/i386/i386.opt                      | 14 ++++-
 gcc/config/i386/immintrin.h                   |  6 ++
 gcc/doc/extend.texi                           | 15 +++++
 gcc/doc/invoke.texi                           | 10 +++
 gcc/doc/sourcebuild.texi                      |  9 +++
 gcc/testsuite/g++.dg/other/i386-2.C           |  3 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |  3 +-
 .../gcc.target/i386/amxbf16-asmatt-1.c        |  9 +++
 .../gcc.target/i386/amxbf16-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxbf16-asmintel-1.c      |  9 +++
 .../gcc.target/i386/amxbf16-asmintel-2.c      |  4 ++
 .../gcc.target/i386/amxint8-asmatt-1.c        | 15 +++++
 .../gcc.target/i386/amxint8-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxint8-asmintel-1.c      | 15 +++++
 .../gcc.target/i386/amxint8-asmintel-2.c      |  4 ++
 .../gcc.target/i386/amxtile-asmatt-1.c        | 24 +++++++
 .../gcc.target/i386/amxtile-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxtile-asmintel-1.c      | 24 +++++++
 .../gcc.target/i386/amxtile-asmintel-2.c      |  4 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |  6 ++
 gcc/testsuite/gcc.target/i386/sse-12.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |  5 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |  3 +-
 gcc/testsuite/lib/target-supports.exp         | 33 ++++++++++
 38 files changed, 456 insertions(+), 12 deletions(-)
 create mode 100644 gcc/config/i386/amxbf16intrin.h
 create mode 100644 gcc/config/i386/amxint8intrin.h
 create mode 100644 gcc/config/i386/amxtileintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 3eda53240f6..99e196c3230 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -491,15 +491,20 @@ get_available_features (struct __processor_model *cpu_model,
 #define XSTATE_OPMASK			0x20
 #define XSTATE_ZMM			0x40
 #define XSTATE_HI_ZMM			0x80
+#define XSTATE_TILECFG			0x20000
+#define XSTATE_TILEDATA		0x40000
 
 #define XCR_AVX_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM)
 #define XCR_AVX512F_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
+#define XCR_AMX_ENABLED_MASK \
+  (XSTATE_TILECFG | XSTATE_TILEDATA)
 
   /* Check if AVX and AVX512 are usable.  */
   int avx_usable = 0;
   int avx512_usable = 0;
+  int amx_usable = 0;
   if ((ecx & bit_OSXSAVE))
     {
       /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
@@ -515,6 +520,8 @@ get_available_features (struct __processor_model *cpu_model,
 	  avx512_usable = ((xcrlow & XCR_AVX512F_ENABLED_MASK)
 			   == XCR_AVX512F_ENABLED_MASK);
 	}
+      amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
+		    == XCR_AMX_ENABLED_MASK);
     }
 
 #define set_feature(f) \
@@ -633,6 +640,15 @@ get_available_features (struct __processor_model *cpu_model,
 	set_feature (FEATURE_PCONFIG);
       if (edx & bit_IBT)
 	set_feature (FEATURE_IBT);
+      if (amx_usable)
+	{
+	  if (edx & bit_AMX_TILE)
+	    set_feature (FEATURE_AMX_TILE);
+	  if (edx & bit_AMX_INT8)
+	    set_feature (FEATURE_AMX_INT8);
+	  if (edx & bit_AMX_BF16)
+	    set_feature (FEATURE_AMX_BF16);
+	}
       if (avx512_usable)
 	{
 	  if (ebx & bit_AVX512F)
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 654df68d688..f763a8e2f30 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -101,6 +101,9 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE_SET)
 #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_SET OPTION_MASK_ISA2_AVX512VP2INTERSECT
+#define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -246,6 +249,9 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SERIALIZE_UNSET OPTION_MASK_ISA2_SERIALIZE
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET OPTION_MASK_ISA2_AVX512VP2INTERSECT
 #define OPTION_MASK_ISA2_TSXLDTRK_UNSET OPTION_MASK_ISA2_TSXLDTRK
+#define OPTION_MASK_ISA2_AMX_TILE_UNSET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_UNSET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_UNSET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -930,6 +936,45 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mamx_tile:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_int8:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_bf16:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 96cf0eaea47..b0ed8e77006 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -214,6 +214,9 @@ enum processor_features
   FEATURE_XSAVEC,
   FEATURE_XSAVEOPT,
   FEATURE_XSAVES,
+  FEATURE_AMX_TILE,
+  FEATURE_AMX_INT8,
+  FEATURE_AMX_BF16,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 08c9dbecc76..3c830ea08ff 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -160,4 +160,7 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("xsaveopt", FEATURE_XSAVEOPT, P_NONE,
 			"-mxsaveopt")
   ISA_NAMES_TABLE_ENTRY("xsaves", FEATURE_XSAVES, P_NONE, "-mxsaves")
+  ISA_NAMES_TABLE_ENTRY("amx-tile", FEATURE_AMX_TILE, P_NONE, "-mamx-tile")
+  ISA_NAMES_TABLE_ENTRY("amx-int8", FEATURE_AMX_INT8, P_NONE, "-mamx-int8")
+  ISA_NAMES_TABLE_ENTRY("amx-bf16", FEATURE_AMX_BF16, P_NONE, "-mamx-bf16")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index c0460686e21..bed834bed6b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -412,7 +412,7 @@ i[34567]86-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -447,7 +447,7 @@ x86_64-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
new file mode 100644
index 00000000000..df0e2262d50
--- /dev/null
+++ b/gcc/config/i386/amxbf16intrin.h
@@ -0,0 +1,25 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXBF16INTRIN_H_INCLUDED
+#define _AMXBF16INTRIN_H_INCLUDED
+
+#if !defined(__AMX_BF16__)
+#pragma GCC push_options
+#pragma GCC target("amx-bf16")
+#define __DISABLE_AMX_BF16__
+#endif /* __AMX_BF16__ */
+
+#if defined(__x86_64__) && defined(__AMX_BF16__)
+#define _tile_dpbf16ps(dst,src1,src2)					\
+  __asm__ volatile\
+  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+#endif
+
+#ifdef __DISABLE_AMX_BF16__
+#undef __DISABLE_AMX_BF16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_BF16__ */
+
+#endif /* _AMXBF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxint8intrin.h b/gcc/config/i386/amxint8intrin.h
new file mode 100644
index 00000000000..4b7a59587dc
--- /dev/null
+++ b/gcc/config/i386/amxint8intrin.h
@@ -0,0 +1,37 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxint8intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXINT8INTRIN_H_INCLUDED
+#define _AMXINT8INTRIN_H_INCLUDED
+
+#if !defined(__AMX_INT8__)
+#pragma GCC push_options
+#pragma GCC target("amx-int8")
+#define __DISABLE_AMX_INT8__
+#endif /* __AMX_INT8__ */
+
+#if defined(__x86_64__) && defined(__AMX_INT8__)
+#define _tile_dpbssd(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbssd\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbssd\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbsud(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbsud\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbsud\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbusd(dst,src1,src2)					\
+  __asm__ volatile\
+  ("{tdpbusd\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbusd\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbuud(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbuud\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbuud\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+#endif
+
+#ifdef __DISABLE_AMX_INT8__
+#undef __DISABLE_AMX_INT8__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_INT8__ */
+
+#endif /* _AMXINT8INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
new file mode 100644
index 00000000000..fe995232743
--- /dev/null
+++ b/gcc/config/i386/amxtileintrin.h
@@ -0,0 +1,63 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxtileintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXTILEINTRIN_H_INCLUDED
+#define _AMXTILEINTRIN_H_INCLUDED
+
+#if !defined(__AMX_TILE__)
+#pragma GCC push_options
+#pragma GCC target("amx-tile")
+#define __DISABLE_AMX_TILE__
+#endif /* __AMX_TILE__ */
+
+#if defined(__x86_64__) && defined(__AMX_TILE__)
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_loadconfig (const void *__config)
+{
+  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_storeconfig (void *__config)
+{
+  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_release (void)
+{
+  __asm__ volatile ("tilerelease" ::);
+}
+
+#define _tile_loadd(dst,base,stride)					\
+  __asm__ volatile							\
+  ("{tileloadd\t(%0,%1,1), %%tmm"#dst"|tileloadd\t%%tmm"#dst", [%0+%1*1]}" \
+   :: "r" ((const void*) base), "r" ((long) stride))
+
+#define _tile_stream_loadd(dst,base,stride)				\
+  __asm__ volatile							\
+  ("{tileloaddt1\t(%0,%1,1), %%tmm"#dst"|tileloaddt1\t%%tmm"#dst", [%0+%1*1]}"\
+   :: "r" ((const void*) base), "r" ((long) stride))
+
+#define _tile_stored(src,base,stride)					\
+  __asm__ volatile							\
+  ("{tilestored\t%%tmm"#src", (%0,%1,1)|tilestored\t[%0+%1*1], %%tmm"#src"}" \
+   :: "r" ((void*) base), "r" ((long) stride))
+
+#define _tile_zero(dst)				\
+  __asm__ volatile				\
+  ("tilezero\t%%tmm"#dst ::)
+
+#endif
+
+#ifdef __DISABLE_AMX_TILE__
+#undef __DISABLE_AMX_TILE__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_TILE__ */
+
+#endif /* _AMXTILEINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 94af4910d3c..226c62433bd 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -124,6 +124,9 @@
 #define bit_PCONFIG	(1 << 18)
 #define bit_SERIALIZE	(1 << 14)
 #define bit_TSXLDTRK    (1 << 16)
+#define bit_AMX_BF16    (1 << 22)
+#define bit_AMX_TILE    (1 << 24)
+#define bit_AMX_INT8    (1 << 25)
 
 /* XFEATURE_ENABLED_MASK register bits (%eax == 0xd, %ecx == 0) */
 #define bit_BNDREGS     (1 << 3)
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 891b2c68372..45b5ed56bbf 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -573,6 +573,13 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__ENQCMD__");
   if (isa_flag2 & OPTION_MASK_ISA2_TSXLDTRK)
     def_or_undef (parse_in, "__TSXLDTRK__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_TILE)
+    def_or_undef (parse_in, "__AMX_TILE__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_INT8)
+    def_or_undef (parse_in, "__AMX_INT8__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_BF16)
+    def_or_undef (parse_in, "__AMX_BF16__");
+
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 67480b2deea..e3f00d186f0 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -207,7 +207,10 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavx512bf16",	OPTION_MASK_ISA2_AVX512BF16 },
   { "-menqcmd",		OPTION_MASK_ISA2_ENQCMD },
   { "-mserialize",	OPTION_MASK_ISA2_SERIALIZE },
-  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK }
+  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK },
+  { "-mamx-tile",	OPTION_MASK_ISA2_AMX_TILE },
+  { "-mamx-int8",	OPTION_MASK_ISA2_AMX_INT8 },
+  { "-mamx-bf16",	OPTION_MASK_ISA2_AMX_BF16 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1021,6 +1024,9 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("enqcmd", OPT_menqcmd),
     IX86_ATTR_ISA ("serialize", OPT_mserialize),
     IX86_ATTR_ISA ("tsxldtrk", OPT_mtsxldtrk),
+    IX86_ATTR_ISA ("amx-tile", OPT_mamx_tile),
+    IX86_ATTR_ISA ("amx-int8", OPT_mamx_int8),
+    IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2206,6 +2212,18 @@ ix86_option_override_internal (bool main_args_p,
 	    && !(opts->x_ix86_isa_flags2_explicit
 		 & OPTION_MASK_ISA2_AVX512BF16))
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512BF16;
+	if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_TILE))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE;
+	if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_INT8))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8;
+	if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_BF16))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16;
         if (((processor_alias_table[i].flags & PTA_MOVDIRI) != 0)
             && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MOVDIRI))
           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MOVDIRI;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e1775ff0b5d..f96f16c4ec1 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -203,6 +203,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_SERIALIZE_P(x) TARGET_ISA2_SERIALIZE_P(x)
 #define TARGET_TSXLDTRK	TARGET_ISA2_TSXLDTRK
 #define TARGET_TSXLDTRK_P(x) TARGET_ISA2_TSXLDTRK_P(x)
+#define TARGET_AMX_TILE TARGET_ISA2_AMX_TILE
+#define TARGET_AMX_TILE_P(x) TARGET_ISA2_AMX_TILE_P(x)
+#define TARGET_AMX_INT8 TARGET_ISA2_AMX_INT8
+#define TARGET_AMX_INT8_P(x) TARGET_ISA2_AMX_INT8_P(x)
+#define TARGET_AMX_BF16 TARGET_ISA2_AMX_BF16
+#define TARGET_AMX_BF16_P(x) TARGET_ISA2_AMX_BF16_P(x)
 
 #define TARGET_LP64	TARGET_ABI_64
 #define TARGET_LP64_P(x)	TARGET_ABI_64_P(x)
@@ -2449,6 +2455,9 @@ const wide_int_bitmask PTA_AVX512BF16 (0, HOST_WIDE_INT_1U << 11);
 const wide_int_bitmask PTA_WAITPKG (0, HOST_WIDE_INT_1U << 12);
 const wide_int_bitmask PTA_MOVDIRI(0, HOST_WIDE_INT_1U << 13);
 const wide_int_bitmask PTA_MOVDIR64B(0, HOST_WIDE_INT_1U << 14);
+const wide_int_bitmask PTA_AMX_TILE(0, HOST_WIDE_INT_1U << 15);
+const wide_int_bitmask PTA_AMX_INT8(0, HOST_WIDE_INT_1U << 16);
+const wide_int_bitmask PTA_AMX_BF16(0, HOST_WIDE_INT_1U << 17);
 
 const wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2
   | PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index c9f7195d423..9389dc24948 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1114,4 +1114,16 @@ Support SERIALIZE built-in functions and code generation.
 
 mtsxldtrk
 Target Report Mask(ISA2_TSXLDTRK) Var(ix86_isa_flags2) Save
-Support TSXLDTRK built-in functions and code generation.
\ No newline at end of file
+Support TSXLDTRK built-in functions and code generation.
+
+mamx-tile
+Target Report Mask(ISA2_AMX_TILE) Var(ix86_isa_flags2) Save
+Support AMX-TILE built-in functions and code generation.
+
+mamx-int8
+Target Report Mask(ISA2_AMX_INT8) Var(ix86_isa_flags2) Save
+Support AMX-INT8 built-in functions and code generation.
+
+mamx-bf16
+Target Report Mask(ISA2_AMX_BF16) Var(ix86_isa_flags2) Save
+Support AMX-BF16 built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index b660d0d9040..6d25f44c303 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -144,6 +144,12 @@
 
 #include <tsxldtrkintrin.h>
 
+#include <amxtileintrin.h>
+
+#include <amxint8intrin.h>
+
+#include <amxbf16intrin.h>
+
 #include <rdseedintrin.h>
 
 #include <prfchwintrin.h>
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 95f7192e41e..3e6f5212262 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6597,6 +6597,21 @@ Enable/disable the generation of the XSAVEOPT instructions.
 @cindex @code{target("xsaves")} function attribute, x86
 Enable/disable the generation of the XSAVES instructions.
 
+@item amx-tile
+@itemx no-amx-tile
+@cindex @code{target("amx-tile")} function attribute, x86
+Enable/disable the generation of the AMX-TILE instructions.
+
+@item amx-int8
+@itemx no-amx-int8
+@cindex @code{target("amx-int8")} function attribute, x86
+Enable/disable the generation of the AMX-INT8 instructions.
+
+@item amx-bf16
+@itemx no-amx-bf16
+@cindex @code{target("amx-bf16")} function attribute, x86
+Enable/disable the generation of the AMX-BF16 instructions.
+
 @item cld
 @itemx no-cld
 @cindex @code{target("cld")} function attribute, x86
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 98cc0f2f0de..4938d32331d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1348,6 +1348,7 @@ See RS/6000 and PowerPC Options.
 -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq @gol
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
+-mamx-tile -mamx-int8 -mamx-bf16@gol
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
 -mmemcpy-strategy=@var{strategy}  -mmemset-strategy=@var{strategy} @gol
@@ -29854,6 +29855,15 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @itemx -mserialize
 @opindex mserialize
+@need 200
+@itemx -mamx-tile
+@opindex mamx-tile
+@need 200
+@itemx -mamx-int8
+@opindex mamx-int8
+@need 200
+@itemx -mamx-bf16
+@opindex mamx-bf16
 These switches enable the use of instructions in the MMX, SSE,
 SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index a12af822443..72b2ae6714b 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,6 +2178,15 @@ Target supports the execution of @code{avx512f} instructions.
 @item avx512vp2intersect
 Target supports the execution of @code{avx512vp2intersect} instructions.
 
+@item amx_tile
+Target supports the execution of @code{amx-tile} instructions.
+
+@item amx_int8
+Target supports the execution of @code{amx-int8} instructions.
+
+@item amx_bf16
+Target supports the execution of @code{amx-bf16} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 04d5fec0f6c..449f30dbace 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h.h are usable
    with -O -pedantic-errors.  */
 
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index f40172ee9b5..29e98919386 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h are usable
    with -O -fkeep-inline-functions.  */
 
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
new file mode 100644
index 00000000000..98758f99a10
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbf16ps (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
new file mode 100644
index 00000000000..b7332248ba7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16" } */
+/* { dg-require-effective-target amx_bf16 } */
+#include"amxbf16-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
new file mode 100644
index 00000000000..c2d6074387a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbf16ps (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
new file mode 100644
index 00000000000..605a44df3f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
+/* { dg-require-effective-target amx_bf16 } */
+#include"amxbf16-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
new file mode 100644
index 00000000000..7af801bd223
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbssd (1, 2, 3);
+  _tile_dpbsud (1, 2, 3);
+  _tile_dpbusd (1, 2, 3);
+  _tile_dpbuud (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
new file mode 100644
index 00000000000..307c9d813bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8" } */
+/* { dg-require-effective-target amx_int8 } */
+#include"amxint8-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
new file mode 100644
index 00000000000..bcfbb3fa5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbssd (1, 2, 3);
+  _tile_dpbsud (1, 2, 3);
+  _tile_dpbusd (1, 2, 3);
+  _tile_dpbuud (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
new file mode 100644
index 00000000000..7e1c1d63594
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8 -masm=intel" } */
+/* { dg-require-effective-target amx_int8 } */
+#include"amxint8-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
new file mode 100644
index 00000000000..96578719833
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]+\[^\n\]*%tmm\[0-9\]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (5, base, stride);
+  _tile_stream_loadd (4, base, stride);
+  _tile_stored (3, base, stride);
+  _tile_zero (2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
new file mode 100644
index 00000000000..c00cd0a8fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile" } */
+/* { dg-require-effective-target amx_tile } */
+#include"amxtile-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
new file mode 100644
index 00000000000..88ef612ed14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -masm=intel " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]\[^\n\]+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (5, base, stride);
+  _tile_stream_loadd (4, base, stride);
+  _tile_stored (3, base, stride);
+  _tile_zero (2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c
new file mode 100644
index 00000000000..99da63c119e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -masm=intel" } */
+/* { dg-require-effective-target amx_tile } */
+#include"amxtile-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 9fe4a21984b..be376453b03 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -71,6 +71,9 @@ extern void test_tsxldtrk (void)		__attribute__((__target__("tsxldtrk")));
 extern void test_enqcmd (void)			__attribute__((__target__("enqcmd")));
 extern void test_avx512bf16 (void)		__attribute__((__target__("avx512bf16")));
 extern void test_avx512vp2intersect (void)	__attribute__((__target__("avx512vp2intersect")));
+extern void test_amx_tile (void)		__attribute__((__target__("amx-tile")));
+extern void test_amx_int8 (void)		__attribute__((__target__("amx-int8")));
+extern void test_amx_bf16 (void)		__attribute__((__target__("amx-bf16")));
 
 extern void test_no_sgx (void)			__attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)		__attribute__((__target__("no-avx5124fmaps")));
@@ -143,6 +146,9 @@ extern void test_no_tsxldtrk (void)		__attribute__((__target__("no-tsxldtrk")));
 extern void test_no_enqcmd (void)		__attribute__((__target__("no-enqcmd")));
 extern void test_no_avx512bf16 (void)		__attribute__((__target__("no-avx512bf16")));
 extern void test_no_avx512vp2intersect (void)	__attribute__((__target__("no-avx512vp2intersect")));
+extern void test_no_amx_tile (void)		__attribute__((__target__("no-amx-tile")));
+extern void test_no_amx_int8 (void)		__attribute__((__target__("no-amx-int8")));
+extern void test_no_amx_bf16 (void)		__attribute__((__target__("no-amx-bf16")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void)		__attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index b1690d7204f..61146b2b30a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h gfniintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 3a6404707c4..4d6c9b3a17a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index edaa2aa8ad4..837b51c53e6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 7364b2ff337..fc75669f41b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -11,6 +11,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxldtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -102,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -219,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index eaadebef187..9ca7c5d919d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -10,6 +10,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -697,6 +698,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1) 
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index cf0cfa11eb9..e1aa12a3ae9 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8854,6 +8854,39 @@ proc check_effective_target_avx512vaes { } {
     } "-mvaes" ]
 }
 
+# Return 1 if amx-tile instructions can be compiled.
+proc check_effective_target_amx_tile { } {
+    return [check_no_compiler_messages amx_tile object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tilerelease" ::);
+	}
+    } "-mamx-tile" ]
+}
+
+# Return 1 if amx-int8 instructions can be compiled.
+proc check_effective_target_amx_int8 { } {
+    return [check_no_compiler_messages amx_int8 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbssd\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-int8" ]
+}
+
+# Return 1 if amx-bf16 instructions can be compiled.
+proc check_effective_target_amx_bf16 { } {
+    return [check_no_compiler_messages amx_bf16 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbf16ps\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-bf16" ]
+}
+
 # Return 1 if vpclmulqdq instructions can be compiled.
 proc check_effective_target_vpclmulqdq { } {
     return [check_no_compiler_messages vpclmulqdq object {
-- 
2.25.1
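
As a reading aid for the amxint8 tests in the patch above, here is a minimal
scalar sketch of what tdpbssd accumulates, following the description in the
Intel reference linked in the cover letter. The helper name ref_tdpbssd and
the flat row-major arrays are assumptions made for illustration only; they are
not part of the patch, and the real instruction works on tile registers whose
shapes come from the loaded tile configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper, not part of the patch.
   Scalar model of tdpbssd (signed int8 x signed int8 -> int32):
     dst is M x N   int32 elements,
     a   is M x 4*K int8 elements (src1),
     b   is K x 4*N int8 elements (src2).
   Each int32 in dst accumulates dot products of 4-byte groups.  */
static void
ref_tdpbssd (int32_t *dst, const int8_t *a, const int8_t *b,
	     int M, int N, int K)
{
  assert (M > 0 && N > 0 && K > 0);

  for (int m = 0; m < M; m++)
    for (int k = 0; k < K; k++)
      for (int n = 0; n < N; n++)
	for (int i = 0; i < 4; i++)
	  dst[m * N + n] += (int32_t) a[m * 4 * K + 4 * k + i]
			    * (int32_t) b[k * 4 * N + 4 * n + i];
}
```

The other three AMX-INT8 variants differ only in whether each source byte is
treated as signed or unsigned before the multiply, so a model of tdpbuud
would use uint8_t for both a and b under the same loop structure.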


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-07-06  1:58 [PATCH] Enable GCC support for AMX Hongyu Wang
@ 2020-07-07  3:24 ` Hongyu Wang
  2020-07-17  5:40   ` Hongyu Wang
  2020-08-04 14:47 ` Kirill Yukhin
  2020-09-03 15:07 ` Kirill Yukhin
  2 siblings, 1 reply; 17+ messages in thread
From: Hongyu Wang @ 2020-07-07  3:24 UTC (permalink / raw)
  To: gcc-patches, kirill.yukhin

Hi Kirill, could you help review this patch?

On Mon, Jul 6, 2020 at 9:58 AM, Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> Hi:
>
> This patch adds support for Intel Advanced Matrix Extensions (AMX),
> which will be enabled in GLC.
>
> AMX is a new 64-bit programming paradigm consisting of two
> components: a set of 2-dimensional registers (tiles) representing
> sub-arrays from a larger 2-dimensional memory image,
> and an accelerator able to operate on tiles.
>
> Supported instructions are
>
> AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> AMX-BF16:tdpbf16ps
>
> The intrinsics take constant tile register numbers as their input parameters.
>
> For detailed information, please refer to
> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
>
> Bootstrap ok, regression test on i386/x86 backend is ok.
>
> OK for master?
>
> gcc/ChangeLog
>
>     * common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
>     OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
>     OPTION_MASK_ISA2_AMX_TILE_UNSET,
>     OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
>     New macros.
>     (ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
>     * common/config/i386/i386-cpuinfo.h (processor_types): Add
>     FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
>     * common/config/i386/cpuinfo.h (XSTATE_TILECFG,
>     XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macro.
>     (get_available_features): Enable AMX features only if
>     their states are supported by OSXSAVE.
>     * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
>     for amx-tile, amx-int8, amx-bf16.
>     * config.gcc: Add amxtileintrin.h, amxint8intrin.h,
>     amxbf16intrin.h to extra headers.
>     * config/i386/amxbf16intrin.h: New file.
>     * config/i386/amxint8intrin.h: Ditto.
>     * config/i386/amxtileintrin.h: Ditto.
>     * config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
>     New macro.
>     * config/i386/i386-c.c (ix86_target_macros_internal): Define
>     __AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
>     * config/i386/i386-options.c (ix86_target_string): Add
>     -mamx-tile, -mamx-int8, -mamx-bf16.
>     (ix86_option_override_internal): Handle AMX-TILE,
>     AMX-INT8, AMX-BF16.
>     * config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
>     TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16_P,
>     PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros.
>     * config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
>     * config/i386/immintrin.h: Include amxtileintrin.h,
>     amxint8intrin.h, amxbf16intrin.h.
>     * doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
>     * doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
>     * doc/sourcebuild.texi (Effective-Target Keywords, Other
>     hardware attributes): Document amx_int8, amx_tile, amx_bf16.
>
> gcc/testsuite/ChangeLog
>
>     * lib/target-supports.exp (check_effective_target_amx_tile,
>     check_effective_target_amx_int8,
>     check_effective_target_amx_bf16): New proc.
>     * g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
>     * g++.dg/other/i386-3.C: Ditto.
>     * gcc.target/i386/sse-12.c: Ditto.
>     * gcc.target/i386/sse-13.c: Ditto.
>     * gcc.target/i386/sse-14.c: Ditto.
>     * gcc.target/i386/sse-22.c: Ditto.
>     * gcc.target/i386/sse-23.c: Ditto.
>     * gcc.target/i386/funcspec-56.inc: Add new target attribute.
>     * gcc.target/i386/amxbf16-asmatt-1.c: New test.
>     * gcc.target/i386/amxint8-asmatt-1.c: Ditto.
>     * gcc.target/i386/amxtile-asmatt-1.c: Ditto.
>     * gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
>     * gcc.target/i386/amxint8-asmintel-1.c: Ditto.
>     * gcc.target/i386/amxtile-asmintel-1.c: Ditto.
>     * gcc.target/i386/amxbf16-asmatt-2.c: Ditto.
>     * gcc.target/i386/amxint8-asmatt-2.c: Ditto.
>     * gcc.target/i386/amxtile-asmatt-2.c: Ditto.
>     * gcc.target/i386/amxbf16-asmintel-2.c: Ditto.
>     * gcc.target/i386/amxint8-asmintel-2.c: Ditto.
>     * gcc.target/i386/amxtile-asmintel-2.c: Ditto.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-07-07  3:24 ` Hongyu Wang
@ 2020-07-17  5:40   ` Hongyu Wang
  2020-07-24  5:41     ` Hongyu Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Hongyu Wang @ 2020-07-17  5:40 UTC (permalink / raw)
  To: gcc-patches, kirill.yukhin

[-- Attachment #1: Type: text/plain, Size: 4639 bytes --]

Updated the patch for SAPPHIRERAPIDS. Ping.

On Tue, Jul 7, 2020 at 11:24 AM, Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:

>
> Hi Kirill, could you help review this patch?
>
> On Mon, Jul 6, 2020 at 9:58 AM, Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
> >
> > Hi:
> >
> > This patch adds support for Intel Advanced Matrix Extensions (AMX),
> > which will be enabled in GLC.
> >
> > AMX is a new 64-bit programming paradigm consisting of two
> > components: a set of 2-dimensional registers (tiles) representing
> > sub-arrays from a larger 2-dimensional memory image,
> > and an accelerator able to operate on tiles.
> >
> > Supported instructions are
> >
> > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > AMX-BF16:tdpbf16ps
> >
> > The intrinsics take constant tile register numbers as their input parameters.
> >
> > For detailed information, please refer to
> > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> >
> > Bootstrap ok, regression test on i386/x86 backend is ok.
> >
> > OK for master?
> >
> > gcc/ChangeLog
> >
> >     * common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
> >     OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
> >     OPTION_MASK_ISA2_AMX_TILE_UNSET,
> >     OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
> >     New macros.
> >     (ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
> >     * common/config/i386/i386-cpuinfo.h (processor_types): Add
> >     FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
> >     * common/config/i386/cpuinfo.h (XSTATE_TILECFG,
> >     XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macro.
> >     (get_available_features): Enable AMX features only if
> >     their states are supported by OSXSAVE.
> >     * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
> >     for amx-tile, amx-int8, amx-bf16.
> >     * config.gcc: Add amxtileintrin.h, amxint8intrin.h,
> >     amxbf16intrin.h to extra headers.
> >     * config/i386/amxbf16intrin.h: New file.
> >     * config/i386/amxint8intrin.h: Ditto.
> >     * config/i386/amxtileintrin.h: Ditto.
> >     * config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
> >     New macro.
> >     * config/i386/i386-c.c (ix86_target_macros_internal): Define
> >     __AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
> >     * config/i386/i386-options.c (ix86_target_string): Add
> >     -mamx-tile, -mamx-int8, -mamx-bf16.
> >     (ix86_option_override_internal): Handle AMX-TILE,
> >     AMX-INT8, AMX-BF16.
> >     * config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
> >     TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16_P,
> >     PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros.
> >     * config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
> >     * config/i386/immintrin.h: Include amxtileintrin.h,
> >     amxint8intrin.h, amxbf16intrin.h.
> >     * doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
> >     * doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
> >     * doc/sourcebuild.texi (Effective-Target Keywords, Other
> >     hardware attributes): Document amx_int8, amx_tile, amx_bf16.
> >
> > gcc/testsuite/ChangeLog
> >
> >     * lib/target-supports.exp (check_effective_target_amx_tile,
> >     check_effective_target_amx_int8,
> >     check_effective_target_amx_bf16): New proc.
> >     * g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
> >     * g++.dg/other/i386-3.C: Ditto.
> >     * gcc.target/i386/sse-12.c: Ditto.
> >     * gcc.target/i386/sse-13.c: Ditto.
> >     * gcc.target/i386/sse-14.c: Ditto.
> >     * gcc.target/i386/sse-22.c: Ditto.
> >     * gcc.target/i386/sse-23.c: Ditto.
> >     * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> >     * gcc.target/i386/amxbf16-asmatt-1.c: New test.
> >     * gcc.target/i386/amxint8-asmatt-1.c: Ditto.
> >     * gcc.target/i386/amxtile-asmatt-1.c: Ditto.
> >     * gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
> >     * gcc.target/i386/amxint8-asmintel-1.c: Ditto.
> >     * gcc.target/i386/amxtile-asmintel-1.c: Ditto.
> >     * gcc.target/i386/amxbf16-asmatt-2.c: Ditto.
> >     * gcc.target/i386/amxint8-asmatt-2.c: Ditto.
> >     * gcc.target/i386/amxtile-asmatt-2.c: Ditto.
> >     * gcc.target/i386/amxbf16-asmintel-2.c: Ditto.
> >     * gcc.target/i386/amxint8-asmintel-2.c: Ditto.
> >     * gcc.target/i386/amxtile-asmintel-2.c: Ditto.

[-- Attachment #2: 0001-Enable-GCC-support-for-AMX-TILE-AMX-INT8-AMX-BF16.patch --]
[-- Type: text/x-patch, Size: 50666 bytes --]

From c56e576233be156fc6d172a968c3838f6102155d Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Thu, 25 Jul 2019 16:49:36 +0800
Subject: [PATCH] Enable GCC support for AMX-TILE,AMX-INT8,AMX-BF16.

AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilestored/tilezero/tilerelease
AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
AMX-BF16:tdpbf16ps

gcc/ChangeLog

	* common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
	OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
	OPTION_MASK_ISA2_AMX_TILE_UNSET,
	OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
	New macros.
	(ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
	* common/config/i386/i386-cpuinfo.h (processor_features): Add
	FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
	* common/config/i386/cpuinfo.h (XSTATE_TILECFG,
	XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macros.
	(get_available_features): Enable AMX features only if
	their states are supported by OSXSAVE.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
	for amx-tile, amx-int8, amx-bf16.
	* config.gcc: Add amxtileintrin.h, amxint8intrin.h,
	amxbf16intrin.h to extra headers.
	* config/i386/amxbf16intrin.h: New file.
	* config/i386/amxint8intrin.h: Ditto.
	* config/i386/amxtileintrin.h: Ditto.
	* config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
	New macros.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
	* config/i386/i386-options.c (ix86_target_string): Add
	-mamx-tile, -mamx-int8, -mamx-bf16.
	(ix86_option_override_internal): Handle AMX-TILE,
	AMX-INT8, AMX-BF16.
	* config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
	TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16,
	TARGET_AMX_BF16_P, PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros.
	* config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* config/i386/immintrin.h: Include amxtileintrin.h,
	amxint8intrin.h, amxbf16intrin.h.
	* doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
	* doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
	* doc/sourcebuild.texi (Effective-Target Keywords, Other
	hardware attributes): Document amx_int8, amx_tile, amx_bf16.

gcc/testsuite/ChangeLog

	* lib/target-supports.exp (check_effective_target_amx_tile,
	check_effective_target_amx_int8,
	check_effective_target_amx_bf16): New procs.
	* g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/sse-12.c: Ditto.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* gcc.target/i386/amxbf16-asmatt-1.c: New test.
	* gcc.target/i386/amxint8-asmatt-1.c: Ditto.
	* gcc.target/i386/amxtile-asmatt-1.c: Ditto.
	* gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
	* gcc.target/i386/amxint8-asmintel-1.c: Ditto.
	* gcc.target/i386/amxtile-asmintel-1.c: Ditto.
	* gcc.target/i386/amxbf16-asmatt-2.c: Ditto.
	* gcc.target/i386/amxint8-asmatt-2.c: Ditto.
	* gcc.target/i386/amxtile-asmatt-2.c: Ditto.
	* gcc.target/i386/amxbf16-asmintel-2.c: Ditto.
	* gcc.target/i386/amxint8-asmintel-2.c: Ditto.
	* gcc.target/i386/amxtile-asmintel-2.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h              | 16 +++++
 gcc/common/config/i386/i386-common.c          | 45 +++++++++++++
 gcc/common/config/i386/i386-cpuinfo.h         |  3 +
 gcc/common/config/i386/i386-isas.h            |  3 +
 gcc/config.gcc                                |  4 +-
 gcc/config/i386/amxbf16intrin.h               | 25 ++++++++
 gcc/config/i386/amxint8intrin.h               | 37 +++++++++++
 gcc/config/i386/amxtileintrin.h               | 63 +++++++++++++++++++
 gcc/config/i386/cpuid.h                       |  3 +
 gcc/config/i386/i386-c.c                      |  7 +++
 gcc/config/i386/i386-options.c                | 20 +++++-
 gcc/config/i386/i386.h                        | 12 +++-
 gcc/config/i386/i386.opt                      | 14 ++++-
 gcc/config/i386/immintrin.h                   |  6 ++
 gcc/doc/extend.texi                           | 15 +++++
 gcc/doc/invoke.texi                           | 10 +++
 gcc/doc/sourcebuild.texi                      |  9 +++
 gcc/testsuite/g++.dg/other/i386-2.C           |  3 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |  3 +-
 .../gcc.target/i386/amxbf16-asmatt-1.c        |  9 +++
 .../gcc.target/i386/amxbf16-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxbf16-asmintel-1.c      |  9 +++
 .../gcc.target/i386/amxbf16-asmintel-2.c      |  4 ++
 .../gcc.target/i386/amxint8-asmatt-1.c        | 15 +++++
 .../gcc.target/i386/amxint8-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxint8-asmintel-1.c      | 15 +++++
 .../gcc.target/i386/amxint8-asmintel-2.c      |  4 ++
 .../gcc.target/i386/amxtile-asmatt-1.c        | 24 +++++++
 .../gcc.target/i386/amxtile-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxtile-asmintel-1.c      | 24 +++++++
 .../gcc.target/i386/amxtile-asmintel-2.c      |  4 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |  6 ++
 gcc/testsuite/gcc.target/i386/sse-12.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |  5 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |  3 +-
 gcc/testsuite/lib/target-supports.exp         | 33 ++++++++++
 38 files changed, 458 insertions(+), 13 deletions(-)
 create mode 100644 gcc/config/i386/amxbf16intrin.h
 create mode 100644 gcc/config/i386/amxint8intrin.h
 create mode 100644 gcc/config/i386/amxtileintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index b14c7c668da..2245940d5bf 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -499,15 +499,20 @@ get_available_features (struct __processor_model *cpu_model,
 #define XSTATE_OPMASK			0x20
 #define XSTATE_ZMM			0x40
 #define XSTATE_HI_ZMM			0x80
+#define XSTATE_TILECFG			0x20000
+#define XSTATE_TILEDATA		0x40000
 
 #define XCR_AVX_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM)
 #define XCR_AVX512F_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
+#define XCR_AMX_ENABLED_MASK \
+  (XSTATE_TILECFG | XSTATE_TILEDATA)
 
   /* Check if AVX and AVX512 are usable.  */
   int avx_usable = 0;
   int avx512_usable = 0;
+  int amx_usable = 0;
   if ((ecx & bit_OSXSAVE))
     {
       /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
@@ -523,6 +528,8 @@ get_available_features (struct __processor_model *cpu_model,
 	  avx512_usable = ((xcrlow & XCR_AVX512F_ENABLED_MASK)
 			   == XCR_AVX512F_ENABLED_MASK);
 	}
+      amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
+		    == XCR_AMX_ENABLED_MASK);
     }
 
 #define set_feature(f) \
@@ -641,6 +648,15 @@ get_available_features (struct __processor_model *cpu_model,
 	set_feature (FEATURE_PCONFIG);
       if (edx & bit_IBT)
 	set_feature (FEATURE_IBT);
+      if (amx_usable)
+	{
+	  if (edx & bit_AMX_TILE)
+	    set_feature (FEATURE_AMX_TILE);
+	  if (edx & bit_AMX_INT8)
+	    set_feature (FEATURE_AMX_INT8);
+	  if (edx & bit_AMX_BF16)
+	    set_feature (FEATURE_AMX_BF16);
+	}
       if (avx512_usable)
 	{
 	  if (ebx & bit_AVX512F)
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index bb14305ad7b..eaf3c1792a9 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -101,6 +101,9 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE_SET)
 #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_SET OPTION_MASK_ISA2_AVX512VP2INTERSECT
+#define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -246,6 +249,9 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SERIALIZE_UNSET OPTION_MASK_ISA2_SERIALIZE
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET OPTION_MASK_ISA2_AVX512VP2INTERSECT
 #define OPTION_MASK_ISA2_TSXLDTRK_UNSET OPTION_MASK_ISA2_TSXLDTRK
+#define OPTION_MASK_ISA2_AMX_TILE_UNSET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_UNSET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_UNSET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -930,6 +936,45 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mamx_tile:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_int8:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_bf16:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 84ca97e7ade..5b94b1f1df7 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -216,6 +216,9 @@ enum processor_features
   FEATURE_XSAVEC,
   FEATURE_XSAVEOPT,
   FEATURE_XSAVES,
+  FEATURE_AMX_TILE,
+  FEATURE_AMX_INT8,
+  FEATURE_AMX_BF16,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 08c9dbecc76..3c830ea08ff 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -160,4 +160,7 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("xsaveopt", FEATURE_XSAVEOPT, P_NONE,
 			"-mxsaveopt")
   ISA_NAMES_TABLE_ENTRY("xsaves", FEATURE_XSAVES, P_NONE, "-mxsaves")
+  ISA_NAMES_TABLE_ENTRY("amx-tile", FEATURE_AMX_TILE, P_NONE, "-mamx-tile")
+  ISA_NAMES_TABLE_ENTRY("amx-int8", FEATURE_AMX_INT8, P_NONE, "-mamx-int8")
+  ISA_NAMES_TABLE_ENTRY("amx-bf16", FEATURE_AMX_BF16, P_NONE, "-mamx-bf16")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 30b51c3dc81..29b43760c7a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -412,7 +412,7 @@ i[34567]86-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -447,7 +447,7 @@ x86_64-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
new file mode 100644
index 00000000000..df0e2262d50
--- /dev/null
+++ b/gcc/config/i386/amxbf16intrin.h
@@ -0,0 +1,25 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXBF16INTRIN_H_INCLUDED
+#define _AMXBF16INTRIN_H_INCLUDED
+
+#if !defined(__AMX_BF16__)
+#pragma GCC push_options
+#pragma GCC target("amx-bf16")
+#define __DISABLE_AMX_BF16__
+#endif /* __AMX_BF16__ */
+
+#if defined(__x86_64__) && defined(__AMX_BF16__)
+#define _tile_dpbf16ps(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+#endif
+
+#ifdef __DISABLE_AMX_BF16__
+#undef __DISABLE_AMX_BF16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_BF16__ */
+
+#endif /* _AMXBF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxint8intrin.h b/gcc/config/i386/amxint8intrin.h
new file mode 100644
index 00000000000..4b7a59587dc
--- /dev/null
+++ b/gcc/config/i386/amxint8intrin.h
@@ -0,0 +1,37 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxint8intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXINT8INTRIN_H_INCLUDED
+#define _AMXINT8INTRIN_H_INCLUDED
+
+#if !defined(__AMX_INT8__)
+#pragma GCC push_options
+#pragma GCC target("amx-int8")
+#define __DISABLE_AMX_INT8__
+#endif /* __AMX_INT8__ */
+
+#if defined(__x86_64__) && defined(__AMX_INT8__)
+#define _tile_dpbssd(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbssd\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbssd\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbsud(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbsud\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbsud\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbusd(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbusd\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbusd\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbuud(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbuud\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbuud\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+#endif
+
+#ifdef __DISABLE_AMX_INT8__
+#undef __DISABLE_AMX_INT8__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_INT8__ */
+
+#endif /* _AMXINT8INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
new file mode 100644
index 00000000000..fe995232743
--- /dev/null
+++ b/gcc/config/i386/amxtileintrin.h
@@ -0,0 +1,63 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxtileintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXTILEINTRIN_H_INCLUDED
+#define _AMXTILEINTRIN_H_INCLUDED
+
+#if !defined(__AMX_TILE__)
+#pragma GCC push_options
+#pragma GCC target("amx-tile")
+#define __DISABLE_AMX_TILE__
+#endif /* __AMX_TILE__ */
+
+#if defined(__x86_64__) && defined(__AMX_TILE__)
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_loadconfig (const void *__config)
+{
+  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_storeconfig (void *__config)
+{
+  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_release (void)
+{
+  __asm__ volatile ("tilerelease" ::);
+}
+
+#define _tile_loadd(dst,base,stride)					\
+  __asm__ volatile							\
+  ("{tileloadd\t(%0,%1,1), %%tmm"#dst"|tileloadd\t%%tmm"#dst", [%0+%1*1]}" \
+   :: "r" ((const void*) base), "r" ((long) stride))
+
+#define _tile_stream_loadd(dst,base,stride)				\
+  __asm__ volatile							\
+  ("{tileloaddt1\t(%0,%1,1), %%tmm"#dst"|tileloaddt1\t%%tmm"#dst", [%0+%1*1]}"\
+   :: "r" ((const void*) base), "r" ((long) stride))
+
+#define _tile_stored(src,base,stride)					\
+  __asm__ volatile							\
+  ("{tilestored\t%%tmm"#src", (%0,%1,1)|tilestored\t[%0+%1*1], %%tmm"#src"}" \
+   :: "r" ((void*) base), "r" ((long) stride) : "memory")
+
+#define _tile_zero(dst)				\
+  __asm__ volatile				\
+  ("tilezero\t%%tmm"#dst ::)
+
+#endif
+
+#ifdef __DISABLE_AMX_TILE__
+#undef __DISABLE_AMX_TILE__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_TILE__ */
+
+#endif /* _AMXTILEINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 94af4910d3c..226c62433bd 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -124,6 +124,9 @@
 #define bit_PCONFIG	(1 << 18)
 #define bit_SERIALIZE	(1 << 14)
 #define bit_TSXLDTRK    (1 << 16)
+#define bit_AMX_BF16    (1 << 22)
+#define bit_AMX_TILE    (1 << 24)
+#define bit_AMX_INT8    (1 << 25)
 
 /* XFEATURE_ENABLED_MASK register bits (%eax == 0xd, %ecx == 0) */
 #define bit_BNDREGS     (1 << 3)
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 2d61a0ce70a..6a68e7caf08 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -588,6 +588,13 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__ENQCMD__");
   if (isa_flag2 & OPTION_MASK_ISA2_TSXLDTRK)
     def_or_undef (parse_in, "__TSXLDTRK__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_TILE)
+    def_or_undef (parse_in, "__AMX_TILE__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_INT8)
+    def_or_undef (parse_in, "__AMX_INT8__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_BF16)
+    def_or_undef (parse_in, "__AMX_BF16__");
+
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 26d1ea18ef1..162570242ff 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -209,7 +209,10 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavx512bf16",	OPTION_MASK_ISA2_AVX512BF16 },
   { "-menqcmd",		OPTION_MASK_ISA2_ENQCMD },
   { "-mserialize",	OPTION_MASK_ISA2_SERIALIZE },
-  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK }
+  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK },
+  { "-mamx-tile",	OPTION_MASK_ISA2_AMX_TILE },
+  { "-mamx-int8",	OPTION_MASK_ISA2_AMX_INT8 },
+  { "-mamx-bf16",	OPTION_MASK_ISA2_AMX_BF16 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1025,6 +1028,9 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("enqcmd", OPT_menqcmd),
     IX86_ATTR_ISA ("serialize", OPT_mserialize),
     IX86_ATTR_ISA ("tsxldtrk", OPT_mtsxldtrk),
+    IX86_ATTR_ISA ("amx-tile", OPT_mamx_tile),
+    IX86_ATTR_ISA ("amx-int8", OPT_mamx_int8),
+    IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2210,6 +2216,18 @@ ix86_option_override_internal (bool main_args_p,
 	    && !(opts->x_ix86_isa_flags2_explicit
 		 & OPTION_MASK_ISA2_AVX512BF16))
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512BF16;
+	if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_TILE))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE;
+	if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_INT8))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8;
+	if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_BF16))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16;
         if (((processor_alias_table[i].flags & PTA_MOVDIRI) != 0)
             && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MOVDIRI))
           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MOVDIRI;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index f4a8f1391fa..78782f06870 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -203,6 +203,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_SERIALIZE_P(x) TARGET_ISA2_SERIALIZE_P(x)
 #define TARGET_TSXLDTRK	TARGET_ISA2_TSXLDTRK
 #define TARGET_TSXLDTRK_P(x) TARGET_ISA2_TSXLDTRK_P(x)
+#define TARGET_AMX_TILE TARGET_ISA2_AMX_TILE
+#define TARGET_AMX_TILE_P(x) TARGET_ISA2_AMX_TILE_P(x)
+#define TARGET_AMX_INT8 TARGET_ISA2_AMX_INT8
+#define TARGET_AMX_INT8_P(x) TARGET_ISA2_AMX_INT8_P(x)
+#define TARGET_AMX_BF16 TARGET_ISA2_AMX_BF16
+#define TARGET_AMX_BF16_P(x) TARGET_ISA2_AMX_BF16_P(x)
 
 #define TARGET_LP64	TARGET_ABI_64
 #define TARGET_LP64_P(x)	TARGET_ABI_64_P(x)
@@ -2457,6 +2463,9 @@ const wide_int_bitmask PTA_ENQCMD (0, HOST_WIDE_INT_1U << 15);
 const wide_int_bitmask PTA_CLDEMOTE (0, HOST_WIDE_INT_1U << 16);
 const wide_int_bitmask PTA_SERIALIZE (0, HOST_WIDE_INT_1U << 17);
 const wide_int_bitmask PTA_TSXLDTRK (0, HOST_WIDE_INT_1U << 18);
+const wide_int_bitmask PTA_AMX_TILE (0, HOST_WIDE_INT_1U << 19);
+const wide_int_bitmask PTA_AMX_INT8 (0, HOST_WIDE_INT_1U << 20);
+const wide_int_bitmask PTA_AMX_BF16 (0, HOST_WIDE_INT_1U << 21);
 
 const wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2
   | PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
@@ -2490,7 +2499,8 @@ const wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_CLWB | PTA_AVX512VP2INTERSECT;
 const wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
-  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK;
+  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
+  | PTA_AMX_INT8 | PTA_AMX_BF16;
 const wide_int_bitmask PTA_ALDERLAKE = PTA_SKYLAKE | PTA_CLDEMOTE | PTA_PTWRITE
   | PTA_WAITPKG | PTA_SERIALIZE;
 const wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index c9f7195d423..9389dc24948 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1114,4 +1114,16 @@ Support SERIALIZE built-in functions and code generation.
 
 mtsxldtrk
 Target Report Mask(ISA2_TSXLDTRK) Var(ix86_isa_flags2) Save
-Support TSXLDTRK built-in functions and code generation.
\ No newline at end of file
+Support TSXLDTRK built-in functions and code generation.
+
+mamx-tile
+Target Report Mask(ISA2_AMX_TILE) Var(ix86_isa_flags2) Save
+Support AMX-TILE built-in functions and code generation.
+
+mamx-int8
+Target Report Mask(ISA2_AMX_INT8) Var(ix86_isa_flags2) Save
+Support AMX-INT8 built-in functions and code generation.
+
+mamx-bf16
+Target Report Mask(ISA2_AMX_BF16) Var(ix86_isa_flags2) Save
+Support AMX-BF16 built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index b660d0d9040..6d25f44c303 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -144,6 +144,12 @@
 
 #include <tsxldtrkintrin.h>
 
+#include <amxtileintrin.h>
+
+#include <amxint8intrin.h>
+
+#include <amxbf16intrin.h>
+
 #include <rdseedintrin.h>
 
 #include <prfchwintrin.h>
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0fb7e27d9ce..57eca49db1a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6597,6 +6597,21 @@ Enable/disable the generation of the XSAVEOPT instructions.
 @cindex @code{target("xsaves")} function attribute, x86
 Enable/disable the generation of the XSAVES instructions.
 
+@item amx-tile
+@itemx no-amx-tile
+@cindex @code{target("amx-tile")} function attribute, x86
+Enable/disable the generation of the AMX-TILE instructions.
+
+@item amx-int8
+@itemx no-amx-int8
+@cindex @code{target("amx-int8")} function attribute, x86
+Enable/disable the generation of the AMX-INT8 instructions.
+
+@item amx-bf16
+@itemx no-amx-bf16
+@cindex @code{target("amx-bf16")} function attribute, x86
+Enable/disable the generation of the AMX-BF16 instructions.
+
 @item cld
 @itemx no-cld
 @cindex @code{target("cld")} function attribute, x86
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 825fd669a75..8743c8c2f85 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1351,6 +1351,7 @@ See RS/6000 and PowerPC Options.
 -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq @gol
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
+-mamx-tile -mamx-int8 -mamx-bf16@gol
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
 -mmemcpy-strategy=@var{strategy}  -mmemset-strategy=@var{strategy} @gol
@@ -29936,6 +29937,15 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @itemx -mserialize
 @opindex mserialize
+@need 200
+@itemx -mamx-tile
+@opindex mamx-tile
+@need 200
+@itemx -mamx-int8
+@opindex mamx-int8
+@need 200
+@itemx -mamx-bf16
+@opindex mamx-bf16
 These switches enable the use of instructions in the MMX, SSE,
 SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 9f37ac26241..7c1e4cf742e 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2189,6 +2189,15 @@ Target supports the execution of @code{avx512f} instructions.
 @item avx512vp2intersect
 Target supports the execution of @code{avx512vp2intersect} instructions.
 
+@item amx_tile
+Target supports the execution of @code{amx-tile} instructions.
+
+@item amx_int8
+Target supports the execution of @code{amx-int8} instructions.
+
+@item amx_bf16
+Target supports the execution of @code{amx-bf16} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 04d5fec0f6c..449f30dbace 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h.h are usable
    with -O -pedantic-errors.  */
 
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index f40172ee9b5..29e98919386 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h are usable
    with -O -fkeep-inline-functions.  */
 
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
new file mode 100644
index 00000000000..98758f99a10
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbf16ps (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
new file mode 100644
index 00000000000..b7332248ba7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16" } */
+/* { dg-require-effective-target amx_bf16 } */
+#include "amxbf16-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
new file mode 100644
index 00000000000..c2d6074387a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbf16ps (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
new file mode 100644
index 00000000000..605a44df3f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
+/* { dg-require-effective-target amx_bf16 } */
+#include"amxbf16-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
new file mode 100644
index 00000000000..7af801bd223
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbssd (1, 2, 3);
+  _tile_dpbsud (1, 2, 3);
+  _tile_dpbusd (1, 2, 3);
+  _tile_dpbuud (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
new file mode 100644
index 00000000000..307c9d813bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8" } */
+/* { dg-require-effective-target amx_int8 } */
+#include"amxint8-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
new file mode 100644
index 00000000000..bcfbb3fa5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbssd (1, 2, 3);
+  _tile_dpbsud (1, 2, 3);
+  _tile_dpbusd (1, 2, 3);
+  _tile_dpbuud (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
new file mode 100644
index 00000000000..7e1c1d63594
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8 -masm=intel" } */
+/* { dg-require-effective-target amx_int8 } */
+#include"amxint8-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
new file mode 100644
index 00000000000..96578719833
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]+\[^\n\]*%tmm\[0-9\]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (5, base, stride);
+  _tile_stream_loadd (4, base, stride);
+  _tile_stored (3, base, stride);
+  _tile_zero (2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
new file mode 100644
index 00000000000..c00cd0a8fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile" } */
+/* { dg-require-effective-target amx_tile } */
+#include"amxtile-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
new file mode 100644
index 00000000000..88ef612ed14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -masm=intel " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]\[^\n\]+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (5, base, stride);
+  _tile_stream_loadd (4, base, stride);
+  _tile_stored (3, base, stride);
+  _tile_zero (2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c
new file mode 100644
index 00000000000..99da63c119e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -masm=intel" } */
+/* { dg-require-effective-target amx_tile } */
+#include"amxtile-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 94ffbb64c75..8e669f12215 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -71,6 +71,9 @@ extern void test_tsxldtrk (void)		__attribute__((__target__("tsxldtrk")));
 extern void test_enqcmd (void)			__attribute__((__target__("enqcmd")));
 extern void test_avx512bf16 (void)		__attribute__((__target__("avx512bf16")));
 extern void test_avx512vp2intersect (void)	__attribute__((__target__("avx512vp2intersect")));
+extern void test_amx_tile (void)		__attribute__((__target__("amx-tile")));
+extern void test_amx_int8 (void)		__attribute__((__target__("amx-int8")));
+extern void test_amx_bf16 (void)		__attribute__((__target__("amx-bf16")));
 
 extern void test_no_sgx (void)			__attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)		__attribute__((__target__("no-avx5124fmaps")));
@@ -143,6 +146,9 @@ extern void test_no_tsxldtrk (void)		__attribute__((__target__("no-tsxldtrk")));
 extern void test_no_enqcmd (void)		__attribute__((__target__("no-enqcmd")));
 extern void test_no_avx512bf16 (void)		__attribute__((__target__("no-avx512bf16")));
 extern void test_no_avx512vp2intersect (void)	__attribute__((__target__("no-avx512vp2intersect")));
+extern void test_no_amx_tile (void)		__attribute__((__target__("no-amx-tile")));
+extern void test_no_amx_int8 (void)		__attribute__((__target__("no-amx-int8")));
+extern void test_no_amx_bf16 (void)		__attribute__((__target__("no-amx-bf16")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void)		__attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index b1690d7204f..61146b2b30a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h gfniintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 3a6404707c4..4d6c9b3a17a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index edaa2aa8ad4..837b51c53e6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 7364b2ff337..fc75669f41b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -11,6 +11,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxldtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -102,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -219,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index eaadebef187..9ca7c5d919d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -10,6 +10,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -697,6 +698,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1) 
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 57eed3012b9..4313a75c2bd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8884,6 +8884,39 @@ proc check_effective_target_avx512vaes { } {
     } "-mvaes" ]
 }
 
+# Return 1 if amx-tile instructions can be compiled.
+proc check_effective_target_amx_tile { } {
+    return [check_no_compiler_messages amx_tile object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tilerelease" ::);
+	}
+    } "-mamx-tile" ]
+}
+
+# Return 1 if amx-int8 instructions can be compiled.
+proc check_effective_target_amx_int8 { } {
+    return [check_no_compiler_messages amx_int8 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbssd\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-int8" ]
+}
+
+# Return 1 if amx-bf16 instructions can be compiled.
+proc check_effective_target_amx_bf16 { } {
+    return [check_no_compiler_messages amx_bf16 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbf16ps\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-bf16" ]
+}
+
 # Return 1 if vpclmulqdq instructions can be compiled.
 proc check_effective_target_vpclmulqdq { } {
     return [check_no_compiler_messages vpclmulqdq object {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-07-17  5:40   ` Hongyu Wang
@ 2020-07-24  5:41     ` Hongyu Wang
  2020-08-04 12:17       ` Hongyu Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Hongyu Wang @ 2020-07-24  5:41 UTC (permalink / raw)
  To: gcc-patches, kirill.yukhin

PING^2

Hongyu Wang <wwwhhhyyy333@gmail.com> wrote on Fri, Jul 17, 2020 at 1:40 PM:
>
> Update for SAPPHIRERAPIDS and PING
>
> Hongyu Wang <wwwhhhyyy333@gmail.com> wrote on Tue, Jul 7, 2020 at 11:24 AM:
>
> >
> > Hi Kirill, could you help review this patch?
> >
> > Hongyu Wang <wwwhhhyyy333@gmail.com> wrote on Mon, Jul 6, 2020 at 9:58 AM:
> > >
> > > Hi:
> > >
> > > This patch adds support for Intel Advanced Matrix Extensions (AMX),
> > > which will be enabled in GLC.
> > >
> > > AMX is a new 64-bit programming paradigm consisting of two
> > > components: a set of 2-dimensional registers (tiles) representing
> > > sub-arrays from a larger 2-dimensional memory image,
> > > and an accelerator able to operate on tiles.
> > >
> > > Supported instructions are
> > >
> > > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > AMX-BF16:tdpbf16ps
> > >
> > > The intrinsics take constant tile register numbers as their input parameters.
> > >
> > > For detailed information, please refer to
> > > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > >
> > > Bootstrap ok, regression test on i386/x86 backend is ok.
> > >
> > > OK for master?
> > >
> > > gcc/ChangeLog
> > >
> > >     * common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
> > >     OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
> > >     OPTION_MASK_ISA2_AMX_TILE_UNSET,
> > >     OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
> > >     New macros.
> > >     (ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
> > >     * common/config/i386/i386-cpuinfo.h (processor_types): Add
> > >     FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
> > >     * common/config/i386/cpuinfo.h (XSTATE_TILECFG,
> > >     XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macro.
> > >     (get_available_features): Enable AMX features only if
> > >     their states are supported by OSXSAVE.
> > >     * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
> > >     for amx-tile, amx-int8, amx-bf16.
> > >     * config.gcc: Add amxtileintrin.h, amxint8intrin.h,
> > >     amxbf16intrin.h to extra headers.
> > >     * config/i386/amxbf16intrin.h: New file.
> > >     * config/i386/amxint8intrin.h: Ditto.
> > >     * config/i386/amxtileintrin.h: Ditto.
> > >     * config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
> > >     New macro.
> > >     * config/i386/i386-c.c (ix86_target_macros_internal): Define
> > >     __AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
> > >     * config/i386/i386-options.c (ix86_target_string): Add
> > >     -mamx-tile, -mamx-int8, -mamx-bf16.
> > >     (ix86_option_override_internal): Handle AMX-TILE,
> > >     AMX-INT8, AMX-BF16.
> > >     * config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
> > >     TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16_P,
> > >     PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros.
> > >     * config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
> > >     * config/i386/immintrin.h: Include amxtileintrin.h,
> > >     amxint8intrin.h, amxbf16intrin.h.
> > >     * doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
> > >     * doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
> > >     * doc/sourcebuild.texi (Effective-Target Keywords, Other
> > >     hardware attributes): Document amx_int8, amx_tile, amx_bf16.
> > >
> > > gcc/testsuite/ChangeLog
> > >
> > >     * lib/target-supports.exp (check_effective_target_amx_tile,
> > >     check_effective_target_amx_int8,
> > >     check_effective_target_amx_bf16): New proc.
> > >     * g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
> > >     * g++.dg/other/i386-3.C: Ditto.
> > >     * gcc.target/i386/sse-12.c: Ditto.
> > >     * gcc.target/i386/sse-13.c: Ditto.
> > >     * gcc.target/i386/sse-14.c: Ditto.
> > >     * gcc.target/i386/sse-22.c: Ditto.
> > >     * gcc.target/i386/sse-23.c: Ditto.
> > >     * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> > >     * gcc.target/i386/amxbf16-asmatt-1.c: New test.
> > >     * gcc.target/i386/amxint8-asmatt-1.c: Ditto.
> > >     * gcc.target/i386/amxtile-asmatt-1.c: Ditto.
> > >     * gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
> > >     * gcc.target/i386/amxint8-asmintel-1.c: Ditto.
> > >     * gcc.target/i386/amxtile-asmintel-1.c: Ditto.
> > >     * gcc.target/i386/amxbf16-asmatt-2.c: Ditto.
> > >     * gcc.target/i386/amxint8-asmatt-2.c: Ditto.
> > >     * gcc.target/i386/amxtile-asmatt-2.c: Ditto.
> > >     * gcc.target/i386/amxbf16-asmintel-2.c: Ditto.
> > >     * gcc.target/i386/amxint8-asmintel-2.c: Ditto.
> > >     * gcc.target/i386/amxtile-asmintel-2.c: Ditto.
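
As a rough illustration of the get_available_features entry quoted above: AMX is only usable when the OS has enabled both tile state components in XCR0. The macro names below mirror the ChangeLog, but their values (bit 17 for TILECFG and bit 18 for TILEDATA, as in the Intel documentation) and the helper amx_state_enabled are assumptions made for this sketch, not code taken from the patch.

```c
#include <stdint.h>

/* XCR0 state-component bits for AMX: bit 17 is TILECFG, bit 18 is
   TILEDATA.  The names follow the ChangeLog; the definitions here are
   an assumption about what the patch contains.  */
#define XSTATE_TILECFG       (1ULL << 17)
#define XSTATE_TILEDATA      (1ULL << 18)
#define XCR_AMX_ENABLED_MASK (XSTATE_TILECFG | XSTATE_TILEDATA)

/* Hypothetical helper: given an XCR0 value (read with xgetbv once
   OSXSAVE is known to be set), report whether both AMX state
   components are enabled by the OS.  */
int
amx_state_enabled (uint64_t xcr0)
{
  return (xcr0 & XCR_AMX_ENABLED_MASK) == XCR_AMX_ENABLED_MASK;
}
```

Requiring both bits, rather than either one, reflects that tile configuration and tile data must both be saved and restored across context switches.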

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-07-24  5:41     ` Hongyu Wang
@ 2020-08-04 12:17       ` Hongyu Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Hongyu Wang @ 2020-08-04 12:17 UTC (permalink / raw)
  To: gcc-patches, kirill.yukhin

PING^3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-07-06  1:58 [PATCH] Enable GCC support for AMX Hongyu Wang
  2020-07-07  3:24 ` Hongyu Wang
@ 2020-08-04 14:47 ` Kirill Yukhin
  2020-08-04 15:40   ` Hongyu Wang
  2020-09-03 15:07 ` Kirill Yukhin
  2 siblings, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2020-08-04 14:47 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, ubizjak

Hello,

On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> Hi:
>
> This patch adds support for Intel Advanced Matrix Extensions (AMX),
> which will be enabled in GLC.
>
> AMX is a new 64-bit programming paradigm consisting of two
> components: a set of 2-dimensional registers (tiles) representing
> sub-arrays from a larger 2-dimensional memory image,
> and an accelerator able to operate on tiles.
> 
> Supported instructions are
> 
> AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> AMX-BF16:tdpbf16ps
> 
> The intrinsics take constant tile register numbers as their input parameters.

I didn't go into the patch deeply, but why did you use inline asm for the
intrinsic definitions? Are you going to introduce register classes for those
new tmm registers and new instruction definitions for the new insns in the
machine description?

--
K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-08-04 14:47 ` Kirill Yukhin
@ 2020-08-04 15:40   ` Hongyu Wang
  2020-09-01  1:31     ` Hongyu Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Hongyu Wang @ 2020-08-04 15:40 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: gcc-patches, ubizjak

Kirill Yukhin <kirill.yukhin@gmail.com> wrote on Tue, Aug 4, 2020 at 10:47 PM:
>
> Hello,
>
> On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > Hi:
> >
> > This patch adds support for Intel Advanced Matrix Extensions (AMX),
> > which will be enabled in GLC.
> >
> > AMX is a new 64-bit programming paradigm consisting of two
> > components: a set of 2-dimensional registers (tiles) representing
> > sub-arrays from a larger 2-dimensional memory image,
> > and an accelerator able to operate on tiles.
> >
> > Supported instructions are
> >
> > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > AMX-BF16:tdpbf16ps
> >
> > The intrinsics take constant tile register numbers as their input parameters.
>
> I didn't go into the patch deeply, but why did you use inline asm for the
> intrinsic definitions? Are you going to introduce register classes for those
> new tmm registers and new instruction definitions for the new insns in the
> machine description?

In this version of the patch, we just align our implementation with what
has been submitted to the LLVM community. Since AMX allows the register
size to vary with the runtime configuration, the implementation of register
allocation is still under discussion. We will introduce a new register
class and new insns in a future patch.
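
To illustrate why the tile number has to be a literal constant in this
version: the macros paste their arguments into the instruction text with the
preprocessor's # operator, so the argument is never evaluated. The following
is only a minimal sketch. TILE_DPBSSD_TEXT and the helper functions are
hypothetical names that are not in the patch, and the macro builds the
AT&T-syntax string instead of handing it to __asm__ (which is also why it
uses a single % where the patch writes %%):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro, not part of the patch: same string pasting as
   _tile_dpbssd, but the result is an ordinary C string.  */
#define TILE_DPBSSD_TEXT(dst, src1, src2) \
  "tdpbssd\t%tmm" #src2 ", %tmm" #src1 ", %tmm" #dst

/* Literal tile numbers end up in the register names.  */
const char *
tile_text_constants (void)
{
  return TILE_DPBSSD_TEXT (0, 1, 2);
}

/* Identifiers are stringified as written, never evaluated, so passing
   variables would produce register names that do not exist.  */
const char *
tile_text_variables (void)
{
  return TILE_DPBSSD_TEXT (i, j, k);
}

/* Hypothetical self-check of the two cases above.  */
void
tile_text_self_check (void)
{
  assert (strcmp (tile_text_constants (),
		  "tdpbssd\t%tmm2, %tmm1, %tmm0") == 0);
  assert (strcmp (tile_text_variables (),
		  "tdpbssd\t%tmmk, %tmmj, %tmmi") == 0);
}
```

With a register class for tmm registers, the tile operands could become real
asm operands and this restriction could be revisited.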

>
> --
> K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-08-04 15:40   ` Hongyu Wang
@ 2020-09-01  1:31     ` Hongyu Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Hongyu Wang @ 2020-09-01  1:31 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: gcc-patches

PING^3

Hongyu Wang <wwwhhhyyy333@gmail.com> wrote on Tue, Aug 4, 2020 at 11:40 PM:
>
> Kirill Yukhin <kirill.yukhin@gmail.com> wrote on Tue, Aug 4, 2020 at 10:47 PM:
> >
> > Hello,
> >
> > On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > > Hi:
> > >
> > > This patch adds support for Intel Advanced Matrix Extensions (AMX),
> > > which will be enabled in GLC.
> > >
> > > AMX is a new 64-bit programming paradigm consisting of two
> > > components: a set of 2-dimensional registers (tiles) representing
> > > sub-arrays from a larger 2-dimensional memory image,
> > > and an accelerator able to operate on tiles.
> > >
> > > Supported instructions are
> > >
> > > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > AMX-BF16:tdpbf16ps
> > >
> > > The intrinsics take constant tile register numbers as their input parameters.
> >
> > I didn't go into the patch deeply, but why did you use inline asm for the
> > intrinsics definitions? Are you going to introduce register classes for those
> > new tmm registers and new instruction definitions for the new insns in the
> > machine description?
>
> In this version of the patch, we just align our implementation with what
> has been submitted to the LLVM community. Since AMX allows the register
> size to vary with the runtime configuration, the implementation of register
> allocation is still under discussion. We will introduce a new register
> class and new insns in a future patch.
>
> >
> > --
> > K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-07-06  1:58 [PATCH] Enable GCC support for AMX Hongyu Wang
  2020-07-07  3:24 ` Hongyu Wang
  2020-08-04 14:47 ` Kirill Yukhin
@ 2020-09-03 15:07 ` Kirill Yukhin
  2020-09-03 15:17   ` H.J. Lu
  2 siblings, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2020-09-03 15:07 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, ubizjak

Hello,

On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> Hi:
> 
> This patch adds support for Intel Advanced Matrix Extensions (AMX),
> which will be enabled in GLC.
> 
> AMX is a new 64-bit programming paradigm consisting of two
> components: a set of 2-dimensional registers (tiles) representing
> sub-arrays from a larger 2-dimensional memory image,
> and an accelerator able to operate on tiles.
> 
> Supported instructions are
> 
> AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> AMX-BF16:tdpbf16ps
> 
> The intrinsics take constant tile register numbers as their input parameters.
> 
> For detailed information, please refer to
> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> 
> Bootstrap ok, regression test on i386/x86 backend is ok.
> 
> OK for master?

I tried to apply your patch to recent master and got a
compilation error:

g++ -std=gnu++11 -fno-PIE -c -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/export/kyukhin/gcc/src/gcc -I/export/kyukhin/gcc/src/gcc/. -I/export/kyukhin/gcc/src/gcc/../include -I/export/kyukhin/gcc/src/gcc/../libcpp/include -I/export/kyukhin/gcc/src/gcc/../libdecnumber -I/export/kyukhin/gcc/src/gcc/../libdecnumber/bid -I../libdecnumber -I/export/kyukhin/gcc/src/gcc/../libbacktrace -o i386-options.o -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c: In function ‘bool ix86_option_override_internal(bool, gcc_options*, gcc_options*)’:
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2263:41: error: ‘PTA_AMX_TILE’ was not declared in this scope
  if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
                                         ^
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2267:41: error: ‘PTA_AMX_INT8’ was not declared in this scope
  if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
                                         ^
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2271:41: error: ‘PTA_AMX_BF16’ was not declared in this scope
  if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)

Could you please fix that?
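
For context, the failing lines follow the usual pattern in
ix86_option_override_internal: an ISA bit is switched on when the selected
processor entry advertises it, unless the user already set or cleared that
bit explicitly. The following is only a minimal sketch of that pattern. All
names starting with sketch_ or SKETCH_ are hypothetical, the bit values are
illustrative, and plain 64-bit masks stand in for GCC's wide_int_bitmask:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PTA_AMX_TILE and OPTION_MASK_ISA2_AMX_TILE;
   the values are illustrative only.  */
#define SKETCH_PTA_AMX_TILE  (UINT64_C (1) << 0)
#define SKETCH_ISA2_AMX_TILE (UINT64_C (1) << 0)

struct sketch_opts
{
  uint64_t isa_flags2;          /* ISA bits currently enabled.  */
  uint64_t isa_flags2_explicit; /* ISA bits the user set or cleared.  */
};

/* Enable AMX-TILE when the processor entry has the PTA flag, unless the
   user already decided explicitly (for example with -mno-amx-tile).  */
void
sketch_apply_march (struct sketch_opts *opts, uint64_t pta_flags)
{
  if ((pta_flags & SKETCH_PTA_AMX_TILE) != 0
      && !(opts->isa_flags2_explicit & SKETCH_ISA2_AMX_TILE))
    opts->isa_flags2 |= SKETCH_ISA2_AMX_TILE;
}

/* Hypothetical self-check: the implicit enable works, and an explicit
   user choice is left alone.  */
void
sketch_self_check (void)
{
  struct sketch_opts implicit = { 0, 0 };
  struct sketch_opts user_off = { 0, SKETCH_ISA2_AMX_TILE };

  sketch_apply_march (&implicit, SKETCH_PTA_AMX_TILE);
  sketch_apply_march (&user_off, SKETCH_PTA_AMX_TILE);
  assert (implicit.isa_flags2 == SKETCH_ISA2_AMX_TILE);
  assert (user_off.isa_flags2 == 0);
}
```

The check only compiles if the PTA constant is declared, which is what the
errors above complain about.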


--
K

PS: Please excuse the late response.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-09-03 15:07 ` Kirill Yukhin
@ 2020-09-03 15:17   ` H.J. Lu
  2020-09-04 14:01     ` Kirill Yukhin
  0 siblings, 1 reply; 17+ messages in thread
From: H.J. Lu @ 2020-09-03 15:17 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Hongyu Wang, Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3051 bytes --]

On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hello,
>
> On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > Hi:
> >
> > This patch adds support for Intel Advanced Matrix Extensions (AMX),
> > which will be enabled in GLC.
> >
> > AMX is a new 64-bit programming paradigm consisting of two
> > components: a set of 2-dimensional registers (tiles) representing
> > sub-arrays from a larger 2-dimensional memory image,
> > and an accelerator able to operate on tiles.
> >
> > Supported instructions are
> >
> > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > AMX-BF16:tdpbf16ps
> >
> > The intrinsics take constant tile register numbers as their input parameters.
> >
> > For detailed information, please refer to
> > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> >
> > Bootstrap ok, regression test on i386/x86 backend is ok.
> >
> > OK for master?
>
> I tried to apply your patch to recent master and got a
> compilation error:
>
> g++ -std=gnu++11 -fno-PIE -c -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/export/kyukhin/gcc/src/gcc -I/export/kyukhin/gcc/src/gcc/. -I/export/kyukhin/gcc/src/gcc/../include -I/export/kyukhin/gcc/src/gcc/../libcpp/include -I/export/kyukhin/gcc/src/gcc/../libdecnumber -I/export/kyukhin/gcc/src/gcc/../libdecnumber/bid -I../libdecnumber -I/export/kyukhin/gcc/src/gcc/../libbacktrace -o i386-options.o -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c
> /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c: In function ‘bool ix86_option_override_internal(bool, gcc_options*, gcc_options*)’:
> /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2263:41: error: ‘PTA_AMX_TILE’ was not declared in this scope
>   if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
>                                          ^
> /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2267:41: error: ‘PTA_AMX_INT8’ was not declared in this scope
>   if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
>                                          ^
> /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2271:41: error: ‘PTA_AMX_BF16’ was not declared in this scope
>   if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
>
> Could you please fix that?

Here is the rebased patch against

commit 3c219134152f645103f2fcd50735b177ccd76cde
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Sep 3 12:38:50 2020 +0100

    libstdc++: Optimise GCD algorithms

Thanks.

-- 
H.J.

[-- Attachment #2: 0001-Enable-GCC-support-for-AMX-TILE-AMX-INT8-AMX-BF16.patch --]
[-- Type: text/x-patch, Size: 50666 bytes --]

From 713cafb77a16331620af3eb2c2384a7c388ecd90 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Thu, 25 Jul 2019 16:49:36 +0800
Subject: [PATCH] Enable GCC support for AMX-TILE,AMX-INT8,AMX-BF16.

AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
AMX-BF16:tdpbf16ps

gcc/ChangeLog

	* common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
	OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
	OPTION_MASK_ISA2_AMX_TILE_UNSET,
	OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
	New macros.
	(ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
	* common/config/i386/i386-cpuinfo.h (processor_features): Add
	FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
	* common/config/i386/cpuinfo.h (XSTATE_TILECFG,
	XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macros.
	(get_available_features): Enable AMX features only if
	their states are supported by OSXSAVE.
	* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
	for amx-tile, amx-int8, amx-bf16.
	* config.gcc: Add amxtileintrin.h, amxint8intrin.h,
	amxbf16intrin.h to extra headers.
	* config/i386/amxbf16intrin.h: New file.
	* config/i386/amxint8intrin.h: Ditto.
	* config/i386/amxtileintrin.h: Ditto.
	* config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
	New macros.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
	* config/i386/i386-options.c (ix86_target_string): Add
	-mamx-tile, -mamx-int8, -mamx-bf16.
	(ix86_option_override_internal): Handle AMX-TILE,
	AMX-INT8, AMX-BF16.
	* config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
	TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16,
	TARGET_AMX_BF16_P, PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16):
	New macros.
	* config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* config/i386/immintrin.h: Include amxtileintrin.h,
	amxint8intrin.h, amxbf16intrin.h.
	* doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
	* doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
	* doc/sourcebuild.texi (Effective-Target Keywords, Other
	hardware attributes): Document amx_int8, amx_tile, amx_bf16.

gcc/testsuite/ChangeLog

	* lib/target-supports.exp (check_effective_target_amx_tile,
	check_effective_target_amx_int8,
	check_effective_target_amx_bf16): New proc.
	* g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/sse-12.c: Ditto.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* gcc.target/i386/amxbf16-asmatt-1.c: New test.
	* gcc.target/i386/amxint8-asmatt-1.c: Ditto.
	* gcc.target/i386/amxtile-asmatt-1.c: Ditto.
	* gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
	* gcc.target/i386/amxint8-asmintel-1.c: Ditto.
	* gcc.target/i386/amxtile-asmintel-1.c: Ditto.
	* gcc.target/i386/amxbf16-asmatt-2.c: Ditto.
	* gcc.target/i386/amxint8-asmatt-2.c: Ditto.
	* gcc.target/i386/amxtile-asmatt-2.c: Ditto.
	* gcc.target/i386/amxbf16-asmintel-2.c: Ditto.
	* gcc.target/i386/amxint8-asmintel-2.c: Ditto.
	* gcc.target/i386/amxtile-asmintel-2.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h              | 16 +++++
 gcc/common/config/i386/i386-common.c          | 45 +++++++++++++
 gcc/common/config/i386/i386-cpuinfo.h         |  3 +
 gcc/common/config/i386/i386-isas.h            |  3 +
 gcc/config.gcc                                |  4 +-
 gcc/config/i386/amxbf16intrin.h               | 25 ++++++++
 gcc/config/i386/amxint8intrin.h               | 37 +++++++++++
 gcc/config/i386/amxtileintrin.h               | 63 +++++++++++++++++++
 gcc/config/i386/cpuid.h                       |  3 +
 gcc/config/i386/i386-c.c                      |  7 +++
 gcc/config/i386/i386-options.c                | 20 +++++-
 gcc/config/i386/i386.h                        | 12 +++-
 gcc/config/i386/i386.opt                      | 14 ++++-
 gcc/config/i386/immintrin.h                   |  6 ++
 gcc/doc/extend.texi                           | 15 +++++
 gcc/doc/invoke.texi                           | 10 +++
 gcc/doc/sourcebuild.texi                      |  9 +++
 gcc/testsuite/g++.dg/other/i386-2.C           |  3 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |  3 +-
 .../gcc.target/i386/amxbf16-asmatt-1.c        |  9 +++
 .../gcc.target/i386/amxbf16-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxbf16-asmintel-1.c      |  9 +++
 .../gcc.target/i386/amxbf16-asmintel-2.c      |  4 ++
 .../gcc.target/i386/amxint8-asmatt-1.c        | 15 +++++
 .../gcc.target/i386/amxint8-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxint8-asmintel-1.c      | 15 +++++
 .../gcc.target/i386/amxint8-asmintel-2.c      |  4 ++
 .../gcc.target/i386/amxtile-asmatt-1.c        | 24 +++++++
 .../gcc.target/i386/amxtile-asmatt-2.c        |  4 ++
 .../gcc.target/i386/amxtile-asmintel-1.c      | 24 +++++++
 .../gcc.target/i386/amxtile-asmintel-2.c      |  4 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |  6 ++
 gcc/testsuite/gcc.target/i386/sse-12.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |  5 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |  3 +-
 gcc/testsuite/lib/target-supports.exp         | 33 ++++++++++
 38 files changed, 458 insertions(+), 13 deletions(-)
 create mode 100644 gcc/config/i386/amxbf16intrin.h
 create mode 100644 gcc/config/i386/amxint8intrin.h
 create mode 100644 gcc/config/i386/amxtileintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 12237e2f449..c96455ce64f 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -509,15 +509,20 @@ get_available_features (struct __processor_model *cpu_model,
 #define XSTATE_OPMASK			0x20
 #define XSTATE_ZMM			0x40
 #define XSTATE_HI_ZMM			0x80
+#define XSTATE_TILECFG			0x20000
+#define XSTATE_TILEDATA		0x40000
 
 #define XCR_AVX_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM)
 #define XCR_AVX512F_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
+#define XCR_AMX_ENABLED_MASK \
+  (XSTATE_TILECFG | XSTATE_TILEDATA)
 
   /* Check if AVX and AVX512 are usable.  */
   int avx_usable = 0;
   int avx512_usable = 0;
+  int amx_usable = 0;
   if ((ecx & bit_OSXSAVE))
     {
       /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
@@ -533,6 +538,8 @@ get_available_features (struct __processor_model *cpu_model,
 	  avx512_usable = ((xcrlow & XCR_AVX512F_ENABLED_MASK)
 			   == XCR_AVX512F_ENABLED_MASK);
 	}
+      amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
+		    == XCR_AMX_ENABLED_MASK);
     }
 
 #define set_feature(f) \
@@ -651,6 +658,15 @@ get_available_features (struct __processor_model *cpu_model,
 	set_feature (FEATURE_PCONFIG);
       if (edx & bit_IBT)
 	set_feature (FEATURE_IBT);
+      if (amx_usable)
+	{
+	  if (edx & bit_AMX_TILE)
+	    set_feature (FEATURE_AMX_TILE);
+	  if (edx & bit_AMX_INT8)
+	    set_feature (FEATURE_AMX_INT8);
+	  if (edx & bit_AMX_BF16)
+	    set_feature (FEATURE_AMX_BF16);
+	}
       if (avx512_usable)
 	{
 	  if (ebx & bit_AVX512F)
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 5305145a8c9..cd5a432d783 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -101,6 +101,9 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE_SET)
 #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_SET OPTION_MASK_ISA2_AVX512VP2INTERSECT
+#define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -246,6 +249,9 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SERIALIZE_UNSET OPTION_MASK_ISA2_SERIALIZE
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET OPTION_MASK_ISA2_AVX512VP2INTERSECT
 #define OPTION_MASK_ISA2_TSXLDTRK_UNSET OPTION_MASK_ISA2_TSXLDTRK
+#define OPTION_MASK_ISA2_AMX_TILE_UNSET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_UNSET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_UNSET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -930,6 +936,45 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mamx_tile:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_int8:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_bf16:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 84ca97e7ade..5b94b1f1df7 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -216,6 +216,9 @@ enum processor_features
   FEATURE_XSAVEC,
   FEATURE_XSAVEOPT,
   FEATURE_XSAVES,
+  FEATURE_AMX_TILE,
+  FEATURE_AMX_INT8,
+  FEATURE_AMX_BF16,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 08c9dbecc76..3c830ea08ff 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -160,4 +160,7 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("xsaveopt", FEATURE_XSAVEOPT, P_NONE,
 			"-mxsaveopt")
   ISA_NAMES_TABLE_ENTRY("xsaves", FEATURE_XSAVES, P_NONE, "-mxsaves")
+  ISA_NAMES_TABLE_ENTRY("amx-tile", FEATURE_AMX_TILE, P_NONE, "-mamx-tile")
+  ISA_NAMES_TABLE_ENTRY("amx-int8", FEATURE_AMX_INT8, P_NONE, "-mamx-int8")
+  ISA_NAMES_TABLE_ENTRY("amx-bf16", FEATURE_AMX_BF16, P_NONE, "-mamx-bf16")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 797f0ad5edd..d0e59e86a5c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -412,7 +412,7 @@ i[34567]86-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -447,7 +447,7 @@ x86_64-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
new file mode 100644
index 00000000000..df0e2262d50
--- /dev/null
+++ b/gcc/config/i386/amxbf16intrin.h
@@ -0,0 +1,25 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXBF16INTRIN_H_INCLUDED
+#define _AMXBF16INTRIN_H_INCLUDED
+
+#if !defined(__AMX_BF16__)
+#pragma GCC push_options
+#pragma GCC target("amx-bf16")
+#define __DISABLE_AMX_BF16__
+#endif /* __AMX_BF16__ */
+
+#if defined(__x86_64__) && defined(__AMX_BF16__)
+#define _tile_dpbf16ps(dst,src1,src2)					\
+  __asm__ volatile\
+  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+#endif
+
+#ifdef __DISABLE_AMX_BF16__
+#undef __DISABLE_AMX_BF16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_BF16__ */
+
+#endif /* _AMXBF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxint8intrin.h b/gcc/config/i386/amxint8intrin.h
new file mode 100644
index 00000000000..4b7a59587dc
--- /dev/null
+++ b/gcc/config/i386/amxint8intrin.h
@@ -0,0 +1,37 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxint8intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXINT8INTRIN_H_INCLUDED
+#define _AMXINT8INTRIN_H_INCLUDED
+
+#if !defined(__AMX_INT8__)
+#pragma GCC push_options
+#pragma GCC target("amx-int8")
+#define __DISABLE_AMX_INT8__
+#endif /* __AMX_INT8__ */
+
+#if defined(__x86_64__) && defined(__AMX_INT8__)
+#define _tile_dpbssd(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbssd\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbssd\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbsud(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbsud\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbsud\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbusd(dst,src1,src2)					\
+  __asm__ volatile\
+  ("{tdpbusd\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbusd\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbuud(dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{tdpbuud\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbuud\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+#endif
+
+#ifdef __DISABLE_AMX_INT8__
+#undef __DISABLE_AMX_INT8__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_INT8__ */
+
+#endif /* _AMXINT8INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
new file mode 100644
index 00000000000..fe995232743
--- /dev/null
+++ b/gcc/config/i386/amxtileintrin.h
@@ -0,0 +1,63 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxtileintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXTILEINTRIN_H_INCLUDED
+#define _AMXTILEINTRIN_H_INCLUDED
+
+#if !defined(__AMX_TILE__)
+#pragma GCC push_options
+#pragma GCC target("amx-tile")
+#define __DISABLE_AMX_TILE__
+#endif /* __AMX_TILE__ */
+
+#if defined(__x86_64__) && defined(__AMX_TILE__)
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_loadconfig (const void *__config)
+{
+  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_storeconfig (void *__config)
+{
+  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_release (void)
+{
+  __asm__ volatile ("tilerelease" ::);
+}
+
+#define _tile_loadd(dst,base,stride)					\
+  __asm__ volatile							\
+  ("{tileloadd\t(%0,%1,1), %%tmm"#dst"|tileloadd\t%%tmm"#dst", [%0+%1*1]}" \
+   :: "r" ((const void*) (base)), "r" ((long) (stride)))
+
+#define _tile_stream_loadd(dst,base,stride)				\
+  __asm__ volatile							\
+  ("{tileloaddt1\t(%0,%1,1), %%tmm"#dst"|tileloaddt1\t%%tmm"#dst", [%0+%1*1]}"\
+   :: "r" ((const void*) (base)), "r" ((long) (stride)))
+
+#define _tile_stored(src,base,stride)					\
+  __asm__ volatile							\
+  ("{tilestored\t%%tmm"#src", (%0,%1,1)|tilestored\t[%0+%1*1], %%tmm"#src"}" \
+   :: "r" ((void*) (base)), "r" ((long) (stride)))
+
+#define _tile_zero(dst)				\
+  __asm__ volatile				\
+  ("tilezero\t%%tmm"#dst ::)
+
+#endif
+
+#ifdef __DISABLE_AMX_TILE__
+#undef __DISABLE_AMX_TILE__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_TILE__ */
+
+#endif /* _AMXTILEINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index bca61d620db..4598434fd02 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -127,6 +127,9 @@
 #define bit_PCONFIG	(1 << 18)
 #define bit_SERIALIZE	(1 << 14)
 #define bit_TSXLDTRK    (1 << 16)
+#define bit_AMX_BF16    (1 << 22)
+#define bit_AMX_TILE    (1 << 24)
+#define bit_AMX_INT8    (1 << 25)
 
 /* XFEATURE_ENABLED_MASK register bits (%eax == 0xd, %ecx == 0) */
 #define bit_BNDREGS     (1 << 3)
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 2d61a0ce70a..6a68e7caf08 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -588,6 +588,13 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__ENQCMD__");
   if (isa_flag2 & OPTION_MASK_ISA2_TSXLDTRK)
     def_or_undef (parse_in, "__TSXLDTRK__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_TILE)
+    def_or_undef (parse_in, "__AMX_TILE__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_INT8)
+    def_or_undef (parse_in, "__AMX_INT8__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_BF16)
+    def_or_undef (parse_in, "__AMX_BF16__");
+
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index b93c338346f..f79b6a89270 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -209,7 +209,10 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavx512bf16",	OPTION_MASK_ISA2_AVX512BF16 },
   { "-menqcmd",		OPTION_MASK_ISA2_ENQCMD },
   { "-mserialize",	OPTION_MASK_ISA2_SERIALIZE },
-  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK }
+  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK },
+  { "-mamx-tile",	OPTION_MASK_ISA2_AMX_TILE },
+  { "-mamx-int8",	OPTION_MASK_ISA2_AMX_INT8 },
+  { "-mamx-bf16",	OPTION_MASK_ISA2_AMX_BF16 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1031,6 +1034,9 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("enqcmd", OPT_menqcmd),
     IX86_ATTR_ISA ("serialize", OPT_mserialize),
     IX86_ATTR_ISA ("tsxldtrk", OPT_mtsxldtrk),
+    IX86_ATTR_ISA ("amx-tile", OPT_mamx_tile),
+    IX86_ATTR_ISA ("amx-int8", OPT_mamx_int8),
+    IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2254,6 +2260,18 @@ ix86_option_override_internal (bool main_args_p,
 	    && !(opts->x_ix86_isa_flags2_explicit
 		 & OPTION_MASK_ISA2_AVX512BF16))
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512BF16;
+	if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_TILE))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE;
+	if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_INT8))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8;
+	if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_BF16))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16;
         if (((processor_alias_table[i].flags & PTA_MOVDIRI) != 0)
             && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MOVDIRI))
           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MOVDIRI;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 92b7475a7bf..a449653cc3e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -203,6 +203,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_SERIALIZE_P(x) TARGET_ISA2_SERIALIZE_P(x)
 #define TARGET_TSXLDTRK	TARGET_ISA2_TSXLDTRK
 #define TARGET_TSXLDTRK_P(x) TARGET_ISA2_TSXLDTRK_P(x)
+#define TARGET_AMX_TILE TARGET_ISA2_AMX_TILE
+#define TARGET_AMX_TILE_P(x) TARGET_ISA2_AMX_TILE_P(x)
+#define TARGET_AMX_INT8 TARGET_ISA2_AMX_INT8
+#define TARGET_AMX_INT8_P(x) TARGET_ISA2_AMX_INT8_P(x)
+#define TARGET_AMX_BF16 TARGET_ISA2_AMX_BF16
+#define TARGET_AMX_BF16_P(x) TARGET_ISA2_AMX_BF16_P(x)
 
 #define TARGET_LP64	TARGET_ABI_64
 #define TARGET_LP64_P(x)	TARGET_ABI_64_P(x)
@@ -2466,6 +2472,9 @@ const wide_int_bitmask PTA_ENQCMD (0, HOST_WIDE_INT_1U << 15);
 const wide_int_bitmask PTA_CLDEMOTE (0, HOST_WIDE_INT_1U << 16);
 const wide_int_bitmask PTA_SERIALIZE (0, HOST_WIDE_INT_1U << 17);
 const wide_int_bitmask PTA_TSXLDTRK (0, HOST_WIDE_INT_1U << 18);
+const wide_int_bitmask PTA_AMX_TILE (0, HOST_WIDE_INT_1U << 19);
+const wide_int_bitmask PTA_AMX_INT8 (0, HOST_WIDE_INT_1U << 20);
+const wide_int_bitmask PTA_AMX_BF16 (0, HOST_WIDE_INT_1U << 21);
 
 const wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2
   | PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
@@ -2499,7 +2508,8 @@ const wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_CLWB | PTA_AVX512VP2INTERSECT;
 const wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
-  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK;
+  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
+  | PTA_AMX_INT8 | PTA_AMX_BF16;
 const wide_int_bitmask PTA_ALDERLAKE = PTA_SKYLAKE | PTA_CLDEMOTE | PTA_PTWRITE
   | PTA_WAITPKG | PTA_SERIALIZE;
 const wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index c9f7195d423..9389dc24948 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1114,4 +1114,16 @@ Support SERIALIZE built-in functions and code generation.
 
 mtsxldtrk
 Target Report Mask(ISA2_TSXLDTRK) Var(ix86_isa_flags2) Save
-Support TSXLDTRK built-in functions and code generation.
\ No newline at end of file
+Support TSXLDTRK built-in functions and code generation.
+
+mamx-tile
+Target Report Mask(ISA2_AMX_TILE) Var(ix86_isa_flags2) Save
+Support AMX-TILE built-in functions and code generation.
+
+mamx-int8
+Target Report Mask(ISA2_AMX_INT8) Var(ix86_isa_flags2) Save
+Support AMX-INT8 built-in functions and code generation.
+
+mamx-bf16
+Target Report Mask(ISA2_AMX_BF16) Var(ix86_isa_flags2) Save
+Support AMX-BF16 built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index b660d0d9040..6d25f44c303 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -144,6 +144,12 @@
 
 #include <tsxldtrkintrin.h>
 
+#include <amxtileintrin.h>
+
+#include <amxint8intrin.h>
+
+#include <amxbf16intrin.h>
+
 #include <rdseedintrin.h>
 
 #include <prfchwintrin.h>
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3b37aba5795..ba9a7a4d5f9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6623,6 +6623,21 @@ Enable/disable the generation of the XSAVEOPT instructions.
 @cindex @code{target("xsaves")} function attribute, x86
 Enable/disable the generation of the XSAVES instructions.
 
+@item amx-tile
+@itemx no-amx-tile
+@cindex @code{target("amx-tile")} function attribute, x86
+Enable/disable the generation of the AMX-TILE instructions.
+
+@item amx-int8
+@itemx no-amx-int8
+@cindex @code{target("amx-int8")} function attribute, x86
+Enable/disable the generation of the AMX-INT8 instructions.
+
+@item amx-bf16
+@itemx no-amx-bf16
+@cindex @code{target("amx-bf16")} function attribute, x86
+Enable/disable the generation of the AMX-BF16 instructions.
+
 @item cld
 @itemx no-cld
 @cindex @code{target("cld")} function attribute, x86
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bca8c856dc8..a46e31f5862 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
 -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq @gol
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
+-mamx-tile -mamx-int8 -mamx-bf16@gol
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
 -mmemcpy-strategy=@var{strategy}  -mmemset-strategy=@var{strategy} @gol
@@ -30020,6 +30021,15 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @itemx -mserialize
 @opindex mserialize
+@need 200
+@itemx -mamx-tile
+@opindex mamx-tile
+@need 200
+@itemx -mamx-int8
+@opindex mamx-int8
+@need 200
+@itemx -mamx-bf16
+@opindex mamx-bf16
 These switches enable the use of instructions in the MMX, SSE,
 SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 65b2e552b74..b625f1e9f68 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2249,6 +2249,15 @@ Target supports the execution of @code{avx512f} instructions.
 @item avx512vp2intersect
 Target supports the execution of @code{avx512vp2intersect} instructions.
 
+@item amx_tile
+Target supports the execution of @code{amx-tile} instructions.
+
+@item amx_int8
+Target supports the execution of @code{amx-int8} instructions.
+
+@item amx_bf16
+Target supports the execution of @code{amx-bf16} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 04d5fec0f6c..449f30dbace 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h.h are usable
    with -O -pedantic-errors.  */
 
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index f40172ee9b5..29e98919386 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h are usable
    with -O -fkeep-inline-functions.  */
 
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
new file mode 100644
index 00000000000..98758f99a10
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbf16ps (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
new file mode 100644
index 00000000000..b7332248ba7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16" } */
+/* { dg-require-effective-target amx_bf16 } */
+#include"amxbf16-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
new file mode 100644
index 00000000000..c2d6074387a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbf16ps (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
new file mode 100644
index 00000000000..605a44df3f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
+/* { dg-require-effective-target amx_bf16 } */
+#include"amxbf16-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
new file mode 100644
index 00000000000..7af801bd223
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbssd (1, 2, 3);
+  _tile_dpbsud (1, 2, 3);
+  _tile_dpbusd (1, 2, 3);
+  _tile_dpbuud (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
new file mode 100644
index 00000000000..307c9d813bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8" } */
+/* { dg-require-effective-target amx_int8 } */
+#include"amxint8-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
new file mode 100644
index 00000000000..bcfbb3fa5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbssd (1, 2, 3);
+  _tile_dpbsud (1, 2, 3);
+  _tile_dpbusd (1, 2, 3);
+  _tile_dpbuud (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
new file mode 100644
index 00000000000..7e1c1d63594
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8 -masm=intel" } */
+/* { dg-require-effective-target amx_int8 } */
+#include"amxint8-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
new file mode 100644
index 00000000000..96578719833
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]+\[^\n\]*%tmm\[0-9\]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (5, base, stride);
+  _tile_stream_loadd (4, base, stride);
+  _tile_stored (3, base, stride);
+  _tile_zero (2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
new file mode 100644
index 00000000000..c00cd0a8fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile" } */
+/* { dg-require-effective-target amx_tile } */
+#include"amxtile-asmatt-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
new file mode 100644
index 00000000000..88ef612ed14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -masm=intel " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]\[^\n\]+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (5, base, stride);
+  _tile_stream_loadd (4, base, stride);
+  _tile_stored (3, base, stride);
+  _tile_zero (2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c
new file mode 100644
index 00000000000..99da63c119e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-2.c
@@ -0,0 +1,4 @@
+/* { dg-do assemble { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -masm=intel" } */
+/* { dg-require-effective-target amx_tile } */
+#include"amxtile-asmintel-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 94ffbb64c75..8e669f12215 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -71,6 +71,9 @@ extern void test_tsxldtrk (void)		__attribute__((__target__("tsxldtrk")));
 extern void test_enqcmd (void)			__attribute__((__target__("enqcmd")));
 extern void test_avx512bf16 (void)		__attribute__((__target__("avx512bf16")));
 extern void test_avx512vp2intersect (void)	__attribute__((__target__("avx512vp2intersect")));
+extern void test_amx_tile (void)		__attribute__((__target__("amx-tile")));
+extern void test_amx_int8 (void)		__attribute__((__target__("amx-int8")));
+extern void test_amx_bf16 (void)		__attribute__((__target__("amx-bf16")));
 
 extern void test_no_sgx (void)			__attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)		__attribute__((__target__("no-avx5124fmaps")));
@@ -143,6 +146,9 @@ extern void test_no_tsxldtrk (void)		__attribute__((__target__("no-tsxldtrk")));
 extern void test_no_enqcmd (void)		__attribute__((__target__("no-enqcmd")));
 extern void test_no_avx512bf16 (void)		__attribute__((__target__("no-avx512bf16")));
 extern void test_no_avx512vp2intersect (void)	__attribute__((__target__("no-avx512vp2intersect")));
+extern void test_no_amx_tile (void)		__attribute__((__target__("no-amx-tile")));
+extern void test_no_amx_int8 (void)		__attribute__((__target__("no-amx-int8")));
+extern void test_no_amx_bf16 (void)		__attribute__((__target__("no-amx-bf16")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void)		__attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index b1690d7204f..61146b2b30a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h gfniintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 3a6404707c4..4d6c9b3a17a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index edaa2aa8ad4..837b51c53e6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 7364b2ff337..fc75669f41b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -11,6 +11,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxldtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -102,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -219,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index eaadebef187..9ca7c5d919d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -10,6 +10,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -697,6 +698,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1) 
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index e106278631f..c6b2c56a51d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8976,6 +8976,39 @@ proc check_effective_target_avx512vaes { } {
     } "-mvaes" ]
 }
 
+# Return 1 if amx-tile instructions can be compiled.
+proc check_effective_target_amx_tile { } {
+    return [check_no_compiler_messages amx_tile object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tilerelease" ::);
+	}
+    } "-mamx-tile" ]
+}
+
+# Return 1 if amx-int8 instructions can be compiled.
+proc check_effective_target_amx_int8 { } {
+    return [check_no_compiler_messages amx_int8 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbssd\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-int8" ]
+}
+
+# Return 1 if amx-bf16 instructions can be compiled.
+proc check_effective_target_amx_bf16 { } {
+    return [check_no_compiler_messages amx_bf16 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbf16ps\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-bf16" ]
+}
+
 # Return 1 if vpclmulqdq instructions can be compiled.
 proc check_effective_target_vpclmulqdq { } {
     return [check_no_compiler_messages vpclmulqdq object {
-- 
2.26.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-09-03 15:17   ` H.J. Lu
@ 2020-09-04 14:01     ` Kirill Yukhin
  2020-09-11 17:00       ` Hongyu Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2020-09-04 14:01 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Hongyu Wang, Uros Bizjak, GCC Patches

Hello,

On 03 Sep 08:17, H.J. Lu wrote:
> On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hello,
> >
> > On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > > Hi:
> > >
> > > This patch is about to support Intel Advanced Matrix Extensions (AMX)
> > > which will be enabled in GLC.
> > >
> > > AMX is a new 64-bit programming paradigm consisting of two
> > > components: a set of 2-dimensional registers (tiles) representing
> > > sub-arrays from a larger 2-dimensional memory image,
> > > and an accelerator able to operate on tiles
> > >
> > > Supported instructions are
> > >
> > > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > AMX-BF16:tdpbf16ps
> > >
> > > The intrinsics adopts constant tile register number as its input parameters.
> > >
> > > For detailed information, please refer to
> > > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > >
> > > Bootstrap ok, regression test on i386/x86 backend is ok.
> > >
> > > OK for master?
> >
> > I was trying to apply your patch to recent master and got
> > compilation error:
> >
> > g++ -std=gnu++11  -fno-PIE -c   -g -O2 -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowi
> > ng -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wn
> > o-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I/export/kyukhin/gcc/src/gcc -I/export/kyukhin/gcc/src/gcc/. -I/expor
> > t/kyukhin/gcc/src/gcc/../include -I/export/kyukhin/gcc/src/gcc/../libcpp/include  -I/export/kyukhin/gcc/src/gcc/../libdecnumber
> > -I/export/kyukhin/gcc/src/gcc/../libdecnumber/bid -I../libdecnumber -I/export/kyukhin/gcc/src/gcc/../libbacktrace   -o i386-opti
> > ons.o -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c: In function ‘bool ix86_option_override_internal(bool, gcc_options*, gcc_
> > options*)’:
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2263:41: error: ‘PTA_AMX_TILE’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
> >                                          ^
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2267:41: error: ‘PTA_AMX_INT8’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
> >                                          ^
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2271:41: error: ‘PTA_AMX_BF16’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
> >
> > Could you please fix that?
> 
> Here is the rebased patch against
> 
> commit 3c219134152f645103f2fcd50735b177ccd76cde
> Author: Jonathan Wakely <jwakely@redhat.com>
> Date:   Thu Sep 3 12:38:50 2020 +0100
> 
>     libstdc++: Optimise GCD algorithms
> 
> Thanks.
> 
> -- 
> H.J.

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 797f0ad5edd..d0e59e86a5c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -412,7 +412,7 @@ i[34567]86-*-*)
>  		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
>  		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
>  		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
> -		       tsxldtrkintrin.h"
> +		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"

This line is longer than 80 characters.

>  	;;
>  x86_64-*-*)
>  	cpu_type=i386
> @@ -447,7 +447,7 @@ x86_64-*-*)
>  		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
>  		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
>  		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
> -		       tsxldtrkintrin.h"
> +		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"

Ditto.

> diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
> new file mode 100644
> index 00000000000..df0e2262d50
> --- /dev/null
> +++ b/gcc/config/i386/amxbf16intrin.h
> @@ -0,0 +1,25 @@
> +#if !defined _IMMINTRIN_H_INCLUDED
> +#error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
> +#endif
> +
> +#ifndef _AMXBF16INTRIN_H_INCLUDED
> +#define _AMXBF16INTRIN_H_INCLUDED
> +
> +#if !defined(__AMX_BF16__)
> +#pragma GCC push_options
> +#pragma GCC target("amx-bf16")
> +#define __DISABLE_AMX_BF16__
> +#endif /* __AMX_BF16__ */
> +
> +#if defined(__x86_64__) && defined(__AMX_BF16__)
> +#define _tile_dpbf16ps(dst,src1,src2)					\
> +  __asm__ volatile\
> +  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
> +#endif

I hope in future we'll replace it with unspecs at least...

> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index c9f7195d423..9389dc24948 100644
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index bca8c856dc8..a46e31f5862 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
>  -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq @gol
>  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
>  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> +-mamx-tile -mamx-int8 -mamx-bf16@gol

Add space please.

> diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> new file mode 100644
> index 00000000000..605a44df3f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> @@ -0,0 +1,4 @@
> +/* { dg-do assemble { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> +/* { dg-require-effective-target amx_bf16 } */
> +#include"amxbf16-asmintel-1.c"

I didn't get it. We usually use the second testcase to actually execute
the code and (at least a little) verify that the semantics are OK, e.g. that
the operand order is correct. Could you please do that?
This applies to all *-2.c cases.
I've checked, and it looks like the public SDE simulator supports AMX.

--
K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Enable GCC support for AMX
  2020-09-04 14:01     ` Kirill Yukhin
@ 2020-09-11 17:00       ` Hongyu Wang
  2020-09-18  8:31         ` Hongyu Wang
  2020-09-28 11:38         ` [PATCH] Enable GCC support for AMX Kirill Yukhin
  0 siblings, 2 replies; 17+ messages in thread
From: Hongyu Wang @ 2020-09-11 17:00 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: H.J. Lu, Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 10807 bytes --]

Hi

Thanks for your review, and sorry for the late reply. It took a while
to finish the runtime test.

> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 797f0ad5edd..d0e59e86a5c 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -412,7 +412,7 @@ i[34567]86-*-*)
> >                      waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
> >                      avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
> >                      avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
> > -                    tsxldtrkintrin.h"
> > +                    tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"
>
> Line more than 80 chars.
>
> >       ;;
> >  x86_64-*-*)
> >       cpu_type=i386
> > @@ -447,7 +447,7 @@ x86_64-*-*)
> >                      waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
> >                      avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
> >                      avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
> > -                    tsxldtrkintrin.h"
> > +                    tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h"
>
> Ditto.

Changed.

>
> > diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
> > new file mode 100644
> > index 00000000000..df0e2262d50
> > --- /dev/null
> > +++ b/gcc/config/i386/amxbf16intrin.h
> > @@ -0,0 +1,25 @@
> > +#if !defined _IMMINTRIN_H_INCLUDED
> > +#error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
> > +#endif
> > +
> > +#ifndef _AMXBF16INTRIN_H_INCLUDED
> > +#define _AMXBF16INTRIN_H_INCLUDED
> > +
> > +#if !defined(__AMX_BF16__)
> > +#pragma GCC push_options
> > +#pragma GCC target("amx-bf16")
> > +#define __DISABLE_AMX_BF16__
> > +#endif /* __AMX_BF16__ */
> > +
> > +#if defined(__x86_64__) && defined(__AMX_BF16__)
> > +#define _tile_dpbf16ps(dst,src1,src2)                                        \
> > +  __asm__ volatile\
> > +  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
> > +#endif
>
> I hope in future we'll replace it with unspecs at least...

For now we think it is redundant to add builtins that take only const int
parameters, since they are expected to be replaced in the future.

>
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index c9f7195d423..9389dc24948 100644
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index bca8c856dc8..a46e31f5862 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
> >  -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq @gol
> >  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
> >  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> > +-mamx-tile -mamx-int8 -mamx-bf16@gol
>
> Add space please.

Changed.

>
> > diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > new file mode 100644
> > index 00000000000..605a44df3f8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > @@ -0,0 +1,4 @@
> > +/* { dg-do assemble { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> > +/* { dg-require-effective-target amx_bf16 } */
> > +#include"amxbf16-asmintel-1.c"
>
> I didn't get it. We usually use the second testcase to actually execute
> the code and (at least a little) verify that the semantics are OK, e.g. that
> the operand order is correct. Could you please do that?
> This applies to all *-2.c cases.
> I've checked, and it looks like the public SDE simulator supports AMX.
>

Added runtime tests. They were run under SDE and pass.

Also, we adjusted the intrinsics so that their tile arguments can
themselves be macros.

Updated patch attached.

> --
> K

[-- Attachment #2: GCC_AMX_support_v2.patch --]
[-- Type: application/octet-stream, Size: 64726 bytes --]

From 3cbc87e72887f98f20871a468e808dae187f39f7 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Thu, 25 Jul 2019 16:49:36 +0800
Subject: [PATCH] Enable GCC support for AMX-TILE,AMX-INT8,AMX-BF16.

AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
AMX-BF16:tdpbf16ps

gcc/ChangeLog

	* common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
	OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
	OPTION_MASK_ISA2_AMX_TILE_UNSET,
	OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
	New macros.
	(ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
	* common/config/i386/i386-cpuinfo.h (processor_features): Add
	FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
	* common/config/i386/cpuinfo.h (XSTATE_TILECFG,
	XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macros.
	(get_available_features): Enable AMX features only if
	their states are supported by OSXSAVE.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
	for amx-tile, amx-int8, amx-bf16.
	* config.gcc: Add amxtileintrin.h, amxint8intrin.h,
	amxbf16intrin.h to extra headers.
	* config/i386/amxbf16intrin.h: New file.
	* config/i386/amxint8intrin.h: Ditto.
	* config/i386/amxtileintrin.h: Ditto.
	* config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
	New macros.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
	* config/i386/i386-options.c (ix86_target_string): Add
	-mamx-tile, -mamx-int8, -mamx-bf16.
	(ix86_option_override_internal): Handle AMX-TILE,
	AMX-INT8, AMX-BF16.
	* config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
	TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16_P,
	PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros.
	* config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* config/i386/immintrin.h: Include amxtileintrin.h,
	amxint8intrin.h, amxbf16intrin.h.
	* doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
	* doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
	* doc/sourcebuild.texi (Effective-Target Keywords, Other
	hardware attributes): Document amx_int8, amx_tile, amx_bf16.

gcc/testsuite/ChangeLog

	* lib/target-supports.exp (check_effective_target_amx_tile,
	check_effective_target_amx_int8,
	check_effective_target_amx_bf16): New proc.
	* g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/sse-12.c: Ditto.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* gcc.target/i386/amx-check.h: New header file.
	* gcc.target/i386/amxbf16-asmatt-1.c: New test.
	* gcc.target/i386/amxint8-asmatt-1.c: Ditto.
	* gcc.target/i386/amxtile-asmatt-1.c: Ditto.
	* gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
	* gcc.target/i386/amxint8-asmintel-1.c: Ditto.
	* gcc.target/i386/amxtile-asmintel-1.c: Ditto.
	* gcc.target/i386/amxbf16-dpbf16ps-2.c: Ditto.
	* gcc.target/i386/amxint8-dpbssd-2.c: Ditto.
	* gcc.target/i386/amxint8-dpbsud-2.c: Ditto.
	* gcc.target/i386/amxint8-dpbusd-2.c: Ditto.
	* gcc.target/i386/amxint8-dpbuud-2.c: Ditto.
	* gcc.target/i386/amxtile-2.c: Ditto.

Add AMX runtime testcases
---
 gcc/common/config/i386/cpuinfo.h              |  16 ++
 gcc/common/config/i386/i386-common.c          |  45 ++++
 gcc/common/config/i386/i386-cpuinfo.h         |   3 +
 gcc/common/config/i386/i386-isas.h            |   3 +
 gcc/config.gcc                                |   6 +-
 gcc/config/i386/amxbf16intrin.h               |  29 +++
 gcc/config/i386/amxint8intrin.h               |  38 +++
 gcc/config/i386/amxtileintrin.h               |  74 ++++++
 gcc/config/i386/cpuid.h                       |   3 +
 gcc/config/i386/i386-c.c                      |   7 +
 gcc/config/i386/i386-options.c                |  20 +-
 gcc/config/i386/i386.h                        |  12 +-
 gcc/config/i386/i386.opt                      |  14 +-
 gcc/config/i386/immintrin.h                   |   6 +
 gcc/doc/extend.texi                           |  15 ++
 gcc/doc/invoke.texi                           |  10 +
 gcc/doc/sourcebuild.texi                      |   9 +
 gcc/testsuite/g++.dg/other/i386-2.C           |   3 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |   3 +-
 gcc/testsuite/gcc.target/i386/amx-check.h     | 216 ++++++++++++++++++
 .../gcc.target/i386/amxbf16-asmatt-1.c        |  13 ++
 .../gcc.target/i386/amxbf16-asmintel-1.c      |   9 +
 .../gcc.target/i386/amxbf16-dpbf16ps-2.c      |  83 +++++++
 .../gcc.target/i386/amxint8-asmatt-1.c        |  19 ++
 .../gcc.target/i386/amxint8-asmintel-1.c      |  15 ++
 .../gcc.target/i386/amxint8-dpbssd-2.c        |  62 +++++
 .../gcc.target/i386/amxint8-dpbsud-2.c        |  61 +++++
 .../gcc.target/i386/amxint8-dpbusd-2.c        |  61 +++++
 .../gcc.target/i386/amxint8-dpbuud-2.c        |  61 +++++
 gcc/testsuite/gcc.target/i386/amxtile-2.c     |  47 ++++
 .../gcc.target/i386/amxtile-asmatt-1.c        |  30 +++
 .../gcc.target/i386/amxtile-asmintel-1.c      |  24 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   6 +
 gcc/testsuite/gcc.target/i386/sse-12.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |   5 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |   3 +-
 gcc/testsuite/lib/target-supports.exp         |  33 +++
 39 files changed, 1057 insertions(+), 13 deletions(-)
 create mode 100644 gcc/config/i386/amxbf16intrin.h
 create mode 100644 gcc/config/i386/amxint8intrin.h
 create mode 100644 gcc/config/i386/amxtileintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/amx-check.h
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 12237e2f449..c96455ce64f 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -509,15 +509,20 @@ get_available_features (struct __processor_model *cpu_model,
 #define XSTATE_OPMASK			0x20
 #define XSTATE_ZMM			0x40
 #define XSTATE_HI_ZMM			0x80
+#define XSTATE_TILECFG			0x20000
+#define XSTATE_TILEDATA		0x40000
 
 #define XCR_AVX_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM)
 #define XCR_AVX512F_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
+#define XCR_AMX_ENABLED_MASK \
+  (XSTATE_TILECFG | XSTATE_TILEDATA)
 
   /* Check if AVX and AVX512 are usable.  */
   int avx_usable = 0;
   int avx512_usable = 0;
+  int amx_usable = 0;
   if ((ecx & bit_OSXSAVE))
     {
       /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
@@ -533,6 +538,8 @@ get_available_features (struct __processor_model *cpu_model,
 	  avx512_usable = ((xcrlow & XCR_AVX512F_ENABLED_MASK)
 			   == XCR_AVX512F_ENABLED_MASK);
 	}
+      amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
+		    == XCR_AMX_ENABLED_MASK);
     }
 
 #define set_feature(f) \
@@ -651,6 +658,15 @@ get_available_features (struct __processor_model *cpu_model,
 	set_feature (FEATURE_PCONFIG);
       if (edx & bit_IBT)
 	set_feature (FEATURE_IBT);
+      if (amx_usable)
+	{
+	  if (edx & bit_AMX_TILE)
+	    set_feature (FEATURE_AMX_TILE);
+	  if (edx & bit_AMX_INT8)
+	    set_feature (FEATURE_AMX_INT8);
+	  if (edx & bit_AMX_BF16)
+	    set_feature (FEATURE_AMX_BF16);
+	}
       if (avx512_usable)
 	{
 	  if (ebx & bit_AVX512F)
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 5305145a8c9..cd5a432d783 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -101,6 +101,9 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE_SET)
 #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_SET OPTION_MASK_ISA2_AVX512VP2INTERSECT
+#define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -246,6 +249,9 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SERIALIZE_UNSET OPTION_MASK_ISA2_SERIALIZE
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET OPTION_MASK_ISA2_AVX512VP2INTERSECT
 #define OPTION_MASK_ISA2_TSXLDTRK_UNSET OPTION_MASK_ISA2_TSXLDTRK
+#define OPTION_MASK_ISA2_AMX_TILE_UNSET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_UNSET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_UNSET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -930,6 +936,45 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mamx_tile:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_int8:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_bf16:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 84ca97e7ade..5b94b1f1df7 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -216,6 +216,9 @@ enum processor_features
   FEATURE_XSAVEC,
   FEATURE_XSAVEOPT,
   FEATURE_XSAVES,
+  FEATURE_AMX_TILE,
+  FEATURE_AMX_INT8,
+  FEATURE_AMX_BF16,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 08c9dbecc76..3c830ea08ff 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -160,4 +160,7 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("xsaveopt", FEATURE_XSAVEOPT, P_NONE,
 			"-mxsaveopt")
   ISA_NAMES_TABLE_ENTRY("xsaves", FEATURE_XSAVES, P_NONE, "-mxsaves")
+  ISA_NAMES_TABLE_ENTRY("amx-tile", FEATURE_AMX_TILE, P_NONE, "-mamx-tile")
+  ISA_NAMES_TABLE_ENTRY("amx-int8", FEATURE_AMX_INT8, P_NONE, "-mamx-int8")
+  ISA_NAMES_TABLE_ENTRY("amx-bf16", FEATURE_AMX_BF16, P_NONE, "-mamx-bf16")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 797f0ad5edd..5713e6d3893 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -412,7 +412,8 @@ i[34567]86-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
+		       amxbf16intrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -447,7 +448,8 @@ x86_64-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
+		       amxbf16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
new file mode 100644
index 00000000000..b1620963944
--- /dev/null
+++ b/gcc/config/i386/amxbf16intrin.h
@@ -0,0 +1,29 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXBF16INTRIN_H_INCLUDED
+#define _AMXBF16INTRIN_H_INCLUDED
+
+#if !defined(__AMX_BF16__)
+#pragma GCC push_options
+#pragma GCC target("amx-bf16")
+#define __DISABLE_AMX_BF16__
+#endif /* __AMX_BF16__ */
+
+#if defined(__x86_64__) && defined(__AMX_BF16__)
+#define _tile_dpbf16ps_internal(dst,src1,src2)					\
+  __asm__ volatile\
+  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbf16ps(dst,src1,src2)					\
+  _tile_dpbf16ps_internal (dst, src1, src2)
+
+#endif
+
+#ifdef __DISABLE_AMX_BF16__
+#undef __DISABLE_AMX_BF16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_BF16__ */
+
+#endif /* _AMXBF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxint8intrin.h b/gcc/config/i386/amxint8intrin.h
new file mode 100644
index 00000000000..11adc1f1295
--- /dev/null
+++ b/gcc/config/i386/amxint8intrin.h
@@ -0,0 +1,38 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxint8intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXINT8INTRIN_H_INCLUDED
+#define _AMXINT8INTRIN_H_INCLUDED
+
+#if !defined(__AMX_INT8__)
+#pragma GCC push_options
+#pragma GCC target("amx-int8")
+#define __DISABLE_AMX_INT8__
+#endif /* __AMX_INT8__ */
+
+#if defined(__x86_64__) && defined(__AMX_INT8__)
+#define _tile_int8_dp_internal(name,dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{"#name"\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|"#name"\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbssd(dst,src1,src2)					\
+  _tile_int8_dp_internal (tdpbssd, dst, src1, src2)
+
+#define _tile_dpbsud(dst,src1,src2)					\
+  _tile_int8_dp_internal (tdpbsud, dst, src1, src2)
+
+#define _tile_dpbusd(dst,src1,src2)					\
+  _tile_int8_dp_internal (tdpbusd, dst, src1, src2)
+
+#define _tile_dpbuud(dst,src1,src2)					\
+  _tile_int8_dp_internal (tdpbuud, dst, src1, src2)
+
+#endif
+
+#ifdef __DISABLE_AMX_INT8__
+#undef __DISABLE_AMX_INT8__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_INT8__ */
+
+#endif /* _AMXINT8INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
new file mode 100644
index 00000000000..ee23b682fa1
--- /dev/null
+++ b/gcc/config/i386/amxtileintrin.h
@@ -0,0 +1,74 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxtileintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXTILEINTRIN_H_INCLUDED
+#define _AMXTILEINTRIN_H_INCLUDED
+
+#if !defined(__AMX_TILE__)
+#pragma GCC push_options
+#pragma GCC target("amx-tile")
+#define __DISABLE_AMX_TILE__
+#endif /* __AMX_TILE__ */
+
+#if defined(__x86_64__) && defined(__AMX_TILE__)
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_loadconfig (const void *__config)
+{
+  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_storeconfig (void *__config)
+{
+  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_release (void)
+{
+  __asm__ volatile ("tilerelease" ::);
+}
+
+#define _tile_loadd(dst,base,stride)		\
+  _tile_loadd_internal (dst, base, stride)
+
+#define _tile_loadd_internal(dst,base,stride)				\
+  __asm__ volatile							\
+  ("{tileloadd\t(%0,%1,1), %%tmm"#dst"|tileloadd\t%%tmm"#dst", [%0+%1*1]}" \
+   :: "r" ((const void*) base), "r" ((long) stride))
+
+#define _tile_stream_loadd(dst,base,stride)		\
+  _tile_stream_loadd_internal (dst, base, stride)
+
+#define _tile_stream_loadd_internal(dst,base,stride)			\
+  __asm__ volatile							\
+  ("{tileloaddt1\t(%0,%1,1), %%tmm"#dst"|tileloaddt1\t%%tmm"#dst", [%0+%1*1]}" \
+   :: "r" ((const void*) base), "r" ((long) stride))
+
+#define _tile_stored(dst,base,stride)		\
+  _tile_stored_internal (dst, base, stride)
+
+#define _tile_stored_internal(src,base,stride)				\
+  __asm__ volatile							\
+  ("{tilestored\t%%tmm"#src", (%0,%1,1)|tilestored\t[%0+%1*1], %%tmm"#src"}" \
+   :: "r" ((void*) base), "r" ((long) stride))
+
+#define _tile_zero(dst)				\
+  _tile_zero_internal (dst)
+
+#define _tile_zero_internal(dst)		\
+  __asm__ volatile				\
+  ("tilezero\t%%tmm"#dst ::)
+
+#endif
+
+#ifdef __DISABLE_AMX_TILE__
+#undef __DISABLE_AMX_TILE__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_TILE__ */
+
+#endif /* _AMXTILEINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index bca61d620db..4598434fd02 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -127,6 +127,9 @@
 #define bit_PCONFIG	(1 << 18)
 #define bit_SERIALIZE	(1 << 14)
 #define bit_TSXLDTRK    (1 << 16)
+#define bit_AMX_BF16    (1 << 22)
+#define bit_AMX_TILE    (1 << 24)
+#define bit_AMX_INT8    (1 << 25)
 
 /* XFEATURE_ENABLED_MASK register bits (%eax == 0xd, %ecx == 0) */
 #define bit_BNDREGS     (1 << 3)
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 2d61a0ce70a..6a68e7caf08 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -588,6 +588,13 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__ENQCMD__");
   if (isa_flag2 & OPTION_MASK_ISA2_TSXLDTRK)
     def_or_undef (parse_in, "__TSXLDTRK__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_TILE)
+    def_or_undef (parse_in, "__AMX_TILE__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_INT8)
+    def_or_undef (parse_in, "__AMX_INT8__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_BF16)
+    def_or_undef (parse_in, "__AMX_BF16__");
+
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index b93c338346f..f79b6a89270 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -209,7 +209,10 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavx512bf16",	OPTION_MASK_ISA2_AVX512BF16 },
   { "-menqcmd",		OPTION_MASK_ISA2_ENQCMD },
   { "-mserialize",	OPTION_MASK_ISA2_SERIALIZE },
-  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK }
+  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK },
+  { "-mamx-tile",	OPTION_MASK_ISA2_AMX_TILE },
+  { "-mamx-int8",	OPTION_MASK_ISA2_AMX_INT8 },
+  { "-mamx-bf16",	OPTION_MASK_ISA2_AMX_BF16 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1031,6 +1034,9 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("enqcmd", OPT_menqcmd),
     IX86_ATTR_ISA ("serialize", OPT_mserialize),
     IX86_ATTR_ISA ("tsxldtrk", OPT_mtsxldtrk),
+    IX86_ATTR_ISA ("amx-tile", OPT_mamx_tile),
+    IX86_ATTR_ISA ("amx-int8", OPT_mamx_int8),
+    IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2254,6 +2260,18 @@ ix86_option_override_internal (bool main_args_p,
 	    && !(opts->x_ix86_isa_flags2_explicit
 		 & OPTION_MASK_ISA2_AVX512BF16))
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512BF16;
+	if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_TILE))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE;
+	if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_INT8))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8;
+	if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_BF16))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16;
         if (((processor_alias_table[i].flags & PTA_MOVDIRI) != 0)
             && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MOVDIRI))
           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MOVDIRI;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 92b7475a7bf..a449653cc3e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -203,6 +203,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_SERIALIZE_P(x) TARGET_ISA2_SERIALIZE_P(x)
 #define TARGET_TSXLDTRK	TARGET_ISA2_TSXLDTRK
 #define TARGET_TSXLDTRK_P(x) TARGET_ISA2_TSXLDTRK_P(x)
+#define TARGET_AMX_TILE TARGET_ISA2_AMX_TILE
+#define TARGET_AMX_TILE_P(x) TARGET_ISA2_AMX_TILE_P(x)
+#define TARGET_AMX_INT8 TARGET_ISA2_AMX_INT8
+#define TARGET_AMX_INT8_P(x) TARGET_ISA2_AMX_INT8_P(x)
+#define TARGET_AMX_BF16 TARGET_ISA2_AMX_BF16
+#define TARGET_AMX_BF16_P(x) TARGET_ISA2_AMX_BF16_P(x)
 
 #define TARGET_LP64	TARGET_ABI_64
 #define TARGET_LP64_P(x)	TARGET_ABI_64_P(x)
@@ -2466,6 +2472,9 @@ const wide_int_bitmask PTA_ENQCMD (0, HOST_WIDE_INT_1U << 15);
 const wide_int_bitmask PTA_CLDEMOTE (0, HOST_WIDE_INT_1U << 16);
 const wide_int_bitmask PTA_SERIALIZE (0, HOST_WIDE_INT_1U << 17);
 const wide_int_bitmask PTA_TSXLDTRK (0, HOST_WIDE_INT_1U << 18);
+const wide_int_bitmask PTA_AMX_TILE (0, HOST_WIDE_INT_1U << 19);
+const wide_int_bitmask PTA_AMX_INT8 (0, HOST_WIDE_INT_1U << 20);
+const wide_int_bitmask PTA_AMX_BF16 (0, HOST_WIDE_INT_1U << 21);
 
 const wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2
   | PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
@@ -2499,7 +2508,8 @@ const wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_CLWB | PTA_AVX512VP2INTERSECT;
 const wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
-  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK;
+  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
+  | PTA_AMX_INT8 | PTA_AMX_BF16;
 const wide_int_bitmask PTA_ALDERLAKE = PTA_SKYLAKE | PTA_CLDEMOTE | PTA_PTWRITE
   | PTA_WAITPKG | PTA_SERIALIZE;
 const wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index c9f7195d423..9389dc24948 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1114,4 +1114,16 @@ Support SERIALIZE built-in functions and code generation.
 
 mtsxldtrk
 Target Report Mask(ISA2_TSXLDTRK) Var(ix86_isa_flags2) Save
-Support TSXLDTRK built-in functions and code generation.
\ No newline at end of file
+Support TSXLDTRK built-in functions and code generation.
+
+mamx-tile
+Target Report Mask(ISA2_AMX_TILE) Var(ix86_isa_flags2) Save
+Support AMX-TILE built-in functions and code generation.
+
+mamx-int8
+Target Report Mask(ISA2_AMX_INT8) Var(ix86_isa_flags2) Save
+Support AMX-INT8 built-in functions and code generation.
+
+mamx-bf16
+Target Report Mask(ISA2_AMX_BF16) Var(ix86_isa_flags2) Save
+Support AMX-BF16 built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index b660d0d9040..6d25f44c303 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -144,6 +144,12 @@
 
 #include <tsxldtrkintrin.h>
 
+#include <amxtileintrin.h>
+
+#include <amxint8intrin.h>
+
+#include <amxbf16intrin.h>
+
 #include <rdseedintrin.h>
 
 #include <prfchwintrin.h>
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3b37aba5795..ba9a7a4d5f9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6623,6 +6623,21 @@ Enable/disable the generation of the XSAVEOPT instructions.
 @cindex @code{target("xsaves")} function attribute, x86
 Enable/disable the generation of the XSAVES instructions.
 
+@item amx-tile
+@itemx no-amx-tile
+@cindex @code{target("amx-tile")} function attribute, x86
+Enable/disable the generation of the AMX-TILE instructions.
+
+@item amx-int8
+@itemx no-amx-int8
+@cindex @code{target("amx-int8")} function attribute, x86
+Enable/disable the generation of the AMX-INT8 instructions.
+
+@item amx-bf16
+@itemx no-amx-bf16
+@cindex @code{target("amx-bf16")} function attribute, x86
+Enable/disable the generation of the AMX-BF16 instructions.
+
 @item cld
 @itemx no-cld
 @cindex @code{target("cld")} function attribute, x86
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bca8c856dc8..3e67108a67b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
 -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq @gol
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
+-mamx-tile  -mamx-int8  -mamx-bf16@gol
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
 -mmemcpy-strategy=@var{strategy}  -mmemset-strategy=@var{strategy} @gol
@@ -30020,6 +30021,15 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @itemx -mserialize
 @opindex mserialize
+@need 200
+@itemx -mamx-tile
+@opindex mamx-tile
+@need 200
+@itemx -mamx-int8
+@opindex mamx-int8
+@need 200
+@itemx -mamx-bf16
+@opindex mamx-bf16
 These switches enable the use of instructions in the MMX, SSE,
 SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 65b2e552b74..b625f1e9f68 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2249,6 +2249,15 @@ Target supports the execution of @code{avx512f} instructions.
 @item avx512vp2intersect
 Target supports the execution of @code{avx512vp2intersect} instructions.
 
+@item amx_tile
+Target supports the execution of @code{amx-tile} instructions.
+
+@item amx_int8
+Target supports the execution of @code{amx-int8} instructions.
+
+@item amx_bf16
+Target supports the execution of @code{amx-bf16} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 04d5fec0f6c..449f30dbace 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h.h are usable
    with -O -pedantic-errors.  */
 
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index f40172ee9b5..29e98919386 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h are usable
    with -O -fkeep-inline-functions.  */
 
diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h b/gcc/testsuite/gcc.target/i386/amx-check.h
new file mode 100644
index 00000000000..be4f297ee06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amx-check.h
@@ -0,0 +1,216 @@
+#ifndef AMX_CHECK_H_INCLUDED
+#define AMX_CHECK_H_INCLUDED
+
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+#include "cpuid.h"
+
+/* TODO: The tmm emulation is temporary for current
+   AMX implementation with no tmm regclass, should
+   be changed in the future. */
+typedef struct __tile_config
+{
+  uint8_t palette_id; 
+  uint8_t start_row;   
+  uint8_t reserved_0[14];
+  uint16_t colsb[8]; /* Column size of each tmm register in bytes */
+  uint16_t reserved_1[8];
+  uint8_t rows[8]; /* Number of rows of each tmm register */
+  uint8_t reserved_2[8];
+} __tilecfg;
+
+typedef union __union_tile_config
+{
+  __tilecfg s;
+  uint8_t a[64];
+} __tilecfg_u;
+
+typedef struct __tile
+{
+  /* Max size of tile register */
+  uint8_t buf[1024];
+  int rows;
+  int colsb;
+} __tile;
+
+/* Maximum number of rows, and maximum column size in bytes */
+#define MAX_ROWS 16
+#define MAX_COLS 64
+
+/* Stride (column width in bytes) used for tileload/store */
+#define _STRIDE 64
+
+/* Initialize tile config by setting all tmm size to 16x64 */
+void init_tile_config (__tilecfg_u *dst)
+{
+  int i;
+
+  dst->s.palette_id = 1;
+  dst->s.start_row = 0;
+
+  for (i = 0; i < 14; i++)
+    dst->s.reserved_0[i] = 0;
+
+  for (i = 0; i < 8; i++)
+  {
+    dst->s.colsb[i] = _STRIDE;
+    dst->s.rows[i] = 16;
+    dst->s.reserved_1[i] = 0;
+    dst->s.reserved_2[i] = 0;
+  }
+
+  _tile_loadconfig (dst->a);
+}
+
+/* Initialize the __tile variable that will be loaded into a tmm register,
+   with or without an extra buffer.  If a buffer is given, it must hold a
+   matrix of the same size as the corresponding tmm register.
+   init_tile_config must be called first.  */
+void init_tile_src (const int tmm_num, __tile *src, uint8_t *buffer)
+{
+  int rows, colsb, i, j;
+  __tilecfg_u tmp;
+
+  _tile_storeconfig (tmp.a);
+
+  src->rows = rows = tmp.s.rows[tmm_num];
+  src->colsb = colsb = tmp.s.colsb[tmm_num];
+
+  for (i = 0; i < rows; i++)
+    for (j = 0; j < colsb; j++)
+    {
+      if(buffer)
+	src->buf[i * colsb + j] = buffer[i * colsb + j];
+      else
+	src->buf[i * colsb + j] = (i + 11 * j) % 256;
+    }
+
+}
+
+/* Init __tile src and corresponding tmm register */
+#define init_tile_reg_and_src(tmm_num, src)   \
+{					      \
+  init_tile_src (tmm_num, &src, NULL);	      \
+  _tile_loadd (tmm_num, src.buf, _STRIDE);   \
+}
+
+#define init_tile_reg_and_src_with_buffer(tmm_num, src, buffer) \
+{								\
+  init_tile_src (tmm_num, &src, buffer);				\
+  _tile_loadd (tmm_num, src.buf, _STRIDE);			\
+}
+
+/* Zero __tile src.  It must be initialized first.  */
+void zero_tile_src (__tile *src)
+{
+  int i, j;
+
+  for (i = 0; i < src->rows; i++)
+    for (j = 0; j < src->colsb; j++)
+      src->buf[i * src->colsb + j] = 0;
+}
+
+/* Compare tile config value with __tilecfg_u dst */
+int check_tile_config (__tilecfg_u *src, __tilecfg_u *dst)
+{
+  size_t size = sizeof(__tilecfg);
+  uint8_t *pa_src = (uint8_t *) src->a;
+  uint8_t *pa_dst = (uint8_t *) dst->a;
+
+  for (size_t i = 0; i < size; i++)
+    if (pa_src[i] != pa_dst[i])
+      return 0;
+
+  return 1;
+}
+
+/* Compare tile register value with __tile variable */
+int check_tile_register (__tile* ref, __tile* target)
+{
+  /* The tile register should first be stored from tmm to
+     memory, then compared with the emulation results. */
+  int rows = target->rows;
+  int colsb = target->colsb;
+  int i, j;
+
+  for (i = 0; i < rows; i++)
+    for (j = 0; j < colsb; j++)
+	if (ref->buf[i * colsb + j] != target->buf[i * colsb + j])
+	    return 0;
+
+  return 1;
+}
+
+#ifndef DO_TEST
+#define DO_TEST do_test
+static void test_amx (void);
+__attribute__ ((noinline))
+static void
+do_test (void)
+{
+  test_amx ();
+}
+#endif
+
+/* Verify whether the host has AMX support.  */
+int
+valid_test ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+/* Check XCR0 state for AMX */
+#define XSTATE_TILECFG          0x20000
+#define XSTATE_TILEDATA         0x40000
+  
+  __cpuid (1, eax, ebx, ecx, edx);
+
+  if (ecx & bit_OSXSAVE)
+    {
+      unsigned int xcrlow, xcrhigh;
+
+      __asm__ ("xgetbv"
+	      : "=a" (xcrlow), "=d" (xcrhigh)
+	      : "c" (0));
+
+      if (!(~xcrlow & (XSTATE_TILECFG | XSTATE_TILEDATA)))
+	{
+	  __get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx);
+
+	  if (edx & bit_AMX_TILE
+#ifdef AMX_INT8
+	    && (edx & bit_AMX_INT8)
+#endif
+#ifdef AMX_BF16
+	    && (edx & bit_AMX_BF16)
+#endif
+	    )
+	    return 1;
+	}
+    }
+
+  return 0;
+}
+
+int
+main ()
+{
+  if (valid_test ())
+    {
+      DO_TEST ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
new file mode 100644
index 00000000000..a5e5bddedac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+#define TMM1 1
+#define TMM2 2
+#define TMM3 3
+
+void TEST ()
+{
+  _tile_dpbf16ps (TMM1, TMM2, TMM3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
new file mode 100644
index 00000000000..c2d6074387a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbf16ps (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c
new file mode 100644
index 00000000000..c819113897d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c
@@ -0,0 +1,83 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-bf16" } */
+#include <immintrin.h>
+
+#define AMX_BF16
+#define DO_TEST test_amx_bf16_dpbf16ps
+void test_amx_bf16_dpbf16ps ();
+#include "amx-check.h"
+
+/* Transformation functions between bf16/float */
+static uint16_t make_bf16 (float f)
+{
+  union { float f; uint32_t u; } fu = { f };
+  /* Keep the upper 16 bits of the float's bit pattern.  */
+  return (uint16_t) (fu.u >> 16);
+}
+
+static float make_f32 (uint16_t bf)
+{
+  union { uint32_t u; float f; } fu = { (uint32_t) bf << 16 };
+  return fu.f;
+}
+
+/* Init tile buffer with bf16 pairs */
+void init_bf16_max_tile_buffer (uint8_t *buf)
+{ 
+  int i, j;
+  uint16_t *ptr = (uint16_t *)buf;
+
+  for(i = 0; i < 16; i++)
+    for(j = 0; j < 32; j++)
+      {	
+	float f = 16.1f * i + 3.4f * j;
+	ptr[i * 32 + j] = make_bf16(f);
+      }
+}
+
+void calc_matrix_dpbf16ps (__tile *dst, __tile *src1, __tile *src2)
+{
+  uint16_t *src1_buf = (uint16_t *)src1->buf;
+  uint16_t *src2_buf = (uint16_t *)src2->buf;
+  float *dst_buf = (float *)dst->buf;
+  
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 2; t += 2)
+	  {
+	    dst_buf[i * K + k] +=
+	      (make_f32 (src1_buf[i * 2 * N + 2 * j + t]) *
+	      make_f32 (src2_buf[j * 2 * K + 2 * k + t])) +
+	      (make_f32 (src1_buf[i * 2 * N + 2 * j + t + 1]) *
+	      make_f32 (src2_buf[j * 2 * K + 2 * k + t + 1]));
+	  }
+
+}
+
+void test_amx_bf16_dpbf16ps ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+
+  init_bf16_max_tile_buffer (tmp_dst_buf);
+  
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src_with_buffer (2, src1, tmp_dst_buf);
+  init_tile_reg_and_src_with_buffer (3, src2, tmp_dst_buf);
+
+  calc_matrix_dpbf16ps (&dst, &src1, &src2);
+  
+  _tile_dpbf16ps (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+
+  if (!check_tile_register (&dst_ref, &dst))
+        abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
new file mode 100644
index 00000000000..1842c234be8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+#define TMM1 1
+#define TMM2 2
+#define TMM3 3
+
+void TEST ()
+{
+  _tile_dpbssd (TMM1, TMM2, TMM3);
+  _tile_dpbsud (TMM1, TMM2, TMM3);
+  _tile_dpbusd (TMM1, TMM2, TMM3);
+  _tile_dpbuud (TMM1, TMM2, TMM3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
new file mode 100644
index 00000000000..bcfbb3fa5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbssd (1, 2, 3);
+  _tile_dpbsud (1, 2, 3);
+  _tile_dpbusd (1, 2, 3);
+  _tile_dpbuud (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c b/gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c
new file mode 100644
index 00000000000..62d31ce3e81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c
@@ -0,0 +1,62 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-int8" } */
+#include <immintrin.h>
+
+#define AMX_INT8
+#define DO_TEST test_amx_int8_dpbssd
+void test_amx_int8_dpbssd ();
+#include "amx-check.h"
+
+/* Init tile buffer with int32 value*/
+void init_i32_max_tile_buffer (uint8_t *buf)
+{
+  int i, j;
+  int *ptr = (int *)buf;
+  for (i = 0; i < 16; i++)
+    for (j = 0; j < 16; j++)
+      ptr[i * 16 + j] = 2 * i - (16 - j);
+}
+
+void calc_matrix_dpbssd (__tile *dst, __tile *src1, __tile *src2)
+{
+  int8_t *src1_buf = (int8_t *)src1->buf;
+  int8_t *src2_buf = (int8_t *)src2->buf;
+  int *dst_buf = (int *)dst->buf;
+
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 4; t++)
+	  {
+	    dst_buf[i * K + k] +=
+	      ((int) src1_buf[i * 4 * N + 4 * j + t]) *
+	      ((int) src2_buf[j * 4 * K + 4 * k + t]);
+	  }
+}
+
+void test_amx_int8_dpbssd ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+  
+  init_i32_max_tile_buffer (tmp_dst_buf);
+
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src (2, src1);
+  init_tile_reg_and_src (3, src2);
+
+  calc_matrix_dpbssd (&dst, &src1, &src2);
+
+  _tile_dpbssd (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+  
+  if (!check_tile_register (&dst_ref, &dst))
+      abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c b/gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c
new file mode 100644
index 00000000000..5007ee917f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c
@@ -0,0 +1,61 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-int8" } */
+#include <immintrin.h>
+
+#define AMX_INT8
+#define DO_TEST test_amx_int8_dpbsud
+void test_amx_int8_dpbsud ();
+#include "amx-check.h"
+
+/* Init tile buffer with int32 value*/
+void init_i32_max_tile_buffer (uint8_t *buf)
+{
+  int i, j;
+  int *ptr = (int *)buf;
+  for (i = 0; i < 16; i++)
+    for (j = 0; j < 16; j++)
+      ptr[i * 16 + j] = 2 * i - (16 - j);
+}
+
+void calc_matrix_dpbsud (__tile *dst, __tile *src1, __tile *src2)
+{
+  int8_t *src1_buf = (int8_t *)src1->buf;
+  uint8_t *src2_buf = (uint8_t *)src2->buf;
+  int *dst_buf = (int *)dst->buf;
+
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 4; t++)
+	  {
+	    dst_buf[i * K + k] +=
+	      ((int) src1_buf[i * 4 * N + 4 * j + t]) *
+	      ((unsigned) src2_buf[j * 4 * K + 4 * k + t]);
+	  }
+}
+
+void test_amx_int8_dpbsud ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+  
+  init_i32_max_tile_buffer (tmp_dst_buf);
+
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src (2, src1);
+  init_tile_reg_and_src (3, src2);
+
+  calc_matrix_dpbsud (&dst, &src1, &src2);
+  _tile_dpbsud (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+  
+  if (!check_tile_register (&dst_ref, &dst))
+      abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c b/gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c
new file mode 100644
index 00000000000..17888e26116
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c
@@ -0,0 +1,61 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-int8" } */
+#include <immintrin.h>
+
+#define AMX_INT8
+#define DO_TEST test_amx_int8_dpbusd
+void test_amx_int8_dpbusd ();
+#include "amx-check.h"
+
+/* Init tile buffer with int32 value*/
+void init_i32_max_tile_buffer (uint8_t *buf)
+{
+  int i, j;
+  int *ptr = (int *)buf;
+  for (i = 0; i < 16; i++)
+    for (j = 0; j < 16; j++)
+      ptr[i * 16 + j] = 2 * i - (16 - j);
+}
+
+void calc_matrix_dpbusd (__tile *dst, __tile *src1, __tile *src2)
+{
+  uint8_t *src1_buf = (uint8_t *)src1->buf;
+  int8_t *src2_buf = (int8_t *)src2->buf;
+  int *dst_buf = (int *)dst->buf;
+
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 4; t++)
+	  {
+	    dst_buf[i * K + k] +=
+	      ((unsigned) src1_buf[i * 4 * N + 4 * j + t]) *
+	      ((int) src2_buf[j * 4 * K + 4 * k + t]);
+	  }
+}
+
+void test_amx_int8_dpbusd ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+  
+  init_i32_max_tile_buffer (tmp_dst_buf);
+
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src (2, src1);
+  init_tile_reg_and_src (3, src2);
+
+  calc_matrix_dpbusd (&dst, &src1, &src2);
+  _tile_dpbusd (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+  
+  if (!check_tile_register (&dst_ref, &dst))
+      abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c b/gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c
new file mode 100644
index 00000000000..c39666c3643
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c
@@ -0,0 +1,61 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-int8" } */
+#include <immintrin.h>
+
+#define AMX_INT8
+#define DO_TEST test_amx_int8_dpbuud
+void test_amx_int8_dpbuud ();
+#include "amx-check.h"
+
+/* Init tile buffer with int32 value*/
+void init_i32_max_tile_buffer (uint8_t *buf)
+{
+  int i, j;
+  int *ptr = (int *)buf;
+  for (i = 0; i < 16; i++)
+    for (j = 0; j < 16; j++)
+      ptr[i * 16 + j] = 2 * i - (16 - j);
+}
+
+void calc_matrix_dpbuud (__tile *dst, __tile *src1, __tile *src2)
+{
+  uint8_t *src1_buf = (uint8_t *)src1->buf;
+  uint8_t *src2_buf = (uint8_t *)src2->buf;
+  int *dst_buf = (int *)dst->buf;
+
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 4; t++)
+	  {
+	    dst_buf[i * K + k] +=
+	      ((unsigned) src1_buf[i * 4 * N + 4 * j + t]) *
+	      ((unsigned) src2_buf[j * 4 * K + 4 * k + t]);
+	  }
+}
+
+void test_amx_int8_dpbuud ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+  
+  init_i32_max_tile_buffer (tmp_dst_buf);
+
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src (2, src1);
+  init_tile_reg_and_src (3, src2);
+
+  calc_matrix_dpbuud (&dst, &src1, &src2);
+  _tile_dpbuud (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+  
+  if (!check_tile_register (&dst_ref, &dst))
+      abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-2.c b/gcc/testsuite/gcc.target/i386/amxtile-2.c
new file mode 100644
index 00000000000..cef84f9f479
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-2.c
@@ -0,0 +1,47 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile " } */
+#include <immintrin.h>
+
+#define DO_TEST test_amx_tile
+void test_amx_tile ();
+#include "amx-check.h"
+
+void test_amx_tile ()
+{
+  __tilecfg_u cfg_src, cfg_dst;
+  __tile reg_src1, reg_src2, reg_ref;
+
+  /* check tile config load & store. */
+  init_tile_config (&cfg_src);
+  _tile_storeconfig (cfg_dst.a);
+
+  if (!check_tile_config (&cfg_src, &cfg_dst))
+    abort ();
+
+  /* check tile register load & store. */
+  init_tile_reg_and_src (1, reg_src1);
+  _tile_stored (1, reg_ref.buf, _STRIDE);
+  if (!check_tile_register (&reg_ref, &reg_src1))
+    abort ();
+
+  /* check tile stream load instruction */
+  init_tile_src (2, &reg_src2, NULL);
+  _tile_stream_loadd (2, reg_src2.buf, _STRIDE);
+  _tile_stored (2, reg_ref.buf, _STRIDE);
+  if (!check_tile_register (&reg_ref, &reg_src2))
+    abort ();
+
+  /* check tile register zeroing */
+  zero_tile_src (&reg_src2);
+  _tile_zero (2);
+  _tile_stored (2, reg_ref.buf, _STRIDE);
+  if (!check_tile_register (&reg_ref, &reg_src2))
+    abort ();
+
+  /* check tile cfg zeroing */
+  memset (cfg_dst.a, 0, sizeof(__tilecfg));
+  _tile_release ();
+  _tile_storeconfig (cfg_src.a);
+  if (!check_tile_config (&cfg_src, &cfg_dst))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
new file mode 100644
index 00000000000..ceb5fa4bde3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]+\[^\n\]*%tmm\[0-9\]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+
+#define TMM0 0
+#define TMM1 1
+#define TMM2 2
+#define TMM3 3
+
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (TMM3, base, stride);
+  _tile_stream_loadd (TMM2, base, stride);
+  _tile_stored (TMM1, base, stride);
+  _tile_zero (TMM0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
new file mode 100644
index 00000000000..88ef612ed14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -masm=intel " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]\[^\n\]+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (5, base, stride);
+  _tile_stream_loadd (4, base, stride);
+  _tile_stored (3, base, stride);
+  _tile_zero (2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 94ffbb64c75..8e669f12215 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -71,6 +71,9 @@ extern void test_tsxldtrk (void)		__attribute__((__target__("tsxldtrk")));
 extern void test_enqcmd (void)			__attribute__((__target__("enqcmd")));
 extern void test_avx512bf16 (void)		__attribute__((__target__("avx512bf16")));
 extern void test_avx512vp2intersect (void)	__attribute__((__target__("avx512vp2intersect")));
+extern void test_amx_tile (void)		__attribute__((__target__("amx-tile")));
+extern void test_amx_int8 (void)		__attribute__((__target__("amx-int8")));
+extern void test_amx_bf16 (void)		__attribute__((__target__("amx-bf16")));
 
 extern void test_no_sgx (void)			__attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)		__attribute__((__target__("no-avx5124fmaps")));
@@ -143,6 +146,9 @@ extern void test_no_tsxldtrk (void)		__attribute__((__target__("no-tsxldtrk")));
 extern void test_no_enqcmd (void)		__attribute__((__target__("no-enqcmd")));
 extern void test_no_avx512bf16 (void)		__attribute__((__target__("no-avx512bf16")));
 extern void test_no_avx512vp2intersect (void)	__attribute__((__target__("no-avx512vp2intersect")));
+extern void test_no_amx_tile (void)		__attribute__((__target__("no-amx-tile")));
+extern void test_no_amx_int8 (void)		__attribute__((__target__("no-amx-int8")));
+extern void test_no_amx_bf16 (void)		__attribute__((__target__("no-amx-bf16")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void)		__attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index b1690d7204f..61146b2b30a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h gfniintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 3a6404707c4..4d6c9b3a17a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index edaa2aa8ad4..837b51c53e6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 7364b2ff337..fc75669f41b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -11,6 +11,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxldtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -102,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -219,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index eaadebef187..9ca7c5d919d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -10,6 +10,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -697,6 +698,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1) 
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 6881b66cd23..9ab54dc14ce 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8956,6 +8956,39 @@ proc check_effective_target_avx512vaes { } {
     } "-mvaes" ]
 }
 
+# Return 1 if amx-tile instructions can be compiled.
+proc check_effective_target_amx_tile { } {
+    return [check_no_compiler_messages amx_tile object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tilerelease" ::);
+	}
+    } "-mamx-tile" ]
+}
+
+# Return 1 if amx-int8 instructions can be compiled.
+proc check_effective_target_amx_int8 { } {
+    return [check_no_compiler_messages amx_int8 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbssd\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-int8" ]
+}
+
+# Return 1 if amx-bf16 instructions can be compiled.
+proc check_effective_target_amx_bf16 { } {
+    return [check_no_compiler_messages amx_bf16 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbf16ps\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-bf16" ]
+}
+
 # Return 1 if vpclmulqdq instructions can be compiled.
 proc check_effective_target_vpclmulqdq { } {
     return [check_no_compiler_messages vpclmulqdq object {
-- 
2.25.1



* Re: [PATCH] Enable GCC support for AMX
  2020-09-11 17:00       ` Hongyu Wang
@ 2020-09-18  8:31         ` Hongyu Wang
  2020-09-30 11:51           ` [committed] testsuite: Fix up amx* dg-do run tests with older binutils Jakub Jelinek
  2020-09-28 11:38         ` [PATCH] Enable GCC support for AMX Kirill Yukhin
  1 sibling, 1 reply; 17+ messages in thread
From: Hongyu Wang @ 2020-09-18  8:31 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: H.J. Lu, Uros Bizjak, GCC Patches, crazylht

[-- Attachment #1: Type: text/plain, Size: 11733 bytes --]

Hi Kirill,

Thank you very much for reviewing this again.

I have updated the patch to add the XSAVE dependency and to use
__builtin_cpu_supports for the runtime tests.

Rebased on the Sept. 15 trunk and tested with SDE. Kindly PING.
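
For reference, the OS-support part of the runtime check can be sketched as
below. This is only an illustration under assumptions: the mask values are
the ones the patch adds to cpuinfo.h, while amx_state_enabled is a
hypothetical helper name, and the XCR0 value is passed in as an argument
rather than read with xgetbv.

```c
#include <stdint.h>

/* XCR0 state-component bits, with the values the patch adds to
   gcc/common/config/i386/cpuinfo.h.  */
#define XSTATE_TILECFG   0x20000  /* bit 17: tile configuration state */
#define XSTATE_TILEDATA  0x40000  /* bit 18: tile data state */

#define XCR_AMX_ENABLED_MASK \
  (XSTATE_TILECFG | XSTATE_TILEDATA)

/* Hypothetical helper (not part of the patch): return 1 only when the
   OS has enabled BOTH tile state components in XCR0, mirroring the
   amx_usable computation in get_available_features.  */
static int
amx_state_enabled (uint64_t xcr0)
{
  return (xcr0 & XCR_AMX_ENABLED_MASK) == XCR_AMX_ENABLED_MASK;
}
```

With both tile bits set, e.g. an XCR0 value of 0x600e7, the helper returns
1; with only the AVX-512 bits (0xe7) it returns 0, so the AMX features
would stay unset in that case.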


On Sat, Sep 12, 2020 at 1:00 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:

> Hi
>
> Thanks for your review, and sorry for the late reply. It took a while
> to finish the runtime test.
>
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 797f0ad5edd..d0e59e86a5c 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -412,7 +412,7 @@ i[34567]86-*-*)
> > >                      waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > >                      avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > >                      avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > -                    tsxldtrkintrin.h"
> > > +                    tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Line more than 80 chars.
> >
> > >       ;;
> > >  x86_64-*-*)
> > >       cpu_type=i386
> > > @@ -447,7 +447,7 @@ x86_64-*-*)
> > >                      waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > >                      avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > >                      avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > -                    tsxldtrkintrin.h"
> > > +                    tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Ditto.
>
> Changed.
>
> >
> > > diff --git a/gcc/config/i386/amxbf16intrin.h
> b/gcc/config/i386/amxbf16intrin.h
> > > new file mode 100644
> > > index 00000000000..df0e2262d50
> > > --- /dev/null
> > > +++ b/gcc/config/i386/amxbf16intrin.h
> > > @@ -0,0 +1,25 @@
> > > +#if !defined _IMMINTRIN_H_INCLUDED
> > > +#error "Never use <amxbf16intrin.h> directly; include <immintrin.h>
> instead."
> > > +#endif
> > > +
> > > +#ifndef _AMXBF16INTRIN_H_INCLUDED
> > > +#define _AMXBF16INTRIN_H_INCLUDED
> > > +
> > > +#if !defined(__AMX_BF16__)
> > > +#pragma GCC push_options
> > > +#pragma GCC target("amx-bf16")
> > > +#define __DISABLE_AMX_BF16__
> > > +#endif /* __AMX_BF16__ */
> > > +
> > > +#if defined(__x86_64__) && defined(__AMX_BF16__)
> > > +#define _tile_dpbf16ps(dst,src1,src2)
>         \
> > > +  __asm__ volatile\
> > > +  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1",
> %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
> > > +#endif
> >
> > I hope in future we'll replace it with unspecs at least...
>
> For now we think it is redundant to add builtins that take only const int
> parameters, since they are expected to be replaced in the future.
>
> >
> > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > > index c9f7195d423..9389dc24948 100644
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index bca8c856dc8..a46e31f5862 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
> > >  -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b
> -mavx512vpopcntdq @gol
> > >  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
> > >  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> > > +-mamx-tile -mamx-int8 -mamx-bf16@gol
> >
> > Add space please.
>
> Changed.
>
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > new file mode 100644
> > > index 00000000000..605a44df3f8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > @@ -0,0 +1,4 @@
> > > +/* { dg-do assemble { target { ! ia32 } } } */
> > > +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> > > +/* { dg-require-effective-target amx_bf16 } */
> > > +#include"amxbf16-asmintel-1.c"
> >
> > I didn't get it. We usually use the second testcase to actually execute
> > it and (well, a little) verify that the semantics are ok. E.g. that
> > the operand order is correct. Could you please do that?
> > This applies to all *-2.c cases.
> > I've checked and it looks like the public SDE simulator supports AMX.
> >
>
> Added runtime test. Tested and passed under SDE.
>
> Also, we adjusted the intrinsic calls to accept macro parameters.
>
> Updated patch.
>
> > --
> > K
> > Hello,
> >
> > On 03 Sep 08:17, H.J. Lu wrote:
> > > On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Hello,
> > > >
> > > > On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > > > > Hi:
> > > > >
> > > > > This patch is about to support Intel Advanced Matrix Extensions
> (AMX)
> > > > > which will be enabled in GLC.
> > > > >
> > > > > AMX is a new 64-bit programming paradigm consisting of two
> > > > > components: a set of 2-dimensional registers (tiles) representing
> > > > > sub-arrays from a larger 2-dimensional memory image,
> > > > > and an accelerator able to operate on tiles
> > > > >
> > > > > Supported instructions are
> > > > >
> > > > >
> AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > > > AMX-BF16:tdpbf16ps
> > > > >
> > > > > The intrinsics adopts constant tile register number as its input
> parameters.
> > > > >
> > > > > For detailed information, please refer to
> > > > >
> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > > > >
> > > > > Bootstrap ok, regression test on i386/x86 backend is ok.
> > > > >
> > > > > OK for master?
> > > >
> > > > I was trying to apply your patch to recent master and got
> > > > compilation error:
> > > >
> > > > g++ -std=gnu++11  -fno-PIE -c   -g -O2 -DIN_GCC     -fno-exceptions
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowi
> > > > ng -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wn
> > > > o-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I.
> -I/export/kyukhin/gcc/src/gcc -I/export/kyukhin/gcc/src/gcc/. -I/expor
> > > > t/kyukhin/gcc/src/gcc/../include
> -I/export/kyukhin/gcc/src/gcc/../libcpp/include
> -I/export/kyukhin/gcc/src/gcc/../libdecnumber
> > > > -I/export/kyukhin/gcc/src/gcc/../libdecnumber/bid -I../libdecnumber
> -I/export/kyukhin/gcc/src/gcc/../libbacktrace   -o i386-opti
> > > > ons.o -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo
> /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c
> > > > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c: In function
> ‘bool ix86_option_override_internal(bool, gcc_options*, gcc_
> > > > options*)’:
> > > > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2263:41:
> error: ‘PTA_AMX_TILE’ was not declared in this scope
> > > >   if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
> > > >                                          ^
> > > > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2267:41:
> error: ‘PTA_AMX_INT8’ was not declared in this scope
> > > >   if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
> > > >                                          ^
> > > > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2271:41:
> error: ‘PTA_AMX_BF16’ was not declared in this scope
> > > >   if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
> > > >
> > > > Could you please fix that?
> > >
> > > Here is the rebased patch against
> > >
> > > commit 3c219134152f645103f2fcd50735b177ccd76cde
> > > Author: Jonathan Wakely <jwakely@redhat.com>
> > > Date:   Thu Sep 3 12:38:50 2020 +0100
> > >
> > >     libstdc++: Optimise GCD algorithms
> > >
> > > Thanks.
> > >
> > > --
> > > H.J.
> >
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 797f0ad5edd..d0e59e86a5c 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -412,7 +412,7 @@ i[34567]86-*-*)
> > >                      waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > >                      avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > >                      avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > -                    tsxldtrkintrin.h"
> > > +                    tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Line more than 80 chars.
> >
> > >       ;;
> > >  x86_64-*-*)
> > >       cpu_type=i386
> > > @@ -447,7 +447,7 @@ x86_64-*-*)
> > >                      waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > >                      avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > >                      avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > -                    tsxldtrkintrin.h"
> > > +                    tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Ditto.
> >
> > > diff --git a/gcc/config/i386/amxbf16intrin.h
> b/gcc/config/i386/amxbf16intrin.h
> > > new file mode 100644
> > > index 00000000000..df0e2262d50
> > > --- /dev/null
> > > +++ b/gcc/config/i386/amxbf16intrin.h
> > > @@ -0,0 +1,25 @@
> > > +#if !defined _IMMINTRIN_H_INCLUDED
> > > +#error "Never use <amxbf16intrin.h> directly; include <immintrin.h>
> instead."
> > > +#endif
> > > +
> > > +#ifndef _AMXBF16INTRIN_H_INCLUDED
> > > +#define _AMXBF16INTRIN_H_INCLUDED
> > > +
> > > +#if !defined(__AMX_BF16__)
> > > +#pragma GCC push_options
> > > +#pragma GCC target("amx-bf16")
> > > +#define __DISABLE_AMX_BF16__
> > > +#endif /* __AMX_BF16__ */
> > > +
> > > +#if defined(__x86_64__) && defined(__AMX_BF16__)
> > > +#define _tile_dpbf16ps(dst,src1,src2)
>         \
> > > +  __asm__ volatile\
> > > +  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1",
> %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
> > > +#endif
> >
> > I hope in future we'll replace it with unspecs at least...
> >
> > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > > index c9f7195d423..9389dc24948 100644
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index bca8c856dc8..a46e31f5862 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
> > >  -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b
> -mavx512vpopcntdq @gol
> > >  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
> > >  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> > > +-mamx-tile -mamx-int8 -mamx-bf16@gol
> >
> > Add space please.
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > new file mode 100644
> > > index 00000000000..605a44df3f8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > @@ -0,0 +1,4 @@
> > > +/* { dg-do assemble { target { ! ia32 } } } */
> > > +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> > > +/* { dg-require-effective-target amx_bf16 } */
> > > +#include"amxbf16-asmintel-1.c"
> >
> > I didn't get it. We usually use the second testcase to actually execute
> > it and (well, a little) verify that the semantics are ok. E.g. that
> > the operand order is correct. Could you please do that?
> > This applies to all *-2.c cases.
> > I've checked and it looks like the public SDE simulator supports AMX.
> >
> > --
> > K
>


-- 
Regards,

Hongyu, Wang

[-- Attachment #2: GCC_AMX_support_v3.patch --]
[-- Type: text/x-patch, Size: 65206 bytes --]

From f8d7df2fa959662f883ce03a05ffce08274e2d42 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Thu, 25 Jul 2019 16:49:36 +0800
Subject: [PATCH] Enable GCC support for AMX-TILE,AMX-INT8,AMX-BF16.

AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
AMX-BF16:tdpbf16ps

gcc/ChangeLog

	* common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
	OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
	OPTION_MASK_ISA2_AMX_TILE_UNSET, OPTION_MASK_ISA2_AMX_INT8_UNSET,
	OPTION_MASK_ISA2_AMX_BF16_UNSET, OPTION_MASK_ISA2_XSAVE_UNSET):
	New macros.
	(ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
	* common/config/i386/i386-cpuinfo.h (processor_types): Add
	FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
	* common/config/i386/cpuinfo.h (XSTATE_TILECFG,
	XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macros.
	(get_available_features): Enable AMX features only if
	their states are supported by OSXSAVE.
	* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
	for amx-tile, amx-int8, amx-bf16.
	* config.gcc: Add amxtileintrin.h, amxint8intrin.h,
	amxbf16intrin.h to extra headers.
	* config/i386/amxbf16intrin.h: New file.
	* config/i386/amxint8intrin.h: Ditto.
	* config/i386/amxtileintrin.h: Ditto.
	* config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
	New macros.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
	* config/i386/i386-options.c (ix86_target_string): Add
	-mamx-tile, -mamx-int8, -mamx-bf16.
	(ix86_option_override_internal): Handle AMX-TILE,
	AMX-INT8, AMX-BF16.
	* config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
	TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16_P,
	PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros.
	* config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* config/i386/immintrin.h: Include amxtileintrin.h,
	amxint8intrin.h, amxbf16intrin.h.
	* doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
	* doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
	* doc/sourcebuild.texi (Effective-Target Keywords, Other
	hardware attributes): Document amx_int8, amx_tile, amx_bf16.

gcc/testsuite/ChangeLog

	* lib/target-supports.exp (check_effective_target_amx_tile,
	check_effective_target_amx_int8,
	check_effective_target_amx_bf16): New proc.
	* g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/sse-12.c: Ditto.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* gcc.target/i386/amx-check.h: New header file.
	* gcc.target/i386/amxbf16-asmatt-1.c: New test.
	* gcc.target/i386/amxint8-asmatt-1.c: Ditto.
	* gcc.target/i386/amxtile-asmatt-1.c: Ditto.
	* gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
	* gcc.target/i386/amxint8-asmintel-1.c: Ditto.
	* gcc.target/i386/amxtile-asmintel-1.c: Ditto.
	* gcc.target/i386/amxbf16-dpbf16ps-2.c: Ditto.
	* gcc.target/i386/amxint8-dpbssd-2.c: Ditto.
	* gcc.target/i386/amxint8-dpbsud-2.c: Ditto.
	* gcc.target/i386/amxint8-dpbusd-2.c: Ditto.
	* gcc.target/i386/amxint8-dpbuud-2.c: Ditto.
	* gcc.target/i386/amxtile-2.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h              |  16 ++
 gcc/common/config/i386/i386-common.c          |  50 +++++
 gcc/common/config/i386/i386-cpuinfo.h         |   3 +
 gcc/common/config/i386/i386-isas.h            |   3 +
 gcc/config.gcc                                |   6 +-
 gcc/config/i386/amxbf16intrin.h               |  29 +++
 gcc/config/i386/amxint8intrin.h               |  38 ++++
 gcc/config/i386/amxtileintrin.h               |  75 +++++++
 gcc/config/i386/cpuid.h                       |   3 +
 gcc/config/i386/i386-c.c                      |   7 +
 gcc/config/i386/i386-options.c                |  20 +-
 gcc/config/i386/i386.h                        |  12 +-
 gcc/config/i386/i386.opt                      |  14 +-
 gcc/config/i386/immintrin.h                   |   6 +
 gcc/doc/extend.texi                           |  15 ++
 gcc/doc/invoke.texi                           |  10 +
 gcc/doc/sourcebuild.texi                      |   9 +
 gcc/testsuite/g++.dg/other/i386-2.C           |   3 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |   3 +-
 gcc/testsuite/gcc.target/i386/amx-check.h     | 185 ++++++++++++++++++
 .../gcc.target/i386/amxbf16-asmatt-1.c        |  13 ++
 .../gcc.target/i386/amxbf16-asmintel-1.c      |   9 +
 .../gcc.target/i386/amxbf16-dpbf16ps-2.c      |  83 ++++++++
 .../gcc.target/i386/amxint8-asmatt-1.c        |  19 ++
 .../gcc.target/i386/amxint8-asmintel-1.c      |  15 ++
 .../gcc.target/i386/amxint8-dpbssd-2.c        |  62 ++++++
 .../gcc.target/i386/amxint8-dpbsud-2.c        |  61 ++++++
 .../gcc.target/i386/amxint8-dpbusd-2.c        |  61 ++++++
 .../gcc.target/i386/amxint8-dpbuud-2.c        |  61 ++++++
 gcc/testsuite/gcc.target/i386/amxtile-2.c     |  47 +++++
 .../gcc.target/i386/amxtile-asmatt-1.c        |  30 +++
 .../gcc.target/i386/amxtile-asmintel-1.c      |  24 +++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   6 +
 gcc/testsuite/gcc.target/i386/sse-12.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |   5 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |   3 +-
 gcc/testsuite/lib/target-supports.exp         |  33 ++++
 39 files changed, 1032 insertions(+), 13 deletions(-)
 create mode 100644 gcc/config/i386/amxbf16intrin.h
 create mode 100644 gcc/config/i386/amxint8intrin.h
 create mode 100644 gcc/config/i386/amxtileintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/amx-check.h
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c

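A note for readers skimming the diff: the OS-support test that the cpuinfo.h hunk adds to get_available_features reduces to a mask comparison on the low 32 bits of XCR0. The following is only a minimal standalone sketch of that comparison. The mask values are copied from the hunk, while `amx_state_enabled` is a hypothetical helper name used for illustration and is not part of the patch.

```c
#include <stdint.h>

/* XCR0 state-component bits, as defined in the cpuinfo.h hunk.  */
#define XSTATE_TILECFG   0x20000u
#define XSTATE_TILEDATA  0x40000u
#define XCR_AMX_ENABLED_MASK (XSTATE_TILECFG | XSTATE_TILEDATA)

/* Hypothetical helper (not in the patch): returns 1 only when the OS
   has enabled both the tile configuration and the tile data state.  */
static int
amx_state_enabled (uint32_t xcrlow)
{
  return (xcrlow & XCR_AMX_ENABLED_MASK) == XCR_AMX_ENABLED_MASK;
}
```

Both bits have to be set. A value with only TILECFG or only TILEDATA enabled is treated as unusable, which matches how the AVX and AVX-512 masks are already handled in the same function.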
diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 12237e2f449..c96455ce64f 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -509,15 +509,20 @@ get_available_features (struct __processor_model *cpu_model,
 #define XSTATE_OPMASK			0x20
 #define XSTATE_ZMM			0x40
 #define XSTATE_HI_ZMM			0x80
+#define XSTATE_TILECFG			0x20000
+#define XSTATE_TILEDATA		0x40000
 
 #define XCR_AVX_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM)
 #define XCR_AVX512F_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
+#define XCR_AMX_ENABLED_MASK \
+  (XSTATE_TILECFG | XSTATE_TILEDATA)
 
   /* Check if AVX and AVX512 are usable.  */
   int avx_usable = 0;
   int avx512_usable = 0;
+  int amx_usable = 0;
   if ((ecx & bit_OSXSAVE))
     {
       /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
@@ -533,6 +538,8 @@ get_available_features (struct __processor_model *cpu_model,
 	  avx512_usable = ((xcrlow & XCR_AVX512F_ENABLED_MASK)
 			   == XCR_AVX512F_ENABLED_MASK);
 	}
+      amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
+		    == XCR_AMX_ENABLED_MASK);
     }
 
 #define set_feature(f) \
@@ -651,6 +658,15 @@ get_available_features (struct __processor_model *cpu_model,
 	set_feature (FEATURE_PCONFIG);
       if (edx & bit_IBT)
 	set_feature (FEATURE_IBT);
+      if (amx_usable)
+	{
+	  if (edx & bit_AMX_TILE)
+	    set_feature (FEATURE_AMX_TILE);
+	  if (edx & bit_AMX_INT8)
+	    set_feature (FEATURE_AMX_INT8);
+	  if (edx & bit_AMX_BF16)
+	    set_feature (FEATURE_AMX_BF16);
+	}
       if (avx512_usable)
 	{
 	  if (ebx & bit_AVX512F)
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 5305145a8c9..27803fcb671 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -101,6 +101,9 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE_SET)
 #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_SET OPTION_MASK_ISA2_AVX512VP2INTERSECT
+#define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -193,6 +196,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_XSAVE_UNSET \
   (OPTION_MASK_ISA_XSAVE | OPTION_MASK_ISA_XSAVEOPT_UNSET \
    | OPTION_MASK_ISA_XSAVES_UNSET | OPTION_MASK_ISA_XSAVEC_UNSET)
+#define OPTION_MASK_ISA2_XSAVE_UNSET OPTION_MASK_ISA2_AMX_TILE_UNSET
 #define OPTION_MASK_ISA_XSAVEOPT_UNSET OPTION_MASK_ISA_XSAVEOPT
 #define OPTION_MASK_ISA_AVX2_UNSET \
   (OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET)
@@ -246,6 +250,9 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SERIALIZE_UNSET OPTION_MASK_ISA2_SERIALIZE
 #define OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET OPTION_MASK_ISA2_AVX512VP2INTERSECT
 #define OPTION_MASK_ISA2_TSXLDTRK_UNSET OPTION_MASK_ISA2_TSXLDTRK
+#define OPTION_MASK_ISA2_AMX_TILE_UNSET OPTION_MASK_ISA2_AMX_TILE
+#define OPTION_MASK_ISA2_AMX_INT8_UNSET OPTION_MASK_ISA2_AMX_INT8
+#define OPTION_MASK_ISA2_AMX_BF16_UNSET OPTION_MASK_ISA2_AMX_BF16
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -930,6 +937,47 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mamx_tile:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_XSAVE_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_XSAVE_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_TILE_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_int8:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_INT8_UNSET;
+	}
+      return true;
+
+    case OPT_mamx_bf16:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_BF16_UNSET;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
@@ -1264,6 +1312,8 @@ ix86_handle_option (struct gcc_options *opts,
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_XSAVE_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_XSAVE_UNSET;
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_XSAVE_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_XSAVE_UNSET;
 	}
       return true;
 
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 84ca97e7ade..5b94b1f1df7 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -216,6 +216,9 @@ enum processor_features
   FEATURE_XSAVEC,
   FEATURE_XSAVEOPT,
   FEATURE_XSAVES,
+  FEATURE_AMX_TILE,
+  FEATURE_AMX_INT8,
+  FEATURE_AMX_BF16,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 08c9dbecc76..3c830ea08ff 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -160,4 +160,7 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("xsaveopt", FEATURE_XSAVEOPT, P_NONE,
 			"-mxsaveopt")
   ISA_NAMES_TABLE_ENTRY("xsaves", FEATURE_XSAVES, P_NONE, "-mxsaves")
+  ISA_NAMES_TABLE_ENTRY("amx-tile", FEATURE_AMX_TILE, P_NONE, "-mamx-tile")
+  ISA_NAMES_TABLE_ENTRY("amx-int8", FEATURE_AMX_INT8, P_NONE, "-mamx-int8")
+  ISA_NAMES_TABLE_ENTRY("amx-bf16", FEATURE_AMX_BF16, P_NONE, "-mamx-bf16")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 797f0ad5edd..5713e6d3893 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -412,7 +412,8 @@ i[34567]86-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
+		       amxbf16intrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -447,7 +448,8 @@ x86_64-*-*)
 		       waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
 		       avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
 		       avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
-		       tsxldtrkintrin.h"
+		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
+		       amxbf16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
new file mode 100644
index 00000000000..b1620963944
--- /dev/null
+++ b/gcc/config/i386/amxbf16intrin.h
@@ -0,0 +1,29 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXBF16INTRIN_H_INCLUDED
+#define _AMXBF16INTRIN_H_INCLUDED
+
+#if !defined(__AMX_BF16__)
+#pragma GCC push_options
+#pragma GCC target("amx-bf16")
+#define __DISABLE_AMX_BF16__
+#endif /* __AMX_BF16__ */
+
+#if defined(__x86_64__) && defined(__AMX_BF16__)
+#define _tile_dpbf16ps_internal(dst,src1,src2)					\
+  __asm__ volatile\
+  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbf16ps(dst,src1,src2)					\
+  _tile_dpbf16ps_internal (dst, src1, src2)
+
+#endif
+
+#ifdef __DISABLE_AMX_BF16__
+#undef __DISABLE_AMX_BF16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_BF16__ */
+
+#endif /* _AMXBF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxint8intrin.h b/gcc/config/i386/amxint8intrin.h
new file mode 100644
index 00000000000..11adc1f1295
--- /dev/null
+++ b/gcc/config/i386/amxint8intrin.h
@@ -0,0 +1,38 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxint8intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXINT8INTRIN_H_INCLUDED
+#define _AMXINT8INTRIN_H_INCLUDED
+
+#if !defined(__AMX_INT8__)
+#pragma GCC push_options
+#pragma GCC target("amx-int8")
+#define __DISABLE_AMX_INT8__
+#endif /* __AMX_INT8__ */
+
+#if defined(__x86_64__) && defined(__AMX_INT8__)
+#define _tile_int8_dp_internal(name,dst,src1,src2)					\
+  __asm__ volatile							\
+  ("{"#name"\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|"#name"\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
+
+#define _tile_dpbssd(dst,src1,src2)					\
+  _tile_int8_dp_internal (tdpbssd, dst, src1, src2)
+
+#define _tile_dpbsud(dst,src1,src2)					\
+  _tile_int8_dp_internal (tdpbsud, dst, src1, src2)
+
+#define _tile_dpbusd(dst,src1,src2)					\
+  _tile_int8_dp_internal (tdpbusd, dst, src1, src2)
+
+#define _tile_dpbuud(dst,src1,src2)					\
+  _tile_int8_dp_internal (tdpbuud, dst, src1, src2)
+
+#endif
+
+#ifdef __DISABLE_AMX_INT8__
+#undef __DISABLE_AMX_INT8__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_INT8__ */
+
+#endif /* _AMXINT8INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
new file mode 100644
index 00000000000..e78e5c04909
--- /dev/null
+++ b/gcc/config/i386/amxtileintrin.h
@@ -0,0 +1,75 @@
+#if !defined _IMMINTRIN_H_INCLUDED
+#error "Never use <amxtileintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AMXTILEINTRIN_H_INCLUDED
+#define _AMXTILEINTRIN_H_INCLUDED
+
+#if !defined(__AMX_TILE__)
+#pragma GCC push_options
+#pragma GCC target("amx-tile")
+#define __DISABLE_AMX_TILE__
+#endif /* __AMX_TILE__ */
+
+#if defined(__x86_64__) && defined(__AMX_TILE__)
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_loadconfig (const void *__config)
+{
+  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_storeconfig (void *__config)
+{
+  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tile_release (void)
+{
+  __asm__ volatile ("tilerelease" ::);
+}
+
+#define _tile_loadd(dst,base,stride)		\
+  _tile_loadd_internal (dst, base, stride)
+
+#define _tile_loadd_internal(dst,base,stride)				\
+  __asm__ volatile							\
+  ("{tileloadd\t(%0,%1,1), %%tmm"#dst"|tileloadd\t%%tmm"#dst", [%0+%1*1]}" \
+   :: "r" ((const void*) base), "r" ((long) stride))
+
+#define _tile_stream_loadd(dst,base,stride)		\
+  _tile_stream_loadd_internal (dst, base, stride)
+
+#define _tile_stream_loadd_internal(dst,base,stride)			\
+  __asm__ volatile							\
+  ("{tileloaddt1\t(%0,%1,1), %%tmm"#dst"|tileloaddt1\t%%tmm"#dst", [%0+%1*1]}" \
+   :: "r" ((const void*) base), "r" ((long) stride))
+
+#define _tile_stored(dst,base,stride)		\
+  _tile_stored_internal (dst, base, stride)
+
+#define _tile_stored_internal(src,base,stride)				\
+  __asm__ volatile							\
+  ("{tilestored\t%%tmm"#src", (%0,%1,1)|tilestored\t[%0+%1*1], %%tmm"#src"}" \
+   :: "r" ((void*) base), "r" ((long) stride) \
+   : "memory")
+
+#define _tile_zero(dst)				\
+  _tile_zero_internal (dst)
+
+#define _tile_zero_internal(dst)		\
+  __asm__ volatile				\
+  ("tilezero\t%%tmm"#dst ::)
+
+#endif
+
+#ifdef __DISABLE_AMX_TILE__
+#undef __DISABLE_AMX_TILE__
+#pragma GCC pop_options
+#endif /* __DISABLE_AMX_TILE__ */
+
+#endif /* _AMXTILEINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index bca61d620db..4598434fd02 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -127,6 +127,9 @@
 #define bit_PCONFIG	(1 << 18)
 #define bit_SERIALIZE	(1 << 14)
 #define bit_TSXLDTRK    (1 << 16)
+#define bit_AMX_BF16    (1 << 22)
+#define bit_AMX_TILE    (1 << 24)
+#define bit_AMX_INT8    (1 << 25)
 
 /* XFEATURE_ENABLED_MASK register bits (%eax == 0xd, %ecx == 0) */
 #define bit_BNDREGS     (1 << 3)
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 2d61a0ce70a..6a68e7caf08 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -588,6 +588,13 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__ENQCMD__");
   if (isa_flag2 & OPTION_MASK_ISA2_TSXLDTRK)
     def_or_undef (parse_in, "__TSXLDTRK__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_TILE)
+    def_or_undef (parse_in, "__AMX_TILE__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_INT8)
+    def_or_undef (parse_in, "__AMX_INT8__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AMX_BF16)
+    def_or_undef (parse_in, "__AMX_BF16__");
+
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index b93c338346f..f79b6a89270 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -209,7 +209,10 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavx512bf16",	OPTION_MASK_ISA2_AVX512BF16 },
   { "-menqcmd",		OPTION_MASK_ISA2_ENQCMD },
   { "-mserialize",	OPTION_MASK_ISA2_SERIALIZE },
-  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK }
+  { "-mtsxldtrk",	OPTION_MASK_ISA2_TSXLDTRK },
+  { "-mamx-tile",	OPTION_MASK_ISA2_AMX_TILE },
+  { "-mamx-int8",	OPTION_MASK_ISA2_AMX_INT8 },
+  { "-mamx-bf16",	OPTION_MASK_ISA2_AMX_BF16 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1031,6 +1034,9 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("enqcmd", OPT_menqcmd),
     IX86_ATTR_ISA ("serialize", OPT_mserialize),
     IX86_ATTR_ISA ("tsxldtrk", OPT_mtsxldtrk),
+    IX86_ATTR_ISA ("amx-tile", OPT_mamx_tile),
+    IX86_ATTR_ISA ("amx-int8", OPT_mamx_int8),
+    IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2254,6 +2260,18 @@ ix86_option_override_internal (bool main_args_p,
 	    && !(opts->x_ix86_isa_flags2_explicit
 		 & OPTION_MASK_ISA2_AVX512BF16))
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512BF16;
+	if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_TILE))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_TILE;
+	if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_INT8))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_INT8;
+	if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
+	    && !(opts->x_ix86_isa_flags2_explicit
+		 & OPTION_MASK_ISA2_AMX_BF16))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_BF16;
         if (((processor_alias_table[i].flags & PTA_MOVDIRI) != 0)
             && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MOVDIRI))
           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MOVDIRI;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 92b7475a7bf..a449653cc3e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -203,6 +203,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_SERIALIZE_P(x) TARGET_ISA2_SERIALIZE_P(x)
 #define TARGET_TSXLDTRK	TARGET_ISA2_TSXLDTRK
 #define TARGET_TSXLDTRK_P(x) TARGET_ISA2_TSXLDTRK_P(x)
+#define TARGET_AMX_TILE TARGET_ISA2_AMX_TILE
+#define TARGET_AMX_TILE_P(x) TARGET_ISA2_AMX_TILE_P(x)
+#define TARGET_AMX_INT8 TARGET_ISA2_AMX_INT8
+#define TARGET_AMX_INT8_P(x) TARGET_ISA2_AMX_INT8_P(x)
+#define TARGET_AMX_BF16 TARGET_ISA2_AMX_BF16
+#define TARGET_AMX_BF16_P(x) TARGET_ISA2_AMX_BF16_P(x)
 
 #define TARGET_LP64	TARGET_ABI_64
 #define TARGET_LP64_P(x)	TARGET_ABI_64_P(x)
@@ -2466,6 +2472,9 @@ const wide_int_bitmask PTA_ENQCMD (0, HOST_WIDE_INT_1U << 15);
 const wide_int_bitmask PTA_CLDEMOTE (0, HOST_WIDE_INT_1U << 16);
 const wide_int_bitmask PTA_SERIALIZE (0, HOST_WIDE_INT_1U << 17);
 const wide_int_bitmask PTA_TSXLDTRK (0, HOST_WIDE_INT_1U << 18);
+const wide_int_bitmask PTA_AMX_TILE (0, HOST_WIDE_INT_1U << 19);
+const wide_int_bitmask PTA_AMX_INT8 (0, HOST_WIDE_INT_1U << 20);
+const wide_int_bitmask PTA_AMX_BF16 (0, HOST_WIDE_INT_1U << 21);
 
 const wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2
   | PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
@@ -2499,7 +2508,8 @@ const wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_CLWB | PTA_AVX512VP2INTERSECT;
 const wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
-  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK;
+  | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
+  | PTA_AMX_INT8 | PTA_AMX_BF16;
 const wide_int_bitmask PTA_ALDERLAKE = PTA_SKYLAKE | PTA_CLDEMOTE | PTA_PTWRITE
   | PTA_WAITPKG | PTA_SERIALIZE;
 const wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index c9f7195d423..9389dc24948 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1114,4 +1114,16 @@ Support SERIALIZE built-in functions and code generation.
 
 mtsxldtrk
 Target Report Mask(ISA2_TSXLDTRK) Var(ix86_isa_flags2) Save
-Support TSXLDTRK built-in functions and code generation.
\ No newline at end of file
+Support TSXLDTRK built-in functions and code generation.
+
+mamx-tile
+Target Report Mask(ISA2_AMX_TILE) Var(ix86_isa_flags2) Save
+Support AMX-TILE built-in functions and code generation.
+
+mamx-int8
+Target Report Mask(ISA2_AMX_INT8) Var(ix86_isa_flags2) Save
+Support AMX-INT8 built-in functions and code generation.
+
+mamx-bf16
+Target Report Mask(ISA2_AMX_BF16) Var(ix86_isa_flags2) Save
+Support AMX-BF16 built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index b660d0d9040..6d25f44c303 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -144,6 +144,12 @@
 
 #include <tsxldtrkintrin.h>
 
+#include <amxtileintrin.h>
+
+#include <amxint8intrin.h>
+
+#include <amxbf16intrin.h>
+
 #include <rdseedintrin.h>
 
 #include <prfchwintrin.h>
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3b37aba5795..ba9a7a4d5f9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6623,6 +6623,21 @@ Enable/disable the generation of the XSAVEOPT instructions.
 @cindex @code{target("xsaves")} function attribute, x86
 Enable/disable the generation of the XSAVES instructions.
 
+@item amx-tile
+@itemx no-amx-tile
+@cindex @code{target("amx-tile")} function attribute, x86
+Enable/disable the generation of the AMX-TILE instructions.
+
+@item amx-int8
+@itemx no-amx-int8
+@cindex @code{target("amx-int8")} function attribute, x86
+Enable/disable the generation of the AMX-INT8 instructions.
+
+@item amx-bf16
+@itemx no-amx-bf16
+@cindex @code{target("amx-bf16")} function attribute, x86
+Enable/disable the generation of the AMX-BF16 instructions.
+
 @item cld
 @itemx no-cld
 @cindex @code{target("cld")} function attribute, x86
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bca8c856dc8..3e67108a67b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
 -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq @gol
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
+-mamx-tile  -mamx-int8  -mamx-bf16@gol
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
 -mmemcpy-strategy=@var{strategy}  -mmemset-strategy=@var{strategy} @gol
@@ -30020,6 +30021,15 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @itemx -mserialize
 @opindex mserialize
+@need 200
+@itemx -mamx-tile
+@opindex mamx-tile
+@need 200
+@itemx -mamx-int8
+@opindex mamx-int8
+@need 200
+@itemx -mamx-bf16
+@opindex mamx-bf16
 These switches enable the use of instructions in the MMX, SSE,
 SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 65b2e552b74..b625f1e9f68 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2249,6 +2249,15 @@ Target supports the execution of @code{avx512f} instructions.
 @item avx512vp2intersect
 Target supports the execution of @code{avx512vp2intersect} instructions.
 
+@item amx_tile
+Target supports the execution of @code{amx-tile} instructions.
+
+@item amx_int8
+Target supports the execution of @code{amx-int8} instructions.
+
+@item amx_bf16
+Target supports the execution of @code{amx-bf16} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 04d5fec0f6c..449f30dbace 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h.h are usable
    with -O -pedantic-errors.  */
 
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index f40172ee9b5..29e98919386 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,11 +1,12 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
    avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
    avx512bitalgintrin.h, avx512vp2intersectintrin.h, tsxldtrkintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h are usable
    with -O -fkeep-inline-functions.  */
 
diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h b/gcc/testsuite/gcc.target/i386/amx-check.h
new file mode 100644
index 00000000000..03616ff0b8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amx-check.h
@@ -0,0 +1,185 @@
+#ifndef AMX_CHECK_H_INCLUDED
+#define AMX_CHECK_H_INCLUDED
+
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+#include "cpuid.h"
+
+/* TODO: The tmm emulation is temporary for current
+   AMX implementation with no tmm regclass, should
+   be changed in the future. */
+typedef struct __tile_config
+{
+  uint8_t palette_id;
+  uint8_t start_row;
+  uint8_t reserved_0[14];
+  uint16_t colsb[8]; /* Column size of each tmm register in bytes */
+  uint16_t reserved_1[8];
+  uint8_t rows[8]; /* Number of rows of each tmm register */
+  uint8_t reserved_2[8];
+} __tilecfg;
+
+typedef union __union_tile_config
+{
+  __tilecfg s;
+  uint8_t a[64];
+} __tilecfg_u;
+
+typedef struct __tile
+{
+  /* Max size of tile register */
+  uint8_t buf[1024];
+  int rows;
+  int colsb;
+} __tile;
+
+/* Maximum number of rows and maximum column size in bytes */
+#define MAX_ROWS 16
+#define MAX_COLS 64
+
+/* Stride (column width in bytes) used for tileload/store */
+#define _STRIDE 64
+
+/* Initialize tile config by setting all tmm size to 16x64 */
+void init_tile_config (__tilecfg_u *dst)
+{
+  int i;
+
+  dst->s.palette_id = 1;
+  dst->s.start_row = 0;
+
+  for (i = 0; i < 14; i++)
+    dst->s.reserved_0[i] = 0;
+
+  for (i = 0; i < 8; i++)
+  {
+    dst->s.colsb[i] = _STRIDE;
+    dst->s.rows[i] = 16;
+    dst->s.reserved_1[i] = 0;
+    dst->s.reserved_2[i] = 0;
+  }
+
+  _tile_loadconfig (dst->a);
+}
+
+/* Init the __tile variable that is going to be loaded into a tmm
+   register, with or without an extra buffer.  If a buffer is given,
+   it must be a matrix of the same size as the corresponding tmm
+   register.  init_tile_config must be executed first.  */
+void init_tile_src (const int tmm_num, __tile *src, uint8_t *buffer)
+{
+  int rows, colsb, i, j;
+  __tilecfg_u tmp;
+
+  _tile_storeconfig (tmp.a);
+
+  src->rows = rows = tmp.s.rows[tmm_num];
+  src->colsb = colsb = tmp.s.colsb[tmm_num];
+
+  for (i = 0; i < rows; i++)
+    for (j = 0; j < colsb; j++)
+    {
+      if(buffer)
+	src->buf[i * colsb + j] = buffer[i * colsb + j];
+      else
+	src->buf[i * colsb + j] = (i + 11 * j) % 256;
+    }
+
+}
+
+/* Init __tile src and corresponding tmm register */
+#define init_tile_reg_and_src(tmm_num, src)   \
+{					      \
+  init_tile_src (tmm_num, &src, NULL);	      \
+  _tile_loadd (tmm_num, src.buf, _STRIDE);   \
+}
+
+#define init_tile_reg_and_src_with_buffer(tmm_num, src, buffer) \
+{								\
+  init_tile_src (tmm_num, &src, buffer);				\
+  _tile_loadd (tmm_num, src.buf, _STRIDE);			\
+}
+
+/* Zero __tile src. It should be init first. */
+void zero_tile_src (__tile *src)
+{
+  int i, j;
+
+  for (i = 0; i < src->rows; i++)
+    for (j = 0; j < src->colsb; j++)
+      src->buf[i * src->colsb + j] = 0;
+}
+
+/* Compare tile config value with __tilecfg_u dst */
+int check_tile_config (__tilecfg_u *src, __tilecfg_u *dst)
+{
+  size_t size = sizeof(__tilecfg);
+  uint8_t *pa_src = (uint8_t *) src->a;
+  uint8_t *pa_dst = (uint8_t *) dst->a;
+
+  for (size_t i = 0; i < size; i++)
+    if (pa_src[i] != pa_dst[i])
+      return 0;
+
+  return 1;
+}
+
+/* Compare tile register value with __tile variable */
+int check_tile_register (__tile* ref, __tile* target)
+{
+  /* Tile register should be stored from tmm to
+     memory and compare with emulation results. */
+  int rows = target->rows;
+  int colsb = target->colsb;
+  int i, j;
+
+  for (i = 0; i < rows; i++)
+    for (j = 0; j < colsb; j++)
+	if (ref->buf[i * colsb + j] != target->buf[i * colsb + j])
+	    return 0;
+
+  return 1;
+}
+
+#ifndef DO_TEST
+#define DO_TEST do_test
+static void test_amx (void);
+__attribute__ ((noinline))
+static void
+do_test (void)
+{
+  test_amx ();
+}
+#endif
+
+int
+main ()
+{
+  /* Check cpu support for AMX */
+  if (__builtin_cpu_supports ("amx-tile")
+#ifdef AMX_INT8
+      && __builtin_cpu_supports ("amx-int8")
+#endif
+#ifdef AMX_BF16
+      && __builtin_cpu_supports ("amx-bf16")
+#endif
+      )
+    {
+      DO_TEST ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
new file mode 100644
index 00000000000..a5e5bddedac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmatt-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+#define TMM1 1
+#define TMM2 2
+#define TMM3 3
+
+void TEST ()
+{
+  _tile_dpbf16ps (TMM1, TMM2, TMM3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
new file mode 100644
index 00000000000..c2d6074387a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbf16ps\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbf16ps (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c
new file mode 100644
index 00000000000..c819113897d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c
@@ -0,0 +1,83 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-bf16" } */
+#include <immintrin.h>
+
+#define AMX_BF16
+#define DO_TEST test_amx_bf16_dpbf16ps
+void test_amx_bf16_dpbf16ps ();
+#include "amx-check.h"
+
+/* Bit-level (truncating) conversions between bf16 and float */
+static uint16_t make_bf16 (float f)
+{
+  union { float f; uint32_t u; } pun;
+  pun.f = f;
+  return (uint16_t) (pun.u >> 16);
+}
+
+static float make_f32 (uint16_t bf)
+{
+  union { uint32_t u; float f; } pun = { (uint32_t) bf << 16 };
+  return pun.f;
+}
+
+/* Init tile buffer with bf16 pairs */
+void init_bf16_max_tile_buffer (uint8_t *buf)
+{ 
+  int i, j;
+  uint16_t *ptr = (uint16_t *)buf;
+
+  for(i = 0; i < 16; i++)
+    for(j = 0; j < 32; j++)
+      {	
+	float f = 16.1f * i + 3.4f * j;
+	ptr[i * 32 + j] = make_bf16(f);
+      }
+}
+
+void calc_matrix_dpbf16ps (__tile *dst, __tile *src1, __tile *src2)
+{
+  uint16_t *src1_buf = (uint16_t *)src1->buf;
+  uint16_t *src2_buf = (uint16_t *)src2->buf;
+  float *dst_buf = (float *)dst->buf;
+  
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 2; t+=2)
+	  {
+	    dst_buf[i * N + k] +=
+	      (make_f32(src1_buf[i * 4 * N + 4 * j + t]) *
+	      make_f32(src2_buf[j * 4 * K + 4 * k + t])) +
+	      (make_f32(src1_buf[i * 4 * N + 4 * j + t + 1]) *
+	      make_f32(src2_buf[j * 4 * K + 4 * k + t + 1]));
+	  }
+
+}
+
+void test_amx_bf16_dpbf16ps ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+
+  init_bf16_max_tile_buffer (tmp_dst_buf);
+  
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src_with_buffer (2, src1, tmp_dst_buf);
+  init_tile_reg_and_src_with_buffer (3, src2, tmp_dst_buf);
+
+  calc_matrix_dpbf16ps (&dst, &src1, &src2);
+  
+  _tile_dpbf16ps (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+
+  if (!check_tile_register (&dst_ref, &dst))
+        abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
new file mode 100644
index 00000000000..1842c234be8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm3+\[^\n\]*%tmm2+\[^\n\]*%tmm1"  } } */
+#include <immintrin.h>
+
+#define TMM1 1
+#define TMM2 2
+#define TMM3 3
+
+void TEST ()
+{
+  _tile_dpbssd (TMM1, TMM2, TMM3);
+  _tile_dpbsud (TMM1, TMM2, TMM3);
+  _tile_dpbusd (TMM1, TMM2, TMM3);
+  _tile_dpbuud (TMM1, TMM2, TMM3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
new file mode 100644
index 00000000000..bcfbb3fa5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-int8 -masm=intel" } */
+/* { dg-final { scan-assembler "tdpbssd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbsud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbusd\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+/* { dg-final { scan-assembler "tdpbuud\[ \\t]+\[^\n\]*%tmm1+\[^\n\]*%tmm2+\[^\n\]*%tmm3"  } } */
+#include <immintrin.h>
+
+void TEST ()
+{
+  _tile_dpbssd (1, 2, 3);
+  _tile_dpbsud (1, 2, 3);
+  _tile_dpbusd (1, 2, 3);
+  _tile_dpbuud (1, 2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c b/gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c
new file mode 100644
index 00000000000..62d31ce3e81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c
@@ -0,0 +1,62 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-int8" } */
+#include <immintrin.h>
+
+#define AMX_INT8
+#define DO_TEST test_amx_int8_dpbssd
+void test_amx_int8_dpbssd ();
+#include "amx-check.h"
+
+/* Init tile buffer with int32 value*/
+void init_i32_max_tile_buffer (uint8_t *buf)
+{
+  int i, j;
+  int *ptr = (int *)buf;
+  for (i = 0; i < 16; i++)
+    for (j = 0; j < 16; j++)
+      ptr[i * 16 + j] = 2 * i - (16 - j);
+}
+
+void calc_matrix_dpbssd (__tile *dst, __tile *src1, __tile *src2)
+{
+  int8_t *src1_buf = (int8_t *)src1->buf;
+  int8_t *src2_buf = (int8_t *)src2->buf;
+  int *dst_buf = (int *)dst->buf;
+
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 4; t++)
+	  {
+	    dst_buf[i * N + k] +=  
+	      ((int) src1_buf[i * 4 * N + 4 * j + t]) *
+	      ((int) src2_buf[j * 4 * K + 4 * k + t]);
+	  }
+}
+
+void test_amx_int8_dpbssd ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+  
+  init_i32_max_tile_buffer (tmp_dst_buf);
+
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src (2, src1);
+  init_tile_reg_and_src (3, src2);
+
+  calc_matrix_dpbssd (&dst, &src1, &src2);
+
+  _tile_dpbssd (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+  
+  if (!check_tile_register (&dst_ref, &dst))
+      abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c b/gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c
new file mode 100644
index 00000000000..5007ee917f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c
@@ -0,0 +1,61 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-int8" } */
+#include <immintrin.h>
+
+#define AMX_INT8
+#define DO_TEST test_amx_int8_dpbsud
+void test_amx_int8_dpbsud ();
+#include "amx-check.h"
+
+/* Init tile buffer with int32 value*/
+void init_i32_max_tile_buffer (uint8_t *buf)
+{
+  int i, j;
+  int *ptr = (int *)buf;
+  for (i = 0; i < 16; i++)
+    for (j = 0; j < 16; j++)
+      ptr[i * 16 + j] = 2 * i - (16 - j);
+}
+
+void calc_matrix_dpbsud (__tile *dst, __tile *src1, __tile *src2)
+{
+  int8_t *src1_buf = (int8_t *)src1->buf;
+  uint8_t *src2_buf = (uint8_t *)src2->buf;
+  int *dst_buf = (int *)dst->buf;
+
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 4; t++)
+	  {
+	    dst_buf[i * N + k] += 
+	      ((int) src1_buf[i * 4 * N + 4 * j + t]) *
+	      ((unsigned) src2_buf[j * 4 * K + 4 * k + t]);
+	  }
+}
+
+void test_amx_int8_dpbsud ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+  
+  init_i32_max_tile_buffer (tmp_dst_buf);
+
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src (2, src1);
+  init_tile_reg_and_src (3, src2);
+
+  calc_matrix_dpbsud (&dst, &src1, &src2);
+  _tile_dpbsud (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+  
+  if (!check_tile_register (&dst_ref, &dst))
+      abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c b/gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c
new file mode 100644
index 00000000000..17888e26116
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c
@@ -0,0 +1,61 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-int8" } */
+#include <immintrin.h>
+
+#define AMX_INT8
+#define DO_TEST test_amx_int8_dpbusd
+void test_amx_int8_dpbusd ();
+#include "amx-check.h"
+
+/* Init tile buffer with int32 value*/
+void init_i32_max_tile_buffer (uint8_t *buf)
+{
+  int i, j;
+  int *ptr = (int *)buf;
+  for (i = 0; i < 16; i++)
+    for (j = 0; j < 16; j++)
+      ptr[i * 16 + j] = 2 * i - (16 - j);
+}
+
+void calc_matrix_dpbusd (__tile *dst, __tile *src1, __tile *src2)
+{
+  uint8_t *src1_buf = (uint8_t *)src1->buf;
+  int8_t *src2_buf = (int8_t *)src2->buf;
+  int *dst_buf = (int *)dst->buf;
+
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 4; t++)
+	  {
+	    dst_buf[i * N + k] += 
+	      ((unsigned) src1_buf[i * 4 * N + 4 * j + t]) *
+	      ((int) src2_buf[j * 4 * K + 4 * k + t]);
+	  }
+}
+
+void test_amx_int8_dpbusd ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+  
+  init_i32_max_tile_buffer (tmp_dst_buf);
+
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src (2, src1);
+  init_tile_reg_and_src (3, src2);
+
+  calc_matrix_dpbusd (&dst, &src1, &src2);
+  _tile_dpbusd (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+  
+  if (!check_tile_register (&dst_ref, &dst))
+      abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c b/gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c
new file mode 100644
index 00000000000..c39666c3643
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c
@@ -0,0 +1,61 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -mamx-int8" } */
+#include <immintrin.h>
+
+#define AMX_INT8
+#define DO_TEST test_amx_int8_dpbuud
+void test_amx_int8_dpbuud ();
+#include "amx-check.h"
+
+/* Init tile buffer with int32 value*/
+void init_i32_max_tile_buffer (uint8_t *buf)
+{
+  int i, j;
+  int *ptr = (int *)buf;
+  for (i = 0; i < 16; i++)
+    for (j = 0; j < 16; j++)
+      ptr[i * 16 + j] = 2 * i - (16 - j);
+}
+
+void calc_matrix_dpbuud (__tile *dst, __tile *src1, __tile *src2)
+{
+  uint8_t *src1_buf = (uint8_t *)src1->buf;
+  uint8_t *src2_buf = (uint8_t *)src2->buf;
+  int *dst_buf = (int *)dst->buf;
+
+  int M = src1->rows;
+  int N = src1->colsb / 4;
+  int K = src2->colsb / 4;
+  int i, j, k, t;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < K; k++)
+	for (t = 0; t < 4; t++)
+	  {
+	    dst_buf[i * N + k] += 
+	      ((unsigned) src1_buf[i * 4 * N + 4 * j + t]) *
+	      ((unsigned) src2_buf[j * 4 * K + 4 * k + t]);
+	  }
+}
+
+void test_amx_int8_dpbuud ()
+{
+  __tilecfg_u cfg;
+  __tile dst, dst_ref, src1, src2;
+  uint8_t tmp_dst_buf[1024];
+  
+  init_i32_max_tile_buffer (tmp_dst_buf);
+
+  init_tile_config (&cfg);
+  init_tile_reg_and_src_with_buffer (1, dst, tmp_dst_buf);
+  init_tile_reg_and_src (2, src1);
+  init_tile_reg_and_src (3, src2);
+
+  calc_matrix_dpbuud (&dst, &src1, &src2);
+  _tile_dpbuud (1, 2, 3);
+  _tile_stored (1, dst_ref.buf, _STRIDE);
+  
+  if (!check_tile_register (&dst_ref, &dst))
+      abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-2.c b/gcc/testsuite/gcc.target/i386/amxtile-2.c
new file mode 100644
index 00000000000..cef84f9f479
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-2.c
@@ -0,0 +1,47 @@
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile " } */
+#include <immintrin.h>
+
+#define DO_TEST test_amx_tile
+void test_amx_tile ();
+#include "amx-check.h"
+
+void test_amx_tile ()
+{
+  __tilecfg_u cfg_src, cfg_dst;
+  __tile reg_src1, reg_src2, reg_ref;
+
+  /* check tile config load & store. */
+  init_tile_config (&cfg_src);
+  _tile_storeconfig (cfg_dst.a);
+
+  if (!check_tile_config (&cfg_src, &cfg_dst))
+    abort ();
+
+  /* check tile register load & store. */
+  init_tile_reg_and_src (1, reg_src1);
+  _tile_stored (1, reg_ref.buf, _STRIDE);
+  if (!check_tile_register (&reg_ref, &reg_src1))
+    abort ();
+
+  /* check tile stream load instruction */
+  init_tile_src (2, &reg_src2, NULL);
+  _tile_stream_loadd (2, reg_src2.buf, _STRIDE);
+  _tile_stored (2, reg_ref.buf, _STRIDE);
+  if (!check_tile_register (&reg_ref, &reg_src2))
+    abort ();
+
+  /* check tile register zeroing */
+  zero_tile_src (&reg_src2);
+  _tile_zero (2);
+  _tile_stored (2, reg_ref.buf, _STRIDE);
+  if (!check_tile_register (&reg_ref, &reg_src2))
+    abort ();
+
+  /* check tile cfg zeroing */
+  memset (cfg_dst.a, 0, sizeof(__tilecfg));
+  _tile_release ();
+  _tile_storeconfig (cfg_src.a);
+  if (!check_tile_config (&cfg_src, &cfg_dst))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
new file mode 100644
index 00000000000..ceb5fa4bde3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmatt-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]+\(\[^\)\n\]*\)"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]+\[^\n\]*%tmm\[0-9\]+\[^\n\]*\\(%\[a-z0-9]*\,%\[a-z0-9\]*\,\[124\]\\)"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+
+#define TMM0 0
+#define TMM1 1
+#define TMM2 2
+#define TMM3 3
+
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (TMM3, base, stride);
+  _tile_stream_loadd (TMM2, base, stride);
+  _tile_stored (TMM1, base, stride);
+  _tile_zero (TMM0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
new file mode 100644
index 00000000000..88ef612ed14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amxtile-asmintel-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mamx-tile -masm=intel " } */
+/* { dg-final { scan-assembler "ldtilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "sttilecfg\[ \\t]"  } } */
+/* { dg-final { scan-assembler "tilerelease"  } } */
+/* { dg-final { scan-assembler "tileloadd\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tileloaddt1\[ \\t]%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilestored\[ \\t]\[^\n\]+\[^\n\]*%tmm\[0-9\]"  } } */
+/* { dg-final { scan-assembler "tilezero\[ \\t]+\[^\n\]*%tmm\[0-9\]"  } } */
+#include <immintrin.h>
+
+extern int a[];
+extern const void* base;
+extern const int stride;
+void TEST ()
+{
+  _tile_loadconfig (a);
+  _tile_storeconfig (a);
+  _tile_release ();
+  _tile_loadd (5, base, stride);
+  _tile_stream_loadd (4, base, stride);
+  _tile_stored (3, base, stride);
+  _tile_zero (2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 94ffbb64c75..8e669f12215 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -71,6 +71,9 @@ extern void test_tsxldtrk (void)		__attribute__((__target__("tsxldtrk")));
 extern void test_enqcmd (void)			__attribute__((__target__("enqcmd")));
 extern void test_avx512bf16 (void)		__attribute__((__target__("avx512bf16")));
 extern void test_avx512vp2intersect (void)	__attribute__((__target__("avx512vp2intersect")));
+extern void test_amx_tile (void)		__attribute__((__target__("amx-tile")));
+extern void test_amx_int8 (void)		__attribute__((__target__("amx-int8")));
+extern void test_amx_bf16 (void)		__attribute__((__target__("amx-bf16")));
 
 extern void test_no_sgx (void)			__attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)		__attribute__((__target__("no-avx5124fmaps")));
@@ -143,6 +146,9 @@ extern void test_no_tsxldtrk (void)		__attribute__((__target__("no-tsxldtrk")));
 extern void test_no_enqcmd (void)		__attribute__((__target__("no-enqcmd")));
 extern void test_no_avx512bf16 (void)		__attribute__((__target__("no-avx512bf16")));
 extern void test_no_avx512vp2intersect (void)	__attribute__((__target__("no-avx512vp2intersect")));
+extern void test_no_amx_tile (void)		__attribute__((__target__("no-amx-tile")));
+extern void test_no_amx_int8 (void)		__attribute__((__target__("no-amx-int8")));
+extern void test_no_amx_bf16 (void)		__attribute__((__target__("no-amx-bf16")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void)		__attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index b1690d7204f..61146b2b30a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h gfniintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 3a6404707c4..4d6c9b3a17a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index edaa2aa8ad4..837b51c53e6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 7364b2ff337..fc75669f41b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -11,6 +11,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxldtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -102,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -219,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index eaadebef187..9ca7c5d919d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -10,6 +10,7 @@
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h, tsxtrkintrin.h,
    avx5124fmapsintrin.h, avx5124vnniwintrin.h, avx512vpopcntdqintrin.h,
    avx512bitalgintrin.h, avx512vp2intersectintrin.h,
+   amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h,
    avx512vp2intersectvlintrin.h and mm_malloc.h that reference the proper
    builtin functions.
    Defining away "extern" and "__inline" results in all of them being
@@ -697,6 +698,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1) 
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16")
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 6881b66cd23..9ab54dc14ce 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8956,6 +8956,39 @@ proc check_effective_target_avx512vaes { } {
     } "-mvaes" ]
 }
 
+# Return 1 if amx-tile instructions can be compiled.
+proc check_effective_target_amx_tile { } {
+    return [check_no_compiler_messages amx_tile object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tilerelease" ::);
+	}
+    } "-mamx-tile" ]
+}
+
+# Return 1 if amx-int8 instructions can be compiled.
+proc check_effective_target_amx_int8 { } {
+    return [check_no_compiler_messages amx_int8 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbssd\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-int8" ]
+}
+
+# Return 1 if amx-bf16 instructions can be compiled.
+proc check_effective_target_amx_bf16 { } {
+    return [check_no_compiler_messages amx_bf16 object {
+	void
+	foo ()
+	{
+	    __asm__ volatile ("tdpbf16ps\t%%tmm1, %%tmm2, %%tmm3" ::);
+	}
+    } "-mamx-bf16" ]
+}
+
 # Return 1 if vpclmulqdq instructions can be compiled.
 proc check_effective_target_vpclmulqdq { } {
     return [check_no_compiler_messages vpclmulqdq object {
-- 
2.25.1
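
A note on the bf16 run test in the patch above: its reference computation
has to move between float and bf16 by reinterpreting bit patterns, not by
converting values (a cast such as (uint32_t) f converts the numeric value).
The following is only a minimal sketch of that idea, assuming truncation
without rounding; the helper names float_to_bf16 and bf16_to_float are
hypothetical and are not part of the patch or of amx-check.h.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: truncate a float to bf16 by keeping the upper
   16 bits of its IEEE-754 bit pattern (no rounding is applied).  */
static uint16_t
float_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);	/* reinterpret, do not convert */
  return (uint16_t) (bits >> 16);
}

/* Hypothetical helper: widen bf16 back to float by placing the 16 bits
   in the upper half of a 32-bit pattern; the lower half is zero.  */
static float
bf16_to_float (uint16_t bf)
{
  uint32_t bits = (uint32_t) bf << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, float_to_bf16 (3.5f) yields 0x4060, and bf16_to_float (0x4060)
gives back 3.5f, since 3.5 is exactly representable in bf16.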



* Re: [PATCH] Enable GCC support for AMX
  2020-09-11 17:00       ` Hongyu Wang
  2020-09-18  8:31         ` Hongyu Wang
@ 2020-09-28 11:38         ` Kirill Yukhin
  2020-09-28 12:07           ` Hongyu Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2020-09-28 11:38 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: H.J. Lu, Uros Bizjak, GCC Patches

Hello,

On 12 Sep 01:00, Hongyu Wang wrote:
> Hi
> 
> Thanks for your review, and sorry for the late reply. It took a while
> to finish the runtime test.

Thanks for your fixes! The patch is OK for trunk.

--
Thanks, K


* Re: [PATCH] Enable GCC support for AMX
  2020-09-28 11:38         ` [PATCH] Enable GCC support for AMX Kirill Yukhin
@ 2020-09-28 12:07           ` Hongyu Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Hongyu Wang @ 2020-09-28 12:07 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: H.J. Lu, Uros Bizjak, GCC Patches

Thanks!  I'll ask my colleague to help check in the patch.

Kirill Yukhin <kirill.yukhin@gmail.com> wrote on Mon, Sep 28, 2020 at 7:38 PM:

> Hello,
>
> On 12 Sep 01:00, Hongyu Wang wrote:
> > Hi
> >
> > Thanks for your review, and sorry for the late reply. It took a while
> > to finish the runtime test.
>
> Thanks for your fixes! The patch is OK for trunk.
>
> --
> Thanks, K
>


-- 
Regards,

Hongyu, Wang


* [committed] testsuite: Fix up amx* dg-do run tests with older binutils
  2020-09-18  8:31         ` Hongyu Wang
@ 2020-09-30 11:51           ` Jakub Jelinek
  2020-09-30 14:05             ` Hongyu Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Jelinek @ 2020-09-30 11:51 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: Kirill Yukhin, Uros Bizjak, GCC Patches

On Fri, Sep 18, 2020 at 04:31:55PM +0800, Hongyu Wang via Gcc-patches wrote:
> Thank you again for your review.
> 
> I just updated the patch to add the XSAVE dependency and to use
> __builtin_cpu_supports for the runtime test.

Several tests FAIL when using older binutils that don't support AMX.

Fixed thusly, tested on x86_64-linux -m32/-m64, committed to trunk as
obvious:

2020-09-30  Jakub Jelinek  <jakub@redhat.com>

	* gcc.target/i386/amxint8-dpbssd-2.c: Require effective targets
	amx_tile and amx_int8.
	* gcc.target/i386/amxint8-dpbsud-2.c: Likewise.
	* gcc.target/i386/amxint8-dpbusd-2.c: Likewise.
	* gcc.target/i386/amxint8-dpbuud-2.c: Likewise.
	* gcc.target/i386/amxbf16-dpbf16ps-2.c: Require effective targets
	amx_tile and amx_bf16.
	* gcc.target/i386/amxtile-2.c: Require effective target amx_tile.

--- gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c.jj	2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c	2020-09-30 13:16:08.186445881 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_int8 } */
 /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
 #include <immintrin.h>
 
--- gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c.jj	2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c	2020-09-30 13:16:23.715221450 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_int8 } */
 /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
 #include <immintrin.h>
 
--- gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c.jj	2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c	2020-09-30 13:16:28.998145100 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_int8 } */
 /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
 #include <immintrin.h>
 
--- gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c.jj	2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c	2020-09-30 13:16:35.770047224 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_int8 } */
 /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
 #include <immintrin.h>
 
--- gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c.jj	2020-09-29 11:32:02.949602773 +0200
+++ gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c	2020-09-30 13:15:41.079837637 +0200
@@ -1,4 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
+/* { dg-require-effective-target amx_bf16 } */
 /* { dg-options "-O2 -mamx-tile -mamx-bf16" } */
 #include <immintrin.h>
 
--- gcc/testsuite/gcc.target/i386/amxtile-2.c.jj	2020-09-29 11:32:02.950602758 +0200
+++ gcc/testsuite/gcc.target/i386/amxtile-2.c	2020-09-30 13:16:57.972726339 +0200
@@ -1,4 +1,5 @@
 /* { dg-do run { target { ! ia32 } } } */
+/* { dg-require-effective-target amx_tile } */
 /* { dg-options "-O2 -mamx-tile " } */
 #include <immintrin.h>
 


	Jakub



* Re: [committed] testsuite: Fix up amx* dg-do run tests with older binutils
  2020-09-30 11:51           ` [committed] testsuite: Fix up amx* dg-do run tests with older binutils Jakub Jelinek
@ 2020-09-30 14:05             ` Hongyu Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Hongyu Wang @ 2020-09-30 14:05 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Kirill Yukhin, Uros Bizjak, GCC Patches

Thanks for the fix! I forgot that we don't have a builtin check in
target-supports.exp.

I will update these once we implement AMX with builtins.
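
For context, here is a minimal sketch of why the runtime check alone could not
prevent the failures: __builtin_cpu_supports only guards execution, while a
dg-do run test still has to be assembled first, so an assembler without AMX
support fails before the check ever runs. The names tile_config,
cpu_has_amx_tile and run_amx_tile_test are hypothetical and not taken from the
patch, and the guard on x86-64 with GCC 11 or later is an assumption about
where the "amx-tile" feature name is accepted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the 64-byte memory operand read by ldtilecfg.  */
struct tile_config
{
  uint8_t palette_id;
  uint8_t start_row;
  uint8_t reserved[14];
  uint16_t colsb[16];	/* Bytes per row for each tile.  */
  uint8_t rows[16];	/* Number of rows for each tile.  */
};

/* Return 1 if the CPU reports AMX-TILE, 0 otherwise.  Assumption: the
   "amx-tile" name is only queried on x86-64 with GCC 11 or later; any
   other compiler or target is treated as "not supported".  */
static int
cpu_has_amx_tile (void)
{
#if defined (__x86_64__) && defined (__GNUC__) && !defined (__clang__) \
    && __GNUC__ >= 11
  return __builtin_cpu_supports ("amx-tile") ? 1 : 0;
#else
  return 0;
#endif
}

/* Shape of a run test: skip, and report success, when the CPU lacks
   AMX-TILE.  This guard is evaluated at run time only; it does nothing
   for an assembler that rejects the AMX mnemonics at build time.  */
static int
run_amx_tile_test (void)
{
  struct tile_config cfg;

  assert (sizeof (cfg) == 64);
  if (!cpu_has_amx_tile ())
    return 0;

  memset (&cfg, 0, sizeof (cfg));
  cfg.palette_id = 1;
  /* A real test would load cfg and execute AMX instructions here; that
     is the part which needs an AMX-aware assembler.  */
  return 0;
}
```

The dg-require-effective-target lines added by the fix cover the build-time
half of the problem: the test is skipped entirely when the toolchain cannot
assemble AMX code.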

On Wed, Sep 30, 2020 at 7:51 PM Jakub Jelinek <jakub@redhat.com> wrote:

> On Fri, Sep 18, 2020 at 04:31:55PM +0800, Hongyu Wang via Gcc-patches wrote:
> > Thank you again for your review.
> >
> > I just updated the patch to add the XSAVE dependency and to use
> > __builtin_cpu_supports for the runtime test.
>
> Several tests FAIL when using older binutils that don't support AMX.
>
> Fixed thusly, tested on x86_64-linux -m32/-m64, committed to trunk as
> obvious:
>
> 2020-09-30  Jakub Jelinek  <jakub@redhat.com>
>
>         * gcc.target/i386/amxint8-dpbssd-2.c: Require effective targets
>         amx_tile and amx_int8.
>         * gcc.target/i386/amxint8-dpbsud-2.c: Likewise.
>         * gcc.target/i386/amxint8-dpbusd-2.c: Likewise.
>         * gcc.target/i386/amxint8-dpbuud-2.c: Likewise.
>         * gcc.target/i386/amxbf16-dpbf16ps-2.c: Require effective targets
>         amx_tile and amx_bf16.
>         * gcc.target/i386/amxtile-2.c: Require effective target amx_tile.
>
> --- gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c.jj	2020-09-29 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxint8-dpbssd-2.c	2020-09-30 13:16:08.186445881 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_int8 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
>  #include <immintrin.h>
>
> --- gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c.jj	2020-09-29 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxint8-dpbsud-2.c	2020-09-30 13:16:23.715221450 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_int8 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
>  #include <immintrin.h>
>
> --- gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c.jj	2020-09-29 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxint8-dpbusd-2.c	2020-09-30 13:16:28.998145100 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_int8 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
>  #include <immintrin.h>
>
> --- gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c.jj	2020-09-29 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxint8-dpbuud-2.c	2020-09-30 13:16:35.770047224 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_int8 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-int8" } */
>  #include <immintrin.h>
>
> --- gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c.jj	2020-09-29 11:32:02.949602773 +0200
> +++ gcc/testsuite/gcc.target/i386/amxbf16-dpbf16ps-2.c	2020-09-30 13:15:41.079837637 +0200
> @@ -1,4 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
> +/* { dg-require-effective-target amx_bf16 } */
>  /* { dg-options "-O2 -mamx-tile -mamx-bf16" } */
>  #include <immintrin.h>
>
> --- gcc/testsuite/gcc.target/i386/amxtile-2.c.jj	2020-09-29 11:32:02.950602758 +0200
> +++ gcc/testsuite/gcc.target/i386/amxtile-2.c	2020-09-30 13:16:57.972726339 +0200
> @@ -1,4 +1,5 @@
>  /* { dg-do run { target { ! ia32 } } } */
> +/* { dg-require-effective-target amx_tile } */
>  /* { dg-options "-O2 -mamx-tile " } */
>  #include <immintrin.h>
>
>
>
>         Jakub
>
>

-- 
Regards,

Hongyu, Wang


Thread overview: 17+ messages
2020-07-06  1:58 [PATCH] Enable GCC support for AMX Hongyu Wang
2020-07-07  3:24 ` Hongyu Wang
2020-07-17  5:40   ` Hongyu Wang
2020-07-24  5:41     ` Hongyu Wang
2020-08-04 12:17       ` Hongyu Wang
2020-08-04 14:47 ` Kirill Yukhin
2020-08-04 15:40   ` Hongyu Wang
2020-09-01  1:31     ` Hongyu Wang
2020-09-03 15:07 ` Kirill Yukhin
2020-09-03 15:17   ` H.J. Lu
2020-09-04 14:01     ` Kirill Yukhin
2020-09-11 17:00       ` Hongyu Wang
2020-09-18  8:31         ` Hongyu Wang
2020-09-30 11:51           ` [committed] testsuite: Fix up amx* dg-do run tests with older binutils Jakub Jelinek
2020-09-30 14:05             ` Hongyu Wang
2020-09-28 11:38         ` [PATCH] Enable GCC support for AMX Kirill Yukhin
2020-09-28 12:07           ` Hongyu Wang
