From: Haochen Jiang <haochen.jiang@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com,
	hongtao.liu@intel.com
Subject: Intel AVX10.1 Compiler Design and Support
Date: Tue,  8 Aug 2023 15:13:09 +0800
Message-Id: <20230808071312.1569559-1-haochen.jiang@intel.com>

Hi all,

We will send out our initial AVX10 support and some sample patches in this
mailing thread, with more patches to follow. Along with them, we would like to
share our proposed AVX10 design for GCC.

Here is a quick introduction to AVX10:
  - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
  - With AVX10, we would like to establish a common, converged vector
    instruction set across all Intel architectures, including Xeon Server, Atom
    Server and Clients.
  - The default maximum vector size for AVX10 will be 256 bits, while 512-bit
    support is optional.
  - AVX10.1 will include all existing AVX512 instructions supported on Granite
    Rapids.
  - No new AVX512 CPUID bits will be introduced in the future. All EVEX vector
    instructions will fall under the AVX10 umbrella.
  - AVX10 will be a version-based ISA instead of a large number of separate
    CPUID bits such as AVX512BW, AVX512DQ, AVX512FP16, etc.
  - Building on AVX10.1, AVX10.2 will introduce embedded rounding and SAE
    (Suppress All Exceptions) control for ymm registers, as well as new
    instructions.

If you would like to take a closer look at the details, please follow the links
below:

Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification 
It describes the Intel Advanced Vector Extensions 10 Instruction Set
Architecture.
https://cdrdv2.intel.com/v1/dl/getContent/784267

The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
It provides introductory information regarding the converged vector ISA: Intel
Advanced Vector Extensions 10.
https://cdrdv2.intel.com/v1/dl/getContent/784343

Based on this, we have the following ground rules for the AVX10 compiler design:
  - AVX10 is a converged ISA feature set.
    We will not provide -m[no-]xxx options to enable/disable each individual
    vector feature within a version as we did before. Instead, a single option,
    -m[no-]avx10.x, is used. If the 512-bit version is needed, -mavx10.x-512 is
    all you need. Also, the maximum vector width should be the same across the
    different AVX10 versions in use. For example, enabling AVX10.1 with 512-bit
    vector width while enabling AVX10.2 with only 256-bit vector width is not
    desired behavior.
  - AVX10 is an evolving ISA feature set.
    Every feature in the current version will also be present in all future
    versions.
  - AVX10 is an independent ISA feature set.
    Although they share the same instructions and encodings, AVX10 and AVX512
    are conceptually independent features, i.e., they are orthogonal.

Since AVX10 brings several benefits, such as bringing AVX512 features to Atom
Server and Clients and replacing the many AVX512 CPUID bits with a single AVX10
option, we lean towards adopting AVX10 instead of AVX512 from now on.

Based on the above, we would like to introduce the following compiler options:
  - -mavx10.x: Enables AVX10.1-AVX10.x features with a default 256-bit vector
    width, to ensure compatibility across all platforms.
  - -mavx10.x-512: Enables AVX10.1-AVX10.x features with 512-bit vector width.
    No “-mno-avx10.x-512” option will be provided, to avoid confusion over
    whether it disables the 512-bit vector width or AVX10.x itself.
  - -mavx10.x-256: Enables AVX10.1-AVX10.x features with 256-bit vector width.
    Since the vector size is stated in the option, it also disables the 512-bit
    vector width. For consistency with the 512-bit option, no
    “-mno-avx10.x-256” option will be provided.
  - -mno-avx10.x: Disables all features introduced in AVX10.x and later versions
    (both 256-bit and 512-bit), and keeps the features from versions earlier than
    AVX10.x if they are enabled, consistent with how -mno- options have behaved
    so far.
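Purely as an illustration of the option semantics above, here is a rough Python model of how a single AVX10 option could update the enabled version and maximum vector width. It is a sketch under stated assumptions, not GCC's option handling: the Avx10State and apply_option names are hypothetical, and a plain -mavx10.x is assumed to keep a previously selected width (defaulting to 256 bits otherwise).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the proposed -m[no-]avx10.x[-256|-512] options;
# this is not how GCC implements option handling.
_AVX10_OPT = re.compile(r"-m(no-)?avx10\.(\d+)(?:-(256|512))?$")


@dataclass
class Avx10State:
    version: int = 0             # highest enabled AVX10 minor version, 0 = off
    width: Optional[int] = None  # maximum vector width in bits, None when off


def apply_option(state: Avx10State, opt: str) -> Avx10State:
    m = _AVX10_OPT.match(opt)
    if m is None or int(m.group(2)) < 1:
        raise ValueError(f"not an AVX10 option: {opt}")
    negate, minor, size = m.group(1), int(m.group(2)), m.group(3)

    if negate:
        if size is not None:
            # The proposal provides no -mno-avx10.x-256 / -mno-avx10.x-512.
            raise ValueError(f"{opt} is not provided")
        # Disable AVX10.x and later; keep earlier versions if enabled.
        version = min(state.version, minor - 1)
        return Avx10State(version, state.width if version > 0 else None)

    # Enabling AVX10.x enables AVX10.1 .. AVX10.x (never lowers the version).
    version = max(state.version, minor)
    if size is not None:
        width = int(size)            # explicit -256 / -512 suffix
    else:
        width = state.width or 256   # assumption: keep earlier width, else 256
    return Avx10State(version, width)
```

For example, applying -mavx10.2-512 and then -mno-avx10.2 leaves AVX10.1 enabled with a 512-bit width in this model.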

When options indicating different vector sizes are combined (e.g.
-mavx10.x-512 -mavx10.y-256), we would like to emit a warning, since the vector
sizes conflict. The warning message will also state that the last specified
vector size is used. The enabled ISA set will be the highest version specified.
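One possible way to resolve such combinations is sketched below in Python: the highest version wins, the last explicitly indicated vector size wins, and a warning is issued when the sizes disagree. The resolve_vector_size name and the warning text are hypothetical, and only the explicit -256/-512 forms are handled.

```python
import re
import warnings
from typing import Iterable, Optional, Tuple

# Hypothetical sketch; matches only options with an explicit vector size.
_SIZED_OPT = re.compile(r"-mavx10\.(\d+)-(256|512)$")


def resolve_vector_size(options: Iterable[str]) -> Tuple[int, Optional[int]]:
    """Return (highest AVX10 minor version, vector width) for the options."""
    version, width = 0, None
    for opt in options:
        m = _SIZED_OPT.match(opt)
        if m is None:
            continue  # other options are outside the scope of this sketch
        minor, size = int(m.group(1)), int(m.group(2))
        if width is not None and size != width:
            warnings.warn(
                f"conflicting AVX10 vector sizes, using {size}-bit from '{opt}'")
        version = max(version, minor)  # the ISA set is the highest one
        width = size                   # the last mentioned size is picked
    return version, width
```

With -mavx10.1-512 -mavx10.2-256, this model yields AVX10.2 with a 256-bit maximum vector width and emits one warning.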

For auto-dispatch support, including __builtin_cpu_supports (), function
multiversioning and function attributes, the behavior will be identical to the
compiler options, i.e., we will have avx10.x, avx10.x-256, avx10.x-512 and
no-avx10.x.
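The Python sketch below models how the same names could drive both a runtime check in the spirit of __builtin_cpu_supports () and function multiversioning selection. Everything here is an assumption for illustration: cpu_supports and pick_version are hypothetical helpers, the CPU is described only by its AVX10 version and maximum vector width, a plain avx10.x is taken to require 256 bits, no-avx10.x is left out, and candidates are ranked by (version, width), which need not match GCC's actual dispatch priorities.

```python
import re
from typing import Sequence, Tuple

# Hypothetical model of AVX10 dispatch names such as "avx10.1-512".
_TARGET = re.compile(r"avx10\.(\d+)(?:-(256|512))?$")


def parse_target(name: str) -> Tuple[int, int]:
    """Map "avx10.x[-256|-512]" to (minor version, required vector width)."""
    m = _TARGET.match(name)
    if m is None:
        raise ValueError(f"not an AVX10 target name: {name}")
    # Assumption: a name without a size suffix requires 256-bit support.
    return int(m.group(1)), int(m.group(2) or 256)


def cpu_supports(name: str, cpu_version: int, cpu_width: int) -> bool:
    """Runtime-check model: the CPU must meet both version and width."""
    version, width = parse_target(name)
    return cpu_version >= version and cpu_width >= width


def pick_version(candidates: Sequence[str], cpu_version: int,
                 cpu_width: int) -> str:
    """Multiversioning model: choose the best supported candidate."""
    usable = [c for c in candidates
              if c != "default" and cpu_supports(c, cpu_version, cpu_width)]
    if not usable:
        return "default"
    # Assumption: a higher version wins first, then a wider vector width.
    return max(usable, key=parse_target)
```

For example, with candidates default, avx10.1-256 and avx10.1-512, a CPU modelled as AVX10.1 with 256-bit vectors would get the avx10.1-256 version.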

As mentioned before, we lean towards adopting AVX10 instead of AVX512 from now
on. Hence, we do not recommend that users combine AVX10 and legacy AVX512
options: different users will expect different compiler behavior for option
combinations like “-m[no-]avx10.1 -m[no-]avx512f”, and it is hard to tell
whether the compiler should enable or disable the feature in those scenarios.
Furthermore, we do not guarantee that the behavior is consistent between GCC
and LLVM/ICX.

Based on our understanding, we propose to keep the AVX10 and AVX512 switches
independent. Therefore, enabling either of them turns on the feature, no matter
whether the other one is enabled or not. We will emit a warning when the user
enables one feature but disables the other afterwards. Some typical examples
are given below for better understanding:
  - -mno-avx512xxx: When handling the option, the compiler checks whether
    AVX10.1 is disabled. If AVX10.1 is disabled, the option is valid and
    disables AVX512xxx. If AVX10.1 is not disabled, a warning will be emitted
    and -mno-avx512xxx will be ignored.
  - -mno-avx10.1: When handling the option, the compiler checks whether all
    AVX512 features in Granite Rapids are disabled. If they are all disabled,
    the option is valid and disables all the features. If not, a warning will
    be emitted and -mno-avx10.1 will be ignored.
  - -mno-avx10.x (x >= 2): It is always valid.
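The checks in the list above could be modelled roughly as follows. This Python sketch is illustrative only: the Isa class and disable function are hypothetical, and GNR_AVX512 is a deliberately incomplete placeholder for the AVX512 features in Granite Rapids, not the real list.

```python
import warnings
from dataclasses import dataclass, field
from typing import Set

# Placeholder subset only; NOT the full list of AVX512 features in Granite Rapids.
GNR_AVX512 = frozenset({"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"})


@dataclass
class Isa:
    avx10: int = 0                                 # highest enabled AVX10 minor version
    avx512: Set[str] = field(default_factory=set)  # enabled AVX512 switches


def disable(isa: Isa, opt: str) -> None:
    """Apply one -mno- option under the proposed independence rules."""
    if opt.startswith("-mno-avx512"):
        if isa.avx10 >= 1:
            warnings.warn(f"'{opt}' ignored because AVX10.1 is still enabled")
            return
        isa.avx512.discard(opt[len("-mno-"):])
    elif opt == "-mno-avx10.1":
        if isa.avx512 & GNR_AVX512:
            warnings.warn(f"'{opt}' ignored because Granite Rapids AVX512 "
                          "features are still enabled")
            return
        isa.avx10 = 0
    elif opt.startswith("-mno-avx10."):
        # -mno-avx10.x with x >= 2 is always valid: keep versions below x.
        minor = int(opt[len("-mno-avx10."):])
        if minor < 2:
            raise ValueError(f"unsupported option: {opt}")
        isa.avx10 = min(isa.avx10, minor - 1)
    else:
        raise ValueError(f"unsupported option: {opt}")
```

In this model, -mno-avx512f only takes effect once AVX10.1 is off, and -mno-avx10.1 only takes effect once the listed AVX512 switches are off.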

Also, since we keep the AVX10 and AVX512 switches independent, an option
combination like “-mavx10.x[-256] -mavx512xxx” will actually enable all AVX10.x
128/256-bit vector instructions as well as the 512-bit vector instructions of
AVX512xxx.
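As a rough illustration of this orthogonality, the Python sketch below computes the widest vector length usable for a given feature when both kinds of switches are present. The max_vector_width helper is hypothetical, and AVX10_1_FEATURES is an incomplete placeholder, not the actual AVX10.1 feature list.

```python
from typing import AbstractSet

# Placeholder subset only; NOT the complete list of features in AVX10.1.
AVX10_1_FEATURES = frozenset({"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"})


def max_vector_width(feature: str, avx10_version: int, avx10_width: int,
                     avx512_enabled: AbstractSet[str]) -> int:
    """Widest vector length in bits usable for `feature`, 0 if unavailable."""
    width = 0
    if avx10_version >= 1 and feature in AVX10_1_FEATURES:
        width = avx10_width      # AVX10 switch: up to 256 or 512 bits
    if feature in avx512_enabled:
        width = max(width, 512)  # legacy AVX512 switch adds the 512-bit forms
    return width
```

Modelling -mavx10.1-256 -mavx512dq, AVX512DQ instructions come out usable at 512 bits, while AVX512BW instructions stay limited to 256 bits.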

The last thing to mention is the -march options. We will imply AVX10 features
for future platforms with AVX10 available, i.e., AVX10/512 for Xeon Servers and
AVX10/256 for Atom Servers and Clients. We propose to change the current
-march=graniterapids/graniterapids-d from implying AVX512 features to implying
AVX10.1/512. No noticeable behavior change is expected for these two -march
values.

One minor open question remains after this change: when using
-march=graniterapids -mno-avx512f or -mno-avx512f -march=graniterapids, AVX512F
will no longer be disabled, which is a change in behavior. Should we emit a
warning for that? Our current behavior is not to emit a warning, but I am open
to changes. However, if we finally choose to emit a warning, I suppose it
should only happen for Granite Rapids and Granite Rapids D, since for the
next-generation Xeon Server products, users should be aware of the AVX10
change.
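The behavior change in question can be sketched as follows. In this hypothetical Python model, which lists only the two -march values named above, -march acts as a baseline regardless of its position on the command line and contributes AVX10.1/512 instead of AVX512 features, so -mno-avx512f no longer removes AVX512F. Whether to warn in that case is exactly the open question, so the sketch ignores the option silently.

```python
from typing import Iterable

# Hypothetical table: only the two -march values discussed above.
MARCH_IMPLIES_AVX10 = {
    "graniterapids": (1, 512),    # (AVX10 minor version, vector width)
    "graniterapids-d": (1, 512),
}


def avx512f_available(march: str, flags: Iterable[str]) -> bool:
    """Is AVX512F usable after applying -march and then the -m flags?"""
    # Assumption: -march values not in the table imply nothing in this model.
    avx10_version, _width = MARCH_IMPLIES_AVX10.get(march, (0, 0))
    avx512f_switch = False
    for flag in flags:
        if flag == "-mavx512f":
            avx512f_switch = True
        elif flag == "-mno-avx512f":
            if avx10_version >= 1:
                continue  # ignored: AVX10.1 from -march still provides AVX512F
            avx512f_switch = False
    return avx10_version >= 1 or avx512f_switch
```

With -march=graniterapids, AVX512F stays available in this model even when -mno-avx512f is given, which is the behavior change described above.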

Of the following nine patches, the first three provide the initial support for
AVX10.1, while the latter six add AVX10.1 support for AVX512DQ+AVX512VL.

If you have any questions, feel free to ask in this thread. Also, if you are
working on AVX512-related patterns during AVX10 upstreaming, especially those
involving constraints, target checks and iterators, please kindly cc me on the
patches, since there might be conflicts.

Thx,
Haochen