From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com, hongtao.liu@intel.com
Subject: Intel AVX10.1 Compiler Design and Support
Date: Tue, 8 Aug 2023 15:13:09 +0800
Message-Id:
<20230808071312.1569559-1-haochen.jiang@intel.com>

Hi all,

We will send out our initial AVX10 support and some sample patches in this thread, with more to follow afterwards. We would therefore like to share our proposed AVX10 design in GCC.

Here is a quick introduction to AVX10:

- AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
- With AVX10, we would like to establish a common, converged vector instruction set across all Intel architectures, including Xeon Server, Atom Server and Clients.
- The default maximum vector size for AVX10 will be 256 bit, while 512 bit is optional.
- AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
- No new AVX512 CPUID will be introduced in the future. All EVEX vector instructions will be under the AVX10 umbrella.
- AVX10 will be a version-based ISA instead of tons of different CPUIDs like AVX512BW, AVX512DQ, AVX512FP16, etc.
- Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE (Suppress All Exceptions) control and new instructions.

If you would like a closer look at the details, please follow the links below:

Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
It describes the Intel Advanced Vector Extensions 10 Instruction Set Architecture.
https://cdrdv2.intel.com/v1/dl/getContent/784267

The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
It provides introductory information regarding the converged vector ISA: Intel Advanced Vector Extensions 10.
https://cdrdv2.intel.com/v1/dl/getContent/784343

Hence, we will have several compiler design ground rules for AVX10:

- AVX10 is a converged ISA feature set. We will not provide -m[no-]xxx to enable/disable each single vector feature within one version as we used to. Instead, a simple option -m[no-]avx10.x is used. If the 512 bit version is needed, -mavx10.x-512 is all you need. Also, the maximum vector width should be the same when different versions of AVX10 are used. For example, enabling AVX10.1 with 512 bit vector width while enabling AVX10.2 with only 256 bit vector width is not a desired behavior.
- AVX10 is an evolving ISA feature set. Every feature that shows up in the current version will also be present in future versions.
- AVX10 is an independent ISA feature set. Although they share the same instructions and encodings, AVX10 and AVX512 are conceptually independent features, which means they are orthogonal.

Since AVX10 brings several benefits, such as bringing AVX512 features to Atom Server and Clients and replacing tons of AVX512 CPUIDs with a single AVX10 option to enable features, we lean towards adopting AVX10 instead of AVX512 from now on.

Based on the above, we would like to introduce the following compiler options:

- -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default 256 bit vector width to ensure compatibility on all platforms.
- -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512 bit vector width. A “-mno-avx10.x-512” option will not be provided, to avoid confusion over whether it disables the 512 bit vector width or avx10.x itself.
- -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256 bit vector width. It will also disable 512 bit vector width, since the vector size is stated in the option. A “-mno-avx10.x-256” option will not be provided, to stay aligned with the 512 bit one.
- -mno-avx10.x: The option will disable all the features introduced >=avx10.x (both 256 and 512 bit) and keep features

From our understanding, we propose to maintain the independence between the AVX10 and AVX512 switches. Therefore, enabling either of them will turn on the feature, no matter whether the other one is enabled or not. We will emit a warning when the user enables one feature but disables the other afterwards. Some typical examples are given to help explain this:

- -mno-avx512xxx: It will check whether AVX10.1 is disabled when handling the option. If AVX10.1 is disabled, the option is valid and disables AVX512xxx. If AVX10.1 is not disabled, a warning will be emitted and -mno-avx512xxx will be ignored.
- -mno-avx10.1: It will check whether all AVX512 features in Granite Rapids are disabled when handling the option. If all of them are disabled, the option is valid and disables all the features. If not, a warning will be emitted and -mno-avx10.1 will be ignored.
- -mno-avx10.x (x >= 2): It is always valid.

Also, since we maintain the independence between the AVX10 and AVX512 switches, a compiler option of “-mavx10.x[-256] -mavx512xxx” will actually enable all AVX10.x 128/256 bit vector instruction support plus 512 bit vector instruction support for AVX512xxx.

The last thing that needs to be mentioned is the -march options. We will imply AVX10 features for future platforms with AVX10 available, i.e., AVX10/512 for Xeon Servers and AVX10/256 for Atom Servers and Clients. We propose to change the current -march=graniterapids/graniterapids-d from implying AVX512 features to implying AVX10.1/512. No obvious behavior changes will happen for these two -march options.

There is one minor open question after this change: when using -march=graniterapids -mno-avx512f or -mno-avx512f -march=graniterapids, AVX512F will not be disabled, which is a change in behavior. Should we emit a warning for that? Our current behavior is not to emit a warning, but I am open to changes.
However, I suppose that if we finally choose to emit a warning, it should only happen for Granite Rapids and Granite Rapids D, since for the next generation Xeon Server product users should be aware of the AVX10 change.

For the following nine patches, the first three are the initial support for AVX10.1, while the latter six are the AVX10.1 support for AVX512DQ+AVX512VL.

If you have any questions, feel free to ask in this thread. Also, if you are working on AVX512-related patterns during AVX10 upstreaming, especially those related to constraints, target checks and iterators, please kindly cc me on the patches since there might be some conflicts.

Thx,
Haochen
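
For anyone who wants to check their reading of the switch rules described earlier in this mail (-mno-avx512xxx, -mno-avx10.1, and -mno-avx10.x with x >= 2), here is a rough Python model of them. It is only a sketch and not the GCC implementation: IsaState, apply_option, resolve and the GNR_AVX512 subset are hypothetical names made up for illustration, the warning text is invented, and neither vector width nor -march handling is modeled.

```python
from dataclasses import dataclass, field

# Assumption: a small illustrative subset standing in for "all AVX512
# features in Granite Rapids"; the full list is not given in this mail.
GNR_AVX512 = frozenset({"avx512f", "avx512dq", "avx512bw", "avx512vl"})


@dataclass
class IsaState:
    avx10_version: int = 0  # 0 means AVX10 is off, 1 means AVX10.1, ...
    avx512: set = field(default_factory=set)  # AVX512 features switched on
    warnings: list = field(default_factory=list)


def apply_option(state, opt):
    """Apply one option, following the proposed AVX10/AVX512 switch rules."""
    if opt.startswith("-mno-avx512"):
        # Valid only if AVX10.1 is disabled; otherwise warn and ignore.
        if state.avx10_version >= 1:
            state.warnings.append(f"{opt} ignored: AVX10.1 is enabled")
        else:
            state.avx512.discard(opt[len("-mno-"):])
    elif opt.startswith("-mavx512"):
        state.avx512.add(opt[len("-m"):])
    elif opt.startswith("-mno-avx10."):
        x = int(opt[len("-mno-avx10."):])
        # -mno-avx10.1 is valid only if the AVX512 features are all off;
        # -mno-avx10.x with x >= 2 is always valid.
        if x == 1 and state.avx512 & GNR_AVX512:
            state.warnings.append(f"{opt} ignored: AVX512 features enabled")
        else:
            state.avx10_version = min(state.avx10_version, x - 1)
    elif opt.startswith("-mavx10."):
        # Accepts -mavx10.x, -mavx10.x-256 and -mavx10.x-512 (width ignored).
        x = int(opt[len("-mavx10."):].split("-")[0])
        state.avx10_version = max(state.avx10_version, x)
    else:
        raise ValueError(f"option not covered by this sketch: {opt}")
    return state


def resolve(options):
    """Process options left to right, as on a command line."""
    state = IsaState()
    for opt in options:
        apply_option(state, opt)
    return state


if __name__ == "__main__":
    for cmdline in (["-mavx10.1", "-mno-avx512f"],
                    ["-mavx512f", "-mno-avx10.1"],
                    ["-mavx10.2", "-mno-avx10.2"]):
        print(cmdline, resolve(cmdline))
```

In this model, "-mavx10.1 -mno-avx512f" keeps AVX10.1 enabled and records one warning, while "-mavx10.2 -mno-avx10.2" falls back to AVX10.1 without any warning.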