From: Jan Beulich
Date: Tue, 5 Sep 2023 09:25:47 +0200
Subject: Re: [PATCH 3/5] x86: support AVX10.1/512
To: "Jiang, Haochen"
Cc: Binutils, "H.J. Lu"
Message-ID: <15eaf902-08c3-11b2-2e0b-f4ff0adce83f@suse.com>
References: <6f819651-36c0-1c69-8224-fe21f0f96a3f@suse.com> <3dc8a453-eb31-1caa-c003-4bee60bf0863@suse.com> <194c4c39-8af0-3eac-9138-df04bc3b2f87@suse.com>

On 05.09.2023 09:04, Jiang, Haochen wrote:
>>>> Actually there's something similar with AVX10 itself: AVX512F includes
>>>> equivalents right away of what comes under separate extensions for AVX:
>>>> F16C and FMA. AVX10, otoh, is presently specified to only guarantee
>>>> AVX and AVX2. Does that mean that VEX-encoded vfm{add,sub}* and ps<-ph
>>>> conversion insns aren't guaranteed to also be available? Doesn't seem
>>>> logical to me, so I'm inclined to make FMA and F16C prereqs of AVX10.1
>>>> as well (or alternatively of AVX512F, but I think this would have
>>>> undesirable effects). AVX2 isn't an explicit prereq only because it
>>>> already is one of AVX512F.
>>>
>>> I suppose AVX10 should only enable EVEX encoding, they have nothing
>>> to do with the VEX encoding.
>>>
>>> For those independent VEX ISAs, if AVX512F is not enabling it, AVX10 neither.
>>>
>>> Actually, not only F16C and FMA, under AVX10, ISAs like AVX-VNNI, AVX-IFMA
>>> are also not enabled.
>>
>> The difference to the AVX-* ones you mention is important here: AVX-VNNI
>> (taking that as example) isn't a feature that had equivalent EVEX
>> encodings added right in AVX512F. So I'd like to ask that you re-consider
>
> I see your point since here we are just focusing on features introduced in
> AVX512F. But I still would like to mention AVX-VNNI below just for discussion.
>
>> what you said. Also think about what the compiler does (which doesn't
>> emit .arch directives to limit the usable ISA extensions) when just
>> -mavx512vl is passed to it: VEX-encoded vfm{add,sub}* would then still be
>> resulting (to prevent that, the compiler would need to further emit {evex}
>> pseudo-prefixes). IOW in the compiler there is such an implication already
>> anyway.
>
> For FMA, in GCC, we have such comment on that:
>
> ;; The standard names for scalar FMA are only available with SSE math enabled.
> ;; CPUID bit AVX512F enables evex encoded scalar and 512-bit fma. It doesn't
> ;; care about FMA bit, so we enable fma for TARGET_AVX512F even when TARGET_FMA
> ;; and TARGET_FMA4 are both false.
> ;; TODO: In theory AVX512F does not automatically imply FMA, and without FMA
> ;; one must force the EVEX encoding of the fma insns. Ideally we'd improve
> ;; GAS to allow proper prefix selection. However, for the moment all hardware
> ;; that supports AVX512F also supports FMA so we can ignore this for now.

Interesting. I wonder what gas improvement is being thought about here,
when gcc doesn't emit .arch.

> Although splitting the pattern between FMA/FMA4 and AVX512F, the code itself actually
> won't emit an {evex} prefix in mnemonic if there is only AVX512F since there is no true
> hardware for codegen to do so.
>
> For F16C, the pattern is even not split, so the scenario is the same as FMA/FMA4.
>
> Therefore, I suppose it could be ok for AVX10 to imply FMA/F16C in gas for simplicity. But
> let's wait for H.J.'s opinion on that.

Okay, I'll submit v2 then with this just as a remark for the time being.
Luckily in the follow-on work where I ran into this I now no longer depend
on there being such an explicit connection. (Whether what I'm doing there
is acceptable will need to be seen.)

> For AVX-VNNI issue, it is introduced in Sapphire Rapids, which is before AVX10.1 introduction
> (Granite Rapids), which means that on the hardware we will always have AVX-VNNI while
> AVX10.1 is there. So there might be a chance to imply AVX-VNNI in AVX10.1 in compiler,
> but we could put that discussion after everything in AVX10.1 is set in community.

Hmm, yes. An implication from making it another prereq is that with
AVX10.1 explicitly enabled, VEX encodings then ought to be preferred over
the EVEX ones (for being shorter), except when Disp8-scaling helps
shortening a memory reference. That'll for sure require extra code in
tc-i386.c, so would likely want to be a separate patch then. (Actually I
think we should already do so anyway when AVX-VNNI is explicitly enabled.)
I'd then further raise the same question towards AVX-IFMA.

Jan
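
To illustrate the "VEX is shorter, except when Disp8-scaling helps" remark,
here is a rough sketch of the size comparison, not code from tc-i386.c. All
function names are hypothetical. It assumes an insn in the 0F38 opcode map
(as the FMA, AVX-VNNI and AVX-IFMA insns are), so VEX needs its 3-byte form
against EVEX's 4 bytes; it assumes a base register that allows the
displacement to be omitted when it is zero; and it ignores everything else
that can force EVEX (high registers, masking, 512-bit operands). The scaling
factor n is passed in by the caller, e.g. 32 for a full 256-bit memory
operand without broadcast.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: displacement bytes with VEX, where Disp8 is a
   plain, unscaled signed byte.  */
unsigned int
plain_disp_bytes (int32_t disp)
{
  if (disp == 0)
    return 0;	/* Assumes the base register permits omitting it.  */
  return (disp >= -128 && disp <= 127) ? 1 : 4;
}

/* Hypothetical helper: displacement bytes with EVEX, where Disp8 is
   implicitly multiplied by the scaling factor N.  */
unsigned int
scaled_disp_bytes (int32_t disp, int32_t n)
{
  if (disp == 0)
    return 0;
  if (disp % n == 0 && disp / n >= -128 && disp / n <= 127)
    return 1;
  return 4;
}

/* Compare only the parts that differ between the two encodings:
   prefix length (3 for VEX in map 0F38, 4 for EVEX) plus displacement.
   Returns true when VEX is no longer than EVEX.  */
bool
vex_not_longer (int32_t disp, int32_t n)
{
  unsigned int vex_len = 3 + plain_disp_bytes (disp);
  unsigned int evex_len = 4 + scaled_disp_bytes (disp, n);

  return vex_len <= evex_len;
}
```

With n = 32, a displacement of 64 still favors VEX (4 vs 5 bytes), while a
displacement of 256 favors EVEX: VEX has to fall back to Disp32 (7 bytes),
whereas EVEX encodes it as Disp8 = 8 (5 bytes).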