From: Tamar Christina
To: "pan2.li@intel.com", "gcc-patches@gcc.gnu.org"
CC: "juzhe.zhong@rivai.ai", "kito.cheng@gmail.com", "richard.guenther@gmail.com", "hongtao.liu@intel.com"
Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
Date: Wed, 1 May 2024 17:06:00 +0000
References: <20240406120755.2692291-1-pan2.li@intel.com> <20240429075322.1587986-1-pan2.li@intel.com>
In-Reply-To: <20240429075322.1587986-1-pan2.li@intel.com>

Hi,

> From: Pan Li
>
> Update in v3:
> * Rebase upstream for conflict.
>
> Update in v2:
> * Fix one failure for x86 bootstrap.
>
> Original log:
>
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
>
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
>
> Take uint8_t as example, we will have:
>
> * SAT_ADD (1, 254) => 255.
> * SAT_ADD (1, 255) => 255.
> * SAT_ADD (2, 255) => 255.
> * SAT_ADD (255, 255) => 255.
>
> The patch also implement the SAT_ADD in the riscv backend as
> the sample for both the scalar and vector.
> Given below example:
>
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
>
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
>
> ;; basic block 2, loop depth 0
> ;; pred: ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;; succ: EXIT
>
> }
>
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
>
> ;; basic block 2, loop depth 0
> ;; pred: ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;; succ: EXIT
> }
>
> For vectorize, we leverage the existing vect pattern recog to find
> the pattern similar to scalar and let the vectorizer to perform
> the rest part for standard name usadd3 in vector mode.
> The riscv vector backend have insn "Vector Single-Width Saturating
> Add and Subtract" which can be leveraged when expand the usadd3
> in vector mode.  For example:
>
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   unsigned i;
>
>   for (i = 0; i < n; i++)
>     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> }
>
> Before this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
>   ivtmp_58 = _80 * 8;
>   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
>   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
>   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
>   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
>   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
>   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
>   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
>   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
>   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
>   ivtmp_79 = ivtmp_78 - _80;
>   ...
> }
>
> vec_sat_add_u64:
>   ...
>   vsetvli a5,a3,e64,m1,ta,ma
>   vle64.v v0,0(a1)
>   vle64.v v1,0(a2)
>   slli a4,a5,3
>   sub a3,a3,a5
>   add a1,a1,a4
>   add a2,a2,a4
>   vadd.vv v1,v0,v1
>   vmsgtu.vv v0,v0,v1
>   vmerge.vim v1,v1,-1,v0
>   vse64.v v1,0(a0)
>   ...
>
> After this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
>   ivtmp_46 = _62 * 8;
>   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
>   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
>   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
>   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
>   ...
> }
>
> vec_sat_add_u64:
>   ...
>   vsetvli a5,a3,e64,m1,ta,ma
>   vle64.v v1,0(a1)
>   vle64.v v2,0(a2)
>   slli a4,a5,3
>   sub a3,a3,a5
>   add a1,a1,a4
>   add a2,a2,a4
>   vsaddu.vv v1,v1,v2
>   vse64.v v1,0(a0)
>   ...
>
> To limit the patch size for review, only unsigned version of
> usadd3 are involved here.  The signed version will be covered
> in the underlying patch(es).
>
> The below test suites are passed for this patch.
> * The riscv fully regression tests.
> * The aarch64 fully regression tests.
> * The x86 bootstrap tests.
> * The x86 fully regression tests.
>
> PR target/51492
> PR target/112600
>
> gcc/ChangeLog:
>
>   * config/riscv/autovec.md (usadd3): New pattern expand
>   for unsigned SAT_ADD vector.
>   * config/riscv/riscv-protos.h (riscv_expand_usadd): New func
>   decl to expand usadd3 pattern.
>   (expand_vec_usadd): Ditto but for vector.
>   * config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
>   emit the vsadd insn.
>   (expand_vec_usadd): New func impl to expand usadd3 for
>   vector.
>   * config/riscv/riscv.cc (riscv_expand_usadd): New func impl
>   to expand usadd3 for scalar.
>   * config/riscv/riscv.md (usadd3): New pattern expand
>   for unsigned SAT_ADD scalar.
>   * config/riscv/vector.md: Allow VLS mode for vsaddu.
>   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
>   * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
>   * match.pd: Add unsigned SAT_ADD match and simply.
>   * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
>   * tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
>   to build the IFN_SAT_ADD gimple call.
>   (vect_recog_sat_add_pattern): New func impl to recog the pattern
>   for unsigned SAT_ADD.
>

Could you split the generic changes off from the RISCV changes?  The RISCV
changes need to be reviewed by the backend maintainer.

Could you also split off the vectorizer change from the scalar recog one?
Typically I would structure a change like this as:

1. create types/structures + scalar recognition
2. Vector recog code
3. Backend changes

This makes review and bisecting easier.  I'll only focus on the generic bits.

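For readers following along, the idiom from the quoted patch description can be
tried out in plain C.  This is only a minimal sketch, assuming uint8_t so it
matches the four worked values in the description; the function names
(sat_add_u8, sat_add_u8_ref and the two check helpers) are made up for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form quoted in the patch description:
   SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)).  */
uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);              /* wraps modulo 256 */
  /* (sum < x) is 1 exactly when the add wrapped; negating it in the
     8-bit type gives an all-ones mask, which forces the result to 255.  */
  return sum | (uint8_t) -(uint8_t) (sum < x);
}

/* Hypothetical reference model: add in a wider type, then clamp.  */
uint8_t
sat_add_u8_ref (uint8_t x, uint8_t y)
{
  unsigned wide = (unsigned) x + y;
  return wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
}

/* Checks the four examples listed in the patch description.  */
void
check_patch_examples (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
}

/* Returns 1 if the idiom agrees with the reference on all input pairs.  */
int
sat_add_u8_matches_ref (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      if (sat_add_u8 ((uint8_t) x, (uint8_t) y)
          != sat_add_u8_ref ((uint8_t) x, (uint8_t) y))
        return 0;
  return 1;
}
```

Calling check_patch_examples () and asserting sat_add_u8_matches_ref () from a
test driver exercises both the listed values and the full 8-bit input space.
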
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 2c764441cde..1104bb03b41 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4200,6 +4200,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..47326b7033c 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
>                                smulhrs, umulhrs, binary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
> +                              ssadd, usadd, binary)
> +

Is ECF_NOTHROW correct here?  At least on most targets I believe the scalar
version can set flags/throw exceptions if the saturation happens?

>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..0b0298df829 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
>

Hmm I believe Richi mentioned that he wanted the recognition done in isel?

The problem with doing it in match.pd is that it replaces the operations
quite early in the pipeline.  Did I miss an email perhaps?  The early
replacement means we lose optimizations and things such as range
calculations etc, since e.g. ranger doesn't know these internal functions.

I think Richi will want this in isel or mult widening but I'll continue
with match.pd review just in case.

> +/* Unsigned Saturation Add */
> +(match (usadd_left_part_1 @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_1 @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_2 @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))

Predicates can be overloaded, so these two can just be usadd_right_part
which then...

> +
> +/* Unsigned saturation add. Case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(simplify
> + (bit_ior:c
> +  (usadd_left_part_1 @0 @1)
> +  (usadd_right_part_1 @0 @1))
> + (if (optimize) (IFN_SAT_ADD @0 @1)))

The optimize checks in the match.pd file are weird, as they seem to check
whether we have optimizations enabled?

We don't typically need to do this.

> +(simplify
> + (bit_ior:c
> +  (usadd_left_part_1 @0 @1)
> +  (usadd_right_part_2 @0 @1))
> + (if (optimize) (IFN_SAT_ADD @0 @1)))
> +

...allows you to collapse rules like these into one.  Similarly for the
ones below.

Note that even when moving to gimple-isel you can reuse the match.pd code
by leveraging it to build the predicates for you and calling them from
another pass.  See how ctz_table_index is used for example.

Doing this, moving it to gimple-isel.cc should be easy.

> +/* Unsigned saturation add. Case 2 (branch):
> +   SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1 or
> +   SAT_U_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> +(simplify
> + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> + (if (optimize) (IFN_SAT_ADD @0 @1)))
> +(simplify
> + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> + (if (optimize) (IFN_SAT_ADD @0 @1)))
> +
> +/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c
> +  (usadd_left_part_1 @0 @1)
> +  (usadd_right_part_1 @0 @1))
> + (if (optimize)))
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c
> +  (usadd_left_part_1 @0 @1)
> +  (usadd_right_part_2 @0 @1))
> + (if (optimize)))
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> + (if (optimize)))
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> + (if (optimize)))
> +
>  /* x > y && x != XXX_MIN --> x > y
>     x > y && x == XXX_MIN --> false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")

...

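As an aside, the four source-level shapes that the quoted match.pd rules are
meant to catch (two branchless, two with a branch) can be written out in plain
C.  This is a hedged sketch rather than anything from the patch: it assumes
uint8_t so the equivalence check can be exhaustive, and the usadd_form_* names
are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Case 1 (branchless): (X + Y) | -((X + Y) < X).  */
uint8_t
usadd_form_lt (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum | (uint8_t) -(uint8_t) (sum < x);
}

/* Case 1 (branchless): (X + Y) | -(X > (X + Y)).  */
uint8_t
usadd_form_gt (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum | (uint8_t) -(uint8_t) (x > sum);
}

/* Case 2 (branch): (X + Y) >= X ? (X + Y) : -1.  */
uint8_t
usadd_form_ge (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : (uint8_t) -1;
}

/* Case 2 (branch): X <= (X + Y) ? (X + Y) : -1.  */
uint8_t
usadd_form_le (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return x <= sum ? sum : (uint8_t) -1;
}

/* Returns 1 if all four forms agree on every pair of 8-bit inputs.  */
int
usadd_forms_agree (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t want = usadd_form_lt ((uint8_t) x, (uint8_t) y);
        if (usadd_form_gt ((uint8_t) x, (uint8_t) y) != want
            || usadd_form_ge ((uint8_t) x, (uint8_t) y) != want
            || usadd_form_le ((uint8_t) x, (uint8_t) y) != want)
          return 0;
      }
  return 1;
}
```

Since the forms differ only in how the wrap-around comparison is spelled, a
single recognizer covering all of them is plausible, which is the direction
the overloaded-predicate suggestion above points in.
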
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 87c2acff386..77924cf10f8 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4487,6 +4487,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
>    return pattern_stmt;
>  }
>
> +static gimple *
> +vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
> +                         tree op_0, tree op_1)
> +{
> +  tree itype = TREE_TYPE (op_0);
> +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +  if (vtype == NULL_TREE)
> +    return NULL;
> +
> +  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
> +    return NULL;
> +
> +  *type_out = vtype;
> +
> +  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
> +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +  gimple_call_set_nothrow (call, /* nothrow_p */ true);
> +  gimple_set_location (call, gimple_location (last_stmt));
> +
> +  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> +
> +  return call;
> +}

The function has only one caller, you should just inline it into the pattern.

> +/*
> + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> + *   _7 = _4 + _6;
> + *   _8 = _4 > _7;
> + *   _9 = (long unsigned int) _8;
> + *   _10 = -_9;
> + *   _12 = _7 | _10;
> + *
> + * And then simplied to
> + *   _12 = .SAT_ADD (_4, _6);
> + */
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +static gimple *
> +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> +                            tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +    return NULL;
> +
> +  tree res_ops[2];
> +  tree lhs = gimple_assign_lhs (last_stmt);

Once you inline vect_sat_add_build_call you can do the check for vtype here,
which is the cheaper check so perform it early.

Otherwise this looks really good!

Thanks for working on it,
Tamar

> +
> +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> +    {
> +      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
> +                                              res_ops[0], res_ops[1]);
> +      if (call)
> +        return call;
> +    }
> +
> +  return NULL;
> +}
> +
>  /* Detect a signed division by a constant that wouldn't be
>     otherwise vectorized:
>
> @@ -6987,6 +7048,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
> +  { vect_recog_sat_add_pattern, "sat_add" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
>    { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
> --
> 2.34.1