From: Tamar Christina
To: "Li, Pan2", "gcc-patches@gcc.gnu.org"
CC: "juzhe.zhong@rivai.ai", "kito.cheng@gmail.com", "richard.guenther@gmail.com", "Liu, Hongtao"
Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
Date: Thu, 2 May 2024 03:25:32 +0000
References: <20240406120755.2692291-1-pan2.li@intel.com> <20240429075322.1587986-1-pan2.li@intel.com>

> -----Original Message-----
> From: Li, Pan2
> Sent: Thursday, May 2, 2024 4:11 AM
> To: Tamar Christina; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
>
> Thanks Tamar
>
> > Could you also split off the vectorizer change from scalar recog one?
> > Typically I would structure a change like this as:
>
> > 1. create types/structures + scalar recogn
> > 2. Vector recog code
> > 3. Backend changes
>
> Sure thing, will rearrange the patch like this.
>
> > Is ECF_NOTHROW correct here? At least on most targets I believe the
> > scalar version can set flags/throw exceptions if the saturation happens?
>
> I see, will remove that.
>
> > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
>
> > The problem with doing it in match.pd is that it replaces the operations
> > quite early in the pipeline. Did I miss an email perhaps? The early
> > replacement means we lose optimizations and things such as range
> > calculations etc, since e.g. ranger doesn't know these internal functions.
>
> > I think Richi will want this in isel or mult widening but I'll continue
> > with match.pd review just in case.
>
> If I understand correctly, Richard suggested trying vectorizer patterns
> first and then possibly isel.
> Thus, I haven't tried SAT_ADD in ISEL, as the vectorizer patterns work well
> for SAT_ADD.
> Let's wait for confirmation from Richard. Below are the original words from
> the previous mail for reference.
>

I think the comment he made was this:

> > Given we have saturating integer alu like below, could you help to coach
> > me the most reasonable way to represent
> > It in scalar as well as vectorize part? Sorry not familiar with this part
> > and still dig into how it works...
>
> As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
> the other cases.
>
> As I said, use vectorizer patterns and possibly do instruction
> selection at ISEL/widen_mult time.

So he was responding to how to do it for the vectorizer and scalar parts.
Remember that the goal is not to introduce new gimple IL that can block other
optimizations.
The vectorizer already introduces new IL (various IFNs) but this is fine as we
don't track things like ranges for vector instructions. So we don't lose any
information here.

Now for the scalar, if we do an early replacement like in match.pd we prevent
a lot of other optimizations because they don't know what IFN_SAT_ADD does.
gimple-isel runs pretty late, and so at this point we don't expect many more
optimizations to happen, so it's a safe spot to insert more IL with
"unknown semantics".

Was that your intention Richi?

Thanks,
Tamar

> >> As I said, use vectorizer patterns and possibly do instruction
> >> selection at ISEL/widen_mult time.
>
> > The optimize checks in the match.pd file are weird as it seems to check
> > if we have optimizations enabled?
>
> > We don't typically need to do this.
>
> Sure, will remove this.
>
> > The function has only one caller, you should just inline it into the
> > pattern.
>
> Sure thing.
>
> > Once you inline vect_sat_add_build_call you can do the check for
> > vtype here, which is the cheaper check so perform it early.
>
> Sure thing.
>
> Thanks again and will send the v4 with all comments addressed, as well as
> the test results.
>
> Pan
>
> -----Original Message-----
> From: Tamar Christina
> Sent: Thursday, May 2, 2024 1:06 AM
> To: Li, Pan2; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
>
> Hi,
>
> > From: Pan Li
> >
> > Update in v3:
> > * Rebase upstream for conflict.
> >
> > Update in v2:
> > * Fix one failure for x86 bootstrap.
> >
> > Original log:
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > The patch also implement the SAT_ADD in the riscv backend as
> > the sample for both the scalar and vector.  Given below example:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> > }
> >
> > For vectorize, we leverage the existing vect pattern recog to find
> > the pattern similar to scalar and let the vectorizer to perform
> > the rest part for standard name usadd3 in vector mode.
> > The riscv vector backend have insn "Vector Single-Width Saturating
> > Add and Subtract" which can be leveraged when expand the usadd3
> > in vector mode.  For example:
> >
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   unsigned i;
> >
> >   for (i = 0; i < n; i++)
> >     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> > }
> >
> > Before this patch:
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   ...
> >   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
> >   ivtmp_58 = _80 * 8;
> >   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
> >   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
> >   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
> >   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
> >   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
> >   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
> >   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
> >   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
> >   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
> >   ivtmp_79 = ivtmp_78 - _80;
> >   ...
> > }
> >
> > vec_sat_add_u64:
> >   ...
> >   vsetvli a5,a3,e64,m1,ta,ma
> >   vle64.v v0,0(a1)
> >   vle64.v v1,0(a2)
> >   slli    a4,a5,3
> >   sub     a3,a3,a5
> >   add     a1,a1,a4
> >   add     a2,a2,a4
> >   vadd.vv v1,v0,v1
> >   vmsgtu.vv v0,v0,v1
> >   vmerge.vim v1,v1,-1,v0
> >   vse64.v v1,0(a0)
> >   ...
> >
> > After this patch:
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   ...
> >   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
> >   ivtmp_46 = _62 * 8;
> >   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
> >   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
> >   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
> >   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
> >   ...
> > }
> >
> > vec_sat_add_u64:
> >   ...
> >   vsetvli a5,a3,e64,m1,ta,ma
> >   vle64.v v1,0(a1)
> >   vle64.v v2,0(a2)
> >   slli    a4,a5,3
> >   sub     a3,a3,a5
> >   add     a1,a1,a4
> >   add     a2,a2,a4
> >   vsaddu.vv v1,v1,v2
> >   vse64.v v1,0(a0)
> >   ...
> >
> > To limit the patch size for review, only unsigned version of
> > usadd3 are involved here. The signed version will be covered
> > in the underlying patch(es).
> >
> > The below test suites are passed for this patch.
> > * The riscv fully regression tests.
> > * The aarch64 fully regression tests.
> > * The x86 bootstrap tests.
> > * The x86 fully regression tests.
> >
> >         PR target/51492
> >         PR target/112600
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/autovec.md (usadd3): New pattern expand
> >         for unsigned SAT_ADD vector.
> >         * config/riscv/riscv-protos.h (riscv_expand_usadd): New func
> >         decl to expand usadd3 pattern.
> >         (expand_vec_usadd): Ditto but for vector.
> >         * config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
> >         emit the vsadd insn.
> >         (expand_vec_usadd): New func impl to expand usadd3 for
> >         vector.
> >         * config/riscv/riscv.cc (riscv_expand_usadd): New func impl
> >         to expand usadd3 for scalar.
> >         * config/riscv/riscv.md (usadd3): New pattern expand
> >         for unsigned SAT_ADD scalar.
> >         * config/riscv/vector.md: Allow VLS mode for vsaddu.
> >         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
> >         * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
> >         * match.pd: Add unsigned SAT_ADD match and simply.
> >         * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> >         * tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
> >         to build the IFN_SAT_ADD gimple call.
> >         (vect_recog_sat_add_pattern): New func impl to recog the pattern
> >         for unsigned SAT_ADD.
> >
>
> Could you split the generic changes off from the RISCV changes? The RISCV
> changes need to be reviewed by the backend maintainer.
>
> Could you also split off the vectorizer change from scalar recog one?
> Typically I would structure a change like this as:
>
> 1. create types/structures + scalar recogn
> 2. Vector recog code
> 3. Backend changes
>
> Which makes review and bisect easier. I'll only focus on the generic bits.
>
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 2c764441cde..1104bb03b41 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4200,6 +4200,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_UBSAN_CHECK_MUL:
> >      case IFN_ADD_OVERFLOW:
> >      case IFN_MUL_OVERFLOW:
> > +    case IFN_SAT_ADD:
> >      case IFN_VEC_WIDEN_PLUS:
> >      case IFN_VEC_WIDEN_PLUS_LO:
> >      case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..47326b7033c 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
> >                                smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
> > +                              ssadd, usadd, binary)
> > +
>
> Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> version can set flags/throw exceptions if the saturation happens?
>
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index d401e7503e6..0b0298df829 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
>
> Hmm I believe Richi mentioned that he wanted the recognition done in isel?
>
> The problem with doing it in match.pd is that it replaces the operations
> quite early in the pipeline. Did I miss an email perhaps? The early
> replacement means we lose optimizations and things such as range
> calculations etc, since e.g. ranger doesn't know these internal functions.
>
> I think Richi will want this in isel or mult widening but I'll continue
> with match.pd review just in case.
>
> > +/* Unsigned Saturation Add */
> > +(match (usadd_left_part_1 @0 @1)
> > + (plus:c @0 @1)
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (lt (plus:c @0 @1) @0)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_2 @0 @1)
> > + (negate (convert (gt @0 (plus:c @0 @1))))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
>
> Predicates can be overloaded, so these two can just be usadd_right_part
> which then...
>
> > +
> > +/* Unsigned saturation add. Case 1 (branchless):
> > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > +(simplify
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_1 @0 @1))
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
>
>
> The optimize checks in the match.pd file are weird as it seems to check if
> we have optimizations enabled?
>
> We don't typically need to do this.
>
> > +(simplify
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_2 @0 @1))
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +
>
> Allows you to collapse rules like these into one line. Similarly for below.
>
> Note that even when moving to gimple-isel you can reuse the match.pd code
> by leveraging it to build the predicates for you and calling them from
> another pass. See how ctz_table_index is used for example.
>
> Doing this, moving it to gimple-isel.cc should be easy.
>
> > +/* Unsigned saturation add. Case 2 (branch):
> > +   SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1 or
> > +   SAT_U_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> > +(simplify
> > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +(simplify
> > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +
> > +/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_1 @0 @1))
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_2 @0 @1))
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > + (if (optimize)))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..3f2cb46aff8 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> >  OPTAB_NX(add_optab, "add$Q$a3")
> >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> >  OPTAB_VX(addv_optab, "add$F$a3")
> > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
> >  OPTAB_NX(sub_optab, "sub$F$a3")
> >  OPTAB_NX(sub_optab, "sub$Q$a3")
> ...
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 87c2acff386..77924cf10f8 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -4487,6 +4487,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
> >    return pattern_stmt;
> >  }
> >
> > +static gimple *
> > +vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
> > +                         tree op_0, tree op_1)
> > +{
> > +  tree itype = TREE_TYPE (op_0);
> > +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> > +
> > +  if (vtype == NULL_TREE)
> > +    return NULL;
> > +
> > +  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
> > +    return NULL;
> > +
> > +  *type_out = vtype;
> > +
> > +  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
> > +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> > +  gimple_call_set_nothrow (call, /* nothrow_p */ true);
> > +  gimple_set_location (call, gimple_location (last_stmt));
> > +
> > +  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> > +
> > +  return call;
> > +}
>
> The function has only one caller, you should just inline it into the pattern.
>
> > +/*
> > + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> > + *   _7 = _4 + _6;
> > + *   _8 = _4 > _7;
> > + *   _9 = (long unsigned int) _8;
> > + *   _10 = -_9;
> > + *   _12 = _7 | _10;
> > + *
> > + * And then simplied to
> > + *   _12 = .SAT_ADD (_4, _6);
> > + */
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +static gimple *
> > +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> > +                            tree *type_out)
> > +{
> > +  gimple *last_stmt = stmt_vinfo->stmt;
> > +
>
> STMT_VINFO_STMT (stmt_vinfo);
>
> > +  if (!is_gimple_assign (last_stmt))
> > +    return NULL;
> > +
> > +  tree res_ops[2];
> > +  tree lhs = gimple_assign_lhs (last_stmt);
>
> Once you inline vect_sat_add_build_call you can do the check for
> vtype here, which is the cheaper check so perform it early.
>
> Otherwise this looks really good!
>
> Thanks for working on it,
>
> Tamar
>
> > +
> > +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> > +    {
> > +      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
> > +                                              res_ops[0], res_ops[1]);
> > +      if (call)
> > +        return call;
> > +    }
> > +
> > +  return NULL;
> > +}
> > +
> >  /* Detect a signed division by a constant that wouldn't be
> >     otherwise vectorized:
> >
> > @@ -6987,6 +7048,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> >        { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> >        { vect_recog_divmod_pattern, "divmod" },
> >        { vect_recog_mult_pattern, "mult" },
> > +      { vect_recog_sat_add_pattern, "sat_add" },
> >        { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> >        { vect_recog_gcond_pattern, "gcond" },
> >        { vect_recog_bool_pattern, "bool" },
> > --
> > 2.34.1
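
For reference, a minimal sketch in plain C of the two unsigned forms that the
quoted match.pd patterns describe (the branchless "Case 1" and the branching
"Case 2"), using uint8_t as in the commit message examples. The function names
below are made up for illustration and are not part of the patch; the explicit
uint8_t casts are an assumption needed because C promotes uint8_t operands to
int, which the uint64_t example in the patch does not have to worry about.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form, following the pattern in the quoted commit message:
   SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)).
   Name is hypothetical.  */
uint8_t
sat_add_u8_branchless (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);                /* wraps modulo 256 */
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x);  /* 0xff if wrapped, else 0 */
  return (uint8_t) (sum | mask);
}

/* Branch form: (x + y) >= x ? (x + y) : -1.  Name is hypothetical.  */
uint8_t
sat_add_u8_branch (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : (uint8_t) -1;
}

/* Hypothetical self-check: compare both forms against a widened
   reference computation over every pair of uint8_t inputs.  */
void
sat_add_u8_check_all (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        unsigned wide = x + y;
        uint8_t expect = wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
        assert (sat_add_u8_branchless ((uint8_t) x, (uint8_t) y) == expect);
        assert (sat_add_u8_branch ((uint8_t) x, (uint8_t) y) == expect);
      }
}
```

Calling sat_add_u8_check_all () from a small main is enough to confirm that
both forms agree on all 65536 input pairs, which is why a single internal
function can stand in for either source form.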