From: "Li, Pan2"
To: Tamar Christina, "gcc-patches@gcc.gnu.org"
CC: "juzhe.zhong@rivai.ai", "kito.cheng@gmail.com", "richard.guenther@gmail.com", "Liu, Hongtao"
Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
Date: Mon, 13 May 2024 13:36:28 +0000
References: <20240406120755.2692291-1-pan2.li@intel.com> <20240506144805.725379-1-pan2.li@intel.com>

Thanks Tamar for the comments.

> I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when optimizing for size.

Sure thing, let me update it in v5.

> Hmm why do you iterate independently over the statements? The block below already visits
> Every statement doesn't it?

Because it will hit .ADD_OVERFLOW first, and then it will never hit SAT_ADD because the
shape has changed. Or shall we put it in the previous pass?

> The root of your match is a BIT_IOR_EXPR expression, so I think you just need to change the entry below to:
>
>         case BIT_IOR_EXPR:
>           match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
>           /* fall-through */
>         case BIT_XOR_EXPR:
>           match_uaddc_usubc (&gsi, stmt, code);
>           break;

There are other shapes of SAT_ADD (not covered in this patch), like the branch version
below, so IOR is just one of the possible roots. That is why I did not add the case here.
Then, shall we add a case for each shape here? Both work for me.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint32_t)

Pan

-----Original Message-----
From: Tamar Christina
Sent: Monday, May 13, 2024 5:10 PM
To: Li, Pan2; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao
Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

Hi Pan,

> -----Original Message-----
> From: pan2.li@intel.com
> Sent: Monday, May 6, 2024 3:48 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina;
> richard.guenther@gmail.com; hongtao.liu@intel.com; Pan Li
> Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
>
> From: Pan Li
>
> This patch would like to add the middle-end presentation for the
> saturation add. Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
>
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
>
> Take uint8_t as example, we will have:
>
> * SAT_ADD (1, 254) => 255.
> * SAT_ADD (1, 255) => 255.
> * SAT_ADD (2, 255) => 255.
> * SAT_ADD (255, 255) => 255.
>
> Given below example for the unsigned scalar integer uint64_t:
>
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
>
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
>
> }
>
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
> }
>
> We perform the transform during widen_mult because the sub-expr of
> SAT_ADD will be optimized to .ADD_OVERFLOW. We need to try the .SAT_ADD
> pattern first and then .ADD_OVERFLOW, or we may never catch the pattern
> .SAT_ADD. Meanwhile, the isel pass is after widen_mult and then we
> cannot perform the .SAT_ADD pattern match as the sub-expr will be
> optimized to .ADD_OVERFLOW first.
>
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 2. The aarch64 fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
>         PR target/51492
>         PR target/112600
>
> gcc/ChangeLog:
>
>         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
>         to the return true switch case(s).
>         * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
>         * match.pd: Add unsigned SAT_ADD match.
>         * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
>         * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
>         func decl generated in match.pd match.
>         (match_saturation_arith): New func impl to match the saturation arith.
>         (math_opts_dom_walker::after_dom_children): Try match saturation
>         arith.
>
> Signed-off-by: Pan Li
> ---
>  gcc/internal-fn.cc        |  1 +
>  gcc/internal-fn.def       |  2 ++
>  gcc/match.pd              | 28 ++++++++++++++++++++++++
>  gcc/optabs.def            |  4 ++--
>  gcc/tree-ssa-math-opts.cc | 46 +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
>                                smulhrs, umulhrs, binary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..7058e4cbe29 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
>
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +/* Unsigned saturation add, case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part @0 @1) (usadd_right_part @0 @1)))
> +
>  /* x > y && x != XXX_MIN --> x > y
>     x > y && x == XXX_MIN --> false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 705f4a4695a..35a46edc9f6 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4026,6 +4026,44 @@ arith_overflow_check_p (gimple *stmt, gimple *cast_stmt, gimple *&use_stmt,
>    return 0;
>  }
>
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to match saturation arith pattern(s).
> + *   1. SAT_ADD (unsigned)
> + *      _7 = _4 + _6;
> + *      _8 = _4 > _7;
> + *      _9 = (long unsigned int) _8;
> + *      _10 = -_9;
> + *      _12 = _7 | _10;
> + *      =>
> + *      _12 = .SAT_ADD (_4, _6);  */
> +static bool
> +match_saturation_arith (gimple_stmt_iterator *gsi, gimple *stmt,
> +                        bool *cfg_changed_p)
> +{
> +  gcall *call = NULL;
> +  bool changed_p = false;
> +
> +  gcc_assert (is_gimple_assign (stmt));
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +                                         OPTIMIZE_FOR_SPEED))

I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when optimizing for size.

> +    {
> +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> +      gimple_call_set_lhs (call, lhs);
> +      gsi_replace (gsi, call, true);
> +      changed_p = true;
> +      *cfg_changed_p = changed_p;
> +    }
> +
> +  return changed_p;
> +}
> +
>  /* Recognize for unsigned x
>     x = y - z;
>     if (x > y)
> @@ -5886,6 +5924,14 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
>
>    fma_deferring_state fma_state (param_avoid_fma_max_bits > 0);
>
> +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *stmt = gsi_stmt (gsi);
> +
> +      if (is_gimple_assign (stmt))
> +        match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> +    }
> +

Hmm why do you iterate independently over the statements? The block below already visits
Every statement doesn't it?

The root of your match is a BIT_IOR_EXPR expression, so I think you just need to change the entry below to:

        case BIT_IOR_EXPR:
          match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
          /* fall-through */
        case BIT_XOR_EXPR:
          match_uaddc_usubc (&gsi, stmt, code);
          break;

Patch is looking good! Thanks again for working on this.

Regards,
Tamar

>    for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
>      {
>        gimple *stmt = gsi_stmt (gsi);
> --
> 2.34.1
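
For readers following the thread, here is a minimal standalone sketch of the
two unsigned saturating-add shapes discussed above: the branchless form that
the patch matches (rooted at a bitwise OR) and the branch form mentioned in
the reply (rooted at a conditional). It is only an illustration for uint8_t.
The names sat_add_branchless_u8, sat_add_branch_u8 and
check_sat_add_u8_shapes_agree are hypothetical and do not appear in the
patch, and the sketch says nothing about which shapes GCC actually folds
to .SAT_ADD.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless shape from the patch description:
   SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)).
   Hypothetical helper name, for illustration only.  */
static uint8_t
sat_add_branchless_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);              /* wraps modulo 256 */
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x); /* 0xff on overflow, else 0 */
  return (uint8_t) (sum | mask);
}

/* Branch shape from the reply:
   (T)(x + y) >= x ? (x + y) : -1.
   Hypothetical helper name, for illustration only.  */
static uint8_t
sat_add_branch_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : (uint8_t) -1;
}

/* Exhaustively check that both shapes agree with a widened reference
   computation clamped to UINT8_MAX.  */
static void
check_sat_add_u8_shapes_agree (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        unsigned wide = x + y;
        uint8_t expect = wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
        assert (sat_add_branchless_u8 ((uint8_t) x, (uint8_t) y) == expect);
        assert (sat_add_branch_u8 ((uint8_t) x, (uint8_t) y) == expect);
      }
}
```

Calling check_sat_add_u8_shapes_agree () from a test driver exercises all
65536 input pairs, including the four uint8_t cases listed in the patch
description. Both functions compute the same value; they differ only in the
expression at the root, which is the point raised in the reply about where
to hook the match.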