From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2104.outbound.protection.outlook.com [40.107.236.104]) by sourceware.org (Postfix) with ESMTPS id 0EDA23855012 for ; Wed, 16 Jun 2021 18:49:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0EDA23855012 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Jj/nRaD97pY8cIkBLCt/w6UwslXU2/cRIdpWDKT8IQZ0usbQ55eE+ux3GsswITalanQuOUeLZkCNfW5wjzLThTHSOd3pO2lTQ4oaon3Y1H/k1T2ffyPE3kQ6YIwkNwXSGBFctdVUraQEbwCmq2l/1y3OZkdIqIa2ICLoZD5xNYZQ07onCGS8KVJS6PF5scrMYfbSK5GyW9F6JLelVnXanvgzy8tao9eHEedjs+RyguTazp8Kgu2hewUejO4V400ynt6ZbvOz3POixV4XyacZOKEaowfgPLGmzeusvgER2wrxLsOcyOIULW/1uO/aQTovWbj6ZukRvw5YXu2/3OhOpw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Wk3SDIBg0twtUrmd9osSXtFFiXEQbaJg/qCW/sz+Eag=; b=fhVyrBhLwBTMjshoRJWnWR0znbKNJyrb4iTq0EscSEc12gZT2Q9mg4DsDBt5DnHaQyp6oqWLkcHYiKTlXI3ckaCdqhDuo/mvizaA7ReHNKZKD2lVOOt6L0B/nwVIhqppxvXA+e3WSUSx5crvAFojljEaJ5bl9ctQZNBT5xuH2odZZT+hPbMfkwnRPfwB/1uMPQyANuuRvHDqkcvDBovBca4uqaAwQlpoyB9JgtpFo344jV4k0XPQx+BJaAbceI6fMeLbWcs/N3W3iext1R2Jrbju4OvpftheIaT4xrKq/4g2LKgTO19WxMBQJdElpI27ru9o+xi0QBQLmtltpQZRnQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=microsoft.com; dmarc=pass action=none header.from=microsoft.com; dkim=pass header.d=microsoft.com; arc=none Received: from MWHPR2101MB0811.namprd21.prod.outlook.com (2603:10b6:301:7b::39) by MW4PR21MB2027.namprd21.prod.outlook.com (2603:10b6:303:11e::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4264.7; Wed, 16 Jun 2021 18:49:46 +0000 Received: from MWHPR2101MB0811.namprd21.prod.outlook.com ([fe80::4110:8b6d:83ba:8df7]) by MWHPR2101MB0811.namprd21.prod.outlook.com ([fe80::4110:8b6d:83ba:8df7%7]) with mapi id 15.20.4264.004; Wed, 16 Jun 2021 18:49:46 +0000 From: Victor Tong To: Richard Biener CC: "gcc-patches@gcc.gnu.org" Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] Thread-Topic: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] Thread-Index: AQHXJoDrR4fklL4IYku64pH3Fv0p3qrIMcqAgDljMP+ABwtigIAO0uy/ Date: Wed, 16 Jun 2021 18:49:46 +0000 Message-ID: References: , In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: msip_labels: MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_Enabled=True; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_SiteId=72f988bf-86f1-41af-91ab-2d7cd011db47; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_SetDate=2021-06-16T18:49:46.817Z; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_Name=General; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_ContentBits=0; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_Method=Standard; x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 77ab6aba-b814-45ad-4075-08d930f78341 x-ms-traffictypediagnostic: MW4PR21MB2027: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:9508; x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: rgfYrQsQ70xu1PoOCsOMAwXzK+YJNX9qOxk8gnnmg0i/eihIzedap3sws4EW3HPrTvnpKFzQSK7MuriTzaT5ZjKNLPNaL7DMpE3f2+xS/m3UDeXu/KkeYh70d2cPORm+gk0h+VCpXZ8rM/Ojtxm6MVtH6IQpXMb/Nw+Fu87MVwLLfEOFm+nKS/nC9TrfG66LVwFoBGjjvlCo+xdMpOLVYxvPJv2rWiCTmQ9589T0mYrZnAhVBtjrc6mOZ7vj6p1T708f27BWFeFpvvfUWDo8INzNbB8tuuUAIbiXN6uWXZQMbzWgZpcaVUSM+Orsn7qZiCLbxg5RfdXefAxytb6VkebYrLXimJhT74NueGVnuod8LeR1cdSZOQq1UgHCcXqpyr6qOCtEJK4w4tonzXBACThwPz3wLqAUOjj5qcjixTZQyIcGj5w5JK9ap3GkHTJh2dxI9V0++gsixgFX1u3A6KkrEEEpy+hSDQfQ19i5cVfW+pAKEbxVTzDHnbSyqroUEgtGZd70pvkP/Mu4bOI2j4uBJ7SDO+WdWtfkXxeIvCLz4G0zJORNtd4P3l+9pO6gF68AZ5hn8ino5IEZdDEICn/PkM7zo++VGGMc8MpMneMfyxriWMHp+DLRBEiQeNfCShu6nmjIMr09V4Z1x0ES2A== x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:MWHPR2101MB0811.namprd21.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(366004)(8936002)(9686003)(5660300002)(86362001)(55016002)(2906002)(186003)(76116006)(64756008)(66446008)(66476007)(66556008)(71200400001)(66946007)(52536014)(26005)(4326008)(316002)(8676002)(6916009)(33656002)(7696005)(38100700002)(83380400001)(478600001)(53546011)(6506007)(122000001)(82950400001)(82960400001)(8990500004)(10290500003); DIR:OUT; SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?iso-8859-1?Q?1bGd+h/chEKYYoY0XRoiDPZG+dCbwy7+uekMfyYmDgdeXUaV/l/sCfJAzD?= =?iso-8859-1?Q?MA7TwDt90/ffNrGAiYnTkLxv0xCegxG8p2FIGbkaJQuRMswwfJnqBpEks6?= =?iso-8859-1?Q?KD607QcLLjlfsoRRDXUiDe89N9xdx8yM8eiSNoR9BQQ+HioauIMPspNmsV?= =?iso-8859-1?Q?/cW8IaFuunvDgIX15UgdfsHfpF1v2BooUrMyeRi8gpfd8GtuAbtHPU/uGL?= =?iso-8859-1?Q?U547aB2BrptdNw031wFStMWwYlUUxu4a+zkJ0DKmS1+o17b2uWsJ8KwUKq?= =?iso-8859-1?Q?WUlSK39Sw5Q7A9cOG/YWaMwGuC4Y4lu+PlBfgx5BtH3wZMZsNy9OGDJ7mx?= =?iso-8859-1?Q?jMNLXeBZh60b5O4c9IOxVqWFIp5lnuf1kZeHXbGx/Zf9YTCWhuO7W8ZIDf?= =?iso-8859-1?Q?A49KwoEliFzKT7MuBRc+eeCSSZ+P4c5HNluL5FMYn8M8UUYbqOFU4IBz48?= =?iso-8859-1?Q?AypN7h+k6t961C9DRz6iR6RS6sONv8ZsnTIGrhPW+ppENqchhccBBX+cSn?= =?iso-8859-1?Q?uZyUFVBCPWpej5HnNBBE9CgU2niWZcI3Nzp/uGuU7XhH3IwbGvc9Ic0cN4?= =?iso-8859-1?Q?VdVLcmqxViq7Oo+fIfF+y4mCQhy90cESvb7KSuYF5h9EEn7beze5Zs0SSb?= =?iso-8859-1?Q?QV5lRaUOWUpPDzbYPLl7hcMIUqCLnt2qTtgBXUcFmrbhY/j8SllxAfXpRb?= =?iso-8859-1?Q?vutH7kghlLqROT79Ft05vNRpfhqIVyFCBd8ZB2uBeXmUR5C6yF55NRRImy?= =?iso-8859-1?Q?uDuZWxfjdJCpOxMpLzui55GV45uCY0BWrrgvRnqa0dpPWpsMhzC5a372RD?= =?iso-8859-1?Q?13/nWSxGOLI1x5gAukLior5NxXfroko3r5n/oTFB6jWW8guWfKQ7/Q2K6R?= =?iso-8859-1?Q?VWOwWtiCkCIZ0aMMZZQl2/iMg0DGXqSAMp72+KvdMKHl945HsCy+sOLP4X?= =?iso-8859-1?Q?SUNsiYjCoViA6zq8NTg3+mq54XnlFWxtd+4NgZ/rTFbVyQaHMR/g67btQw?= =?iso-8859-1?Q?pFLA110HeOHth8LLeMfrTGq1M3npF05+yI+kPxZE75pJ7GHWZy4dt284uc?= =?iso-8859-1?Q?8tP1WBPVKwhDQcXWRhnb2f7y8fzD4pyHPl/HIiWFGlCpd+Dhq6C/9Y35ms?= =?iso-8859-1?Q?fCxiKt/IsRvlrJzvzJMfXw0VqKmmcxdzzJbstHqtt8eKoU7QQ8y26ZMPOe?= =?iso-8859-1?Q?EUQ9/Cz29HWtzH8XwZ9GKYcgBajjkqNBE51DAqg52e7Th4RfVXNyQFnmFq?= =?iso-8859-1?Q?qQiQ7/KeZvwbQGwgT41QjVYmhQsZQBVGcmHPnUen3xUMM2svFjIWkudl3e?= =?iso-8859-1?Q?YAO3NFgS/qcxp43PO/jATuWe1u69TciXUjD2l2veohRiJHw=3D?= x-ms-exchange-transport-forked: True Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: microsoft.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: MWHPR2101MB0811.namprd21.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 77ab6aba-b814-45ad-4075-08d930f78341 X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Jun 2021 18:49:46.5193 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 72f988bf-86f1-41af-91ab-2d7cd011db47 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: J0VTsyTaecmqxoqPNO/lu6p0sypJofvOZiWLwafh7HR+1xd9iZL1ianu+EGXo7uqMvvfOcBp7JfInNz4Iqf9NA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW4PR21MB2027 X-Spam-Status: No, score=-1.0 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2021 18:49:51 -0000 Hi Richard,=0A= =0A= Thanks for the feedback. From what you said, I can think of two possible so= lutions (though I'm not sure if either is feasible/fully correct):=0A= =0A= Option 1: Have the new X * (Y / X) --> Y - (Y % X) optimization only run in= scenarios that don't interfere with the existing X - (X / Y) * Y --> X % Y= optimization. =0A= =0A= This would involve checking the expression one level up to see if there's a= subtraction that would trigger the existing optimization. I looked through= the match.pd file and couldn't find a bail condition like this. It doesn't= seem like there's a link from an expression to its parent expression one l= evel up. This also feels a bit counter-intuitive since it would be doing th= e opposite of the bottom-up expression matching where the compiler would li= ke to match a larger expression rather than a smaller one.=0A= =0A= Option 2: Add a new pattern to support scenarios that the existing nop_conv= ert pattern bails out on.=0A= =0A= Existing pattern:=0A= =0A= (simplify=0A= (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1))= )=0A= (view_convert @1))=0A= =0A= New pattern to add:=0A= =0A= /* X - (X - Y) --> Y */=0A= (simplify=0A= (minus @0 (convert? (minus @@0 @1)))=0A= (if (INTEGRAL_TYPE_P (type) =0A= && TYPE_OVERFLOW_UNDEFINED(type)=0A= && INTEGRAL_TYPE_P (TREE_TYPE(@1))=0A= && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))=0A= && !TYPE_UNSIGNED (TREE_TYPE (@1))=0A= && !TYPE_UNSIGNED (type)=0A= && TYPE_PRECISION (TREE_TYPE (@1)) <=3D TYPE_PRECISION (type))=0A= (convert @1)))=0A= =0A= I think the truncation concerns that you brought up should be covered if th= e external expression type precision is greater than or equal to the intern= al expression type. There may be a sign extension operation (which is why t= he nop_convert check fails) but that shouldn't affect the value of the expr= ession. And if the types involved are signed integers where overflow/underf= low results in undefined behavior, the X - (X - Y) --> Y optimization shoul= d be legal.=0A= =0A= Please correct me if I'm wrong with either one of these options, or if you = can think of a better option to fix the regression.=0A= =0A= Thanks,=0A= Victor=0A= =0A= =0A= =0A= =0A= From: Richard Biener =0A= Sent: Monday, June 7, 2021 1:25 AM=0A= To: Victor Tong =0A= Cc: gcc-patches@gcc.gnu.org =0A= Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division fo= llowed by multiply [PR95176] =0A= =A0=0A= On Wed, Jun 2, 2021 at 10:55 PM Victor Tong wrote:= =0A= >=0A= > Hi Richard,=0A= >=0A= > Thanks for reviewing my patch. I did a search online and you're right -- = there isn't a vector modulo instruction. I'll remove the X * (Y / X) --> Y = - (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y from triggerin= g on vector types.=0A= >=0A= > I looked into why the following pattern isn't triggering:=0A= >=0A= >=A0=A0 (simplify=0A= >=A0=A0=A0 (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))=0A= >=A0=A0=A0 (view_convert @1))=0A= >=0A= > The nop_converts expand into tree_nop_conversion_p checks. In fn2() of th= e testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching l= ooks like:=0A= >=0A= > 42 - (long int) (42 - 42 % x)=0A= >=0A= > When looking at the right-hand side of the expression (the (long int) (42= - 42 % x)), the tree_nop_conversion_p check fails because of the type prec= ision difference. The expression inside of the cast has a 32-bit precision = and the outer expression has a 64-bit precision.=0A= >=0A= > I looked around at other patterns and it seems like nop_convert and view_= convert are used because of underflow/overflow concerns. I'm not familiar w= ith the two constructs. What's the difference between using them and checki= ng TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDE= FINED is true, the second pattern that I added (X - (X - Y) --> Y) gets tri= ggered.=0A= =0A= But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the=0A= conversion is the problematic one and=0A= conversions have implementation defined behavior.=A0 Now, the above does=0A= not match because it wasn't designed to,=0A= and for non-constant '42' it would have needed a (convert ...) around=0A= the first @0 as well (matching of constants is=0A= by value, not by value + type).=0A= =0A= That said, your=0A= =0A= +/* X - (X - Y) --> Y */=0A= +(simplify=0A= + (minus (convert1? @0) (convert2? (minus @@0 @1)))=0A= + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&=0A= TYPE_OVERFLOW_UNDEFINED(type))=0A= +=A0 (convert @1)))=0A= =0A= would match (int)x - (int)(x - y) where you assert the outer subtract=0A= has undefined behavior=0A= on overflow but the inner subtract could wrap and the (int) conversion=0A= can be truncating=0A= or widening.=A0 Is that really always a valid transform then?=0A= =0A= Richard.=0A= =0A= > Thanks,=0A= > Victor=0A= >=0A= >=0A= > From: Richard Biener =0A= > Sent: Tuesday, April 27, 2021 1:29 AM=0A= > To: Victor Tong =0A= > Cc: gcc-patches@gcc.gnu.org =0A= > Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division foll= owed by multiply [PR95176]=0A= >=0A= > On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches=0A= > wrote:=0A= > >=0A= > > Hello,=0A= > >=0A= > > This patch fixes PR tree-optimization/95176. A new pattern in match.pd = was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was= also added to cover this scenario.=0A= > >=0A= > > The new pattern interfered with the existing pattern of "X - (X / Y) * = Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c)= , the new pattern is applied causing the existing pattern to no longer appl= y. This results in worse code generation because the expression is left as = "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was= added to this patch to avoid this regression.=0A= > >=0A= > > I also didn't remove the existing pattern because it triggered in more = cases than the new pattern because of a tree_invariant_p check that's inser= ted by genmatch for the new pattern.=0A= >=0A= > Yes, we do not handle using Y multiple times when it might contain=0A= > side-effects in GENERIC folding=0A= > (comments in genmatch suggest we can use save_expr but we don't=0A= > implement this [anymore]).=0A= >=0A= > On GIMPLE there's also the issue that your new pattern creates a=0A= > complex expression which=0A= > makes it failed to be used by value-numbering for example where the=0A= > old pattern was OK=0A= > (eventually, if no conversion was required).=0A= >=0A= > So indeed it looks OK to preserve both.=0A= >=0A= > I wonder why you needed the=0A= >=0A= > +/* X - (X - Y) --> Y */=0A= > +(simplify=0A= > + (minus (convert1? @0) (convert2? (minus @@0 @1)))=0A= > + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&=0A= > TYPE_OVERFLOW_UNDEFINED(type))=0A= > +=A0 (convert @1)))=0A= >=0A= > pattern since it should be handled by=0A= >=0A= >=A0=A0 /* Match patterns that allow contracting a plus-minus pair=0A= >=A0=A0=A0=A0=A0 irrespective of overflow issues.=A0 */=0A= >=A0=A0 /* (A +- B) - A=A0=A0=A0=A0=A0=A0 ->=A0 +- B */=0A= >=A0=A0 /* (A +- B) -+ B=A0=A0=A0=A0=A0 ->=A0 A */=0A= >=A0=A0 /* A - (A +- B)=A0=A0=A0=A0=A0=A0 -> -+ B */=0A= >=A0=A0 /* A +- (B -+ A)=A0=A0=A0=A0=A0 ->=A0 +- B */=0A= >=0A= > in particular=0A= >=0A= >=A0=A0 (simplify=0A= >=A0=A0=A0 (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))=0A= >=A0=A0=A0 (view_convert @1))=0A= >=0A= > if there's supported cases missing I'd rather extend this pattern than=0A= > replicating it.=0A= >=0A= > +/* X * (Y / X) is the same as Y - (Y % X).=A0 */=0A= > +(simplify=0A= > + (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))=0A= > + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))=0A= > +=A0 (minus (convert @1) (convert (trunc_mod @1 @0)))))=0A= >=0A= > note that if you're allowing vector types you have to use=0A= > (view_convert ...) in the=0A= > transform and you also need to make sure that the target can expand=0A= > the modulo - I suspect that's an issue with the existing pattern as well.= =0A= > I don't know of any vector ISA that supports modulo (or integer=0A= > division, that is).=0A= > Restricting the patterns to integer types is probably the most=0A= > sensible solution.=0A= >=0A= > Thanks,=0A= > Richard.=0A= >=0A= > > I verified that all "make -k check" tests pass when targeting x86_64-pc= -linux-gnu.=0A= > >=0A= > > 2021-03-31=A0 Victor Tong=A0 =0A= > >=0A= > > gcc/ChangeLog:=0A= > >=0A= > >=A0=A0=A0=A0=A0=A0=A0=A0 * match.pd: Two new patterns: One to optimize d= ivision followed by multiply and the other to avoid a regression as explain= ed above=0A= > >=0A= > > gcc/testsuite/ChangeLog:=0A= > >=0A= > >=A0=A0=A0=A0=A0=A0=A0=A0 * gcc.dg/tree-ssa/20030807-10.c: Update existin= g test to look for a subtraction because a shift is no longer emitted=0A= > >=A0=A0=A0=A0=A0=A0=A0=A0 * gcc.dg/pr95176.c: New test to cover optimizin= g division followed by multiply=0A= > >=0A= > > I don't have write access to the GCC repo but I've completed the FSF pa= perwork as I plan to make more contributions in the future. I'm looking for= a sponsorship from an existing GCC maintainer before applying for write ac= cess.=0A= > >=0A= > > Thanks,=0A= > > Victor=