From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR03-VI1-obe.outbound.protection.outlook.com (mail-vi1eur03on2137.outbound.protection.outlook.com [40.107.103.137]) by sourceware.org (Postfix) with ESMTPS id 270793857427 for ; Tue, 25 Oct 2022 13:00:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 270793857427 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=Syrmia.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Syrmia.com ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=B+oLIk1DJI+TyGrWzfvOmtrftNU+rkPWUThvYsGu6fVVrJxQrKayFW18Y0EepVJZSWiVK3KG+4gSMZVU+2WaxS5d7Xb8EMT7FMI0A7UvM67oQsqkdPzmUNQ55K/PfiSBHfQyx0KjarOQVYsc+JTM/3eulFoKqBgKvJBNaP8qYx+X4Qi5yXDrAJ/LtbDSq4HFwg6Cxry8tveilS/b++5Jhbub5SbwIgHzD82TTABglIDesZff0TMp7IjfE7g/j2DYmhTtdf4h3u6WwW1aP5+7wmyrh23blZsz8S4QDsE3GJk2x7+qV2wXqEMrk9oqsbK7/tnbgfhuLmCd114uFLkgKA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=DVuXVJbFLrAJlxI+t3vo4tFQDQW3a0vt5t9nYbaDrEM=; b=jmlWKrhpb0TKKiEqT2c2e+sJHczC84SXdA+Ln87KYGVduJi3rPh8WaX2quk37I2bOlEj+l1m7fkfnE1QAywUPchHT8EwOP6PnRYHiKLw7UgKueD5OAeUosqpoyTuM8AngrZGfnrFljhaxC3UVtnIle+ByxTJ8dpQ1rTQikyHRjS+r+2fmgRC2p+ygJGGv0xh0u89ovgnBRvlcgvK7LvGsyZEMasmXvKriydFmy4zj6JrUZWgqCawB4WaxR8srCMc4d2JT4OFJjH8CEFYPYAMNxkOcPC2omaICrPTmbWp8ry3+tOm3nKBwIsay5aqCc8hiT0+lR/OdxQCQ3s5ytaqag== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=syrmia.com; dmarc=pass action=none header.from=syrmia.com; dkim=pass header.d=syrmia.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=syrmia.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=DVuXVJbFLrAJlxI+t3vo4tFQDQW3a0vt5t9nYbaDrEM=; b=HeeVShbKzJkFHKHlT7OcsX6arHqCbI3dsbg6J1hNYPYgNt/jrFxbPlt0ZUMvY9+2L++FDgINLYI6wEPswqdsDu6FMV0X7Afbt8+d//dloTPy1bUw0Ym51p3zDB31EPxOBuaG78o0GddUwJ2Pv8xL8VAoW3UxoTYFOQEVFSZed/A= Received: from AM0PR03MB4882.eurprd03.prod.outlook.com (2603:10a6:208:fb::17) by AS8PR03MB8004.eurprd03.prod.outlook.com (2603:10a6:20b:42b::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5746.21; Tue, 25 Oct 2022 13:00:22 +0000 Received: from AM0PR03MB4882.eurprd03.prod.outlook.com ([fe80::7f26:4554:fc25:8412]) by AM0PR03MB4882.eurprd03.prod.outlook.com ([fe80::7f26:4554:fc25:8412%3]) with mapi id 15.20.5746.028; Tue, 25 Oct 2022 13:00:22 +0000 From: Dimitrije Milosevic To: Richard Biener CC: "gcc-patches@gcc.gnu.org" , Djordje Todorovic Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. Thread-Topic: [PATCH 1/2] ivopts: Revert computation of address cost complexity. Thread-Index: AQHY5VSpCkpNZ9wRE0qr2B1vutRVJa4e+a6AgAACkkc= Date: Tue, 25 Oct 2022 13:00:22 +0000 Message-ID: References: <20221021135203.626255-1-dimitrije.milosevic@syrmia.com> <20221021135203.626255-2-dimitrije.milosevic@syrmia.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: msip_labels: authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=Syrmia.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: AM0PR03MB4882:EE_|AS8PR03MB8004:EE_ x-ms-office365-filtering-correlation-id: ea3c3ae5-8859-4feb-4132-08dab688e064 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: HDomLRzVGj8wRsySibtflY4PcGDbyIX2UviCHM9KMcSGSu0j+XqhI0w82z9XrJbhE7bXKyfeBSdgb4LaQPa9/Dt8ZtrWBS8VUPkvZ0u4Ill04v3VcFN57m5w4I4q3+gYjaSw9lWIalQYY51OLCsd+b13CMHW42fEw/MgsHV3Z3FI4qJvDn4CQWSKR6hPm9Jsk0GSPsrlTldD805tQyWKtjFxOgVIjZtbtbkI1uwpFXKEMVHvqqkKy6+X/ZKXAYLPDt47/aBsOMxEq39A/1MInQ3afNQU9ZW5pVImQ76j8XlWLmEGBHBAZiMPwD9bfxk4SmkvIlFq0t658Bylb2wlcgPUj9eCkLLKiKRzBtnx+ZE3JKciHPnYyORxIqOVh1VINaXBuX/d8FBv/D9YxcK1AX7zEOqs4etK5pCHt1SsJBZN4wLP/v3jlJxUUhNCXyMGTPCUIQJO+JdcMt8Yi+da9hRgCGEX+7igX5RSII6nPYcYMFQSZBRpRC38pWOSkhE0nv7lXVR8ZCqfToToXhF6+bZAZDOv2mBVgf0mThTg6X4fwmyS4e5VKYHMU+FzUEP/IgS9EOE4Mjy5OWZ0reu5DUY8hfpJaAgECI+sw+OxIlJakCB4KPJbqxhbIdsZnxOEel3xDauPU5na96YB2rOXKaUYbc+qR/zsWQd7zc25cDCMZqXjZmbeNcBcvLHVgwEH9QTEMSh/8ADHYoKMUsIStN7Nc7R92mPVAGVZq7M/ILgRUMF7y/tSbZUktDHFfzZE x-forefront-antispam-report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:AM0PR03MB4882.eurprd03.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(376002)(366004)(346002)(39830400003)(396003)(136003)(451199015)(76116006)(30864003)(8676002)(4326008)(66476007)(66556008)(66446008)(66946007)(64756008)(91956017)(52536014)(2906002)(41300700001)(55016003)(8936002)(316002)(6916009)(33656002)(54906003)(5660300002)(478600001)(83380400001)(38070700005)(122000001)(38100700002)(107886003)(26005)(6506007)(7696005)(53546011)(86362001)(66574015)(9686003)(186003)(71200400001);DIR:OUT;SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?iso-8859-2?Q?gaZfLjGLA4OXvRna9HeI6vz3jXLlAkUJZcoOmA2Naxozr9pc+UunpaMwig?= =?iso-8859-2?Q?dV8kvQqminD0I2jnTEJ8pWO5bgS+FwuGObIuVU9IauQ1T1AYm8b2AV2wj5?= =?iso-8859-2?Q?XDaptorEc0/SIR08VLXzzk+vfHOhwH6/slJZ4O2q9S8R+raOCEhvIvBnyu?= =?iso-8859-2?Q?XcJWEZHmKT2rJGTQcbb/PTifG6KmVYlUzWyHU7Kn5g8MrSbIWCO8mnBZ+3?= =?iso-8859-2?Q?4jtNusgHgojDJi5cVZhxfxAb4m4yWV654/88cHZ/UswTURYpOsr8IO8owc?= =?iso-8859-2?Q?5IjGvKzKj4hRjCKuiTRte/3BAoBtAvF1xPjdAf4+97XaVOEXSDe5BzDfJt?= =?iso-8859-2?Q?KlD+PKYam/c6YSKhz1IL/k0f10e0jSeIBNoVr86+65Oy+XNXHVYjFQFNDV?= =?iso-8859-2?Q?QaVm1ZopW1Ngzx7Vv+2W9n8vb+gsGVmpwcL0Hlbakal0yguLaTFzaAebh2?= =?iso-8859-2?Q?sYyx5i/WigxPlLkT1QKPQR5IZOQWqhrt0uLjmWzrnOajtvbzwSJIexZ91A?= =?iso-8859-2?Q?qRhz66/HaM4lPMbJPI3X83j3G6BCVFfUOiNEZ9srdjsYIP4hhvurmzNs8T?= =?iso-8859-2?Q?YmxgbQiPazDEqu9a31mdI4hFw9Yq/LznBMkvS3LYiaDiZvTp+8TZrX54za?= =?iso-8859-2?Q?G6fcySlgNgOzHIZ4ql6vVQTiKSwttTRijZaWHxNY83XD/vL7ojXJPSNJks?= =?iso-8859-2?Q?gDSfCpTrjnbKu/MkpjtB0xuvtUaAM0Q50QF1tr+CUpGQfIraSM8pUxbkt5?= =?iso-8859-2?Q?pTXb0Lgq+WLSN/H7oxFs05fNdvVaqCuyUUy0mwxwiBZicWj0tG6GSasDYr?= =?iso-8859-2?Q?Mb7hzU5wqA8nQWRiBkzG1bAgAfcIcTpNiqOd4C+P0WXXCBvUM7hiNqAMF7?= =?iso-8859-2?Q?JoncWyE2Gr7W9/1gubrhv6AhliZdwdtoiVeqAyFZpOc0VvtzoSrRM+QMsS?= =?iso-8859-2?Q?MIo/BjxofUXtHOJnMcxSoFfLU3ppifo9rlZILIaumgAn09Y3x+nmNQUlq7?= =?iso-8859-2?Q?DuAOtwWnCmxah0tG0dDiNxCEU6HGecSpO2E9tR8no70IqdKz75ePae8ruy?= =?iso-8859-2?Q?eIDjSY7iuZ55JL5puybgCxC4EyEoTlAwNsgKBq+7M/ywBSWxyKURkikw7T?= =?iso-8859-2?Q?b1Oy/Yw3VSfOdwC41GrEetRqIy+1EKDueX0sJWFGYIYkSS2wyMaGnAyaV/?= =?iso-8859-2?Q?/1byTZyC3go6IOCIg3haIUZqZ/wqpP7cMigvDWaq9B3yFve4u+c93QfSJg?= =?iso-8859-2?Q?FEK/GejkkNSM9rpdQIhRI5As3uV0AQXXpaza6+DYYSdLYCLYZu1dgXKDtu?= =?iso-8859-2?Q?uoqd5M6mOVQUtWuEkV6p98JaHRtKV64D8HtKYsP/UaQXve2F8Mz+96Wgvx?= =?iso-8859-2?Q?xgX2pQ5Ozqvy2+KtHcwHzT2FwEd8VjqL/+PVDuFNO5glYyUCg2WPgZI23J?= =?iso-8859-2?Q?Ra2UKF1ZwQuxtTCRaimhnHJNmuYQpKaNsu2lqgzeRrHYqJvI70ZvBTymKz?= =?iso-8859-2?Q?yrxDMfe9TYjzfaI7TJyx2H9mUqNCzTAH+j+1ps+eY+cQLM5EqatB58lEW9?= =?iso-8859-2?Q?BgYVJlD3874o6DFZ5anKAixxU+WTW+P3vvOieMcU+2aETKH53egjINC2T5?= =?iso-8859-2?Q?z2fYSFBPaA7g/UpR3DFsjBDhCy/wlIuCS3lMBRXAacBvprk2N4edUxiQ?= =?iso-8859-2?Q?=3D=3D?= Content-Type: text/plain; charset="iso-8859-2" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: syrmia.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: AM0PR03MB4882.eurprd03.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: ea3c3ae5-8859-4feb-4132-08dab688e064 X-MS-Exchange-CrossTenant-originalarrivaltime: 25 Oct 2022 13:00:22.2351 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 19214a73-c1ab-4e19-8f59-14bdcb09a66e X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: Z0aJ1HiNBhy94vk9XPRcBpdKTds2bnOHB5gGi9goME9g1Bl1NWs011K8Yba0t1W9A+QPPhFRbVF0keK22+Box1fQoSL3kIZVZoYpjiTvO38= X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR03MB8004 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Richard,=0A= =0A= > I don't follow how only having BASE + OFFSET addressing prevents=0A= > calculation of complexity for other addressing modes? Can you explain?= =0A= =0A= It's the valid_mem_ref_p target hook that prevents complexity calculation= =0A= for other addressing modes (for Mips and RISC-V).=0A= Here's the snippet of the algorithm (after f9f69dd) for the complexity =0A= calculation, which is located at the beginning of the get_address_cost=0A= function in tree-ssa-loop-ivopts.cc:=0A= =0A= if (!aff_combination_const_p (aff_inv))=0A= {=0A= parts.index =3D integer_one_node;=0A= /* Addressing mode "base + index". */=0A= ok_without_ratio_p =3D valid_mem_ref_p (mem_mode, as, &parts);=0A= if (ratio !=3D 1)=0A= {=0A= parts.step =3D wide_int_to_tree (type, ratio);=0A= /* Addressing mode "base + index << scale". */=0A= ok_with_ratio_p =3D valid_mem_ref_p (mem_mode, as, &parts);=0A= if (!ok_with_ratio_p)=0A= parts.step =3D NULL_TREE;=0A= }=0A= ...=0A= =0A= The algorithm "builds up" the actual addressing mode step-by-step,=0A= starting from BASE + INDEX. However, if valid_mem_ref_p returns=0A= negative, parts.* is set to NULL_TREE and we bail out. For Mips (or =0A= RISC-V), it always returns negative (given we are building the addressing= =0A= mode up from BASE + INDEX), since Mips allows BASE + OFFSET only=0A= (see the case PLUS in mips_classify_address in config/mips/mips.cc).=0A= The result is that all addressing modes besides BASE + OFFSET, for Mips=0A= (or RISC-V) have complexities of 0. f9f69dd introduced calls to valid_mem_r= ef_p=0A= target hook, which were not there before, and I'm not sure why exactly.=0A= =0A= > Do you have a testcase that shows how both changes improve IV selection= =0A= > for MIPS?=0A= =0A= Certainly, consider the following test case:=0A= =0A= void daxpy(float *vector1, float *vector2, int n, float fp_const)=0A= {=0A= for (int i =3D 0; i < n; ++i)=0A= vector1[i] +=3D fp_const * vector2[i];=0A= }=0A= =0A= void dgefa(float *vector, int m, int n, int l)=0A= {=0A= for (int i =3D 0; i < n - 1; ++i)=0A= {=0A= for (int j =3D i + 1; j < n; ++j)=0A= {=0A= float t =3D vector[m * j + l];=0A= daxpy(&vector[m * i + i + 1], &vector[m * j + i + 1], n - (i + 1), t);= =0A= }=0A= }=0A= }=0A= =0A= At the third inner loop (which gets inlined from daxpy), an unoptimal candi= date=0A= selection takes place.=0A= Worth noting is that f9f69dd doesn't change the costs (they are, however, m= ultiplied by=0A= a factor, but what was lesser/greater before is lesser/greater after). Here= 's how complexities=0A= stand:=0A= =0A= =3D=3D=3D=3D=3D Before f9f69dd =3D=3D=3D=3D=3D=0A= Group 1:=0A= cand cost compl. inv.expr. inv.vars=0A= 1 13 1 3; NIL;=0A= 2 13 2 4; NIL;=0A= 4 9 1 5; NIL;=0A= 5 1 0 NIL; NIL;=0A= 7 9 1 3; NIL;=0A= =3D=3D=3D=3D=3D Before f9f69dd =3D=3D=3D=3D=3D=0A= =3D=3D=3D=3D=3D After f9f69dd =3D=3D=3D=3D=3D=0A= Group 1:=0A= cand cost compl. inv.expr. inv.vars=0A= 1 10 0 4; NIL;=0A= 2 10 0 5; NIL;=0A= 4 6 0 6; NIL;=0A= 5 1 0 NIL; NIL;=0A= 7 6 0 4; NIL;=0A= =3D=3D=3D=3D=3D After f9f69dd =3D=3D=3D=3D=3D=0A= =0A= Notice how all complexities are zero, even though the candidates have=0A= different addressing modes.=0A= =0A= For this particular example, this leads to a different candidate selection:= =0A= =0A= =3D=3D=3D=3D=3D Before f9f69dd =3D=3D=3D=3D=3D=0A= Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:=0A= Candidate 4:=0A= Var befor: ivtmp.17_52=0A= Var after: ivtmp.17_103=0A= Incr POS: before exit test=0A= IV struct:=0A= Type: unsigned long=0A= Base: (unsigned long) (vector_27(D) + _10)=0A= Step: 4=0A= Object: (void *) vector_27(D)=0A= Biv: N=0A= Overflowness wrto loop niter: Overflow=0A= Candidate 5:=0A= Var befor: ivtmp.18_99=0A= Var after: ivtmp.18_98=0A= Incr POS: before exit test=0A= IV struct:=0A= Type: unsigned long=0A= Base: (unsigned long) (vector_27(D) + _14)=0A= Step: 4=0A= Object: (void *) vector_27(D)=0A= Biv: N=0A= Overflowness wrto loop niter: Overflow=0A= =3D=3D=3D=3D=3D Before f9f69dd =3D=3D=3D=3D=3D=0A= =3D=3D=3D=3D=3D After f9f69dd =3D=3D=3D=3D=3D=0A= Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 1 IVs:=0A= Candidate 4:=0A= Var befor: ivtmp.17_52=0A= Var after: ivtmp.17_103=0A= Incr POS: before exit test=0A= IV struct:=0A= Type: unsigned long=0A= Base: (unsigned long) (vector_27(D) + _10)=0A= Step: 4=0A= Object: (void *) vector_27(D)=0A= Biv: N=0A= Overflowness wrto loop niter: Overflow=0A= =3D=3D=3D=3D=3D After f9f69dd =3D=3D=3D=3D=3D=0A= =0A= which, in turn, leads to the following assembly sequence:=0A= =0A= =3D=3D=3D=3D=3D Before f9f69dd =3D=3D=3D=3D=3D=0A= .L83:=0A= lwc1 $f5,0($3)=0A= lwc1 $f8,0($2)=0A= lwc1 $f7,4($2)=0A= lwc1 $f6,8($2)=0A= lwc1 $f9,12($2)=0A= lwc1 $f10,16($2)=0A= maddf.s $f8,$f0,$f5=0A= lwc1 $f11,20($2)=0A= lwc1 $f12,24($2)=0A= lwc1 $f13,28($2)=0A= ld $12,72($sp)=0A= swc1 $f8,0($2)=0A= lwc1 $f14,4($3)=0A= maddf.s $f7,$f0,$f14=0A= swc1 $f7,4($2)=0A= lwc1 $f15,8($3)=0A= maddf.s $f6,$f0,$f15=0A= swc1 $f6,8($2)=0A= lwc1 $f16,12($3)=0A= maddf.s $f9,$f0,$f16=0A= swc1 $f9,12($2)=0A= lwc1 $f17,16($3)=0A= maddf.s $f10,$f0,$f17=0A= swc1 $f10,16($2)=0A= lwc1 $f18,20($3)=0A= maddf.s $f11,$f0,$f18=0A= swc1 $f11,20($2)=0A= lwc1 $f19,24($3)=0A= maddf.s $f12,$f0,$f19=0A= swc1 $f12,24($2)=0A= lwc1 $f20,28($3)=0A= maddf.s $f13,$f0,$f20=0A= swc1 $f13,28($2)=0A= daddiu $2,$2,32=0A= bne $2,$12,.L83=0A= daddiu $3,$3,32=0A= ...=0A= =3D=3D=3D=3D=3D Before f9f69dd =3D=3D=3D=3D=3D=0A= =3D=3D=3D=3D=3D After f9f69dd =3D=3D=3D=3D=3D=0A= .L93:=0A= dsubu $18,$2,$4=0A= lwc1 $f13,0($2)=0A= daddu $19,$18,$5=0A= daddiu $16,$2,4=0A= lwc1 $f14,0($19)=0A= dsubu $17,$16,$4=0A= daddu $25,$17,$5=0A= lwc1 $f15,4($2)=0A= daddiu $19,$2,12=0A= daddiu $20,$2,8=0A= maddf.s $f13,$f1,$f14=0A= dsubu $16,$19,$4=0A= daddiu $17,$2,16=0A= dsubu $18,$20,$4=0A= daddu $19,$16,$5=0A= daddiu $16,$2,20=0A= lwc1 $f10,8($2)=0A= daddu $20,$18,$5=0A= lwc1 $f16,12($2)=0A= dsubu $18,$17,$4=0A= lwc1 $f17,16($2)=0A= dsubu $17,$16,$4=0A= lwc1 $f18,20($2)=0A= daddiu $16,$2,24=0A= lwc1 $f20,24($2)=0A= daddu $18,$18,$5=0A= swc1 $f13,0($2)=0A= daddu $17,$17,$5=0A= lwc1 $f19,0($25)=0A= daddiu $25,$2,28=0A= lwc1 $f11,28($2)=0A= daddiu $2,$2,32=0A= dsubu $16,$16,$4=0A= dsubu $25,$25,$4=0A= maddf.s $f15,$f1,$f19=0A= daddu $16,$16,$5=0A= daddu $25,$25,$5=0A= swc1 $f15,-28($2)=0A= lwc1 $f21,0($20)=0A= ld $20,48($sp)=0A= maddf.s $f10,$f1,$f21=0A= swc1 $f10,-24($2)=0A= lwc1 $f22,0($19)=0A= maddf.s $f16,$f1,$f22=0A= swc1 $f16,-20($2)=0A= lwc1 $f23,0($18)=0A= maddf.s $f17,$f1,$f23=0A= swc1 $f17,-16($2)=0A= lwc1 $f0,0($17)=0A= maddf.s $f18,$f1,$f0=0A= swc1 $f18,-12($2)=0A= lwc1 $f7,0($16)=0A= maddf.s $f20,$f1,$f7=0A= swc1 $f20,-8($2)=0A= lwc1 $f12,0($25)=0A= maddf.s $f11,$f1,$f12=0A= bne $2,$20,.L93=0A= swc1 $f11,-4($2)=0A= ...=0A= =3D=3D=3D=3D=3D After f9f69dd =3D=3D=3D=3D=3D=0A= =0A= Notice the additional instructions used for index calculation, due to=0A= unoptimal candidate selection.=0A= =0A= Regards,=0A= Dimitrije=0A= =0A= From: Richard Biener =0A= Sent: Tuesday, October 25, 2022 1:08 PM=0A= To: Dimitrije Milosevic =0A= Cc: gcc-patches@gcc.gnu.org ; Djordje Todorovic =0A= Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complex= ity. =0A= =A0=0A= On Fri, Oct 21, 2022 at 3:56 PM Dimitrije Milosevic=0A= wrote:=0A= >=0A= > From: Dimitrije Milo=B9evi=E6 =0A= >=0A= > This patch reverts the computation of address cost complexity=0A= > to the legacy one. After f9f69dd, complexity is calculated=0A= > using the valid_mem_ref_p target hook. Architectures like=0A= > Mips only allow BASE + OFFSET addressing modes, which in turn=0A= > prevents the calculation of complexity for other addressing=0A= > modes, resulting in non-optimal candidate selection.=0A= =0A= I don't follow how only having BASE + OFFSET addressing prevents=0A= calculation of complexity for other addressing modes?=A0 Can you explain?= =0A= =0A= Do you have a testcase that shows how both changes improve IV selection=0A= for MIPS?=0A= =0A= >=0A= > gcc/ChangeLog:=0A= >=0A= >=A0=A0=A0=A0=A0=A0=A0=A0 * tree-ssa-address.cc (multiplier_allowed_in_addr= ess_p): Change=0A= >=A0=A0=A0=A0=A0=A0=A0=A0 to non-static.=0A= >=A0=A0=A0=A0=A0=A0=A0=A0 * tree-ssa-address.h (multiplier_allowed_in_addre= ss_p): Declare.=0A= >=A0=A0=A0=A0=A0=A0=A0=A0 * tree-ssa-loop-ivopts.cc (compute_symbol_and_var= _present): Reintroduce.=0A= >=A0=A0=A0=A0=A0=A0=A0=A0 (compute_min_and_max_offset): Likewise.=0A= >=A0=A0=A0=A0=A0=A0=A0=A0 (get_address_cost): Revert=0A= >=A0=A0=A0=A0=A0=A0=A0=A0 complexity calculation.=0A= >=0A= > Signed-off-by: Dimitrije Milosevic =0A= > ---=0A= >=A0 gcc/tree-ssa-address.cc=A0=A0=A0=A0 |=A0=A0 2 +-=0A= >=A0 gcc/tree-ssa-address.h=A0=A0=A0=A0=A0 |=A0=A0 2 +=0A= >=A0 gcc/tree-ssa-loop-ivopts.cc | 214 ++++++++++++++++++++++++++++++++++--= =0A= >=A0 3 files changed, 207 insertions(+), 11 deletions(-)=0A= >=0A= > diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc=0A= > index ba7b7c93162..442f54f0165 100644=0A= > --- a/gcc/tree-ssa-address.cc=0A= > +++ b/gcc/tree-ssa-address.cc=0A= > @@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)=0A= >=A0=A0=A0=A0 validity for a memory reference accessing memory of mode MODE= in address=0A= >=A0=A0=A0=A0 space AS.=A0 */=0A= >=0A= > -static bool=0A= > +bool=0A= >=A0 multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mod= e,=0A= >=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0 addr_space_t as)=0A= >=A0 {=0A= > diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h=0A= > index 95143a099b9..09f36ee2f19 100644=0A= > --- a/gcc/tree-ssa-address.h=0A= > +++ b/gcc/tree-ssa-address.h=0A= > @@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,=0A= >=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 class aff_= tree *, tree, tree, tree, bool);=0A= >=A0 extern void copy_ref_info (tree, tree);=0A= >=A0 tree maybe_fold_tmr (tree);=0A= > +bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode = mode,=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0 addr_space_t as);=0A= >=0A= >=A0 extern unsigned int preferred_mem_scale_factor (tree base,=0A= >=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 ma= chine_mode mem_mode,=0A= > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc=0A= > index a6f926a68ef..d53ba05a4f6 100644=0A= > --- a/gcc/tree-ssa-loop-ivopts.cc=0A= > +++ b/gcc/tree-ssa-loop-ivopts.cc=0A= > @@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly= _int64 ainc_offset,=0A= >=A0=A0=A0 return infinite_cost;=0A= >=A0 }=0A= >=0A= > +static void=0A= > +compute_symbol_and_var_present (tree e1, tree e2,=0A= > +=A0=A0=A0=A0=A0=A0 bool *symbol_present, bool *var_present)=0A= > +{=0A= > +=A0 poly_uint64_pod off1, off2;=0A= > +=0A= > +=A0 e1 =3D strip_offset (e1, &off1);=0A= > +=A0 e2 =3D strip_offset (e2, &off2);=0A= > +=0A= > +=A0 STRIP_NOPS (e1);=0A= > +=A0 STRIP_NOPS (e2);=0A= > +=0A= > +=A0 if (TREE_CODE (e1) =3D=3D ADDR_EXPR)=0A= > +=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0 poly_int64_pod diff;=0A= > +=A0=A0=A0=A0=A0 if (ptr_difference_const (e1, e2, &diff))=0A= > +=A0 {=0A= > +=A0=A0=A0 *symbol_present =3D false;=0A= > +=A0=A0=A0 *var_present =3D false;=0A= > +=A0=A0=A0 return;=0A= > +=A0 }=0A= > +=0A= > +=A0=A0=A0=A0=A0 if (integer_zerop (e2))=0A= > +=A0 {=0A= > +=A0=A0=A0 tree core;=0A= > +=A0=A0=A0 poly_int64_pod bitsize;=0A= > +=A0=A0=A0 poly_int64_pod bitpos;=0A= > +=A0=A0=A0 widest_int mul;=0A= > +=A0=A0=A0 tree toffset;=0A= > +=A0=A0=A0 machine_mode mode;=0A= > +=A0=A0=A0 int unsignedp, reversep, volatilep;=0A= > +=0A= > +=A0=A0=A0 core =3D get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, = &bitpos,=0A= > +=A0=A0=A0=A0=A0 &toffset, &mode, &unsignedp, &reversep, &volatilep);=0A= > +=0A= > +=A0=A0=A0 if (toffset !=3D 0=0A= > +=A0=A0=A0 || !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)=0A= > +=A0=A0=A0 || reversep=0A= > +=A0=A0=A0 || !VAR_P (core))=0A= > +=A0=A0=A0=A0=A0 {=0A= > +=A0=A0=A0 *symbol_present =3D false;=0A= > +=A0=A0=A0 *var_present =3D true;=0A= > +=A0=A0=A0 return;=0A= > +=A0=A0=A0=A0=A0 }=0A= > +=0A= > +=A0=A0=A0 if (TREE_STATIC (core)=0A= > +=A0=A0=A0 || DECL_EXTERNAL (core))=0A= > +=A0=A0=A0=A0=A0 {=0A= > +=A0=A0=A0 *symbol_present =3D true;=0A= > +=A0=A0=A0 *var_present =3D false;=0A= > +=A0=A0=A0 return;=0A= > +=A0=A0=A0=A0=A0 }=0A= > +=0A= > +=A0=A0=A0 *symbol_present =3D false;=0A= > +=A0=A0=A0 *var_present =3D true;=0A= > +=A0=A0=A0 return;=0A= > +=A0 }=0A= > +=0A= > +=A0=A0=A0=A0=A0 *symbol_present =3D false;=0A= > +=A0=A0=A0=A0=A0 *var_present =3D true;=0A= > +=A0=A0=A0 }=0A= > +=A0 *symbol_present =3D false;=0A= > +=0A= > +=A0 if (operand_equal_p (e1, e2, 0))=0A= > +=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0 *var_present =3D false;=0A= > +=A0=A0=A0=A0=A0 return;=0A= > +=A0=A0=A0 }=0A= > +=0A= > +=A0 *var_present =3D true;=0A= > +}=0A= > +=0A= > +static void=0A= > +compute_min_and_max_offset (addr_space_t as,=0A= > +=A0=A0=A0=A0=A0=A0 machine_mode mem_mode, poly_int64_pod *min_offset,=0A= > +=A0=A0=A0=A0=A0=A0 poly_int64_pod *max_offset)=0A= > +{=0A= > +=A0 machine_mode address_mode =3D targetm.addr_space.address_mode (as);= =0A= > +=A0 HOST_WIDE_INT i;=0A= > +=A0 poly_int64_pod off, width;=0A= > +=A0 rtx addr;=0A= > +=A0 rtx reg1;=0A= > +=0A= > +=A0 reg1 =3D gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);=0A= > +=0A= > +=A0 width =3D GET_MODE_BITSIZE (address_mode) - 1;=0A= > +=A0 if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 width =3D HOST_BITS_PER_WIDE_INT - 1;=0A= > +=A0 gcc_assert (width.is_constant ());=0A= > +=A0 addr =3D gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);=0A= > +=0A= > +=A0 off =3D 0;=0A= > +=A0 for (i =3D width.to_constant (); i >=3D 0; i--)=0A= > +=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0 off =3D -(HOST_WIDE_INT_1U << i);=0A= > +=A0=A0=A0=A0=A0 XEXP (addr, 1) =3D gen_int_mode (off, address_mode);=0A= > +=A0=A0=A0=A0=A0 if (memory_address_addr_space_p (mem_mode, addr, as))=0A= > +=A0=A0=A0 break;=0A= > +=A0=A0=A0 }=0A= > +=A0 if (i =3D=3D -1)=0A= > +=A0=A0=A0 *min_offset =3D 0;=0A= > +=A0 else=0A= > +=A0=A0=A0 *min_offset =3D off;=0A= > +=A0 // *min_offset =3D (i =3D=3D -1? 0 : off);=0A= > +=0A= > +=A0 for (i =3D width.to_constant (); i >=3D 0; i--)=0A= > +=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0 off =3D (HOST_WIDE_INT_1U << i) - 1;=0A= > +=A0=A0=A0=A0=A0 XEXP (addr, 1) =3D gen_int_mode (off, address_mode);=0A= > +=A0=A0=A0=A0=A0 if (memory_address_addr_space_p (mem_mode, addr, as))=0A= > +=A0=A0=A0 break;=0A= > +=A0=A0=A0 /* For some strict-alignment targets, the offset must be natur= ally=0A= > +=A0=A0=A0=A0=A0 aligned.=A0 Try an aligned offset if mem_mode is not QIm= ode.=A0 */=0A= > +=A0=A0=A0=A0=A0 off =3D mem_mode !=3D QImode=0A= > +=A0=A0=A0=A0=A0 ? (HOST_WIDE_INT_1U << i)=0A= > +=A0=A0=A0=A0=A0 - (GET_MODE_SIZE (mem_mode))=0A= > +=A0=A0=A0=A0=A0 : 0;=0A= > +=A0=A0=A0=A0=A0 if (known_gt (off, 0))=0A= > +=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0 XEXP (addr, 1) =3D gen_int_mode (off, address_mode);=0A= > +=A0=A0=A0=A0=A0 if (memory_address_addr_space_p (mem_mode, addr, as))=0A= > +=A0=A0=A0 break;=0A= > +=A0=A0=A0 }=0A= > +=A0=A0=A0 }=0A= > +=A0 if (i =3D=3D -1)=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 off =3D 0;=0A= > +=A0 *max_offset =3D off;=0A= > +}=0A= > +=0A= >=A0 /* Return cost of computing USE's address expression by using CAND.=0A= >=A0=A0=A0=A0 AFF_INV and AFF_VAR represent invariant and variant parts of = the=0A= >=A0=A0=A0=A0 address expression, respectively.=A0 If AFF_INV is simple, st= ore=0A= > @@ -4802,6 +4931,13 @@ get_address_cost (struct ivopts_data *data, struct= iv_use *use,=0A= >=A0=A0=A0 /* Only true if ratio !=3D 1.=A0 */=0A= >=A0=A0=A0 bool ok_with_ratio_p =3D false;=0A= >=A0=A0=A0 bool ok_without_ratio_p =3D false;=0A= > +=A0 tree ubase =3D use->iv->base;=0A= > +=A0 tree cbase =3D cand->iv->base, cstep =3D cand->iv->step;=0A= > +=A0 tree utype =3D TREE_TYPE (ubase), ctype;=0A= > +=A0 unsigned HOST_WIDE_INT cstepi;=0A= > +=A0 bool symbol_present =3D false, var_present =3D false, stmt_is_after_= increment;=0A= > +=A0 poly_int64_pod min_offset, max_offset;=0A= > +=A0 bool offset_p, ratio_p;=0A= >=0A= >=A0=A0=A0 if (!aff_combination_const_p (aff_inv))=0A= >=A0=A0=A0=A0=A0 {=0A= > @@ -4915,16 +5051,74 @@ get_address_cost (struct ivopts_data *data, struc= t iv_use *use,=0A= >=A0=A0=A0 gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));= =0A= >=A0=A0=A0 cost +=3D address_cost (addr, mem_mode, as, speed);=0A= >=0A= > -=A0 if (parts.symbol !=3D NULL_TREE)=0A= > -=A0=A0=A0 cost.complexity +=3D 1;=0A= > -=A0 /* Don't increase the complexity of adding a scaled index if it's=0A= > -=A0=A0=A0=A0 the only kind of index that the target allows.=A0 */=0A= > -=A0 if (parts.step !=3D NULL_TREE && ok_without_ratio_p)=0A= > -=A0=A0=A0 cost.complexity +=3D 1;=0A= > -=A0 if (parts.base !=3D NULL_TREE && parts.index !=3D NULL_TREE)=0A= > -=A0=A0=A0 cost.complexity +=3D 1;=0A= > -=A0 if (parts.offset !=3D NULL_TREE && !integer_zerop (parts.offset))=0A= > -=A0=A0=A0 cost.complexity +=3D 1;=0A= > +=A0 if (cst_and_fits_in_hwi (cstep))=0A= > +=A0=A0=A0 cstepi =3D int_cst_value (cstep);=0A= > +=A0 else=0A= > +=A0=A0=A0 cstepi =3D 0;=0A= > +=0A= > +=A0 STRIP_NOPS (cbase);=0A= > +=A0 ctype =3D TREE_TYPE (cbase);=0A= > +=0A= > +=A0 stmt_is_after_increment =3D stmt_after_increment (data->current_loop= , cand,=0A= > +=A0=A0=A0 use->stmt);=0A= > +=0A= > +=A0 if (cst_and_fits_in_hwi (cbase))=0A= > +=A0=A0=A0 compute_symbol_and_var_present (ubase, build_int_cst (utype, 0= ),=0A= > +=A0=A0=A0=A0=A0 &symbol_present, &var_present);=0A= > +=A0 else if (ratio =3D=3D 1)=0A= > +=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0 tree real_cbase =3D cbase;=0A= > +=0A= > +=A0=A0=A0=A0=A0 /* Check to see if any adjustment is needed.=A0 */=0A= > +=A0=A0=A0=A0=A0 if (!cst_and_fits_in_hwi (cstep) && stmt_is_after_increm= ent)=0A= > +=A0=A0=A0=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 aff_tree real_cbase_aff;=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 aff_tree cstep_aff;=0A= > +=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 tree_to_aff_combination (cbase, TREE_TYPE (real= _cbase),=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0 &real_cbase_aff);=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 tree_to_aff_combination (cstep, TREE_TYPE (cste= p), &cstep_aff);=0A= > +=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 aff_combination_add (&real_cbase_aff, &cstep_af= f);=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 real_cbase =3D aff_combination_to_tree (&real_c= base_aff);=0A= > +=A0=A0=A0=A0=A0=A0 }=0A= > +=A0=A0=A0 compute_symbol_and_var_present (ubase, real_cbase,=0A= > +=A0=A0=A0=A0=A0 &symbol_present, &var_present);=0A= > +=A0=A0=A0 }=0A= > +=A0 else if (!POINTER_TYPE_P (ctype)=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0 && multiplier_allowed_in_address_p=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 (ratio, mem_mode,=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 TYPE_= ADDR_SPACE (TREE_TYPE (utype))))=0A= > +=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0 tree real_cbase =3D cbase;=0A= > +=0A= > +=A0=A0=A0=A0=A0 if (cstepi =3D=3D 0 && stmt_is_after_increment)=0A= > +=A0=A0=A0=A0=A0=A0 {=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 if (POINTER_TYPE_P (ctype))=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 real_cbase =3D fold_build2 (POINTER_PLUS_= EXPR, ctype, cbase, cstep);=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0 else=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 real_cbase =3D fold_build2 (PLUS_EXPR, ct= ype, cbase, cstep);=0A= > +=A0=A0=A0=A0=A0=A0 }=0A= > +=A0=A0=A0=A0=A0 real_cbase =3D fold_build2 (MULT_EXPR, ctype, real_cbase= ,=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0 build_int_cst (ctype, ratio));=0A= > +=A0=A0=A0 compute_symbol_and_var_present (ubase, real_cbase,=0A= > +=A0=A0=A0=A0=A0 &symbol_present, &var_present);=0A= > +=A0=A0=A0 }=0A= > +=A0 else=0A= > +=A0=A0=A0 {=0A= > +=A0=A0=A0 compute_symbol_and_var_present (ubase, build_int_cst (utype, 0= ),=0A= > +=A0=A0=A0=A0=A0 &symbol_present, &var_present);=0A= > +=A0=A0=A0 }=0A= > +=0A= > +=A0 compute_min_and_max_offset (as, mem_mode, &min_offset, &max_offset);= =0A= > +=A0 offset_p =3D maybe_ne (aff_inv->offset, 0)=0A= > +=A0=A0=A0=A0=A0=A0 && known_le (min_offset, aff_inv->offset)=0A= > +=A0=A0=A0=A0=A0=A0 && known_le (aff_inv->offset, max_offset);=0A= > +=A0 ratio_p =3D (ratio !=3D 1=0A= > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 && multiplier_allowed_in_address_p (ra= tio, mem_mode, as));=0A= > +=0A= > +=A0 cost.complexity =3D (symbol_present !=3D 0) + (var_present !=3D 0)= =0A= > +=A0=A0=A0=A0=A0=A0 + offset_p + ratio_p;=0A= >=0A= >=A0=A0=A0 return cost;=0A= >=A0 }=0A= > --=0A= > 2.25.1=0A= >=