From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pg1-x52a.google.com (mail-pg1-x52a.google.com [IPv6:2607:f8b0:4864:20::52a])
        by sourceware.org (Postfix) with ESMTPS id 0EF013858D33
        for ; Wed, 9 Aug 2023 19:31:45 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0EF013858D33
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-pg1-x52a.google.com with SMTP id 41be03b00d2f7-563e860df0fso168489a12.2
        for ; Wed, 09 Aug 2023 12:31:45 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1691609504; x=1692214304;
        h=in-reply-to:from:references:to:content-language:subject:user-agent
         :mime-version:date:message-id:from:to:cc:subject:date:message-id
         :reply-to;
        bh=Jxx/rNtlgk1xDAZAj+8wVXJCbVWSrJ5ZuPgWxqQbY3c=;
        b=Aig1OmMTcWKr7XB7GJDE+5SjZLk/VEWwYbKCwLa+qviJYeDuTxmIhttTbyKynBSAkJ
         QMm/BqSlVTo5yuMtqfvu9198UZN46HS9JxCRpntSgZhMxi6xzVBJKWPE42XgP4sQw3hy
         UurHmrJX5EHo1oPE4e7o1k24yUp+Eeq2xi+nEKyfR4tljUQtLgu918zIZT/Ofke3YPX7
         dAQkGmgg5sBxyiAo+s3uKqnzCJGsLHz9mDZGo3jZkC3MLEGaUZ/PgIN79WWn51nr/rOu
         OrL5/fps/zucHVyL2Yk0f6FbxxwTrP8zkJyrvc655G447133Pysb+ZUL+IMmRLpoGSM1
         TvZg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1691609504; x=1692214304;
        h=in-reply-to:from:references:to:content-language:subject:user-agent
         :mime-version:date:message-id:x-gm-message-state:from:to:cc:subject
         :date:message-id:reply-to;
        bh=Jxx/rNtlgk1xDAZAj+8wVXJCbVWSrJ5ZuPgWxqQbY3c=;
        b=b1Cp/XOOCaX+fL6JEH3KYy+Zp/eoiA82DAFCVm1jpyRR0/gMfNgcc+igv4uV2+M3yR
         TExzHw1RoLn+7p938Wu8QX4a5rO7KhiGfb7JLZQ/4L6cTQMQ0LD/w7BvVjXh9KViXAHF
         +EJzalW2fc8QcuF8FkHbeNaec/FqRrCwTcC+BYUUYTHUQzrzkskF3rnogAYD4xp0EBp+
         VjM9Jl29hEhacKT0NGhdKoh0k5EtZPwTjsYQtCF9AsK/iij5UBNr5gxPLeFRDo9BzgDO
         mBU9P9Jd0cLG3AK2QGRXMUpJw0Xz/ymEmL/ZEanLRvlyPf2ivU3ZBSE4ByL/YJLaFIkt
         ryLQ==
X-Gm-Message-State: AOJu0Yw88aEebaHV/tzwa6VaWklUDxtHubvhTmOwt8E3dXUF/n8nDtis
        Fb1BMK+aoFB7JTMBqRFU8KftdWy723k=
X-Google-Smtp-Source: AGHT+IEWsXLV+g0wo9xtn0GwLfJL2B/5QRx/4DBwpFQ9vKvy8BKFKfofvIBScX4pu2rnxwrvtGQlcQ==
X-Received: by 2002:a17:90a:c285:b0:267:909f:3719 with SMTP id f5-20020a17090ac28500b00267909f3719mr248586pjt.19.1691609503843;
        Wed, 09 Aug 2023 12:31:43 -0700 (PDT)
Received: from [172.31.0.109] ([136.36.130.248])
        by smtp.gmail.com with ESMTPSA id jh2-20020a170903328200b001a6d4ea7301sm11537511plb.251.2023.08.09.12.31.42
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Wed, 09 Aug 2023 12:31:43 -0700 (PDT)
Content-Type: multipart/mixed; boundary="------------Lh75QF0MhUN50QuoNY8gAYyr"
Message-ID: <606e52ed-019d-9347-5a24-41385b62f567@gmail.com>
Date: Wed, 9 Aug 2023 13:31:41 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
Subject: Re: RISC-V: Folding memory for FP + constant case
Content-Language: en-US
To: Jivan Hakobyan , gcc-patches@gcc.gnu.org
References:
From: Jeff Law
In-Reply-To:
X-Spam-Status: No, score=-10.4 required=5.0
        tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP
        autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

This is a multi-part message in MIME format.
--------------Lh75QF0MhUN50QuoNY8gAYyr
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:
> Accessing local arrays element turned into load form (fp + (index << C1)) +
> C2 address.
> In the case when access is in the loop we got loop invariant computation.
> For some reason, moving out that part cannot be done in
> loop-invariant passes.
> But we can handle that in target-specific hook (legitimize_address).
> That provides an opportunity to rewrite memory access more suitable for the
> target architecture.
>
> This patch solves the mentioned case by rewriting mentioned case to ((fp +
> C2) + (index << C1))
> I have evaluated it on SPEC2017 and got an improvement on leela (over 7b
> instructions,
> .39% of the dynamic count) and dwarfs the regression for gcc (14m
> instructions, .0012%
> of the dynamic count).
>
>
> gcc/ChangeLog:
>         * config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
>         (mem_shadd_or_shadd_rtx_p): New predicate.

So I poked a bit more in this space today.

As you may have noted, Manolis's patch still needs another rev.  But I
was able to test this patch in conjunction with the f-m-o patch as well
as the additional improvements made to hard register cprop.  The net
result was that this patch still shows a nice decrease in instruction
counts on leela.  It's a bit of a mixed bag elsewhere.

I dove a bit deeper into the small regression in x264.  In the case I
looked at, the patch regresses because the original form of the address
calculations exposes a common subexpression, i.e.

addr1 = (reg1 << 2) + fp + C1
addr2 = (reg1 << 2) + fp + C2

(reg1 << 2) + fp is a common subexpression, resulting in something like
this as we leave CSE:

t = (reg1 << 2) + fp;
addr1 = t + C1
addr2 = t + C2
mem (addr1)
mem (addr2)

C1 and C2 are small constants, so combine generates

t = (reg1 << 2) + fp;
mem (t+C1)
mem (t+C2)

FP elimination occurs after IRA and we get:

t2 = sp + C3
t = (reg1 << 2) + t2
mem (t + C1)
mem (t + C2)

Not bad.  Manolis's work should allow us to improve that a bit more.

With this patch we don't capture the CSE and ultimately generate
slightly worse code.  This kind of issue is fairly inherent in
reassociations -- and given the regression is 2 orders of magnitude
smaller than the improvement, my inclination is to go forward with this
patch.
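
As a rough illustration of the trade-off described above, here is a small
standalone C++ sketch.  It is not GCC code: addresses are modeled as plain
64-bit integers, and the names addr_original, addr_reassociated and
forms_agree are hypothetical.  It only shows that the two groupings compute
the same address, and where the shared or invariant piece sits in each one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not GCC code: addresses are plain 64-bit integers
// and arithmetic wraps modulo 2^64, as pointer-sized adds do in hardware.

// Original grouping: ((index << shift) + fp) + offset.
// The inner sum does not depend on offset, so two accesses that differ
// only in offset can share it (the CSE opportunity in the x264 case).
std::uint64_t addr_original(std::uint64_t fp, std::uint64_t index,
                            unsigned shift, std::int64_t offset) {
    std::uint64_t t = (index << shift) + fp;
    return t + static_cast<std::uint64_t>(offset);
}

// Reassociated grouping: (fp + offset) + (index << shift).
// Here fp + offset does not depend on index, so it is invariant in a
// loop that only varies index (the leela-style improvement).
std::uint64_t addr_reassociated(std::uint64_t fp, std::uint64_t index,
                                unsigned shift, std::int64_t offset) {
    std::uint64_t base = fp + static_cast<std::uint64_t>(offset);
    return base + (index << shift);
}

// Returns true when both groupings agree for every index in [0, n).
bool forms_agree(std::uint64_t fp, unsigned shift, std::int64_t offset,
                 std::uint64_t n) {
    for (std::uint64_t i = 0; i < n; ++i) {
        if (addr_original(fp, i, shift, offset)
            != addr_reassociated(fp, i, shift, offset))
            return false;
    }
    return true;
}
```

Calling forms_agree (0x7fff0000, 2, 16, 64) checks the equivalence for a
4-byte element array at offset 16.  Which grouping produces better code
depends on whether the surrounding code varies the index or the offset,
which is the tension between the leela and x264 results.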

I've fixed a few formatting issues, changed one conditional to use
CONST_INT_P rather than checking the code directly, and pushed the final
version to the trunk.

Thanks for your patience.

jeff
--------------Lh75QF0MhUN50QuoNY8gAYyr
Content-Type: text/plain; charset=UTF-8; name="P"
Content-Disposition: attachment; filename="P"
Content-Transfer-Encoding: base64

Y29tbWl0IGExNmRjNzI5ZmRhOWZhYmQ2NDcyZDUwY2NlNDU3OTFjYjNiNmFkYTgKQXV0aG9y OiBKaXZhbiBIYWtvYnlhbiA8aml2YW5oYWtvYnlhbjlAZ21haWwuY29tPgpEYXRlOiAgIFdl ZCBBdWcgOSAxMzoyNjo1OCAyMDIzIC0wNjAwCgogICAgUklTQy1WOiBGb2xkaW5nIG1lbW9y eSBmb3IgRlAgKyBjb25zdGFudCBjYXNlCiAgICAKICAgIEFjY2Vzc2luZyBsb2NhbCBhcnJh eXMgZWxlbWVudCB0dXJuZWQgaW50byBsb2FkIGZvcm0gKGZwICsgKGluZGV4IDw8IEMxKSkg KwogICAgQzIgYWRkcmVzcy4KICAgIAogICAgSW4gdGhlIGNhc2Ugd2hlbiBhY2Nlc3MgaXMg aW4gdGhlIGxvb3Agd2UgZ290IGxvb3AgaW52YXJpYW50IGNvbXB1dGF0aW9uLiAgRm9yCiAg ICBzb21lIHJlYXNvbiwgbW92aW5nIG91dCB0aGF0IHBhcnQgY2Fubm90IGJlIGRvbmUgaW4g bG9vcC1pbnZhcmlhbnQgcGFzc2VzLiAgQnV0CiAgICB3ZSBjYW4gaGFuZGxlIHRoYXQgaW4g dGFyZ2V0LXNwZWNpZmljIGhvb2sgKGxlZ2l0aW1pemVfYWRkcmVzcykuICBUaGF0IHByb3Zp ZGVzCiAgICBhbiBvcHBvcnR1bml0eSB0byByZXdyaXRlIG1lbW9yeSBhY2Nlc3MgbW9yZSBz dWl0YWJsZSBmb3IgdGhlIHRhcmdldAogICAgYXJjaGl0ZWN0dXJlLgogICAgCiAgICBUaGlz IHBhdGNoIHNvbHZlcyB0aGUgbWVudGlvbmVkIGNhc2UgYnkgcmV3cml0aW5nIG1lbnRpb25l ZCBjYXNlIHRvICgoZnAgKwogICAgQzIpICsgKGluZGV4IDw8IEMxKSkKICAgIAogICAgSSBo YXZlIGV2YWx1YXRlZCBpdCBvbiBTUEVDMjAxNyBhbmQgZ290IGFuIGltcHJvdmVtZW50IG9u IGxlZWxhIChvdmVyIDdiCiAgICBpbnN0cnVjdGlvbnMsIC4zOSUgb2YgdGhlIGR5bmFtaWMg Y291bnQpIGFuZCBkd2FyZnMgdGhlIHJlZ3Jlc3Npb24gZm9yIGdjYyAoMTRtCiAgICBpbnN0 cnVjdGlvbnMsIC4wMDEyJSBvZiB0aGUgZHluYW1pYyBjb3VudCkuCiAgICAKICAgIGdjYy9D aGFuZ2VMb2c6CiAgICAgICAgICAgICogY29uZmlnL3Jpc2N2L3Jpc2N2LmNjIChyaXNjdl9s ZWdpdGltaXplX2FkZHJlc3MpOiBIYW5kbGUgZm9sZGluZy4KICAgICAgICAgICAgKG1lbV9z aGFkZF9vcl9zaGFkZF9ydHhfcCk6IE5ldyBmdW5jdGlvbi4KCmRpZmYgLS1naXQgYS9nY2Mv Y29uZmlnL3Jpc2N2L3Jpc2N2LmNjIGIvZ2NjL2NvbmZpZy9yaXNjdi9yaXNjdi5jYwppbmRl
eCA3Nzg5MmRhMjkyMC4uN2YyMDQxYTU0YmEgMTAwNjQ0Ci0tLSBhL2djYy9jb25maWcvcmlz Y3YvcmlzY3YuY2MKKysrIGIvZ2NjL2NvbmZpZy9yaXNjdi9yaXNjdi5jYwpAQCAtMTgwNSw2 ICsxODA1LDIyIEBAIHJpc2N2X3Nob3J0ZW5fbHdfb2Zmc2V0IChydHggYmFzZSwgSE9TVF9X SURFX0lOVCBvZmZzZXQpCiAgIHJldHVybiBhZGRyOwogfQogCisvKiBIZWxwZXIgZm9yIHJp c2N2X2xlZ2l0aW1pemVfYWRkcmVzcy4gR2l2ZW4gWCwgcmV0dXJuIHRydWUgaWYgaXQKKyAg IGlzIGEgbGVmdCBzaGlmdCBieSAxLCAyIG9yIDMgcG9zaXRpb25zIG9yIGEgbXVsdGlwbHkg YnkgMiwgNCBvciA4LgorCisgICBUaGlzIHJlc3BlY3RpdmVseSByZXByZXNlbnQgY2Fub25p Y2FsIHNoaWZ0LWFkZCBydHhzIG9yIHNjYWxlZAorICAgbWVtb3J5IGFkZHJlc3Nlcy4gICov CitzdGF0aWMgYm9vbAorbWVtX3NoYWRkX29yX3NoYWRkX3J0eF9wIChydHggeCkKK3sKKyAg cmV0dXJuICgoR0VUX0NPREUgKHgpID09IEFTSElGVAorCSAgIHx8IEdFVF9DT0RFICh4KSA9 PSBNVUxUKQorCSAgJiYgQ09OU1RfSU5UX1AgKFhFWFAgKHgsIDEpKQorCSAgJiYgKChHRVRf Q09ERSAoeCkgPT0gQVNISUZUICYmIElOX1JBTkdFIChJTlRWQUwgKFhFWFAgKHgsIDEpKSwg MSwgMykpCisJICAgICAgfHwgKEdFVF9DT0RFICh4KSA9PSBNVUxUCisJCSAgJiYgSU5fUkFO R0UgKGV4YWN0X2xvZzIgKElOVFZBTCAoWEVYUCAoeCwgMSkpKSwgMSwgMykpKSk7Cit9CisK IC8qIFRoaXMgZnVuY3Rpb24gaXMgdXNlZCB0byBpbXBsZW1lbnQgTEVHSVRJTUlaRV9BRERS RVNTLiAgSWYgWCBjYW4KICAgIGJlIGxlZ2l0aW1pemVkIGluIGEgd2F5IHRoYXQgdGhlIGdl bmVyaWMgbWFjaGluZXJ5IG1pZ2h0IG5vdCBleHBlY3QsCiAgICByZXR1cm4gYSBuZXcgYWRk cmVzcywgb3RoZXJ3aXNlIHJldHVybiBOVUxMLiAgTU9ERSBpcyB0aGUgbW9kZSBvZgpAQCAt MTgzMCw2ICsxODQ2LDMyIEBAIHJpc2N2X2xlZ2l0aW1pemVfYWRkcmVzcyAocnR4IHgsIHJ0 eCBvbGR4IEFUVFJJQlVURV9VTlVTRUQsCiAgICAgICBydHggYmFzZSA9IFhFWFAgKHgsIDAp OwogICAgICAgSE9TVF9XSURFX0lOVCBvZmZzZXQgPSBJTlRWQUwgKFhFWFAgKHgsIDEpKTsK IAorICAgICAgLyogSGFuZGxlIChwbHVzIChwbHVzIChtdWx0IChhKSAobWVtX3NoYWRkX2Nv bnN0YW50KSkgKGZwKSkgKEMpKSBjYXNlLiAgKi8KKyAgICAgIGlmIChHRVRfQ09ERSAoYmFz ZSkgPT0gUExVUyAmJiBtZW1fc2hhZGRfb3Jfc2hhZGRfcnR4X3AgKFhFWFAgKGJhc2UsIDAp KQorCSAgJiYgU01BTExfT1BFUkFORCAob2Zmc2V0KSkKKwl7CisJICBydHggaW5kZXggPSBY RVhQIChiYXNlLCAwKTsKKwkgIHJ0eCBmcCA9IFhFWFAgKGJhc2UsIDEpOworCSAgaWYgKFJF R05PIChmcCkgPT0gVklSVFVBTF9TVEFDS19WQVJTX1JFR05VTSkKKwkgICAgeworCisJICAg 
ICAgLyogSWYgd2Ugd2VyZSBnaXZlbiBhIE1VTFQsIHdlIG11c3QgZml4IHRoZSBjb25zdGFu dAorCQkgYXMgd2UncmUgZ29pbmcgdG8gY3JlYXRlIHRoZSBBU0hJRlQgZm9ybS4gICovCisJ ICAgICAgaW50IHNoaWZ0X3ZhbCA9IElOVFZBTCAoWEVYUCAoaW5kZXgsIDEpKTsKKwkgICAg ICBpZiAoR0VUX0NPREUgKGluZGV4KSA9PSBNVUxUKQorCQlzaGlmdF92YWwgPSBleGFjdF9s b2cyIChzaGlmdF92YWwpOworCisJICAgICAgcnR4IHJlZzEgPSBnZW5fcmVnX3J0eCAoUG1v ZGUpOworCSAgICAgIHJ0eCByZWcyID0gZ2VuX3JlZ19ydHggKFBtb2RlKTsKKwkgICAgICBy dHggcmVnMyA9IGdlbl9yZWdfcnR4IChQbW9kZSk7CisJICAgICAgcmlzY3ZfZW1pdF9iaW5h cnkgKFBMVVMsIHJlZzEsIGZwLCBHRU5fSU5UIChvZmZzZXQpKTsKKwkgICAgICByaXNjdl9l bWl0X2JpbmFyeSAoQVNISUZULCByZWcyLCBYRVhQIChpbmRleCwgMCksIEdFTl9JTlQgKHNo aWZ0X3ZhbCkpOworCSAgICAgIHJpc2N2X2VtaXRfYmluYXJ5IChQTFVTLCByZWczLCByZWcy LCByZWcxKTsKKworCSAgICAgIHJldHVybiByZWczOworCSAgICB9CisJfQorCiAgICAgICBp ZiAoIXJpc2N2X3ZhbGlkX2Jhc2VfcmVnaXN0ZXJfcCAoYmFzZSwgbW9kZSwgZmFsc2UpKQog CWJhc2UgPSBjb3B5X3RvX21vZGVfcmVnIChQbW9kZSwgYmFzZSk7CiAgICAgICBpZiAob3B0 aW1pemVfZnVuY3Rpb25fZm9yX3NpemVfcCAoY2Z1bikK --------------Lh75QF0MhUN50QuoNY8gAYyr--