From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 23 Aug 2023 15:14:51 -0600
From: Jeff Law
To: gcc-patches@gcc.gnu.org
Subject: Re: [committed] Improve quality of code from LRA register elimination
References: <66015ebf-ebec-c249-1f48-3949da228b18@ventanamicro.com>
In-Reply-To: <66015ebf-ebec-c249-1f48-3949da228b18@ventanamicro.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------A7xhEkfztjVtOuvK3xIv6T0k"

This is a multi-part message in MIME format.
--------------A7xhEkfztjVtOuvK3xIv6T0k
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 8/23/23 14:13, Jeff Law wrote:
> This is primarily Jivan's work, I'm mostly responsible for the write-up
> and coordinating with Vlad on a few questions.
>
> On targets with limitations on immediates usable in arithmetic
> instructions, LRA's register elimination phase can construct fairly poor
> code.
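
To make the immediate limitation concrete, here is a rough sketch, assuming
RV64, where addi takes a 12-bit signed immediate (-2048..2047).  A larger
constant is typically built as a multiple of 4096 plus a small remainder
that a following addi supplies.  The helper name split_imm is hypothetical
(it is not a GCC function), and the compiler's real constant synthesis may
make different choices.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only: split V into HI + LO,
   where LO fits a 12-bit signed immediate (-2048..2047) and HI is a
   multiple of 4096.  */
void
split_imm (long v, long *hi, long *lo)
{
  long rem = ((v % 4096) + 4096) % 4096;  /* Remainder in 0..4095.  */
  if (rem >= 2048)
    rem -= 4096;                          /* Fold into -2048..2047.  */
  assert (rem >= -2048 && rem <= 2047);
  *lo = rem;
  *hi = v - rem;
}
```

For a stack adjustment of -4000000, split_imm yields hi = -4001792 and
lo = 1792, which is the "li t0,-4001792" / "addi t0,t0,1792" pair seen in
the generated code quoted in this message.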
>
> This example (from the GCC testsuite) illustrates the problem well.
>
> int  consume (void *);
> int foo (void) {
>   int x[1000000];
>   return consume (x + 1000);
> }
>
> If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc
> -mabi=lp64d", then you'll get this code (up to the call to consume()).
>
>         .cfi_startproc
>         li      t0,-4001792
>         li      a0,-3997696
>         li      a5,4001792
>         addi    sp,sp,-16
>         .cfi_def_cfa_offset 16
>         addi    t0,t0,1792
>         addi    a0,a0,1696
>         addi    a5,a5,-1792
>         sd      ra,8(sp)
>         add     a5,a5,a0
>         add     sp,sp,t0
>         .cfi_def_cfa_offset 4000016
>         .cfi_offset 1, -8
>         add     a0,a5,sp
>         call    consume
>
> Of particular interest is the value in a0 when we call consume. We
> compute that horribly inefficiently.  If we back-substitute from the
> final assignment to a0 we get...
>
> a0 = a5 + sp
> a0 = a5 + (sp + t0)
> a0 = (a5 + a0) + (sp + t0)
> a0 = ((a5 - 1792) + a0) + (sp + t0)
> a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
> a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
> a0 = (a5 + (a0 + 1696)) + (sp + t0)  // removed offsetting terms
> a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
> a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
> a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
> a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
> a0 = (-3997696 + 1696) + (sp - 16)  // removed offsetting terms
> a0 = sp - 3996016
>
> That's a pretty convoluted way to compute sp - 3996016.
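
The back-substitution can also be checked mechanically.  The sketch below
models each arithmetic instruction of the generated sequence, and of the
shorter sequence quoted next, as plain C assignments.  The function names
are hypothetical, the store of ra and the CFI directives are left out
because they do not affect a0, and sp is treated as an ordinary integer.

```c
#include <assert.h>

/* Model of the sequence the compiler emitted; returns a0 at the call.  */
long long
a0_original (long long sp)
{
  long long t0 = -4001792;      /* li   t0,-4001792 */
  long long a0 = -3997696;      /* li   a0,-3997696 */
  long long a5 = 4001792;       /* li   a5,4001792  */
  sp += -16;                    /* addi sp,sp,-16   */
  t0 += 1792;                   /* addi t0,t0,1792  */
  a0 += 1696;                   /* addi a0,a0,1696  */
  a5 += -1792;                  /* addi a5,a5,-1792 */
  a5 += a0;                     /* add  a5,a5,a0    */
  sp += t0;                     /* add  sp,sp,t0    */
  return a5 + sp;               /* add  a0,a5,sp    */
}

/* Model of the shorter sequence; returns a0 at the call.  */
long long
a0_improved (long long sp)
{
  long long t0, a0;
  sp += -16;                    /* addi sp,sp,-16   */
  t0 = -4001792;                /* li   t0,-4001792 */
  t0 += 1792;                   /* addi t0,t0,1792  */
  sp += t0;                     /* add  sp,sp,t0    */
  a0 = 4096;                    /* li   a0,4096     */
  a0 += -96;                    /* addi a0,a0,-96   */
  return sp + a0;               /* add  a0,sp,a0    */
}

/* Abort if the two sequences disagree for the given entry sp.  */
void
check_sequences (long long sp_entry)
{
  assert (a0_original (sp_entry) == a0_improved (sp_entry));
}
```

Calling check_sequences with any entry value of sp should pass, since both
models add the same net constant to it; passing 0 to either function
returns that net offset directly.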
>
> Something like this would be notably better (not great, but we need both
> the stack adjustment and the address of the object to pass to consume):
>
>    addi sp,sp,-16
>    sd ra,8(sp)
>    li t0,-4001792
>    addi t0,t0,1792
>    add sp,sp,t0
>    li a0,4096
>    addi a0,a0,-96
>    add a0,sp,a0
>    call consume
>
> The problem is LRA's elimination code is not handling the case where we
> have (plus (reg1) (reg2)) where reg1 is an eliminable register and reg2
> has a known equivalency, particularly a constant.
>
> If we can determine that reg2 is equivalent to a constant and treat
> (plus (reg1) (reg2)) in the same way we'd treat (plus (reg1)
> (const_int)) then we can get the desired code.
>
> This eliminates about 19b instructions, or roughly 1% for deepsjeng on
> rv64.  There are improvements elsewhere, but they're relatively small.
> This may ultimately lessen the value of Manolis's fold-mem-offsets
> patch.  So we'll have to evaluate that again once he posts a new version.
>
> Bootstrapped and regression tested on x86_64 as well as bootstrapped on
> rv64.  Earlier versions have been tested against spec2017.  Pre-approved
> by Vlad in a private email conversation (thanks Vlad!).
>
> Committed to the trunk,

Whoops.  Attached the wrong patch :-)  This is the right one.
jeff
--------------A7xhEkfztjVtOuvK3xIv6T0k
Content-Type: text/plain; charset=UTF-8; name="P"
Content-Disposition: attachment; filename="P"
Content-Transfer-Encoding: 7bit

commit 6619b3d4c15cd754798b1048c67f3806bbcc2e6d
Author: Jivan Hakobyan
Date:   Wed Aug 23 14:10:30 2023 -0600

    Improve quality of code from LRA register elimination
    
    This is primarily Jivan's work, I'm mostly responsible for the write-up and
    coordinating with Vlad on a few questions.
    
    On targets with limitations on immediates usable in arithmetic instructions,
    LRA's register elimination phase can construct fairly poor code.
    
    This example (from the GCC testsuite) illustrates the problem well.
    
    int  consume (void *);
    int foo (void) {
      int x[1000000];
      return consume (x + 1000);
    }
    
    If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc -mabi=lp64d", then
    you'll get this code (up to the call to consume()).
    
            .cfi_startproc
            li      t0,-4001792
            li      a0,-3997696
            li      a5,4001792
            addi    sp,sp,-16
            .cfi_def_cfa_offset 16
            addi    t0,t0,1792
            addi    a0,a0,1696
            addi    a5,a5,-1792
            sd      ra,8(sp)
            add     a5,a5,a0
            add     sp,sp,t0
            .cfi_def_cfa_offset 4000016
            .cfi_offset 1, -8
            add     a0,a5,sp
            call    consume
    
    Of particular interest is the value in a0 when we call consume. We compute that
    horribly inefficiently.  If we back-substitute from the final assignment to a0
    we get...
    
    a0 = a5 + sp
    a0 = a5 + (sp + t0)
    a0 = (a5 + a0) + (sp + t0)
    a0 = ((a5 - 1792) + a0) + (sp + t0)
    a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
    a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
    a0 = (a5 + (a0 + 1696)) + (sp + t0)  // removed offsetting terms
    a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
    a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
    a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
    a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
    a0 = (-3997696 + 1696) + (sp - 16)  // removed offsetting terms
    a0 = sp - 3996016
    
    That's a pretty convoluted way to compute sp - 3996016.
    
    Something like this would be notably better (not great, but we need both the
    stack adjustment and the address of the object to pass to consume):
    
       addi sp,sp,-16
       sd ra,8(sp)
       li t0,-4001792
       addi t0,t0,1792
       add sp,sp,t0
       li a0,4096
       addi a0,a0,-96
       add a0,sp,a0
       call consume
    
    The problem is LRA's elimination code is not handling the case where we have
    (plus (reg1) (reg2)) where reg1 is an eliminable register and reg2 has a known
    equivalency, particularly a constant.
    
    If we can determine that reg2 is equivalent to a constant and treat (plus
    (reg1) (reg2)) in the same way we'd treat (plus (reg1) (const_int)) then we can
    get the desired code.
    
    This eliminates about 19b instructions, or roughly 1% for deepsjeng on rv64.
    There are improvements elsewhere, but they're relatively small.  This may
    ultimately lessen the value of Manolis's fold-mem-offsets patch.  So we'll have
    to evaluate that again once he posts a new version.
    
    Bootstrapped and regression tested on x86_64 as well as bootstrapped on rv64.
    Earlier versions have been tested against spec2017.  Pre-approved by Vlad in a
    private email conversation (thanks Vlad!).
    
    Committed to the trunk,
    
    gcc/
            * lra-eliminations.cc (eliminate_regs_in_insn): Use equivalences to
            help simplify code further.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 3c58d4a3815..df613cdda76 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -926,6 +926,18 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
       /* First see if the source is of the form (plus (...) CST).  */
       if (plus_src && poly_int_rtx_p (XEXP (plus_src, 1), &offset))
 	plus_cst_src = plus_src;
+      /* If we are doing initial offset computation, then utilize
+	 equivalences to discover a constant for the second term
+	 of PLUS_SRC.  */
+      else if (plus_src && REG_P (XEXP (plus_src, 1)))
+	{
+	  int regno = REGNO (XEXP (plus_src, 1));
+	  if (regno < ira_reg_equiv_len
+	      && ira_reg_equiv[regno].constant != NULL_RTX
+	      && !replace_p
+	      && poly_int_rtx_p (ira_reg_equiv[regno].constant, &offset))
+	    plus_cst_src = plus_src;
+	}
       /* Check that the first operand of the PLUS is a hard reg or
 	 the lowpart subreg of one.  */
       if (plus_cst_src)
--------------A7xhEkfztjVtOuvK3xIv6T0k--
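
For readers who do not live in LRA, the idea described in the message
(treat (plus (reg1) (reg2)) like (plus (reg1) (const_int)) when reg2 has a
known constant equivalence) can be sketched as a toy model.  Everything
below is hypothetical: toy_operand and toy_equiv merely stand in for RTL
operands and the register equivalence table, and second_term_offset is not
a GCC function.  As in the attached patch, the equivalence is only
consulted when replace_p is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the second operand of a PLUS: either an integer
   constant or a pseudo register number.  */
struct toy_operand
{
  bool is_const;
  long value;   /* Valid when is_const is true.  */
  int regno;    /* Valid when is_const is false.  */
};

/* Toy stand-in for one entry of a register equivalence table.  */
struct toy_equiv
{
  bool has_constant;
  long constant;
};

/* Return true and set *OFFSET if OP can be treated as a constant term:
   either it already is one, or (when REPLACE_P is false) it is a
   register with a known constant equivalence in EQUIV.  */
bool
second_term_offset (const struct toy_operand *op,
                    const struct toy_equiv *equiv, int equiv_len,
                    bool replace_p, long *offset)
{
  assert (op != NULL && offset != NULL);
  if (op->is_const)
    {
      *offset = op->value;
      return true;
    }
  if (!replace_p
      && op->regno >= 0
      && op->regno < equiv_len
      && equiv[op->regno].has_constant)
    {
      *offset = equiv[op->regno].constant;
      return true;
    }
  return false;
}
```

In this model a register operand whose table entry holds 4000 is reported
as the offset 4000, just as a literal constant operand would be, while a
register with no recorded constant is left alone.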