From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x42f.google.com (mail-pf1-x42f.google.com [IPv6:2607:f8b0:4864:20::42f]) by sourceware.org (Postfix) with ESMTPS id 7C5B13857B9B for ; Thu, 8 Jun 2023 05:37:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7C5B13857B9B Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pf1-x42f.google.com with SMTP id d2e1a72fcca58-662f0feafb2so178387b3a.1 for ; Wed, 07 Jun 2023 22:37:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1686202643; x=1688794643; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=z05QREQ7xSQAXYZBFReJv06Hw07ywaI3roBL8XNnaeY=; b=GCDs3OkumC61iQFLkEDYSxNNZRrhrb4Dxecgfca26x2XxGPSHst/Q7jiSizSd/vl2Z dICzWcFncrv805wndWDvDWLjSyrddz7CFqu8mR5LO5AF8SSZoxt+fvsbaF0Y84MX+hYi O149YUNGlYysAimevGD2bqtbB6IfyQpDcgCro8rMbiDtClImw2/YrDuVH6hEIxOzICek 0FM7a2crPiac1W7MZJfTDUj8cxACMg6eJMcZOmyeggUy+LaDWedJt+c/8DSjl/08l74y 4R5G7HkyZJmfjqqb/cYFCRtz3I4a9Pvc1KIfSFc9cNWejbzyXFAPf4XDTMYZ/o5RvZ/h f8cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686202643; x=1688794643; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=z05QREQ7xSQAXYZBFReJv06Hw07ywaI3roBL8XNnaeY=; b=MJeCku88Uc8krfFWkP/6ZggggRLgvanQeOayBT6aTSZoVdhbC28ESDZlESFMKNYC7o ipgQOI4DazkG55S0kE80dpCS0CkmybbHPlQASY7RZ+NLmIHHNYukyBF3BBv5wnZw/Qtd yTES5jjt5nFiVNRbBk0CPM7SvOSmmlAf8QTeBGihiW7zbGN0wzEvQkHAjqVTVMp4opUm eriqz1dbf8+0rkUiipVh+CCQC418jPjC/344z+fc9C9XVVpBc0ZCwMQVeiITgXYOIdvm m8E+B4iO9YNVH6YB9Qrj85Qj76o6vQEKyPSpVWBGFAujp8b3VCLsX1vaiREILFmCuIBL j1yw== X-Gm-Message-State: AC+VfDyRSrMnt4H/TwZBDK63ATKoJtlSyT9R7h9aGI4Cn5pOVOA1Ewe8 V7nMCEz14dNRQVz0yk652xQ= X-Google-Smtp-Source: ACHHUZ69R+5keyt3H1RtEP2mvf2Tsw4eJKGM+ekKBJYcKNIrAlS1TRk+ksLX2ju5VDbqL/rDgz837A== X-Received: by 2002:a05:6a20:430b:b0:100:c3fe:a653 with SMTP id h11-20020a056a20430b00b00100c3fea653mr1273915pzk.29.1686202642514; Wed, 07 Jun 2023 22:37:22 -0700 (PDT) Received: from [172.31.0.109] ([136.36.130.248]) by smtp.gmail.com with ESMTPSA id d16-20020aa78690000000b006505bae11bcsm289026pfo.23.2023.06.07.22.37.21 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 07 Jun 2023 22:37:22 -0700 (PDT) Message-ID: <91d71dae-b235-fbd0-c8f0-001b7f1e444c@gmail.com> Date: Wed, 7 Jun 2023 23:37:20 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1 Subject: Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets. Content-Language: en-US To: Manolis Tsamis , gcc-patches@gcc.gnu.org Cc: Richard Biener , Palmer Dabbelt , Philipp Tomsich , Kito Cheng References: <20230525123550.1072506-1-manolis.tsamis@vrull.eu> <20230525123550.1072506-2-manolis.tsamis@vrull.eu> From: Jeff Law In-Reply-To: <20230525123550.1072506-2-manolis.tsamis@vrull.eu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 5/25/23 06:35, Manolis Tsamis wrote: > Implementation of the new RISC-V optimization pass for memory offset > calculations, documentation and testcases. > > gcc/ChangeLog: > > * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs. > * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new > pass. > * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare. > * config/riscv/riscv.opt: New options. > * config/riscv/t-riscv: New build rule. > * doc/invoke.texi: Document new option. > * config/riscv/riscv-fold-mem-offsets.cc: New file. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/fold-mem-offsets-1.c: New test. > * gcc.target/riscv/fold-mem-offsets-2.c: New test. > * gcc.target/riscv/fold-mem-offsets-3.c: New test. So not going into the guts of the patch yet. From a benchmark standpoint the only two that get out of the +-0.05% range are mcf and deepsjeng (from a dynamic instruction standpoint). So from an evaluation standpoint we can probably focus our efforts there. And as we know, mcf is actually memory bound, so while improving its dynamic instruction count is good, the end performance improvement may be marginal. As I mentioned to Philipp many months ago this reminds me a lot of a problem I've seen before. Basically register elimination emits code that can be terrible in some circumstances. So I went and poked at this again. I think the key difference between now and what I was dealing with before is for the cases that really matter for rv64 we have a shNadd insn in the sequence. That private port I was working on before did not have shNadd (don't ask, I probably can't tell). Our target also had reg+reg addressing modes. What I can't remember was if we were trying harder to fold the constant terms into the memory reference or if we were more focused on the reg+reg. Ultimately it's probably not that important to remember -- the key is there are very significant differences in the target's capabilities which impact how we should be generating code in this case. Those differences affect the code we generate *and* the places where we can potentially get control and do some address rewriting. A key sequence in mcf looks something like this in IRA, others have similar structure: > (insn 237 234 239 26 (set (reg:DI 377) > (plus:DI (ashift:DI (reg:DI 200 [ _173 ]) > (const_int 3 [0x3])) > (reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd} > (nil)) > (insn 239 237 235 26 (set (reg/f:DI 380) > (plus:DI (reg:DI 513) > (reg:DI 377))) "pbeampp.c":139:15 5 {adddi3} > (expr_list:REG_DEAD (reg:DI 377) > (expr_list:REG_EQUAL (plus:DI (reg:DI 377) > (const_int -32768 [0xffffffffffff8000])) > (nil)))) [ ... ] > (insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ]) > (mem/f:DI (plus:DI (reg/f:DI 380) > (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit} > (expr_list:REG_DEAD (reg/f:DI 380) > (nil))) The key here is insn 237. It's generally going to be bad to have FP show up in a shadd insn because its going to be eliminated into sp+offset. That'll generate an input reload before insn 237 and we can't do any combination with the constant in insn 239. After LRA it looks like this: > (insn 1540 234 1541 26 (set (reg:DI 11 a1 [750]) > (const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit} > (nil)) > (insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749]) > (plus:DI (reg:DI 11 a1 [750]) > (const_int -272 [0xfffffffffffffef0]))) "pbeampp.c":139:15 5 {adddi3} > (expr_list:REG_EQUAL (const_int 32496 [0x7ef0]) > (nil))) > (insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795]) > (plus:DI (reg/f:DI 2 sp) > (const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3} > (nil)) > (insn 1542 1611 237 26 (set (reg:DI 12 a2 [749]) > (plus:DI (reg:DI 12 a2 [749]) > (reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3} > (nil)) > (insn 237 1542 239 26 (set (reg:DI 12 a2 [377]) > (plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200]) > (const_int 3 [0x3])) > (reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd} > (nil)) > (insn 239 237 235 26 (set (reg/f:DI 12 a2 [380]) > (plus:DI (reg:DI 10 a0 [513]) > (reg:DI 12 a2 [377]))) "pbeampp.c":139:15 5 {adddi3} > (expr_list:REG_EQUAL (plus:DI (reg:DI 12 a2 [377]) > (const_int -32768 [0xffffffffffff8000])) > (nil))) [ ... ] > (insn 240 235 255 26 (set (reg/f:DI 14 a4 [orig:204 _177 ] [204]) > (mem/f:DI (plus:DI (reg/f:DI 12 a2 [380]) > (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit} > (nil)) Reload/LRA made an absolute mess of that code. But before we add a new pass (target specific or generic), I think it may be in our best interest experiment a bit of creative rewriting to preserve the shadd, but without the frame pointer. Perhaps also looking for a way to fold the constants, both the explicit ones and the implicit one from FP elimination. This looks particularly promising: > Trying 237, 239 -> 240: > 237: r377:DI=r200:DI<<0x3+frame:DI > REG_DEAD r200:DI > 239: r380:DI=r513:DI+r377:DI > REG_DEAD r377:DI > REG_EQUAL r377:DI-0x8000 > 240: r204:DI=[r380:DI+0x118] > REG_DEAD r380:DI > Failed to match this instruction: > (set (reg/f:DI 204 [ _177 ]) > (mem/f:DI (plus:DI (plus:DI (plus:DI (mult:DI (reg:DI 200 [ _173 ]) > (const_int 8 [0x8])) > (reg/f:DI 65 frame)) > (reg:DI 513)) > (const_int 280 [0x118])) [7 *_176+0 S8 A64])) We could reassociate this as t1 = r200 * 8 + r513 t2 = frame + 280 t3 = t1 + t2 r204 = *t3 Which after elimination would be t1 = r2000 * 8 + r513 t2 = sp + C + 280 t3 = t1 + t2 r204 = *t3 C + 280 will simplify. And we'll probably end up in the addptr3 case which I think gives us a chance to write this a bit so that we end up wit t1 = r200 * 8 + r513 t2 = sp + t1 r204 = *(t2 + 280 + C) Or at least I *think* we might be able to get there. Anyway, as I said, I think this deserves a bit of playing around before we jump straight into adding a new pass. jeff