From: Manolis Tsamis
Date: Mon, 12 Jun 2023 10:36:20 +0300
Subject: Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng
References: <20230525123550.1072506-1-manolis.tsamis@vrull.eu> <20230525123550.1072506-2-manolis.tsamis@vrull.eu> <91d71dae-b235-fbd0-c8f0-001b7f1e444c@gmail.com>
In-Reply-To: <91d71dae-b235-fbd0-c8f0-001b7f1e444c@gmail.com>

On Thu, Jun 8, 2023 at 8:37 AM Jeff Law wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> >
> > gcc/ChangeLog:
> >
> >       * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> >       * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> >       pass.
> >       * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> >       * config/riscv/riscv.opt: New options.
> >       * config/riscv/t-riscv: New build rule.
> >       * doc/invoke.texi: Document new option.
> >       * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> So not going into the guts of the patch yet.
>
> From a benchmark standpoint the only two that get out of the +-0.05%
> range are mcf and deepsjeng (from a dynamic instruction standpoint).  So
> from an evaluation standpoint we can probably focus our efforts there.
> And as we know, mcf is actually memory bound, so while improving its
> dynamic instruction count is good, the end performance improvement may
> be marginal.
>

Even if late, one question about the dynamic instruction numbers: were
these measured with f-m-o alone, or with the stack pointer fold patch
applied too?  I remember getting better improvements in the past, but
most of those cases involved the stack pointer, so the second patch is
necessary.

> As I mentioned to Philipp many months ago this reminds me a lot of a
> problem I've seen before.  Basically register elimination emits code
> that can be terrible in some circumstances.  So I went and poked at this
> again.
>
> I think the key difference between now and what I was dealing with
> before is for the cases that really matter for rv64 we have a shNadd
> insn in the sequence.  That private port I was working on before did not
> have shNadd (don't ask, I probably can't tell).  Our target also had
> reg+reg addressing modes.  What I can't remember was if we were trying
> harder to fold the constant terms into the memory reference or if we
> were more focused on the reg+reg.  Ultimately it's probably not that
> important to remember -- the key is there are very significant
> differences in the target's capabilities which impact how we should be
> generating code in this case.  Those differences affect the code we
> generate *and* the places where we can potentially get control and do
> some address rewriting.
>
> A key sequence in mcf looks something like this in IRA, others have
> similar structure:
>
> > (insn 237 234 239 26 (set (reg:DI 377)
> >         (plus:DI (ashift:DI (reg:DI 200 [ _173 ])
> >                 (const_int 3 [0x3]))
> >             (reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd}
> >      (nil))
> > (insn 239 237 235 26 (set (reg/f:DI 380)
> >         (plus:DI (reg:DI 513)
> >             (reg:DI 377))) "pbeampp.c":139:15 5 {adddi3}
> >      (expr_list:REG_DEAD (reg:DI 377)
> >         (expr_list:REG_EQUAL (plus:DI (reg:DI 377)
> >                 (const_int -32768 [0xffffffffffff8000]))
> >             (nil))))
> [ ... ]
> > (insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ])
> >         (mem/f:DI (plus:DI (reg/f:DI 380)
> >                 (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
> >      (expr_list:REG_DEAD (reg/f:DI 380)
> >         (nil)))
>
>
> The key here is insn 237.  It's generally going to be bad to have FP
> show up in a shadd insn because it's going to be eliminated into
> sp+offset.  That'll generate an input reload before insn 237 and we
> can't do any combination with the constant in insn 239.
>
> After LRA it looks like this:
>
> > (insn 1540 234 1541 26 (set (reg:DI 11 a1 [750])
> >         (const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit}
> >      (nil))
> > (insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749])
> >         (plus:DI (reg:DI 11 a1 [750])
> >             (const_int -272 [0xfffffffffffffef0]))) "pbeampp.c":139:15 5 {adddi3}
> >      (expr_list:REG_EQUAL (const_int 32496 [0x7ef0])
> >         (nil)))
> > (insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795])
> >         (plus:DI (reg/f:DI 2 sp)
> >             (const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3}
> >      (nil))
> > (insn 1542 1611 237 26 (set (reg:DI 12 a2 [749])
> >         (plus:DI (reg:DI 12 a2 [749])
> >             (reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3}
> >      (nil))
> > (insn 237 1542 239 26 (set (reg:DI 12 a2 [377])
> >         (plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200])
> >                 (const_int 3 [0x3]))
> >             (reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd}
> >      (nil))
> > (insn 239 237 235 26 (set (reg/f:DI 12 a2 [380])
> >         (plus:DI (reg:DI 10 a0 [513])
> >             (reg:DI 12 a2 [377]))) "pbeampp.c":139:15 5 {adddi3}
> >      (expr_list:REG_EQUAL (plus:DI (reg:DI 12 a2 [377])
> >             (const_int -32768 [0xffffffffffff8000]))
> >         (nil)))
> [ ... ]
> > (insn 240 235 255 26 (set (reg/f:DI 14 a4 [orig:204 _177 ] [204])
> >         (mem/f:DI (plus:DI (reg/f:DI 12 a2 [380])
> >                 (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
> >      (nil))
>
>
> Reload/LRA made an absolute mess of that code.
>
> But before we add a new pass (target specific or generic), I think it
> may be in our best interest to experiment with a bit of creative
> rewriting to preserve the shadd, but without the frame pointer.  Perhaps
> also looking for a way to fold the constants, both the explicit ones and
> the implicit one from FP elimination.
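
To make the constant folding concrete, here is a rough Python sketch (not
part of the patch) that replays the post-LRA sequence above on plain
64-bit integers.  It assumes a0 (r513) holds -32768, which is what the
REG_EQUAL note on insn 239 suggests; the function names are made up for
illustration only.

```python
MASK64 = (1 << 64) - 1  # model 64-bit register wraparound


def post_lra_address(sp, a4, a0=-32768):
    """Replay insns 1540..240 from the post-LRA dump and return the load address.

    a0 = -32768 is an assumption taken from the REG_EQUAL note on insn 239.
    """
    a1 = 32768 & MASK64             # insn 1540: a1 = 0x8000
    a2 = (a1 - 272) & MASK64        # insn 1541: a2 = a1 - 272 (REG_EQUAL 32496)
    t4 = (sp + 64) & MASK64         # insn 1611: t4 = sp + 64
    a2 = (a2 + t4) & MASK64         # insn 1542: a2 = a2 + t4 (the eliminated frame)
    a2 = ((a4 << 3) + a2) & MASK64  # insn 237:  shNadd
    a2 = (a0 + a2) & MASK64         # insn 239:  a2 = a0 + a2
    return (a2 + 280) & MASK64      # insn 240:  load from a2 + 280


def folded_address(sp, a4):
    """Same address with every constant folded: 32768 - 272 + 64 - 32768 + 280 = 72."""
    return (sp + (a4 << 3) + 72) & MASK64
```

Under that assumption the whole chain collapses to sp + (a4 << 3) + 72,
and 72 fits comfortably in the 12-bit signed offset of a RISC-V load.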
>
> This looks particularly promising:
>
> > Trying 237, 239 -> 240:
> >   237: r377:DI=r200:DI<<0x3+frame:DI
> >       REG_DEAD r200:DI
> >   239: r380:DI=r513:DI+r377:DI
> >       REG_DEAD r377:DI
> >       REG_EQUAL r377:DI-0x8000
> >   240: r204:DI=[r380:DI+0x118]
> >       REG_DEAD r380:DI
> > Failed to match this instruction:
> > (set (reg/f:DI 204 [ _177 ])
> >     (mem/f:DI (plus:DI (plus:DI (plus:DI (mult:DI (reg:DI 200 [ _173 ])
> >                         (const_int 8 [0x8]))
> >                     (reg/f:DI 65 frame))
> >                 (reg:DI 513))
> >             (const_int 280 [0x118])) [7 *_176+0 S8 A64]))
>
>
> We could reassociate this as
>
> t1 = r200 * 8 + r513
> t2 = frame + 280
> t3 = t1 + t2
> r204 = *t3
>
> Which after elimination would be
>
> t1 = r200 * 8 + r513
> t2 = sp + C + 280
> t3 = t1 + t2
> r204 = *t3
>
> C + 280 will simplify.  And we'll probably end up in the addptr3 case
> which I think gives us a chance to rewrite this a bit so that we end up
> with
>
> t1 = r200 * 8 + r513
> t2 = sp + t1
> r204 = *(t2 + 280 + C)
>
>
> Or at least I *think* we might be able to get there.  Anyway, as I said,
> I think this deserves a bit of playing around before we jump straight
> into adding a new pass.
>
> jeff
>
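
As a sanity check of the reassociation quoted above, here is a small
Python sketch (illustrative only, not GCC code) that computes the address
in the three shapes discussed: the expression combine failed to match,
the reassociated form after elimination, and the final addptr-style form.
The parameter c stands for the frame-to-sp elimination offset called C in
the quote; all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # model 64-bit register wraparound


def addr_unmatched(r200, r513, frame):
    """Shape combine failed to match: ((r200 * 8 + frame) + r513) + 280."""
    return (((r200 * 8 + frame) + r513) + 280) & MASK64


def addr_reassociated(r200, r513, sp, c):
    """Reassociated form after elimination (frame == sp + c)."""
    t1 = (r200 * 8 + r513) & MASK64
    t2 = (sp + c + 280) & MASK64
    return (t1 + t2) & MASK64


def addr_addptr(r200, r513, sp, c):
    """Final form: both constants sit in the memory offset."""
    t1 = (r200 * 8 + r513) & MASK64
    t2 = (sp + t1) & MASK64
    return (t2 + (280 + c)) & MASK64


def fits_simm12(value):
    """True if value fits the 12-bit signed immediate of a RISC-V load/store."""
    return -2048 <= value <= 2047
```

All three agree modulo 2^64 whenever frame == sp + c.  Whether 280 + C can
really be folded into the load still depends on it fitting the 12-bit
signed offset, which fits_simm12 can check for a given frame layout.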