From: Jivan Hakobyan
Date: Tue, 25 Jul 2023 15:24:40 +0400
Subject: Re: RISC-V: Folding memory for FP + constant case
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org
In-Reply-To: <3c1f0f8a-34ed-abb2-8a49-3083a2cc55d2@gmail.com>

Hi,

I re-ran the benchmarks and, fortunately, got the same improvement. I also
compared the code generated for leela and figured out the reason.

My patch and Manolis's do essentially the same thing. The only difference
is the point at which they run. Because f-m-o runs after register
allocation, it cannot eliminate the redundant move of 'sp' into another
register. Here is an example.

    int core_bench_state(int *ptr) {
      int final_counts[100] = {0};
      while (*ptr) {
        int id = foo();
        final_counts[id]++;
        ptr++;
      }
      return final_counts[0];
    }

For this loop, the f-m-o pass generates the following.

    .L3:
            call    foo
            mv      a5,sp
            sh2add  a0,a0,a5
            lw      a5,0(a0)
            lw      a4,4(s0)
            addi    s0,s0,4
            addiw   a5,a5,1
            sw      a5,0(a0)
            bne     a4,zero,.L3

Here the 'mv a5,sp' instruction is redundant.
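
To make the address rewrite from the patch description (quoted further down) concrete, here is a minimal C sketch. It is only a model of the arithmetic, not the code in riscv_legitimize_address; the function names and the loop helper are hypothetical and exist for illustration only. It shows that (fp + (index << C1)) + C2 and (fp + C2) + (index << C1) give the same address, and that in the second form the (fp + C2) part does not depend on the index, so it can be computed once outside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unfolded form from the patch description: (fp + (index << C1)) + C2. */
static uintptr_t addr_unfolded(uintptr_t fp, uintptr_t index,
                               unsigned c1, uintptr_t c2)
{
    return (fp + (index << c1)) + c2;
}

/* Folded form: (fp + C2) + (index << C1).
   The first operand no longer depends on the index. */
static uintptr_t addr_folded(uintptr_t fp_plus_c2, uintptr_t index,
                             unsigned c1)
{
    return fp_plus_c2 + (index << c1);
}

/* Hypothetical model of a loop over a local array: the invariant base
   is computed once before the loop, and each iteration only adds the
   scaled index.  Returns how many indices give the same address in
   both forms. */
static size_t count_matching(uintptr_t fp, uintptr_t c2, unsigned c1,
                             const uintptr_t *indices, size_t n)
{
    assert(indices != NULL || n == 0);

    const uintptr_t base = fp + c2;   /* hoisted: computed once */
    size_t matches = 0;

    for (size_t i = 0; i < n; i++) {
        if (addr_folded(base, indices[i], c1) ==
            addr_unfolded(fp, indices[i], c1, c2))
            matches++;
    }
    return matches;
}
```

Since the arithmetic is unsigned, the two forms agree for every index, so count_matching() should return n for any input. The point of the sketch is only that the per-iteration work in the folded form is a single shift-and-add against a base that is already sitting in a register.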

Leela's FastState::try_move() function has a loop that runs over 1.3
billion times and contains 5 memory-folding cases (5 redundant moves).

Besides that, I have checked the build failure on x264_r. It is already
fixed in the third version.

On Sat, Jul 15, 2023 at 10:16 AM Jeff Law wrote:
>
> On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:
> > Accessing local arrays element turned into load form (fp + (index <<
> > C1)) + C2 address. In the case when access is in the loop we got loop
> > invariant computation. For some reason, moving out that part cannot
> > be done in loop-invariant passes. But we can handle that in
> > target-specific hook (legitimize_address). That provides an
> > opportunity to rewrite memory access more suitable for the target
> > architecture.
> >
> > This patch solves the mentioned case by rewriting mentioned case to
> > ((fp + C2) + (index << C1)) I have evaluated it on SPEC2017 and got
> > an improvement on leela (over 7b instructions, .39% of the dynamic
> > count) and dwarfs the regression for gcc (14m instructions, .0012% of
> > the dynamic count).
> >
> >
> > gcc/ChangeLog:
> >         * config/riscv/riscv.cc (riscv_legitimize_address):
> >         Handle folding.
> >         (mem_shadd_or_shadd_rtx_p): New predicate.
> So I still need to give the new version a review. But a high level
> question -- did you re-run the benchmarks with this version to verify
> that we still saw the same nice improvement in leela?
>
> The reason I ask is when I use this on Ventana's internal tree I don't
> see any notable differences in the dynamic instruction counts. And
> probably the most critical difference between the upstream tree and
> Ventana's tree in this space is Ventana's internal tree has an earlier
> version of the fold-mem-offsets work from Manolis.
>
> It may ultimately be the case that this work and Manolis's f-m-o patch
> have a lot of overlap in terms of their final effect on code generation.
> Manolis's pass runs much later (after register allocation), so it's
> not going to address the loop-invariant-code-motion issue that
> originally got us looking into this space. But his pass is generic
> enough that it helps other targets. So we may ultimately want both.
>
> Anyway, just wanted to verify if this variant is still showing the nice
> improvement on leela that the prior version did.
>
> Jeff
>
> ps. I know you're on PTO. No rush on responding -- enjoy the time off.
>

-- 
With the best regards
Jivan Hakobyan