From: Hongtao Liu
Date: Wed, 30 Jun 2021 16:53:54 +0800
Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]
To: "Kewen.Lin"
Cc: GCC Patches, Vladimir Makarov, bergner@linux.ibm.com, Bill Schmidt, Segher Boessenkool, Richard Sandiford

On Mon, Jun 28, 2021 at 3:27 PM Kewen.Lin wrote:
>
> on 2021/6/28 3:20 PM, Hongtao Liu wrote:
> > On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu wrote:
> >>
> >> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin wrote:
> >>>
> >>> Hi!
> >>>
> >>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
> >>>> Hi,
> >>>>
> >>>> PR100328 has some details about this issue; I am trying to
> >>>> summarize it here.  In the hottest function LBM_performStreamCollideTRT
> >>>> of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
> >>>> (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
> >>>> insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG reg
> >>>> class has 64 registers, whose first 32 make up the
> >>>> whole FLOAT_REG.  There are some differences between these two
> >>>> flavors, taking "*fma4_fpr" as an example:
> >>>>
> >>>> (define_insn "*fma4_fpr"
> >>>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa")
> >>>>         (fma:SFDF
> >>>>          (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa")
> >>>>          (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0")
> >>>>          (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))]
> >>>>
> >>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
> >>>> // (f/d) => A floating point register, aka. FLOAT_REG.
> >>>>
> >>>> So for VSX_REG, we only have the destructive form: when a VSX_REG
> >>>> alternative is used, operand 2 or operand 3 is required
> >>>> to be the same as operand 0.
> >>>> Reload has to take care of this constraint and create some
> >>>> non-free register copies if required.
> >>>>
> >>>> Assuming one fma insn looks like:
> >>>>   op0 = FMA (op1, op2, op3)
> >>>>
> >>>> The best regclass of them is VSX_REG.  When op1, op2 and op3 are all
> >>>> dead, IRA simply creates three shuffle copies for them (here the
> >>>> operand order matters, since with the same freq, the one with the
> >>>> smaller number takes preference), but IMO both op2 and op3 should
> >>>> take higher priority in the copy queue due to the matching constraint.
> >>>>
> >>>> I noticed that there is one function ira_get_dup_out_num, which is
> >>>> meant to create this kind of constraint copy, but the code below seems
> >>>> to refuse to create one if there is an alternative whose regclass is
> >>>> valid and not likely to be spilled.
> >>>>
> >>>>       default:
> >>>>         {
> >>>>           enum constraint_num cn = lookup_constraint (str);
> >>>>           enum reg_class cl = reg_class_for_constraint (cn);
> >>>>           if (cl != NO_REGS
> >>>>               && !targetm.class_likely_spilled_p (cl))
> >>>>             goto fail;
> >>>>
> >>>>           ...
> >>>>
> >>>> I cooked one patch (attached) to make IRA respect this kind of
> >>>> matching constraint, guarded with one parameter.  As I stated in the
> >>>> PR, I was not sure this is on the right track.  The RFC patch checks
> >>>> the matching constraint in all alternatives; if there is one
> >>>> alternative with a matching constraint that matches the current
> >>>> preferred regclass (or best of allocno?), it records the output
> >>>> operand number and further creates one constraint copy for it.
> >>>> Normally that copy gets priority over the shuffle copies, so the
> >>>> matching constraint is satisfied with higher probability and reload
> >>>> doesn't have to create extra copies to meet the matching constraint
> >>>> or the desirable register class.
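A minimal stand-alone C sketch of the alternative scan described in the RFC paragraph above. This is not GCC code. `alt_constraint` and `match_out_num_sketch` are hypothetical helpers, and "d" is used as a stand-in for the FLOAT_REG constraint of the quoted `*fma4_fpr` pattern. The sketch treats a lone digit as a matching constraint. It reports which output operand an input operand is tied to, provided that output uses the preferred constraint in the same alternative:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: copy alternative ALT (0-based) of
   the comma-separated constraint string CONS into BUF, dropping the '='
   and '%' modifiers.  Returns 0 when CONS has no alternative ALT.  */
static int
alt_constraint (const char *cons, int alt, char *buf, size_t len)
{
  size_t n = 0;

  assert (len > 0);
  for (; *cons != '\0' && alt > 0; cons++)
    if (*cons == ',')
      alt--;
  if (alt > 0)
    return 0;
  for (; *cons != '\0' && *cons != ','; cons++)
    if (*cons != '=' && *cons != '%' && n + 1 < len)
      buf[n++] = *cons;
  buf[n] = '\0';
  return 1;
}

/* Hypothetical sketch of the RFC idea: scan every alternative of input
   operand OP; if one of them is a matching constraint (a lone digit N)
   and output operand N uses the preferred constraint PREF in that same
   alternative, return N.  Otherwise return -1 (no constraint copy).
   CONS[i] is the constraint string of operand i; a digit is assumed to
   name a valid operand.  */
static int
match_out_num_sketch (const char *const cons[], int op, const char *pref)
{
  char in[16], out[16];

  for (int alt = 0; alt_constraint (cons[op], alt, in, sizeof in); alt++)
    {
      if (!isdigit ((unsigned char) in[0]) || in[1] != '\0')
        continue;               /* Not a matching constraint.  */
      int out_op = in[0] - '0';
      if (alt_constraint (cons[out_op], alt, out, sizeof out)
          && strcmp (out, pref) == 0)
        return out_op;          /* Tied to OUT_OP in the preferred class.  */
    }
  return -1;
}
```

With the stand-in FMA constraints `{"=d,wa,wa", "%d,wa,wa", "d,wa,0", "d,0,wa"}` and preferred constraint `"wa"`, operands 2 and 3 both map to output 0 and operand 1 gets -1. That mirrors the destructive VSX_REG form described above. The actual patch works on GCC's parsed constraint data and, in v3, adds further exclusions; the sketch models neither.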
> >>>>
> >>>> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can first stay
> >>>> as shuffle copies.  Later, once any of A,B,C,D gets assigned a
> >>>> hardware register which is a VSX register (VSX_REG) but not an FP
> >>>> register (FLOAT_REG), it has to pay costs if we can NOT go with
> >>>> the VSX alternatives, so at that point it's important to respect the
> >>>> matching constraint, and we can increase the freq for the remaining
> >>>> copies related to it (A/B, A/C, A/D).  This idea requires some side
> >>>> tables to record some information and seems a bit complicated in the
> >>>> current framework, so the proposed patch aggressively emphasizes the
> >>>> matching constraint at the time of creating copies.
> >>>>
> >>>
> >>> Compared with the original patch (v1), this patch v3 has
> >>> considered the following (this should be v2 for this mailing list,
> >>> but I bumped it to be consistent with the PR's numbering):
> >>>
> >>> - Excluding the case where, for one preferred register class,
> >>>   there are two or more alternatives, one of which has the
> >>>   matching constraint while another doesn't.  For the given
> >>>   operand, even if it's assigned a hardware reg which doesn't
> >>>   meet the matching constraint, it can simply use the
> >>>   alternative without the matching constraint, so no register
> >>>   move is needed.  One typical case is define_insn
> >>>   *mov_internal2 on rs6000.  So we shouldn't create a
> >>>   constraint copy for it.
> >>>
> >>> - The possible free register move in the same register class:
> >>>   disable this if so, since the register move to meet the
> >>>   constraint is considered free.
> >>>
> >>> - Making it on by default, as suggested by Segher & Vladimir; we
> >>>   hope to get rid of the parameter if the benchmarking results
> >>>   look good on major targets.
> >>>
> >>> - Tweaking the cost when either side of the matching constraint
> >>>   is a hardware register.
> >>>   Before this patch, the constraint copy is simply taken as a real
> >>>   move insn for pref and conflict cost with one hardware register.
> >>>   After this patch, several input operands are allowed to respect
> >>>   the same matching constraint (but in different alternatives),
> >>>   so in some cases we should treat it like a shuffle copy to avoid
> >>>   over-preferring/disparaging.
> >>>
> >>> Please check the PR comments for more details.
> >>>
> >>> This patch can be bootstrapped & regtested on
> >>> powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but has some
> >>> "XFAIL->XPASS" failures on aarch64-linux-gnu.  The failure list
> >>> was attached in the PR, and I think the new assembly looks
> >>> improved (expected).
> >>>
> >>> With -Ofast and unrolling, this patch can help to improve SPEC2017
> >>> bmk 508.namd_r by +2.42% and 519.lbm_r by +2.43% on Power8, and
> >>> 508.namd_r by +3.02% and 519.lbm_r by +3.85% on Power9, without any
> >>> remarkable degradations.

Here are the SPEC2017 rate results tested on AMD Milan, with options:
-march=znver2 -Ofast -funroll-loops -mfpmath=sse -flto

fprate:
  503.bwaves_r      0.01    (A) shliclel219
  507.cactuBSSN_r  -0.19    (A) shliclel219
  508.namd_r        0.02    (A) shliclel219
  510.parest_r     -0.68    (A) shliclel219
  511.povray_r      1.59    (A) shliclel219
  521.wrf_r         0.19    (A) shliclel219
  526.blender_r     0.68    (A) shliclel219
  527.cam4_r       -0.30    (A) shliclel219
  538.imagick_r    -3.81 <- (A) shliclel219
  544.nab_r         0.02    (A) shliclel219
  549.fotonik3d_r   0.02    (A) shliclel219
  554.roms_r       -0.43    (A) shliclel219
  997.specrand_fr  -3.80 <- (A) shliclel219
  Geometric mean:  -0.52

intrate:
  500.perlbench_r  -1.54    (A) shliclel219
  502.gcc_r        -0.38    (A) shliclel219
  505.mcf_r        -0.10    (A) shliclel219
  520.omnetpp_r    -0.24    (A) shliclel219
  523.xalancbmk_r  -1.04    (A) shliclel219
  525.x264_r        0.31    (A) shliclel219
  531.deepsjeng_r  -0.02    (A) shliclel219
  541.leela_r       0.95    (A) shliclel219
  548.exchange2_r   0.08    (A) shliclel219
  557.xz_r         -0.40    (A) shliclel219
  Geometric mean:  -0.24

> >>>
> >>> Since this
> >>> patch likely benefits x86_64 and aarch64, but I don't
> >>> have performance machines with these arches at hand, could
> >>> someone kindly help to benchmark it if possible?
> >> I can help test it on Intel Cascade Lake and AMD Milan.
> >
> Thanks for your help, Hongtao!
>
> > And could you rebase your patch on the latest trunk? I got several
> > failures when applying the patch:
> >  ~ git apply ira-v3.diff
> > error: patch failed: gcc/doc/invoke.texi:13845
> > error: gcc/doc/invoke.texi: patch does not apply
> > error: patch failed: gcc/ira-conflicts.c:233
> > error: gcc/ira-conflicts.c: patch does not apply
> > error: patch failed: gcc/ira-int.h:971
> > error: gcc/ira-int.h: patch does not apply
> > error: patch failed: gcc/ira.c:1922
> > error: gcc/ira.c: patch does not apply
> > error: patch failed: gcc/params.opt:330
> > error: gcc/params.opt: patch does not apply
> >
>
> I think it's due to unexpected git stat lines in the previously attached diff.
>
> I have attached the format-patch file.  Please have a check.  Thanks!
>
>
> BR,
> Kewen

-- 
BR,
Hongtao
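The "Geometric mean" rows in the AMD Milan results summarize the per-benchmark changes. Here is a small hedged C sketch of how such a summary is commonly computed. It assumes each entry is a percentage change against the unpatched compiler; the exact method of the tool that produced the table is not stated in this thread. `geomean_pct_change` is a hypothetical name:

```c
#include <assert.h>

/* Hypothetical helper: geometric mean of per-benchmark percentage changes
   D[0..N-1], returned in percent, i.e. (prod (1 + d_i / 100))^(1/N) - 1.
   Assumes N > 0 and every d_i > -100.  */
static double
geomean_pct_change (const double *d, int n)
{
  double prod = 1.0;

  assert (n > 0);
  for (int i = 0; i < n; i++)
    prod *= 1.0 + d[i] / 100.0;

  /* N-th root of PROD by bisection on [0, max (1, PROD)], which keeps the
     sketch free of libm; pow (prod, 1.0 / n) would be the usual choice.  */
  double lo = 0.0, hi = prod > 1.0 ? prod : 1.0;
  for (int iter = 0; iter < 200; iter++)
    {
      double mid = (lo + hi) / 2.0, p = 1.0;
      for (int k = 0; k < n; k++)
        p *= mid;
      if (p < prod)
        lo = mid;
      else
        hi = mid;
    }
  return ((lo + hi) / 2.0 - 1.0) * 100.0;
}
```

Fed the ten intrate deltas above, this returns about -0.240, consistent with the reported intrate geometric mean of -0.24. Taking the product of ratios rather than averaging the percentages keeps a large regression and a large gain from simply cancelling out.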