From: Richard Sandiford
To: "Kewen.Lin"
Cc: Hongtao Liu, GCC Patches, Vladimir Makarov, bergner@linux.ibm.com, Bill Schmidt, Segher Boessenkool
Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]
References: <8a5fd52a-1cc9-6563-ee6c-f345b489654c@linux.ibm.com>
Date: Wed, 30 Jun 2021 16:42:32 +0100
In-Reply-To: (Kewen Lin's message of "Mon, 28 Jun 2021 15:27:19 +0800")

"Kewen.Lin" writes:
> on 2021/6/28 3:20 PM, Hongtao Liu wrote:
>> On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu wrote:
>>>
>>> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin wrote:
>>>>
>>>> Hi!
>>>>
>>>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
>>>>> Hi,
>>>>>
>>>>> PR100328 has some details about this issue; I'll try to
>>>>> summarize it here.  In the hottest function LBM_performStreamCollideTRT
>>>>> of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
>>>>> (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
>>>>> insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG reg
>>>>> class has 64 registers, the first 32 of which make up the
>>>>> whole of FLOAT_REG.  There are some differences between these two
>>>>> flavors, taking "*fma4_fpr" as an example:
>>>>>
>>>>> (define_insn "*fma4_fpr"
>>>>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa")
>>>>>         (fma:SFDF
>>>>>           (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa")
>>>>>           (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0")
>>>>>           (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))]
>>>>>
>>>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
>>>>> // (f/d) => A floating point register, aka. FLOAT_REG.
>>>>>
>>>>> So for VSX_REG, we only have the destructive form: when a VSX_REG
>>>>> alternative is used, operand 2 or operand 3 is required
>>>>> to be the same as operand 0.  reload has to take care of this
>>>>> constraint and create some non-free register copies if required.
>>>>>
>>>>> Assuming one fma insn looks like:
>>>>>   op0 = FMA (op1, op2, op3)
>>>>>
>>>>> The best regclass for all of them is VSX_REG.  When op1,op2,op3 are all
>>>>> dead, IRA simply creates three shuffle copies for them (here the operand
>>>>> order matters, since with the same freq, the one with the smaller number
>>>>> takes precedence), but IMO both op2 and op3 should take higher priority
>>>>> in the copy queue due to the matching constraint.
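
To illustrate the matching-constraint requirement quoted above, here is a minimal C sketch. It is a toy model and not GCC code: the register numbering (0..63 for VSX_REG, with 0..31 doubling as FLOAT_REG) follows the description in the quoted text, while the names `in_float_reg` and `fma_fits_without_copy` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not GCC code: VSX registers are numbered 0..63 and the
   first 32 of them double as the FLOAT_REG class.  */
enum { NUM_VSX_REGS = 64, NUM_FLOAT_REGS = 32 };

static bool
in_float_reg (int regno)
{
  assert (regno >= 0 && regno < NUM_VSX_REGS);
  return regno < NUM_FLOAT_REGS;
}

/* Return true if op0 = FMA (op1, op2, op3) fits one of the three
   alternatives of the quoted pattern without an extra register copy.  */
static bool
fma_fits_without_copy (int op0, int op1, int op2, int op3)
{
  /* First alternative: every operand in FLOAT_REG, no tie required.  */
  if (in_float_reg (op0) && in_float_reg (op1)
      && in_float_reg (op2) && in_float_reg (op3))
    return true;

  /* VSX alternatives (destructive form): operand 3 or operand 2 must
     share its register with operand 0.  */
  return op3 == op0 || op2 == op0;
}
```

In this model, an assignment such as (40, 2, 3, 4) fits no alternative as it stands, which corresponds to the case where reload has to add a copy, whereas (40, 2, 3, 40) fits the alternative that ties operand 3 to operand 0.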
>>>>>
>>>>> I noticed that there is one function, ira_get_dup_out_num, which is meant
>>>>> to create this kind of constraint copy, but the code below appears to
>>>>> refuse to create one if there is an alternative which has a valid
>>>>> regclass that is not likely to be spilled.
>>>>>
>>>>>       default:
>>>>>         {
>>>>>           enum constraint_num cn = lookup_constraint (str);
>>>>>           enum reg_class cl = reg_class_for_constraint (cn);
>>>>>           if (cl != NO_REGS
>>>>>               && !targetm.class_likely_spilled_p (cl))
>>>>>             goto fail;
>>>>>
>>>>>           ...
>>>>>
>>>>> I have attached a patch that makes ira respect this kind of matching
>>>>> constraint, guarded by one parameter.  As I stated in the PR, I am
>>>>> not sure this is on the right track.  The RFC patch checks the
>>>>> matching constraint in all alternatives: if there is one alternative
>>>>> with a matching constraint that also matches the current preferred
>>>>> regclass (or best of allocno?), it records the output operand number and
>>>>> then creates one constraint copy for it.  Normally this copy takes
>>>>> priority over shuffle copies, so the matching constraint is more likely
>>>>> to be satisfied and reload does not have to create extra copies
>>>>> to meet the matching constraint or the desirable register class.
>>>>>
>>>>> For FMA A,B,C,D, I think that ideally the copies A/B, A/C, A/D would
>>>>> first stay as shuffle copies.  If any of A,B,C,D is later assigned a
>>>>> hardware register which is a VSX register (VSX_REG) but not an FP
>>>>> register (FLOAT_REG), we have to pay a cost if we can NOT
>>>>> go with the VSX alternatives, so at that point it's important to respect
>>>>> the matching constraint, and we can increase the freq of the remaining
>>>>> copies related to it (A/B, A/C, A/D).  This idea requires some side
>>>>> tables to record some information and seems a bit complicated in the
>>>>> current framework, so the proposed patch aggressively emphasizes the
>>>>> matching constraint at the time of creating copies.
>>>>>
>>>>
>>>> Compared with the original patch (v1), this patch v3 considers the
>>>> following (this should be v2 for this mailing list, but I bumped
>>>> it to be consistent with the PR's numbering):
>>>>
>>>>   - Excluding the case where, for one preferred register class,
>>>>     there are two or more alternatives, one of which has the
>>>>     matching constraint while another does not.  For
>>>>     the given operand, even if it's assigned a hardware reg
>>>>     which doesn't meet the matching constraint, it can simply
>>>>     use the alternative without the matching constraint,
>>>>     so no register move is needed.  One typical case is
>>>>     define_insn *mov_internal2 on rs6000.  So we
>>>>     shouldn't create a constraint copy for it.
>>>>
>>>>   - A possibly free register move within the same register class:
>>>>     disable this if so, since the register move needed to meet the
>>>>     constraint is considered free.
>>>>
>>>>   - Making it on by default, as suggested by Segher & Vladimir; we
>>>>     hope to get rid of the parameter if the benchmarking results
>>>>     look good on major targets.
>>>>
>>>>   - Tweaking the cost when either side of the matching constraint
>>>>     is a hardware register.  Before this patch, the constraint
>>>>     copy is simply treated as a real move insn for the pref and
>>>>     conflict cost with one hardware register.  After this patch,
>>>>     several input operands may respect the same matching
>>>>     constraint (but in different alternatives), so we should
>>>>     treat it like a shuffle copy in some cases to avoid over
>>>>     preferring/disparaging.
>>>>
>>>> Please check the PR comments for more details.
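
The first bullet above can be sketched in C under one possible reading, namely that the exclusion applies at the level of whole alternatives: if any alternative accepting the preferred class has no tie to the output at all, the insn has a copy-free escape and no constraint copy is wanted. This reading is an assumption, as are the `toy_alternative` structure and the `wants_constraint_copy` function, which are hypothetical and do not correspond to anything in the patch. The sketch also ignores the free-move and cost-tweaking bullets.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPERANDS 4

/* Hypothetical, simplified view of one insn alternative: whether it
   accepts the preferred register class, and for each operand the
   operand number it is tied to (-1 when it has no matching
   constraint).  */
struct toy_alternative
{
  bool accepts_pref_class;
  int tied_to[MAX_OPERANDS];
};

/* Return true if input operand OPNO should get a constraint copy with
   output operand OUT, under the reading described in the lead-in.  */
static bool
wants_constraint_copy (const struct toy_alternative *alts, int n_alts,
                       int opno, int out)
{
  bool tied_somewhere = false;

  assert (opno >= 0 && opno < MAX_OPERANDS);
  for (int i = 0; i < n_alts; i++)
    {
      if (!alts[i].accepts_pref_class)
        continue;

      /* Does this alternative tie any operand to OUT?  */
      bool any_tie = false;
      for (int j = 0; j < MAX_OPERANDS; j++)
        if (alts[i].tied_to[j] == out)
          any_tie = true;

      /* An alternative for the same class with no tie at all needs no
         register move, so no constraint copy is wanted.  */
      if (!any_tie)
        return false;

      if (alts[i].tied_to[opno] == out)
        tied_somewhere = true;
    }
  return tied_somewhere;
}
```

With the fma alternatives from earlier in the thread (operand 3 tied to operand 0 in one VSX alternative, operand 2 tied to operand 0 in the other, and assuming the FLOAT_REG alternative does not accept a VSX_REG preference), operands 2 and 3 both qualify. An insn that has one tied and one untied alternative for the same class does not.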
>>>>
>>>> This patch was bootstrapped & regtested on
>>>> powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but has some
>>>> "XFAIL->XPASS" failures on aarch64-linux-gnu.  The failure list
>>>> is attached in the PR, and I think the new assembly looks
>>>> improved (expected).
>>>>
>>>> With option Ofast unroll, this patch improves SPEC2017
>>>> bmk 508.namd_r by +2.42% and 519.lbm_r by +2.43% on Power8, and
>>>> 508.namd_r by +3.02% and 519.lbm_r by +3.85% on Power9, without any
>>>> remarkable degradations.
>>>>
>>>> This patch likely benefits x86_64 and aarch64 too, but I don't
>>>> have performance machines with these arches at hand.  Could
>>>> someone kindly help to benchmark it if possible?
>>> I can help test it on Intel Cascade Lake and AMD Milan.
>
>
> Thanks for your help, Hongtao!
>
>
>> And could you rebase your patch on the latest trunk?  I got several
>> failures when applying the patch:
>> ~ git apply ira-v3.diff
>> error: patch failed: gcc/doc/invoke.texi:13845
>> error: gcc/doc/invoke.texi: patch does not apply
>> error: patch failed: gcc/ira-conflicts.c:233
>> error: gcc/ira-conflicts.c: patch does not apply
>> error: patch failed: gcc/ira-int.h:971
>> error: gcc/ira-int.h: patch does not apply
>> error: patch failed: gcc/ira.c:1922
>> error: gcc/ira.c: patch does not apply
>> error: patch failed: gcc/params.opt:330
>> error: gcc/params.opt: patch does not apply
>>
>
> I think it's due to unexpected git stat lines in the previously attached diff.
>
> I have attached the format-patch file.  Please have a check.  Thanks!

FWIW, this seems to be neutral for SPEC 2017 on AArch64.  The SVE
XFAIL->XPASS transitions mean it's definitely a good thing for AArch64
in that respect though.

Thanks,
Richard