From: Hongtao Liu
Date: Mon, 28 Jun 2021 15:20:27 +0800
Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]
To: "Kewen.Lin"
Cc: GCC Patches, Vladimir Makarov, bergner@linux.ibm.com, Bill Schmidt, Segher Boessenkool, Richard Sandiford

On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu wrote:
>
> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin wrote:
> >
> > Hi!
> >
> > on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
> > > Hi,
> > >
> > > PR100328 has some details about this issue; I am trying to
> > > brief it here.  In the hottest function LBM_performStreamCollideTRT
> > > of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
> > > (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
> > > insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG reg
> > > class has 64 registers, the first 32 of which make up the
> > > whole FLOAT_REG class.  There are some differences between these
> > > two flavors; taking "*fma4_fpr" as an example:
> > >
> > > (define_insn "*fma4_fpr"
> > >   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa")
> > >         (fma:SFDF
> > >           (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa")
> > >           (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0")
> > >           (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))]
> > >
> > > // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
> > > // (f/d) => A floating point register, aka. FLOAT_REG.
> > >
> > > So for VSX_REG, we only have the destructive form: when the VSX_REG
> > > alternative is used, operand 2 or operand 3 is required
> > > to be the same as operand 0.  reload has to take care of this
> > > constraint and create some non-free register copies if required.
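A minimal C sketch (not part of the patch) of the destructive-form constraint described above. It assumes a simplified register model in which hard registers 0..63 stand for vs0..vs63 and 0..31 also form FLOAT_REG; is_vsx_reg, is_fp_reg and needs_copy_p are hypothetical names. The FLOAT_REG alternative needs no tie, while the two VSX alternatives tie operand 3 or operand 2 to operand 0; since operand 1 carries the '%' (commutative) marker, operand 1 may also act as the tied input after swapping with operand 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register model: 0..63 stand for vs0..vs63 (VSX_REG);
   0..31 also make up FLOAT_REG.  */
static bool
is_vsx_reg (int r)
{
  return r >= 0 && r < 64;
}

static bool
is_fp_reg (int r)
{
  return r >= 0 && r < 32;
}

/* Return true if op0 = FMA (op1, op2, op3) under the given hard register
   assignment fits none of the three *fma4_fpr alternatives as-is, so
   reload would have to insert a register copy.  */
static bool
needs_copy_p (int op0, int op1, int op2, int op3)
{
  assert (is_vsx_reg (op0) && is_vsx_reg (op1)
          && is_vsx_reg (op2) && is_vsx_reg (op3));

  /* Alternative 0: every operand in FLOAT_REG, no tie required.  */
  if (is_fp_reg (op0) && is_fp_reg (op1) && is_fp_reg (op2)
      && is_fp_reg (op3))
    return false;

  /* Alternatives 1 and 2 ("wa"): operand 3 or operand 2 must be the same
     register as operand 0.  The '%' on operand 1 lets operands 1 and 2
     swap, so operand 1 may serve as the tied input too.  */
  return !(op3 == op0 || op2 == op0 || op1 == op0);
}
```

For example, needs_copy_p (40, 2, 41, 42) is true: operand 0 sits in vs40, outside FLOAT_REG, and no input shares its register, which is exactly the non-free copy reload has to create.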
> > >
> > > Assuming one fma insn looks like:
> > >   op0 = FMA (op1, op2, op3)
> > >
> > > The best regclass for all of them is VSX_REG.  When op1, op2, op3
> > > are all dead, IRA simply creates three shuffle copies for them (here
> > > the operand order matters, since with the same freq, the one with the
> > > smaller number takes precedence), but IMO both op2 and op3 should take
> > > higher priority in the copy queue due to the matching constraint.
> > >
> > > I noticed that there is one function, ira_get_dup_out_num, which is
> > > meant to create this kind of constraint copy, but the code below seems
> > > to refuse to create one if there is an alternative which has a valid
> > > regclass without a spill need.
> > >
> > >       default:
> > >         {
> > >           enum constraint_num cn = lookup_constraint (str);
> > >           enum reg_class cl = reg_class_for_constraint (cn);
> > >           if (cl != NO_REGS
> > >               && !targetm.class_likely_spilled_p (cl))
> > >             goto fail;
> > >
> > >          ...
> > >
> > > I cooked up the attached patch to make IRA respect this kind of
> > > matching constraint, guarded with a parameter.  As I stated in the PR,
> > > I was not sure this is on the right track.  The RFC patch checks the
> > > matching constraint in all alternatives: if there is one alternative
> > > with a matching constraint that matches the current preferred regclass
> > > (or the allocno's best class?), it records the output operand number
> > > and further creates one constraint copy for it.  Normally this copy
> > > gets priority over shuffle copies, so the matching constraint is
> > > satisfied with higher probability and reload doesn't have to create
> > > extra copies to meet the matching constraint or the desirable register
> > > class.
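A simplified C model of the copy-queue ordering discussed above: copies are ordered by descending frequency, and equal-frequency copies fall back to creation order, which is why op0/op1 wins among three equal shuffle copies. The struct copy type, copy_cmp, sort_copies, and the rule that a constraint copy outranks an equal-frequency shuffle copy are hypothetical illustrations of the desired priority, not IRA's data structures or the patch's exact mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an IRA copy; not ira_copy_t.  */
struct copy
{
  int num;           /* Creation order; smaller means created earlier.  */
  int op_a, op_b;    /* Operand numbers joined by the copy.  */
  int freq;          /* Frequency of the insn the copy comes from.  */
  bool constraint_p; /* True for a matching-constraint copy.  */
};

/* qsort comparator: higher frequency first; at equal frequency a
   constraint copy precedes a shuffle copy (hypothetical rule); otherwise
   the copy with the smaller number comes first.  */
static int
copy_cmp (const void *x, const void *y)
{
  const struct copy *a = x;
  const struct copy *b = y;

  if (a->freq != b->freq)
    return a->freq > b->freq ? -1 : 1;
  if (a->constraint_p != b->constraint_p)
    return a->constraint_p ? -1 : 1;
  return (a->num > b->num) - (a->num < b->num);
}

/* Sort N copies into the order they would be considered.  */
static void
sort_copies (struct copy *copies, size_t n)
{
  assert (copies != NULL || n == 0);
  if (n > 1)
    qsort (copies, n, sizeof *copies, copy_cmp);
}
```

With three equal-frequency shuffle copies op0/op1, op0/op2 and op0/op3, op0/op1 comes first purely because it was created first; marking op0/op3 as a constraint copy moves it to the front, while a higher-frequency copy still outranks any lower-frequency one.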
> > >
> > > For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can first stay
> > > as shuffle copies.  Later, once any of A,B,C,D gets assigned a
> > > hardware register which is a VSX register (VSX_REG) but not an FP
> > > register (FLOAT_REG), we have to pay costs if we can NOT go with the
> > > VSX alternatives, so at that point it's important to respect the
> > > matching constraint, and we can increase the freq of the remaining
> > > copies related to it (A/B, A/C, A/D).  This idea requires some side
> > > tables to record some information and seems a bit complicated in the
> > > current framework, so the proposed patch aggressively emphasizes the
> > > matching constraint at the time of creating copies.
> > >
> >
> > Compared with the original patch (v1), this patch v3 has
> > considered the following (this should be v2 for this mailing list,
> > but I bumped it to be consistent with the PR's numbering):
> >
> > - Excluding the case where, for one preferred register class,
> >   there can be two or more alternatives, one of which has the
> >   matching constraint while another doesn't.  In that case, for
> >   the given operand, even if it's assigned a hardware reg
> >   which doesn't meet the matching constraint, it can simply
> >   use the alternative which doesn't have the matching constraint,
> >   so no register move is needed.  One typical case is
> >   define_insn *mov_internal2 on rs6000.  So we
> >   shouldn't create a constraint copy for it.
> >
> > - If the register move within the same register class may be
> >   free, disable this, since the register move needed to meet the
> >   constraint is considered free.
> >
> > - Turning it on by default, as suggested by Segher & Vladimir; we
> >   hope to get rid of the parameter if the benchmarking results
> >   look good on major targets.
> >
> > - Tweaking the cost when either side of the matching constraint
> >   is a hardware register.
> >   Before this patch, the constraint
> >   copy was simply treated as a real move insn for pref and
> >   conflict costs with a hardware register.  After this patch,
> >   several input operands are allowed to respect the same
> >   matching constraint (but in different alternatives), so in
> >   some cases we should treat it like a shuffle copy to avoid
> >   over-preferring/disparaging.
> >
> > Please check the PR comments for more details.
> >
> > This patch was bootstrapped & regtested on
> > powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but it has some
> > "XFAIL->XPASS" failures on aarch64-linux-gnu.  The failure list
> > was attached in the PR, and I think the new assembly looks
> > improved (expected).
> >
> > With -Ofast and unrolling, this patch improves SPEC2017
> > bmk 508.namd_r by +2.42% and 519.lbm_r by +2.43% on Power8, and
> > 508.namd_r by +3.02% and 519.lbm_r by +3.85% on Power9, without any
> > remarkable degradations.
> >
> > This patch likely benefits x86_64 and aarch64 too, but I don't
> > have performance machines with these arches at hand.  Could
> > someone kindly help to benchmark it if possible?
> I can help test it on Intel Cascade Lake and AMD Milan.

And could you rebase your patch on the latest trunk?  I got several
failures when applying the patch:

~ git apply ira-v3.diff
error: patch failed: gcc/doc/invoke.texi:13845
error: gcc/doc/invoke.texi: patch does not apply
error: patch failed: gcc/ira-conflicts.c:233
error: gcc/ira-conflicts.c: patch does not apply
error: patch failed: gcc/ira-int.h:971
error: gcc/ira-int.h: patch does not apply
error: patch failed: gcc/ira.c:1922
error: gcc/ira.c: patch does not apply
error: patch failed: gcc/params.opt:330
error: gcc/params.opt: patch does not apply

> >
> > Many thanks in advance!
> >
> > btw, you can simply ignore the part about the parameter
> > ira-consider-dup-in-all-alts (its name/description); it's somewhat
> > stale, and I left it as is for now since we will likely get rid of it.
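A hedged C sketch of dup detection with the first two v3 refinements, following one reading of the list above: scan only the alternatives that use the preferred class; give up if any of them has no matching constraint at all (the *mov_internal2 style case) or if a register move within the class is free; otherwise return the output operand number the input is tied to. The types, dup_out_num and its parameters are hypothetical, handle only single-digit matching constraints, and are not GCC's ira_get_dup_out_num.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register classes, for this sketch only.  */
enum sketch_class { SKETCH_FLOAT, SKETCH_VSX };

/* One alternative, seen from a single input operand.  */
struct alt_info
{
  enum sketch_class cls; /* Class this alternative uses for the operand.  */
  const char *cstr;      /* The operand's constraint, e.g. "wa" or "0".  */
  bool any_tie_p;        /* True if any operand in this alternative has
                            a matching constraint.  */
};

/* Return the output operand number that the input operand is tied to
   when allocated in PREF, or -1 if no constraint copy should be made.
   MOVE_COST is the cost of a register move within PREF.  */
static int
dup_out_num (const struct alt_info *alts, size_t n,
             enum sketch_class pref, int move_cost)
{
  int dup = -1;

  assert (alts != NULL || n == 0);
  /* A free move within the class: no copy is needed to meet the tie.  */
  if (move_cost == 0)
    return -1;

  for (size_t i = 0; i < n; i++)
    {
      if (alts[i].cls != pref)
        continue;
      /* A tie-free alternative in the same class can always be used
         without a register move, so skip the constraint copy.  */
      if (!alts[i].any_tie_p)
        return -1;
      /* Single-digit matching constraint such as "0".  */
      if (isdigit ((unsigned char) alts[i].cstr[0]))
        dup = alts[i].cstr[0] - '0';
    }
  return dup;
}
```

For operand 2 of the FMA pattern with VSX_REG preferred, the sketch returns 0 (tied to operand 0); with FLOAT_REG preferred it returns -1, because alternative 0 needs no tie at all.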
> >
> > BR,
> > Kewen
> > -----
> > gcc/ChangeLog:
> >
> >         * doc/invoke.texi (ira-consider-dup-in-all-alts): Document new
> >         parameter.
> >         * ira.c (ira_get_dup_out_num): Adjust as parameter
> >         param_ira_consider_dup_in_all_alts.
> >         * params.opt (ira-consider-dup-in-all-alts): New.
> >         * ira-conflicts.c (process_regs_for_copy): Add one parameter
> >         single_input_op_has_cstr_p.
> >         (get_freq_for_shuffle_copy): New function.
> >         (add_insn_allocno_copies): Adjust as single_input_op_has_cstr_p.
> >         * ira-int.h (ira_get_dup_out_num): Add one bool parameter.
> >

-- 
BR,
Hongtao