From: Hongtao Liu <crazylht@gmail.com>
To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Vladimir Makarov <vmakarov@redhat.com>,
bergner@linux.ibm.com, Bill Schmidt <wschmidt@linux.ibm.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]
Date: Wed, 30 Jun 2021 16:53:54 +0800
Message-ID: <CAMZc-bxSTVMan+bvjPx7OGRs=YmzWMvKxN9TmQ7Rtz0ReSuAFw@mail.gmail.com>
In-Reply-To: <bb221383-bb9f-d6c6-3982-0f93fbb9aecf@linux.ibm.com>
On Mon, Jun 28, 2021 at 3:27 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/6/28 3:20 PM, Hongtao Liu wrote:
> > On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >>
> >> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>
> >>> Hi!
> >>>
> >>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
> >>>> Hi,
> >>>>
> >>>> PR100328 has the details of this issue; I will try to summarize
> >>>> it here. In the hottest function LBM_performStreamCollideTRT
> >>>> of SPEC2017 benchmark 519.lbm_r, there are many FMA-style
> >>>> expressions (27 FMA, 19 FMS, 11 FNMA). On rs6000, this kind of
> >>>> FMA-style insn has two flavors: FLOAT_REG and VSX_REG. The VSX_REG
> >>>> register class has 64 registers, the first 32 of which make up
> >>>> the whole of FLOAT_REG. The two flavors differ in some ways;
> >>>> take "*fma<mode>4_fpr" as an example:
> >>>>
> >>>> (define_insn "*fma<mode>4_fpr"
> >>>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
> >>>>         (fma:SFDF
> >>>>           (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
> >>>>           (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
> >>>>           (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
> >>>>
> >>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
> >>>> // <Ff> (f/d) => A floating point register, aka. FLOAT_REG.
> >>>>
> >>>> So for VSX_REG we only have the destructive form: when a VSX_REG
> >>>> alternative is used, operand 2 or operand 3 is required to be
> >>>> the same as operand 0. reload has to take care of this
> >>>> constraint and create some non-free register copies if required.
> >>>>
> >>>> Assuming one fma insn looks like:
> >>>> op0 = FMA (op1, op2, op3)
> >>>>
> >>>> The best regclass for all of them is VSX_REG. When op1, op2 and op3
> >>>> are all dead, IRA simply creates three shuffle copies for them (here
> >>>> the operand order matters, since with the same freq the one with the
> >>>> smaller number takes precedence), but IMO both op2 and op3 should
> >>>> take higher priority in the copy queue due to the matching constraint.
> >>>>
> >>>> I noticed that there is a function, ira_get_dup_out_num, which is
> >>>> meant to create this kind of constraint copy, but the code below
> >>>> appears to refuse to create one if any alternative has a valid
> >>>> regclass that is not likely to be spilled.
> >>>>
> >>>>   default:
> >>>>     {
> >>>>       enum constraint_num cn = lookup_constraint (str);
> >>>>       enum reg_class cl = reg_class_for_constraint (cn);
> >>>>       if (cl != NO_REGS
> >>>>           && !targetm.class_likely_spilled_p (cl))
> >>>>         goto fail;
> >>>>
> >>>> ...
> >>>>
> >>>> I cooked up the attached patch to make IRA respect this kind of
> >>>> matching constraint, guarded by a parameter. As I stated in the PR,
> >>>> I was not sure this is on the right track. The RFC patch checks the
> >>>> matching constraint in all alternatives: if there is an alternative
> >>>> with a matching constraint that matches the current preferred
> >>>> regclass (or best of allocno?), it records the output operand number
> >>>> and then creates a constraint copy for it. Normally that copy gets
> >>>> priority over shuffle copies, so the matching constraint is more
> >>>> likely to be satisfied and reload does not have to create extra
> >>>> copies to meet the matching constraint or the desirable register
> >>>> class.
> >>>>
> >>>> For FMA A,B,C,D, I think ideally the copies A/B, A/C, A/D can first
> >>>> stay as shuffle copies. Later, once any of A,B,C,D is assigned a
> >>>> hardware register that is a VSX register (VSX_REG) but not an FP
> >>>> register (FLOAT_REG), we have to pay costs if we can NOT go with
> >>>> the VSX alternatives, so at that point it becomes important to
> >>>> respect the matching constraint, and we can increase the freq of the
> >>>> remaining copies related to it (A/B, A/C, A/D). This idea requires
> >>>> some side tables to record information and seems a bit complicated
> >>>> in the current framework, so the proposed patch aggressively
> >>>> emphasizes the matching constraint at the time of creating copies.
> >>>>
> >>>
> >>> Compared with the original patch (v1), this patch (v3) takes the
> >>> following into account (this would be v2 on this mailing list, but I
> >>> bumped it to stay consistent with the PR):
> >>>
> >>> - Excluding the case where, for one preferred register class,
> >>>   there are two or more alternatives, one of which has the
> >>>   matching constraint while another does not. For the given
> >>>   operand, even if it is assigned a hardware reg that doesn't
> >>>   meet the matching constraint, it can simply use the
> >>>   alternative without the matching constraint, so no register
> >>>   move is needed. One typical case is
> >>>   define_insn *mov<mode>_internal2 on rs6000. So we
> >>>   shouldn't create a constraint copy for it.
> >>>
> >>> - A possibly free register move within the same register class:
> >>>   this is disabled in that case, since the register move needed
> >>>   to meet the constraint is considered free.
> >>>
> >>> - Making it on by default, as suggested by Segher & Vladimir; we
> >>>   hope to get rid of the parameter if the benchmarking results
> >>>   look good on major targets.
> >>>
> >>> - Tweaking the cost when either side of the matching constraint
> >>>   is a hardware register. Before this patch, the constraint
> >>>   copy was simply treated as a real move insn for the pref and
> >>>   conflict costs with a hardware register. After this patch,
> >>>   several input operands may respect the same matching
> >>>   constraint (but in different alternatives), so in some cases
> >>>   we should treat it like a shuffle copy to avoid
> >>>   over-preferring/disparaging.
> >>>
> >>> Please check the PR comments for more details.
> >>>
> >>> This patch bootstraps and passes regression testing on
> >>> powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but shows some
> >>> "XFAIL->XPASS" failures on aarch64-linux-gnu. The failure list
> >>> is attached in the PR, and I think the new assembly looks
> >>> improved (as expected).
> >>>
> >>> With -Ofast plus unrolling, this patch improves SPEC2017
> >>> benchmarks 508.namd_r by 2.42% and 519.lbm_r by 2.43% on Power8,
> >>> and 508.namd_r by 3.02% and 519.lbm_r by 3.85% on Power9, without
> >>> any notable degradations.
Here are the SPEC2017 rate results tested on AMD Milan.
Options: -march=znver2 -Ofast -funroll-loops -mfpmath=sse -flto
fprate:
503.bwaves_r        0.01      (A) shliclel219
507.cactuBSSN_r    -0.19      (A) shliclel219
508.namd_r          0.02      (A) shliclel219
510.parest_r       -0.68      (A) shliclel219
511.povray_r        1.59      (A) shliclel219
521.wrf_r           0.19      (A) shliclel219
526.blender_r       0.68      (A) shliclel219
527.cam4_r         -0.30      (A) shliclel219
538.imagick_r      -3.81  <-  (A) shliclel219
544.nab_r           0.02      (A) shliclel219
549.fotonik3d_r     0.02      (A) shliclel219
554.roms_r         -0.43      (A) shliclel219
997.specrand_fr    -3.80  <-  (A) shliclel219
Geometric mean:    -0.52

intrate:
500.perlbench_r    -1.54      (A) shliclel219
502.gcc_r          -0.38      (A) shliclel219
505.mcf_r          -0.10      (A) shliclel219
520.omnetpp_r      -0.24      (A) shliclel219
523.xalancbmk_r    -1.04      (A) shliclel219
525.x264_r          0.31      (A) shliclel219
531.deepsjeng_r    -0.02      (A) shliclel219
541.leela_r         0.95      (A) shliclel219
548.exchange2_r     0.08      (A) shliclel219
557.xz_r           -0.40      (A) shliclel219
Geometric mean:    -0.24
> >>>
> >>> This patch likely benefits x86_64 and aarch64 as well, but I don't
> >>> have performance machines for these architectures at hand. Could
> >>> someone kindly help benchmark it if possible?
> >> I can help test it on Intel Cascade Lake and AMD Milan.
>
>
> Thanks for your help, Hongtao!
>
>
> > And could you rebase your patch on the latest trunk? I got several
> > failures when applying the patch:
> > ~ git apply ira-v3.diff
> > error: patch failed: gcc/doc/invoke.texi:13845
> > error: gcc/doc/invoke.texi: patch does not apply
> > error: patch failed: gcc/ira-conflicts.c:233
> > error: gcc/ira-conflicts.c: patch does not apply
> > error: patch failed: gcc/ira-int.h:971
> > error: gcc/ira-int.h: patch does not apply
> > error: patch failed: gcc/ira.c:1922
> > error: gcc/ira.c: patch does not apply
> > error: patch failed: gcc/params.opt:330
> > error: gcc/params.opt: patch does not apply
> >
>
> I think it's due to unexpected git stat lines in the previously attached diff.
>
> I have attached the format-patch file. Please take a look. Thanks!
>
>
> BR,
> Kewen
--
BR,
Hongtao