public inbox for gcc-patches@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: Hongtao Liu <crazylht@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	Vladimir Makarov <vmakarov@redhat.com>,
	bergner@linux.ibm.com, Bill Schmidt <wschmidt@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>
Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]
Date: Wed, 30 Jun 2021 16:42:32 +0100
Message-ID: <mpt4kdf2wbb.fsf@arm.com>
In-Reply-To: <bb221383-bb9f-d6c6-3982-0f93fbb9aecf@linux.ibm.com> (Kewen Lin's message of "Mon, 28 Jun 2021 15:27:19 +0800")

"Kewen.Lin" <linkw@linux.ibm.com> writes:
> on 2021/6/28 3:20 PM, Hongtao Liu wrote:
>> On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu <crazylht@gmail.com> wrote:
>>>
>>> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
>>>>> Hi,
>>>>>
>>>>> PR100328 has the details of this issue; I'll summarize it
>>>>> here.  In the hottest function LBM_performStreamCollideTRT
>>>>> of SPEC2017 benchmark 519.lbm_r, there are many FMA-style
>>>>> expressions (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of
>>>>> FMA-style insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG
>>>>> register class has 64 registers, the first 32 of which make up
>>>>> the whole of FLOAT_REG.  The two flavors differ in some ways;
>>>>> take "*fma<mode>4_fpr" as an example:
>>>>>
>>>>> (define_insn "*fma<mode>4_fpr"
>>>>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
>>>>>       (fma:SFDF
>>>>>         (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
>>>>>         (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
>>>>>         (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
>>>>>
>>>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
>>>>> // <Ff> (f/d) => A floating point register, aka. FLOAT_REG.
>>>>>
>>>>> So for VSX_REG we only have the destructive form: when a VSX_REG
>>>>> alternative is used, operand 2 or operand 3 is required
>>>>> to be the same as operand 0.  reload has to take care of this
>>>>> constraint and create some non-free register copies if required.
>>>>>
>>>>> Assuming one fma insn looks like:
>>>>>   op0 = FMA (op1, op2, op3)
>>>>>
>>>>> The best regclass for all of them is VSX_REG.  When op1, op2 and
>>>>> op3 are all dead, IRA simply creates three shuffle copies for them
>>>>> (here the operand order matters, since with the same freq the one
>>>>> with the smaller number takes preference), but IMO both op2 and op3
>>>>> should take higher priority in the copy queue due to the matching
>>>>> constraint.
>>>>>
>>>>> I noticed that there is a function ira_get_dup_out_num, which is
>>>>> meant to create this kind of constraint copy, but the code below
>>>>> appears to refuse to create one if there is an alternative that has
>>>>> a valid regclass which is not likely to be spilled.
>>>>>
>>>>>       default:
>>>>>       {
>>>>>         enum constraint_num cn = lookup_constraint (str);
>>>>>         enum reg_class cl = reg_class_for_constraint (cn);
>>>>>         if (cl != NO_REGS
>>>>>             && !targetm.class_likely_spilled_p (cl))
>>>>>           goto fail;
>>>>>
>>>>>        ...
>>>>>
>>>>> I cooked up the attached patch to make ira respect this kind of
>>>>> matching constraint, guarded by a parameter.  As I stated in the PR,
>>>>> I was not sure this is on the right track.  The RFC patch checks the
>>>>> matching constraint in all alternatives: if there is an alternative
>>>>> with a matching constraint that matches the current preferred
>>>>> regclass (or best of allocno?), it records the output operand number
>>>>> and then creates a constraint copy for it.  Normally that copy gets
>>>>> priority over shuffle copies, so the matching constraint is more
>>>>> likely to be satisfied and reload doesn't need to create extra
>>>>> copies to meet the matching constraint or the desirable register
>>>>> class.
>>>>>
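The ordering argument above can be illustrated with a toy model.  The sketch below is not IRA code: struct toy_copy, toy_copy_cmp, toy_sort_copies and the two sample arrays are hypothetical names, and the rule (constraint copies first, then higher freq, then smaller operand number) is only an assumption drawn from the description in this message.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a copy between the output and one input
   operand of "op0 = FMA (op1, op2, op3)".  Not an IRA data structure.  */
struct toy_copy
{
  int op_num;        /* Input operand number (1..3).  */
  int freq;          /* Execution frequency of the insn.  */
  int constraint_p;  /* Nonzero for a constraint copy, zero for shuffle.  */
};

/* Order copies: constraint copies first, then higher freq, then the
   smaller operand number (the tie-break mentioned in the message).  */
static int
toy_copy_cmp (const void *a, const void *b)
{
  const struct toy_copy *x = (const struct toy_copy *) a;
  const struct toy_copy *y = (const struct toy_copy *) b;

  if (!x->constraint_p != !y->constraint_p)
    return x->constraint_p ? -1 : 1;
  if (x->freq != y->freq)
    return x->freq > y->freq ? -1 : 1;
  return (x->op_num > y->op_num) - (x->op_num < y->op_num);
}

static void
toy_sort_copies (struct toy_copy *copies, size_t n)
{
  qsort (copies, n, sizeof (copies[0]), toy_copy_cmp);
}

/* All three inputs get shuffle copies with the same freq.  */
static struct toy_copy toy_all_shuffle[3]
  = { { 3, 10, 0 }, { 1, 10, 0 }, { 2, 10, 0 } };

/* op2 and op3 get constraint copies, op1 stays a shuffle copy.  */
static struct toy_copy toy_with_constraint[3]
  = { { 1, 10, 0 }, { 2, 10, 1 }, { 3, 10, 1 } };
```

Sorting toy_all_shuffle orders the copies purely by operand number, while sorting toy_with_constraint moves op2 and op3 ahead of op1, which is the priority change the message argues for.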
>>>>> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can first stay
>>>>> as shuffle copies.  Later, any of A,B,C,D may be assigned a
>>>>> hardware register which is a VSX register (VSX_REG) but not an FP
>>>>> register (FLOAT_REG), which means it has to pay costs once we cannot
>>>>> go with the VSX alternatives.  At that point it's important to
>>>>> respect the matching constraint, so we can increase the freq of the
>>>>> remaining copies related to it (A/B, A/C, A/D).  This idea requires
>>>>> some side tables to record information and seems a bit complicated
>>>>> in the current framework, so the proposed patch aggressively
>>>>> emphasizes the matching constraint at the time of creating copies.
>>>>>
>>>>
>>>> Compared with the original patch (v1), this patch (v3) has
>>>> considered the following.  (This should be v2 for this mailing
>>>> list, but I bumped it to be consistent with the PR.)
>>>>
>>>>   - Excluding the case where, for one preferred register class,
>>>>     there are two or more alternatives, one of which has the
>>>>     matching constraint while another doesn't.  For
>>>>     the given operand, even if it's assigned a hardware reg
>>>>     which doesn't meet the matching constraint, it can simply
>>>>     use the alternative which doesn't have the matching constraint,
>>>>     so no register move is needed.  One typical case is
>>>>     define_insn *mov<mode>_internal2 on rs6000.  So we
>>>>     shouldn't create a constraint copy for it.
>>>>
>>>>   - Handling a possibly free register move within the same register
>>>>     class: the new behavior is disabled in that case, since the
>>>>     register move to meet the constraint is considered free.
>>>>
>>>>   - Making it on by default, as suggested by Segher & Vladimir.  We
>>>>     hope to get rid of the parameter if the benchmarking results
>>>>     look good on major targets.
>>>>
>>>>   - Tweaking the cost when either side of the matching constraint
>>>>     is a hardware register.  Before this patch, the constraint
>>>>     copy was simply treated as a real move insn for pref and
>>>>     conflict cost with one hardware register.  After this patch,
>>>>     several input operands are allowed to respect the same
>>>>     matching constraint (but in different alternatives), so in
>>>>     some cases we should treat it like a shuffle copy to avoid
>>>>     over-preferring/disparaging.
>>>>
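One possible reading of the first bullet can be sketched as a small toy model.  This is not the code from the patch: struct toy_insn, toy_alt_has_tie, toy_dup_out_num and the sample insns are hypothetical, the constraint strings are assumed to have modifiers such as '=' and '%' stripped and <Ff> written as "d", and an alternative is assumed to belong to the preferred class when its output constraint equals the preferred constraint string.

```c
#include <string.h>

#define TOY_MAX_OPS 4
#define TOY_MAX_ALTS 3

/* Hypothetical, simplified insn description: cons[op][alt] is the
   constraint of operand OP in alternative ALT.  Operand 0 is the
   output.  */
struct toy_insn
{
  int n_ops;
  int n_alts;
  const char *cons[TOY_MAX_OPS][TOY_MAX_ALTS];
};

/* Return 1 if some input operand is tied to operand 0 in ALT.  */
static int
toy_alt_has_tie (const struct toy_insn *insn, int alt)
{
  for (int op = 1; op < insn->n_ops; op++)
    if (strcmp (insn->cons[op][alt], "0") == 0)
      return 1;
  return 0;
}

/* Return 0 (the output operand number) if input operand OP should get
   a constraint copy under this toy rule, else -1.  An alternative of
   the preferred class PREF that ties no input to the output means the
   insn can avoid the matching constraint, so no copy is recorded.  */
static int
toy_dup_out_num (const struct toy_insn *insn, int op, const char *pref)
{
  int tied = 0;

  for (int alt = 0; alt < insn->n_alts; alt++)
    {
      if (strcmp (insn->cons[0][alt], pref) != 0)
        continue;
      if (!toy_alt_has_tie (insn, alt))
        return -1;
      if (strcmp (insn->cons[op][alt], "0") == 0)
        tied = 1;
    }
  return tied ? 0 : -1;
}

/* The *fma<mode>4_fpr constraints quoted earlier in this thread.  */
static const struct toy_insn toy_fma
  = { 4, 3, { { "d", "wa", "wa" }, { "d", "wa", "wa" },
              { "d", "wa", "0" }, { "d", "0", "wa" } } };

/* A made-up move-like insn with one untied and one tied alternative.  */
static const struct toy_insn toy_mov
  = { 2, 2, { { "wa", "wa" }, { "wa", "0" } } };
```

With "wa" as the preferred constraint, toy_dup_out_num returns 0 for operands 2 and 3 of toy_fma and -1 for operand 1, while operand 1 of toy_mov gets -1 because its first alternative needs no tie.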
>>>> Please check the PR comments for more details.
>>>>
>>>> This patch was bootstrapped & regtested on
>>>> powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but has some
>>>> "XFAIL->XPASS" failures on aarch64-linux-gnu.  The failure list
>>>> is attached in the PR; I think the new assembly looks
>>>> improved (expected).
>>>>
>>>> With option Ofast unroll, this patch helps to improve SPEC2017
>>>> benchmarks 508.namd_r by +2.42% and 519.lbm_r by +2.43% on Power8,
>>>> and 508.namd_r by +3.02% and 519.lbm_r by +3.85% on Power9, without
>>>> any notable degradations.
>>>>
>>>> This patch likely benefits x86_64 and aarch64, but I don't
>>>> have performance machines with these arches at hand.  Could
>>>> someone kindly help to benchmark it if possible?
>>> I can help test it on Intel Cascade Lake and AMD Milan.
>
>
> Thanks for your help, Hongtao!
>
>
>> And could you rebase your patch on the latest trunk?  I got several
>> failures when applying the patch:
>> ~ git apply ira-v3.diff
>> error: patch failed: gcc/doc/invoke.texi:13845
>> error: gcc/doc/invoke.texi: patch does not apply
>> error: patch failed: gcc/ira-conflicts.c:233
>> error: gcc/ira-conflicts.c: patch does not apply
>> error: patch failed: gcc/ira-int.h:971
>> error: gcc/ira-int.h: patch does not apply
>> error: patch failed: gcc/ira.c:1922
>> error: gcc/ira.c: patch does not apply
>> error: patch failed: gcc/params.opt:330
>> error: gcc/params.opt: patch does not apply
>> 
>
> I think it's due to unexpected git stat lines in the previously attached diff.
>
> I have attached the format-patch file.  Please take a look.  Thanks!

FWIW, this seems to be neutral for SPEC 2017 on AArch64.  The SVE
XFAIL->XPASS transitions mean it's definitely a good thing for
AArch64 in that respect though.

Thanks,
Richard
