From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linkw@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com
 [148.163.158.5])
 by sourceware.org (Postfix) with ESMTPS id A3E833857815
 for <gcc-patches@gcc.gnu.org>; Fri,  2 Jul 2021 02:18:44 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A3E833857815
Received: from pps.filterd (m0098419.ppops.net [127.0.0.1])
 by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id
 16224g3L077249; Thu, 1 Jul 2021 22:18:42 -0400
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0b-001b2d01.pphosted.com with ESMTP id 39hn92p1sb-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 01 Jul 2021 22:18:41 -0400
Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1])
 by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 16225ukK086549;
 Thu, 1 Jul 2021 22:18:41 -0400
Received: from ppma04fra.de.ibm.com (6a.4a.5195.ip4.static.sl-reverse.com
 [149.81.74.106])
 by mx0b-001b2d01.pphosted.com with ESMTP id 39hn92p1rw-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 01 Jul 2021 22:18:41 -0400
Received: from pps.filterd (ppma04fra.de.ibm.com [127.0.0.1])
 by ppma04fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 162241L4003680;
 Fri, 2 Jul 2021 02:18:39 GMT
Received: from b06cxnps3074.portsmouth.uk.ibm.com
 (d06relay09.portsmouth.uk.ibm.com [9.149.109.194])
 by ppma04fra.de.ibm.com with ESMTP id 39duv8hd7q-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 02 Jul 2021 02:18:39 +0000
Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com
 [9.149.105.62])
 by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 1622IaZo31130036
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Fri, 2 Jul 2021 02:18:36 GMT
Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 4C8BBAE051;
 Fri,  2 Jul 2021 02:18:36 +0000 (GMT)
Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 7FDDDAE04D;
 Fri,  2 Jul 2021 02:18:33 +0000 (GMT)
Received: from KewenLins-MacBook-Pro.local (unknown [9.200.53.103])
 by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP;
 Fri,  2 Jul 2021 02:18:33 +0000 (GMT)
Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with
 param [PR100328]
To: richard.sandiford@arm.com
References: <c8ca748c-d7fd-6fb1-5ef2-567935d38722@linux.ibm.com>
 <8a5fd52a-1cc9-6563-ee6c-f345b489654c@linux.ibm.com>
 <CAMZc-bzzUSrg33sL=9HUYHs_CaLY1XGR+P6cFK57uAQJ2CD_oQ@mail.gmail.com>
 <CAMZc-bxuwG-XiEWRfDRsA3CDj8jdbTppr6GuiWhM3wkH030-Gg@mail.gmail.com>
 <bb221383-bb9f-d6c6-3982-0f93fbb9aecf@linux.ibm.com>
 <mpt4kdf2wbb.fsf@arm.com>
Cc: Hongtao Liu <crazylht@gmail.com>, GCC Patches <gcc-patches@gcc.gnu.org>,
 Vladimir Makarov <vmakarov@redhat.com>, bergner@linux.ibm.com,
 Bill Schmidt <wschmidt@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>
From: "Kewen.Lin" <linkw@linux.ibm.com>
Message-ID: <f4e18d0d-b4a3-0a06-2bee-c01271b8b864@linux.ibm.com>
Date: Fri, 2 Jul 2021 10:18:31 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0)
 Gecko/20100101 Thunderbird/78.10.0
In-Reply-To: <mpt4kdf2wbb.fsf@arm.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: MeiEgJVcAs0cGFh1sGxvxRWEAYnVevfb
X-Proofpoint-ORIG-GUID: 0UCRMYXwBzig61YTPnMIbpBIkd_Oc3zc
Content-Transfer-Encoding: 8bit
X-Proofpoint-UnRewURL: 0 URL was un-rewritten
MIME-Version: 1.0
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790
 definitions=2021-07-01_12:2021-07-01,
 2021-07-01 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 malwarescore=0
 impostorscore=0 suspectscore=0 bulkscore=0 phishscore=0 clxscore=1015
 mlxlogscore=999 mlxscore=0 priorityscore=1501 spamscore=0 adultscore=0
 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2104190000 definitions=main-2107020009
X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, KAM_SHORT, NICE_REPLY_A, RCVD_IN_MSPIKE_H2,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Jul 2021 02:18:46 -0000

Hi Richard,

on 2021/6/30 11:42 PM, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> on 2021/6/28 3:20 PM, Hongtao Liu wrote:
>>> On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu <crazylht@gmail.com> wrote:
>>>>
>>>> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
>>>>>> Hi,
>>>>>>
>>>>>> PR100328 has the details about this issue; I'll summarize it
>>>>>> here.  In the hottest function LBM_performStreamCollideTRT
>>>>>> of SPEC2017 bmk 519.lbm_r, there are many FMA-style expressions
>>>>>> (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA-style
>>>>>> insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG reg
>>>>>> class has 64 registers, the first 32 of which make up the
>>>>>> whole FLOAT_REG.  There are some differences between these two
>>>>>> flavors, taking "*fma<mode>4_fpr" as an example:
>>>>>>
>>>>>> (define_insn "*fma<mode>4_fpr"
>>>>>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
>>>>>>       (fma:SFDF
>>>>>>         (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
>>>>>>         (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
>>>>>>         (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
>>>>>>
>>>>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
>>>>>> // <Ff> (f/d) => A floating point register, aka. FLOAT_REG.
>>>>>>
>>>>>> So for VSX_REG, we only have the destructive form: when a VSX_REG
>>>>>> alternative is used, operand 2 or operand 3 is required
>>>>>> to be the same as operand 0.  Reload has to take care of this
>>>>>> constraint and create some non-free register copies if required.
>>>>>>
>>>>>> Assuming one fma insn looks like:
>>>>>>   op0 = FMA (op1, op2, op3)
>>>>>>
>>>>>> The best regclass for all of them is VSX_REG.  When op1, op2 and
>>>>>> op3 are all dead, IRA simply creates three shuffle copies for them
>>>>>> (here the operand order matters, since with the same freq, the one
>>>>>> with the smaller number takes preference), but IMO both op2 and op3
>>>>>> should take higher priority in the copy queue due to the matching
>>>>>> constraint.
>>>>>>
>>>>>> I noticed that there is one function, ira_get_dup_out_num, which is
>>>>>> meant to create this kind of constraint copy, but the code below
>>>>>> appears to refuse to create one if there is an alternative which
>>>>>> has a valid regclass that is not likely to be spilled.
>>>>>>
>>>>>>       default:
>>>>>>       {
>>>>>>         enum constraint_num cn = lookup_constraint (str);
>>>>>>         enum reg_class cl = reg_class_for_constraint (cn);
>>>>>>         if (cl != NO_REGS
>>>>>>             && !targetm.class_likely_spilled_p (cl))
>>>>>>           goto fail;
>>>>>>
>>>>>>        ...
>>>>>>
>>>>>> I cooked up the attached patch to make ira respect this kind of
>>>>>> matching constraint, guarded with one parameter.  As I stated in
>>>>>> the PR, I was not sure this is on the right track.  The RFC patch
>>>>>> checks the matching constraint in all alternatives: if there is
>>>>>> one alternative with a matching constraint that matches the
>>>>>> current preferred regclass (or best of allocno?), it will record
>>>>>> the output operand number and further create one constraint copy
>>>>>> for it.  Normally it takes priority over shuffle copies, so the
>>>>>> matching constraint is more likely to be satisfied and reload
>>>>>> doesn't need to create extra copies to meet the matching
>>>>>> constraint or the desirable register class.
>>>>>>
>>>>>> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can first
>>>>>> stay as shuffle copies.  Later, if any of A,B,C,D gets assigned a
>>>>>> hardware register which is a VSX register (VSX_REG) but not an FP
>>>>>> register (FLOAT_REG), we have to pay costs if we can NOT go with
>>>>>> the VSX alternatives, so at that point it's important to respect
>>>>>> the matching constraint, and we can increase the freq for the
>>>>>> remaining copies related to this (A/B, A/C, A/D).  This idea
>>>>>> requires some side tables to record some information and seems a
>>>>>> bit complicated in the current framework, so the proposed patch
>>>>>> aggressively emphasizes the matching constraint at the time of
>>>>>> creating copies.
>>>>>>
>>>>>
>>>>> Compared with the original patch (v1), this patch v3 has
>>>>> considered the following (this should be v2 for this mailing
>>>>> list, but I bumped it to be consistent with the PR's numbering):
>>>>>
>>>>>   - Excluding the case where for one preferred register class
>>>>>     there can be two or more alternatives, one of which has the
>>>>>     matching constraint while another doesn't.  So for
>>>>>     the given operand, even if it's assigned a hardware reg
>>>>>     which doesn't meet the matching constraint, it can simply
>>>>>     use the alternative which doesn't have the matching constraint,
>>>>>     so no register move is needed.  One typical case is
>>>>>     define_insn *mov<mode>_internal2 on rs6000.  So we
>>>>>     shouldn't create a constraint copy for it.
>>>>>
>>>>>   - Handling the possibly free register move within the same
>>>>>     register class: disable this if so, since the register move
>>>>>     to meet the constraint is considered free.
>>>>>
>>>>>   - Making it on by default, suggested by Segher & Vladimir, we
>>>>>     hope to get rid of the parameter if the benchmarking result
>>>>>     looks good on major targets.
>>>>>
>>>>>   - Tweaking the cost when either side of the matching constraint
>>>>>     is a hardware register.  Before this patch, the constraint
>>>>>     copy is simply taken as a real move insn for pref and
>>>>>     conflict cost with one hardware register; after this patch,
>>>>>     several input operands are allowed to respect the same
>>>>>     matching constraint (but in different alternatives), so we
>>>>>     should treat it like a shuffle copy in some cases to avoid
>>>>>     over-preferring/disparaging.
>>>>>
>>>>> Please check the PR comments for more details.
>>>>>
>>>>> This patch can be bootstrapped & regtested on
>>>>> powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but has some
>>>>> "XFAIL->XPASS" failures on aarch64-linux-gnu.  The failure list
>>>>> is attached in the PR, and I think the new assembly looks
>>>>> improved (as expected).
>>>>>
>>>>> With option Ofast unroll, this patch can help to improve SPEC2017
>>>>> bmk 508.namd_r +2.42% and 519.lbm_r +2.43% on Power8 while
>>>>> 508.namd_r +3.02% and 519.lbm_r +3.85% on Power9 without any
>>>>> remarkable degradations.
>>>>>
>>>>> This patch likely benefits x86_64 and aarch64, but I don't
>>>>> have performance machines with these arches at hand; could
>>>>> someone kindly help to benchmark it if possible?
>>>> I can help test it on Intel cascade lake and AMD milan.
>>
>>
>> Thanks for your help, Hongtao!
>>
>>
>>> And could you rebase your patch on the latest trunk?  I got several
>>> failures when applying the patch:
>>> ~ git apply ira-v3.diff
>>> error: patch failed: gcc/doc/invoke.texi:13845
>>> error: gcc/doc/invoke.texi: patch does not apply
>>> error: patch failed: gcc/ira-conflicts.c:233
>>> error: gcc/ira-conflicts.c: patch does not apply
>>> error: patch failed: gcc/ira-int.h:971
>>> error: gcc/ira-int.h: patch does not apply
>>> error: patch failed: gcc/ira.c:1922
>>> error: gcc/ira.c: patch does not apply
>>> error: patch failed: gcc/params.opt:330
>>> error: gcc/params.opt: patch does not apply
>>>
>>
>> I think it's due to unexpected git stat lines in previously attached diff.
>>
>> I have attached the format-patch file.  Please have a check.  Thanks!
> 
> FWIW, this seems to be neutral for SPEC 2017 on AArch64.  The SVE
> XFAIL->XPASS transitions mean it's definitely a good thing for
> AArch64 in that respect though.

Thanks for the information!  It gives us the confidence to turn it on by
default now.  I've removed those xfails in the latest version:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574315.html.

BR,
Kewen