Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]
To: Hongtao Liu
Cc: GCC Patches, Vladimir Makarov, bergner@linux.ibm.com, Bill Schmidt, Segher Boessenkool, Richard Sandiford
References: <8a5fd52a-1cc9-6563-ee6c-f345b489654c@linux.ibm.com>
From: "Kewen.Lin"
Message-ID: <9efdee8d-043e-77cf-3e0a-efea7169f38c@linux.ibm.com>
Date: Wed, 30 Jun 2021 17:42:43 +0800
List-Id: Gcc-patches mailing list

on 2021/6/30 4:53 PM, Hongtao Liu wrote:
> On Mon, Jun 28, 2021 at 3:27 PM Kewen.Lin wrote:
>>
>> on 2021/6/28 3:20 PM, Hongtao Liu wrote:
>>> On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu wrote:
>>>>
>>>> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
>>>>>> Hi,
>>>>>>
>>>>>> PR100328 has some details about this issue; I will try to
>>>>>> summarize it here.  In the hottest function LBM_performStreamCollideTRT
>>>>>> of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
>>>>>> (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
>>>>>> insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG reg
>>>>>> class has 64 registers, the first 32 of which make up the
>>>>>> whole FLOAT_REG.  There are some differences between these two
>>>>>> flavors, taking "*fma<mode>4_fpr" as an example:
>>>>>>
>>>>>> (define_insn "*fma<mode>4_fpr"
>>>>>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
>>>>>>     (fma:SFDF
>>>>>>       (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
>>>>>>       (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
>>>>>>       (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
>>>>>>
>>>>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
>>>>>> // <Ff> (f/d) => A floating point register, aka. FLOAT_REG.
>>>>>>
>>>>>> So for VSX_REG, we only have the destructive form: when the VSX_REG
>>>>>> alternative is used, operand 2 or operand 3 is required
>>>>>> to be the same as operand 0.  reload has to take care of this
>>>>>> constraint and create some non-free register copies if required.
>>>>>>
>>>>>> Assuming one fma insn looks like:
>>>>>>   op0 = FMA (op1, op2, op3)
>>>>>>
>>>>>> The best regclass for all of them is VSX_REG.  When op1, op2, op3 are
>>>>>> all dead, IRA simply creates three shuffle copies for them (here the
>>>>>> operand order matters, since with the same freq, the one with the
>>>>>> smaller number takes preference), but IMO both op2 and op3 should take
>>>>>> higher priority in the copy queue due to the matching constraint.
>>>>>>
>>>>>> I noticed that there is one function, ira_get_dup_out_num, which is
>>>>>> meant to create this kind of constraint copy, but the code below seems
>>>>>> to refuse to create one if there is an alternative which has a valid
>>>>>> regclass that is not likely to be spilled.
>>>>>>
>>>>>>       default:
>>>>>>         {
>>>>>>           enum constraint_num cn = lookup_constraint (str);
>>>>>>           enum reg_class cl = reg_class_for_constraint (cn);
>>>>>>           if (cl != NO_REGS
>>>>>>               && !targetm.class_likely_spilled_p (cl))
>>>>>>             goto fail;
>>>>>>
>>>>>>          ...
>>>>>>
>>>>>> I cooked up the attached patch to make ira respect this kind of
>>>>>> matching constraint, guarded with one parameter.  As I stated in the
>>>>>> PR, I was not sure this is on the right track.  The RFC patch checks
>>>>>> the matching constraint in all alternatives: if there is one
>>>>>> alternative with a matching constraint that matches the current
>>>>>> preferred regclass (or best of allocno?), it will record the output
>>>>>> operand number and further create one constraint copy for it.
>>>>>> Normally it can get priority over shuffle copies, so the matching
>>>>>> constraint is more likely to be satisfied and reload doesn't have to
>>>>>> create extra copies to meet the matching constraint or the desired
>>>>>> register class.
>>>>>>
>>>>>> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can first stay
>>>>>> as shuffle copies.  Later, if any of A,B,C,D gets assigned a
>>>>>> hardware register which is a VSX register (VSX_REG) but not a FP
>>>>>> register (FLOAT_REG), it has to pay costs once we can NOT
>>>>>> go with VSX alternatives, so at that time it's important to respect
>>>>>> the matching constraint, and we can increase the freq for the remaining
>>>>>> copies related to this (A/B, A/C, A/D).  This idea requires some side
>>>>>> tables to record some information and seems a bit complicated in the
>>>>>> current framework, so the proposed patch aggressively emphasizes the
>>>>>> matching constraint at the time of creating copies.
>>>>>>
>>>>>
>>>>> Compared with the original patch (v1), this patch v3 has
>>>>> considered the following (this should be v2 for this mailing list,
>>>>> but I bumped it to be consistent with the PR's numbering):
>>>>>
>>>>>   - Excluding the case where for one preferred register class
>>>>>     there can be two or more alternatives, one of which has the
>>>>>     matching constraint while another doesn't.  So for
>>>>>     the given operand, even if it's assigned a hardware reg
>>>>>     which doesn't meet the matching constraint, it can simply
>>>>>     use the alternative which doesn't have the matching constraint,
>>>>>     so no register move is needed.  One typical case is
>>>>>     define_insn *mov<mode>_internal2 on rs6000.  So we
>>>>>     shouldn't create a constraint copy for it.
>>>>>
>>>>>   - The possible free register move in the same register class:
>>>>>     disable this if so, since the register move to meet the
>>>>>     constraint is considered free.
>>>>>
>>>>>   - Making it on by default, as suggested by Segher & Vladimir; we
>>>>>     hope to get rid of the parameter if the benchmarking results
>>>>>     look good on major targets.
>>>>>
>>>>>   - Tweaking the cost when either side of the matching constraint
>>>>>     is a hardware register.  Before this patch, the constraint
>>>>>     copy is simply taken as a real move insn for pref and
>>>>>     conflict cost with one hardware register.  After this patch,
>>>>>     several input operands are allowed to respect the same
>>>>>     matching constraint (but in different alternatives), so we
>>>>>     should treat it like a shuffle copy for some cases to avoid
>>>>>     over preferring/disparaging.
>>>>>
>>>>> Please check the PR comments for more details.
>>>>>
>>>>> This patch can be bootstrapped & regtested on
>>>>> powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but has some
>>>>> "XFAIL->XPASS" failures on aarch64-linux-gnu.  The failure list
>>>>> was attached in the PR and I thought the new assembly looked
>>>>> improved (expected).
>>>>>
>>>>> With option Ofast unroll, this patch can help to improve SPEC2017
>>>>> bmk 508.namd_r +2.42% and 519.lbm_r +2.43% on Power8 while
>>>>> 508.namd_r +3.02% and 519.lbm_r +3.85% on Power9 without any
>>>>> remarkable degradations.
>
> Here's SPEC2017 rate result tested on AMD milan
> option is: -march=znver2 -Ofast -funroll-loops -mfpmath=sse -flto
>
> fprate:
> 503.bwaves_r        0.01     (A) shliclel219
> 507.cactuBSSN_r    -0.19     (A) shliclel219
> 508.namd_r          0.02     (A) shliclel219
> 510.parest_r       -0.68     (A) shliclel219
> 511.povray_r        1.59     (A) shliclel219
> 521.wrf_r           0.19     (A) shliclel219
> 526.blender_r       0.68     (A) shliclel219
> 527.cam4_r         -0.30     (A) shliclel219
> 538.imagick_r      -3.81  <- (A) shliclel219
> 544.nab_r           0.02     (A) shliclel219
> 549.fotonik3d_r     0.02     (A) shliclel219
> 554.roms_r         -0.43     (A) shliclel219
> 997.specrand_fr    -3.80  <- (A) shliclel219
> Geometric mean:    -0.52
> intrate:
> 500.perlbench_r    -1.54     (A) shliclel219
> 502.gcc_r          -0.38     (A) shliclel219
> 505.mcf_r          -0.10     (A) shliclel219
> 520.omnetpp_r      -0.24     (A) shliclel219
> 523.xalancbmk_r    -1.04     (A) shliclel219
> 525.x264_r          0.31     (A) shliclel219
> 531.deepsjeng_r    -0.02     (A) shliclel219
> 541.leela_r         0.95     (A) shliclel219
> 548.exchange2_r     0.08     (A) shliclel219
> 557.xz_r           -0.40     (A) shliclel219
> Geometric mean:    -0.24

Roger, thanks!  The results don't look good, so I think I'll disable it
for target x86_64 in the next version.

By the way, bmk 519.lbm_r seems to be missing; just curious whether
that is because it failed to build even with the baseline?

BR,
Kewen
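
For readers following the quoted description, the matching-constraint
forms at issue (operand 2 or operand 3 tied to operand 0 in the VSX
alternatives of the fma pattern) can be pictured with a small
standalone sketch.  This is only an illustration under stated
assumptions: the helper name first_alt_tied_to_op0 is hypothetical and
is not a GCC function, it only recognizes an alternative that is
exactly "0", and the floating point register constraint of the first
alternative is written as plain "d" in the example strings.

```c
/* Hypothetical helper for illustration only; it is not part of GCC
   and does not reproduce what ira_get_dup_out_num does.

   CONSTRAINTS is one operand's constraint string with alternatives
   separated by commas, e.g. "d,wa,0".  Return the index of the first
   alternative that consists solely of the matching constraint "0"
   (the operand must share a register with operand 0), or -1 if no
   alternative does.  Leading '%' and '=' modifiers are skipped.  */
int
first_alt_tied_to_op0 (const char *constraints)
{
  int alt = 0;
  const char *p = constraints;

  while (*p != '\0')
    {
      /* Skip modifier characters at the start of an alternative.  */
      while (*p == '%' || *p == '=')
        p++;

      /* An alternative that is exactly "0" ties this operand to
         operand 0.  */
      if (p[0] == '0' && (p[1] == ',' || p[1] == '\0'))
        return alt;

      /* Advance to the start of the next alternative, if any.  */
      while (*p != '\0' && *p != ',')
        p++;
      if (*p == ',')
        p++;
      alt++;
    }

  return -1;
}
```

With the operand strings of the quoted pattern written as "%d,wa,wa",
"d,wa,0" and "d,0,wa", the helper returns -1 for operand 1, 2 for
operand 2 and 1 for operand 3, i.e. only op2 and op3 have an
alternative tied to the output, which is why the description argues
they deserve priority over a plain shuffle copy.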