From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 5B01A3858D32; Mon, 5 Sep 2022 06:23:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5B01A3858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2855niSo031203; Mon, 5 Sep 2022 06:23:02 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : references : date : in-reply-to : message-id : content-type : mime-version; s=pp1; bh=MN4vOWFQ6B/dlOnGaGpNOF0mZG2WpSPVe+Ax7Qsius4=; b=j09s1uoNSRF6m2kXE3SONUiIdqqm5vMvIpL8KTPt6SCjIPEjmkOmf0MjIQia8fkAT0BC paz7oJJ6Z71UkZjCHPCASfRdTCyR51T+Hf+GfiWlqppVxgB/ymU627qs8fJqehIvxa7f QIIC4qSep5o8WUYDLQ3MuovZkA9ONbZ5IemWddgo72sr7eGtBvmYBlstDen9E57X2D3O wC1G/rz9S3bt8Lcc85eNXDV+R6qQe/Wc4TTvwKYh680NY/6/De53XgtRCn/o2cbvWQkm MS2AJesxuhLZBhNYC+Tc61X1X3zVoLdWmr+dwpcyj0ADxk0sbGUcfU1GNtlmQotY6Qc7 0A== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3jdba4gnyb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 05 Sep 2022 06:23:02 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2856ALOr032666; Mon, 5 Sep 2022 06:23:01 GMT Received: from ppma01dal.us.ibm.com (83.d6.3fa9.ip4.static.sl-reverse.com [169.63.214.131]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3jdba4gny1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 05 Sep 2022 06:23:01 +0000 Received: from pps.filterd (ppma01dal.us.ibm.com [127.0.0.1]) by ppma01dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2856LMMg002652; Mon, 5 Sep 2022 06:23:00 GMT Received: from b01cxnp23033.gho.pok.ibm.com (b01cxnp23033.gho.pok.ibm.com [9.57.198.28]) by ppma01dal.us.ibm.com with ESMTP id 3jbxj9e4pe-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 05 Sep 2022 06:23:00 +0000 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp23033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2856MxF32294516 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 5 Sep 2022 06:22:59 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A8311AE066; Mon, 5 Sep 2022 06:22:59 +0000 (GMT) Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5A921AE064; Mon, 5 Sep 2022 06:22:59 +0000 (GMT) Received: from pike (unknown [9.5.12.127]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTPS; Mon, 5 Sep 2022 06:22:59 +0000 (GMT) From: Jiufu Guo To: "Kewen.Lin" Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, meissner@linux.ibm.com, gcc-patches@gcc.gnu.org Subject: Re: [PATCH 1/2] Using pli(paddi) and rotate to build 64bit constants References: <20220901032400.23692-1-guojiufu@linux.ibm.com> Date: Mon, 05 Sep 2022 14:22:57 +0800 In-Reply-To: (Kewen Lin's message of "Fri, 2 Sep 2022 12:07:45 +0800") Message-ID: <7ebkruuypq.fsf@pike.rch.stglabs.ibm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) Content-Type: text/plain X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: rJcXutsg_WH4DMnDI8c5bFvBX9ncCkR- X-Proofpoint-GUID: Ka021DxrtHKLEQQXbDwb-iRSoi2jsAZK X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-09-05_04,2022-09-05_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 mlxlogscore=999 clxscore=1015 bulkscore=0 mlxscore=0 lowpriorityscore=0 malwarescore=0 priorityscore=1501 phishscore=0 spamscore=0 impostorscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2207270000 definitions=main-2209050029 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: "Kewen.Lin" writes: > Hi Jeff, > > Thanks for the patch, some comments on nits are inline. > > on 2022/9/1 11:24, Jiufu Guo wrote: >> Hi, >> >> As mentioned in PR106550, since pli could support 34bits immediate, we could >> use less instructions(3insn would be ok) to build 64bits constant with pli. >> >> For example, for constant 0x020805006106003, we could generate it with: >> asm code1: >> pli 9,101736451 (0x6106003) >> sldi 9,9,32 >> paddi 9,9, 2130000 (0x0208050) >> >> or asm code2: >> pli 10, 2130000 >> pli 9, 101736451 >> rldimi 9, 10, 32, 0 >> >> Testing with simple cases as below, run them a lot of times: >> f1.c >> long __attribute__ ((noinline)) foo (long *arg,long *,long*) >> { >> *arg = 0x2351847027482577; >> } >> 5insns: base >> pli+sldi+paddi: similar -0.08% >> pli+pli+rldimi: faster +0.66% >> >> f2.c >> long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3) >> { >> *arg = 0x2351847027482577; >> *arg2 = 0x3257845024384680; >> *arg3 = 0x1245abcef9240dec; >> } >> 5nisns: base >> pli+sldi+paddi: faster +1.35% >> pli+pli+rldimi: faster +5.49% >> >> f2.c would be more meaningful. Because 'sched passes' are effective for >> f2.c, but 'scheds' do less thing for f1.c. >> >> Compare with previous patch: >> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599525.html >> This one updates code slightly and extracts changes on rs6000.md to a >> seperate patch. >> >> This patch pass boostrap and regtest on ppc64le(includes p10). >> Is it ok for trunk? >> >> BR, >> Jeff(Jiufu) >> >> >> PR target/106550 >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for >> constant building. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr106550.c: New test. >> >> --- >> gcc/config/rs6000/rs6000.cc | 39 +++++++++++++++++++++ >> gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 ++++++++ >> 2 files changed, 53 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index df491bee2ea..1ccb2ff30a1 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -10181,6 +10181,45 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) >> gen_rtx_IOR (DImode, copy_rtx (temp), >> GEN_INT (ud1))); >> } >> + else if (TARGET_PREFIXED) >> + { >> + /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0. */ >> + if (can_create_pseudo_p ()) >> + { >> + temp = gen_reg_rtx (DImode); >> + rtx temp1 = gen_reg_rtx (DImode); >> + emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3)); >> + emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1)); >> + > > Nit: copy_rtx here seems not necessary, as both temp and temp1 are with CODE REG. > The function copy_rtx returns the given rtx for code REG. > >> + emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1, >> + GEN_INT (0xffffffff))); >> + } >> + >> + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ >> + else >> + { >> + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); >> + >> + emit_move_insn (copy_rtx (dest), >> + gen_rtx_ASHIFT (DImode, copy_rtx (dest), >> + GEN_INT (32))); >> + >> + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; >> + > > The REGNO usage has asserted dest is with CODE REG, if it's always true > I don't see why we need copy_rtx around. Or do I miss something? Thanks a lot for point this out! Yes, now rs6000_emit_set_long_const is called on DImode for constant splitter; and it should be always with CODE REG, and then copy_rtx would not be needed here! I would update the patch accordingly. > >> + /* Use paddi for the low32 bits. */ >> + if (ud2 != 0 && ud1 != 0 && can_use_paddi) >> + emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest), >> + GEN_INT ((ud2 << 16) | ud1))); >> + /* Use oris, ori for low32 bits. */ >> + if (ud2 != 0 && (ud1 == 0 || !can_use_paddi)) >> + emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest, >> + gen_rtx_IOR (DImode, copy_rtx (dest), >> + GEN_INT (ud2 << 16))); >> + if (ud1 != 0 && (ud2 == 0 || !can_use_paddi)) >> + emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest), >> + GEN_INT (ud1))); >> + } >> + } >> else >> { >> temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c >> new file mode 100644 >> index 00000000000..c6f4116bb9a >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c >> @@ -0,0 +1,14 @@ >> +/* PR target/106550 */ >> +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */ >> + > > Need to check power10_ok, like: > /* { dg-require-effective-target power10_ok } */ > > Nit: -std=c99 is not needed? Thanks for catching this! BR, Jeff(Jiufu) > > BR, > Kewen