From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 442703858D3C; Mon, 28 Nov 2022 02:35:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 442703858D3C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2AS1hIfH008418; Mon, 28 Nov 2022 02:35:40 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : references : date : in-reply-to : message-id : mime-version : content-type; s=pp1; bh=BkYY4hmStZ81HJ89kLWp41+rUAnPWInYM3dhTlvciUA=; b=dK52R6PwINYURobE0YsvdkCPpPavOOwSuyoZ3eWd6mT96Lx0osm3DISOHysICLRBTqDZ KKAzgC+NJYbRxt8DKJUhelwIcsFmAhACyO55PtR8xOMnAOk64yN03XTqKQEIAs94g9tv dgmSXoNq5Ul3FXQ4WBv4zeDU1tNq+AijDbdP+5X/2gbQoOCd6rvaxRNb65aojSV3YWqQ bOobMUiT2aP4GKsD01jul7OcYi4jUYDnsxSvHzLDvWOHE+dm3I/r9qv8LiXxhmw/U8wD j7a+pWA7qCLjBaaieKTTFeLw9fPVB7Cq82dcls103LJ+olC1sjSG+uWMhQNrWGvPuwZy SQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3m3vfjbpgc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Nov 2022 02:35:39 +0000 Received: from m0098417.ppops.net (m0098417.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AS2UAm3012206; Mon, 28 Nov 2022 02:35:39 GMT Received: from ppma01dal.us.ibm.com (83.d6.3fa9.ip4.static.sl-reverse.com [169.63.214.131]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3m3vfjbpg5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Nov 2022 02:35:39 +0000 Received: from pps.filterd (ppma01dal.us.ibm.com [127.0.0.1]) by ppma01dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AS2ZJR2010052; Mon, 28 Nov 2022 02:35:38 GMT Received: from b03cxnp07029.gho.boulder.ibm.com (b03cxnp07029.gho.boulder.ibm.com [9.17.130.16]) by ppma01dal.us.ibm.com with ESMTP id 3m3ae9nfmr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Nov 2022 02:35:38 +0000 Received: from smtpav06.dal12v.mail.ibm.com ([9.208.128.130]) by b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AS2ZZDf43516642 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 28 Nov 2022 02:35:35 GMT Received: from smtpav06.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C4A6A58059; Mon, 28 Nov 2022 02:35:36 +0000 (GMT) Received: from smtpav06.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8B24C58055; Mon, 28 Nov 2022 02:35:36 +0000 (GMT) Received: from pike (unknown [9.5.12.127]) by smtpav06.dal12v.mail.ibm.com (Postfix) with ESMTPS; Mon, 28 Nov 2022 02:35:36 +0000 (GMT) From: Jiufu Guo To: "Kewen.Lin" Cc: Segher Boessenkool , dje.gcc@gmail.com, linkw@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: Re: [PATCH]rs6000: Load high and low part of 64bit constant independently References: <20220915083052.74903-1-guojiufu@linux.ibm.com> <7b482d49-3928-552c-ccf5-d391684b7f2b@linux.ibm.com> <7ev8n3p3u6.fsf@pike.rch.stglabs.ibm.com> <20221125154603.GH25951@gate.crashing.org> <2de1185f-8de0-85a8-3d84-121f31fbd630@linux.ibm.com> Date: Mon, 28 Nov 2022 10:35:36 +0800 In-Reply-To: <2de1185f-8de0-85a8-3d84-121f31fbd630@linux.ibm.com> (Kewen Lin's message of "Mon, 28 Nov 2022 09:50:41 +0800") Message-ID: <7elenv24cn.fsf@pike.rch.stglabs.ibm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: fiRJSOHc_yahU4vl3t0ETlgIAO-Kp1cq X-Proofpoint-GUID: 27WrjBI4HE9btyp6fvVeBWI9lsLy1kry X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-28_02,2022-11-25_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 phishscore=0 mlxscore=0 malwarescore=0 bulkscore=0 adultscore=0 spamscore=0 priorityscore=1501 lowpriorityscore=0 mlxlogscore=999 clxscore=1015 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211280018 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Kewen/Segher, Thanks a lot for your review! I updated the patch accordingly as below for message/code/testcase: For a complicate 64bit constant, blow is one instruction-sequence to build: lis 9,0x800a ori 9,9,0xabcd sldi 9,9,32 oris 9,9,0xc167 ori 9,9,0xfa16 while we can also use below sequence to build: lis 9,0xc167 lis 10,0x800a ori 9,9,0xfa16 ori 10,10,0xabcd rldimi 9,10,32,0 This sequence is using 2 registers to build high and low part firstly, and then merge them. In parallel aspect, this sequence would be faster. (Ofcause, using 1 more register with potential register pressure). The instruction sequence with two registers for parallel version can be generated only if can_create_pseudo_p. Otherwise, the one register sequence is generated. Bootstrap and regtest pass on ppc64le. Is this ok for trunk? gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Generate more parallel code if can_create_pseudo_p. gcc/testsuite/ChangeLog: * gcc.target/powerpc/parall_5insn_const.c: New test. --- gcc/config/rs6000/rs6000.cc | 45 +++++++++++-------- .../gcc.target/powerpc/parall_5insn_const.c | 27 +++++++++++ 2 files changed, 53 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index eb7ad5e954f..877b314a57a 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -10337,26 +10337,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) } else { - temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); - - emit_move_insn (copy_rtx (temp), - GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); - if (ud3 != 0) - emit_move_insn (copy_rtx (temp), - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud3))); + if (can_create_pseudo_p ()) + { + /* lis H,U4; ori H,U3; lis L,U2; ori L,U1; rldimi L,H,32,0. */ + rtx high = gen_reg_rtx (DImode); + rtx low = gen_reg_rtx (DImode); + HOST_WIDE_INT num = (ud2 << 16) | ud1; + rs6000_emit_set_long_const (low, (num ^ 0x80000000) - 0x80000000); + num = (ud4 << 16) | ud3; + rs6000_emit_set_long_const (high, (num ^ 0x80000000) - 0x80000000); + emit_insn (gen_rotldi3_insert_3 (dest, high, GEN_INT (32), low, + GEN_INT (0xffffffff))); + } + else + { + /* lis A,U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1. */ + emit_move_insn (dest, + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); + if (ud3 != 0) + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3))); - emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, - gen_rtx_ASHIFT (DImode, copy_rtx (temp), - GEN_INT (32))); - if (ud2 != 0) - emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud2 << 16))); - if (ud1 != 0) - emit_move_insn (dest, - gen_rtx_IOR (DImode, copy_rtx (temp), - GEN_INT (ud1))); + emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32))); + if (ud2 != 0) + emit_move_insn (dest, + gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16))); + if (ud1 != 0) + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1))); + } } } diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c new file mode 100644 index 00000000000..e3a9a7264cf --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mno-prefixed -save-temps" } */ +/* { dg-require-effective-target has_arch_ppc64 } */ + +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ + +void __attribute__ ((noinline)) foo (unsigned long long *a) +{ + /* 2 lis + 2 ori + 1 rldimi for each constant. */ + *a++ = 0x800aabcdc167fa16ULL; + *a++ = 0x7543a876867f616ULL; +} + +long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL}; +int +main () +{ + long long res[2]; + + foo (res); + if (__builtin_memcmp (res, A, sizeof (res)) != 0) + __builtin_abort (); + + return 0; +} -- 2.17.1 BR, Jeff (Jiufu) "Kewen.Lin" writes: > Hi Segher, > > on 2022/11/25 23:46, Segher Boessenkool wrote: >> Hi! >> >> On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu Guo wrote: >>> "Kewen.Lin" writes: >>>> on 2022/9/15 16:30, Jiufu Guo wrote: >>>>> For a complicate 64bit constant, blow is one instruction-sequence to >>>>> build: >>>>> lis 9,0x800a >>>>> ori 9,9,0xabcd >>>>> sldi 9,9,32 >>>>> oris 9,9,0xc167 >>>>> ori 9,9,0xfa16 >>>>> >>>>> while we can also use below sequence to build: >>>>> lis 9,0xc167 >>>>> lis 10,0x800a >>>>> ori 9,9,0xfa16 >>>>> ori 10,10,0xabcd >>>>> rldimi 9,10,32,0 >>>>> This sequence is using 2 registers to build high and low part firstly, >>>>> and then merge them. >>>>> In parallel aspect, this sequence would be faster. (Ofcause, using 1 more >>>>> register with potential register pressure). >> >> And crucially this patch only uses two registers if can_create_pseudo_p. >> Please mention that. >> >>>>> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit >>>>> constant build. >> >> If you don't give details of what this does, just say "Update." please. >> But update to what? >> >> "Generate more parallel code if can_create_pseudo_p." maybe? >> >>>>> + rtx H = gen_reg_rtx (DImode); >>>>> + rtx L = gen_reg_rtx (DImode); >> >> Please don't use all-uppercase variable names, those are for macros. In >> fact, don't use uppercase in variable (and function etc.) names at all, >> unless there is a really good reason to. >> >> Just call it "high" and "low", or "hi" and "lo", or something? >> >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c >>>>> @@ -0,0 +1,27 @@ >>>>> +/* { dg-do run } */ >>>>> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -save-temps" } */ >>>> >>>> Why do we need power7 here? >>> power8/9 are also ok for this case. Actually, O just want to >>> avoid to use new p10 instruction, like "pli", and then selected >>> an old arch option. >> >> Why does it need _at least_ p7, is the question (as I understand it). >> > > Yeah, that's what I was intended to ask, since those insns to be scanned > don't actually require Power7 or later. > >> To prohibit pli etc. you can do -mno-prefixed (which works on all older >> CPUs just as well), or skip the test if prefixed insns are enabled, or >> scan for the then generated code as well. The first option is by far >> the simplest. > > Yeah, using -mno-prefixed is perfect here, nice! > > BR, > Kewen