From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id A6A463858D1E; Fri, 2 Sep 2022 06:56:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A6A463858D1E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2826lRpT030632; Fri, 2 Sep 2022 06:56:26 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : references : date : in-reply-to : message-id : mime-version : content-type; s=pp1; bh=N7Y88FTtqHg3VRNbudJ2oJ38FJQhV4kDFSVovMnz3Ek=; b=HpL+SpRdtY+ndXV+6YK7hAEIUVOYfEknYCr0WnBidSvE+9005nzPXsBOgVJ1WlAMMXrM +Uy9g3i0IM+6G5YKEUQheDKT+XqG97HIbHvquJXr+e0eFX8CTsEM/ThUvQegUQj3Z8nk VTOqeIJi4wLXxYq+3Vq3cgqXCpt+qg8AFyMcQ1U4Qtm/lB8FpYfyypZv0kIuDi+Iei47 hXIdLIESA6oRHePzSPqMYn5qAMvf1R0Vi8M8lec8oUXEb1F+wJHEmjhjK0JsKM8V82Af PrOV5Ug5TJZjaAGcNgck6/bYaRCNHc5sCQOhEH3nOeuY18Xay7YckGFD7tAgFCpRPVwU VQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3jbcvdg95m-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Sep 2022 06:56:25 +0000 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2826pmYw015270; Fri, 2 Sep 2022 06:56:25 GMT Received: from ppma05wdc.us.ibm.com (1b.90.2fa9.ip4.static.sl-reverse.com [169.47.144.27]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3jbcvdg95e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Sep 2022 06:56:25 +0000 Received: from pps.filterd (ppma05wdc.us.ibm.com [127.0.0.1]) by ppma05wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2826o6l0012514; Fri, 2 Sep 2022 06:56:24 GMT Received: from b03cxnp07029.gho.boulder.ibm.com (b03cxnp07029.gho.boulder.ibm.com [9.17.130.16]) by ppma05wdc.us.ibm.com with ESMTP id 3j7awa6c2u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Sep 2022 06:56:24 +0000 Received: from b03ledav002.gho.boulder.ibm.com (b03ledav002.gho.boulder.ibm.com [9.17.130.233]) by b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2826uN7r10814068 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 2 Sep 2022 06:56:23 GMT Received: from b03ledav002.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BC09A136053; Fri, 2 Sep 2022 06:56:23 +0000 (GMT) Received: from b03ledav002.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7A60C136051; Fri, 2 Sep 2022 06:56:23 +0000 (GMT) Received: from pike (unknown [9.5.12.127]) by b03ledav002.gho.boulder.ibm.com (Postfix) with ESMTPS; Fri, 2 Sep 2022 06:56:23 +0000 (GMT) From: Jiufu Guo To: Segher Boessenkool Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, meissner@linux.ibm.com Subject: Re: [PATCH 1/2] Using pli(paddi) and rotate to build 64bit constants References: <20220901032400.23692-1-guojiufu@linux.ibm.com> <20220901215233.GJ25951@gate.crashing.org> Date: Fri, 02 Sep 2022 14:56:21 +0800 In-Reply-To: <20220901215233.GJ25951@gate.crashing.org> (Segher Boessenkool's message of "Thu, 1 Sep 2022 16:52:33 -0500") Message-ID: <7eedwuw9gq.fsf@pike.rch.stglabs.ibm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-GCONF: 00 X-Proofpoint-GUID: ix0X70ce2JkO0OHxnSOwh69riUs4bkK6 X-Proofpoint-ORIG-GUID: lqb_4d4Ctvc_HA4eswBJrPPZyODtwAPc X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-09-01_12,2022-08-31_03,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 clxscore=1015 malwarescore=0 priorityscore=1501 bulkscore=0 mlxlogscore=999 lowpriorityscore=0 spamscore=0 impostorscore=0 mlxscore=0 adultscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2207270000 definitions=main-2209020029 X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi, Segher Boessenkool writes: > Hi! > > This patch is a clear improvement :-) > > On Thu, Sep 01, 2022 at 11:24:00AM +0800, Jiufu Guo wrote: >> As mentioned in PR106550, since pli could support 34bits immediate, we could >> use less instructions(3insn would be ok) to build 64bits constant with pli. > >> For example, for constant 0x020805006106003, we could generate it with: >> asm code1: >> pli 9,101736451 (0x6106003) >> sldi 9,9,32 >> paddi 9,9, 2130000 (0x0208050) > > 3 insns, 2 insns dependent on the previous, each. Yeap. > >> or asm code2: >> pli 10, 2130000 >> pli 9, 101736451 >> rldimi 9, 10, 32, 0 > > 3 insns, 1 insn dependent on both others. Yes. > >> Testing with simple cases as below, run them a lot of times: >> f1.c >> long __attribute__ ((noinline)) foo (long *arg,long *,long*) >> { >> *arg = 0x2351847027482577; >> } >> 5insns: base >> pli+sldi+paddi: similar -0.08% >> pli+pli+rldimi: faster +0.66% > > This mostly tests how well this micro-benchmark is scheduled. More time > is spent in the looping and function calls (not shown)! > >> f2.c >> long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3) >> { >> *arg = 0x2351847027482577; >> *arg2 = 0x3257845024384680; >> *arg3 = 0x1245abcef9240dec; >> } >> 5nisns: base >> pli+sldi+paddi: faster +1.35% >> pli+pli+rldimi: faster +5.49% >> >> f2.c would be more meaningful. Because 'sched passes' are effective for >> f2.c, but 'scheds' do less thing for f1.c. > > It still is a too small example to mean much without looking at a > pipeview, or at the very least perf. But the results show a solid > improvement as expected ;-) Right, checking how the 'cycles' are using on each instructions would be more meaningful to demonstrate how the runtime is changing. > >> gcc/ChangeLog: >> PR target/106550 >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for >> constant building. > > "Use pli." ? Thanks, will update. > >> gcc/testsuite/ChangeLog: >> PR target/106550 >> * gcc.target/powerpc/pr106550.c: New test. > >> + else if (TARGET_PREFIXED) >> + { >> + /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0. */ > > But not just 9 and 10. Use A and B or X and Y or H and L or something > like that? OK, will updata accordingly. > > The comment goes... > >> + if (can_create_pseudo_p ()) >> + { > > ... here. > >> + temp = gen_reg_rtx (DImode); >> + rtx temp1 = gen_reg_rtx (DImode); >> + emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3)); >> + emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1)); >> + >> + emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1, >> + GEN_INT (0xffffffff))); >> + } >> + > > No blank line here please. > >> + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ >> + else >> + { > > The comment goes here, in the block it refers to. Comments for a block > are the first thing *in* the block. OK, great! I like the format you sugguested here :-) > >> + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); >> + >> + emit_move_insn (copy_rtx (dest), >> + gen_rtx_ASHIFT (DImode, copy_rtx (dest), >> + GEN_INT (32))); >> + >> + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; > > There should be a test that we so the right thing (or *a* right thing, > anyway; a working thing; but hopefully a reasonably fast thing) for > !can_use_paddi. To catch this test point, we need let the splitter run after RA, and register 0 happen to be the dest of an assignment. Oh, below case would be useful for this test point: /* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10 -fdisable-rtl-split1" } */ /* force the constant splitter run after RA: -fdisable-rtl-split1 a few assignments to make sure r0 is allocated as dest of an assign. */ void foo (unsigned long long *a) { *a++ = 0x020805006106003; *a++ = 0x2351847027482587; *a++ = 0x22513478874a2578; *a++ = 0x02180570670b003; *a++ = 0x2311847029488587; *a++ = 0x335184b02748757f; *a++ = 0x720805006096003; *a++ = 0x23a18b70a74e257e; *a++ = 0x2a518a70a74a2567; *a++ = 0x5208a5da0606a03; *a++ = 0x1391a47a2749257a; *a++ = 0x235a847027488576; *a++ = 0x23a1847027482677; } /* { dg-final { scan-assembler-times {\moris\M} 1 } } */ /* { dg-final { scan-assembler-times {\mori\M} 1 } } */ I will add this test case in patch. Is this ok? Any sugguestions? > >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c >> @@ -0,0 +1,14 @@ >> +/* PR target/106550 */ >> +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */ >> + >> +void >> +foo (unsigned long long *a) >> +{ >> + *a++ = 0x020805006106003; >> + *a++ = 0x2351847027482577; >> +} >> + >> +/* 3 insns for each constant: pli+sldi+paddi or pli+pli+rldimi. >> + And 3 additional insns: std+std+blr: 9 insns totally. */ >> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */ > > Also test the expected insns separately please? The std's (just with > \mstd so it will catch all variations as well), the blr, the pli's and > the rldimi etc.? The reason of using "(?n)^\s+[a-z]" is to keep this test case pass no matter the splitter running before or after RA. Yes, using real instructions to check would be better. Will update the case to check real instructions 'pli' and 'rldimi'. > > We also should test all special cases as well. Especially those that do > not happen all over the place, we will notice something is broken there > easy enough. But unlikely cases can take years to show up. Understand :) BR, Jeff(Jiufu) > > Okay for trunk with the formatting fixed. Thank you! > > > Segher