From: Jiufu Guo
To: Segher Boessenkool
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org
Subject: Re: [RFC]rs6000: split complicated constant to memory
References: <20220815052519.194582-1-guojiufu@linux.ibm.com> <20220815211225.GJ25951@gate.crashing.org>
Date: Wed, 17 Aug 2022 10:32:05 +0800
In-Reply-To: <20220815211225.GJ25951@gate.crashing.org> (Segher Boessenkool's message of "Mon, 15 Aug 2022 16:12:25 -0500")
Message-ID: <7ek077k3bu.fsf@pike.rch.stglabs.ibm.com>

Hi,

Segher Boessenkool writes:

> Hi!
>
> On Mon, Aug 15, 2022 at 01:25:19PM +0800, Jiufu Guo wrote:
>> This patch tries to put the constant into constant pool if building the
>> constant requires 3 or more instructions.
>>
>> But there is a concern: I'm wondering if this patch is really profitable.
>>
>> Because, as I tested, 1. for simple case, if instructions are not been run
>> in parallel, loading constant from memory maybe faster; but 2. if there
>> are some instructions could run in parallel, loading constant from memory
>> are not win comparing with building constant. As below examples.
>>
>> For f1.c and f3.c, 'loading' constant would be acceptable in runtime aspect;
>> for f2.c and f4.c, 'loading' constant are visibly slower.
>>
>> For real-world cases, both kinds of code sequences exist.
>>
>> So, I'm not sure if we need to push this patch.
>>
>> Run a lot of times (1000000000) below functions to check runtime.
>> f1.c:
>> long foo (long *arg, long*, long *)
>> {
>>   *arg = 0x1234567800000000;
>> }
>> asm building constant:
>>     lis 10,0x1234
>>     ori 10,10,0x5678
>>     sldi 10,10,32
>> vs. asm loading
>>     addis 10,2,.LC0@toc@ha
>>     ld 10,.LC0@toc@l(10)
>
> This is just a load insn, unless this is the only thing needing the TOC.
> You can use crtl->uses_const_pool as an approximation here, to figure
> out if we have that case?

Thanks for pointing this out!
crtl->uses_const_pool is set to 1 in force_const_mem.
create_TOC_reference would be called after force_const_mem.
One concern: there may be cases where crtl->uses_const_pool is not
cleared to zero after the related symbols are optimized out.

>
>> The runtime between 'building' and 'loading' are similar: some times the
>> 'building' is faster; sometimes 'loading' is faster. And the difference is
>> slight.
>
> When there is only one constant, sure. But that isn't the expensive
> case we need to avoid :-)

Yes.
If there are other instructions around, the scheduler could arrange the
'building' instructions to run in parallel with them.
If we emit the 'building' instructions in the split1 pass (before sched1),
they are more likely to be scheduled well.
Then the 'building' form may not be bad.

>
>> addis 9,2,.LC2@toc@ha
>> ld 7,.LC0@toc@l(7)
>> ld 10,.LC1@toc@l(10)
>> ld 9,.LC2@toc@l(9)
>> For this case, 'loading' is always slower than 'building' (>15%).
>
> Only if there is nothing else to do, and only in cases where code size
> does not matter (i.e. microbenchmarks).

Yes, 'loading' may save code size slightly.

>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
>> @@ -0,0 +1,11 @@
>> +/* PR target/63281 */
>> +/* { dg-do compile { target lp64 } } */
>> +/* { dg-options "-O2 -std=c99" } */
>
> Why std=c99 btw? The default is c17. Is there something we need to
> disable here?

Oh, this option is not required. Thanks!

BR,
Jeff(Jiufu)

>
>
> Segher
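
For reference, a minimal C sketch of what the 'building' sequence quoted
for f1.c computes. The helper names (lis, ori, sldi, build_f1_constant)
are illustrative models of the three instructions, not GCC or rs6000
code, and they only track the register value, not latency or scheduling.

```c
#include <stdint.h>

/* Model of "lis rD,imm": the 16-bit immediate is shifted left by 16 bits
   and sign-extended to 64 bits.  */
static uint64_t
lis (int16_t imm)
{
  return (uint64_t) ((int64_t) imm * 65536);
}

/* Model of "ori rD,rS,imm": OR in a zero-extended 16-bit immediate.  */
static uint64_t
ori (uint64_t rs, uint16_t imm)
{
  return rs | imm;
}

/* Model of "sldi rD,rS,n": shift left by n bits, 0 <= n < 64.  */
static uint64_t
sldi (uint64_t rs, unsigned n)
{
  return rs << n;
}

/* Hypothetical helper: replays the lis/ori/sldi sequence from f1.c.  */
static uint64_t
build_f1_constant (void)
{
  uint64_t r10 = lis (0x1234);   /* 0x0000000012340000 */
  r10 = ori (r10, 0x5678);       /* 0x0000000012345678 */
  return sldi (r10, 32);         /* 0x1234567800000000 */
}
```

build_f1_constant () returns 0x1234567800000000, the value stored in
f1.c. Each step depends on the previous one, so the three instructions
form a serial chain, while the 'loading' form gets the same value with
the addis/ld pair.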