From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 845D23856DFE; Thu, 16 Jun 2022 07:47:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 845D23856DFE Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 25G7Nr5E013320; Thu, 16 Jun 2022 07:47:55 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gqk06aupg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 16 Jun 2022 07:47:55 +0000 Received: from m0098409.ppops.net (m0098409.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 25G7bJNB015946; Thu, 16 Jun 2022 07:47:54 GMT Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.11]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gqk06aung-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 16 Jun 2022 07:47:54 +0000 Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1]) by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 25G7ZOFG011079; Thu, 16 Jun 2022 07:47:53 GMT Received: from b01cxnp22036.gho.pok.ibm.com (b01cxnp22036.gho.pok.ibm.com [9.57.198.26]) by ppma03dal.us.ibm.com with ESMTP id 3gmjpa8f9e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 16 Jun 2022 07:47:53 +0000 Received: from b01ledav001.gho.pok.ibm.com (b01ledav001.gho.pok.ibm.com [9.57.199.106]) by b01cxnp22036.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 25G7lrlE10551880 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 16 Jun 2022 07:47:53 GMT Received: from b01ledav001.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EFAC728059; Thu, 16 Jun 2022 07:47:52 +0000 (GMT) Received: from b01ledav001.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9FAA428058; Thu, 16 Jun 2022 07:47:52 +0000 (GMT) Received: from pike (unknown [9.5.12.127]) by b01ledav001.gho.pok.ibm.com (Postfix) with ESMTPS; Thu, 16 Jun 2022 07:47:52 +0000 (GMT) From: Jiufu Guo To: Segher Boessenkool Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, amodra@gmail.com Subject: Re: [PATCH V2]rs6000: Store complicated constant into pool References: <20220614132355.117887-1-guojiufu@linux.ibm.com> <20220614223421.GB25951@gate.crashing.org> <20220615174441.GD25951@gate.crashing.org> Date: Thu, 16 Jun 2022 15:47:49 +0800 In-Reply-To: <20220615174441.GD25951@gate.crashing.org> (Segher Boessenkool's message of "Wed, 15 Jun 2022 12:44:41 -0500") Message-ID: <7eczf99hi2.fsf@pike.rch.stglabs.ibm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-GCONF: 00 X-Proofpoint-GUID: -CKfAAagmBPrZUYYrGFslP52SyFXqxLy X-Proofpoint-ORIG-GUID: vmnE0iZLEsJqYROZCDdk4blQL_1o_-lm X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.517,FMLib:17.11.64.514 definitions=2022-06-16_04,2022-06-15_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 lowpriorityscore=0 malwarescore=0 impostorscore=0 bulkscore=0 phishscore=0 mlxlogscore=999 suspectscore=0 priorityscore=1501 adultscore=0 spamscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2204290000 definitions=main-2206160028 X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, KAM_SHORT, KAM_STOCKGEN, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Jun 2022 07:47:58 -0000 Segher Boessenkool writes: Hi Segher! Thanks for your comments and suggestions!! > Hi! > > On Wed, Jun 15, 2022 at 04:19:41PM +0800, Jiufu Guo wrote: >> Segher Boessenkool writes: >> > Have you tried with different limits? >> I drafted cases(C code, and updated asm code) to test runtime costs for >> different code sequences: building constants with 5 insns; with 3 >> insns and 2 insns; and loading from rodata. > > I'm not asking about trivial examples here. Have you tried the > resulting compiler, on some big real-life code, with various choices for > these essentially random choices, on different cpus? I tested the patch with spec2017 on p9 and p10. I only tested combined variations: 'threshold 2 insns' + 'constant cost COSTS_N_INSNS(N) - 1' (or do not -1). And I find only few benchmarks are affected noticeablely. I will test more variations. Any suggestions about workloads? > > Huge changes like this need quite a bit of experimental data to back it > up, or some convincing other evidence. That is the only way to move > forward: anything else will resemble Brownian motion :-( Understood! I would collect more data to see if it is good in general. > >> > And, p10 is a red herring, you >> > actually test if prefixed insns are used. >> >> 'pld' is the instruction which we care about instead target p10 :-). So >> in patch, 'TARGET_PREFIXED' is tested. > > Huh. I would think "pli" is the most important thing here! Which is > not a load instruction at all, notwithstanding its name. "pli" could help a lot on constant building! When loading constant from memory, "pld" would also be good for some cases. :-) > >> >> --- a/gcc/config/rs6000/rs6000.cc >> >> +++ b/gcc/config/rs6000/rs6000.cc >> >> @@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void) >> >> static bool >> >> rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) >> >> { >> >> - if (GET_CODE (x) == HIGH >> >> - && GET_CODE (XEXP (x, 0)) == UNSPEC) >> >> + /* Exclude CONSTANT HIGH part. e.g. >> >> + (high:DI (symbol_ref:DI ("var") [flags 0xc0] )). */ >> >> + if (GET_CODE (x) == HIGH) >> >> return true; >> > >> > So, why is this? You didn't explain. >> >> In the function rs6000_cannot_force_const_mem, we already excluded >> "HIGH with UNSPEC" like: "high:P (unspec:P [symbol_ref..". >> I find, "high:DI (symbol_ref:DI" is similar, which may also need to >> exclude. Half high part of address would be invalid for constant pool. >> >> Or maybe, we could use below code to exclude it. >> /* high part of a symbol_ref could not be put into constant pool. */ >> if (GET_CODE (x) == HIGH >> && (GET_CODE (XEXP (x, 0)) == UNSPEC || SYMBOL_REF_P (XEXP (x, 0)))) >> >> One interesting thing is how the rtx "(high:DI (symbol_ref:DI (var" is >> generated. It seems in the middle of optimization, it is checked to >> see if ok to store in constant memory. > > It probably is fine to not allow the HIGH of *anything* to be put in a > constant pool, sure. But again, that is a change independent of other > things, and it could use some supporting evidence. In code, I will add comments about these examples of high half part address that need to be excluded. /* The high part address is invalid for constant pool. The form of address high part would be: "high:P (unspec:P [symbol_ref". In the middle of one pass, the form could also be "high:DI (symbol_ref". Returns true for rtx with HIGH code. */ > >> >> case CONST_DOUBLE: >> >> case CONST_WIDE_INT: >> >> + /* It may needs a few insns for const to SET. -1 for outer SET code. */ >> >> + if (outer_code == SET) >> >> + { >> >> + *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1; >> >> + return true; >> >> + } >> >> + /* FALLTHRU */ >> >> + >> >> case CONST: >> >> case HIGH: >> >> case SYMBOL_REF: >> > >> > But this again isn't an obvious improvement at all, and needs >> > performance testing -- separately of the other changes. >> >> This code updates the cost of constant assigning, we need this to avoid >> CSE to replace the constant loading with constant assigning. > > No. This is rtx_cost, which makes a cost estimate of random RTL, not > typically corresponding to existing RTL insns at all. We do not use it > a lot anymore thankfully (we use insn_cost in most places), but things > like CSE still use it. This change increases rtx_cost for constant on SET. Because CSE is using the rtx_cost. Then with this, we could avoid CSE get lower cost incorrectly on complicated constant assigning. > >> The above change (the threshold for storing const in the pool) depends >> on this code. > > Yes, and a gazillion other places as well still (or about 50 really :-) ) > >> > And this is completely independent of the rest as well. Cost 5 or 7 are >> > completely random numbers, why did you pick these? Does it work better >> > than 8, or 4, etc.? >> For 'pld', asm code would be "pld %r9,.LC0@pcrel"; as tests, it is >> faster than 2insns (like, "li+oris"), but still slower than one >> instruction (e.g. li, pli). So, "COSTS_N_INSNS (2) - 1" is selected. >> And with this, cse will not replace 'pld' with 'li + oris'. > > (That is WinNT assembler syntax btw. We never use that. We write > either "pld 9,.LC0@pcrel" or "pld r9,.LC0@pcrel".) Thanks. We used to see asm code: "pld 9,.LC0@pcrel". Which I want to show is "pld r9,.LC0@pcrel" :-). I did not find a way to dump this asm code. "-mregnames" generates "%r9". > > COSTS_N_INSNS(1) means 4, always. The costs are not additive at all in > reality of course, so you cannot simply add them and get any sensible > result :-( > > Why did you choose 7? Your explanation makes 5 and 6 good candidates > as well. My money is on 6 performing best here, but there is no way to > know without actually trying out things. Thanks for sugguestion! As tests, it would be a value between 4-8. I also selected "5"(slightly slower than 4--one instruction), selected "7" because it may means close to the cost of two instructions. "6" maybe better, I will have a test. > >> >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c >> >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c >> >> @@ -1,7 +1,7 @@ >> >> /* { dg-do compile { target { powerpc*-*-* } } } */ >> >> /* { dg-require-effective-target lp64 } */ >> >> /* { dg-options "-O" } */ >> >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ >> >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ >> > >> > Why? This is still better generated in code, no? It should never be >> > loaded from a constant pool (it is hex 4000_0000_0000_0000, easy to >> > construct with just one or two insns). >> >> For p8/9, two insns "lis 0x4000+sldi 32" are used: >> addis %r3,%r2,.LANCHOR0@toc@ha >> addi %r3,%r3,.LANCHOR0@toc@l >> lis %r9,0x4000 >> sldi %r9,%r9,32 >> add %r3,%r3,%r9 >> blr > > That does not mean putting this constant in the constant pool is a good > idea at all, of course. > >> On p10, as expected, 'pld' would be better than 'lis+sldi'. > > Is it? With simple cases, it shows 'pld' seems better. For perlbench, it may also indicate this. But I did not test this part separately. As you suggested, I will collect more data to check this change. > >> And for this case, it also saves other instructions(addis/addi): >> pld %r3,.LC1@pcrel >> blr > > This explanation then belongs in your commit message :-) > >> > Btw, you need to write >> > \m(?:rldimi|ld|pld)\M >> Oh, thanks! I did not aware of this. > > (?:xxx) is just the same as (xxx), but the parentheses in the latter are > capturing, which messes up scan-assembler-times, so you need > non-capturing grouping. Thanks. > >> Maybe I should separate out two cases one for "-mdejagnu-cpu=power10", >> and one for "-mdejagnu-cpu=power7", because the load instructions >> the number would be different. > > Yeah, it is probably best to never allow both a load and non-load insns > in the same check. OK, thanks. > >> > So no doubt this will improve things, but we need testing of each part >> > separately. Also look at code size, or differences in the generated >> > code in general: this is much more sensitive to detect than performance, >> > and is not itself sensitive to things like system load, so a) is easier >> > to measure, and b) has more useful outputs, outputs that tell more of >> > the whole story. >> Thanks for your suggestions! >> I also feel it is very sensitive when I test performance. > > Yeah, there are constants *everywhere* in typical compiler output, so a > small change will easily move the needle already. > >> The change >> (e.g. cost of constant) may affect other optimization passes indirectly. > > That too, yes. > >> I will also check the object changes (object size or differences). I may >> use objects from GCC bootstrap and GCC test suite. > > Good choice :-) > > Looking forward to it, Any comments or suggestions, thanks for point out! BR, Jiufu Guo > > > Segher