From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) by sourceware.org (Postfix) with ESMTP id F32B238582B4; Wed, 15 Jun 2022 17:45:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F32B238582B4 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 25FHifDs002199; Wed, 15 Jun 2022 12:44:42 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id 25FHifMw002198; Wed, 15 Jun 2022 12:44:41 -0500 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Wed, 15 Jun 2022 12:44:41 -0500 From: Segher Boessenkool To: Jiufu Guo Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, amodra@gmail.com Subject: Re: [PATCH V2]rs6000: Store complicated constant into pool Message-ID: <20220615174441.GD25951@gate.crashing.org> References: <20220614132355.117887-1-guojiufu@linux.ibm.com> <20220614223421.GB25951@gate.crashing.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00, JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, KAM_SHORT, KAM_STOCKGEN, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2022 17:45:45 -0000 Hi! On Wed, Jun 15, 2022 at 04:19:41PM +0800, Jiufu Guo wrote: > Segher Boessenkool writes: > > Have you tried with different limits? > I drafted cases(C code, and updated asm code) to test runtime costs for > different code sequences: building constants with 5 insns; with 3 > insns and 2 insns; and loading from rodata. I'm not asking about trivial examples here. Have you tried the resulting compiler, on some big real-life code, with various choices for these essentially random choices, on different cpus? Huge changes like this need quite a bit of experimental data to back it up, or some convincing other evidence. That is the only way to move forward: anything else will resemble Brownian motion :-( > > And, p10 is a red herring, you > > actually test if prefixed insns are used. > > 'pld' is the instruction which we care about instead target p10 :-). So > in patch, 'TARGET_PREFIXED' is tested. Huh. I would think "pli" is the most important thing here! Which is not a load instruction at all, notwithstanding its name. > >> --- a/gcc/config/rs6000/rs6000.cc > >> +++ b/gcc/config/rs6000/rs6000.cc > >> @@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void) > >> static bool > >> rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) > >> { > >> - if (GET_CODE (x) == HIGH > >> - && GET_CODE (XEXP (x, 0)) == UNSPEC) > >> + /* Exclude CONSTANT HIGH part. e.g. > >> + (high:DI (symbol_ref:DI ("var") [flags 0xc0] )). */ > >> + if (GET_CODE (x) == HIGH) > >> return true; > > > > So, why is this? You didn't explain. > > In the function rs6000_cannot_force_const_mem, we already excluded > "HIGH with UNSPEC" like: "high:P (unspec:P [symbol_ref..". > I find, "high:DI (symbol_ref:DI" is similar, which may also need to > exclude. Half high part of address would be invalid for constant pool. > > Or maybe, we could use below code to exclude it. > /* high part of a symbol_ref could not be put into constant pool. */ > if (GET_CODE (x) == HIGH > && (GET_CODE (XEXP (x, 0)) == UNSPEC || SYMBOL_REF_P (XEXP (x, 0)))) > > One interesting thing is how the rtx "(high:DI (symbol_ref:DI (var" is > generated. It seems in the middle of optimization, it is checked to > see if ok to store in constant memory. It probably is fine to not allow the HIGH of *anything* to be put in a constant pool, sure. But again, that is a change independent of other things, and it could use some supporting evidence. > >> case CONST_DOUBLE: > >> case CONST_WIDE_INT: > >> + /* It may needs a few insns for const to SET. -1 for outer SET code. */ > >> + if (outer_code == SET) > >> + { > >> + *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1; > >> + return true; > >> + } > >> + /* FALLTHRU */ > >> + > >> case CONST: > >> case HIGH: > >> case SYMBOL_REF: > > > > But this again isn't an obvious improvement at all, and needs > > performance testing -- separately of the other changes. > > This code updates the cost of constant assigning, we need this to avoid > CSE to replace the constant loading with constant assigning. No. This is rtx_cost, which makes a cost estimate of random RTL, not typically corresponding to existing RTL insns at all. We do not use it a lot anymore thankfully (we use insn_cost in most places), but things like CSE still use it. > The above change (the threshold for storing const in the pool) depends > on this code. Yes, and a gazillion other places as well still (or about 50 really :-) ) > > And this is completely independent of the rest as well. Cost 5 or 7 are > > completely random numbers, why did you pick these? Does it work better > > than 8, or 4, etc.? > For 'pld', asm code would be "pld %r9,.LC0@pcrel"; as tests, it is > faster than 2insns (like, "li+oris"), but still slower than one > instruction (e.g. li, pli). So, "COSTS_N_INSNS (2) - 1" is selected. > And with this, cse will not replace 'pld' with 'li + oris'. (That is WinNT assembler syntax btw. We never use that. We write either "pld 9,.LC0@pcrel" or "pld r9,.LC0@pcrel".) COSTS_N_INSNS(1) means 4, always. The costs are not additive at all in reality of course, so you cannot simply add them and get any sensible result :-( Why did you choose 7? Your explanation makes 5 and 6 good candidates as well. My money is on 6 performing best here, but there is no way to know without actually trying out things. > >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> @@ -1,7 +1,7 @@ > >> /* { dg-do compile { target { powerpc*-*-* } } } */ > >> /* { dg-require-effective-target lp64 } */ > >> /* { dg-options "-O" } */ > >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ > >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ > > > > Why? This is still better generated in code, no? It should never be > > loaded from a constant pool (it is hex 4000_0000_0000_0000, easy to > > construct with just one or two insns). > > For p8/9, two insns "lis 0x4000+sldi 32" are used: > addis %r3,%r2,.LANCHOR0@toc@ha > addi %r3,%r3,.LANCHOR0@toc@l > lis %r9,0x4000 > sldi %r9,%r9,32 > add %r3,%r3,%r9 > blr That does not mean putting this constant in the constant pool is a good idea at all, of course. > On p10, as expected, 'pld' would be better than 'lis+sldi'. Is it? > And for this case, it also saves other instructions(addis/addi): > pld %r3,.LC1@pcrel > blr This explanation then belongs in your commit message :-) > > Btw, you need to write > > \m(?:rldimi|ld|pld)\M > Oh, thanks! I did not aware of this. (?:xxx) is just the same as (xxx), but the parentheses in the latter are capturing, which messes up scan-assembler-times, so you need non-capturing grouping. > Maybe I should separate out two cases one for "-mdejagnu-cpu=power10", > and one for "-mdejagnu-cpu=power7", because the load instructions > the number would be different. Yeah, it is probably best to never allow both a load and non-load insns in the same check. > > So no doubt this will improve things, but we need testing of each part > > separately. Also look at code size, or differences in the generated > > code in general: this is much more sensitive to detect than performance, > > and is not itself sensitive to things like system load, so a) is easier > > to measure, and b) has more useful outputs, outputs that tell more of > > the whole story. > Thanks for your suggestions! > I also feel it is very sensitive when I test performance. Yeah, there are constants *everywhere* in typical compiler output, so a small change will easily move the needle already. > The change > (e.g. cost of constant) may affect other optimization passes indirectly. That too, yes. > I will also check the object changes (object size or differences). I may > use objects from GCC bootstrap and GCC test suite. Good choice :-) Looking forward to it, Segher