From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) by sourceware.org (Postfix) with ESMTP id 20FE6384F8A5; Fri, 25 Nov 2022 15:47:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 20FE6384F8A5 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 2APFk4AQ028085; Fri, 25 Nov 2022 09:46:04 -0600 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id 2APFk3XU028083; Fri, 25 Nov 2022 09:46:03 -0600 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Fri, 25 Nov 2022 09:46:03 -0600 From: Segher Boessenkool To: Jiufu Guo Cc: "Kewen.Lin" , dje.gcc@gmail.com, linkw@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: Re: [PATCH]rs6000: Load high and low part of 64bit constant independently Message-ID: <20221125154603.GH25951@gate.crashing.org> References: <20220915083052.74903-1-guojiufu@linux.ibm.com> <7b482d49-3928-552c-ccf5-d391684b7f2b@linux.ibm.com> <7ev8n3p3u6.fsf@pike.rch.stglabs.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7ev8n3p3u6.fsf@pike.rch.stglabs.ibm.com> User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,KAM_SHORT,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi! On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu Guo wrote: > "Kewen.Lin" writes: > > on 2022/9/15 16:30, Jiufu Guo wrote: > >> For a complicate 64bit constant, blow is one instruction-sequence to > >> build: > >> lis 9,0x800a > >> ori 9,9,0xabcd > >> sldi 9,9,32 > >> oris 9,9,0xc167 > >> ori 9,9,0xfa16 > >> > >> while we can also use below sequence to build: > >> lis 9,0xc167 > >> lis 10,0x800a > >> ori 9,9,0xfa16 > >> ori 10,10,0xabcd > >> rldimi 9,10,32,0 > >> This sequence is using 2 registers to build high and low part firstly, > >> and then merge them. > >> In parallel aspect, this sequence would be faster. (Ofcause, using 1 more > >> register with potential register pressure). And crucially this patch only uses two registers if can_create_pseudo_p. Please mention that. > >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit > >> constant build. If you don't give details of what this does, just say "Update." please. But update to what? "Generate more parallel code if can_create_pseudo_p." maybe? > >> + rtx H = gen_reg_rtx (DImode); > >> + rtx L = gen_reg_rtx (DImode); Please don't use all-uppercase variable names, those are for macros. In fact, don't use uppercase in variable (and function etc.) names at all, unless there is a really good reason to. Just call it "high" and "low", or "hi" and "lo", or something? > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > >> @@ -0,0 +1,27 @@ > >> +/* { dg-do run } */ > >> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -save-temps" } */ > > > > Why do we need power7 here? > power8/9 are also ok for this case. Actually, O just want to > avoid to use new p10 instruction, like "pli", and then selected > an old arch option. Why does it need _at least_ p7, is the question (as I understand it). To prohibit pli etc. you can do -mno-prefixed (which works on all older CPUs just as well), or skip the test if prefixed insns are enabled, or scan for the then generated code as well. The first option is by far the simplest. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > @@ -0,1 +1,27 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8 -save-temps" } */ > +/* { dg-require-effective-target has_arch_ppc64 } */ p8 here makes no sense, we also want good and correct code generated for older CPUs, so we should not preevent testing on those. For that reason you shouldn't use -mcpu= when not needed. Like here :-)