From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) by sourceware.org (Postfix) with ESMTP id 62D793858D33; Tue, 10 Jan 2023 14:23:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 62D793858D33 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 30AEMkZ6029562; Tue, 10 Jan 2023 08:22:46 -0600 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id 30AEMjJh029561; Tue, 10 Jan 2023 08:22:45 -0600 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Tue, 10 Jan 2023 08:22:45 -0600 From: Segher Boessenkool To: Jiufu Guo Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org Subject: Re: [PATCH] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd Message-ID: <20230110142245.GR25951@gate.crashing.org> References: <20230110134527.194389-1-guojiufu@linux.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230110134527.194389-1-guojiufu@linux.ibm.com> User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi! On Tue, Jan 10, 2023 at 09:45:27PM +0800, Jiufu Guo wrote: > As mentioned in PR108338, on p9, we could use mtvsrws to implement > the conversion from SI#0 to SF (or lowpart DI to SF). And we find > we can also enhance the conversion from highpart DI to SF (as the > case in this patch). > > This patch enhances these conversions accordingly. Those aren't conversions, they are just bitcasts, reinterpreting the same datum as something else, but keeping all bits the same. The mtvsrws insn moves a SImode value from a GPR to a VSR, splatting it in all four lanes. You'll typically want a xscvspdpn or similar after that -- but with the value splat in all lanes it will surely be in the lane that later instruction needs the data to be in :-) > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -158,6 +158,7 @@ (define_c_enum "unspec" > UNSPEC_HASHCHK > UNSPEC_XXSPLTIDP_CONST > UNSPEC_XXSPLTIW_CONST > + UNSPEC_P9V_MTVSRWS > ]) Is it hard to decribe this without unspec? Unspecs prevent the compiler from optimising things (unless you very carefully implement all of that manually -- but if you just write it as plain RTL most things fall into place automatically). There are many existing patterns that needlessly and detrimentally use unspecs, but we should improve on that, not make it worse :-) > @@ -8203,10 +8204,19 @@ (define_insn_and_split "movsf_from_si" > rtx op2 = operands[2]; > rtx op1_di = gen_rtx_REG (DImode, REGNO (op1)); > > - /* Move SF value to upper 32-bits for xscvspdpn. */ > - emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32))); > - emit_insn (gen_p8_mtvsrd_sf (op0, op2)); > - emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0)); > + if (TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE) > + { > + emit_insn (gen_p9_mtvsrws (op0, op1_di)); > + emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0)); > + } This does not require TARGET_POWERPC64? P9 implies we have direct moves (those are implied by P8 already). We also do not need to test for vector I think? > +(define_code_iterator any_rshift [ashiftrt lshiftrt]) > + > ;; For extracting high part element from DImode register like: > ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;} > ;; split it before reload with "and mask" to avoid generating shift right > ;; 32 bit then shift left 32 bit. > -(define_insn_and_split "movsf_from_si2" > +(define_insn_and_split "movsf_from_si2_" > [(set (match_operand:SF 0 "gpc_reg_operand" "=wa") > (unspec:SF > [(subreg:SI > - (ashiftrt:DI > + (any_rshift:DI > (match_operand:DI 1 "input_operand" "r") > (const_int 32)) > 0)] Hrm. You can write this with just a subreg, no shift is needed at all. Can you please try that instead? That is nastiness for endianness, but that is still preferable over introducing new complexities like this. > +(define_insn "p9_mtvsrws" > + [(set (match_operand:SF 0 "register_operand" "=wa") > + (unspec:SF [(match_operand:DI 1 "register_operand" "r")] > + UNSPEC_P9V_MTVSRWS))] > + "TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE" > + "mtvsrws %x0,%1" > + [(set_attr "type" "mtvsr")]) (Same issues here as above). > +/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */ mtvsrws does not need ppc64. > +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */ > +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */ These two do of course. > +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { has_arch_pwr8 && has_arch_ppc64 } } } } */ But this doesn't. Segher