From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=FgJ5=5H=kernel.crashing.org=segher@sourceware.org>
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
	by sourceware.org (Postfix) with ESMTP id 62D793858D33;
	Tue, 10 Jan 2023 14:23:48 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 62D793858D33
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])
	by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 30AEMkZ6029562;
	Tue, 10 Jan 2023 08:22:46 -0600
Received: (from segher@localhost)
	by gate.crashing.org (8.14.1/8.14.1/Submit) id 30AEMjJh029561;
	Tue, 10 Jan 2023 08:22:45 -0600
X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f
Date: Tue, 10 Jan 2023 08:22:45 -0600
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Jiufu Guo <guojiufu@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org
Subject: Re: [PATCH] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd
Message-ID: <20230110142245.GR25951@gate.crashing.org>
References: <20230110134527.194389-1-guojiufu@linux.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20230110134527.194389-1-guojiufu@linux.ibm.com>
User-Agent: Mutt/1.4.2.3i
X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hi!

On Tue, Jan 10, 2023 at 09:45:27PM +0800, Jiufu Guo wrote:
> As mentioned in PR108338, on p9, we could use mtvsrws to implement
> the conversion from SI#0 to SF (or lowpart DI to SF).  And we find
> we can also enhance the conversion from highpart DI to SF (as the
> case in this patch).
> 
> This patch enhances these conversions accordingly.

Those aren't conversions, they are just bitcasts, reinterpreting the
same datum as something else, but keeping all bits the same.

The mtvsrws insn moves a SImode value from a GPR to a VSR, splatting it
in all four lanes.  You'll typically want a xscvspdpn or similar after
that -- but with the value splat in all lanes it will surely be in the
lane that later instruction needs the data to be in :-)

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,7 @@ (define_c_enum "unspec"
>     UNSPEC_HASHCHK
>     UNSPEC_XXSPLTIDP_CONST
>     UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_P9V_MTVSRWS
>    ])

Is it hard to decribe this without unspec?  Unspecs prevent the compiler
from optimising things (unless you very carefully implement all of that
manually -- but if you just write it as plain RTL most things fall into
place automatically).

There are many existing patterns that needlessly and detrimentally use
unspecs, but we should improve on that, not make it worse :-)

> @@ -8203,10 +8204,19 @@ (define_insn_and_split "movsf_from_si"
>    rtx op2 = operands[2];
>    rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>  
> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +  if (TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
> +    {
> +       emit_insn (gen_p9_mtvsrws (op0, op1_di));
> +       emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +    }

This does not require TARGET_POWERPC64?

P9 implies we have direct moves (those are implied by P8 already).  We
also do not need to test for vector I think?

> +(define_code_iterator any_rshift [ashiftrt lshiftrt])
> +
>  ;; For extracting high part element from DImode register like:
>  ;;     {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>  ;; split it before reload with "and mask" to avoid generating shift right
>  ;; 32 bit then shift left 32 bit.
> -(define_insn_and_split "movsf_from_si2"
> +(define_insn_and_split "movsf_from_si2_<any_rshift:code>"
>    [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>  	    (unspec:SF
>  	     [(subreg:SI
> -	       (ashiftrt:DI
> +	       (any_rshift:DI
>  		(match_operand:DI 1 "input_operand" "r")
>  		(const_int 32))
>  	       0)]

Hrm.  You can write this with just a subreg, no shift is needed at all.
Can you please try that instead?  That is nastiness for endianness, but
that is still preferable over introducing new complexities like this.

> +(define_insn "p9_mtvsrws"
> +  [(set (match_operand:SF 0 "register_operand" "=wa")
> +	(unspec:SF [(match_operand:DI 1 "register_operand" "r")]
> +		   UNSPEC_P9V_MTVSRWS))]
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> +  "mtvsrws %x0,%1"
> +  [(set_attr "type" "mtvsr")])

(Same issues here as above).

> +/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */

mtvsrws does not need ppc64.

> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */
> +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */

These two do of course.

> +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { has_arch_pwr8 && has_arch_ppc64 } } } } */

But this doesn't.


Segher