public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Jiufu Guo <guojiufu@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org
Subject: Re: [PATCH] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd
Date: Tue, 10 Jan 2023 08:22:45 -0600	[thread overview]
Message-ID: <20230110142245.GR25951@gate.crashing.org> (raw)
In-Reply-To: <20230110134527.194389-1-guojiufu@linux.ibm.com>

Hi!

On Tue, Jan 10, 2023 at 09:45:27PM +0800, Jiufu Guo wrote:
> As mentioned in PR108338, on p9, we could use mtvsrws to implement
> the conversion from SI#0 to SF (or lowpart DI to SF).  And we find
> we can also enhance the conversion from highpart DI to SF (as the
> case in this patch).
> 
> This patch enhances these conversions accordingly.

Those aren't conversions, they are just bitcasts, reinterpreting the
same datum as something else, but keeping all bits the same.

The mtvsrws insn moves a SImode value from a GPR to a VSR, splatting it
in all four lanes.  You'll typically want a xscvspdpn or similar after
that -- but with the value splat in all lanes it will surely be in the
lane that later instruction needs the data to be in :-)

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,7 @@ (define_c_enum "unspec"
>     UNSPEC_HASHCHK
>     UNSPEC_XXSPLTIDP_CONST
>     UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_P9V_MTVSRWS
>    ])

Is it hard to decribe this without unspec?  Unspecs prevent the compiler
from optimising things (unless you very carefully implement all of that
manually -- but if you just write it as plain RTL most things fall into
place automatically).

There are many existing patterns that needlessly and detrimentally use
unspecs, but we should improve on that, not make it worse :-)

> @@ -8203,10 +8204,19 @@ (define_insn_and_split "movsf_from_si"
>    rtx op2 = operands[2];
>    rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>  
> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +  if (TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
> +    {
> +       emit_insn (gen_p9_mtvsrws (op0, op1_di));
> +       emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +    }

This does not require TARGET_POWERPC64?

P9 implies we have direct moves (those are implied by P8 already).  We
also do not need to test for vector I think?

> +(define_code_iterator any_rshift [ashiftrt lshiftrt])
> +
>  ;; For extracting high part element from DImode register like:
>  ;;     {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>  ;; split it before reload with "and mask" to avoid generating shift right
>  ;; 32 bit then shift left 32 bit.
> -(define_insn_and_split "movsf_from_si2"
> +(define_insn_and_split "movsf_from_si2_<any_rshift:code>"
>    [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>  	    (unspec:SF
>  	     [(subreg:SI
> -	       (ashiftrt:DI
> +	       (any_rshift:DI
>  		(match_operand:DI 1 "input_operand" "r")
>  		(const_int 32))
>  	       0)]

Hrm.  You can write this with just a subreg, no shift is needed at all.
Can you please try that instead?  That is nastiness for endianness, but
that is still preferable over introducing new complexities like this.

> +(define_insn "p9_mtvsrws"
> +  [(set (match_operand:SF 0 "register_operand" "=wa")
> +	(unspec:SF [(match_operand:DI 1 "register_operand" "r")]
> +		   UNSPEC_P9V_MTVSRWS))]
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> +  "mtvsrws %x0,%1"
> +  [(set_attr "type" "mtvsr")])

(Same issues here as above).

> +/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */

mtvsrws does not need ppc64.

> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */
> +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { has_arch_ppc64 && has_arch_pwr9 } } } } */

These two do of course.

> +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { has_arch_pwr8 && has_arch_ppc64 } } } } */

But this doesn't.


Segher

  reply	other threads:[~2023-01-10 14:23 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-10 13:45 Jiufu Guo
2023-01-10 14:22 ` Segher Boessenkool [this message]
2023-01-11  3:48   ` Jiufu Guo
2023-02-17  1:34 Jiufu Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230110142245.GR25951@gate.crashing.org \
    --to=segher@kernel.crashing.org \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=guojiufu@linux.ibm.com \
    --cc=linkw@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).