Re: cpymem for RISCV with v extension

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jeff Law <jeffreyalaw@gmail.com>
To: Joern Rennecke <joern.rennecke@embecosm.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: cpymem for RISCV with v extension
Date: Fri, 4 Aug 2023 14:52:31 -0600	[thread overview]
Message-ID: <ee21fdc0-3106-1239-8ef1-bae896d29fc6@gmail.com> (raw)
In-Reply-To: <CAMqJFCpuUTjoaPzJvr7fxxWZm1d=FZ=K7feigkXOYx6A1yh+Sw@mail.gmail.com>



On 7/17/23 22:47, Joern Rennecke wrote:
> Subject:
> cpymem for RISCV with v extension
> From:
> Joern Rennecke <joern.rennecke@embecosm.com>
> Date:
> 7/17/23, 22:47
> 
> To:
> GCC Patches <gcc-patches@gcc.gnu.org>
> 
> 
> As discussed on last week's patch call, this patch uses either a
> straight copy or an opaque pattern that emits the loop as assembly to
> optimize cpymem for the 'v' extension.
> I used Ju-Zhe Zhong's patch - starting in git with:
> 
> Author: zhongjuzhe<66454988+zhongjuzhe@users.noreply.github.com>
> Date:   Mon Mar 21 14:20:42 2022 +0800
> 
>        PR for RVV support using splitted small chunks (#334)
> 
> as a starting point, even though not all that much of the original code remains.
> 
> Regression tested on x86_64-pc-linux-gnu X
>      riscv-sim
>      riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
>      riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
>      riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
>      riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
>      riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
>      riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
>      riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
> 
> 
> cpymem-diff-20230718.txt
> 
> 2023-07-12  Ju-Zhe Zhong<juzhe.zhong@rivai.ai>
>              Joern Rennecke<joern.rennecke@embecosm.com>
> 
> 	* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
> 	Declare.
> 	* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
> 	New function.
> 	* config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move.
> 	* config/riscv/vector.md (@cpymem_straight<P:mode><V_WHOLE:mode>):
> 	New define_insn patterns.
> 	(@cpymem_loop<P:mode><V_WHOLE:mode>): Likewise.
> 	(@cpymem_loop_fast<P:mode><V_WHOLE:mode>): Likewise.
> 

> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index b4884a30872..e61110fa3ad 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -49,6 +49,7 @@
>   #include "tm-constrs.h"
>   #include "rtx-vector-builder.h"
>   #include "targhooks.h"
> +#include "predict.h"
Not sure this is needed, but I didn't scan for it explicitly.  If it's 
not needed, then remove it.



> +  if (CONST_INT_P (length_in))
> +    {
> +      HOST_WIDE_INT length = INTVAL (length_in);
> +
> +    /* By using LMUL=8, we can copy as many bytes in one go as there
> +       are bits in a vector register.  If the entire block thus fits,
> +       we don't need a loop.  */
> +    if (length <= TARGET_MIN_VLEN)
> +      {
> +	need_loop = false;
> +
> +	/* If a single scalar load / store pair can do the job, leave it
> +	   to the scalar code to do that.  */
> +
> +	if (pow2p_hwi (length) && length <= potential_ew)
> +	  return false;
> +      }
We could probably argue over the threshold for doing the copy on the 
scalar side, but I don't think it's necessary.  Once we start seeing V 
hardware we can revisit.


> +
> +      /* Find the vector mode to use.  Using the largest possible element
> +	 size is likely to give smaller constants, and thus potentially
> +	 reducing code size.  However, if we need a loop, we need to update
> +	 the pointers, and that is more complicated with a larger element
> +	 size, unless we use an immediate, which prevents us from dynamically
> +	 using the largets transfer size that the hart supports.  And then,
> +	 unless we know the*exact*  vector size of the hart, we'd need
> +	 multiple vsetvli / branch statements, so it's not even a size win.
> +	 If, in the future, we find an RISCV-V implementation that is slower
> +	 for small element widths, we might allow larger element widths for
> +	 loops too.  */
s/largets/largest/

And a space is missing in "the*extact*"

Note that I think the proposed glibc copier does allow larger elements 
widths for this case.

> +
> +	  /* Unless we get an implementation that's slow for small element
> +	     size / non-word-aligned accesses, we assume that the hardware
> +	     handles this well, and we don't want to complicate the code
> +	     with shifting word contents around or handling extra bytes at
> +	     the start and/or end.  So we want the total transfer size and
> +	     alignemnt to fit with the element size.  */
s/alignemnt/alignment/

Yes, let's not try to support every uarch we can envision and instead do 
a good job on the uarches we know about.    If a uarch with slow element 
or non-word aligned accesses comes along, they can propose changes at 
that time.



> +
> +	  // The VNx*?I modes have a factor of riscv_vector_chunks for nunits.
Comment might need updating after the recent work to adjust the modes. 
I don't recall if we kept the VNx*?I modes or not.

So the adjustments are all comment related, so this is OK after fixing 
the comments.  Just post the update for archival purposes and consider 
it pre-approved for the trunk.

Thanks and sorry for the wait Joern.

jeff

next prev parent reply	other threads:[~2023-08-04 20:52 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-18  4:47 Joern Rennecke
2023-08-04 20:52 ` Jeff Law [this message]
2023-08-15  1:46   ` Joern Rennecke
2023-08-15 13:46     ` Jeff Law
2023-08-04 23:10 钟居哲
2023-08-04 23:17 ` Jeff Law
2023-08-04 23:34   ` 钟居哲
2023-08-15  8:12     ` Joern Rennecke
2023-08-15  9:16       ` juzhe.zhong
2023-08-15 14:06         ` Jeff Law
2023-08-15 14:04       ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee21fdc0-3106-1239-8ef1-bae896d29fc6@gmail.com \
    --to=jeffreyalaw@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=joern.rennecke@embecosm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).