From: Jeff Law <jeffreyalaw@gmail.com>
To: Joern Rennecke <joern.rennecke@embecosm.com>,
GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: cpymem for RISCV with v extension
Date: Fri, 4 Aug 2023 14:52:31 -0600 [thread overview]
Message-ID: <ee21fdc0-3106-1239-8ef1-bae896d29fc6@gmail.com> (raw)
In-Reply-To: <CAMqJFCpuUTjoaPzJvr7fxxWZm1d=FZ=K7feigkXOYx6A1yh+Sw@mail.gmail.com>
On 7/17/23 22:47, Joern Rennecke wrote:
> Subject:
> cpymem for RISCV with v extension
> From:
> Joern Rennecke <joern.rennecke@embecosm.com>
> Date:
> 7/17/23, 22:47
>
> To:
> GCC Patches <gcc-patches@gcc.gnu.org>
>
>
> As discussed on last week's patch call, this patch uses either a
> straight copy or an opaque pattern that emits the loop as assembly to
> optimize cpymem for the 'v' extension.
> I used Ju-Zhe Zhong's patch - starting in git with:
>
> Author: zhongjuzhe<66454988+zhongjuzhe@users.noreply.github.com>
> Date: Mon Mar 21 14:20:42 2022 +0800
>
> PR for RVV support using splitted small chunks (#334)
>
> as a starting point, even though not all that much of the original code remains.
>
> Regression tested on x86_64-pc-linux-gnu X
> riscv-sim
> riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
> riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
> riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
> riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
> riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
> riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
> riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
>
>
> cpymem-diff-20230718.txt
>
> 2023-07-12 Ju-Zhe Zhong<juzhe.zhong@rivai.ai>
> Joern Rennecke<joern.rennecke@embecosm.com>
>
> * config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
> Declare.
> * config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
> New function.
> * config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move.
> * config/riscv/vector.md (@cpymem_straight<P:mode><V_WHOLE:mode>):
> New define_insn patterns.
> (@cpymem_loop<P:mode><V_WHOLE:mode>): Likewise.
> (@cpymem_loop_fast<P:mode><V_WHOLE:mode>): Likewise.
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index b4884a30872..e61110fa3ad 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -49,6 +49,7 @@
> #include "tm-constrs.h"
> #include "rtx-vector-builder.h"
> #include "targhooks.h"
> +#include "predict.h"
Not sure this is needed, but I didn't scan for it explicitly. If it's
not needed, then remove it.
> + if (CONST_INT_P (length_in))
> + {
> + HOST_WIDE_INT length = INTVAL (length_in);
> +
> + /* By using LMUL=8, we can copy as many bytes in one go as there
> + are bits in a vector register. If the entire block thus fits,
> + we don't need a loop. */
> + if (length <= TARGET_MIN_VLEN)
> + {
> + need_loop = false;
> +
> + /* If a single scalar load / store pair can do the job, leave it
> + to the scalar code to do that. */
> +
> + if (pow2p_hwi (length) && length <= potential_ew)
> + return false;
> + }
We could probably argue over the threshold for doing the copy on the
scalar side, but I don't think it's necessary. Once we start seeing V
hardware we can revisit.
> +
> + /* Find the vector mode to use. Using the largest possible element
> + size is likely to give smaller constants, and thus potentially
> + reducing code size. However, if we need a loop, we need to update
> + the pointers, and that is more complicated with a larger element
> + size, unless we use an immediate, which prevents us from dynamically
> + using the largets transfer size that the hart supports. And then,
> + unless we know the*exact* vector size of the hart, we'd need
> + multiple vsetvli / branch statements, so it's not even a size win.
> + If, in the future, we find an RISCV-V implementation that is slower
> + for small element widths, we might allow larger element widths for
> + loops too. */
s/largets/largest/
And a space is missing in "the*extact*"
Note that I think the proposed glibc copier does allow larger elements
widths for this case.
> +
> + /* Unless we get an implementation that's slow for small element
> + size / non-word-aligned accesses, we assume that the hardware
> + handles this well, and we don't want to complicate the code
> + with shifting word contents around or handling extra bytes at
> + the start and/or end. So we want the total transfer size and
> + alignemnt to fit with the element size. */
s/alignemnt/alignment/
Yes, let's not try to support every uarch we can envision and instead do
a good job on the uarches we know about. If a uarch with slow element
or non-word aligned accesses comes along, they can propose changes at
that time.
> +
> + // The VNx*?I modes have a factor of riscv_vector_chunks for nunits.
Comment might need updating after the recent work to adjust the modes.
I don't recall if we kept the VNx*?I modes or not.
So the adjustments are all comment related, so this is OK after fixing
the comments. Just post the update for archival purposes and consider
it pre-approved for the trunk.
Thanks and sorry for the wait Joern.
jeff
next prev parent reply other threads:[~2023-08-04 20:52 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-18 4:47 Joern Rennecke
2023-08-04 20:52 ` Jeff Law [this message]
2023-08-15 1:46 ` Joern Rennecke
2023-08-15 13:46 ` Jeff Law
2023-08-04 23:10 钟居哲
2023-08-04 23:17 ` Jeff Law
2023-08-04 23:34 ` 钟居哲
2023-08-15 8:12 ` Joern Rennecke
2023-08-15 9:16 ` juzhe.zhong
2023-08-15 14:06 ` Jeff Law
2023-08-15 14:04 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ee21fdc0-3106-1239-8ef1-bae896d29fc6@gmail.com \
--to=jeffreyalaw@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=joern.rennecke@embecosm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).