From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: rguenther <rguenther@suse.de>
Cc: richard.sandiford <richard.sandiford@arm.com>,
pan2.li <pan2.li@intel.com>,
gcc-patches <gcc-patches@gcc.gnu.org>,
incarnation.p.lee <incarnation.p.lee@outlook.com>,
Kito.cheng <kito.cheng@sifive.com>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Date: Thu, 2 Mar 2023 16:37:40 +0800
Message-ID: <BAC4E848E6794482+2023030216373961458254@rivai.ai>
In-Reply-To: <nycvar.YFH.7.77.849.2303020822560.27913@jbgna.fhfr.qr>
Fortunately, we won't have aggregates or arrays of vbool*_t in the future,
so I think it's not an issue.
juzhe.zhong@rivai.ai
From: Richard Biener
Date: 2023-03-02 16:25
To: juzhe.zhong
CC: richard.sandiford; pan2.li; gcc-patches; Pan Li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
On Thu, 2 Mar 2023, juzhe.zhong@rivai.ai wrote:
> >> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
> >> bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
> >> Or is the GCC size larger in some cases than the number of bytes
> >> loaded and stored?
> For VNx1BI, VNx2BI, VNx4BI and VNx8BI, we allocate a larger amount of memory or stack for register spilling,
> according to ADJUST_BYTESIZE.
> After the appropriate vsetvl, VNx1BI loads/stores 1/8 of ADJUST_BYTESIZE (vsetvl e8mf8).
> After the appropriate vsetvl, VNx2BI loads/stores 2/8 of ADJUST_BYTESIZE (vsetvl e8mf4).
> After the appropriate vsetvl, VNx4BI loads/stores 4/8 of ADJUST_BYTESIZE (vsetvl e8mf2).
> After the appropriate vsetvl, VNx8BI loads/stores 8/8 of ADJUST_BYTESIZE (vsetvl e8m1).
>
> Note: except for these 4 machine modes, for all other RVV machine modes ADJUST_BYTESIZE
> is equal to the real number of bytes of the load/store instruction that the RVV ISA defines.
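
To make the fractions above concrete, here is a minimal C sketch; it is not GCC code. It assumes, as described in this thread, that all four modes are allocated vlenb / 8 bytes and that VNx<k>BI carries k/8 of those bits. The byte count relies on the RVV rule that vlm.v/vsm.v transfer ceil(vl/8) bytes. The function names are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Bytes reserved for each of VNx1BI..VNx8BI according to the thread:
   poly (1,1), i.e. one eighth of the vector register size in bytes.  */
static unsigned
allocated_bytes (unsigned vlenb)
{
  return vlenb / 8;
}

/* Mask bits that are meaningful for VNx<k>BI: k/8 of the allocated bits.
   This is an assumption taken from the description in this thread.  */
static unsigned
mask_bits (unsigned vlenb, unsigned k)
{
  assert (k == 1 || k == 2 || k == 4 || k == 8);
  return allocated_bytes (vlenb) * 8 * k / 8;
}

/* Bytes touched by vlm.v / vsm.v, which transfer ceil (vl / 8) bytes.  */
static unsigned
accessed_bytes (unsigned vlenb, unsigned k)
{
  return (mask_bits (vlenb, k) + 7) / 8;
}
```

With vlenb = 64 (VLEN = 512), allocated_bytes gives 8 for every mode, while accessed_bytes gives 1, 2, 4 and 8 for k = 1, 2, 4 and 8.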
>
> Well, as I said, it's fine that we allocate more memory than needed for VNx1BI, VNx2BI and VNx4BI:
> we can emit the appropriate vsetvl in the RISC-V backend according to the machine_mode to guarantee
> correctness, as long as GCC doesn't do the incorrect elimination in the middle-end.
>
> Besides, poly (1,1) is 1/8 of the machine vector length, which is already a really small number,
> and it is the real number of bytes loaded/stored for VNx8BI.
> You can say VNx1BI, VNx2BI and VNx4BI consume more memory than we actually load/store after the appropriate vsetvl,
> since they have the same ADJUST_BYTESIZE as VNx8BI. However, I think that's totally fine so far, as long as we can
> guarantee correctness, and I think the gain from optimizing this memory consumption is trivial.
>
> >> And does it equal the size of the corresponding LLVM machine type?
>
> Well, for some reason, in the case of register spilling, LLVM consumes much more memory than GCC.
> It always does whole-register loads/stores (a single vector register's full vector length) for register spilling.
> That's another story (I am not going to talk too much about it since it's quite an ugly implementation).
> It doesn't model the types accurately according to the RVV ISA for register spilling.
>
> In the case of a normal load/store like:
> vbool8_t v2 = *(vbool8_t*)in; *(vbool8_t*)(out + 100) = v2;
> the load/store instructions LLVM generates are accurate.
> Even though the instructions are accurate in their memory access behavior, I am not sure whether the size
> of their machine type is accurate.
>
> For example, in the IR representation, GCC's VNx1BI is represented as vscale x 1 x i1
> and GCC's VNx2BI is represented as vscale x 2 x i1
> in LLVM IR.
> I am not sure about the byte size of vscale x 1 x i1 and vscale x 2 x i1.
> I didn't take a deep look at it.
>
> I think this question is not that important. No matter whether VNx1BI and VNx2BI are modeled accurately in terms of ADJUST_BYTESIZE
> in GCC, or whether vscale x 1 x i1 and vscale x 2 x i1 are modeled accurately in terms of their byte size,
> as long as we can emit the appropriate vsetvl + vlm/vsm, it's totally fine for RVV, even though in some cases the memory allocation
> is not accurate in the compiler.
I'm not sure how it works for variable-length types but isn't
sizeof (vbool8_t) part of the ABI and thus its TYPE_SIZE / GET_MODE_SIZE
are relevant there? It might of course be that you can never have
these types as part of aggregates, arrays or objects of them address-taken
in which case the issue is moot?
Richard.
>
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2023-03-02 00:14
> To: Li, Pan2
> CC: juzhe.zhong@rivai.ai; rguenther; gcc-patches; Pan Li; kito.cheng
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> "Li, Pan2" <pan2.li@intel.com> writes:
> > Thanks all for so much valuable and helpful materials.
> >
> > As I understand it (please correct me if I am mistaken), for the VNx*BI modes (i.e. 1, 2, 4, 8, 16, 32, 64),
> > the precision and mode size need to be adjusted as below.
> >
> > Precision size [1, 2, 4, 8, 16, 32, 64]
> > Mode size [1, 1, 1, 1, 2, 4, 8]
> >
> > Given that, if we ignore the self-test failure, the adjust_precision part alone is able to fix the bug I mentioned.
> > genmodes first gets the precision, and then generates mode_size = exact_div (precision, 8).
> > Meanwhile, it also provides adjust_mode_size after the mode_size generation.
> >
> > The riscv part already has the mode_size_adjust, and the value of mode_size will be overridden by that adjustment.
>
> Ah, OK! In that case, would the following help:
>
> Turn:
>
> mode_size[E_%smode] = exact_div (mode_precision[E_%smode], BITS_PER_UNIT);
>
> into:
>
> if (!multiple_p (mode_precision[E_%smode], BITS_PER_UNIT,
> &mode_size[E_%smode]))
> mode_size[E_%smode] = -1;
>
> where -1 is an "obviously wrong" value.
>
> Ports that might hit the -1 are then responsible for setting the size
> later, via ADJUST_BYTESIZE.
>
> After all the adjustments are complete, genmodes asserts that no size is
> known_eq to -1.
>
> That way, target-independent code doesn't need to guess what the
> correct behaviour is.
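
The flow proposed above can be sketched in plain C. This is a simplified model that uses ordinary integers instead of poly_ints, and it is not the genmodes implementation; the helper names and the `target_size` parameter (standing in for a port's ADJUST_BYTESIZE) are hypothetical.

```c
#include <assert.h>

#define BITS_PER_UNIT 8
#define SIZE_UNSET (-1) /* the "obviously wrong" placeholder value */

/* Initial size: precision / BITS_PER_UNIT when that divides exactly,
   otherwise the placeholder, so no guess is made for sub-byte modes.  */
static int
initial_mode_size (int precision)
{
  if (precision % BITS_PER_UNIT == 0)
    return precision / BITS_PER_UNIT;
  return SIZE_UNSET;
}

/* Stand-in for a target's ADJUST_BYTESIZE: a positive target_size
   overrides the size, anything else leaves it untouched.  */
static int
adjust_bytesize (int size, int target_size)
{
  return target_size > 0 ? target_size : size;
}

/* After all adjustments, no size may still be the placeholder.  */
static int
final_mode_size (int precision, int target_size)
{
  int size = adjust_bytesize (initial_mode_size (precision), target_size);
  assert (size != SIZE_UNSET);
  return size;
}
```

In this model a mode with precision 1 starts at -1 and must be given a size by the port, while a mode with precision 32 gets size 4 without any adjustment.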
>
> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
> bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
> And does it equal the size of the corresponding LLVM machine type?
> Or is the GCC size larger in some cases than the number of bytes
> loaded and stored?
>
> (You and Juzhe have probably answered that question before, sorry,
> but I'm still not 100% sure of the answer. Personally, I think I would
> find the ISA behaviour easier to understand if the explanation doesn't
> involve poly_ints. It would be good to understand things "as the
> architecture sees them" rather than in terms of GCC concepts.)
>
> Thanks,
> Richard
>
> > Unfortunately, the early-stage mode_size generation uses exact_div, which doesn't handle a precision < 8
> > after the adjustment and fails on the exact_div assertions.
> >
> > Besides the precision adjustment, I am not sure if we can narrow the problem down to the following.
> >
> > 1. Define the real size of both the precision and the mode size to align with the riscv ISA.
> > 2. Besides, make the general mode_size = precision_size / 8 able to handle both the exact_div case and the case where the dividend is less than the divisor (like 1/8 or 2/8).
> >
> > Could you please share your professional suggestions about this? Thank you all again and have a nice day!
> >
> > Pan
> >
> > From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> > Sent: Wednesday, March 1, 2023 10:19 PM
> > To: rguenther <rguenther@suse.de>
> > Cc: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Pan Li <incarnation.p.lee@outlook.com>; Li, Pan2 <pan2.li@intel.com>; kito.cheng <kito.cheng@sifive.com>
> > Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >
> >>> So given the above I think that modeling the size as being the same
> >>> but with accurate precision would work. It's then only the size of the
> >>> padding in bytes we cannot represent with poly-int which should be fine.
> >
> >>> Correct?
> > Yes.
> >
> >>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
> >>> memory address well-defined? That is, how is the padding handled
> >>> by the machine load/store instructions?
> >
> > Storing VNx1BI stores the data to addresses 0 ~ 1/8 poly (1,1) and leaves the memory at 1/8 poly (1,1) ~ 2/8 poly (1,1) unchanged.
> > Loading VNx2BI loads 0 ~ 2/8 poly (1,1); note that 0 ~ 1/8 poly (1,1) is the data that we stored above, and 1/8 poly (1,1) ~ 2/8 poly (1,1) is the original memory data.
> > You can see this case here (LLVM):
> > https://godbolt.org/z/P9e1adrd3
> > foo: # @foo
> > vsetvli a2, zero, e8, mf8, ta, ma
> > vsm.v v0, (a0)
> > vsetvli a2, zero, e8, mf4, ta, ma
> > vlm.v v8, (a0)
> > vsm.v v8, (a1)
> > ret
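
The store-then-load behavior described above can be modeled with a short C sketch. It only imitates the byte-level effect of vsm.v and vlm.v with memcpy; the function names are hypothetical, and the byte counts 1 and 2 used in the usage note are illustrative values rather than sizes taken from any particular VLEN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of a mask store: write nbytes of mask data to memory and leave
   every byte after that untouched.  */
static void
store_mask (unsigned char *mem, const unsigned char *reg, size_t nbytes)
{
  assert (mem && reg);
  memcpy (mem, reg, nbytes);
}

/* Model of a mask load: read nbytes from memory into the register image.  */
static void
load_mask (unsigned char *reg, const unsigned char *mem, size_t nbytes)
{
  assert (mem && reg);
  memcpy (reg, mem, nbytes);
}
```

If memory initially holds {0xAA, 0xBB}, a 1-byte store of 0x11 followed by a 2-byte load yields {0x11, 0xBB}: the first byte comes from the store and the second is the original memory content.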
> >
> > We can also do this in GCC as long as we can differentiate VNx1BI and VNx2BI, and GCC, going by precision, does not eliminate statements even though
> > the modes have the same bytesize.
> >
> > First we emit vsetvl e8mf8 + vsm for VNx1BI
> > Then we emit vsetvl e8mf4 + vlm for VNx2BI
> >
> > Thanks.
> > ________________________________
> > juzhe.zhong@rivai.ai
> >
> > From: Richard Biener <rguenther@suse.de>
> > Date: 2023-03-01 22:03
> > To: juzhe.zhong <juzhe.zhong@rivai.ai>
> > CC: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Pan Li <incarnation.p.lee@outlook.com>; pan2.li <pan2.li@intel.com>; kito.cheng <kito.cheng@sifive.com>
> > Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > On Wed, 1 Mar 2023, Richard Biener wrote:
> >
> >> On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai wrote:
> >>
> >> > Let me first introduce the basics of RVV load/store and stack allocation.
> >> > For scalable vector memory allocation, we allocate memory according to the machine vector length.
> >> > To get this CPU vector-length value (runtime invariant but unknown at compile time), we have an instruction: csrr vlenb.
> >> > For example, csrr a5,vlenb stores the vector length of a single CPU vector register (expressed as a byte size) in register a5.
> >> > The size of a single register in bytes (GET_MODE_SIZE) is the poly value (8,8). That means that after csrr a5,vlenb, a5 holds the size poly (8,8) bytes.
> >> >
> >> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize, poly (1,1). So their storage consumes the same size.
> >> > This means that when we want to allocate memory or stack for register spilling, we should first do csrr a5,vlenb, then srli a5,a5,3 (meaning a5 = a5/8).
> >> > Then a5 holds the bytesize value poly (1,1). VNx1BI, VNx2BI, VNx4BI and VNx8BI all go through the same process described above. They all consume
> >> > the same memory storage size since we can't model them accurately according to their precision or bitsize.
> >> >
> >> > They consume the same storage (I agree it's better to model them more accurately in terms of memory consumption).
> >> >
> >> > Well, even though they consume the same amount of memory, I can make their memory access behavior (load/store) accurate by
> >> > emitting the accurate RVV instructions for them according to the RVV ISA.
> >> >
> >> > VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage, with size poly (1,1).
> >> > The instructions for these modes are as follows:
> >> > VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of the poly (1,1) storage.
> >> > VNx2BI: vsetvl e8mf4 + vlm, loading 1/4 of the poly (1,1) storage.
> >> > VNx4BI: vsetvl e8mf2 + vlm, loading 1/2 of the poly (1,1) storage.
> >> > VNx8BI: vsetvl e8m1 + vlm, loading all of the poly (1,1) storage.
> >> >
> >> > So based on this, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI and VNx8BI accurately according to precision or bitsize.
> >> > This implementation is fine even though their memory storage is not accurate.
> >> >
> >> > However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
> >> >
> >> > (Note: Load same memory base)
> >> > load v0 VNx1BI from base0
> >> > load v1 VNx2BI from base0
> >> > load v2 VNx4BI from base0
> >> > load v3 VNx8BI from base0
> >> >
> >> > store v0 base1
> >> > store v1 base2
> >> > store v2 base3
> >> > store v3 base4
> >> >
> >> > For this program sequence, GCC will eliminate the last 3 load instructions.
> >> >
> >> > Then it will become:
> >> >
> >> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
> >> >
> >> > store v0 base1
> >> > store v0 base2
> >> > store v0 base3
> >> > store v0 base4
> >> >
> >> > This is what we want to fix. I think that as long as we have a way to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
> >> > GCC will not do the incorrect elimination for RVV.
> >> >
> >> > I think it can work fine even though these 4 modes have an inaccurate memory storage size,
> >> > as long as their memory access (load/store) behavior is accurate.
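
The elimination problem described above comes down to which fields are compared when deciding whether two loads are equivalent. The following C sketch illustrates that idea only; it does not reflect how GCC's passes are implemented. The struct and function names are hypothetical, and the size and precision fields hold the per-unit coefficients mentioned in the thread (bytesize 1 for all four modes, precision 1, 2, 4, 8).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative description of a mask mode.  */
struct mask_mode
{
  const char *name;
  unsigned bytesize;  /* same for VNx1BI..VNx8BI in the thread */
  unsigned precision; /* differs per mode once adjusted */
};

/* Comparing by size only: every pair of the four modes looks identical,
   so later loads from the same address appear redundant.  */
static bool
same_by_size (const struct mask_mode *a, const struct mask_mode *b)
{
  assert (a && b);
  return a->bytesize == b->bytesize;
}

/* Comparing size and precision: the modes stay distinct.  */
static bool
same_by_size_and_precision (const struct mask_mode *a,
                            const struct mask_mode *b)
{
  assert (a && b);
  return a->bytesize == b->bytesize && a->precision == b->precision;
}
```

For VNx1BI {1, 1} and VNx2BI {1, 2}, same_by_size returns true while same_by_size_and_precision returns false, which is the distinction the precision adjustment is meant to provide.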
> >>
> >> So given the above I think that modeling the size as being the same
> >> but with accurate precision would work. It's then only the size of the
> >> padding in bytes we cannot represent with poly-int which should be fine.
> >>
> >> Correct?
> >
> > Btw, is storing a VNx1BI and then loading a VNx2BI from the same
> > memory address well-defined? That is, how is the padding handled
> > by the machine load/store instructions?
> >
> > Richard.
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)