public inbox for gcc-patches@gcc.gnu.org
From: <juzhe.zhong@rivai.ai>
To: richard.sandiford <richard.sandiford@arm.com>,
	 gcc-patches <gcc-patches@gcc.gnu.org>
Cc: rguenther <rguenther@suse.de>,
	 "Pan Li" <incarnation.p.lee@outlook.com>,
	 pan2.li <pan2.li@intel.com>,  kito.cheng <kito.cheng@sifive.com>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Date: Wed, 1 Mar 2023 21:52:55 +0800
Message-ID: <15BABD96BF2038FC+2023030121525471809945@rivai.ai>
In-Reply-To: <2023030121501634323743@rivai.ai>

Sorry for the misleading typo.

>> VNx1BI: vsetvl e8mf8 + vlm,  loading 1/8 of poly (1,1) storage.
>> VNx2BI: vsetvl e8mf8 + vlm,  loading 1/4 of poly (1,1) storage.
>> VNx4BI: vsetvl e8mf8 + vlm,  loading 1/2 of poly (1,1) storage.
>> VNx8BI: vsetvl e8mf8 + vlm,  loading 1 of poly (1,1) storage.

It should be:
 VNx1BI: vsetvl e8mf8 + vlm,  loading 1/8 of poly (1,1) storage.
 VNx2BI: vsetvl e8mf4 + vlm,  loading 1/4 of poly (1,1) storage.
 VNx4BI: vsetvl e8mf2 + vlm,  loading 1/2 of poly (1,1) storage.
 VNx8BI: vsetvl e8m1 + vlm,  loading 1 of poly (1,1) storage.
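
As a rough check of the corrected table, here is a minimal Python sketch. It assumes VLMAX = LMUL * VLEN / SEW as in the RVV spec, that vlm.v with vl = VLMAX covers vl mask bits, and that poly (1,1) bytes means vlenb / 8 bytes of storage. The names LMUL, vlmax, storage_bits and fraction_loaded are made up for this illustration and do not come from GCC.

```python
from fractions import Fraction

SEW = 8  # element width selected by "e8"

# LMUL chosen for each mask mode in the corrected table (mf8, mf4, mf2, m1).
LMUL = {
    "VNx1BI": Fraction(1, 8),
    "VNx2BI": Fraction(1, 4),
    "VNx4BI": Fraction(1, 2),
    "VNx8BI": Fraction(1),
}


def vlmax(mode, vlen_bits):
    """Number of mask bits touched by vlm.v when vl = VLMAX (assumed model)."""
    return LMUL[mode] * vlen_bits / SEW


def storage_bits(vlen_bits):
    """Bits in the poly (1,1)-byte slot: (vlenb / 8) bytes, with vlenb = VLEN / 8."""
    return (vlen_bits // 8 // 8) * 8


def fraction_loaded(mode, vlen_bits):
    """Share of the poly (1,1) storage that the mode's vlm.v actually reads."""
    return vlmax(mode, vlen_bits) / storage_bits(vlen_bits)


if __name__ == "__main__":
    for vlen in (64, 128, 256):
        for mode in LMUL:
            print(vlen, mode, fraction_loaded(mode, vlen))
```

Under these assumptions the fraction comes out equal to the LMUL of each mode for any VLEN that is a multiple of 64, which matches the 1/8, 1/4, 1/2 and 1 in the table.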

Please be aware of this. Thanks.


juzhe.zhong@rivai.ai
 
From: juzhe.zhong@rivai.ai
Date: 2023-03-01 21:50
To: richard.sandiford; gcc-patches
CC: rguenther; Pan Li; pan2.li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Let me first introduce the basics of RVV load/store and stack allocation.
For scalable vectors, we allocate memory according to the machine vector length.
To get the CPU vector length (a runtime invariant that is unknown at compile time), we have an instruction called csrr vlenb.
For example, csrr a5,vlenb stores the size in bytes of a single vector register in register a5.
The size in bytes of a single register (GET_MODE_SIZE) is the poly value (8,8). That means after csrr a5,vlenb, a5 holds poly (8,8) bytes.

Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the same byte size, poly (1,1), so their storage consumes the same size.
This means that when we want to allocate memory or stack for register spills, we first do csrr a5,vlenb and then srli a5,a5,3 (that is, a5 = a5/8).
Then a5 holds the byte size poly (1,1). VNx1BI, VNx2BI, VNx4BI and VNx8BI all go through the same process described above. They all consume
the same amount of memory since we can't model them accurately according to precision or bit size.
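
The size computation above can be sketched in a few lines of Python. This is only a model: it assumes a poly value (c0, c1) stands for c0 + c1 * x with a runtime x >= 0, and it takes the element counts of the mask modes from their names. The helper names (poly_value, storage_bytes, mode_bits) are hypothetical and not GCC functions.

```python
def poly_value(coeffs, x):
    """Evaluate a poly value (c0, c1) as c0 + c1 * x for a runtime x >= 0."""
    c0, c1 = coeffs
    return c0 + c1 * x


VLENB = (8, 8)  # bytes in one vector register, as returned by "csrr a5,vlenb"

# Element counts implied by the mode names (assumption for this sketch).
BOOL_MODE_BITS = {
    "VNx1BI": (1, 1),
    "VNx2BI": (2, 2),
    "VNx4BI": (4, 4),
    "VNx8BI": (8, 8),
}


def storage_bytes(x):
    """Model of: csrr a5,vlenb ; shift right by 3  ->  a5 = vlenb / 8."""
    return poly_value(VLENB, x) >> 3


def mode_bits(mode, x):
    """Number of mask bits the mode really holds for a given x."""
    return poly_value(BOOL_MODE_BITS[mode], x)


if __name__ == "__main__":
    for x in range(3):
        sizes = {m: (storage_bytes(x), mode_bits(m, x)) for m in BOOL_MODE_BITS}
        print(x, sizes)  # same byte size for every mode, different bit counts
```

For every x the four modes get the same number of bytes while holding different numbers of mask bits, which is the mismatch discussed here.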

They consume the same storage (I agree it would be better to model their memory consumption more accurately).

Well, even though they consume the same amount of memory, I can make their memory access behavior (load/store) accurate by
emitting the exact RVV instructions for them according to the RVV ISA.

VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage of size poly (1,1).
The instructions for these modes are as follows:
VNx1BI: vsetvl e8mf8 + vlm,  loading 1/8 of poly (1,1) storage.
VNx2BI: vsetvl e8mf8 + vlm,  loading 1/4 of poly (1,1) storage.
VNx4BI: vsetvl e8mf8 + vlm,  loading 1/2 of poly (1,1) storage.
VNx8BI: vsetvl e8mf8 + vlm,  loading 1 of poly (1,1) storage.

So, based on this, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI and VNx8BI accurately according to precision or bit size.
This implementation is fine even though their memory storage size is not accurate.

However, the problem is that since they have the same byte size, GCC treats them as the same and performs an incorrect statement elimination:

(Note: all loads use the same memory base)
load v0 VNx1BI from base0
load v1 VNx2BI from base0
load v2 VNx4BI from base0
load v3 VNx8BI from base0

store v0 base1
store v1 base2
store v2 base3
store v3 base4

For this sequence, GCC will eliminate the last 3 load instructions.

It then becomes:

load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only loads 1/8 of the poly (1,1) memory data)

store v0 base1
store v0 base2
store v0 base3
store v0 base4

This is what we want to fix. I think that as long as we have a way to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
GCC will not do the incorrect elimination for RVV.
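
To illustrate why a distinguishing property is enough, here is a toy Python model of redundant-load elimination. It is not GCC's actual pass: the function eliminate_redundant_loads and the two lookup tables are invented for this sketch, and the only assumption is that two loads are merged when a chosen key (base plus some mode property) is equal.

```python
MODES = ("VNx1BI", "VNx2BI", "VNx4BI", "VNx8BI")

# All four modes report the same byte size, poly (1,1).
BYTE_SIZE = {mode: (1, 1) for mode in MODES}

# Distinct precisions, following the element counts in the mode names (assumed).
PRECISION = {"VNx1BI": (1, 1), "VNx2BI": (2, 2), "VNx4BI": (4, 4), "VNx8BI": (8, 8)}

# The four loads from the same base, as (destination, mode, base).
LOADS = [
    ("v0", "VNx1BI", "base0"),
    ("v1", "VNx2BI", "base0"),
    ("v2", "VNx4BI", "base0"),
    ("v3", "VNx8BI", "base0"),
]


def eliminate_redundant_loads(loads, key):
    """Map each destination to the register that ends up holding its value.

    A load is dropped (and aliased to an earlier register) when an earlier
    load had the same key.
    """
    seen = {}
    alias = {}
    for dest, mode, base in loads:
        k = key(mode, base)
        if k in seen:
            alias[dest] = seen[k]  # load removed, earlier register reused
        else:
            seen[k] = dest
            alias[dest] = dest     # load kept
    return alias


by_size = eliminate_redundant_loads(LOADS, lambda mode, base: (base, BYTE_SIZE[mode]))
by_precision = eliminate_redundant_loads(LOADS, lambda mode, base: (base, PRECISION[mode]))

if __name__ == "__main__":
    print(by_size)       # every destination collapses onto v0
    print(by_precision)  # every load is kept
```

Keyed on byte size, all four stores end up using v0, as in the sequence above. Keyed on a property that differs between the modes, each load survives.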

I think it can work fine even though these 4 modes have an inaccurate memory storage size,
as long as their load/store memory access behavior is accurate.

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-03-01 21:19
To: Pan Li via Gcc-patches
CC: Richard Biener; Pan Li; juzhe.zhong@rivai.ai; pan2.li; Kito.cheng
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> I am not very familiar with the memory pattern, maybe juzhe can provide more information or correct me if anything is misleading.
>
> The different precision tries to resolve the bug below: the second vlm (which loads a different number of bytes than the first one)
> is eliminated because vbool8 and vbool16 have the same precision, aka [8, 8].
>
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>
> addi    a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi    a1,a1,200
> vlm.v   v24,0(a0)
> vsm.v   v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v   v24,0(a1)
 
But I think it's important to think about the patch as more than a way
of fixing the bug above.  The aim has to be to describe the modes as they
really are.
 
I don't think there's a way for GET_MODE_SIZE to be "conservatively wrong".
A GET_MODE_SIZE that is too small would cause problems.  So would a
GET_MODE_SIZE that is too big.
 
Like Richard says, I think the question comes down to the amount of padding.
Is it the case that for 4+4X ([4,4]), the memory representation has 4 bits
of padding for even X and 0 bits of padding for odd X?
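
The arithmetic behind this question can be checked with a short Python sketch. It assumes the value is stored in the smallest whole number of bytes that holds its bits, which is exactly the point being asked about and may not match the real layout. The function padding_bits is a hypothetical helper.

```python
def padding_bits(precision, x):
    """Padding needed to round a poly precision (c0, c1) up to whole bytes.

    precision is in bits and stands for c0 + c1 * x with a runtime x >= 0.
    """
    c0, c1 = precision
    bits = c0 + c1 * x
    size_bytes = -(-bits // 8)  # ceiling division by 8
    return size_bytes * 8 - bits


if __name__ == "__main__":
    for x in range(4):
        print(x, 4 + 4 * x, padding_bits((4, 4), x))
```

With precision [4,4] the bit count 4+4X is 4 modulo 8 for even X and 0 modulo 8 for odd X, so under this assumption the padding alternates between 4 and 0 bits.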
 
I agree getting rid of GET_MODE_SIZE and representing everything in bits
would avoid the problem at this point, but I think it would just be pushing
the difficulty elsewhere.  E.g. stack layout will be "interesting" if we
can't work in byte sizes.
 
Thanks,
Richard
 
