[PATCH] RISC-V: Fix RVV mask mode size

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] RISC-V: Fix RVV mask mode size
@ 2022-12-14  6:48 juzhe.zhong
  2022-12-16 20:22 ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: juzhe.zhong @ 2022-12-14  6:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, palmer, Ju-Zhe Zhong

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

This patch fixes the size of RVV mask modes. Currently every mask mode is
sized as a whole RVV register (LMUL = 1), which not only ties each mask type
(for example vbool32_t) to vint8m1_t but also increases memory consumption.

I noticed this issue while developing the VSETVL pass. Since it is not part of
VSETVL support, I have separated it into a standalone fix.

gcc/ChangeLog:

        * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Reduce RVV mask mode size.
        * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
        (riscv_modes_tieable_p): Don't tie mask modes, which would create issues.
        * config/riscv/riscv.h (riscv_v_adjust_bytesize): New function.

---
 gcc/config/riscv/riscv-modes.def | 14 ++++----
 gcc/config/riscv/riscv.cc        | 61 ++++++++++++++++++++++++++++++++
 gcc/config/riscv/riscv.h         |  1 +
 3 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index 556b5c55253..339b41b32eb 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
 ADJUST_ALIGNMENT (VNx32BI, 1);
 ADJUST_ALIGNMENT (VNx64BI, 1);
 
-ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
+ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
+ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
+ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
+ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
+ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
+ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
 
 /*
    | Mode        | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1198a08b13e..2d380aa42cb 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -979,6 +979,46 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
   return scale;
 }
 
+/* Called from ADJUST_BYTESIZE in riscv-modes.def.  Return the correct
+   BYTESIZE for the corresponding MODE_VECTOR_BOOL machine_mode.  */
+
+poly_int64
+riscv_v_adjust_bytesize (machine_mode mode, int scale)
+{
+  /* According to the RVV ISA, each BOOL element occupies 1 bit.
+     However, GCC assumes each BOOL element occupies at least
+     1 byte.  ??? TODO: Maybe we can adjust this and support
+     1-bit BOOL in the future.
+
+     One solution is to give every MODE_VECTOR_BOOL mode the
+     same size, that of LMUL = 1.  However, VNx1BImode occupies
+     only a small fraction of a single LMUL = 1 register, so
+     that choice wastes memory and increases memory access
+     traffic.
+
+     Ideally, an RVV mask datatype such as 'vbool64_t', which
+     is VNx1BI when TARGET_MIN_VLEN > 32, should have 1/8 of
+     the BYTESIZE of vint8mf8_t (VNx1QImode) according to the
+     RVV ISA.  However, since GCC cannot support a 1-bit bool
+     value, the smallest BYTESIZE we can use is the BYTESIZE
+     of vint8mf8_t (VNx1QImode).
+
+     Based on this, we model MODE_VECTOR_BOOL with as small a
+     bytesize as possible so that we reduce memory traffic
+     and memory consumption.  */
+
+  /* Only adjust BYTESIZE of RVV mask mode.  */
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+  if (riscv_v_ext_vector_mode_p (mode))
+    {
+      if (known_lt (GET_MODE_SIZE (mode), BYTES_PER_RISCV_VECTOR))
+	return GET_MODE_SIZE (mode);
+      else
+	return BYTES_PER_RISCV_VECTOR;
+    }
+  return scale;
+}
+
 /* Return true if X is a valid address for machine mode MODE.  If it is,
    fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
    effect.  */
@@ -5735,6 +5775,27 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 static bool
 riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
+  if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
+    {
+      /* Because of riscv_v_adjust_bytesize, RVV mask modes are not
+	 accurately modeled.  For example, we give VNx1BI the
+	 BYTESIZE of VNx1QImode even though VNx1BI should have
+	 1/8 of the VNx1QImode BYTESIZE.  We shouldn't allow them to
+	 be tied to each other since that produces incorrect codegen.
+
+	 For example:
+	   if (cond == 0) {
+	    vint8mf8_t v = *(vint8mf8_t*)in;
+	   } else {
+	    vbool64_t v = *(vbool64_t*)in;
+	   }
+	 GCC would tie them together because they have the same
+	 BYTESIZE, which is incorrect.  */
+      if (GET_MODE_CLASS (mode1) == MODE_VECTOR_BOOL
+	  || GET_MODE_CLASS (mode2) == MODE_VECTOR_BOOL)
+	return mode1 == mode2;
+      return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
+    }
   return (mode1 == mode2
 	  || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
 	       && GET_MODE_CLASS (mode2) == MODE_FLOAT));
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index defb475f948..b9cb6b9859c 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1034,6 +1034,7 @@ extern unsigned riscv_stack_boundary;
 extern unsigned riscv_bytes_per_vector_chunk;
 extern poly_uint16 riscv_vector_chunks;
 extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
+extern poly_int64 riscv_v_adjust_bytesize (enum machine_mode, int);
 /* The number of bits and bytes in a RVV vector.  */
 #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
 #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
-- 
2.36.3
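
As a rough illustration of the vector-mode branch added to riscv_modes_tieable_p above, here is a standalone toy model. It is only a sketch: ToyMode, ModeClass and toy_vector_modes_tieable are hypothetical names, the byte sizes are made-up constants rather than GCC's poly_int values, and the non-vector fallback of the real function is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for GCC's mode classes (vector modes only).
enum class ModeClass { VectorInt, VectorFloat, VectorBool };

// Hypothetical stand-in for a machine mode with a fixed byte size.
struct ToyMode {
  std::string name;
  ModeClass mode_class;
  std::size_t bytes;  // illustrative constant, not a real GET_MODE_SIZE
};

// Mirrors the rule in the patch for two RVV vector modes:
// a mask mode is only tieable with itself; other vector modes
// are tieable when their sizes are equal.
inline bool toy_vector_modes_tieable(const ToyMode& a, const ToyMode& b) {
  assert(a.bytes > 0 && b.bytes > 0);
  if (a.mode_class == ModeClass::VectorBool ||
      b.mode_class == ModeClass::VectorBool)
    return a.name == b.name;
  return a.bytes == b.bytes;
}
```

With this rule, a mask mode and an integer vector mode that happen to share a modeled size (as VNx1BI and VNx1QI do after this patch) are still kept apart.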



* Re: [PATCH] RISC-V: Fix RVV mask mode size
  2022-12-14  6:48 [PATCH] RISC-V: Fix RVV mask mode size juzhe.zhong
@ 2022-12-16 20:22 ` Jeff Law
  2022-12-17  1:44   ` 钟居哲
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2022-12-16 20:22 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: kito.cheng, palmer

On 12/13/22 23:48, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> This patch fixes the size of RVV mask modes. Currently every mask mode is
> sized as a whole RVV register (LMUL = 1), which not only ties each mask type
> (for example vbool32_t) to vint8m1_t but also increases memory consumption.
> 
> I noticed this issue while developing the VSETVL pass. Since it is not part of
> VSETVL support, I have separated it into a standalone fix.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Reduce RVV mask mode size.
>          * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
>          (riscv_modes_tieable_p): Don't tie mask modes, which would create issues.
>          * config/riscv/riscv.h (riscv_v_adjust_bytesize): New function.
So I haven't really studied the masking model for RVV (yet).  But
there are two models that I'm generally aware of.

One model has a bit per element in the vector we're operating on.  So a 
V4DF will have 4 bits in the mask.  I generally call this the dense or 
packed model.

The other model has a bit for every element, for the maximal number of
elements that can ever appear in a vector.  So if we support an element
length of 8 bits and a 1 kbit vector, then the sparse model would have 128
bits regardless of the size of the object being operated on.  So we'd
still have 128 bits for V4DF, but the vast majority would be don't cares.
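
To make the two models above concrete, here is a minimal sketch that counts mask bits under each one. The function names are hypothetical and the code is plain arithmetic, not anything taken from GCC or from a particular ISA.

```cpp
#include <cassert>
#include <cstddef>

// Dense (packed) model: one mask bit per element actually present
// in the vector being operated on.
inline std::size_t dense_mask_bits(std::size_t num_elements) {
  return num_elements;
}

// Sparse model: one mask bit per element of the smallest supported
// element width, regardless of the element width actually in use.
inline std::size_t sparse_mask_bits(std::size_t vector_bits,
                                    std::size_t min_element_bits) {
  assert(min_element_bits != 0);
  return vector_bits / min_element_bits;
}
```

For the numbers in the text, dense_mask_bits(4) gives 4 for V4DF, while sparse_mask_bits(1024, 8) gives 128 for a 1 kbit vector with 8-bit minimum elements.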

ISTM that you're trying to set the mode size to the smallest possible 
which would seem to argue that you want the dense/packed mask model. 
Does that actually match what the hardware does?  If not, then don't we 
need to convert back and forth?

Or maybe I'm missing something here?!?

Jeff


* Re: Re: [PATCH] RISC-V: Fix RVV mask mode size
  2022-12-16 20:22 ` Jeff Law
@ 2022-12-17  1:44   ` 钟居哲
  2022-12-17  1:53     ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: 钟居哲 @ 2022-12-17  1:44 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: kito.cheng, palmer

Yes, for loads and stores the mask mode for VNx4DF has only 4 bits.
For example, vlm or vsm would load or store 8 bits, I think (I am not sure whether the hardware can load or store just 4 bits, but I am sure it definitely does not load or store the whole register).
So ideally mask modes should be modeled more accurately. However, since GCC assumes that 1 BOOL is 1 byte, the only thing I can do is model each mask mode as small as possible.
Maybe in the future I can support a 1-bit BOOL? I am not sure, since that would require changing the GCC framework.
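
For reference, the RVV specification defines vlm.v and vsm.v as byte accesses (EEW = 8) with an effective length of ceil(vl / 8), so the byte count for a given vl can be sketched as below. The function name is hypothetical; only the rounding rule comes from the specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes transferred by a mask load/store (vlm.v / vsm.v) for a given
// vector length vl: one bit per element, rounded up to whole bytes.
inline std::size_t mask_load_store_bytes(std::size_t vl) {
  assert(vl <= SIZE_MAX - 7);  // guard the rounding addition
  return (vl + 7) / 8;
}
```

So a 4-element mask (vl = 4) is moved as a single byte, which matches the "8 bits" guess above.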

juzhe.zhong@rivai.ai


* Re: [PATCH] RISC-V: Fix RVV mask mode size
  2022-12-17  1:44   ` 钟居哲
@ 2022-12-17  1:53     ` Jeff Law
  2022-12-17  2:15       ` 钟居哲
  2022-12-19  7:44       ` Richard Biener
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff Law @ 2022-12-17  1:53 UTC (permalink / raw)
  To: 钟居哲, gcc-patches; +Cc: kito.cheng, palmer



On 12/16/22 18:44, 钟居哲 wrote:
> Yes, for loads and stores the mask mode for VNx4DF has only 4 bits.
> For example, vlm or vsm would load or store 8 bits, I think (I am not sure
> whether the hardware can load or store just 4 bits, but I am sure it
> definitely does not load or store the whole register).
More likely than not you end up loading a larger quantity with the high
bits zeroed.  Interesting that we're using a packed model.  I'd been
told it was fairly expensive to implement in hardware relative to the
cost of implementing the sparse model.

> So ideally mask modes should be modeled more accurately. However, since
> GCC assumes that 1 BOOL is 1 byte, the only thing I can do is model each
> mask mode as small as possible.
> Maybe in the future I can support a 1-bit BOOL? I am not sure, since
> that would require changing the GCC framework.
I'm a bit confused by this.  GCC can support single bit bools, though 
ports often extend them to 8 bits or more for computational efficiency 
purposes.  At least that's the case in general.  Is there something 
particularly special about masks & bools that's causing problems?

Jeff


* Re: Re: [PATCH] RISC-V: Fix RVV mask mode size
  2022-12-17  1:53     ` Jeff Law
@ 2022-12-17  2:15       ` 钟居哲
  2022-12-19  7:44       ` Richard Biener
  1 sibling, 0 replies; 8+ messages in thread
From: 钟居哲 @ 2022-12-17  2:15 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: kito.cheng, palmer

>> More likely than not you end up loading a larger quantity with the high
>> bits zeroed.  Interesting that we're using a packed model.  I'd been
>> told it was fairly expensive to implement in hardware relative to the
>> cost of implementing the sparse model.

>> I'm a bit confused by this.  GCC can support single bit bools, though
>> ports often extend them to 8 bits or more for computational efficiency
>> purposes.  At least that's the case in general.  Is there something
>> particularly special about masks & bools that's causing problems?
I am not sure we are on the same page. I don't understand what you mean by the
sparse model. The only thing this patch does is change the BYTESIZE of, for example, VNx1BI
to the BYTESIZE of VNx1QImode (originally I gave all mask modes the same size as VNx8QImode, like LLVM).
When I print GET_MODE_SIZE (VNx1BI), the value is the same as for VNx1QImode, so I assume that is because GCC models 1 bool the same as 1 QI?
I am not certain about that, but I am sure that after this patch VNx1BI is adjusted to a smaller size.

Adjusting mask modes to a smaller size is always beneficial: since we can use vlm and vsm for register spilling, it reduces memory consumption and
load/store bandwidth.

LLVM, by contrast, makes each fractional vector and mask vector the same size as LMUL = 1, so it uses vl1r/vs1r for register spilling, which is not
optimal.
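
The difference between the two spilling strategies can be estimated with a small sketch. It counts the bytes the instructions themselves would touch according to the RVV specification, not the BYTESIZE GCC assigns to the mode, and it assumes a fixed VLEN for illustration; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes moved by a whole-register spill (vs1r.v / vl1r.v): the full
// VLEN-bit register, independent of the mask type.
inline std::size_t whole_register_spill_bytes(std::size_t vlen_bits) {
  assert(vlen_bits % 8 == 0);
  return vlen_bits / 8;
}

// Bytes moved by a vsm.v / vlm.v spill of a vboolN_t value at VLMAX.
// vboolN_t holds VLEN / N mask bits, where N is the SEW/LMUL ratio.
inline std::size_t mask_spill_bytes(std::size_t vlen_bits,
                                    std::size_t ratio) {
  assert(ratio != 0);
  const std::size_t mask_bits = vlen_bits / ratio;
  return (mask_bits + 7) / 8;  // round up to whole bytes
}
```

Assuming VLEN = 128, a vbool64_t spill via vsm.v touches 1 byte, while a vs1r.v spill touches 16 bytes.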


juzhe.zhong@rivai.ai
 

* Re: [PATCH] RISC-V: Fix RVV mask mode size
  2022-12-17  1:53     ` Jeff Law
  2022-12-17  2:15       ` 钟居哲
@ 2022-12-19  7:44       ` Richard Biener
  2022-12-27 20:46         ` Jeff Law
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Biener @ 2022-12-19  7:44 UTC (permalink / raw)
  To: Jeff Law; +Cc: 钟居哲, gcc-patches, kito.cheng, palmer

On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 12/16/22 18:44, 钟居哲 wrote:
> > Yes, for loads and stores the mask mode for VNx4DF has only 4 bits.
> > For example, vlm or vsm would load or store 8 bits, I think (I am not sure
> > whether the hardware can load or store just 4 bits, but I am sure it
> > definitely does not load or store the whole register).
> More likely than not you end up loading a larger quantity with the high
> bits zeroed.  Interesting that we're using a packed model.  I'd been
> told it was fairly expensive to implement in hardware relative to the
> cost of implementing the sparse model.

Since the masks are extra inputs, if you use a packed model you need
to wire fewer bits into the execution units for the masks, which I guess
is actually cheaper.  Yes, producing the masks might be more complicated.

> > So ideally mask modes should be modeled more accurately. However, since
> > GCC assumes that 1 BOOL is 1 byte, the only thing I can do is model each
> > mask mode as small as possible.
> > Maybe in the future I can support a 1-bit BOOL? I am not sure, since
> > that would require changing the GCC framework.
> I'm a bit confused by this.  GCC can support single bit bools, though
> ports often extend them to 8 bits or more for computational efficiency
> purposes.  At least that's the case in general.  Is there something
> particularly special about masks & bools that's causing problems?

The only "issue" might be with 4, 2 and 1 bit masks, which would
have a size of 8 bits but a smaller precision, so endianness might
play a role.

Btw, this is all similar to AVX512, where we don't even use
vector BI modes but integer modes for the mask, which
then becomes QImode for 1, 2, 4 and 8 bit masks,
HImode for 16, SImode for 32 and DImode for 64 bit masks.
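
The AVX512 mapping just described, together with the size-versus-precision gap for small masks, can be written down as a short sketch. The function names are hypothetical, and the mode names are returned as plain strings rather than GCC machine_mode values.

```cpp
#include <cassert>
#include <string>

// Integer mode holding an AVX512 mask of the given width, following
// the mapping described above.  Widths not listed there return "".
inline std::string avx512_mask_mode(unsigned mask_bits) {
  switch (mask_bits) {
    case 1: case 2: case 4: case 8: return "QImode";
    case 16: return "HImode";
    case 32: return "SImode";
    case 64: return "DImode";
    default: return "";
  }
}

// Storage bits that lie outside the mask's precision.
inline unsigned padding_bits(unsigned mask_bits, unsigned mode_bits) {
  assert(mask_bits <= mode_bits);
  return mode_bits - mask_bits;
}
```

A 4-bit mask stored in QImode leaves 4 bits outside its precision, which is where the question of which end of the byte holds the mask comes from.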

Richard.

> Jeff


* Re: [PATCH] RISC-V: Fix RVV mask mode size
  2022-12-19  7:44       ` Richard Biener
@ 2022-12-27 20:46         ` Jeff Law
  2023-01-09  7:43           ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2022-12-27 20:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: 钟居哲, gcc-patches, kito.cheng, palmer



On 12/19/22 00:44, Richard Biener wrote:
> On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> On 12/16/22 18:44, 钟居哲 wrote:
>>> Yes, for loads and stores the mask mode for VNx4DF has only 4 bits.
>>> For example, vlm or vsm would load or store 8 bits, I think (I am not sure
>>> whether the hardware can load or store just 4 bits, but I am sure it
>>> definitely does not load or store the whole register).
>> More likely than not you end up loading a larger quantity with the high
>> bits zeroed.  Interesting that we're using a packed model.  I'd been
>> told it was fairly expensive to implement in hardware relative to the
>> cost of implementing the sparse model.
> 
> Since the masks are extra inputs, if you use a packed model you need
> to wire fewer bits into the execution units for the masks, which I guess
> is actually cheaper.  Yes, producing the masks might be more complicated.
We went through this at a prior employer and the hardware guys argued 
strongly that a packed model for mask registers was just too expensive 
to implement.  I don't think it was the # of wires, but the muxes.  The 
number of wires into the unit was an issue when we started talking about 
sub-byte masking :-)

Conceptually on the hardware side each bit in the mask corresponds to a
byte in a vector register.  When the element size is 8 bits, then
obviously there is a 1:1 correspondence between potentially masked
elements and bits in the mask register.

When the element size is 32 bits, then each element has 3 don't care
bits in the mask register and a single bit that is queried for masked
operations.  So if you had a 128-bit vector with 32 bits per element, a
mask register might have a value like:

0xxx 1xxx 1xxx 0xxx

A 128 bit vector with 64 bits per element might be:

0xxx xxxx 1xxx xxxx

Where the xxxs are don't cares and the 0/1 are the masks.
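
The byte-granular layout described above (one mask bit per vector byte, with only the first bit of each element being meaningful) can be reproduced with a small sketch. This models the sparse scheme from the description, not the RVV mask layout, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Render a byte-granular mask register: one character per vector byte,
// grouped in fours.  The first byte of each element shows that
// element's mask bit; the remaining bytes are don't cares ('x').
inline std::string byte_granular_mask(std::size_t vector_bits,
                                      std::size_t element_bits,
                                      const std::vector<bool>& mask) {
  assert(element_bits >= 8 && element_bits % 8 == 0);
  assert(vector_bits % element_bits == 0);
  assert(mask.size() == vector_bits / element_bits);
  const std::size_t total_bytes = vector_bits / 8;
  const std::size_t bytes_per_element = element_bits / 8;
  std::string out;
  for (std::size_t i = 0; i < total_bytes; ++i) {
    if (i != 0 && i % 4 == 0) out += ' ';  // group separator
    if (i % bytes_per_element == 0)
      out += mask[i / bytes_per_element] ? '1' : '0';
    else
      out += 'x';
  }
  return out;
}
```

Calling it with a 128-bit vector, 32-bit elements and the mask {0, 1, 1, 0} yields the first pattern shown above; 64-bit elements with {0, 1} yield the second.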



> 
> The only "issue" might be with 4, 2 and 1 bit masks, which would
> have a size of 8 bits but a smaller precision, so endianness might
> play a role.
> 
> Btw, this is all similar to AVX512, where we don't even use
> vector BI modes but integer modes for the mask, which
> then becomes QImode for 1, 2, 4 and 8 bit masks,
> HImode for 16, SImode for 32 and DImode for 64 bit masks.
Right.  I think in hindsight that might have been a mistake.

jeff


* Re: [PATCH] RISC-V: Fix RVV mask mode size
  2022-12-27 20:46         ` Jeff Law
@ 2023-01-09  7:43           ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2023-01-09  7:43 UTC (permalink / raw)
  To: Jeff Law; +Cc: 钟居哲, gcc-patches, kito.cheng, palmer

On Tue, Dec 27, 2022 at 9:46 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 12/19/22 00:44, Richard Biener wrote:
> > On Sat, Dec 17, 2022 at 2:54 AM Jeff Law via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >>
> >>
> >> On 12/16/22 18:44, 钟居哲 wrote:
> >>> Yes, for loads and stores the mask mode for VNx4DF has only 4 bits.
> >>> For example, vlm or vsm would load or store 8 bits, I think (I am not sure
> >>> whether the hardware can load or store just 4 bits, but I am sure it
> >>> definitely does not load or store the whole register).
> >> More likely than not you end up loading a larger quantity with the high
> >> bits zeroed.  Interesting that we're using a packed model.  I'd been
> >> told it was fairly expensive to implement in hardware relative to the
> >> cost of implementing the sparse model.
> >
> > Since the masks are extra inputs, if you use a packed model you need
> > to wire fewer bits into the execution units for the masks, which I guess
> > is actually cheaper.  Yes, producing the masks might be more complicated.
> We went through this at a prior employer and the hardware guys argued
> strongly that a packed model for mask registers was just too expensive
> to implement.  I don't think it was the # of wires, but the muxes.  The
> number of wires into the unit was an issue when we started talking about
> sub-byte masking :-)
>
> Conceptually on the hardware side each bit in the mask corresponds to a
> byte in a vector register.  When the element size is 8 bits, then
> obviously there is a 1:1 correspondence between potentially masked
> elements and bits in the mask register.
>
> When the element size is 32 bits, then each element has 3 don't care
> bits in the mask register and a single bit that is queried for masked
> operations.  So if you had a 128-bit vector with 32 bits per element, a
> mask register might have a value like:
>
> 0xxx 1xxx 1xxx 0xxx
>
> A 128 bit vector with 64 bits per element might be:
>
> 0xxx xxxx 1xxx xxxx
>
> Where the xxxs are don't cares and the 0/1 are the masks.
>
>
>
> >
> > The only "issue" might be with 4, 2 and 1 bit masks, which would
> > have a size of 8 bits but a smaller precision, so endianness might
> > play a role.
> >
> > Btw, this is all similar to AVX512, where we don't even use
> > vector BI modes but integer modes for the mask, which
> > then becomes QImode for 1, 2, 4 and 8 bit masks,
> > HImode for 16, SImode for 32 and DImode for 64 bit masks.
> Right.  I think in hindsight that might have been a mistake.

Yes, vector BI modes would have been better here.
On GCN the mask is always DImode, that would have been the
other (better) alternative here I think.

Richard.

>
> jeff
