Yes. There is a workaround for this in RVV.

Ideally, each mode should have PRECISION == BITSIZE. However, for RVV there is a bug that causes incorrect DSE.
We have VNx1BI (occupies 1 bit), VNx2BI (occupies 2 bits), VNx4BI (occupies 4 bits) and VNx8BI (occupies 8 bits). Since they all have the same BYTESIZE,
DSE treats them incorrectly.

So we added a workaround (ADJUST_PRECISION) to fix it:
https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f
which prevents the incorrect DSE.

But the mask bit layout in memory then comes out wrong because of the inconsistency between PRECISION and BITSIZE.
So I force GCC to handle this in the RISC-V backend for VNx1BI/VNx2BI/VNx4BI.
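
To illustrate the layout problem, here is a minimal standalone sketch (not GCC code). It assumes a hypothetical VNx2BI-like mode with 2 elements, BITSIZE 8 and PRECISION 2, and packs elements least-significant-first within each byte. The helper name pack_mask and these numbers are only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack NELTS boolean elements into OUT, giving each element ELT_BITS
   bits (only the low bit of each slot carries the value).  Elements are
   placed least-significant-first within each byte.  Simplification for
   this sketch: ELT_BITS must divide 8.  */
static void
pack_mask (const uint8_t *elts, size_t nelts, unsigned elt_bits,
           uint8_t *out, size_t out_bytes)
{
  assert (elt_bits >= 1 && elt_bits <= 8 && 8 % elt_bits == 0);
  unsigned per_byte = 8 / elt_bits;
  assert (out_bytes * per_byte >= nelts);

  for (size_t i = 0; i < out_bytes; i++)
    out[i] = 0;

  for (size_t i = 0; i < nelts; i++)
    if (elts[i])
      out[i / per_byte] |= (uint8_t) (1u << ((i % per_byte) * elt_bits));
}
```

With the assumed numbers, deriving the element width from BITSIZE gives 8 / 2 = 4 bits per element, so an all-ones mask becomes 0x11. Deriving it from PRECISION gives 2 / 2 = 1 bit per element, so the same mask becomes 0x03, which is the densely packed layout a mask load expects.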

I think this is a RISC-V backend issue and can be addressed well in the RISC-V port (as in the patch I posted).
There is no need to touch generic code, since other targets should not have the same issue.

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-29 15:53
To: Robin Dapp via Gcc-patches
CC: 钟居哲; Jeff Law; Robin Dapp; kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Richard Sandiford <richard.sandiford@arm.com> writes:
> Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Hi Juzhe,
>>
>> I find the bug description rather confusing.  What I can see is that
>> the constant in the literal pool is indeed wrong but how would DSE or
>> so play a role there?  Particularly only for the smaller modes?
>>
>> My suspicion would be that the constant in the literal/constant pool
>> is wrong from start to finish.
>>
>> I just played around with the following hunk:
>>
>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>> index 542315f88cd..5223c08924f 100644
>> --- a/gcc/varasm.cc
>> +++ b/gcc/varasm.cc
>> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>>            whole element.  Often this is byte_mode and contains more
>>            than one element.  */
>>         unsigned int nelts = GET_MODE_NUNITS (mode);
>> -       unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
>> +       unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
>>         unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
>>         scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
>>
>> With this all your examples pass for me.  We then pack e.g. 16 VNx2BI elements
>> into an int and not just 8.  It would also explain why it works for modes
>> where PRECISION == BITSIZE.  Now it will certainly require a more thorough
>> analysis but maybe it's a start?
>
> Yeah.  Preapproved for trunk & any necessary branches.
 
Sorry, only realised later, but: if the precision can cover fewer
bytes than the bitsize, I suppose there ought to be some zero-byte
padding at the end as well.
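
As a rough sketch of that padding arithmetic (standalone C, not GCC code; the helper names and the sample sizes are hypothetical), the number of trailing zero bytes is the mode size in bytes minus the bytes covered by the precision:

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole bytes needed to hold BITS bits, rounded up.  */
static size_t
bytes_for_bits (size_t bits)
{
  return (bits + 7) / 8;
}

/* Trailing zero bytes needed so that a constant whose payload covers
   PRECISION_BITS still spans the full BITSIZE_BITS of the mode.
   Assumes the bitsize is a whole number of bytes.  */
static size_t
tail_padding_bytes (size_t precision_bits, size_t bitsize_bits)
{
  assert (bitsize_bits % 8 == 0);
  assert (precision_bits <= bitsize_bits);
  return bitsize_bits / 8 - bytes_for_bits (precision_bits);
}
```

For example, a precision of 8 bits in a 64-bit mode would need 7 zero bytes after the payload, while a precision of 2 bits in an 8-bit mode needs none because the payload byte already fills the mode.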
 
Thanks,
Richard