From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 6F1AC3858C20
 for <gcc-patches@gcc.gnu.org>; Mon, 22 Aug 2022 08:56:55 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6F1AC3858C20
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 318CC13D5;
 Mon, 22 Aug 2022 01:56:58 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 475073F718;
 Mon, 22 Aug 2022 01:56:54 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "juzhe.zhong\@rivai.ai" <juzhe.zhong@rivai.ai>
Mail-Followup-To: "juzhe.zhong\@rivai.ai" <juzhe.zhong@rivai.ai>,
 gcc-patches <gcc-patches@gcc.gnu.org>, rguenther <rguenther@suse.de>,
 "kito.cheng" <kito.cheng@gmail.com>, richard.sandiford@arm.com
Cc: gcc-patches <gcc-patches@gcc.gnu.org>, rguenther <rguenther@suse.de>,
 "kito.cheng" <kito.cheng@gmail.com>
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1,
 1) and allow the machine_mode definition with poly_uint16 (1, 1)
References: <20220818104608.259204-1-juzhe.zhong@rivai.ai>
 <mptk074lkx1.fsf@arm.com>
 <D3BF74DFE8CE3154+2022081917064521548415@rivai.ai>
 <mptedxclgnt.fsf@arm.com>
 <E16E6B7CF3DCE009+2022081917574606383831@rivai.ai>
 <mpt5yiol7k2.fsf@arm.com>
 <8A43FD564C1D55FA+202208200923041010290@rivai.ai>
 <mptzgfwk7br.fsf@arm.com>
 <787DDEDE66AD1087+202208221645522274775@rivai.ai>
Date: Mon, 22 Aug 2022 09:56:53 +0100
In-Reply-To: <787DDEDE66AD1087+202208221645522274775@rivai.ai> (juzhe's
 message of "Mon, 22 Aug 2022 16:45:52 +0800")
Message-ID: <mpttu64k65m.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-49.6 required=5.0 tests=BAYES_00, BODY_8BITS,
 GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY,
 SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE,
 WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 22 Aug 2022 08:56:58 -0000

"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Thanks for your reply. Your suggestion "-1 - (int) i" is better.
>
> Can I send another patch that changes "builder.quick_push (gen_int_mode (=
-(int) i, int_mode));"
> into =E2=80=9Cbuilder.quick_push (gen_int_mode (-1 - (int) i, int_mode));=
" then you merge it for me?

Yeah, please send a patch, and I'll merge it like you say.

Thanks,
Richard

>
> Or you change it yourself?
>=20=20
> Thank you so much.
>
>
>
>
> juzhe.zhong@rivai.ai
>=20=20
> From: Richard Sandiford
> Date: 2022-08-22 16:31
> To: =E9=92=9F=E5=B1=85=E5=93=B2
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1=
, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> =E9=92=9F=E5=B1=85=E5=93=B2 <juzhe.zhong@rivai.ai> writes:
>> After deep analysis and several tries. I have made a analysis and conclu=
sion as follows.
>> I really appreciate if you could spend some time take a look at the foll=
owing analysis that I made:
>>
>> According to the codes in test_vector_subregs_fore_back:
>>
>> ~~~
>>  poly_uint64 nunits =3D GET_MODE_NUNITS (inner_mode);
>>   unsigned int min_nunits =3D constant_lower_bound (nunits);
>>   scalar_mode int_mode =3D GET_MODE_INNER (inner_mode);
>>   unsigned int count =3D gcd (min_nunits, 4);
>>
>>   rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>   rtx x =3D builder.build ();
>> ~~~~
>>
>> Analyze the codes and tried several patterns:
>>
>> For poly_uint16 (2,2):
>> x =3D (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>         ]
>>     ])
>>
>> For poly_uint16 (4,4):
>> x =3D (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         (const_int 2 [0x2])
>>         (const_int 3 [0x3])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>             (const_int -2 [0xfffffffffffffffe])
>>             (const_int -3 [0xfffffffffffffffd])
>>         ]
>>     ])
>>
>> So, I think I can conclude the pattern rule as follows:
>>
>> poly_uint16 (n,n):
>> x =3D (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         (const_int 1 [0x1])
>>         (const_int 2 [0x2])
>>         (const_int 3 [0x3])
>>         ..................
>>         (const_int n [hex(n)])
>>         repeat [
>>             (const_int 0 [0])
>>             (const_int -1 [0xffffffffffffffff])
>>             (const_int -2 [0xfffffffffffffffe])
>>             (const_int -3 [0xfffffffffffffffd])
>>             .........
>>              (const_int -n [hex(-n)])
>>         ]
>>     ])
>> Am I understanding right?
>>
>> Let's first take a look at the codes you write:
>>
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>   rtx x =3D builder.build ();
>>
>> So for poly_uint (1,1):
>> Ideally according to the codes, you want to generate the pattern like th=
is:
>> x =3D (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int 0 [0])
>>         ]
>>     ])
>=20=20
> Ah, right, the two subsequences are supposed to be different.
>=20=20
>> In your codes, when count =3D 1 for poly_uint16 (1,1). And i < count ite=
ration make i always is 0.
>> In this case, i =3D -i =3D 0.
>>
>> So we will end up with the first value =3D 0, and repeat value is also 0=
, in "rtx x =3D builder.build ();".
>> Because GCC thinks all the value are 0, so the pattern is changed as fol=
lows which is not a stepped vector:
>>
>> x =3D (const_vector:VNx1DI repeat [
>>         (const_int 0 [0])
>>     ])
>>
>> This is a const_vector duplicate vector.
>> This is not the correct pattern you want so it fails in the following ch=
eck.
>>
>> I think the test pattern generator goes wrong in poly_uint16 (1,1) which=
 is the only corner case will go wrong.
>>
>> To fix this, we should adjust test pattern generator to specifically pol=
y_uint16 (1,1). We should make first value
>> and repeat value different. I tried 2 solution and they both pass for po=
ly_uint16 (1,1):
>>
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> Original codes:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> 1st solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (count =3D=3D 1 ? 1 : i, int_mode));
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>
>> x =3D (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int 1 [0x1])
>>         ]
>>     ])
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> 2nd solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (i, int_mode));
>>   for (unsigned int i =3D 0; i < count; ++i)
>>     builder.quick_push (gen_int_mode (count =3D=3D 1 ? -1 : -(int) i, in=
t_mode));
>=20=20
> I don't think we need to single out count =3D=3D 1.  It should be OK to u=
se
> -1 - (int) i unconditionally.
>=20=20
> Thanks,
> Richard
>=20=20
>> x=3D=20
>> (const_vector:VNx1DI [
>>         (const_int 0 [0])
>>         repeat [
>>             (const_int -1 [0xffffffffffffffff])
>>         ]
>>     ])
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>
>> Do you agree with these solution? Or you could offer a better solution t=
o fix this? Thank you so much.
>>
>>
>> juzhe.zhong@rivai.ai
>>=20=20
>> From: Richard Sandiford
>> Date: 2022-08-19 20:52
>> To: juzhe.zhong\@rivai.ai
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (=
1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Ah, right, sorry for the bogus suggestion.
>>>> In that case, what specifically goes wrong?  Which test in
>>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>
>>> Because may_be (nunits, 1) return true. For nunits =3D (1,1), it will e=
xecute test_vector_subregs_modes.
>>> The fail ICE:
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes:=
 FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byt=
e))
>>>   expected: (nil)
>>>
>>>   actual: (const_int 0 [0])
>>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char co=
nst*, rtx_def*, rtx_def*)
>>>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>>> 0x1332504 test_vector_subregs_modes
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>>> 0x1332988 test_vector_subregs_fore_back
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>>> 0x1332ae7 test_vector_subregs
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>>> 0x1332c57 test_vector_ops
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>>> 0x21318fa selftest::run_tests()
>>>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>>> 0x1362a76 toplev::run_self_tests()
>>>         ../../../riscv-gcc/gcc/toplev.cc:2205
>>>
>>> I analyzed the codes:
>>> In test_vector_subregs_fore_back, when nunits =3D (1,1). The expected =
=3D NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) =3D const_in=
t 0.
>>> So the assertion fails.
>>=20=20
>> Hmm, ok, so the subreg operation is unexpected succeeding.
>>=20=20
>>> This is the test for stepped vector using 2 element per pattern.  For p=
oly_uint16 (1,1), it's true it is possible only has 1 element.=20
>>=20=20
>> The stepped case is 3 elements per pattern rather than 2.  In a stepped
>> pattern: a, b, b+n are represented explicitly, then the rest are
>> implicitly b+n*2, b+n*3, etc.
>>=20=20
>> The case being handled by this code is instead the 2-element case:
>> a, b are represented explicitly, then the rest are implicitly all b.
>>=20=20
>> Why is (1,1) different though?  The test is constructing:
>>=20=20
>>   nunits: 1 + 1x
>>   shape: nelts_per_pattern =3D=3D 2, npatterns =3D=3D 1
>>   elements: a, b[, b, b, b, b, ...]
>>=20=20
>> It then tests subregs starting at 0 + 1x (so starting at the first b).
>> But for (2,2) we should have:
>>=20=20
>>   nunits: 2 + 2x
>>   shape: nelts_per_pattern =3D=3D 2, npatterns =3D=3D 2
>>   elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>>=20=20
>> and we should test subregs starting at 0 + 2x (so starting at the
>> first b1).  The two cases should be very similar, it's just that the
>> (2,2) case doubles the number of patterns.
>>=20=20
>>> I think it makes sense to fail the test. However for poly (1,1) machine=
 mode, can we have the chance that some target define this
>>> kind of machine mode only used for intrinsics?  I already developed ful=
l RVV support in GCC (including intrinsc and auto-vectorization).
>>> I only enable auto-vectorization with mode larger than (2,2) and test i=
t fully.
>>> From my experience, it seems the stepped vector only created during VLA=
 auto-vectorization. And I think only allow poly (1,1)mode used in intrinsi=
cs will
>>> not create issues. Am I understanding wrong =EF=BC=9FFeel free to corre=
ct me. Thanks ~
>>=20=20
>> Following on from what I said above, it doesn't look like this particular
>> case is related to stepped vectors.
>>=20=20
>> (1,1) shouldn't (need to) be a special case though.  Any potentital
>> problems that would occur for (1,1) with npatterns=3D=3D1 would also occ=
ur
>> for (n,n) with npatterns=3D=3Dn.  E.g. if stepped vectors are problematic
>> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns=3D=
=3D2)
>> would be problematic for (2,2).
>>=20=20
>> So yeah, preventing a mode being used for autovectorisation would allow
>> the target to have a bit more control over which constants are actually
>> generated.  But it shouldn't be necessary to do that for correctness.
>>=20=20
>> Thanks,
>> Richard
>>=20=20
>>> juzhe.zhong@rivai.ai
>>>=20=20
>>> From: Richard Sandiford
>>> Date: 2022-08-19 17:35
>>> To: juzhe.zhong\@rivai.ai
>>> CC: rguenther; gcc-patches; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int =
(1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Hi, Richard. I tried the codes:
>>>> if (!nunits.is_constant () && maybe_gt (nunits, 1))=20
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>
>>>> It still failed. For nunits =3D (1,1) , maybe_gt (nunits, 1) return tr=
ue value.
>>>=20=20
>>> Ah, right, sorry for the bogus suggestion.
>>>=20=20
>>> In that case, what specifically goes wrong?  Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>=20=20
>>> Thanks,
>>> Richard
>>>=20=20
>>>> But I tried:
>>>> if (!nunits.is_constant () && known_gt (nunits, 1))=20
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It pass. But it report a warning: "warning: comparison between signed =
and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>>
>>>> Finally, I tried:
>>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), =
1))=20
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It passed with no warning.
>>>>
>>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for th=
is?
>>>> Thanks!
>>>>
>>>>
>>>> juzhe.zhong@rivai.ai
>>>>=20=20
>>>> From: Richard Sandiford
>>>> Date: 2022-08-19 16:03
>>>> To: juzhe.zhong
>>>> CC: gcc-patches; rguenther; kito.cheng
>>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int=
 (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>>> juzhe.zhong@rivai.ai writes:
>>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>>
>>>>> Hello. This patch is preparing for following RVV support.
>>>>>
>>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic v=
ector.
>>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invar=
iant of ARM SVE is always 128-bit blocks.
>>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' su=
b-extension and 64bit in 'Zve*' sub-extension.
>>>>>
>>>>> So I define the machine_mode as follows:
>>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>>> The riscv_vector_chunks =3D poly_uint16 (1, 1)
>>>>>
>>>>> The compilation is failed for the stepped vector test:
>>>>> (const_vector:VNx1DI repeat [
>>>>>         (const_int 8 [0x8])
>>>>>         (const_int 7 [0x7])
>>>>>     ])
>>>>>
>>>>> I understand for stepped vector should always have aleast 2 elements =
and stepped vector initialization is common
>>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense =
that report fail for stepped vector of poly_uint16 (1, 1).
>>>>>
>>>>> machine mode with nunits =3D poly_uint16 (1, 1) needs to implemented =
in intrinsics. And I would like to enable RVV auto-vectorization
>>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RIS=
C-V backend. I think it will not create issue if we define
>>>>> vector mode with nunits =3D poly_uint16 (1, 1). Feel free to correct =
me or offer me some other better solutions. Thanks!
>>>>>
>>>>>=20=20=20
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test =
for poly_uint16 (1, 1).
>>>>>
>>>>> ---
>>>>>  gcc/simplify-rtx.cc | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>>> --- a/gcc/simplify-rtx.cc
>>>>> +++ b/gcc/simplify-rtx.cc
>>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inn=
er_mode)
>>>>>    rtx x =3D builder.build ();
>>>>>=20=20
>>>>>    test_vector_subregs_modes (x);
>>>>> -  if (!nunits.is_constant ())
>>>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)=
))
>>>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>=20=20
>>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>>> Something like poly_uint16 (1, 2) would have the same problem as
>>>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>>>> principle.)
>>>>=20=20
>>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>>=20=20
>>>>   if (GET_MODE_CLASS (mode) =3D=3D MODE_VECTOR_INT
>>>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>>>     {
>>>>       test_vector_ops_series (mode, scalar_reg);
>>>>       test_vector_subregs (mode);
>>>>     }
>>>>=20=20
>>>> Thanks,
>>>> Richard
>>>>=20=20
>>>=20=20
>>=20=20
>=20=20