From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 6F1AC3858C20 for ; Mon, 22 Aug 2022 08:56:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6F1AC3858C20 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 318CC13D5; Mon, 22 Aug 2022 01:56:58 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 475073F718; Mon, 22 Aug 2022 01:56:54 -0700 (PDT) From: Richard Sandiford To: "juzhe.zhong\@rivai.ai" Mail-Followup-To: "juzhe.zhong\@rivai.ai" , gcc-patches , rguenther , "kito.cheng" , richard.sandiford@arm.com Cc: gcc-patches , rguenther , "kito.cheng" Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) References: <20220818104608.259204-1-juzhe.zhong@rivai.ai> <8A43FD564C1D55FA+202208200923041010290@rivai.ai> <787DDEDE66AD1087+202208221645522274775@rivai.ai> Date: Mon, 22 Aug 2022 09:56:53 +0100 In-Reply-To: <787DDEDE66AD1087+202208221645522274775@rivai.ai> (juzhe's message of "Mon, 22 Aug 2022 16:45:52 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-49.6 required=5.0 tests=BAYES_00, BODY_8BITS, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Aug 2022 08:56:58 -0000 "juzhe.zhong@rivai.ai" writes: > Thanks for your reply. Your suggestion "-1 - (int) i" is better. > > Can I send another patch that changes "builder.quick_push (gen_int_mode (= -(int) i, int_mode));" > into =E2=80=9Cbuilder.quick_push (gen_int_mode (-1 - (int) i, int_mode));= " then you merge it for me? Yeah, please send a patch, and I'll merge it like you say. Thanks, Richard > > Or you change it yourself? >=20=20 > Thank you so much. > > > > > juzhe.zhong@rivai.ai >=20=20 > From: Richard Sandiford > Date: 2022-08-22 16:31 > To: =E9=92=9F=E5=B1=85=E5=93=B2 > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1= , 1) and allow the machine_mode definition with poly_uint16 (1, 1) > =E9=92=9F=E5=B1=85=E5=93=B2 writes: >> After deep analysis and several tries. I have made a analysis and conclu= sion as follows. >> I really appreciate if you could spend some time take a look at the foll= owing analysis that I made: >> >> According to the codes in test_vector_subregs_fore_back: >> >> ~~~ >> poly_uint64 nunits =3D GET_MODE_NUNITS (inner_mode); >> unsigned int min_nunits =3D constant_lower_bound (nunits); >> scalar_mode int_mode =3D GET_MODE_INNER (inner_mode); >> unsigned int count =3D gcd (min_nunits, 4); >> >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> rtx x =3D builder.build (); >> ~~~~ >> >> Analyze the codes and tried several patterns: >> >> For poly_uint16 (2,2): >> x =3D (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> ] >> ]) >> >> For poly_uint16 (4,4): >> x =3D (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> (const_int 2 [0x2]) >> (const_int 3 [0x3]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> (const_int -2 [0xfffffffffffffffe]) >> (const_int -3 [0xfffffffffffffffd]) >> ] >> ]) >> >> So, I think I can conclude the pattern rule as follows: >> >> poly_uint16 (n,n): >> x =3D (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> (const_int 2 [0x2]) >> (const_int 3 [0x3]) >> .................. >> (const_int n [hex(n)]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> (const_int -2 [0xfffffffffffffffe]) >> (const_int -3 [0xfffffffffffffffd]) >> ......... >> (const_int -n [hex(-n)]) >> ] >> ]) >> Am I understanding right? >> >> Let's first take a look at the codes you write: >> >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> rtx x =3D builder.build (); >> >> So for poly_uint (1,1): >> Ideally according to the codes, you want to generate the pattern like th= is: >> x =3D (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int 0 [0]) >> ] >> ]) >=20=20 > Ah, right, the two subsequences are supposed to be different. >=20=20 >> In your codes, when count =3D 1 for poly_uint16 (1,1). And i < count ite= ration make i always is 0. >> In this case, i =3D -i =3D 0. >> >> So we will end up with the first value =3D 0, and repeat value is also 0= , in "rtx x =3D builder.build ();". >> Because GCC thinks all the value are 0, so the pattern is changed as fol= lows which is not a stepped vector: >> >> x =3D (const_vector:VNx1DI repeat [ >> (const_int 0 [0]) >> ]) >> >> This is a const_vector duplicate vector. >> This is not the correct pattern you want so it fails in the following ch= eck. >> >> I think the test pattern generator goes wrong in poly_uint16 (1,1) which= is the only corner case will go wrong. >> >> To fix this, we should adjust test pattern generator to specifically pol= y_uint16 (1,1). We should make first value >> and repeat value different. I tried 2 solution and they both pass for po= ly_uint16 (1,1): >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> Original codes: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> 1st solution: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (count =3D=3D 1 ? 1 : i, int_mode)); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> >> x =3D (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int 1 [0x1]) >> ] >> ]) >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> 2nd solution: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i =3D 0; i < count; ++i) >> builder.quick_push (gen_int_mode (count =3D=3D 1 ? -1 : -(int) i, in= t_mode)); >=20=20 > I don't think we need to single out count =3D=3D 1. It should be OK to u= se > -1 - (int) i unconditionally. >=20=20 > Thanks, > Richard >=20=20 >> x=3D=20 >> (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int -1 [0xffffffffffffffff]) >> ] >> ]) >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> >> Do you agree with these solution? Or you could offer a better solution t= o fix this? Thank you so much. >> >> >> juzhe.zhong@rivai.ai >>=20=20 >> From: Richard Sandiford >> Date: 2022-08-19 20:52 >> To: juzhe.zhong\@rivai.ai >> CC: gcc-patches; rguenther; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (= 1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> "juzhe.zhong@rivai.ai" writes: >>>> Ah, right, sorry for the bogus suggestion. >>>> In that case, what specifically goes wrong? Which test in >>>> test_vector_subregs_modes fails, and what does the ICE look like? >>> >>> Because may_be (nunits, 1) return true. For nunits =3D (1,1), it will e= xecute test_vector_subregs_modes. >>> The fail ICE: >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes:= FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byt= e)) >>> expected: (nil) >>> >>> actual: (const_int 0 [0]) >>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 >>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char co= nst*, rtx_def*, rtx_def*) >>> ../../../riscv-gcc/gcc/selftest-rtl.cc:57 >>> 0x1332504 test_vector_subregs_modes >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 >>> 0x1332988 test_vector_subregs_fore_back >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 >>> 0x1332ae7 test_vector_subregs >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 >>> 0x1332c57 test_vector_ops >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 >>> 0x1332c7b selftest::simplify_rtx_cc_tests() >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 >>> 0x21318fa selftest::run_tests() >>> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 >>> 0x1362a76 toplev::run_self_tests() >>> ../../../riscv-gcc/gcc/toplev.cc:2205 >>> >>> I analyzed the codes: >>> In test_vector_subregs_fore_back, when nunits =3D (1,1). The expected = =3D NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) =3D const_in= t 0. >>> So the assertion fails. >>=20=20 >> Hmm, ok, so the subreg operation is unexpected succeeding. >>=20=20 >>> This is the test for stepped vector using 2 element per pattern. For p= oly_uint16 (1,1), it's true it is possible only has 1 element.=20 >>=20=20 >> The stepped case is 3 elements per pattern rather than 2. In a stepped >> pattern: a, b, b+n are represented explicitly, then the rest are >> implicitly b+n*2, b+n*3, etc. >>=20=20 >> The case being handled by this code is instead the 2-element case: >> a, b are represented explicitly, then the rest are implicitly all b. >>=20=20 >> Why is (1,1) different though? The test is constructing: >>=20=20 >> nunits: 1 + 1x >> shape: nelts_per_pattern =3D=3D 2, npatterns =3D=3D 1 >> elements: a, b[, b, b, b, b, ...] >>=20=20 >> It then tests subregs starting at 0 + 1x (so starting at the first b). >> But for (2,2) we should have: >>=20=20 >> nunits: 2 + 2x >> shape: nelts_per_pattern =3D=3D 2, npatterns =3D=3D 2 >> elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] >>=20=20 >> and we should test subregs starting at 0 + 2x (so starting at the >> first b1). The two cases should be very similar, it's just that the >> (2,2) case doubles the number of patterns. >>=20=20 >>> I think it makes sense to fail the test. However for poly (1,1) machine= mode, can we have the chance that some target define this >>> kind of machine mode only used for intrinsics? I already developed ful= l RVV support in GCC (including intrinsc and auto-vectorization). >>> I only enable auto-vectorization with mode larger than (2,2) and test i= t fully. >>> From my experience, it seems the stepped vector only created during VLA= auto-vectorization. And I think only allow poly (1,1)mode used in intrinsi= cs will >>> not create issues. Am I understanding wrong =EF=BC=9FFeel free to corre= ct me. Thanks ~ >>=20=20 >> Following on from what I said above, it doesn't look like this particular >> case is related to stepped vectors. >>=20=20 >> (1,1) shouldn't (need to) be a special case though. Any potentital >> problems that would occur for (1,1) with npatterns=3D=3D1 would also occ= ur >> for (n,n) with npatterns=3D=3Dn. E.g. if stepped vectors are problematic >> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns=3D= =3D2) >> would be problematic for (2,2). >>=20=20 >> So yeah, preventing a mode being used for autovectorisation would allow >> the target to have a bit more control over which constants are actually >> generated. But it shouldn't be necessary to do that for correctness. >>=20=20 >> Thanks, >> Richard >>=20=20 >>> juzhe.zhong@rivai.ai >>>=20=20 >>> From: Richard Sandiford >>> Date: 2022-08-19 17:35 >>> To: juzhe.zhong\@rivai.ai >>> CC: rguenther; gcc-patches; kito.cheng >>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int = (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>> "juzhe.zhong@rivai.ai" writes: >>>> Hi, Richard. I tried the codes: >>>> if (!nunits.is_constant () && maybe_gt (nunits, 1))=20 >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> >>>> It still failed. For nunits =3D (1,1) , maybe_gt (nunits, 1) return tr= ue value. >>>=20=20 >>> Ah, right, sorry for the bogus suggestion. >>>=20=20 >>> In that case, what specifically goes wrong? Which test in >>> test_vector_subregs_modes fails, and what does the ICE look like? >>>=20=20 >>> Thanks, >>> Richard >>>=20=20 >>>> But I tried: >>>> if (!nunits.is_constant () && known_gt (nunits, 1))=20 >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> It pass. But it report a warning: "warning: comparison between signed = and unsigned integer expressions [-Wsign-compare]" during the compilation. >>>> >>>> Finally, I tried: >>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), = 1))=20 >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> It passed with no warning. >>>> >>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for th= is? >>>> Thanks! >>>> >>>> >>>> juzhe.zhong@rivai.ai >>>>=20=20 >>>> From: Richard Sandiford >>>> Date: 2022-08-19 16:03 >>>> To: juzhe.zhong >>>> CC: gcc-patches; rguenther; kito.cheng >>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int= (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>>> juzhe.zhong@rivai.ai writes: >>>>> From: zhongjuzhe >>>>> >>>>> Hello. This patch is preparing for following RVV support. >>>>> >>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic v= ector. >>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invar= iant of ARM SVE is always 128-bit blocks. >>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' su= b-extension and 64bit in 'Zve*' sub-extension. >>>>> >>>>> So I define the machine_mode as follows: >>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>>>> The riscv_vector_chunks =3D poly_uint16 (1, 1) >>>>> >>>>> The compilation is failed for the stepped vector test: >>>>> (const_vector:VNx1DI repeat [ >>>>> (const_int 8 [0x8]) >>>>> (const_int 7 [0x7]) >>>>> ]) >>>>> >>>>> I understand for stepped vector should always have aleast 2 elements = and stepped vector initialization is common >>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense = that report fail for stepped vector of poly_uint16 (1, 1). >>>>> >>>>> machine mode with nunits =3D poly_uint16 (1, 1) needs to implemented = in intrinsics. And I would like to enable RVV auto-vectorization >>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RIS= C-V backend. I think it will not create issue if we define >>>>> vector mode with nunits =3D poly_uint16 (1, 1). Feel free to correct = me or offer me some other better solutions. Thanks! >>>>> >>>>>=20=20=20 >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test = for poly_uint16 (1, 1). >>>>> >>>>> --- >>>>> gcc/simplify-rtx.cc | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>>>> index 7d09bf7103d..61e0dfa00d0 100644 >>>>> --- a/gcc/simplify-rtx.cc >>>>> +++ b/gcc/simplify-rtx.cc >>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inn= er_mode) >>>>> rtx x =3D builder.build (); >>>>>=20=20 >>>>> test_vector_subregs_modes (x); >>>>> - if (!nunits.is_constant ()) >>>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)= )) >>>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>>=20=20 >>>> I think instead we should use maybe_gt (nunits, 1), on the basis that >>>> the fore_back tests require vectors that have a minimum of 2 elements. >>>> Something like poly_uint16 (1, 2) would have the same problem as >>>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >>>> principle.) >>>>=20=20 >>>> This corresponds to the minimum of 3 elements for the stepped tests: >>>>=20=20 >>>> if (GET_MODE_CLASS (mode) =3D=3D MODE_VECTOR_INT >>>> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >>>> { >>>> test_vector_ops_series (mode, scalar_reg); >>>> test_vector_subregs (mode); >>>> } >>>>=20=20 >>>> Thanks, >>>> Richard >>>>=20=20 >>>=20=20 >>=20=20 >=20=20