From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni
Cc: gcc Patches
Subject: Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors
Date: Tue, 08 Aug 2023 10:57:03 +0100

Prathamesh Kulkarni writes:
> On Fri, 4 Aug 2023 at 20:36, Richard Sandiford wrote:
>>
>> Full review this time, sorry for skipping the tests earlier.
> Thanks for the detailed review! Please find my responses inline below.
>>
>> Prathamesh Kulkarni writes:
>> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>> > index 7e5494dfd39..680d0e54fd4 100644
>> > --- a/gcc/fold-const.cc
>> > +++ b/gcc/fold-const.cc
>> > @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.  If not see
>> >  #include "vec-perm-indices.h"
>> >  #include "asan.h"
>> >  #include "gimple-range.h"
>> > +#include <algorithm>
>>
>> This should be included by defining INCLUDE_ALGORITHM instead.
> Done. Just curious, why do we use this macro instead of directly
> including <algorithm>?

AIUI, one of the reasons for having every file start with includes of
config.h and (b)system.h, in that order, is to ensure that a small and
predictable amount of GCC-specific stuff happens before including the
system header files.  That helps to avoid OS-specific clashes between
GCC code and system headers.

But another major reason is that system.h ends by poisoning a lot of
stuff that system headers would be entitled to use.
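
The convention looks roughly like this (a minimal, illustrative sketch;
real files have longer include lists):

  /* At the top of a GCC source file that wants std::sort etc.,
     before any other include.  */
  #define INCLUDE_ALGORITHM
  #include "config.h"
  #include "system.h"   /* Includes <algorithm> because of the macro,
			   then poisons identifiers GCC code must not use.  */
  #include "coretypes.h"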
>> > +  tree_vector_builder builder (vectype, npatterns, nelts_per_pattern);
>> > +
>> > +  // Fill a0 for each pattern
>> > +  for (unsigned i = 0; i < npatterns; i++)
>> > +    builder.quick_push (build_int_cst (inner_type, rand () % 100));
>> > +
>> > +  if (nelts_per_pattern == 1)
>> > +    return builder.build ();
>> > +
>> > +  // Fill a1 for each pattern
>> > +  for (unsigned i = 0; i < npatterns; i++)
>> > +    builder.quick_push (build_int_cst (inner_type, rand () % 100));
>> > +
>> > +  if (nelts_per_pattern == 2)
>> > +    return builder.build ();
>> > +
>> > +  for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
>> > +    {
>> > +      tree prev_elem = builder[i - npatterns];
>> > +      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
>> > +      int val = prev_elem_val + S;
>> > +      builder.quick_push (build_int_cst (inner_type, val));
>> > +    }
>> > +
>> > +  return builder.build ();
>> > +}
>> > +
>> > +static void
>> > +validate_res (unsigned npatterns, unsigned nelts_per_pattern,
>> > +              tree res, tree *expected_res)
>> > +{
>> > +  ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) == npatterns);
>> > +  ASSERT_TRUE (VECTOR_CST_NELTS_PER_PATTERN (res) == nelts_per_pattern);
>>
>> I don't think this is safe when the inputs are randomised.  E.g. we
>> could by chance end up with a vector of all zeros, which would have
>> a single pattern and a single element per pattern, regardless of the
>> shapes of the inputs.
>>
>> Given the way that vector_builder::finalize
>> canonicalises the encoding, it should be safe to use:
>>
>> * VECTOR_CST_NPATTERNS (res) <= npatterns
>> * vector_cst_encoded_nelts (res) <= npatterns * nelts_per_pattern
>>
>> If we do that then...
>>
>> > +
>> > +  for (unsigned i = 0; i < vector_cst_encoded_nelts (res); i++)
>>
>> ...this loop bound should be npatterns * nelts_per_pattern instead.
> Ah indeed. Fixed, thanks.

The patch instead does:

  ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) <= npatterns);
  ASSERT_TRUE (VECTOR_CST_NELTS_PER_PATTERN (res) <= nelts_per_pattern);

I think the version I suggested is safer.  It's not the goal of the
canonicalisation algorithm to reduce both npatterns and
nelts_per_pattern individually.  The algorithm can increase
nelts_per_pattern in order to decrease npatterns.
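
In other words, roughly the following (just a sketch; the element
comparison via operand_equal_p is an assumption about the rest of
validate_res, not something fixed by the suggestion itself):

  ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) <= npatterns);
  ASSERT_TRUE (vector_cst_encoded_nelts (res)
	       <= npatterns * nelts_per_pattern);
  /* Compare the encoded elements against the expected values.  */
  for (unsigned i = 0; i < npatterns * nelts_per_pattern; i++)
    ASSERT_TRUE (operand_equal_p (vector_cst_elt (res, i),
				  expected_res[i], 0));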
>> > +  {
>> > +    tree arg0 = build_vec_cst_rand (integer_type_node, 1, 3, 2);
>> > +    tree arg1 = build_vec_cst_rand (integer_type_node, 1, 3, 2);
>> > +    poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > +
>> > +    vec_perm_builder builder (arg0_len, 1, 3);
>> > +    builder.quick_push (arg0_len);
>> > +    builder.quick_push (arg0_len + 1);
>> > +    builder.quick_push (arg0_len + 2);
>> > +
>> > +    vec_perm_indices sel (builder, 2, arg0_len);
>> > +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, NULL, true);
>> > +    tree expected_res[] = { vector_cst_elt (arg1, 0), vector_cst_elt (arg1, 1),
>> > +			    vector_cst_elt (arg1, 2) };
>> > +    validate_res (1, 3, res, expected_res);
>> > +  }
>> > +
>> > +  /* Case 3: Leading element of arg1, stepped sequence: pattern 0 of arg0.
>> > +     sel = {len, 0, 0, 0, 2, 0, ...}
>> > +     npatterns = 2, nelts_per_pattern = 3.
>> > +     Use extra pattern {0, ...} to lower number of elements per pattern.  */
>> > +  {
>> > +    tree arg0 = build_vec_cst_rand (char_type_node, 1, 3, 2);
>> > +    tree arg1 = build_vec_cst_rand (char_type_node, 1, 3, 2);
>> > +    poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > +
>> > +    vec_perm_builder builder (arg0_len, 2, 3);
>> > +    builder.quick_push (arg0_len);
>> > +    int mask_elems[] = { 0, 0, 0, 2, 0 };
>> > +    for (int i = 0; i < 5; i++)
>> > +      builder.quick_push (mask_elems[i]);
>>
>> This leaves one of the elements unspecified.
> Sorry, I didn't understand.
> It first pushes len in:
>   builder.quick_push (arg0_len)
> and then pushes the remaining indices in the loop:
>   for (int i = 0; i < 5; i++)
>     builder.quick_push (mask_elems[i])
> So overall, builder will have 6 elements: {len, 0, 0, 0, 2, 0}

Ah, right.  But in that case I think it would be clearer to put
arg0_len in mask_elems.

> +/* Try to fold permutation of ARG0 and ARG1 with SEL selector when
> +   the input vectors are VECTOR_CST.  Return NULL_TREE otherwise.
> +   REASON has same purpose as described in
> +   valid_mask_for_fold_vec_perm_cst_p.  */
> +
> +

Nit: too much vertical space.

> +/* Invoke tests for fold_vec_perm_cst.  */
> +
> +static void
> +test ()
> +{
> +  /* Conditionally execute fold_vec_perm_cst tests, if target supports
> +     VLA vectors.  Use a compile time check so we avoid instantiating
> +     poly_uint64 with N > 1 on targets that do not support VLA vectors.  */
> +  if constexpr (poly_int_traits<poly_uint64>::num_coeffs > 1)

That's a C++17 feature, so we can't use it.  Instead, how about:

  static bool
  is_simple_vla_size (poly_uint64 size)
  {
    if (size.is_constant ())
      return false;
    for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
      if (size[i] != (i <= 1 ? size[0] : 0))
	return false;
    return true;
  }

  FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
    {
      auto nunits = GET_MODE_NUNITS (mode);
      if (!is_simple_vla_size (nunits))
	continue;
      if (nunits[0] ...)
	test_... (mode);
      ...
    }

test_vnx4si_v4si and test_v4si_vnx4si look good.  But with the loop
structure above, I think we can apply the test_vnx4si and test_vnx16qi
to more cases.  So the classification isn't the exact number of
elements, but instead a limit (see the rough sketch at the end of this
mail).

I think the nunits[0] conditions for test_vnx4si are as follows
(inspection only, so could be wrong):

> +/* Test cases where result and input vectors are VNx4SI  */
> +
> +static void
> +test_vnx4si (machine_mode vmode)
> +{
> +  /* Case 1: mask = {0, ...} */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 1, 1);
> +    builder.quick_push (0);
> +    vec_perm_indices sel (builder, 2, len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +    tree expected_res[] = { vector_cst_elt (res, 0) };
> +    validate_res (1, 1, res, expected_res);
> +  }

nunits[0] >= 2 (could be all nunits if the inputs had
nelts_per_pattern==1, which I think would be better)

> +
> +  /* Case 2: mask = {len, ...} */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 1, 1);
> +    builder.quick_push (len);
> +    vec_perm_indices sel (builder, 2, len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +    tree expected_res[] = { vector_cst_elt (arg1, 0) };
> +    validate_res (1, 1, res, expected_res);
> +  }

same

> +
> +  /* Case 3: mask = { 0, len, ... } */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 2, 1);
> +    builder.quick_push (0);
> +    builder.quick_push (len);
> +    vec_perm_indices sel (builder, 2, len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +    tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg1, 0) };
> +    validate_res (2, 1, res, expected_res);
> +  }

nunits[0] >= 2

> +
> +  /* Case 4: mask = { 0, len, 1, len+1, ... } */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 2, 2);
> +    builder.quick_push (0);
> +    builder.quick_push (len);
> +    builder.quick_push (1);
> +    builder.quick_push (len + 1);
> +    vec_perm_indices sel (builder, 2, len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +    tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg1, 0),
> +			    vector_cst_elt (arg0, 1), vector_cst_elt (arg1, 1)
> +			  };
> +    validate_res (2, 2, res, expected_res);
> +  }

nunits[0] >= 2

> +
> +  /* Case 5: mask = { 0, len, 1, len+1, .... }
> +     npatterns = 4, nelts_per_pattern = 1  */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 4, 1);
> +    builder.quick_push (0);
> +    builder.quick_push (len);
> +    builder.quick_push (1);
> +    builder.quick_push (len + 1);
> +    vec_perm_indices sel (builder, 2, len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +    tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg1, 0),
> +			    vector_cst_elt (arg0, 1), vector_cst_elt (arg1, 1)
> +			  };
> +    validate_res (4, 1, res, expected_res);
> +  }

nunits[0] >= 4

> +
> +  /* Case 6: mask = {0, 4, ...}
> +     npatterns = 1, nelts_per_pattern = 2.
> +     This should return NULL_TREE because the index 4 may choose
> +     from either arg0 or arg1 depending on vector length.  */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 1, 2);
> +    builder.quick_push (0);
> +    builder.quick_push (4);
> +    vec_perm_indices sel (builder, 2, len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +    ASSERT_TRUE (res == NULL_TREE);
> +  }

nunits[0] == 2 or == 4 (could be <= 4 if the inputs had
nelts_per_pattern==1)

> +
> +  /* Case 7: npatterns(arg0) = 4 > npatterns(sel) = 2
> +     mask = {0, len, 1, len + 1, ...}
> +     sel_npatterns = 2, sel_nelts_per_pattern = 2.  */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +    poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (arg0_len, 2, 2);
> +    builder.quick_push (0);
> +    builder.quick_push (arg0_len);
> +    builder.quick_push (1);
> +    builder.quick_push (arg0_len + 1);
> +    vec_perm_indices sel (builder, 2, arg0_len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +    tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg1, 0),
> +			    vector_cst_elt (arg0, 1), vector_cst_elt (arg1, 1)
> +			  };
> +    validate_res (2, 2, res, expected_res);
> +  }

nunits[0] >= 2

> +
> +  /* Case 8: sel = {0, 1, 2, ...}
> +     npatterns = 1, nelts_per_pattern = 3
> +     expected res: { arg0[0], arg0[1], arg0[2], ... } */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +    tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +    poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (arg0_len, 1, 3);
> +    builder.quick_push (0);
> +    builder.quick_push (1);
> +    builder.quick_push (2);
> +
> +    vec_perm_indices sel (builder, 2, arg0_len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +    tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg0, 1),
> +			    vector_cst_elt (arg0, 2) };
> +    validate_res (1, 3, res, expected_res);
> +  }

all nunits[0]

> +
> +  /* Case 9: sel = {len, len + 1, len + 2, ... }
> +     npatterns = 1, nelts_per_pattern = 3
> +     expected res: { op1[0], op1[1], op1[2], ... } */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +    tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +    poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (arg0_len, 1, 3);
> +    builder.quick_push (arg0_len);
> +    builder.quick_push (arg0_len + 1);
> +    builder.quick_push (arg0_len + 2);
> +
> +    vec_perm_indices sel (builder, 2, arg0_len);
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, NULL);
> +    tree expected_res[] = { vector_cst_elt (arg1, 0), vector_cst_elt (arg1, 1),
> +			    vector_cst_elt (arg1, 2) };
> +    validate_res (1, 3, res, expected_res);
> +  }

all nunits[0]

Same idea for the others.

Thanks,
Richard
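
Rough sketch of the dispatch-by-limit idea mentioned above, with the
per-case nunits[0] notes folded in.  is_simple_vla_size is as suggested
earlier; the test_nunits_min_* / test_all_nunits helpers are hypothetical
groupings of the cases, so treat this as illustrative only:

  FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
    {
      auto nunits = GET_MODE_NUNITS (mode);
      if (!is_simple_vla_size (nunits))
	continue;
      /* Hypothetical helpers, grouped by the smallest nunits[0]
	 that the cases in each group require, rather than by an
	 exact element count.  */
      if (nunits[0] >= 4)
	test_nunits_min_4 (mode);
      if (nunits[0] >= 2)
	test_nunits_min_2 (mode);
      test_all_nunits (mode);
    }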