From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by sourceware.org (Postfix) with ESMTPS id DEE103858D32 for ; Thu, 10 Aug 2023 14:33:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DEE103858D32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wr1-x42a.google.com with SMTP id ffacd0b85a97d-318015ade49so985708f8f.0 for ; Thu, 10 Aug 2023 07:33:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1691678036; x=1692282836; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=X9JG3psjphMyRjn1g3Pfl4QyJ8GJh5nRixXikQnI0/w=; b=wIp9vL9GcbQoNRsfI3o7BusrW3lroUr5J0u1AQ3D9XPk5E1O0BPnNMVwo1WahPbsP+ XP4Gwb5FUbYgbuApwlu5yeMeN/igR7u96f7KlLrGlzThZ+otkSQJNVBTtw7HQKH6n8zE nHAfDrnX/4dw/O6xJW94mPulGyxU9+J/2TjYgh9X23NHsmxazhv/LWFjT3s3KzF7OUWT IHqFzyW0xqNdmxd6pcV0Ra5dlHqNmycClg1ioqV/YOMjAW+KL8KLMkY4tigOCdzUt9Fg NHkEL7KeeDYquiW7KQlRSxGehd+MYNBo+yBZ/AAWRyZVf5dTJRKP6wQSm0Atm8kgZd5d NMKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691678036; x=1692282836; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=X9JG3psjphMyRjn1g3Pfl4QyJ8GJh5nRixXikQnI0/w=; b=hNlSWAZYVASERPOjNOluKjXd83jGt+Yst8Re1R5Ysfcy9G6KrdA8ZI9dEHgPHnhnyF YlFrU7VQErrwd58WIiOZR+5cSKFL3yjAtzFq+tJYGYEsdUpHdegmgm1LBi5AKKDZ1JQT Sctdvx8O7YAZw5uWMOqLTql4SOPOIOQJZMgl4xur2Fc9XBvvdSqC0uagSyok+CeAUvET QhR71mm8ufVjGgxCJNzTDuNSnWjqzesAJpljkrDGlGJ2t4LZ0VekRDB4u6H0KHsY89iW neij1HApvHrQgK2IHpvD+28UxVlgrK3qLQBp/YnaOsRDFeIPuuaR64OJmdOYB3ZDQ4aX YZYQ== X-Gm-Message-State: AOJu0Yw+GKZdysMnZNM516tSxSj9ak7eGGqNhvSproKfEEhH7/OjWP6G aGxjB1SNuChAuDDfc291sDQSD6X6T2jl8pg44vbPEA== X-Google-Smtp-Source: AGHT+IFzyv8JW98AnYwtP89gy2B1ujoQ90+rsS6PAdrqYHQOtslQFbVF0kdzv3NksRgCIv340MgswEGvZYI3LBhjIQI= X-Received: by 2002:a05:6000:1372:b0:314:3bd7:6a0c with SMTP id q18-20020a056000137200b003143bd76a0cmr2459562wrz.33.1691678036345; Thu, 10 Aug 2023 07:33:56 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Thu, 10 Aug 2023 20:03:19 +0530 Message-ID: Subject: Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors To: Prathamesh Kulkarni , gcc Patches , richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 8 Aug 2023 at 15:27, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Fri, 4 Aug 2023 at 20:36, Richard Sandiford > > wrote: > >> > >> Full review this time, sorry for the skipping the tests earlier. > > Thanks for the detailed review! Please find my responses inline below. > >> > >> Prathamesh Kulkarni writes: > >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > >> > index 7e5494dfd39..680d0e54fd4 100644 > >> > --- a/gcc/fold-const.cc > >> > +++ b/gcc/fold-const.cc > >> > @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3. If not see > >> > #include "vec-perm-indices.h" > >> > #include "asan.h" > >> > #include "gimple-range.h" > >> > +#include > >> > >> This should be included by defining INCLUDE_ALGORITHM instead. > > Done. Just curious, why do we use this macro instead of directly > > including ? > > AIUI, one of the reasons for having every file start with includes > of config.h and (b)system.h, in that order, is to ensure that a small > and predictable amount of GCC-specific stuff happens before including > the system header files. That helps to avoid OS-specific clashes between > GCC code and system headers. > > But another major reason is that system.h ends by poisoning a lot of > stuff that system headers would be entitled to use. Ah OK, thanks for the clarification! > > >> > + tree_vector_builder builder (vectype, npatterns, nelts_per_pattern); > >> > + > >> > + // Fill a0 for each pattern > >> > + for (unsigned i = 0; i < npatterns; i++) > >> > + builder.quick_push (build_int_cst (inner_type, rand () % 100)); > >> > + > >> > + if (nelts_per_pattern == 1) > >> > + return builder.build (); > >> > + > >> > + // Fill a1 for each pattern > >> > + for (unsigned i = 0; i < npatterns; i++) > >> > + builder.quick_push (build_int_cst (inner_type, rand () % 100)); > >> > + > >> > + if (nelts_per_pattern == 2) > >> > + return builder.build (); > >> > + > >> > + for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) > >> > + { > >> > + tree prev_elem = builder[i - npatterns]; > >> > + int prev_elem_val = TREE_INT_CST_LOW (prev_elem); > >> > + int val = prev_elem_val + S; > >> > + builder.quick_push (build_int_cst (inner_type, val)); > >> > + } > >> > + > >> > + return builder.build (); > >> > +} > >> > + > >> > +static void > >> > +validate_res (unsigned npatterns, unsigned nelts_per_pattern, > >> > + tree res, tree *expected_res) > >> > +{ > >> > + ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) == npatterns); > >> > + ASSERT_TRUE (VECTOR_CST_NELTS_PER_PATTERN (res) == nelts_per_pattern); > >> > >> I don't think this is safe when the inputs are randomised. E.g. we > >> could by chance end up with a vector of all zeros, which would have > >> a single pattern and a single element per pattern, regardless of the > >> shapes of the inputs. > >> > >> Given the way that vector_builder::finalize > >> canonicalises the encoding, it should be safe to use: > >> > >> * VECTOR_CST_NPATTERNS (res) <= npatterns > >> * vector_cst_encoded_nelts (res) <= npatterns * nelts_per_pattern > >> > >> If we do that then... > >> > >> > + > >> > + for (unsigned i = 0; i < vector_cst_encoded_nelts (res); i++) > >> > >> ...this loop bound should be npatterns * nelts_per_pattern instead. > > Ah indeed. Fixed, thanks. > > The patch instead does: > > ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) <= npatterns); > ASSERT_TRUE (VECTOR_CST_NELTS_PER_PATTERN (res) <= nelts_per_pattern); > > I think the version I suggested is safer. It's not the goal of the > canonicalisation algorithm to reduce both npattners and nelts_per_pattern > individually. The algorithm can increase nelts_per_pattern in order > to decrease npatterns. Oops, sorry I misread, will fix in the next patch. > > >> > + { > >> > + tree arg0 = build_vec_cst_rand (integer_type_node, 1, 3, 2); > >> > + tree arg1 = build_vec_cst_rand (integer_type_node, 1, 3, 2); > >> > + poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > + > >> > + vec_perm_builder builder (arg0_len, 1, 3); > >> > + builder.quick_push (arg0_len); > >> > + builder.quick_push (arg0_len + 1); > >> > + builder.quick_push (arg0_len + 2); > >> > + > >> > + vec_perm_indices sel (builder, 2, arg0_len); > >> > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, NULL, true); > >> > + tree expected_res[] = { vector_cst_elt (arg1, 0), vector_cst_elt (arg1, 1), > >> > + vector_cst_elt (arg1, 2) }; > >> > + validate_res (1, 3, res, expected_res); > >> > + } > >> > + > >> > + /* Case 3: Leading element of arg1, stepped sequence: pattern 0 of arg0. > >> > + sel = {len, 0, 0, 0, 2, 0, ...} > >> > + npatterns = 2, nelts_per_pattern = 3. > >> > + Use extra pattern {0, ...} to lower number of elements per pattern. */ > >> > + { > >> > + tree arg0 = build_vec_cst_rand (char_type_node, 1, 3, 2); > >> > + tree arg1 = build_vec_cst_rand (char_type_node, 1, 3, 2); > >> > + poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > + > >> > + vec_perm_builder builder (arg0_len, 2, 3); > >> > + builder.quick_push (arg0_len); > >> > + int mask_elems[] = { 0, 0, 0, 2, 0 }; > >> > + for (int i = 0; i < 5; i++) > >> > + builder.quick_push (mask_elems[i]); > >> > >> This leaves one of the elements unspecified. > > Sorry, I didn't understand. > > It first pushes len in: > > builder.quick_push (arg0_len) > > and then pushes the remaining indices in the loop: > > for (int i = 0; i < 5; i++) > > builder.quick_push (mask_elems[i]) > > So overall, builder will have 6 elements: {len, 0, 0, 0, 2, 0} > > Ah, right. But in that case I think it would be clearer to put arg0_len > in mask_elems. Right of course, sorry. For some reason I thought earlier I couldn't initialize array of poly_uint64 :/ > > > +/* Try to fold permutation of ARG0 and ARG1 with SEL selector when > > + the input vectors are VECTOR_CST. Return NULL_TREE otherwise. > > + REASON has same purpose as described in > > + valid_mask_for_fold_vec_perm_cst_p. */ > > + > > + > > Nit: too much vertical space. Will fix in next patch, thanks. > > > +/* Invoke tests for fold_vec_perm_cst. */ > > + > > +static void > > +test () > > +{ > > + /* Conditionally execute fold_vec_perm_cst tests, if target supports > > + VLA vectors. Use a compile time check so we avoid instantiating > > + poly_uint64 with N > 1 on targets that do not support VLA vectors. */ > > + if constexpr (poly_int_traits::num_coeffs > 1) > > That's a C++17 feature, so we can't use it. > > Instead, how about: > > static bool > is_simple_vla_size (poly_uint64 size) > { > if (size.is_constant ()) > return false; > for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i) > if (size[i] != (i <= 1 ? size[0] : 0)) Just wondering is this should be (i == 1 ? size[0] : 0) since i is initialized to 1 ? IIUC, is_simple_vla_size should return true for polynomials of first degree and having same coeff like 4 + 4x ? > return false; > return true; > } > > > FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT) > { > auto nunits = GET_MODE_NUNITS (mode); > if (!is_simple_vla_size (nunits)) > continue; > if (nunits[0] ...) > test_... (mode); > ... > > } > > test_vnx4si_v4si and test_v4si_vnx4si look good. But with the > loop structure above, I think we can apply the test_vnx4si and > test_vnx16qi to more cases. So the classification isn't the > exact number of elements, but instead a limit. > > I think the nunits[0] conditions for test_vnx4si are as follows > (inspection only, so could be wrong): > > > +/* Test cases where result and input vectors are VNx4SI */ > > + > > +static void > > +test_vnx4si (machine_mode vmode) > > +{ > > + /* Case 1: mask = {0, ...} */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 1); > > + builder.quick_push (0); > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { vector_cst_elt (res, 0) }; This should be { vector_cst_elt (arg0, 0) }; will fix in next patch. > > + validate_res (1, 1, res, expected_res); > > + } > > nunits[0] >= 2 (could be all nunits if the inputs had nelts_per_pattern==1, > which I think would be better) IIUC, the vectors that can be used for a particular test should have nunits[0] >= res_npatterns, where res_npatterns is as computed in fold_vec_perm_cst without the canonicalization ? For above test -- res_npatterns = max(2, max (2, 1)) == 2, so we require nunits[0] >= 2 ? Which implies we can use above test for vectors with length 2 + 2x, 4 + 4x, etc. Sorry if this sounds like a silly question -- Won't nunits[0] >= 2 cover all nunits, since a vector, at a minimum, will contain 2 elements ? I will send a patch shortly addressing above suggestions. Thanks, Prathamesh > > > + > > + /* Case 2: mask = {len, ...} */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 1); > > + builder.quick_push (len); > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { vector_cst_elt (arg1, 0) }; > > + validate_res (1, 1, res, expected_res); > > + } > > same > > > + > > + /* Case 3: mask = { 0, len, ... } */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 2, 1); > > + builder.quick_push (0); > > + builder.quick_push (len); > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg1, 0) }; > > + validate_res (2, 1, res, expected_res); > > + } > > nunits[0] >= 2 > > > + > > + /* Case 4: mask = { 0, len, 1, len+1, ... } */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 2, 2); > > + builder.quick_push (0); > > + builder.quick_push (len); > > + builder.quick_push (1); > > + builder.quick_push (len + 1); > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg1, 0), > > + vector_cst_elt (arg0, 1), vector_cst_elt (arg1, 1) > > + }; > > + validate_res (2, 2, res, expected_res); > > + } > > nunits[0] >= 2 > > > + > > + /* Case 5: mask = { 0, len, 1, len+1, .... } > > + npatterns = 4, nelts_per_pattern = 1 */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 4, 1); > > + builder.quick_push (0); > > + builder.quick_push (len); > > + builder.quick_push (1); > > + builder.quick_push (len + 1); > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg1, 0), > > + vector_cst_elt (arg0, 1), vector_cst_elt (arg1, 1) > > + }; > > + validate_res (4, 1, res, expected_res); > > + } > > nunits[0] >= 4 > > > + > > + /* Case 6: mask = {0, 4, ...} > > + npatterns = 1, nelts_per_pattern = 2. > > + This should return NULL_TREE because the index 4 may choose > > + from either arg0 or arg1 depending on vector length. */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 2); > > + builder.quick_push (0); > > + builder.quick_push (4); > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + ASSERT_TRUE (res == NULL_TREE); > > + } > > nunits[0] == 2 or == 4 (could be <= 4 if the inputs had nelts_per_pattern==1) > > > + > > + /* Case 7: npatterns(arg0) = 4 > npatterns(sel) = 2 > > + mask = {0, len, 1, len + 1, ...} > > + sel_npatterns = 2, sel_nelts_per_pattern = 2. */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > + poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (arg0_len, 2, 2); > > + builder.quick_push (0); > > + builder.quick_push (arg0_len); > > + builder.quick_push (1); > > + builder.quick_push (arg0_len + 1); > > + vec_perm_indices sel (builder, 2, arg0_len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg1, 0), > > + vector_cst_elt (arg0, 1), vector_cst_elt (arg1, 1) > > + }; > > + validate_res (2, 2, res, expected_res); > > + } > > nunits[0] >= 2 > > > > + > > + /* Case 8: sel = {0, 1, 2, ...} > > + npatterns = 1, nelts_per_pattern = 3 > > + expected res: { arg0[0], arg0[1], arg0[2], ... } */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2); > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2); > > + poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (arg0_len, 1, 3); > > + builder.quick_push (0); > > + builder.quick_push (1); > > + builder.quick_push (2); > > + > > + vec_perm_indices sel (builder, 2, arg0_len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + tree expected_res[] = { vector_cst_elt (arg0, 0), vector_cst_elt (arg0, 1), > > + vector_cst_elt (arg0, 2) }; > > + validate_res (1, 3, res, expected_res); > > + } > > all nunits[0] > > > + > > + /* Case 9: sel = {len, len + 1, len + 2, ... } > > + npatterns = 1, nelts_per_pattern = 3 > > + expected res: { op1[0], op1[1], op1[2], ... } */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2); > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2); > > + poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (arg0_len, 1, 3); > > + builder.quick_push (arg0_len); > > + builder.quick_push (arg0_len + 1); > > + builder.quick_push (arg0_len + 2); > > + > > + vec_perm_indices sel (builder, 2, arg0_len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, NULL); > > + tree expected_res[] = { vector_cst_elt (arg1, 0), vector_cst_elt (arg1, 1), > > + vector_cst_elt (arg1, 2) }; > > + validate_res (1, 3, res, expected_res); > > + } > > all nunits[0] > > Same idea for the others. > > Thanks, > Richard