From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by sourceware.org (Postfix) with ESMTPS id 55A283858414 for ; Wed, 11 Oct 2023 11:28:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 55A283858414 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wm1-x334.google.com with SMTP id 5b1f17b1804b1-4066692ad35so63041615e9.1 for ; Wed, 11 Oct 2023 04:28:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1697023706; x=1697628506; darn=gcc.gnu.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=p9SisC5FHTvV6Mgqhzw6QWid87B/vEf5inWOrzOvMHI=; b=Gtz8V2mieNK8ZZMuPBLxaJB4zu41ljDQJNASCcDH7tCYOxeFrS/6XvGpl/WFxMobRZ +91TrcC92hRmdZGn14dHrkSZj0qiNOaSVWl/6n1FQ0ooqu+ZEv2EY0pKOCs6ZVdWi5Wv t0BnbSAAISyc7Z5b0WQKcaMcaWrKAvqktlAtAGQk/EyuMDqlu9v1bdZsUh98TpGmU9fp jWHe6NjFP61kQqdWMKx9MT2MZVTa/hcuwCsn4j5jeIa6tc5oBaNS9vp2pVNA9I0B0Ysd IvhOFHDceh9K0mc1r4iz2PC+3rfVRiMNxArXNMIhencp+z2ZyoEPcoCq/yCahUdeN5EH IbTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697023706; x=1697628506; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=p9SisC5FHTvV6Mgqhzw6QWid87B/vEf5inWOrzOvMHI=; b=w+AKheMwmLSpNpbrFXN6iB1jXK3QFzbDBwGGXkc6+b15xoTcoVu9KtpUQtBx2NLED9 1wrat858JdFvB6dV2oquz6OGXMx13E0hAV7McmFknQ5OtV7YACO5OIabIVmwbZCAS5NN rkvPsGBOHiFsxLL0IRSKV9jrcbY6yhG97YLoCBQUFNJ551dbD23V5Zo4NeRt9O1ZCfo7 3WrncAiHxBp57GmUbp3+WEZE1vcNX/8+sHU9hGbuSBOz4R6NGs50rrHHRcUSWBUC3nvM sU+77Z4eFZQ2WxGW/UuViRTQsdColGn+izJOjqvHh1VIdbId9lPo7mcTDGay3y78+Thb CybA== X-Gm-Message-State: AOJu0YwS8FW0C4rIElTMSGrsjHuuksKGeZUN3Q8nB4KWbUYBe+JbFk/m aqFqyyI8Rn8R/yQlq8n9mZ9c8MaZIQNVaK89cf/fIORlGBlvD0Qz X-Google-Smtp-Source: AGHT+IHvLkXLXdnTfQjCwIudmh+/ud37pE/5KqsizzLlpHy+SMyM6EmbfPhF+9er+pX0I55iGo444JMrJCGlGcdH8qA= X-Received: by 2002:a7b:c858:0:b0:404:737a:17d with SMTP id c24-20020a7bc858000000b00404737a017dmr17173583wml.9.1697023706008; Wed, 11 Oct 2023 04:28:26 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Wed, 11 Oct 2023 16:57:50 +0530 Message-ID: Subject: Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding To: Prathamesh Kulkarni , gcc Patches , richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-10.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni wrote: > > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford > wrote: > > > > Prathamesh Kulkarni writes: > > > Hi, > > > The attached patch attempts to fix PR111648. > > > As mentioned in PR, the issue is when a1 is a multiple of vector > > > length, we end up creating following encoding in result: { base_elem, > > > arg[0], arg[1], ... } (assuming S = 1), > > > where arg is chosen input vector, which is incorrect, since the > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... } > > > > > > For the test-case mentioned in PR, vectorizer pass creates > > > VEC_PERM_EXPR where: > > > arg0: { -16, -9, -10, -11 } > > > arg1: { -12, -5, -6, -7 } > > > sel = { 3, 4, 5, 6 } > > > > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with > > > following encoding: > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3 > > > = { -11, -12, -5 } > > > > > > So for res[3], it used S = (-5) - (-12) = 7 > > > And hence computed it as -5 + 7 = 2. > > > instead of selecting arg1[2], ie, -6. > > > > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple > > > of vector length, so a1 ... ae select elements only from stepped part > > > of the pattern > > > from input vector and return false for this case. > > > > > > Since the vectors are VLS, fold_vec_perm_cst then sets: > > > res_npatterns = res_nelts > > > res_nelts_per_pattern = 1 > > > which seems to fix the issue by encoding all the elements. > > > > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively, > > > which used a1 = 0, and thus selected arg1[0]. > > > > > > I removed Case 4 because it was already covered in test_nunits_min_4, > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... } > > > and added a new Case 9 to test for this issue. > > > > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE, > > > and on x86_64-linux-gnu. > > > Does the patch look OK ? > > > > > > Thanks, > > > Prathamesh > > > > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. > > > > > > gcc/ChangeLog: > > > PR tree-optimization/111648 > > > * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1 > > > is a multiple of vector length. > > > (test_nunits_min_2): Remove Case 4 and move Case 5 to ... > > > (test_nunits_min_4): ... here and rename case numbers. Also add > > > Case 9. > > > > > > gcc/testsuite/ChangeLog: > > > PR tree-optimization/111648 > > > * gcc.dg/vect/pr111648.c: New test. > > > > > > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > > index 4f8561509ff..c5f421d6b76 100644 > > > --- a/gcc/fold-const.cc > > > +++ b/gcc/fold-const.cc > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > > return false; > > > } > > > > > > - /* Ensure that the stepped sequence always selects from the same > > > - input pattern. */ > > > + /* Ensure that the stepped sequence always selects from the stepped > > > + part of same input pattern. */ > > > unsigned arg_npatterns > > > = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > > > : VECTOR_CST_NPATTERNS (arg1); > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > > *reason = "step is not multiple of npatterns"; > > > return false; > > > } > > > + > > > + /* If a1 is a multiple of len, it will select base element of input > > > + vector resulting in following encoding: > > > + { base_elem, arg[0], arg[1], ... } where arg is the chosen input > > > + vector. This encoding is not originally present in arg, since it's > > > + defined as: > > > + { arg[0], arg[1], arg[2], ... }. */ > > > + > > > + if (multiple_p (a1, arg_len)) > > > + { > > > + if (reason) > > > + *reason = "selecting base element of input vector"; > > > + return false; > > > + } > > > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the > > second argument has 2 stepped patterns. > Ah right, thanks for pointing out. In the attached patch I extended the check > so that r1 < arg_npatterns which should check if we are choosing base > elements from any of the patterns in arg (and not just first). > Does that look OK ? > > > > The equivalent condition that handles multiple patterns would > > probably be to reject q1 < arg_npatterns. But that's only necessary if: > Sorry, I don't understand -- we use q1 only for determining which > vector to choose from, > and r1 will give the index for first element ? > > > > (1) the argument has three elements per pattern (i.e. has a stepped > > sequence) and > > > > (2) element 2 - element 1 != element 1 - element 0 > > > > I think we should check those to avoid pessimising VLA cases. > Thanks for the suggestions. In attached POC patch (stage-1 tested), I > added the above checks, > does it look in the right direction ? Also, should this patch be the > right fix for PR111754 ? Oops sorry, this patch causes build errors on aarch64. Please ignore it. Sorry for the noise. Thanks, Prathamesh > > Thanks, > Prathamesh > > > > Thanks, > > Richard > > > > > } > > > > > > return true; > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) > > > tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; > > > validate_res (2, 2, res, expected_res); > > > } > > > - > > > - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) > > > - Test that the stepped sequence of the pattern selects from > > > - same input pattern. Since input vectors have npatterns = 2, > > > - and step (a2 - a1) = 1, step is not a multiple of npatterns > > > - in input vector. So return NULL_TREE. */ > > > - { > > > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > > - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > - > > > - vec_perm_builder builder (len, 1, 3); > > > - poly_uint64 mask_elems[] = { 0, 0, 1 }; > > > - builder_push_elems (builder, mask_elems); > > > - > > > - vec_perm_indices sel (builder, 2, len); > > > - const char *reason; > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, > > > - &reason); > > > - ASSERT_TRUE (res == NULL_TREE); > > > - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > > > - } > > > - > > > - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) > > > - Test that stepped sequence of the pattern selects from arg0. > > > - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > > > - { > > > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > > - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > - > > > - vec_perm_builder builder (len, 1, 3); > > > - poly_uint64 mask_elems[] = { len, 0, 1 }; > > > - builder_push_elems (builder, mask_elems); > > > - > > > - vec_perm_indices sel (builder, 2, len); > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > > - > > > - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > > > - validate_res (1, 3, res, expected_res); > > > - } > > > } > > > } > > > > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) > > > validate_res (1, 3, res, expected_res); > > > } > > > > > > - /* Case 4: > > > + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) > > > + Test that stepped sequence of the pattern selects from arg0. > > > + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ > > > + { > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > + > > > + vec_perm_builder builder (len, 1, 3); > > > + poly_uint64 mask_elems[] = { len, 1, 2 }; > > > + builder_push_elems (builder, mask_elems); > > > + > > > + vec_perm_indices sel (builder, 2, len); > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > > + > > > + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; > > > + validate_res (1, 3, res, expected_res); > > > + } > > > + > > > + /* Case 5: > > > sel = { len, 0, 2, ... } // (1, 3) > > > This should return NULL because we cross the input vectors. > > > Because, > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) > > > ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); > > > } > > > > > > - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 > > > + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 > > > mask = { 0, len, 1, len + 1, ...} // (2, 2) > > > res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) > > > > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) > > > validate_res (2, 2, res, expected_res); > > > } > > > > > > - /* Case 6: Test combination in sel, where one pattern is dup and other > > > + /* Case 7: Test combination in sel, where one pattern is dup and other > > > is stepped sequence. > > > sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) > > > res = { arg0[0], arg0[0], arg0[0], > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) > > > validate_res (2, 3, res, expected_res); > > > } > > > > > > - /* Case 7: PR111048: Check that we set arg_npatterns correctly, > > > + /* Case 8: PR111048: Check that we set arg_npatterns correctly, > > > when arg0, arg1 and sel have different number of patterns. > > > arg0 is of shape (1, 1) > > > arg1 is of shape (4, 1) > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) > > > ASSERT_TRUE (res == NULL_TREE); > > > ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > > > } > > > + > > > + /* Case 9: PR111648 - a1 is multiple of vector length, > > > + which results in incorrect encoding. Verify that we return > > > + NULL for this case. > > > + sel = { base_elem, len, len+1, ... } // (1, 3) > > > + In this case, the single pattern is: { base_elem len, len+1, ...} > > > + Let's assume that base_elem is used for indexing into arg0, > > > + and a1 ... ae chooses elements from arg1. > > > + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) > > > + Which creates an incorrect encoding with S = arg1[1] - arg1[0] > > > + while the original encoding in arg1 is > > > + arg1: { arg1[0], arg1[1], arg1[2], ... } > > > + with S = arg1[2] - arg1[1]. > > > + > > > + As a concrete example, for above PR: > > > + arg0: { -16, -9, -10, -11 } > > > + arg1: { -12, -5, -6, -7 } > > > + sel = { 3, 4, 5, 6 } > > > + > > > + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > > > + Since a1 = 4 and arg_len = 4, it ended up creating the result with > > > + following encoding: > > > + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) > > > + = { -11, -12, -5 } > > > + > > > + So for res[3], it used S = (-5) - (-12) = 7 > > > + And hence computed it as -5 + 7 = 2. > > > + instead of arg1[2], ie, -6, which is the correct value. > > > + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ > > > + { > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3); > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3); > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > + > > > + vec_perm_builder builder (len, 1, 3); > > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > > > + builder_push_elems (builder, mask_elems); > > > + > > > + vec_perm_indices sel (builder, 2, len); > > > + const char *reason; > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > > > + ASSERT_TRUE (res == NULL_TREE); > > > + ASSERT_TRUE (!strcmp (reason, > > > + "selecting base element of input vector")); > > > + } > > > } > > > } > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > > > new file mode 100644 > > > index 00000000000..093e2b02654 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > > > @@ -0,0 +1,23 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > + > > > +int a; > > > +int *b = &a; > > > +static int **c = &b; > > > +static int d; > > > +short e; > > > +short f; > > > + > > > +_Bool foo () > > > +{ > > > + f = -21; > > > + for (; f < -5; f++) { > > > + e = f ^ 3; > > > + d = *b; > > > + **c = e; > > > + } > > > + > > > + return d == -6; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */