From: Richard Biener
Date: Mon, 18 Nov 2019 11:05:00 -0000
Subject: Re: [8/8] Optimise WAR and WAW alias checks
To: GCC Patches, Richard Sandiford
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
X-SW-Source: 2019-11/txt/msg01686.txt.bz2

On Mon, Nov 11, 2019 at 7:52 PM Richard Sandiford wrote:
>
> For:
>
>   void
>   f1 (int *x, int *y)
>   {
>     for (int i = 0; i < 32; ++i)
>       x[i] += y[i];
>   }
>
> we checked at runtime whether one vector at x would overlap one vector
> at y. But in cases like this, the vector code would handle x <= y just
> fine, since any write to address A still happens after any read from
> address A. The only problem is if x is ahead of y by less than a
> vector.
>
> The same is true for two writes:
>
>   void
>   f2 (int *x, int *y)
>   {
>     for (int i = 0; i < 32; ++i)
>       {
>         x[i] = i;
>         y[i] = 2;
>       }
>   }
>
> if y <= x then a vector write at y after a vector write at x would
> have the same net effect as the original scalar writes.
>
> This patch optimises the alias checks for these two cases. E.g.,
> before the patch, f1 used:
>
>         add     x2, x0, 15
>         sub     x2, x2, x1
>         cmp     x2, 30
>         bls     .L2
>
> whereas after the patch it uses:
>
>         add     x2, x1, 4
>         sub     x2, x0, x2
>         cmp     x2, 8
>         bls     .L2
>
> Read-after-write cases like:
>
>   int
>   f3 (int *x, int *y)
>   {
>     int res = 0;
>     for (int i = 0; i < 32; ++i)
>       {
>         x[i] = i;
>         res += y[i];
>       }
>     return res;
>   }
>
> can cope with x == y, but otherwise don't allow overlap in either
> direction. Since checking for x == y at runtime would require extra
> code, we're probably better off sticking with the current overlap test.
>
> An overlap test is also needed if the scalar or vector accesses covered
> by the alias check are mixed together, rather than all statements for
> the second access following all statements for the first access.
>
> The new code for gcc.target/aarch64/sve/var_stride_[135].c is slightly
> better than before.

OK.

Thanks,
Richard.

>
> 2019-11-11  Richard Sandiford
>
> gcc/
>       * tree-data-ref.c (create_intersect_range_checks_index): If the
>       alias pair describes simple WAW and WAR dependencies, just check
>       whether the first B access overlaps later A accesses.
>       (create_waw_or_war_checks): New function that performs the same
>       optimization on addresses.
>       (create_intersect_range_checks): Call it.
>
> gcc/testsuite/
>       * gcc.dg/vect/vect-alias-check-8.c: Expect WAR/WAW checks to be used.
>       * gcc.dg/vect/vect-alias-check-14.c: Likewise.
>       * gcc.dg/vect/vect-alias-check-15.c: Likewise.
>       * gcc.dg/vect/vect-alias-check-18.c: Likewise.
>       * gcc.dg/vect/vect-alias-check-19.c: Likewise.
>       * gcc.target/aarch64/sve/var_stride_1.c: Update expected sequence.
>       * gcc.target/aarch64/sve/var_stride_2.c: Likewise.
>       * gcc.target/aarch64/sve/var_stride_3.c: Likewise.
>       * gcc.target/aarch64/sve/var_stride_5.c: Likewise.
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/tree-data-ref.c 2019-11-11 18:32:13.186616541 +0000
> @@ -1806,6 +1806,8 @@ create_intersect_range_checks_index (cla
>                                       abs_step, &niter_access2))
>      return false;
>
> +  bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) == 0;
> +
>    unsigned int i;
>    for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
>      {
> @@ -1907,16 +1909,57 @@ create_intersect_range_checks_index (cla
>
>           Combining the tests requires limit to be computable in an unsigned
>           form of the index type; if it isn't, we fall back to the usual
> -         pointer-based checks.  */
> -      poly_offset_int limit = (idx_len1 + idx_access1 - 1
> -                               + idx_len2 + idx_access2 - 1);
> +         pointer-based checks.
> +
> +         We can do better if DR_B is a write and if DR_A and DR_B are
> +         well-ordered in both the original and the new code (see the
> +         comment above the DR_ALIAS_* flags for details).  In this case
> +         we know that for each i in [0, n-1], the write performed by
> +         access i of DR_B occurs after access numbers j<=i of DR_A in
> +         both the original and the new code.  Any write or anti
> +         dependencies wrt those DR_A accesses are therefore maintained.
> +
> +         We just need to make sure that each individual write in DR_B does not
> +         overlap any higher-indexed access in DR_A; such DR_A accesses happen
> +         after the DR_B access in the original code but happen before it in
> +         the new code.
> +
> +         We know the steps for both accesses are equal, so by induction, we
> +         just need to test whether the first write of DR_B overlaps a later
> +         access of DR_A.  In other words, we need to move min1 along by
> +         one iteration:
> +
> +           min1' = min1 + idx_step
> +
> +         and use the ranges:
> +
> +           [min1' + low_offset1', min1' + high_offset1' + idx_access1 - 1]
> +
> +         and:
> +
> +           [min2, min2 + idx_access2 - 1]
> +
> +         where:
> +
> +            low_offset1' = +ve step ? 0 : -(idx_len1 - |idx_step|)
> +            high_offset1' = +ve_step ? idx_len1 - |idx_step| : 0.  */
> +      if (waw_or_war_p)
> +        idx_len1 -= abs_idx_step;
> +
> +      poly_offset_int limit = idx_len1 + idx_access1 - 1 + idx_access2 - 1;
> +      if (!waw_or_war_p)
> +        limit += idx_len2;
> +
>        tree utype = unsigned_type_for (TREE_TYPE (min1));
>        if (!wi::fits_to_tree_p (limit, utype))
>          return false;
>
>        poly_offset_int low_offset1 = neg_step ? -idx_len1 : 0;
> -      poly_offset_int high_offset2 = neg_step ? 0 : idx_len2;
> +      poly_offset_int high_offset2 = neg_step || waw_or_war_p ? 0 : idx_len2;
>        poly_offset_int bias = high_offset2 + idx_access2 - 1 - low_offset1;
> +      /* Equivalent to adding IDX_STEP to MIN1.  */
> +      if (waw_or_war_p)
> +        bias -= wi::to_offset (idx_step);
>
>        tree subject = fold_build2 (MINUS_EXPR, utype,
>                                    fold_convert (utype, min2),
> @@ -1932,7 +1975,169 @@ create_intersect_range_checks_index (cla
>          *cond_expr = part_cond_expr;
>      }
>    if (dump_enabled_p ())
> -    dump_printf (MSG_NOTE, "using an index-based overlap test\n");
> +    {
> +      if (waw_or_war_p)
> +        dump_printf (MSG_NOTE, "using an index-based WAR/WAW test\n");
> +      else
> +        dump_printf (MSG_NOTE, "using an index-based overlap test\n");
> +    }
> +  return true;
> +}
> +
> +/* A subroutine of create_intersect_range_checks, with a subset of the
> +   same arguments.  Try to optimize cases in which the second access
> +   is a write and in which some overlap is valid.  */
> +
> +static bool
> +create_waw_or_war_checks (tree *cond_expr,
> +                          const dr_with_seg_len_pair_t &alias_pair)
> +{
> +  const dr_with_seg_len& dr_a = alias_pair.first;
> +  const dr_with_seg_len& dr_b = alias_pair.second;
> +
> +  /* Check for cases in which:
> +
> +     (a) DR_B is always a write;
> +     (b) the accesses are well-ordered in both the original and new code
> +         (see the comment above the DR_ALIAS_* flags for details); and
> +     (c) the DR_STEPs describe all access pairs covered by ALIAS_PAIR.  */
> +  if (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW))
> +    return false;
> +
> +  /* Check for equal (but possibly variable) steps.  */
> +  tree step = DR_STEP (dr_a.dr);
> +  if (!operand_equal_p (step, DR_STEP (dr_b.dr)))
> +    return false;
> +
> +  /* Make sure that we can operate on sizetype without loss of precision.  */
> +  tree addr_type = TREE_TYPE (DR_BASE_ADDRESS (dr_a.dr));
> +  if (TYPE_PRECISION (addr_type) != TYPE_PRECISION (sizetype))
> +    return false;
> +
> +  /* All addresses involved are known to have a common alignment ALIGN.
> +     We can therefore subtract ALIGN from an exclusive endpoint to get
> +     an inclusive endpoint.  In the best (and common) case, ALIGN is the
> +     same as the access sizes of both DRs, and so subtracting ALIGN
> +     cancels out the addition of an access size.  */
> +  unsigned int align = MIN (dr_a.align, dr_b.align);
> +  poly_uint64 last_chunk_a = dr_a.access_size - align;
> +  poly_uint64 last_chunk_b = dr_b.access_size - align;
> +
> +  /* Get a boolean expression that is true when the step is negative.  */
> +  tree indicator = dr_direction_indicator (dr_a.dr);
> +  tree neg_step = fold_build2 (LT_EXPR, boolean_type_node,
> +                               fold_convert (ssizetype, indicator),
> +                               ssize_int (0));
> +
> +  /* Get lengths in sizetype.  */
> +  tree seg_len_a
> +    = fold_convert (sizetype, rewrite_to_non_trapping_overflow (dr_a.seg_len));
> +  step = fold_convert (sizetype, rewrite_to_non_trapping_overflow (step));
> +
> +  /* Each access has the following pattern:
> +
> +          <- |seg_len| ->
> +          <--- A: -ve step --->
> +          +-----+-------+-----+-------+-----+
> +          | n-1 | ..... |  0  | ..... | n-1 |
> +          +-----+-------+-----+-------+-----+
> +                        <--- B: +ve step --->
> +                        <- |seg_len| ->
> +                        |
> +                   base address
> +
> +     where "n" is the number of scalar iterations covered by the segment.
> +
> +     A is the range of bytes accessed when the step is negative,
> +     B is the range when the step is positive.
> +
> +     We know that DR_B is a write.  We also know (from checking that
> +     DR_A and DR_B are well-ordered) that for each i in [0, n-1],
> +     the write performed by access i of DR_B occurs after access numbers
> +     j<=i of DR_A in both the original and the new code.  Any write or
> +     anti dependencies wrt those DR_A accesses are therefore maintained.
> +
> +     We just need to make sure that each individual write in DR_B does not
> +     overlap any higher-indexed access in DR_A; such DR_A accesses happen
> +     after the DR_B access in the original code but happen before it in
> +     the new code.
> +
> +     We know the steps for both accesses are equal, so by induction, we
> +     just need to test whether the first write of DR_B overlaps a later
> +     access of DR_A.  In other words, we need to move addr_a along by
> +     one iteration:
> +
> +       addr_a' = addr_a + step
> +
> +     and check whether:
> +
> +       [addr_b, addr_b + last_chunk_b]
> +
> +     overlaps:
> +
> +       [addr_a' + low_offset_a, addr_a' + high_offset_a + last_chunk_a]
> +
> +     where [low_offset_a, high_offset_a] spans accesses [1, n-1].  I.e.:
> +
> +        low_offset_a = +ve step ? 0 : seg_len_a - step
> +       high_offset_a = +ve step ? seg_len_a - step : 0
> +
> +     This is equivalent to testing whether:
> +
> +       addr_a' + low_offset_a <= addr_b + last_chunk_b
> +       && addr_b <= addr_a' + high_offset_a + last_chunk_a
> +
> +     Converting this into a single test, there is an overlap if:
> +
> +       0 <= addr_b + last_chunk_b - addr_a' - low_offset_a <= limit
> +
> +     where limit = high_offset_a - low_offset_a + last_chunk_a + last_chunk_b
> +
> +     If DR_A is performed, limit + |step| - last_chunk_b is known to be
> +     less than the size of the object underlying DR_A.  We also know
> +     that last_chunk_b <= |step|; this is checked elsewhere if it isn't
> +     guaranteed at compile time.  There can therefore be no overflow if
> +     "limit" is calculated in an unsigned type with pointer precision.  */
> +  tree addr_a = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a.dr),
> +                                         DR_OFFSET (dr_a.dr));
> +  addr_a = fold_build_pointer_plus (addr_a, DR_INIT (dr_a.dr));
> +
> +  tree addr_b = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b.dr),
> +                                         DR_OFFSET (dr_b.dr));
> +  addr_b = fold_build_pointer_plus (addr_b, DR_INIT (dr_b.dr));
> +
> +  /* Advance ADDR_A by one iteration and adjust the length to compensate.  */
> +  addr_a = fold_build_pointer_plus (addr_a, step);
> +  tree seg_len_a_minus_step = fold_build2 (MINUS_EXPR, sizetype,
> +                                           seg_len_a, step);
> +  if (!CONSTANT_CLASS_P (seg_len_a_minus_step))
> +    seg_len_a_minus_step = build1 (SAVE_EXPR, sizetype, seg_len_a_minus_step);
> +
> +  tree low_offset_a = fold_build3 (COND_EXPR, sizetype, neg_step,
> +                                   seg_len_a_minus_step, size_zero_node);
> +  if (!CONSTANT_CLASS_P (low_offset_a))
> +    low_offset_a = build1 (SAVE_EXPR, sizetype, low_offset_a);
> +
> +  /* We could use COND_EXPR ,
> +     but it's usually more efficient to reuse the LOW_OFFSET_A result.  */
> +  tree high_offset_a = fold_build2 (MINUS_EXPR, sizetype, seg_len_a_minus_step,
> +                                    low_offset_a);
> +
> +  /* The amount added to addr_b - addr_a'.  */
> +  tree bias = fold_build2 (MINUS_EXPR, sizetype,
> +                           size_int (last_chunk_b), low_offset_a);
> +
> +  tree limit = fold_build2 (MINUS_EXPR, sizetype, high_offset_a, low_offset_a);
> +  limit = fold_build2 (PLUS_EXPR, sizetype, limit,
> +                       size_int (last_chunk_a + last_chunk_b));
> +
> +  tree subject = fold_build2 (POINTER_DIFF_EXPR, ssizetype, addr_b, addr_a);
> +  subject = fold_build2 (PLUS_EXPR, sizetype,
> +                         fold_convert (sizetype, subject), bias);
> +
> +  *cond_expr = fold_build2 (GT_EXPR, boolean_type_node, subject, limit);
> +  if (dump_enabled_p ())
> +    dump_printf (MSG_NOTE, "using an address-based WAR/WAW test\n");
>    return true;
>  }
>
> @@ -2036,6 +2241,9 @@ create_intersect_range_checks (class loo
>    if (create_intersect_range_checks_index (loop, cond_expr, alias_pair))
>      return;
>
> +  if (create_waw_or_war_checks (cond_expr, alias_pair))
> +    return;
> +
>    unsigned HOST_WIDE_INT min_align;
>    tree_code cmp_code;
>    /* We don't have to check DR_ALIAS_MIXED_STEPS here, since both versions
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c 2019-11-11 18:32:13.186616541 +0000
> @@ -60,5 +60,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
> -/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c 2019-11-11 18:32:13.186616541 +0000
> @@ -60,5 +60,5 @@ main (void)
>
>  /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
>  /* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
> -/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c 2019-11-11 18:32:13.186616541 +0000
> @@ -57,5 +57,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
> -/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c 2019-11-11 18:32:13.186616541 +0000
> @@ -60,5 +60,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
> -/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c 2019-11-11 18:32:13.186616541 +0000
> @@ -58,5 +58,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
> -/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c 2019-11-11 18:32:13.186616541 +0000
> @@ -15,13 +15,9 @@ f (TYPE *x, TYPE *y, unsigned short n, l
>  /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> -/* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
> -/* { dg-final { scan-assembler-not {, 1024} } } */
> -/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> -/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
> -/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
> -/* Two range checks and a check for n being zero.  */
> -/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
> -/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> +/* Should use a WAR check that multiplies by (VF-2)*4 rather than
> +   an overlap check that multiplies by (257-1)*4.  */
> +/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #8\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
> +/* One range check and a check for n being zero.  */
> +/* { dg-final { scan-assembler-times {\t(?:cmp|tst)\t} 1 } } */
> +/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c 2019-11-11 18:32:13.186616541 +0000
> @@ -15,7 +15,7 @@ f (TYPE *x, TYPE *y, unsigned short n, u
>  /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> -/* Should multiply by (257-1)*4 rather than (VF-1)*4.  */
> +/* Should multiply by (257-1)*4 rather than (VF-1)*4 or (VF-2)*4.  */
>  /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x2, 10, 16\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x3, 10, 16\n} 1 } } */
>  /* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c 2019-11-11 18:32:13.186616541 +0000
> @@ -15,13 +15,10 @@ f (TYPE *x, TYPE *y, int n, long m __att
>  /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> -/* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
> -/* { dg-final { scan-assembler-not {, 1024} } } */
> -/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> -/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
> -/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
> -/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
> -/* Two range checks and a check for n being zero.  */
> -/* { dg-final { scan-assembler {\tcmp\t} } } */
> -/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> +/* Should use a WAR check that multiplies by (VF-2)*4 rather than
> +   an overlap check that multiplies by (257-1)*4.  */
> +/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #8\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
> +/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+[^\n]*xzr} 1 } } */
> +/* One range check and a check for n being zero.  */
> +/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
> +/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c 2019-11-11 18:32:13.186616541 +0000
> @@ -15,13 +15,10 @@ f (TYPE *x, TYPE *y, long n, long m __at
>  /* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
> -/* Should multiply by (VF-1)*8 rather than (257-1)*8.  */
> -/* { dg-final { scan-assembler-not {, 2048} } } */
> -/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> -/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]11} } } */
> -/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
> -/* Two range checks and a check for n being zero.  */
> -/* { dg-final { scan-assembler {\tcmp\t} } } */
> -/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> +/* Should use a WAR check that multiplies by (VF-2)*8 rather than
> +   an overlap check that multiplies by (257-1)*4.  */
> +/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #16\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
> +/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+[^\n]*xzr} 1 } } */
> +/* One range check and a check for n being zero.  */
> +/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
> +/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
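
Illustrative note on the f1 case from the patch description: the following is a minimal C sketch, not GCC code, of why the single WAR comparison is enough. It assumes a fixed vectorisation factor of 4 ints (a 128-bit vector) and emulates the vector loop with memcpy. The names f1_vector, war_conflict and count_mismatches are hypothetical helpers introduced only for this sketch; war_conflict is meant to mirror the post-patch sequence "add x2, x1, 4; sub x2, x0, x2; cmp x2, 8; bls .L2" under that assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: VF = 4 ints, i.e. one 128-bit vector.  */
enum { N = 32, VF = 4 };

/* Scalar reference loop: f1 from the patch description.  */
static void f1_scalar (int *x, int *y)
{
  for (int i = 0; i < N; ++i)
    x[i] += y[i];
}

/* Emulated vector loop: each step loads VF elements of x and y,
   adds them, and only then stores VF elements back to x.  */
static void f1_vector (int *x, int *y)
{
  for (int i = 0; i < N; i += VF)
    {
      int vx[VF], vy[VF];
      memcpy (vx, x + i, sizeof vx);
      memcpy (vy, y + i, sizeof vy);
      for (int j = 0; j < VF; ++j)
        vx[j] += vy[j];
      memcpy (x + i, vx, sizeof vx);
    }
}

/* Hypothetical WAR test: one unsigned comparison that is true only
   when x is ahead of y by 1 .. VF-1 elements.  For x <= y the
   subtraction wraps to a large value and the test is false.  */
static bool war_conflict (const int *x, const int *y)
{
  uintptr_t diff = (uintptr_t) x - ((uintptr_t) y + sizeof (int));
  return diff <= (VF - 2) * sizeof (int);
}

/* Hypothetical self-check: for offsets d = x - y in elements, count
   cases where the verdict is not (1 <= d < VF), or where the check
   passed but the vector loop disagrees with the scalar loop.  */
static int count_mismatches (void)
{
  int bad = 0;
  for (int d = -(VF + 2); d <= VF + 2; ++d)
    {
      int a[64], b[64];
      for (int i = 0; i < 64; ++i)
        a[i] = b[i] = i + 1;
      bool conflict = war_conflict (a + 8 + d, a + 8);
      if (conflict != (d >= 1 && d < VF))
        ++bad;
      if (!conflict)
        {
          f1_scalar (a + 8 + d, a + 8);
          f1_vector (b + 8 + d, b + 8);
          if (memcmp (a, b, sizeof a) != 0)
            ++bad;
        }
    }
  return bad;
}
```

Calling count_mismatches () from a test harness and checking that it returns 0 exercises both directions of overlap. The sketch covers only the constant-step, positive-direction case; the patch itself also handles negative and variable steps.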