From: "acoplan at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/115120] New: Bad interaction between ivcanon and early break vectorization
Date: Thu, 16 May 2024 16:03:07 +0000
Message-ID: <bug-115120-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120

            Bug ID: 115120
           Summary: Bad interaction between ivcanon and early break
                    vectorization
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

Consider the following testcase on aarch64:

int arr[1024];
int *f()
{
    int i;
    for (i = 0; i < 1024; i++)
      if (arr[i] == 42)
        break;
    return arr + i;
}

Compiled with -O3, we get the following vector loop body:

.L2:
        cmp     x2, x1
        beq     .L9
.L6:
        ldr     q31, [x1]
        add     x1, x1, 16
        mov     v27.16b, v29.16b
        mov     v28.16b, v30.16b
        cmeq    v31.4s, v31.4s, v26.4s
        add     v29.4s, v29.4s, v24.4s
        add     v30.4s, v30.4s, v25.4s
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L2

It's somewhat surprising that there are two vector adds.  Looking at the
optimized dump:

<bb 3> [local count: 1063004408]:
  # vect_vec_iv_.6_28 = PHI <_29(10), { 0, 1, 2, 3 }(2)>
  # vect_vec_iv_.7_33 = PHI <_34(10), { 1024, 1023, 1022, 1021 }(2)>
  # ivtmp.18_19 = PHI <ivtmp.18_20(10), ivtmp.18_26(2)>
  _34 = vect_vec_iv_.7_33 + { 4294967292, 4294967292, 4294967292, 4294967292 };
  _29 = vect_vec_iv_.6_28 + { 4, 4, 4, 4 };
  _25 = (void *) ivtmp.18_19;
  vect__1.10_39 = MEM <vector(4) int> [(int *)_25];
  mask_patt_9.11_41 = vect__1.10_39 == { 42, 42, 42, 42 };
  if (mask_patt_9.11_41 != { 0, 0, 0, 0 })
    goto <bb 4>; [5.50%]
  else
    goto <bb 10>; [94.50%]
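
As a rough scalar model of the two vector IV PHIs in this dump (a sketch only,
not what the vectorizer emits): the initial values and steps are taken from the
dump, while the function name ivs_stay_redundant and the LANES macro are
made up for illustration.  The step 4294967292 is -4 as an unsigned 32-bit
value, so the second IV counts down while the first counts up, and lane by
lane the two always sum to 1024.

```c
#include <stdint.h>

#define LANES 4  /* hypothetical name; matches vector(4) int in the dump */

/* Model both vector IVs for `iters` vector iterations.  Returns 1 if, in
   every lane and on every iteration, forward + backward == 1024 (mod 2^32),
   i.e. the backward IV carries no information beyond the forward one.  */
int ivs_stay_redundant(unsigned iters)
{
    uint32_t fwd[LANES] = { 0, 1, 2, 3 };             /* vect_vec_iv_.6 */
    uint32_t bwd[LANES] = { 1024, 1023, 1022, 1021 }; /* vect_vec_iv_.7 */

    for (unsigned k = 0; k < iters; k++) {
        for (int l = 0; l < LANES; l++) {
            if ((uint32_t)(fwd[l] + bwd[l]) != 1024u)
                return 0;
            fwd[l] += 4u;           /* _29 = iv + { 4, 4, 4, 4 } */
            bwd[l] += 4294967292u;  /* _34 = iv + { -4, ... } as unsigned */
        }
    }
    return 1;
}
```

Calling ivs_stay_redundant(256) covers the full 1024-element trip count at
four lanes per iteration.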

We can see that there are two IV updates that got vectorized.  It turns out
that one of these comes from the ivcanon pass.  If I add
-fno-tree-loop-ivcanon we instead get the following vector loop body:

.L2:
        cmp     x2, x1
        beq     .L9
.L6:
        ldr     q31, [x1]
        add     x1, x1, 16
        mov     v29.16b, v30.16b
        add     v30.4s, v30.4s, v27.4s
        cmeq    v31.4s, v31.4s, v28.4s
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L2

which is much cleaner.  Looking at the tree dumps, the ivcanon pass makes the
following transformation:

--- cddce2.tree 2024-05-16 13:49:10.426703350 +0000
+++ ivcanon.tree        2024-05-16 13:49:17.678874925 +0000
@@ -4,6 +4,8 @@
   int i;
   int _1;
   int * _8;
+  unsigned int ivtmp_11;
+  unsigned int ivtmp_12;
   long unsigned int _13;
   long unsigned int _15;
   long unsigned int prephitmp_16;
@@ -12,6 +14,7 @@

   <bb 3> [local count: 1063004408]:
   # i_10 = PHI <i_7(7), 0(2)>
+  # ivtmp_12 = PHI <ivtmp_11(7), 1024(2)>
   _1 = arr[i_10];
   if (_1 == 42)
     goto <bb 5>; [5.50%]
@@ -20,7 +23,8 @@

   <bb 4> [local count: 1004539166]:
   i_7 = i_10 + 1;
-  if (i_7 != 1024)
+  ivtmp_11 = ivtmp_12 - 1;
+  if (ivtmp_11 != 0)
     goto <bb 7>; [98.93%]
   else
     goto <bb 8>; [1.07%]

That is, it introduces a backwards-counting IV.  In the general case without
vectorization, ivopts then seems to clean this up and ensure we only have a
single IV.
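
For illustration, the ivcanon transformation above corresponds roughly to the
following source-level rewrite.  This is a hand-written approximation of the
GIMPLE diff, not compiler output; the name f_ivcanon and the local ivtmp are
made up here, and f is the original testcase repeated for comparison.

```c
int arr[1024];

/* Original testcase: a single forward-counting IV.  */
int *f(void)
{
    int i;
    for (i = 0; i < 1024; i++)
        if (arr[i] == 42)
            break;
    return arr + i;
}

/* Approximation of the loop after ivcanon: the latch exit test now uses
   a second, backwards-counting IV that starts at the trip count.  */
int *f_ivcanon(void)
{
    int i = 0;
    unsigned int ivtmp = 1024;  /* counts remaining iterations */

    for (;;) {
        if (arr[i] == 42)       /* early break exit (bb 3) */
            break;
        i = i + 1;              /* i_7 = i_10 + 1 */
        ivtmp = ivtmp - 1;      /* ivtmp_11 = ivtmp_12 - 1 */
        if (ivtmp == 0)         /* was: if (i_7 != 1024) */
            break;
    }
    return arr + i;
}
```

Both functions return the same pointer for any contents of arr; the only
difference is which IV drives the latch exit.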

In the vectorized case it seems this problem only shows up with early break
vectorization. Looking at a simple reduction, such as:

int a[1024];
int g()
{
    int sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += a[i];
    return sum;
}

although we still have the backwards-counting IV in ifcvt:

  <bb 3> [local count: 1063004408]:
  # sum_9 = PHI <sum_5(5), 0(2)>
  # i_11 = PHI <i_6(5), 0(2)>
  # ivtmp_8 = PHI <ivtmp_7(5), 1024(2)>
  _1 = a[i_11];
  sum_5 = _1 + sum_9;
  i_6 = i_11 + 1;
  ivtmp_7 = ivtmp_8 - 1;
  if (ivtmp_7 != 0)
    goto <bb 5>; [98.99%]
  else
    goto <bb 4>; [1.01%]

we end up with only scalar IVs after vectorization, and the backwards scalar IV
ends up getting deleted by dce6:

Deleting : ivtmp_7 = ivtmp_8 - 1;

I'm not sure what the right solution is, but we should avoid having duplicated
IVs with early break vectorization.
