public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/104368] [12 Regression] Failure to vectorise conditional grouped accesses after PR102659
Date: Fri, 04 Feb 2022 07:58:14 +0000	[thread overview]
Message-ID: <bug-104368-4-AD49ZbRXIH@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-104368-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104368

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-02-04
             Status|UNCONFIRMED                 |NEW
                 CC|                            |amacleod at redhat dot com

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  On x86 with AVX2 we don't get this vectorized anymore for the same
reason.

t.c:5:15: missed:  failed: evolution of base is not affine.
        base_address:
        offset from base address:
        constant offset from base address:
        step:
        base alignment: 0
        base misalignment: 0
        offset alignment: 0
        step alignment: 0
        base_object: *_8
Creating dr for *_12

if-conversion now produces

...
  _47 = (unsigned long) y_21(D);
..
# i_26 = PHI <i_23(8), 0(15)>
_1 = (long unsigned int) i_26;
_2 = _1 * 4;
_3 = x_20(D) + _2;
_4 = *_3;
_45 = (unsigned int) i_26;
_46 = _45 * 2;
_5 = (int) _46;
_6 = (long unsigned int) _5;
_7 = _6 * 4;
_48 = _47 + _7;
_8 = (int *) _48;
_49 = _4 > 0;
_9 = .MASK_LOAD (_8, 32B, _49);
_10 = _6 + 1;
_11 = _10 * 4;
_51 = _11 + _47;
_12 = (int *) _51;
_13 = .MASK_LOAD (_12, 32B, _49);
_52 = (unsigned int) _9;
_53 = (unsigned int) _13;
_54 = _52 + _53;
_14 = (int) _54;
.MASK_STORE (_3, 32B, _49, _14);
i_23 = i_26 + 1;
if (n_19(D) > i_23)
  goto <bb 8>; [89.00%]
else
  goto <bb 6>; [11.00%]


note that if-conversion is correct in rewriting i*2 and i*2 + 1 to unsigned
arithmetic since that will now execute unconditionally and can overflow.

In the end the issue is that the multiplication by the element size is
done in sizetype and so y[i*2] and y[i*2+1] might not be adjacent.  What
we miss is that iff the stmts were executed then because of undefined overflow
they will always be adjacent.

IMHO the only good way to recover is to scrap the separate if-conversion step
and do vectorization on the original IL.  Or integrate the two passes
as much as to allow dataref analysis on the not if-converted IL.

Another possibility (and long-standing TODO) is to teach SCEV analysis
to derive assumptions we can version the loop on - in this case that
i*2 + 1 does not overflow.

Note in this particular case we probably miss to see that

i is in [0,INT_MAX-1] and thus (unsigned)i * 2 + 1 never wraps

(unless I miss something).  We have

  <bb 3> [local count: 955630226]:
  # RANGE [0, 2147483647] NONZERO 2147483647
  # i_26 = PHI <i_23(8), 0(15)>
  # RANGE [0, 2147483646] NONZERO 2147483647
  _1 = (long unsigned int) i_26;
  # RANGE [0, 8589934584] NONZERO 8589934588
  _2 = _1 * 4;
  # PT = null { D.2435 } (nonlocal, restrict)
  _3 = x_20(D) + _2;
  _4 = MEM[(int *)_3 clique 1 base 1];
  _45 = (unsigned int) i_26;
  _46 = _45 * 2;
  _5 = (int) _46;
  _6 = (long unsigned int) _5;
  _7 = _6 * 4;
  _48 = _47 + _7;

so unfortunately while _1 has that correct range, i_26 does not and the
ifcvt generated stmts don't either.  It might be possible to throw
ranger on the if-converted body.

Andrew - if we'd like to do that, in tree-if-conv.cc in tree_if_conversion ()
after we've produced the final IL (after the call to ifcvt_hoist_invariants),
is there a way to invoke ranger on the stmts of the (single-BB) loop
and have it adjust the global ranges?  In particular - see above, it
would need to somehow improve the global range of the i_26 IV.

The pass creates blocks and destroys edges, so I'm not sure if we can
reasonably use a caching instance over its lifetime so cost per loop would
be a limiting factor.

  parent reply	other threads:[~2022-02-04  7:58 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-03 14:51 [Bug tree-optimization/104368] New: " rsandifo at gcc dot gnu.org
2022-02-03 23:42 ` [Bug tree-optimization/104368] " pinskia at gcc dot gnu.org
2022-02-04  7:58 ` rguenth at gcc dot gnu.org [this message]
2022-05-06  8:32 ` [Bug tree-optimization/104368] [12/13 " jakub at gcc dot gnu.org
2022-07-26 13:01 ` rguenth at gcc dot gnu.org
2023-05-08 12:23 ` [Bug tree-optimization/104368] [12/13/14 " rguenth at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-104368-4-AD49ZbRXIH@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).