public inbox for gcc-bugs@sourceware.org help / color / mirror / Atom feed
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/104368] [12 Regression] Failure to vectorise conditional grouped accesses after PR102659 Date: Fri, 04 Feb 2022 07:58:14 +0000 [thread overview] Message-ID: <bug-104368-4-AD49ZbRXIH@http.gcc.gnu.org/bugzilla/> (raw) In-Reply-To: <bug-104368-4@http.gcc.gnu.org/bugzilla/> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104368 Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Ever confirmed|0 |1 Last reconfirmed| |2022-02-04 Status|UNCONFIRMED |NEW CC| |amacleod at redhat dot com --- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> --- Confirmed. On x86 with AVX2 we don't get this vectorized anymore for the same reason. t.c:5:15: missed: failed: evolution of base is not affine. base_address: offset from base address: constant offset from base address: step: base alignment: 0 base misalignment: 0 offset alignment: 0 step alignment: 0 base_object: *_8 Creating dr for *_12 if-conversion now produces ... _47 = (unsigned long) y_21(D); .. # i_26 = PHI <i_23(8), 0(15)> _1 = (long unsigned int) i_26; _2 = _1 * 4; _3 = x_20(D) + _2; _4 = *_3; _45 = (unsigned int) i_26; _46 = _45 * 2; _5 = (int) _46; _6 = (long unsigned int) _5; _7 = _6 * 4; _48 = _47 + _7; _8 = (int *) _48; _49 = _4 > 0; _9 = .MASK_LOAD (_8, 32B, _49); _10 = _6 + 1; _11 = _10 * 4; _51 = _11 + _47; _12 = (int *) _51; _13 = .MASK_LOAD (_12, 32B, _49); _52 = (unsigned int) _9; _53 = (unsigned int) _13; _54 = _52 + _53; _14 = (int) _54; .MASK_STORE (_3, 32B, _49, _14); i_23 = i_26 + 1; if (n_19(D) > i_23) goto <bb 8>; [89.00%] else goto <bb 6>; [11.00%] note that if-conversion is correct in rewriting i*2 and i*2 + 1 to unsigned arithmetic since that will now execute unconditionally and can overflow. In the end the issue is that the multiplication by the element size is done in sizetype and so y[i*2] and y[i*2+1] might not be adjacent. What we miss is that iff the stmts were executed then because of undefined overflow they will always be adjacent. IMHO the only good way to recover is to scrap the separate if-conversion step and do vectorization on the original IL. Or integrate the two passes as much as to allow dataref analysis on the not if-converted IL. Another possibility (and long-standing TODO) is to teach SCEV analysis to derive assumptions we can version the loop on - in this case that i*2 + 1 does not overflow. Note in this particular case we probably miss to see that i is in [0,INT_MAX-1] and thus (unsigned)i * 2 + 1 never wraps (unless I miss something). We have <bb 3> [local count: 955630226]: # RANGE [0, 2147483647] NONZERO 2147483647 # i_26 = PHI <i_23(8), 0(15)> # RANGE [0, 2147483646] NONZERO 2147483647 _1 = (long unsigned int) i_26; # RANGE [0, 8589934584] NONZERO 8589934588 _2 = _1 * 4; # PT = null { D.2435 } (nonlocal, restrict) _3 = x_20(D) + _2; _4 = MEM[(int *)_3 clique 1 base 1]; _45 = (unsigned int) i_26; _46 = _45 * 2; _5 = (int) _46; _6 = (long unsigned int) _5; _7 = _6 * 4; _48 = _47 + _7; so unfortunately while _1 has that correct range, i_26 does not and the ifcvt generated stmts don't either. It might be possible to throw ranger on the if-converted body. Andrew - if we'd like to do that, in tree-if-conv.cc in tree_if_conversion () after we've produced the final IL (after the call to ifcvt_hoist_invariants), is there a way to invoke ranger on the stmts of the (single-BB) loop and have it adjust the global ranges? In particular - see above, it would need to somehow improve the global range of the i_26 IV. The pass creates blocks and destroys edges, so I'm not sure if we can reasonably use a caching instance over its lifetime so cost per loop would be a limiting factor.
next prev parent reply other threads:[~2022-02-04 7:58 UTC|newest] Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-02-03 14:51 [Bug tree-optimization/104368] New: " rsandifo at gcc dot gnu.org 2022-02-03 23:42 ` [Bug tree-optimization/104368] " pinskia at gcc dot gnu.org 2022-02-04 7:58 ` rguenth at gcc dot gnu.org [this message] 2022-05-06 8:32 ` [Bug tree-optimization/104368] [12/13 " jakub at gcc dot gnu.org 2022-07-26 13:01 ` rguenth at gcc dot gnu.org 2023-05-08 12:23 ` [Bug tree-optimization/104368] [12/13/14 " rguenth at gcc dot gnu.org
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=bug-104368-4-AD49ZbRXIH@http.gcc.gnu.org/bugzilla/ \ --to=gcc-bugzilla@gcc.gnu.org \ --cc=gcc-bugs@gcc.gnu.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).