From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/104368] [12 Regression] Failure to vectorise conditional grouped accesses after PR102659
Date: Fri, 04 Feb 2022 07:58:14 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104368

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-02-04
             Status|UNCONFIRMED                 |NEW
                 CC|                            |amacleod at redhat dot com

--- Comment #1 from Richard Biener ---
Confirmed.  On x86 with AVX2 we don't get this vectorized anymore for the same
reason.

t.c:5:15: missed:   failed: evolution of base is not affine.
        base_address:
        offset from base address:
        constant offset from base address:
        step:
        base alignment: 0
        base misalignment: 0
        offset alignment: 0
        step alignment: 0
        base_object: *_8
Creating dr for *_12

if-conversion now produces

  ...
  _47 = (unsigned long) y_21(D);
  ..
  # i_26 = PHI
  _1 = (long unsigned int) i_26;
  _2 = _1 * 4;
  _3 = x_20(D) + _2;
  _4 = *_3;
  _45 = (unsigned int) i_26;
  _46 = _45 * 2;
  _5 = (int) _46;
  _6 = (long unsigned int) _5;
  _7 = _6 * 4;
  _48 = _47 + _7;
  _8 = (int *) _48;
  _49 = _4 > 0;
  _9 = .MASK_LOAD (_8, 32B, _49);
  _10 = _6 + 1;
  _11 = _10 * 4;
  _51 = _11 + _47;
  _12 = (int *) _51;
  _13 = .MASK_LOAD (_12, 32B, _49);
  _52 = (unsigned int) _9;
  _53 = (unsigned int) _13;
  _54 = _52 + _53;
  _14 = (int) _54;
  .MASK_STORE (_3, 32B, _49, _14);
  i_23 = i_26 + 1;
  if (n_19(D) > i_23)
    goto ; [89.00%]
  else
    goto ; [11.00%]

Note that if-conversion is correct in rewriting i*2 and i*2 + 1 to unsigned
arithmetic, since that will now execute unconditionally and can overflow.
In the end the issue is that the multiplication by the element size is done
in sizetype and so y[i*2] and y[i*2+1] might not be adjacent.  What we miss
is that if the stmts are executed at all then, because signed overflow is
undefined, they will always be adjacent.

IMHO the only good way to recover is to scrap the separate if-conversion step
and do vectorization on the original IL.  Or integrate the two passes far
enough to allow dataref analysis on the not-if-converted IL.

Another possibility (and long-standing TODO) is to teach SCEV analysis to
derive assumptions we can version the loop on - in this case that i*2 + 1
does not overflow.

Note in this particular case we probably fail to see that i is in
[0,INT_MAX-1] and thus (unsigned)i * 2 + 1 never wraps (unless I'm missing
something).
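For illustration, the index arithmetic that if-conversion emits (_45 through
_6 above) can be evaluated outside the compiler.  The sketch below is not GCC
code: widened_index and index_step are hypothetical helper names, and the
conversion from unsigned int to int is implementation-defined in C (GCC wraps
modulo 2^32, which the sketch assumes).  It shows that the widened index
advances by 2 per iteration only while i*2 still fits in an int.

```c
#include <stdint.h>

/* Hypothetical helper mirroring _45, _46, _5 and _6 of the if-converted IL. */
static uint64_t widened_index(int i)
{
    unsigned int u = (unsigned int)i * 2u; /* _46: unsigned multiply, wraps mod 2^32 */
    int s = (int)u;                        /* _5: assumes GCC's modulo conversion */
    return (uint64_t)s;                    /* _6: int to 64-bit unsigned, sign-extending */
}

/* Difference between consecutive widened indices, computed modulo 2^64.
   The caller must pass i < INT_MAX so that i + 1 does not overflow. */
static uint64_t index_step(int i)
{
    return widened_index(i + 1) - widened_index(i);
}
```

index_step(5) is 2, but index_step(0x3FFFFFFF) is not, because (int)0x80000000
sign-extends to a large 64-bit value.  In the source loop that iteration could
only execute the guarded statement by overflowing i*2 in signed arithmetic,
which is the guarantee the unconditional unsigned form no longer carries.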
We have

  [local count: 955630226]:
  # RANGE [0, 2147483647] NONZERO 2147483647
  # i_26 = PHI
  # RANGE [0, 2147483646] NONZERO 2147483647
  _1 = (long unsigned int) i_26;
  # RANGE [0, 8589934584] NONZERO 8589934588
  _2 = _1 * 4;
  # PT = null { D.2435 } (nonlocal, restrict)
  _3 = x_20(D) + _2;
  _4 = MEM[(int *)_3 clique 1 base 1];
  _45 = (unsigned int) i_26;
  _46 = _45 * 2;
  _5 = (int) _46;
  _6 = (long unsigned int) _5;
  _7 = _6 * 4;
  _48 = _47 + _7;

so unfortunately while _1 has that correct range, i_26 does not and the
ifcvt generated stmts don't either.  It might be possible to throw ranger
on the if-converted body.

Andrew - if we'd like to do that, in tree-if-conv.cc in tree_if_conversion ()
after we've produced the final IL (after the call to ifcvt_hoist_invariants),
is there a way to invoke ranger on the stmts of the (single-BB) loop and
have it adjust the global ranges?  In particular - see above, it would need
to somehow improve the global range of the i_26 IV.  The pass creates blocks
and destroys edges, so I'm not sure if we can reasonably use a caching instance
over its lifetime, so cost per loop would be a limiting factor.
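As a rough illustration of the range reasoning involved, the toy interval
arithmetic below reproduces the RANGE annotation on _2 from the range of _1
and then applies the same input range to the ifcvt-generated _46.  It is a
sketch only: struct urange, urange_mul and fits_u32 are hypothetical names
and have nothing to do with ranger's actual interfaces.

```c
#include <stdint.h>

/* Closed interval of unsigned 64-bit values; a toy stand-in, not GCC's range types. */
struct urange {
    uint64_t lo;
    uint64_t hi;
};

/* Scale a range by a constant.  Assumes hi * k does not overflow 64 bits. */
static struct urange urange_mul(struct urange r, uint64_t k)
{
    struct urange out = { r.lo * k, r.hi * k };
    return out;
}

/* Nonzero if every value in r is representable as a 32-bit unsigned int. */
static int fits_u32(struct urange r)
{
    return r.hi <= UINT32_MAX;
}
```

Starting from [0, 2147483646], the range recorded for _1, urange_mul by 4
gives [0, 8589934584], matching the annotation on _2.  Scaling the same range
by 2, as _46 does, stays within 32 bits, so the unsigned multiply itself does
not wrap if that range is known for i_26.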