From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
Date: Tue, 12 Mar 2024 09:59:33 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151

--- Comment #19 from Richard Biener ---
So what remains here is differences like

-  (chrec = {(long unsigned int) (col_stride_10 * _105), +, (long unsigned int) col_stride_10}_2)
+  (chrec = (long unsigned int) (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2)

where we can't pull the sign-extension inside the CHREC because it might
overflow.
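
To make the overflow concern concrete, here is a minimal standalone C sketch
(not GCC code).  The names wide_chrec and narrow_chrec are hypothetical, and
the sketch assumes a 32-bit int plus GCC's wrapping behaviour when an
out-of-range unsigned value is converted to a signed type.  It evaluates both
forms of the CHREC at iteration k:

```c
#include <stdint.h>

/* Hypothetical helper: evaluates
     {(long unsigned int) (stride * start), +, (long unsigned int) stride}
   at iteration k, i.e. the evolution runs in 64 bits.  The initial product
   is computed with 32-bit wrapping to avoid undefined behaviour here.  */
uint64_t wide_chrec(int32_t stride, int32_t start, uint32_t k)
{
    int32_t init = (int32_t)((uint32_t)stride * (uint32_t)start);
    return (uint64_t)(int64_t)init + (uint64_t)k * (uint64_t)(int64_t)stride;
}

/* Hypothetical helper: evaluates
     (long unsigned int) (int) {(unsigned int) stride * (unsigned int) start,
                                +, (unsigned int) stride}
   at iteration k, i.e. the evolution runs in 32 bits (and may wrap) and is
   only sign-extended afterwards.  */
uint64_t narrow_chrec(int32_t stride, int32_t start, uint32_t k)
{
    uint32_t v = (uint32_t)stride * (uint32_t)start + k * (uint32_t)stride;
    /* Assumption: unsigned -> signed conversion wraps, as documented by GCC.  */
    return (uint64_t)(int64_t)(int32_t)v;
}
```

With stride = 1 << 16, start = 0 and k = 1 << 15 the wide form yields
0x80000000 while the narrow form yields 0xffffffff80000000, so the two are
only interchangeable when the 32-bit evolution is known not to wrap.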

And

(set_scalar_evolution
  instantiated_below = 22
  (scalar = _59)
-  (scalar_evolution = {(long unsigned int) (col_stride_10 * _105) * 2, +, (long unsigned int) col_stride_10 * 2}_2))
+  (scalar_evolution = _59))
+)

which is a failure to analyze at all.  This one looks like

  [local count: 118111600]:
  # col_stride_10 = PHI
  if (size_15(D) > 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

  [local count: 118111600]:
  return;

...

  [local count: 343854870]:
  # RANGE [irange] int [0, 2147483646]
  # j_73 = PHI <_105(22), _68(19)>
...
  col_i_61 = col_stride_10 * j_73;
  # RANGE [irange] long unsigned int [0, 2147483647][18446744071562067968, +INF]
  _60 = (long unsigned int) col_i_61;
  # RANGE [irange] long unsigned int [0, 4294967294][18446744069414584320, 18446744073709551614] MASK 0xfffffffffffffffe VALUE 0x0
  _59 = _60 * 2;

j_73 is {_105, +, 1}_2
col_i_61 is (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2
_60 is (long unsigned int) (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2

and on the _60 * 2 multiply we fail.  Applying Andrew's proposed patch
doesn't help here, since the range of col_stride_10 can only conditionally
be adjusted to positive.

SCEV caches a scalar evolution keyed on the SSA_NAME and the 'instantiated
below' block, which is "block_before_loop": a loop's preheader, or the
function ENTRY block for analyses of scalars in the loop tree root.

A conservative context for analysis of the SCEV might be

 1) the definition stmt of the SSA name
 2) the instantiated-below block (on-exit ranges of it)

With doing 2) by feeding the last stmt of the block as context (when the
block is empty that won't work :/) the testcase is optimized again when
I discard the SCEV cache at the start of IVOPTs and wrap IVOPTs in a
ranger instance.

While ranger has a range_on_exit API, this doesn't work on GENERIC
expressions as far as I can see, only on SSA names.  I guess that could be
"fixed", given range_on_exit also looks at the last stmt and eventually
defers to range_of_expr (or range_on_entry), but possibly get_tree_range
needs variants for on_entry/on_exit (it doesn't seem to use its 'stmt'
context very consistently, notably not for SSA_NAMEs ...).

Interestingly enough we somehow still need the

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index c16b776c1e3..c0eda5fc51d 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -102,7 +102,15 @@ gimple_ranger::range_of_expr (vrange &r, tree expr, gimple *stmt)
   if (!stmt)
     {
       Value_Range tmp (TREE_TYPE (expr));
-      m_cache.get_global_range (r, expr);
+      // If there is no global range for EXPR yet, try to evaluate it.
+      // This call does set R to a global range regardless.
+      if (!m_cache.get_global_range (r, expr))
+       {
+         gimple *s = SSA_NAME_DEF_STMT (expr);
+         // Calculate a range for S if it is safe to do so.
+         if (s && gimple_bb (s) && gimple_get_lhs (s) == expr)
+           return range_of_stmt (r, s);
+       }
       // Pick up implied context information from the on-entry cache
       // if current_bb is set.  Do not attempt any new calculations.
       if (current_bb && m_cache.block_range (tmp, current_bb, expr, false))

hunk of Andrew's patch to do it :/

There's one other detail - the problematic multiply folding is

  col_stride_10 * {_105, +, 1}_2

I'm thinking that, similar to CHREC_LEFT == 0, we can handle CHREC_RIGHT == 1
without unsigned promotion.  In the second iteration we are replacing
(_105 + 1) * col_stride_10 with _105 * col_stride_10 + col_stride_10,
but we know already that _105 * col_stride_10 doesn't overflow as we computed
that in the first iteration.  And 1 * X never overflows.

The third iteration is problematic - we don't know whether 2 * col_stride_10
overflows if _105 was zero; if it was not, it might have been -1, which means
the second iteration computed 0 * col_stride_10 originally.

Hmm, so _105 == -1 is problematic, so no - I don't think we can handle
CHREC_RIGHT == 1 specially.
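
The _105 == -1 case can be checked with a small standalone C sketch (not GCC
code; original_ok and folded_ok are hypothetical helpers, and a 32-bit int is
assumed).  It compares the original computation (base + k) * stride with the
folded form base * stride + k * stride and reports whether every intermediate
value is representable in int:

```c
#include <limits.h>
#include <stdbool.h>

/* True if V is representable as int.  */
static bool fits_int(long long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Original loop body at iteration k: (base + k) * stride, done in int.
   The arithmetic is carried out in long long so the check itself is safe.  */
bool original_ok(int base, int stride, int k)
{
    long long j = (long long)base + k;
    return fits_int(j) && fits_int(j * (long long)stride);
}

/* Folded form at iteration k: base * stride + k * stride, where the
   initial value, the k * stride term and the sum must each fit in int.  */
bool folded_ok(int base, int stride, int k)
{
    long long init = (long long)base * stride;
    long long step = (long long)k * stride;
    return fits_int(init) && fits_int(step) && fits_int(init + step);
}
```

For base = -1 and stride = 1 << 30 the original products -2^30, 0 and 2^30
all fit in int, but at k = 2 the folded form needs 2 * stride = 2^31, which
does not fit, so original_ok holds while folded_ok fails.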