From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 9A6BE3857005; Sat, 15 Apr 2023 10:10:04 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9A6BE3857005 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1681553404; bh=p2KzIrWR7ZBIcNE1iCIHcpMSfhS+bR58BC39b9Y+c7E=; h=From:To:Subject:Date:In-Reply-To:References:From; b=RyGuUKbbOcDlYP/9v78Wz6N8YF2XKHuJEVzZnHXYoG75AwP5tBH4RGNxZbF0kNIVS bYrvye6hDY/KplSDH3P7AB50oeit/I/6crkCWjrYQmc5wfKBkBO1HZnTOgg8WgUUnk yy0sLpqQyYsJfuWyBBXcNuviYeeU4NOIwkM+kaYE= From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons Date: Sat, 15 Apr 2023 10:10:02 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: tree-optimization X-Bugzilla-Version: 13.0 X-Bugzilla-Keywords: missed-optimization X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P1 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: 13.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D109154 --- Comment #52 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:de0ee9d14165eebb3d31c84e98260c05c3b33acb commit r13-7192-gde0ee9d14165eebb3d31c84e98260c05c3b33acb Author: Jakub Jelinek Date: Sat Apr 15 12:08:45 2023 +0200 if-conv: Small improvement for expansion of complex PHIs [PR109154] The following patch is just a dumb improvement, gets rid of 2 unnecessa= ry instructions on both the PR's original testcase and on the two reduced ones, both on -mcpu=3Dneoverse-v1 and -mavx512f. The thing is, if we have args_len (args_len >=3D 2) unique PHI argument= s, we need only args_len - 1 COND_EXPRs to expand the PHI, because first COND_EXPR can merge 2 unique arguments and all the following ones merge another unique argument with the previously merged arguments, while the code for mysterious reasons was always emitting args_len COND_EXPRs, where the first COND_EXPR merged the first and second unique arguments, the second COND_EXPR merged the second unique argument with result of merging the first and second unique arguments and the rest was already expectable, nth COND_EXPR for n > 2 merged the nth unique argum= ent with result of merging the previous unique arguments. Now, in my understanding, the bb_predicate for bb's predecessor need to form a disjunct set which together creates the successor's bb_predicate, so I don't see why we'd need to check all the bb_predicates, if we check all but one then when all those other ones are false the last bb_predic= ate is necessarily true. Given that the code attempts to sort argument with most occurrences (so likely most complex combined predicate) last, I ch= ose not to test that last argument's predicate. So e.g. on the testcase from comment 47 in the PR: void foo (int *f, int d, int e) { for (int i =3D 0; i < 1024; i++) { int a =3D f[i]; int t; if (a < 0) t =3D 1; else if (a < e) t =3D 1 - a * d; else t =3D 0; f[i] =3D t; } } we used to emit: _7 =3D a_10 < 0; _21 =3D a_10 >=3D 0; _22 =3D a_10 < e_11(D); _23 =3D _21 & _22; _26 =3D a_10 >=3D e_11(D); _27 =3D _21 & _26; _ifc__42 =3D _7 ? 1 : t_13; _ifc__43 =3D _23 ? t_13 : _ifc__42; t_6 =3D _27 ? 0 : _ifc__43; while the following patch changes it to: _7 =3D a_10 < 0; _21 =3D a_10 >=3D 0; _22 =3D a_10 < e_11(D); _23 =3D _21 & _22; _ifc__42 =3D _23 ? t_13 : 0; t_6 =3D _7 ? 1 : _ifc__42; which I believe should be sufficient for a PHI <1, t_13, 0>. I've gathered some statistics and on x86_64-linux and i686-linux bootstraps/regtests, this code triggers: 92 4 4 112 2 4 141 3 4 4046 3 3 (where 2nd number is args_len and 3rd argument EDGE_COUNT (bb->preds) and first argument count of those from sort | uniq -c | sort -n). In all these cases the patch should squeze one extra COND_EXPR and its associated predicate (the latter only if it wasn't used elsewhere). Incrementally, I think we should try to perform some analysis on which predicates depend on inverses of other predicates and if possible try to sort the arguments better and omit testing unnecessary predicates. So essentially for the above testcase deconstruct it back to: _7 =3D a_10 < 0; _22 =3D a_10 < e_11(D); _ifc__42 =3D _22 ? t_13 : 0; t_6 =3D _7 ? 1 : _ifc__42; which is like what this patch produces, but with the & a_10 >=3D 0 part removed, because the last predicate is a_10 < 0 and so testing a_10 >= =3D 0 on what appears on the false branch doesn't make sense. But I'm afraid that will take more work than is doable in stage4 right = now. 2023-04-15 Jakub Jelinek PR tree-optimization/109154 * tree-if-conv.cc (predicate_scalar_phi): For complex PHIs, emit just args_len - 1 COND_EXPRs rather than args_len. Formatting fix.=