From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 98EE03857721; Fri, 14 Jul 2023 10:22:56 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 98EE03857721 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1689330176; bh=gI+f30sVROox+GALbGk6lI0fNZApMGyWBJLerIoq/wI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=qlKCNO1ZWWOQDgC8GDd7LMzGOGH2jdIaG/9Qoilx5zfIC4hZiza/x/OQ9vKv7Z5/4 0F+AWgvcsNSYa1MfK5jWlVFZEe4RaObnGcsFK7NsIYth5Q5WX6odEoAADXlfj12XQA jANsCsMPTvIlETpbzoylq47cODsfdE2N1TL3dwU0= From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/109154] [13/14 regression] jump threading de-optimizes nested floating point comparisons Date: Fri, 14 Jul 2023 10:22:55 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: tree-optimization X-Bugzilla-Version: 13.0 X-Bugzilla-Keywords: missed-optimization X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: tnfchris at gcc dot gnu.org X-Bugzilla-Target-Milestone: 13.2 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D109154 --- Comment #67 from CVS Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:9ed4fcfe47f28b36c73d74109898514ef4da00fb commit r14-2517-g9ed4fcfe47f28b36c73d74109898514ef4da00fb Author: Tamar Christina Date: Fri Jul 14 11:21:46 2023 +0100 ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154] This patch builds on the previous patch by fixing another issue with the way ifcvt currently picks which branches to test. The issue with the current implementation is while it sorts for occurrences of the argument, it doesn't check for complexity of the arguments. As an example: [local count: 528603100]: ... if (distbb_75 >=3D 0.0) goto ; [59.00%] else goto ; [41.00%] [local count: 216727269]: ... goto ; [100.00%] [local count: 311875831]: ... if (distbb_75 < iftmp.0_98) goto ; [20.00%] else goto ; [80.00%] [local count: 62375167]: ... [local count: 528603100]: # prephitmp_175 =3D PHI <_173(18), 0.0(17), _174(16)> All tree arguments to the PHI have the same number of occurrences, name= ly 1, however it makes a big difference which comparison we test first. Sorting only on occurrences we'll pick the compares coming from BB 18 a= nd BB 17, This means we end up generating 4 comparisons, while 2 would have been enough. By keeping track of the "complexity" of the COND in each BB, (i.e. the number of comparisons needed to traverse from the start [BB 15] to end [BB 19]) and using a key tuple of we end up selecting the compare from BB 16 and BB 18 first. BB 16 only requires 1 compare, and BB 18, after we test BB 16 also only requires one additional compare. This change pair= ed with the one previous above results in the optimal 2 compares. For deep nesting, i.e. for ... _79 =3D vr_15 > 20; _80 =3D _68 & _79; _82 =3D vr_15 <=3D 20; _83 =3D _68 & _82; _84 =3D vr_15 < -20; _85 =3D _73 & _84; _87 =3D vr_15 >=3D -20; _88 =3D _73 & _87; _ifc__111 =3D _55 ? 10 : 12; _ifc__112 =3D _70 ? 7 : _ifc__111; _ifc__113 =3D _85 ? 8 : _ifc__112; _ifc__114 =3D _88 ? 9 : _ifc__113; _ifc__115 =3D _45 ? 1 : _ifc__114; _ifc__116 =3D _63 ? 3 : _ifc__115; _ifc__117 =3D _65 ? 4 : _ifc__116; _ifc__118 =3D _83 ? 6 : _ifc__117; _ifc__119 =3D _60 ? 2 : _ifc__118; _ifc__120 =3D _43 ? 13 : _ifc__119; _ifc__121 =3D _75 ? 11 : _ifc__120; vw_1 =3D _80 ? 5 : _ifc__121; Most of the comparisons are still needed because the chain of occurrences to not negate eachother. i.e. _80 is _73 & vr_15 >=3D -20 a= nd _85 is _73 & vr_15 < -20. clearly given _73 needs to be true in both branches, the only additional test needed is on vr_15, where the one test is the negation of the other. So we don't need to do the comparison of _73 twice. The changes in the patch reduces the overall number of compares by one,= but has a bigger effect on the dependency chain. Previously we would generate 5 instructions chain: cmple p7.s, p4/z, z29.s, z30.s cmpne p7.s, p7/z, z29.s, #0 cmple p6.s, p7/z, z31.s, z30.s cmpge p6.s, p6/z, z27.s, z25.s cmplt p15.s, p6/z, z28.s, z21.s as the longest chain. With this patch we generate 3: cmple p7.s, p3/z, z27.s, z30.s cmpne p7.s, p7/z, z27.s, #0 cmpgt p7.s, p7/z, z31.s, z30.s and I don't think (x <=3D y) && (x !=3D 0) && (z > y) can be reduced fu= rther. gcc/ChangeLog: PR tree-optimization/109154 * tree-if-conv.cc (INCLUDE_ALGORITHM): Include. (struct bb_predicate): Add no_predicate_stmts. (set_bb_predicate): Increase predicate count. (set_bb_predicate_gimplified_stmts): Conditionally initialize no_predicate_stmts. (get_bb_num_predicate_stmts): New. (init_bb_predicate): Initialzie no_predicate_stmts. (release_bb_predicate): Cleanup no_predicate_stmts. (insert_gimplified_predicates): Preserve no_predicate_stmts. gcc/testsuite/ChangeLog: PR tree-optimization/109154 * gcc.dg/vect/vect-ifcvt-20.c: New test.=