From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons
Date: Thu, 13 Apr 2023 17:25:47 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #46 from rguenther at suse dot de ---
On 13.04.2023 at 18:54, jakub at gcc dot gnu.org wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
>
> --- Comment #45 from Jakub Jelinek ---
> So, would
> void
> foo (float *f, float d, float e)
> {
>   if (e >= 2.0f && e <= 4.0f)
>     ;
>   else
>     __builtin_unreachable ();
>   for (int i = 0; i < 1024; i++)
>     {
>       float a = f[i];
>       f[i] = (a < 0.0f ? 1.0f : 1.0f - a * d) * (a < e ? 1.0f : 0.0f);
>     }
> }
> be a better reduction of what's going on?
> From the frange/threading POV, when e is in the [2.0f, 4.0f] range, if a < 0.0f, we
> know that a < e is also true, so there is no point in testing that at runtime.
> So I think what threadfull1 does is right and desirable if the final code
> actually performs those comparisons and uses conditional jumps.
> The only thing is that it is harmful for vectorization and maybe for predicated
> code.
> Therefore, for scalar code, at least without massive ARM-style conditional
> execution, the above is better emitted as
>   if (a < 0.0f)
>     tmp = 1.0f;
>   else
>     {
>       tmp = (1.0f - a * d) * (a < e ? 1.0f : 0.0f);
>     }
> or even
>   if (a < 0.0f)
>     tmp = 1.0f;
>   else if (a < e)
>     tmp = 1.0f - a * d;
>   else
>     tmp = 0.0f;
>   f[i] = tmp;
> Thus, could we effectively try to undo it at ifcvt time on loops for
> vectorization only, or during vectorization or something similar?
> As ifcvt then turns the IMHO desirable
>   if (a_16 >= 0.0)
>     goto ; [59.00%]
>   else
>     goto ; [41.00%]
>
>   [local count: 435831803]:
>   goto ; [100.00%]
>
>   [local count: 627172605]:
>   _7 = a_16 * d_17(D);
>   iftmp.0_18 = 1.0e+0 - _7;
>   if (e_13(D) > a_16)
>     goto ; [20.00%]
>   else
>     goto ; [80.00%]
>
>   [local count: 125434523]:
>   goto ; [100.00%]
>
>   [local count: 501738082]:
>
>   [local count: 1063004410]:
>   # prephitmp_26 = PHI
> (OK, the two empty forwarders are unlikely to be useful) into:
>   _7 = a_16 * d_17(D);
>   iftmp.0_18 = 1.0e+0 - _7;
>   _21 = a_16 >= 0.0;
>   _10 = e_13(D) > a_16;
>   _9 = _10 & _21;
>   _27 = e_13(D) <= a_16;
>   _28 = _21 & _27;
>   _ifc__43 = _9 ? iftmp.0_18 : 0.0;
>   _ifc__44 = _28 ? 0.0 : _ifc__43;
>   _45 = a_16 < 0.0;
>   prephitmp_26 = _45 ? 1.0e+0 : _ifc__44;
> Now, perhaps if ifcvt used ranger, it could figure out that a_16 < 0.0 implies
> e_13(D) > a_16 and do something smarter with it.
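A small C sketch can check the range argument in the reduction. It is illustrative only. The names body_product (the loop body of foo () as written), body_nested (the "or even" form) and count_mismatches are hypothetical, and the grid of finite inputs with e in [2.0f, 4.0f] is an arbitrary choice. The sketch asserts that a < 0.0f implies a < e and counts the points where the two forms disagree.

```c
#include <assert.h>

/* Hypothetical helper: loop body of foo () exactly as written in the
   reduction.  */
static float
body_product (float a, float d, float e)
{
  return (a < 0.0f ? 1.0f : 1.0f - a * d) * (a < e ? 1.0f : 0.0f);
}

/* Hypothetical helper: the "or even" nested form, which skips the a < e
   test once a < 0.0f is known.  */
static float
body_nested (float a, float d, float e)
{
  if (a < 0.0f)
    return 1.0f;
  else if (a < e)
    return 1.0f - a * d;
  else
    return 0.0f;
}

/* Hypothetical driver: walk a grid of finite inputs with e in [2.0f, 4.0f]
   (the range promised by the __builtin_unreachable () guard) and return
   the number of points where the two forms disagree.  */
static int
count_mismatches (void)
{
  int mismatches = 0;
  for (float e = 2.0f; e <= 4.0f; e += 0.5f)
    for (float a = -8.0f; a <= 8.0f; a += 0.25f)
      for (float d = -4.0f; d <= 4.0f; d += 0.5f)
        {
          /* The range fact threading relies on: a < 0.0f implies a < e.  */
          assert (!(a < 0.0f) || a < e);
          /* == treats -0.0f and 0.0f as equal; the product form yields
             -0.0f when a >= e and 1.0f - a * d is negative.  */
          if (body_product (a, d, e) != body_nested (a, d, e))
            mismatches++;
        }
  return mismatches;
}
```

On this grid, count_mismatches () should return 0. The forms are only interchangeable for finite inputs, though. For a NaN a, body_product returns NaN while body_nested returns 0.0f. For a >= e with 1.0f - a * d negative, the product form yields -0.0f where the nested one yields 0.0f. The nested rewrite therefore presumably assumes relaxed floating-point semantics (for example -ffinite-math-only and -fno-signed-zeros). The first alternative, which keeps the multiplication in the else branch, needs no such assumption.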
> Or maybe just try a smarter ifcvt based on the original CFG.
> The pre-ifcvt code was
>   a_16 < 0.0f ? 1.0f : a_16 < e_13 ? 1.0f - a_16 * d_17 : 0.0f
> so when ifcvt puts everything together, make it
>   _7 = a_16 * d_17(D);
>   iftmp.0_18 = 1.0e+0 - _7;
>   _27 = e_13(D) > a_16;
>   _28 = a_16 < 0.0;
>   _ifc__43 = _27 ? iftmp.0_18 : 0.0f;
>   prephitmp_26 = _28 ? 1.0f : _ifc__43;
> ?

Improving what ifcvt produces for multi-argument PHIs is certainly desirable. I'm not sure whether undoing the threading is generally possible.
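The two if-converted sequences quoted above can be compared with a hypothetical C transcription of each. The function names and the sample set are illustrative and do not come from the thread. The SSA names are reused as local variables. The current sequence needs four comparisons, two ANDs and three selects; the proposed one needs two comparisons and two selects. Both encode the same nested conditional, so they should agree on every input, including signed zeros, infinities and NaN, without using the range of e. The driver checks this on a small sample set.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical transcription of what ifcvt currently emits.  */
static float
ifcvt_current (float a_16, float d_17, float e_13)
{
  float _7 = a_16 * d_17;
  float iftmp_0_18 = 1.0f - _7;
  bool _21 = a_16 >= 0.0f;
  bool _10 = e_13 > a_16;
  bool _9 = _10 & _21;
  bool _27 = e_13 <= a_16;
  bool _28 = _21 & _27;
  float _ifc__43 = _9 ? iftmp_0_18 : 0.0f;
  float _ifc__44 = _28 ? 0.0f : _ifc__43;
  bool _45 = a_16 < 0.0f;
  return _45 ? 1.0f : _ifc__44;            /* prephitmp_26 */
}

/* Hypothetical transcription of the proposed sequence, derived from the
   pre-ifcvt nested conditional.  */
static float
ifcvt_proposed (float a_16, float d_17, float e_13)
{
  float _7 = a_16 * d_17;
  float iftmp_0_18 = 1.0f - _7;
  bool _27 = e_13 > a_16;
  bool _28 = a_16 < 0.0f;
  float _ifc__43 = _27 ? iftmp_0_18 : 0.0f;
  return _28 ? 1.0f : _ifc__43;            /* prephitmp_26 */
}

/* Equal values, counting any two NaNs as equal.  */
static bool
same_value (float x, float y)
{
  return (isnan (x) && isnan (y)) || x == y;
}

/* Hypothetical driver: compare both sequences on every (a, d, e) triple
   drawn from a small sample set and return the number of disagreements.  */
static int
count_ifcvt_mismatches (void)
{
  const float samples[] = { -INFINITY, -3.0f, -0.5f, -0.0f, 0.0f, 0.5f,
                            2.0f, 3.0f, 4.0f, 7.5f, INFINITY, NAN };
  const int n = (int) (sizeof samples / sizeof samples[0]);
  int mismatches = 0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < n; k++)
        if (!same_value (ifcvt_current (samples[i], samples[j], samples[k]),
                         ifcvt_proposed (samples[i], samples[j], samples[k])))
          mismatches++;
  return mismatches;
}
```

count_ifcvt_mismatches () should return 0. The range of e plays no part in this equivalence, which seems consistent with the suggestion that ifcvt could build the shorter sequence from the original CFG alone.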