From: "aldyh at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/91645] Missed optimization with sqrt(x*x)
Date: Sat, 03 Sep 2022 16:08:48 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 9.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: aldyh at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645

Aldy Hernandez changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |amacleod at redhat dot com,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org

--- Comment #7 from Aldy Hernandez ---
> Now, the problem is that GCC doesn't seem to optimize away the call to sqrtf
> based on some surrounding code.
> As an example, it would be neat to have this
> (or something similar) to get compiled into the same mulss-sqrtss-ret:
>
> float test (float x)
> {
>     float y = x*x;
>     if (y >= 0.f)
>         return std::sqrt(y);
>     __builtin_unreachable();
> }
>
> If I understand it correctly, the 'y >= 0.f' excludes 'y' being NaN and 'y'
> being negative (though this is excluded by 'y = x*x'), so there is no need
> to check if the argument to `std::sqrt` is any bad, enabling to just do
> 'sqrtss' and return.

I'm not an FP expert, but I think we have enough information to do this right
now.  The evrp dump now has:

=========== BB 2 ============
Imports: y_2
Exports: y_2
    <bb 2> :
    y_2 = x_1(D) * x_1(D);
    if (y_2 >= 0.0)
      goto <bb 3>; [INV]
    else
      goto <bb 4>; [INV]

2->3  (T) y_2 :  [frange] float [0.0, Inf] !NAN
2->4  (F) y_2 :  [frange] float [ -Inf, 0.0]

=========== BB 3 ============
y_2  [frange] float [0.0, Inf] !NAN
    <bb 3> :
    _6 = __builtin_sqrtf (y_2);
    return _6;

Which means that y_2 is known to be [0.0, Inf] excluding a NAN.

What needs to happen for the call to __builtin_sqrtf to be optimized to
sqrtss?
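
For what it's worth, the range claim above can be sanity-checked outside the
compiler with a small standalone sketch. It only spot-checks a handful of
values, so it is an illustration rather than a proof, and it says nothing
about what GCC itself does with the range. The names in_true_edge_range and
check_true_edge_claims are hypothetical helpers made up for this example;
they do not come from the bug report or from GCC.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (name is mine, not GCC's): true exactly when v lies in
// the range evrp reports on the 2->3 edge, [0.0, Inf] with NaN excluded.
bool in_true_edge_range(float v)
{
    // An ordered comparison against a NaN is always false, so this one test
    // rejects NaNs as well as negative values.
    return v >= 0.0f;
}

// Hypothetical self-check: walks a few representative inputs and asserts the
// two claims made in the lead-in.
void check_true_edge_claims()
{
    const float inf = std::numeric_limits<float>::infinity();
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float max = std::numeric_limits<float>::max();

    // Claim 1: the guard filters out NaN and negative inputs.
    assert(!in_true_edge_range(nan));
    assert(!in_true_edge_range(-1.0f));
    assert(!in_true_edge_range(-inf));

    // Claim 2: for every sampled input that passes the guard, sqrt returns
    // an ordinary (non-NaN) value, so the invalid-input case does not arise.
    const float samples[] = {0.0f, -0.0f, 1.0f, 4.0f, max, inf};
    for (float v : samples) {
        assert(in_true_edge_range(v));
        assert(!std::isnan(std::sqrt(v)));
    }

    // By contrast, an input outside the range does yield a NaN.
    assert(std::isnan(std::sqrt(-1.0f)));
}
```

Calling check_true_edge_claims() from a main() built without -ffast-math and
without NDEBUG should run silently. Note that -0.0f passes the guard too,
which is harmless here because sqrt(-0.0f) is -0.0f rather than a NaN.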