From: "burnus at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/105246] New: [amdgcn] Use library call for SQRT with -ffast-math + provide additional option to use single-precision opcode
Date: Tue, 12 Apr 2022 17:29:06 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105246

            Bug ID: 105246
           Summary: [amdgcn] Use library call for SQRT with -ffast-math +
                    provide additional option to use single-precision opcode
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: documentation, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: ams at gcc dot gnu.org
  Target Milestone: ---
            Target: amdgcn-amdhsa

AMD GCN hardware has
opcodes which take double-precision variables as input and output but internally perform only a single-precision operation. Currently this affects only "sqrt", which with -funsafe-math-optimizations (implied by -Ofast / -ffast-math) uses AMDGCN's "v_sqrt".

Namely, gcc/config/gcn/gcn-valu.md has:

  (define_insn "sqrt<mode>2"
  ...
    "flag_unsafe_math_optimizations"
    "v_sqrt%i0\t%0, %1"

Thus: while "v_sqrt" works on double-precision variables, it only calculates with 23 bits (as with float32) instead of 52 bits (as float64 provides) for the fractional part of the floating-point number.

PROBLEM: In many cases, this loss of precision by a factor of roughly 10⁸ (2²⁹ ≈ 5.4·10⁸) is very unexpected and too much for code which requires double precision. An error of about 4 ULP is expected, not one of 10⁸ ULP!

In particular: -Ofast or -ffast-math is commonly recommended in order to permit several optimizations, so users do not anticipate the precision loss.

In terms of testsuites, OvO's sqrt examples are affected, requiring a much higher OVO_TOL_ULP to pass (→ https://github.com/TApplencourt/OvO ). But the issue really came up in discussions with HPC code users.

EXPECTED:
- By default, with -ffast-math, do the double-precision operation by a library call
- Provide some (GCN-specific) -m... flag to do those calculations in single precision.

For instance something like:

  -mpermit-reduced-precision
    Use hardware intrinsics instead of library calls even if they provide a much reduced precision. Example: use v_sqrt with double-precision variables even though the hardware only provides single-precision results for the fractional part of the floating-point variable. (Default: disabled)
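
To give a feeling for the magnitude, here is a rough host-side sketch in Python (not GCN code). It assumes that the effect can be modelled by rounding a correctly rounded double-precision square root to binary32; the real v_sqrt result may differ in detail. The helper names to_float32, sqrt_single_precision and ulp_error are made up for this illustration, and math.ulp needs Python 3.9 or later.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sqrt_single_precision(x: float) -> float:
    """Hypothetical model of a double sqrt that is only single-precision accurate.

    Assumption: the reduced-precision result equals the exact double result
    rounded to 24 significant bits. This is not the actual hardware algorithm.
    """
    return to_float32(math.sqrt(x))


def ulp_error(approx: float, exact: float) -> float:
    """Error of `approx`, in units in the last place (ULP) of the double `exact`."""
    return abs(approx - exact) / math.ulp(exact)


if __name__ == "__main__":
    for x in (2.0, 3.0, 10.0, 12345.678):
        exact = math.sqrt(x)
        approx = sqrt_single_precision(x)
        print(f"x = {x!r}: error = {ulp_error(approx, exact):.3g} ULP")
```

Under this model the error is bounded by half a binary32 ULP, i.e. 2²⁸ double-precision ULP, and inputs whose square root is exactly representable in binary32 (such as 4.0) show no error at all. That is consistent with a testsuite needing a very large ULP tolerance for some inputs while others still pass.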