From: "alexander.grund@tu-dresden.de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9
Date: Fri, 04 Dec 2020 12:42:04 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140

Bug ID: 98140
Summary: Reused register by xsmincdp leads to wrong NaN propagation on Power9
Product: gcc
Version: 8.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: alexander.grund@tu-dresden.de
Target Milestone: ---

Created attachment 49679
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49679&action=edit
(preprocessed) source code to reproduce issue

Summary: xsmincdp instructions are generated in a form like
`xsmincdp b,a,b` for code that looks like `(a > b) ? b : a`. When a NaN is involved, this instruction returns b, whereas the source expression returns a.

I was debugging an issue in PyTorch (https://github.com/pytorch/pytorch/issues/48591) where I encountered the following problem:

A clamp function is used which looks like this:

    c[i] = a[i] < min_vec[i] ? min_vec[i] : (a[i] > max_vec[i] ? max_vec[i] : a[i]);

This is used in very complex code with multiple levels of C++ templates, lambdas and such, and it uses a combination of manually unrolled loops and unroll-friendly loops (i.e. the above is called in a loop with a fixed trip count of 8).

The generated ASM code contains this (from objdump):

    c[i] = a[i] < min_vec[i] ? min_vec[i] : (a[i] > max_vec[i] ? max_vec[i] : a[i]);
       8e970:  20 00 fe cb   lfd      f31,32(r30)
       8e974:  00 f8 9c ff   fcmpu    cr7,f28,f31
       8e978:  0c 00 9c 41   blt      cr7,8e984
       8e97c:  40 00 fe cb   lfd      f31,64(r30)
       8e980:  40 fc fc f3   xsmincdp vs31,vs28,vs31
       8e984:  28 00 9e cb   lfd      f28,40(r30)

So I assume f28/vs28 contains a[i] and vs31 contains max_vec[i], which means the generated instruction looks like `xsmincdp max_vec,a,max_vec` and returns max_vec when a NaN is involved. However, according to the source code a should be returned, because the condition `a[i] > max_vec[i]` evaluates to false when a NaN is involved.

Reproducing this is tricky, as it depends on many conditions. From my observations, some register pressure seems to be required, and possibly another function that also calls this code, so there may be side effects from there.
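To make the NaN argument above concrete, here is a portable C++ sketch that models the two code paths on any IEEE-754 target. It assumes the Power ISA 3.0 "type-C" definition of `xsmincdp XT,XA,XB` as `XT = (XA < XB) ? XA : XB` (so a NaN in either input yields XB), and it takes the register mapping from the reading above (f28 = a[i], f31 = min_vec[i], then max_vec[i]). All function names are hypothetical and only serve the illustration; this is not how GCC or the hardware implement anything.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Source-level clamp from the report: every comparison with NaN is false,
// so a NaN `a` falls through both conditions and is returned unchanged.
double clamp_source(double a, double lo, double hi) {
    return a < lo ? lo : (a > hi ? hi : a);
}

// Model of xsmincdp XT,XA,XB under the ASSUMED type-C semantics:
// XT = (XA < XB) ? XA : XB, hence XB is returned if either input is NaN.
double xsmincdp(double xa, double xb) {
    return xa < xb ? xa : xb;
}

// Model of the GCC 8.3.0 sequence (assumed mapping: f28 = a, f31 = lo/hi):
//   fcmpu cr7,f28,f31 ; blt          -> keep lo when a < lo
//   lfd f31,hi ; xsmincdp vs31,vs28,vs31  (XA = a, XB = hi)
double clamp_gcc83(double a, double lo, double hi) {
    if (a < lo) return lo;   // blt taken (false for NaN: unordered)
    return xsmincdp(a, hi);  // a < hi ? a : hi  -> hi when a is NaN
}

// Operand order matching `a > hi ? hi : a`, i.e. `hi < a ? hi : a`.
double clamp_expected_order(double a, double lo, double hi) {
    if (a < lo) return lo;
    return xsmincdp(hi, a);  // NaN a -> returns a
}

// Self-check: the models agree on ordinary values and differ only for NaN.
void check_nan_propagation() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(std::isnan(clamp_source(nan, 0.0, 6.0)));
    assert(std::isnan(clamp_expected_order(nan, 0.0, 6.0)));
    assert(clamp_gcc83(nan, 0.0, 6.0) == 6.0);  // the reported wrong result

    const double samples[] = {-1.0, 0.0, 3.5, 6.0, 9.0};
    for (double a : samples) {
        assert(clamp_gcc83(a, 0.0, 6.0) == clamp_source(a, 0.0, 6.0));
        assert(clamp_expected_order(a, 0.0, 6.0) == clamp_source(a, 0.0, 6.0));
    }
}
```

Calling `check_nan_propagation()` should pass when built without `-ffast-math`. Its third assertion encodes the reported wrong value, so it documents the bug rather than the desired behaviour.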
Using GCC 10.2.0 I wasn't able to reproduce this, as the codegen is slightly different: it seemingly notices that max_vec contains the same value for all i and reuses a single register:

     324:  00 70 1f fc   fcmpu    cr0,f31,f14
     328:  90 f8 a0 fe   fmr      f21,f31
     32c:  08 00 81 41   bgt      334
     330:  40 74 be f2   xsmincdp vs21,vs30,vs14
     334:  00 78 1f fc   fcmpu    cr0,f31,f15
     338:  90 f8 c0 fe   fmr      f22,f31
     33c:  08 00 81 41   bgt      344
     340:  40 7c de f2   xsmincdp vs22,vs30,vs15

I'm attaching some source code which can be compiled against PyTorch 1.7.0, and 2 examples of preprocessed code which yield the above when compiled with `g++ -mcpu=power9 -g -std=gnu++14 -O3`.
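The GCC 10.2.0 listing can be modelled the same way. This sketch ASSUMES that f31 holds the shared min value, vs30 the shared max value and f14, f15, ... hold a[i] (inferred from which registers are reused across iterations), and again assumes the type-C semantics `XT = (XA < XB) ? XA : XB` for `xsmincdp`. The 8-lane loop mirrors the fixed trip count mentioned above; `clamp8`, `same_as_source` and the other names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// ASSUMED type-C semantics of xsmincdp XT,XA,XB: (XA < XB) ? XA : XB.
double xsmincdp(double xa, double xb) { return xa < xb ? xa : xb; }

// One step of the 10.2.0 sequence, under the assumed register mapping:
//   fcmpu cr0,f31,f14 ; fmr f21,f31 ; bgt skip ; xsmincdp vs21,vs30,vs14
double clamp_gcc102_step(double a, double lo, double hi) {
    double result = lo;           // fmr f21,f31
    if (lo > a) return result;    // bgt taken: a < lo, keep lo
    return xsmincdp(hi, a);       // hi < a ? hi : a -> a when a is NaN
}

// Fixed trip count of 8 with one min and one max shared by all lanes.
constexpr std::size_t kLanes = 8;

void clamp8(const double* a, double lo, double hi, double* c) {
    for (std::size_t i = 0; i < kLanes; ++i)
        c[i] = clamp_gcc102_step(a[i], lo, hi);
}

// Compares every lane with the source expression; NaN must stay NaN.
bool same_as_source(const double* a, double lo, double hi) {
    double c[kLanes];
    clamp8(a, lo, hi, c);
    for (std::size_t i = 0; i < kLanes; ++i) {
        const double want = a[i] < lo ? lo : (a[i] > hi ? hi : a[i]);
        const bool ok = std::isnan(want) ? std::isnan(c[i]) : c[i] == want;
        if (!ok) return false;
    }
    return true;
}

void check_nan_lanes() {
    const double nan = std::nan("");
    const double a[kLanes] = {-2.0, 0.0, 1.5, nan, 6.0, 7.0, nan, 3.0};
    assert(same_as_source(a, 0.0, 6.0));
}
```

Under these assumptions the 10.2.0 operand order (max as XA, a[i] as XB) lets a NaN in a[i] pass through, which would be consistent with the issue not reproducing there. This is an inference from the disassembly, not a confirmed analysis.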