From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) by sourceware.org (Postfix) with ESMTP id 8FC533858D28 for ; Thu, 6 Apr 2023 20:39:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8FC533858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 336Kc11e027032; Thu, 6 Apr 2023 15:38:01 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id 336Kc0aP027031; Thu, 6 Apr 2023 15:38:00 -0500 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Thu, 6 Apr 2023 15:37:59 -0500 From: Segher Boessenkool To: Michael Meissner , gcc-patches@gcc.gnu.org, "Kewen.Lin" , David Edelsohn , Peter Bergner , Will Schmidt , chip.kerchner@ibm.com Subject: Re: PR target/70243: Do not generate fmaddfp and fnmsubfp Message-ID: <20230406203759.GK25951@gate.crashing.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,KAM_MANYTO,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi! On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote: > The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors Those are not existing instructions. You mean "vmaddfp" etc. > than the VSX xvmaddsp and xvnmsubsp instructions. In particular, generating > these instructions seems to break Eigen. Those instructions use round-to-nearest-tiea-to-even, like all other VMX FP insns. A proper patch has to deal with all VMX FP insns. But, almost all programs expect that rounding mode anyway, so this is not a problem in practice. What happened on Eigen is that the Linux kernel starts every new process with VSCR[NJ]=1, breaking pretty much everything that wants floating point for non-toy purposes. (There currently is a bug on LE that sets the wrong bit, hiding the problem in that configuration, but it is intended there as well). > GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems > as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions. The > advantage of the Altivec instructions is that they are 4 operand instructions > (i.e. the target register does not have to overlap with one of the input > registers). The advantage is it can eliminate an extra move instruction. The > disadvantage is it does round the same was as the VSX instructions. And it gets the VSCR[NJ] setting applied. Yup. > This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp > instructions as alternatives in the VSX instruction insn support, and in the > Altivec insns it adds a test to prevent the insn from being used if VSX is > available. I also added a test to the regression test suite. Please leave the latter out, it does not belong in this patch. If you want a patch to do that deal with *all* VMX FP insns? There also are add, sub, mul, etc. Well I think those (as well as madd and nmsub) are the only ones that use the NJ bit or the RN bits, but please check. > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -750,12 +750,15 @@ (define_insn "altivec_vsel4" > > ;; Fused multiply add. > > +;; If we are using VSX instructions, do not generate the vmaddfp instruction > +;; since is has different rounding behavior than the xvmaddsp instruction. > + No blank lines please. > (define_insn "*altivec_fmav4sf4" > [(set (match_operand:V4SF 0 "register_operand" "=v") > (fma:V4SF (match_operand:V4SF 1 "register_operand" "v") > (match_operand:V4SF 2 "register_operand" "v") > (match_operand:V4SF 3 "register_operand" "v")))] > - "VECTOR_UNIT_ALTIVEC_P (V4SFmode)" > + "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX" This is very error-prone. Maybe add a test to the VECTOR_UNIT_ALTIVEC macro instead? > -;; Fused vector multiply/add instructions. Support the classical Altivec > -;; versions of fma, which allows the target to be a separate register from the > -;; 3 inputs. Under VSX, the target must be either the addend or the first > -;; multiply. > +;; Fused vector multiply/add instructions. Do not use the classical Altivec (Two spaces after dot, and AltiVec is spelled with a capital V. I don't like it either, VMX is a much nicer and more regular name). > +;; versions of fma. Those instructions allows the target to be a separate > +;; register from the 3 inputs, but they have different rounding behaviors. > > (define_insn "*vsx_fmav4sf4" > - [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v") > + [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa") > (fma:V4SF > - (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v") > - (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v") > - (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))] > + (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa") > + (match_operand:V4SF 2 "vsx_register_operand" "wa,0") > + (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))] > "VECTOR_UNIT_VSX_P (V4SFmode)" > "@ > xvmaddasp %x0,%x1,%x2 > - xvmaddmsp %x0,%x1,%x3 > - vmaddfp %0,%1,%2,%3" > + xvmaddmsp %x0,%x1,%x3" > [(set_attr "type" "vecfloat")]) So this part looks okay, and it alone is safe for GCC 13 as well. > (define_insn "*vsx_nfmsv4sf4" > - [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v") > + [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa") > (neg:V4SF > (fma:V4SF > - (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v") > - (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v") > + (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa") > + (match_operand:V4SF 2 "vsx_register_operand" "wa,0") > (neg:V4SF > - (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))] > + (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))] > "VECTOR_UNIT_VSX_P (V4SFmode)" > "@ > xvnmsubasp %x0,%x1,%x2 > - xvnmsubmsp %x0,%x1,%x3 > - vnmsubfp %0,%1,%2,%3" > + xvnmsubmsp %x0,%x1,%x3" > [(set_attr "type" "vecfloat")]) Well, together with this of course :-) Could you please do that? Segher