Date: Thu, 7 Oct 2021 18:39:06 -0500
From: Segher Boessenkool
To: "Paul A. Clarke"
Cc: gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com
Subject: Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics
Message-ID: <20211007233906.GQ10333@gate.crashing.org>
In-Reply-To: <20210823190310.1679905-2-pc@us.ibm.com>

On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A.
Clarke wrote:
> No attempt is made to optimize writing the FPSCR (by checking if the new
> value would be the same), other than using lighter weight instructions
> when possible.

__builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
automatically, fwiw.

> Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> convert _mm_ceil* and _mm_floor* into macros. This matches the current
> analogous implementations in config/i386/smmintrin.h.

Hrm. Using function-like macros is begging for trouble, as usual. But
the x86 version does this, so meh.

> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_pd (__m128d __A, int __rounding)
> +{
> +  __v2df __r;
> +  union {
> +    double __fr;
> +    long long __fpscr;
> +  } __enables_save, __fpscr_save;
> +
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Save enabled exceptions, disable all exceptions,
> +         and preserve the rounding mode.  */
> +#ifdef _ARCH_PWR9
> +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));

The __volatile__ likely does not do what you want. As far as I can see
you do not want one here anyway? "volatile" does not order asm wrt fp
insns, which you likely *do* want.

> +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };

You put spaces after only some casts, btw? Well, maybe I found the one
place you did it wrong, heh :-)

And you can avoid having so many parens by making extra variables, which
is much more readable.

> +  switch (__rounding)

You do not need any of that __ either.

> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */

"dg-do run" requires vsx_hw, not just vsx_ok. Testing on a machine
without VSX (so before p7) would have shown that, but do you have access
to any?
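For reference, the testcase header with only the effective-target line changed would look like this (a sketch; vsx_hw is the effective-target keyword named above, and the other two directives are unchanged from the patch):

```c
/* { dg-do run } */
/* { dg-require-effective-target vsx_hw } */
/* { dg-options "-O2 -mvsx" } */
```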
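On the asm ordering point: one common way to get ordering is to make the dependency explicit through the asm's operands instead of relying on volatile. The sketch below is hypothetical and is not the fix the patch must adopt: add_after_barrier is an invented name, the empty template stands in for the real FPSCR instruction, and "+m" is used only so it compiles on any GCC target (in the rs6000 header a floating-point or VSX register constraint would be the natural choice). It only orders the asm against code that uses the named operand, not against every fp insn.

```c
#include <assert.h>

/* Hypothetical example.  The empty asm template stands in for an
   instruction that changes floating-point state (such as mffsce).
   Because "+m" tells the compiler the asm both reads and writes 'a',
   the value of 'a' must be computed before the asm, and the addition
   below, which reads 'a', cannot be scheduled above it.  A bare
   volatile on an asm with no such operands gives no ordering relative
   to unrelated fp arithmetic.  */
static double
add_after_barrier (double a, double b)
{
  __asm__ ("" : "+m" (a));
  return a + b;
}
```

The design choice is that ordering comes from data flow the compiler can see, which also lets the volatile go away.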
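A small sketch of the "extra variables" suggestion for the quoted __r initializer, written outside the real header: m128d, v2df, merge_original, merge_readable and same_v2df are hypothetical names built on the GCC/Clang vector extension, and the leading __ is dropped only because this is not a system header (smmintrin.h itself must keep reserved names).

```c
#include <assert.h>

/* Hypothetical stand-ins for the header's __m128d and __v2df types.  */
typedef double m128d __attribute__ ((__vector_size__ (16), __may_alias__));
typedef double v2df __attribute__ ((__vector_size__ (16)));

/* Shape of the quoted code: each element access needs its own cast
   wrapped in parentheses.  Result is { b[0], a[1] }.  */
static v2df
merge_original (m128d a, m128d b)
{
  v2df r = { ((v2df) b)[0], ((v2df) a)[1] };
  return r;
}

/* Same result, with one cast per operand hoisted into a named variable,
   so the initializer needs no extra parentheses.  */
static v2df
merge_readable (m128d a, m128d b)
{
  v2df va = (v2df) a;
  v2df vb = (v2df) b;
  v2df r = { vb[0], va[1] };
  return r;
}

/* Element-wise equality, used to check that both forms agree.  */
static int
same_v2df (v2df x, v2df y)
{
  return x[0] == y[0] && x[1] == y[1];
}
```

Hoisting the casts also makes the cast spacing consistent for free, since each cast appears exactly once.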
A wrong effective-target like this is one of those things we are only
told about a year after it was added, because no one who tests often
does so on hardware that old :-)

So, okay for trunk (and backports after some burn-in) with that vsx_ok
fixed. That asm needs fixing, but you can do that later.

Thanks!


Segher