From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 55A24385740A for ; Thu, 19 Aug 2021 18:16:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 55A24385740A Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17JI7aDv063334; Thu, 19 Aug 2021 14:16:20 -0400 Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0a-001b2d01.pphosted.com with ESMTP id 3ahsjcmumh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 19 Aug 2021 14:16:20 -0400 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17JICAV3023819; Thu, 19 Aug 2021 18:16:19 GMT Received: from b01cxnp23034.gho.pok.ibm.com (b01cxnp23034.gho.pok.ibm.com [9.57.198.29]) by ppma02dal.us.ibm.com with ESMTP id 3aeexygqsp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 19 Aug 2021 18:16:19 +0000 Received: from b01ledav002.gho.pok.ibm.com (b01ledav002.gho.pok.ibm.com [9.57.199.107]) by b01cxnp23034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17JIGISY20316636 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 19 Aug 2021 18:16:18 GMT Received: from b01ledav002.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8756C124069; Thu, 19 Aug 2021 18:16:18 +0000 (GMT) Received: from b01ledav002.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 38508124058; Thu, 19 Aug 2021 18:16:18 +0000 (GMT) Received: from li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com (unknown [9.160.180.244]) by b01ledav002.gho.pok.ibm.com (Postfix) with ESMTPS; Thu, 19 Aug 2021 18:16:18 +0000 (GMT) Date: Thu, 19 Aug 2021 13:16:16 -0500 From: "Paul A. Clarke" To: Segher Boessenkool Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 1/6] rs6000: Support SSE4.1 "round" intrinsics Message-ID: <20210819181616.GA1113121@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com> References: <20210809202355.568303-1-pc@us.ibm.com> <20210809202355.568303-2-pc@us.ibm.com> <20210818224658.GL1583@gate.crashing.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210818224658.GL1583@gate.crashing.org> User-Agent: Mutt/1.10.1 (2018-07-13) X-TM-AS-GCONF: 00 X-Proofpoint-GUID: _nkMc1yTRtIq36z4sAPUjTgoN8tnTHy_ X-Proofpoint-ORIG-GUID: _nkMc1yTRtIq36z4sAPUjTgoN8tnTHy_ X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-19_06:2021-08-17, 2021-08-19 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 adultscore=0 mlxlogscore=999 clxscore=1015 suspectscore=0 bulkscore=0 spamscore=0 phishscore=0 impostorscore=0 lowpriorityscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108190105 X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Aug 2021 18:16:24 -0000 On Wed, Aug 18, 2021 at 05:46:58PM -0500, Segher Boessenkool wrote: > On Mon, Aug 09, 2021 at 03:23:50PM -0500, Paul A. Clarke wrote: > > Suppress exceptions (when specified), by saving, manipulating, and > > restoring the FPSCR. Similarly, save, set, and restore the floating-point > > rounding mode when required. > > > > No attempt is made to optimize writing the FPSCR (by checking if the new > > value would be the same), other than using lighter weight instructions > > when possible. > > There are __builtin_set_fpscr_rn and friends, please use those, those > are optimised for any platform. I do. (Unless I missed an opportunity somewhere?) The "optimize" comment refers to, for example, not checking the current rounding mode before setting and restoring it. > > * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, > > _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss): > > Convert from function to macro. > > Please explain why you regress this (not in the changelog of course). I'm not sure what "regress" means here? I should've said that these are now identical implementations to those found in config/i386/smmintrin.h. I'll add that to the commit message in v2. > > +/* Rounding mode macros. */ > > +#define _MM_FROUND_TO_NEAREST_INT 0x00 > > +#define _MM_FROUND_TO_ZERO 0x01 > > +#define _MM_FROUND_TO_POS_INF 0x02 > > +#define _MM_FROUND_TO_NEG_INF 0x03 > > +#define _MM_FROUND_CUR_DIRECTION 0x04 > > You can just write "0" .. "4", heh. Copied from config/i386/smmintrin.h. > > +#define _MM_FROUND_NINT \ > > + (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC) > > +#define _MM_FROUND_FLOOR \ > > + (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC) > > +#define _MM_FROUND_CEIL \ > > + (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC) > > +#define _MM_FROUND_TRUNC \ > > + (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC) > > +#define _MM_FROUND_RINT \ > > + (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC) > > +#define _MM_FROUND_NEARBYINT \ > > + (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) > > All these macro definitions will comfortably fit on one line. Copied from config/i386/smmintrin.h. > > +__inline __m128d > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > > +_mm_round_pd (__m128d __A, int __rounding) > > +{ > > Non-static inline is not what you want, esp. with gnu-inline? Or, what > is the goal, and why can you not do it with modern inline? This is the same basic signature as the other 600+ intrinsics. Actually, they were all described as "extern", but in a previous review, you said: > "extern" on definitions is superfluous So, I've dropped that for newer ones. Should they all instead be "static"? The goal is to be compatible with the i386 implementations. Those typically use something like: extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) (which kinda makes me want to put "extern" back, now that I think about it). I'm not sure what you mean by "modern inline". > > + __v2df __r; > > + union { > > + double __fr; > > + long long __fpscr; > > + } __save, __tmp; > > + > > + if (__rounding & _MM_FROUND_NO_EXC) > > + { > > Wrong indent. This code is very hard to read because of that. OK, will fix in v2. > If you figure that gee, it would be a nice if we had a builtin for > mffsce, then please make one? :-) Is one use-case sufficient grounds? I can give it a shot if so. > > + case _MM_FROUND_TO_NEAREST_INT: > > + __tmp.__fr = __builtin_mffsl (); > > + __attribute__((fallthrough)); > > Space before (. OK > > + case _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC: > > Space after |. OK > Please fix these things and resend. Will do. Thanks! PC