Date: Tue, 12 Oct 2021 14:35:57 -0500
From: "Paul A. Clarke"
To: Segher Boessenkool
Cc: gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com
Subject: Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics
Message-ID: <20211012193557.GN243632@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
In-Reply-To: <20211011220412.GD10333@gate.crashing.org>

On Mon, Oct 11, 2021 at 05:04:12PM -0500, Segher Boessenkool wrote:
> On Mon, Oct 11, 2021 at 12:31:07PM -0500, Paul A. Clarke wrote:
> > On Mon, Oct 11, 2021 at 11:28:39AM -0500, Segher Boessenkool wrote:
> > > > Very similar methods are used in glibc today. Are those broken?
> > >
> > > Maybe.
> >
> > Ouch.
>
> So show the code?

You asked for it. ;-) Boiled down to remove macroisms and code that
should be removed by optimization:
--
static __inline __attribute__ ((__always_inline__)) void
libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
{
  fenv_union_t old;
  register fenv_union_t __fr;
  /* Save the FPSCR control bits and set RN to the constant r.  */
  __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
  ctx->env = old.fenv = __fr.fenv;
  /* Remember whether the rounding mode actually changed.  */
  ctx->updated_status = (r != (old.l & 3));
}

static __inline __attribute__ ((__always_inline__)) void
libc_feresetround_ppc (fenv_t *envp)
{
  fenv_union_t new = { .fenv = *envp };
  register fenv_union_t __fr;
  /* Restore only the saved RN field (low two bits).  */
  __fr.l = new.l & 3;
  __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
}

/* Cleanup handler: restore the rounding mode only if it was changed.  */
static __inline __attribute__ ((__always_inline__)) void
libc_feresetround_ppc_ctx (struct rm_ctx *ctx)
{
  if (ctx->updated_status)
    libc_feresetround_ppc (&ctx->env);
}

double
__sin (double x)
{
  double retval;
  struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
  libc_feholdsetround_ppc_ctx (&ctx, (0));
  /* floating point intensive code, computing retval.  */
  return retval;
}
--
There's not much to it, really: "mffscrni" on the way in to save the
current rounding mode and set the required one, and "mffscrn" on the way
out to restore it.

> > > If you get a real (i.e. not inline) function call there, that
> > > can save you often.
> >
> > Calling a real function in order to execute a single instruction is
> > sub-optimal. ;-)
>
> Calling a real function (that does not even need a stack frame, just a
> blr) is not terribly expensive, either. Not ideal, better would be better.
>
> > > > Would creating a __builtin_mffsce be another solution?
> > >
> > > Yes. And not a bad idea in the first place.
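To make the save/compare/restore logic of the glibc snippet above concrete on any host, here is a minimal sketch that models the FPSCR rounding-mode (RN) field with an ordinary variable. Everything named demo_* and fake_fpscr is hypothetical; the mffscrn stand-in only loosely mimics the real instruction (it returns the whole fake register), and nothing here touches the actual FP environment or addresses the reordering question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the FPSCR.  Only the low two bits (RN) are
   modeled: 0 = to nearest, 1 = toward zero, 2 = toward +Inf,
   3 = toward -Inf.  */
static uint64_t fake_fpscr;

#define RN_MASK 3u

/* Hypothetical analogue of glibc's struct rm_ctx.  */
struct demo_rm_ctx
{
  uint64_t env;         /* register contents saved on entry */
  bool updated_status;  /* true if RN was actually changed */
};

/* Loose analogue of mffscrn/mffscrni: return the old register contents
   and install a new RN value.  */
static uint64_t
demo_mffscrn (uint64_t rn)
{
  uint64_t old = fake_fpscr;
  fake_fpscr = (fake_fpscr & ~(uint64_t) RN_MASK) | (rn & RN_MASK);
  return old;
}

/* Save the old mode, set r, and note whether that was a change.  */
static void
demo_feholdsetround_ctx (struct demo_rm_ctx *ctx, int r)
{
  ctx->env = demo_mffscrn ((uint64_t) r);
  ctx->updated_status = ((uint64_t) r != (ctx->env & RN_MASK));
}

/* Cleanup handler: restore the saved RN only if it was changed.  */
static void
demo_feresetround_ctx (struct demo_rm_ctx *ctx)
{
  if (ctx->updated_status)
    demo_mffscrn (ctx->env & RN_MASK);
}

/* Returns the RN value seen inside the guarded region.  The return value
   is computed before the cleanup handler runs at scope exit.  */
static int
demo_guarded_region (int r)
{
  struct demo_rm_ctx ctx __attribute__ ((cleanup (demo_feresetround_ctx)));
  demo_feholdsetround_ctx (&ctx, r);
  /* Floating-point-intensive code would run here.  */
  return (int) (fake_fpscr & RN_MASK);
}
```

For example, with fake_fpscr set to 0xF3, demo_guarded_region (0) sees RN = 0 inside the region, and fake_fpscr is back to 0xF3 afterwards; if the requested mode already matches, the exit path does nothing.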
> >
> > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > difference between "asm" and builtin, how does using a builtin solve the
> > problem?
>
> You will have to make the builtin solve it. What a builtin can do is
> virtually unlimited. What an asm can do is not: it just outputs some
> assembler language, and does in/out/clobber constraints. You can do a
> *lot* with that, but it is much more limited than everything you can do
> in the compiler! :-)
>
> The fact remains that there is no way in RTL (or Gimple for that matter)
> to express things like rounding mode changes. You will need to
> artificially make some barriers.

I know there is __builtin_set_fpscr_rn, which generates mffscrn. The code
above doesn't use it because, I believe, it first appeared in GCC 9.1 or
so, and glibc still supports GCC 6.2. Also, it doesn't return a value,
which would be handy in this case.

Does the implementation of that builtin meet the requirements here, i.e.,
does it prevent FP computation from being reordered across uses of the
builtin? If not, is there a model on which to base an implementation of
__builtin_mffsce (or some preferred name)?

PC
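On the "artificial barriers" point: one generic workaround, not necessarily what a builtin would do, is to make the FP values themselves read-write operands of the asm statements that switch modes, so that data dependences pin the computation between them. A minimal sketch, assuming GCC-style extended asm; FENCE_ENTER, FENCE_EXIT and fenced_product are hypothetical names, and the empty asm bodies stand in for the real mffscrni/mffscrn. Only computation that depends on the fenced values is pinned; unrelated FP operations can still move.

```c
/* Hypothetical helpers, not an existing API.  Each empty asm statement
   takes its operands as read-write ("+m"), so the compiler must assume
   the asm may have changed them.  In real POWER code the mode switch
   itself would sit inside these same asm statements, probably with FP
   register constraints (e.g. "f") instead of "+m".  */
#define FENCE_ENTER(a, b) __asm__ __volatile__ ("" : "+m" (a), "+m" (b))
#define FENCE_EXIT(r)     __asm__ __volatile__ ("" : "+m" (r))

static double
fenced_product (double a, double b)
{
  /* The required rounding mode would be set here.  The multiplication
     below reads the values this asm may have modified, so it cannot be
     hoisted above it.  */
  FENCE_ENTER (a, b);

  double r = a * b;

  /* The rounding mode would be restored here.  r is an input, so it must
     be computed first; the value returned is this asm's output, so the
     compiler cannot recompute a * b after the restore.  */
  FENCE_EXIT (r);

  return r;
}
```

The "+m" operands force a store and reload on every fence, which a real builtin or register-constrained asm would avoid; the sketch only illustrates the dependence structure.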