Message-ID: <664e08724df70b49d256c26d0a01000ca956a6da.camel@xry111.site>
Subject: Re: [PATCH 2/2] MIPS: Hard-float rounding instructions support
From: Xi Ruoyao
To: Junxian Zhu
Cc: libc-alpha@sourceware.org
Date: Tue, 02 Jan 2024 17:57:29 +0800
References: <20231225103548.1615-2-zhujunxian@oss.cipunited.com> <20231225103548.1615-4-zhujunxian@oss.cipunited.com>

On Tue, 2024-01-02 at 17:43 +0800, Junxian Zhu wrote:
> On 2023/12/26 16:29, Xi Ruoyao wrote:
> > On Tue, 2023-12-26 at 10:37 +0800, Junxian Zhu wrote:
> > > On 2023/12/25 18:51, Xi Ruoyao wrote:
> > > > On Mon, 2023-12-25 at 18:35 +0800, Junxian Zhu wrote:
> > > >
> > > > /* snip */
> > > >
> > > > > +/*
> > > > > + * ceil(x)
> > > > > + * Return x rounded toward -inf to integral value
> > > > > + * Method:
> > > > > + *	Bit twiddling.
> > > > > + */
> > > > > +
> > > > > +#if ((__mips_fpr == 64) && (__mips_hard_float == 1) && ((__mips == 32 && __mips_isa_rev > 1) || __mips == 64))
> > > > > +#include
> > > > > +#include
> > > > > +#include
> > > > > +
> > > > > +ENTRY(__ceil)
> > > > > +	.set push
> > > > > +	.set noreorder
> > > > > +	.set noat
> > > > > +# $f0=ret, $f12=double, a0=int64/int32_h, a1=int32_l, a2=sign, a3=exp
> > > > > +#if __mips == 64
> > > > > +	dmfc1   a0, $f12 # assign int64
> > > > > +#else
> > > > > +	mfhc1   a0, $f12 # assign int64
> > > > > +#endif
> > > > > +	cfc1    t0, $f26
> > > > > +	ceil.l.d    $f0, $f12
> > > > No, C23 does not allow this function to raise an INEXACT exception, but
> > > > ceil.l.d will do so.
> > > >
> > > > Such optimizations should be performed in GCC which can be controlled by
> > > > the programmer with -std=c23 and/or -f[no-]fp-int-builtin-inexact, not
> > > > in Glibc where we cannot know if the programmer wants to deviate from
> > > > C23.
> > > The cfc1 instruction will backup float point exception status before
> > > running ceil.l.d, and the following ctc1 will restore float point
> > > exception status to avoid INEXACT exception raised by ceil.l.d. It's the
> > > same way like what have been done in s_ceil.S for i386.
> > Still incorrect because when the Enable field of FCSR contains INEXACT a
> > SIGFPE will be immediately delivered and there is no way to recover.  A
> > demonstration:
> >
> > #define _GNU_SOURCE
> > #include <stdio.h>
> > #include <fenv.h>
> >
> > int main()
> > {
> >    printf("%d\n", feenableexcept(FE_INEXACT));
> >
> >    double data = 114.514;
> >    long control;
> >    asm("cfc1\t%1,$f26\n\t"
> >        "ceil.l.d\t%0,%0\n\t"
> >        "cvt.d.l\t%0,%0\n\t"
> >        "ctc1\t%1,$f26": "+f"(data), "=r"(control));
> >    printf("%.15f\n", data);
> >    return 0;
> > }
> >
> > On i386 the fnstenv instruction also masks out all the FP exceptions so
> > this is not a problem.  See commit 26b0bf96000a.
>
> I can use "ctc1 $0, $28" to disable all float point exception to ensure
> no FP exceptions occur at here. But it will introduce additional
> consumption.

And then it will likely be even slower than the generic implementation,
as Adhemerval already tested on the cfarm machine.

Frankly, I'm not even sure that your (incorrect) implementation is really
faster than the generic implementation: if the uarch handles all ctc1
instructions equally (i.e. always stalls the FP unit, or even the entire
CPU, for a dozen cycles), it would already be slower.  Have you
benchmarked this on real hardware?  Note that benchmarking on things
like QEMU can be completely misleading.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University