From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <664e08724df70b49d256c26d0a01000ca956a6da.camel@xry111.site>
Subject: Re: [PATCH 2/2] MIPS: Hard-float rounding instructions support
From: Xi Ruoyao <xry111@xry111.site>
To: Junxian Zhu <zhujunxian@oss.cipunited.com>
Cc: libc-alpha@sourceware.org
Date: Tue, 02 Jan 2024 17:57:29 +0800
In-Reply-To: <ae79a262-8b36-424f-bcdf-304571193035@oss.cipunited.com>
References: <20231225103548.1615-2-zhujunxian@oss.cipunited.com>
	 <20231225103548.1615-4-zhujunxian@oss.cipunited.com>
	 <c7d09b8a9bee6380e8579715deca6a5ce375be9e.camel@xry111.site>
	 <c9cc801e-742f-4631-9a34-481eec4e146c@oss.cipunited.com>
	 <a8ceea9a3fa18d404c3ec356970ef2b275daedf9.camel@xry111.site>
	 <ae79a262-8b36-424f-bcdf-304571193035@oss.cipunited.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
User-Agent: Evolution 3.50.2 
MIME-Version: 1.0
List-Id: <libc-alpha.sourceware.org>

On Tue, 2024-01-02 at 17:43 +0800, Junxian Zhu wrote:
> On 2023/12/26 16:29, Xi Ruoyao wrote:
> > On Tue, 2023-12-26 at 10:37 +0800, Junxian Zhu wrote:
> > > On 2023/12/25 18:51, Xi Ruoyao wrote:
> > > > On Mon, 2023-12-25 at 18:35 +0800, Junxian Zhu wrote:
> > > >
> > > > /* snip */
> > > >
> > > > > +/*
> > > > > + * ceil(x)
> > > > > + * Return x rounded toward +inf to integral value
> > > > > + * Method:
> > > > > + *	Bit twiddling.
> > > > > + */
> > > > > +
> > > > > +#if ((__mips_fpr == 64) && (__mips_hard_float == 1) && ((__mips == 32 && __mips_isa_rev > 1) || __mips == 64))
> > > > > +#include <sys/regdef.h>
> > > > > +#include <sysdep.h>
> > > > > +#include <libm-alias-double.h>
> > > > > +
> > > > > +ENTRY(__ceil)
> > > > > +	.set push
> > > > > +	.set noreorder
> > > > > +	.set noat
> > > > > +# $f0=ret, $f12=double, a0=int64/int32_h, a1=int32_l, a2=sign, a3=exp
> > > > > +#if __mips == 64
> > > > > +	dmfc1   a0, $f12 # assign int64
> > > > > +#else
> > > > > +	mfhc1   a0, $f12 # assign int64
> > > > > +#endif
> > > > > +	cfc1    t0, $f26
> > > > > +	ceil.l.d    $f0, $f12
> > > > No, C23 does not allow this function to raise an INEXACT exception,
> > > > but ceil.l.d will do so.
> > > >
> > > > Such optimizations should be performed in GCC which can be
> > > > controlled by the programmer with -std=c23 and/or
> > > > -f[no-]fp-int-builtin-inexact, not in Glibc where we cannot know if
> > > > the programmer wants to deviate from C23.
> > > The cfc1 instruction backs up the floating-point exception status
> > > before ceil.l.d runs, and the following ctc1 restores it, so the
> > > INEXACT exception raised by ceil.l.d is discarded. This is the same
> > > approach as the one used in s_ceil.S for i386.
> > Still incorrect because when the Enable field of FCSR contains INEXACT
> > a SIGFPE will be immediately delivered and there is no way to recover.
> > A demonstration:
> >
> > #define _GNU_SOURCE
> > #include <stdio.h>
> > #include <fenv.h>
> >=20
> > int main()
> > {
> >    printf("%d\n", feenableexcept(FE_INEXACT));
> >
> >    double data = 114.514;
> >    long control;
> >    asm("cfc1\t%1,$f26\n\t"
> >        "ceil.l.d\t%0,%0\n\t"
> >        "cvt.d.l\t%0,%0\n\t"
> >        "ctc1\t%1,$f26": "+f"(data), "=r"(control));
> >    printf("%.15f\n", data);
> >    return 0;
> > }
> >
> > On i386 the fnstenv instruction also masks out all the FP exceptions so
> > this is not a problem.  See commit 26b0bf96000a.
>
> I can use "ctc1 $0, $28" to disable all floating-point exceptions, so
> that none can occur here. But that will introduce additional overhead.

And then it will likely be even slower than the generic implementation,
as Adhemerval already tested on the cfarm machine.  Frankly, I'm not even
sure your (incorrect) implementation is really faster than the generic
one: if the uarch handles all ctc1 instructions the same way (i.e. always
stalling the FP unit, or even the entire CPU, for a dozen cycles), it
would already be slower.

Have you benchmarked this on real hardware?  Note that benchmarking on
things like QEMU can be completely misleading.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University