From: "H.J. Lu"
Date: Fri, 10 Mar 2023 08:42:03 -0800
Subject: Re: [PATCH] x86-64: Add x87 fmod and remainder [BZ #30179]
To: Noah Goldstein
Cc: libc-alpha@sourceware.org
References: <20230309183312.205763-1-hjl.tools@gmail.com>

On Thu, Mar 9, 2023 at 6:36 PM Noah Goldstein wrote:
>
> On Thu, Mar 9, 2023 at 12:33 PM H.J. Lu via Libc-alpha
> wrote:
> >
> > X87 (fprem/fprem1) implementations of fmod and remainder are much faster
> > than generic fmod and remainder.  Add e_fmod.S, e_fmodf.S, e_remainder.S
> > and e_remainderf.S with fprem/fprem1.  This fixes BZ #30179.
> > ---
> >  sysdeps/x86_64/fpu/e_fmod.S       | 22 ++++++++++++++++++++++
> >  sysdeps/x86_64/fpu/e_fmodf.S      | 22 ++++++++++++++++++++++
> >  sysdeps/x86_64/fpu/e_remainder.S  | 22 ++++++++++++++++++++++
> >  sysdeps/x86_64/fpu/e_remainderf.S | 22 ++++++++++++++++++++++
> >  4 files changed, 88 insertions(+)
> >  create mode 100644 sysdeps/x86_64/fpu/e_fmod.S
> >  create mode 100644 sysdeps/x86_64/fpu/e_fmodf.S
> >  create mode 100644 sysdeps/x86_64/fpu/e_remainder.S
> >  create mode 100644 sysdeps/x86_64/fpu/e_remainderf.S
> >
> > diff --git a/sysdeps/x86_64/fpu/e_fmod.S b/sysdeps/x86_64/fpu/e_fmod.S
> > new file mode 100644
> > index 0000000000..4bdc8a1ab0
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/e_fmod.S
> > @@ -0,0 +1,22 @@
> > +/*
> > + * Public domain.
> > + */
> > +
> > +#include
> > +#include
> > +
> > +ENTRY(__ieee754_fmod)
> > +       movsd   %xmm0, -16(%rsp)
> > +       movsd   %xmm1, -8(%rsp)
> > +       fldl    -8(%rsp)
> > +       fldl    -16(%rsp)
> > +1:     fprem
> > +       fstsw   %ax
> > +       sahf
> > +       jp      1b
> For all functions can you replace `sahf; jp` with `testl $0x400, %eax; jnz`?

Yes.  SAHF isn't available for all x86-64 CPUs.

>
> > +       fstp    %st(1)
> > +       fstpl   -8(%rsp)
> > +       movsd   -8(%rsp), %xmm0
> > +       ret
> > +END (__ieee754_fmod)
> > +libm_alias_finite (__ieee754_fmod, __fmod)
> > diff --git a/sysdeps/x86_64/fpu/e_fmodf.S b/sysdeps/x86_64/fpu/e_fmodf.S
> > new file mode 100644
> > index 0000000000..6f76daff01
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/e_fmodf.S
> > @@ -0,0 +1,22 @@
> > +/*
> > + * Public domain.
> > + */
> > +
> > +#include
> > +#include
> > +
> > +ENTRY(__ieee754_fmodf)
> > +       movss   %xmm0, -8(%rsp)
> > +       movss   %xmm1, -4(%rsp)
> > +       flds    -4(%rsp)
> > +       flds    -8(%rsp)
> > +1:     fprem
> > +       fstsw   %ax
> > +       sahf
> > +       jp      1b
> > +       fstp    %st(1)
> > +       fstps   -4(%rsp)
> > +       movss   -4(%rsp), %xmm0
> > +       ret
> > +END (__ieee754_fmodf)
> > +libm_alias_finite (__ieee754_fmodf, __fmodf)
> > diff --git a/sysdeps/x86_64/fpu/e_remainder.S b/sysdeps/x86_64/fpu/e_remainder.S
> > new file mode 100644
> > index 0000000000..be2184f25a
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/e_remainder.S
> > @@ -0,0 +1,22 @@
> > +/*
> > + * Public domain.
> > + */
> > +
> > +#include
> > +#include
> > +
> > +ENTRY(__ieee754_remainder)
> > +       movsd   %xmm0, -16(%rsp)
> > +       movsd   %xmm1, -8(%rsp)
> > +       fldl    -8(%rsp)
> > +       fldl    -16(%rsp)
> > +1:     fprem1
> > +       fstsw   %ax
> > +       sahf
> > +       jp      1b
> > +       fstp    %st(1)
> > +       fstpl   -8(%rsp)
> > +       movsd   -8(%rsp), %xmm0
> > +       ret
> > +END (__ieee754_remainder)
> > +libm_alias_finite (__ieee754_remainder, __remainder)
> > diff --git a/sysdeps/x86_64/fpu/e_remainderf.S b/sysdeps/x86_64/fpu/e_remainderf.S
> > new file mode 100644
> > index 0000000000..42972d3f84
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/e_remainderf.S
> > @@ -0,0 +1,22 @@
> > +/*
> > + * Public domain.
> > + */
> > +
> > +#include
> > +#include
> > +
> > +ENTRY(__ieee754_remainderf)
> > +       movss   %xmm0, -8(%rsp)
> > +       movss   %xmm1, -4(%rsp)
> > +       flds    -4(%rsp)
> > +       flds    -8(%rsp)
> > +1:     fprem1
> > +       fstsw   %ax
> > +       sahf
> > +       jp      1b
> > +       fstp    %st(1)
> > +       fstps   -4(%rsp)
> > +       movss   -4(%rsp), %xmm0
> > +       ret
> > +END (__ieee754_remainderf)
> > +libm_alias_finite (__ieee754_remainderf, __remainderf)
> > --
> > 2.39.2
> >

--
H.J.
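
For reference, a small sketch of why the suggested `testl $0x400, %eax; jnz` should take the same branch as `sahf; jp`. It is a Python model of the flag logic, not part of the patch. It relies on two facts about x86: C2 is bit 10 of the x87 status word, and SAHF loads PF from bit 2 of AH, which is bit 10 of AX after `fstsw %ax`. The function names are hypothetical and chosen only for illustration.

```python
# Model of the two loop-exit tests discussed in the review.
# C2 (bit 10 of the x87 status word) is set by fprem/fprem1 while the
# reduction is still incomplete, so the loop must branch back when it is set.

C2_MASK = 0x400  # bit 10 of the status word as stored in AX by `fstsw %ax`


def parity_flag_after_sahf(status_word: int) -> bool:
    """Model `fstsw %ax; sahf`: SAHF copies bit 2 of AH into PF."""
    ah = (status_word >> 8) & 0xFF
    return bool(ah & 0x04)


def c2_via_test(status_word: int) -> bool:
    """Model `fstsw %ax; testl $0x400, %eax`: jnz is taken when bit 10 is set."""
    return (status_word & C2_MASK) != 0


def conditions_agree() -> bool:
    """Check every 16-bit status word: `jp` and `jnz` branch identically."""
    return all(
        parity_flag_after_sahf(sw) == c2_via_test(sw)
        for sw in range(0x10000)
    )
```

Under this model, calling `conditions_agree()` checks all 65536 possible status words, so the replacement only drops the dependency on SAHF and leaves the loop condition unchanged.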