From: 謝昇達 (Shen-Ta Hsieh)
Date: Fri, 23 Sep 2022 18:50:10 +0800
Subject: Re: [PATCH v8 3/4] x86_64: roundeven with sse4.1 support
To: Noah Goldstein
Cc: Sunil Pandey, "H.J. Lu", Libc-stable Mailing List, Florian Weimer, GNU C Library
References: <20210623222846.2162301-1-hjl.tools@gmail.com> <20210623222846.2162301-4-hjl.tools@gmail.com>

Fine by me, too

On Wed, Sep 14, 2022 at 9:25 AM Noah Goldstein wrote:
>
> On Sun, Sep 11, 2022 at 1:09 PM Sunil Pandey via Libc-stable wrote:
> >
> > On Wed, Apr 27, 2022 at 5:11 PM Sunil Pandey wrote:
> > >
> > > On Wed, Jun 23, 2021 at 3:32 PM H.J. Lu via Libc-alpha wrote:
> > > >
> > > > From: Shen-Ta Hsieh
> > > >
> > > > This patch adds support for the sse4.1 hardware floating point
> > > > roundeven.
> > > >
> > > > Here is some benchmark results on my systems:
> > > >
> > > > =AMD Ryzen 9 3900X 12-Core Processor=
> > > >
> > > > * benchmark result before this commit
> > > > |            | roundeven    | roundevenf   |
> > > > |------------|--------------|--------------|
> > > > | duration   | 3.75587e+09  | 3.75114e+09  |
> > > > | iterations | 3.93053e+08  | 4.35402e+08  |
> > > > | max        | 52.592       | 58.71        |
> > > > | min        | 7.98         | 7.22         |
> > > > | mean       | 9.55563      | 8.61535      |
> > > >
> > > > * benchmark result after this commit
> > > > |            | roundeven    | roundevenf   |
> > > > |------------|--------------|--------------|
> > > > | duration   | 3.73815e+09  | 3.73738e+09  |
> > > > | iterations | 5.82692e+08  | 5.91498e+08  |
> > > > | max        | 56.468       | 51.642       |
> > > > | min        | 6.27         | 6.156        |
> > > > | mean       | 6.41532      | 6.3185       |
> > > >
> > > > =Intel(R) Pentium(R) CPU D1508 @ 2.20GHz=
> > > >
> > > > * benchmark result before this commit
> > > > |            | roundeven    | roundevenf   |
> > > > |------------|--------------|--------------|
> > > > | duration   | 2.18208e+09  | 2.18258e+09  |
> > > > | iterations | 2.39932e+08  | 2.46924e+08  |
> > > > | max        | 96.378       | 98.035       |
> > > > | min        | 6.776        | 5.94         |
> > > > | mean       | 9.09456      | 8.83907      |
> > > >
> > > > * benchmark result after this commit
> > > > |            | roundeven    | roundevenf   |
> > > > |------------|--------------|--------------|
> > > > | duration   | 2.17415e+09  | 2.17005e+09  |
> > > > | iterations | 3.56193e+08  | 4.09824e+08  |
> > > > | max        | 51.693       | 97.192       |
> > > > | min        | 5.926        | 5.093        |
> > > > | mean       | 6.10385      | 5.29507      |
> > > >
> > > > Signed-off-by: Shen-Ta Hsieh
> > > > Reviewed-by: H.J. Lu
> > > > ---
> > > >  sysdeps/x86_64/fpu/multiarch/Makefile         |  5 +--
> > > >  sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c  |  2 ++
> > > >  .../x86_64/fpu/multiarch/s_roundeven-sse4_1.S | 24 ++++++++++++++
> > > >  sysdeps/x86_64/fpu/multiarch/s_roundeven.c    | 31 +++++++++++++++++++
> > > >  sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c |  3 ++
> > > >  .../fpu/multiarch/s_roundevenf-sse4_1.S       | 24 ++++++++++++++
> > > >  sysdeps/x86_64/fpu/multiarch/s_roundevenf.c   | 31 +++++++++++++++++++
> > > >  7 files changed, 118 insertions(+), 2 deletions(-)
> > > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> > > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> > > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> > > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> > > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> > > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> > > >
> > > > diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
> > > > index 57892c56bb..d425ffd6d3 100644
> > > > --- a/sysdeps/x86_64/fpu/multiarch/Makefile
> > > > +++ b/sysdeps/x86_64/fpu/multiarch/Makefile
> > > > @@ -1,11 +1,12 @@
> > > >  ifeq ($(subdir),math)
> > > >  libm-sysdep_routines += s_floor-c s_ceil-c s_floorf-c s_ceilf-c \
> > > >                          s_rint-c s_rintf-c s_nearbyint-c s_nearbyintf-c \
> > > > -                        s_trunc-c s_truncf-c
> > > > +                        s_roundeven-c s_roundevenf-c s_trunc-c s_truncf-c
> > > >
> > > >  libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
> > > >                          s_floorf-sse4_1 s_nearbyint-sse4_1 \
> > > > -                        s_nearbyintf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
> > > > +                        s_nearbyintf-sse4_1 s_roundeven-sse4_1 \
> > > > +                        s_roundevenf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
> > > >                          s_trunc-sse4_1 s_truncf-sse4_1
> > > >
> > > >  libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
> > > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> > > > new file mode 100644
> > > > index 0000000000..c7be43cb22
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> > > > @@ -0,0 +1,2 @@
> > > > +#define __roundeven __roundeven_c
> > > > +#include
> > > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> > > > new file mode 100644
> > > > index 0000000000..6ae8f6b1d3
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> > > > @@ -0,0 +1,24 @@
> > > > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > > > +   This file is part of the GNU C Library.
> > > > +
> > > > +   The GNU C Library is free software; you can redistribute it and/or
> > > > +   modify it under the terms of the GNU Lesser General Public
> > > > +   License as published by the Free Software Foundation; either
> > > > +   version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > > +   Lesser General Public License for more details.
> > > > +
> > > > +   You should have received a copy of the GNU Lesser General Public
> > > > +   License along with the GNU C Library; if not, see
> > > > +   .  */
> > > > +
> > > > +#include
> > > > +
> > > > +        .section .text.sse4.1,"ax",@progbits
> > > > +ENTRY(__roundeven_sse41)
> > > > +        roundsd $8, %xmm0, %xmm0
> > > > +        ret
> > > > +END(__roundeven_sse41)
> > > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> > > > new file mode 100644
> > > > index 0000000000..d92eda652a
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> > > > @@ -0,0 +1,31 @@
> > > > +/* Multiple versions of __roundeven.
> > > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > > +   This file is part of the GNU C Library.
> > > > +
> > > > +   The GNU C Library is free software; you can redistribute it and/or
> > > > +   modify it under the terms of the GNU Lesser General Public
> > > > +   License as published by the Free Software Foundation; either
> > > > +   version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > > +   Lesser General Public License for more details.
> > > > +
> > > > +   You should have received a copy of the GNU Lesser General Public
> > > > +   License along with the GNU C Library; if not, see
> > > > +   .  */
> > > > +
> > > > +#include
> > > > +
> > > > +#define roundeven __redirect_roundeven
> > > > +#define __roundeven __redirect___roundeven
> > > > +#include
> > > > +#undef roundeven
> > > > +#undef __roundeven
> > > > +
> > > > +#define SYMBOL_NAME roundeven
> > > > +#include "ifunc-sse4_1.h"
> > > > +
> > > > +libc_ifunc_redirected (__redirect_roundeven, __roundeven, IFUNC_SELECTOR ());
> > > > +libm_alias_double (__roundeven, roundeven)
> > > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> > > > new file mode 100644
> > > > index 0000000000..72a6e7d1fb
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> > > > @@ -0,0 +1,3 @@
> > > > +#undef __roundevenf
> > > > +#define __roundevenf __roundevenf_c
> > > > +#include
> > > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> > > > new file mode 100644
> > > > index 0000000000..a76e10807e
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> > > > @@ -0,0 +1,24 @@
> > > > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > > > +   This file is part of the GNU C Library.
> > > > +
> > > > +   The GNU C Library is free software; you can redistribute it and/or
> > > > +   modify it under the terms of the GNU Lesser General Public
> > > > +   License as published by the Free Software Foundation; either
> > > > +   version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > > +   Lesser General Public License for more details.
> > > > +
> > > > +   You should have received a copy of the GNU Lesser General Public
> > > > +   License along with the GNU C Library; if not, see
> > > > +   .  */
> > > > +
> > > > +#include
> > > > +
> > > > +        .section .text.sse4.1,"ax",@progbits
> > > > +ENTRY(__roundevenf_sse41)
> > > > +        roundss $8, %xmm0, %xmm0
> > > > +        ret
> > > > +END(__roundevenf_sse41)
> > > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c b/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> > > > new file mode 100644
> > > > index 0000000000..2ee196e68f
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> > > > @@ -0,0 +1,31 @@
> > > > +/* Multiple versions of __roundevenf.
> > > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > > +   This file is part of the GNU C Library.
> > > > +
> > > > +   The GNU C Library is free software; you can redistribute it and/or
> > > > +   modify it under the terms of the GNU Lesser General Public
> > > > +   License as published by the Free Software Foundation; either
> > > > +   version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > > +   Lesser General Public License for more details.
> > > > +
> > > > +   You should have received a copy of the GNU Lesser General Public
> > > > +   License along with the GNU C Library; if not, see
> > > > +   .  */
> > > > +
> > > > +#include
> > > > +
> > > > +#define roundevenf __redirect_roundevenf
> > > > +#define __roundevenf __redirect___roundevenf
> > > > +#include
> > > > +#undef roundevenf
> > > > +#undef __roundevenf
> > > > +
> > > > +#define SYMBOL_NAME roundevenf
> > > > +#include "ifunc-sse4_1.h"
> > > > +
> > > > +libc_ifunc_redirected (__redirect_roundevenf, __roundevenf, IFUNC_SELECTOR ());
> > > > +libm_alias_float (__roundeven, roundeven)
> > > > --
> > > > 2.31.1
> > > >
> > >
> > > I would like to backport this patch to release branches.
> > > Any comments or objections?
> > >
> > > --Sunil
> >
> > I would like to backport this patch to release branch 2.33.
> >
> > Any comments/suggestions or objections on this.
>
> Fine by me.
> >
> > commit 1683249d17e14827b6579529742eb895027dfa84
> > Author: Shen-Ta Hsieh
> > Date: Mon May 24 09:43:11 2021 +0800
> >
> >     x86_64: roundeven with sse4.1 support
> >
> >     This patch adds support for the sse4.1 hardware floating point
> >     roundeven.
> >
> >     Here is some benchmark results on my systems: