From: Sunil Pandey
Date: Sun, 11 Sep 2022 13:08:20 -0700
Subject: Re: [PATCH v8 3/4] x86_64: roundeven with sse4.1 support
To: "H.J. Lu", Libc-stable Mailing List, Florian Weimer
Cc: GNU C Library, Shen-Ta Hsieh
References: <20210623222846.2162301-1-hjl.tools@gmail.com> <20210623222846.2162301-4-hjl.tools@gmail.com>

On Wed, Apr 27, 2022 at 5:11 PM Sunil Pandey wrote:
>
> On Wed, Jun 23, 2021 at 3:32 PM H.J. Lu via Libc-alpha
> wrote:
> >
> > From: Shen-Ta Hsieh
> >
> > This patch adds support for the sse4.1 hardware floating point
> > roundeven.
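For readers unfamiliar with the instruction: below is a minimal sketch of what the `roundsd $8` used in the patch computes, written with compiler intrinsics instead of the assembly from the diff. It assumes GCC or Clang on x86_64 and a CPU that actually supports SSE4.1. The helper name `roundeven_sse41_sketch` is hypothetical and does not appear in glibc. The immediate 8 corresponds to round-to-nearest-even (mode bits 0) with the precision exception suppressed (bit 3).

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics: _mm_round_sd */

/* Hypothetical helper for illustration only, not part of the patch.
   The target attribute lets this compile without -msse4.1; calling it
   on a CPU without SSE4.1 would fault.  */
__attribute__ ((target ("sse4.1")))
static double
roundeven_sse41_sketch (double x)
{
  __m128d v = _mm_set_sd (x);
  /* _MM_FROUND_TO_NEAREST_INT (0x0) | _MM_FROUND_NO_EXC (0x8) == 8,
     the same immediate as "roundsd $8, %xmm0, %xmm0".  */
  v = _mm_round_sd (v, v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
  return _mm_cvtsd_f64 (v);
}
```

Under those assumptions the inputs 0.5, 1.5, 2.5 and -2.5 come back as 0, 2, 2 and -2, i.e. halfway cases go to the even neighbour regardless of the current rounding mode.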
> >
> > Here is some benchmark results on my systems:
> >
> > =AMD Ryzen 9 3900X 12-Core Processor=
> >
> > * benchmark result before this commit
> > |            | roundeven    | roundevenf   |
> > |------------|--------------|--------------|
> > | duration   | 3.75587e+09  | 3.75114e+09  |
> > | iterations | 3.93053e+08  | 4.35402e+08  |
> > | max        | 52.592       | 58.71        |
> > | min        | 7.98         | 7.22         |
> > | mean       | 9.55563      | 8.61535      |
> >
> > * benchmark result after this commit
> > |            | roundeven    | roundevenf   |
> > |------------|--------------|--------------|
> > | duration   | 3.73815e+09  | 3.73738e+09  |
> > | iterations | 5.82692e+08  | 5.91498e+08  |
> > | max        | 56.468       | 51.642       |
> > | min        | 6.27         | 6.156        |
> > | mean       | 6.41532      | 6.3185       |
> >
> > =Intel(R) Pentium(R) CPU D1508 @ 2.20GHz=
> >
> > * benchmark result before this commit
> > |            | roundeven    | roundevenf   |
> > |------------|--------------|--------------|
> > | duration   | 2.18208e+09  | 2.18258e+09  |
> > | iterations | 2.39932e+08  | 2.46924e+08  |
> > | max        | 96.378       | 98.035       |
> > | min        | 6.776        | 5.94         |
> > | mean       | 9.09456      | 8.83907      |
> >
> > * benchmark result after this commit
> > |            | roundeven    | roundevenf   |
> > |------------|--------------|--------------|
> > | duration   | 2.17415e+09  | 2.17005e+09  |
> > | iterations | 3.56193e+08  | 4.09824e+08  |
> > | max        | 51.693       | 97.192       |
> > | min        | 5.926        | 5.093        |
> > | mean       | 6.10385      | 5.29507      |
> >
> > Signed-off-by: Shen-Ta Hsieh
> > Reviewed-by: H.J. Lu
> > ---
> >  sysdeps/x86_64/fpu/multiarch/Makefile         |  5 +--
> >  sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c  |  2 ++
> >  .../x86_64/fpu/multiarch/s_roundeven-sse4_1.S | 24 ++++++++++++++
> >  sysdeps/x86_64/fpu/multiarch/s_roundeven.c    | 31 +++++++++++++++++++
> >  sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c |  3 ++
> >  .../fpu/multiarch/s_roundevenf-sse4_1.S       | 24 ++++++++++++++
> >  sysdeps/x86_64/fpu/multiarch/s_roundevenf.c   | 31 +++++++++++++++++++
> >  7 files changed, 118 insertions(+), 2 deletions(-)
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> >
> > diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
> > index 57892c56bb..d425ffd6d3 100644
> > --- a/sysdeps/x86_64/fpu/multiarch/Makefile
> > +++ b/sysdeps/x86_64/fpu/multiarch/Makefile
> > @@ -1,11 +1,12 @@
> >  ifeq ($(subdir),math)
> >  libm-sysdep_routines += s_floor-c s_ceil-c s_floorf-c s_ceilf-c \
> >                         s_rint-c s_rintf-c s_nearbyint-c s_nearbyintf-c \
> > -                       s_trunc-c s_truncf-c
> > +                       s_roundeven-c s_roundevenf-c s_trunc-c s_truncf-c
> >
> >  libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
> >                         s_floorf-sse4_1 s_nearbyint-sse4_1 \
> > -                       s_nearbyintf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
> > +                       s_nearbyintf-sse4_1 s_roundeven-sse4_1 \
> > +                       s_roundevenf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
> >                         s_trunc-sse4_1 s_truncf-sse4_1
> >
> >  libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
> > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> > new file mode 100644
> > index 0000000000..c7be43cb22
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> > @@ -0,0 +1,2 @@
> > +#define __roundeven __roundeven_c
> > +#include
> > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> > new file mode 100644
> > index 0000000000..6ae8f6b1d3
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> > @@ -0,0 +1,24 @@
> > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   .  */
> > +
> > +#include
> > +
> > +	.section .text.sse4.1,"ax",@progbits
> > +ENTRY(__roundeven_sse41)
> > +	roundsd	$8, %xmm0, %xmm0
> > +	ret
> > +END(__roundeven_sse41)
> > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> > new file mode 100644
> > index 0000000000..d92eda652a
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> > @@ -0,0 +1,31 @@
> > +/* Multiple versions of __roundeven.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   .  */
> > +
> > +#include
> > +
> > +#define roundeven __redirect_roundeven
> > +#define __roundeven __redirect___roundeven
> > +#include
> > +#undef roundeven
> > +#undef __roundeven
> > +
> > +#define SYMBOL_NAME roundeven
> > +#include "ifunc-sse4_1.h"
> > +
> > +libc_ifunc_redirected (__redirect_roundeven, __roundeven, IFUNC_SELECTOR ());
> > +libm_alias_double (__roundeven, roundeven)
> > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> > new file mode 100644
> > index 0000000000..72a6e7d1fb
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> > @@ -0,0 +1,3 @@
> > +#undef __roundevenf
> > +#define __roundevenf __roundevenf_c
> > +#include
> > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> > new file mode 100644
> > index 0000000000..a76e10807e
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> > @@ -0,0 +1,24 @@
> > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   .  */
> > +
> > +#include
> > +
> > +	.section .text.sse4.1,"ax",@progbits
> > +ENTRY(__roundevenf_sse41)
> > +	roundss	$8, %xmm0, %xmm0
> > +	ret
> > +END(__roundevenf_sse41)
> > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c b/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> > new file mode 100644
> > index 0000000000..2ee196e68f
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> > @@ -0,0 +1,31 @@
> > +/* Multiple versions of __roundevenf.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   .  */
> > +
> > +#include
> > +
> > +#define roundevenf __redirect_roundevenf
> > +#define __roundevenf __redirect___roundevenf
> > +#include
> > +#undef roundevenf
> > +#undef __roundevenf
> > +
> > +#define SYMBOL_NAME roundevenf
> > +#include "ifunc-sse4_1.h"
> > +
> > +libc_ifunc_redirected (__redirect_roundevenf, __roundevenf, IFUNC_SELECTOR ());
> > +libm_alias_float (__roundeven, roundeven)
> > --
> > 2.31.1
> >
>
> I would like to backport this patch to release branches.
> Any comments or objections?
>
> --Sunil

I would like to backport this patch to release branch 2.33.
Any comments, suggestions, or objections?

commit 1683249d17e14827b6579529742eb895027dfa84
Author: Shen-Ta Hsieh
Date:   Mon May 24 09:43:11 2021 +0800

    x86_64: roundeven with sse4.1 support

    This patch adds support for the sse4.1 hardware floating point
    roundeven.

    Here is some benchmark results on my systems: