From: Noah Goldstein
Date: Tue, 13 Sep 2022 18:25:44 -0700
Subject: Re: [PATCH v8 3/4] x86_64: roundeven with sse4.1 support
To: Sunil Pandey
Cc: "H.J. Lu", Libc-stable Mailing List, Florian Weimer, Shen-Ta Hsieh, GNU C Library

On Sun, Sep 11, 2022 at 1:09 PM Sunil Pandey via Libc-stable wrote:
>
> On Wed, Apr 27, 2022 at 5:11 PM Sunil Pandey wrote:
> >
> > On Wed, Jun 23, 2021 at 3:32 PM H.J. Lu via Libc-alpha
> > wrote:
> > >
> > > From: Shen-Ta Hsieh
> > >
> > > This patch adds support for the sse4.1 hardware floating point
> > > roundeven.
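To make the quoted description concrete: roundeven rounds to the nearest integer and breaks ties toward the even neighbor (2.5 gives 2.0, 3.5 gives 4.0). The patch gets this from a single SSE4.1 roundsd/roundss whose immediate 8 means: bits 1:0 = 00 select round-to-nearest-even, bit 2 = 0 makes the instruction use that mode instead of MXCSR, and bit 3 suppresses the precision (inexact) exception. The C sketch below is illustrative only and is not the glibc code. The names `roundeven_sse41_sketch`, `roundeven_ref_sketch`, `select_roundeven` and `roundeven_fn` are hypothetical; the SSE4.1 path uses the `_mm_round_sd` intrinsic with `_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC` (which equals 8); the runtime choice uses GCC/Clang `__builtin_cpu_supports` plus a function pointer as a stand-in for the load-time ifunc selection that `ifunc-sse4_1.h` provides; and the portable fallback uses the add-and-subtract 2^52 trick rather than glibc's generic C implementation, so it assumes the default round-to-nearest mode and SSE2 double arithmetic.

```c
#include <math.h>

#if defined (__x86_64__)
#include <immintrin.h>

/* Hypothetical SSE4.1 variant: _mm_round_sd compiles to roundsd.
   _MM_FROUND_TO_NEAREST_INT (0) | _MM_FROUND_NO_EXC (8) == 8, the same
   immediate the patch passes to roundsd.  The target attribute lets
   this one function use SSE4.1 without building the whole file with
   -msse4.1, so it must only be called on CPUs that have SSE4.1.  */
__attribute__ ((target ("sse4.1")))
static double
roundeven_sse41_sketch (double x)
{
  __m128d v = _mm_set_sd (x);
  v = _mm_round_sd (v, v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
  return _mm_cvtsd_f64 (v);
}
#endif

/* Hypothetical portable fallback (not glibc's generic C code).  For
   |x| < 2^52, adding 2^52 pushes the fraction bits out of the
   significand, so the addition itself rounds to an integer; under the
   default round-to-nearest mode, ties go to even.  Subtracting 2^52
   afterwards is exact.  */
static double
roundeven_ref_sketch (double x)
{
  const double two52 = 0x1p52;
  double ax = fabs (x);
  if (!(ax < two52))            /* Already integral, infinite, or NaN.  */
    return x;
  double r = (ax + two52) - two52;
  return copysign (r, x);       /* Keep the sign, e.g. -0.5 -> -0.0.  */
}

typedef double (*roundeven_fn) (double);

/* Choose an implementation once, by CPU feature, in the spirit of an
   ifunc selector.  This is an ordinary function returning a pointer,
   not a real ifunc resolver.  */
static roundeven_fn
select_roundeven (void)
{
#if defined (__x86_64__)
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("sse4.1"))
    return roundeven_sse41_sketch;
#endif
  return roundeven_ref_sketch;
}
```

Because roundsd with immediate 8 never consults MXCSR, the SSE4.1 path is independent of the current rounding mode, as roundeven requires; the fallback sketch is not, which is why a production fallback would normally work on the bit pattern instead.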
> > >
> > > Here are some benchmark results on my systems:
> > >
> > > =AMD Ryzen 9 3900X 12-Core Processor=
> > >
> > > * benchmark result before this commit
> > > |            | roundeven    | roundevenf   |
> > > |------------|--------------|--------------|
> > > | duration   | 3.75587e+09  | 3.75114e+09  |
> > > | iterations | 3.93053e+08  | 4.35402e+08  |
> > > | max        | 52.592       | 58.71        |
> > > | min        | 7.98         | 7.22         |
> > > | mean       | 9.55563      | 8.61535      |
> > >
> > > * benchmark result after this commit
> > > |            | roundeven    | roundevenf   |
> > > |------------|--------------|--------------|
> > > | duration   | 3.73815e+09  | 3.73738e+09  |
> > > | iterations | 5.82692e+08  | 5.91498e+08  |
> > > | max        | 56.468       | 51.642       |
> > > | min        | 6.27         | 6.156        |
> > > | mean       | 6.41532      | 6.3185       |
> > >
> > > =Intel(R) Pentium(R) CPU D1508 @ 2.20GHz=
> > >
> > > * benchmark result before this commit
> > > |            | roundeven    | roundevenf   |
> > > |------------|--------------|--------------|
> > > | duration   | 2.18208e+09  | 2.18258e+09  |
> > > | iterations | 2.39932e+08  | 2.46924e+08  |
> > > | max        | 96.378       | 98.035       |
> > > | min        | 6.776        | 5.94         |
> > > | mean       | 9.09456      | 8.83907      |
> > >
> > > * benchmark result after this commit
> > > |            | roundeven    | roundevenf   |
> > > |------------|--------------|--------------|
> > > | duration   | 2.17415e+09  | 2.17005e+09  |
> > > | iterations | 3.56193e+08  | 4.09824e+08  |
> > > | max        | 51.693       | 97.192       |
> > > | min        | 5.926        | 5.093        |
> > > | mean       | 6.10385      | 5.29507      |
> > >
> > > Signed-off-by: Shen-Ta Hsieh
> > > Reviewed-by: H.J. Lu
> > > ---
> > >  sysdeps/x86_64/fpu/multiarch/Makefile         |  5 +--
> > >  sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c  |  2 ++
> > >  .../x86_64/fpu/multiarch/s_roundeven-sse4_1.S | 24 ++++++++++++++
> > >  sysdeps/x86_64/fpu/multiarch/s_roundeven.c    | 31 +++++++++++++++++++
> > >  sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c |  3 ++
> > >  .../fpu/multiarch/s_roundevenf-sse4_1.S       | 24 ++++++++++++++
> > >  sysdeps/x86_64/fpu/multiarch/s_roundevenf.c   | 31 +++++++++++++++++++
> > >  7 files changed, 118 insertions(+), 2 deletions(-)
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> > >
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
> > > index 57892c56bb..d425ffd6d3 100644
> > > --- a/sysdeps/x86_64/fpu/multiarch/Makefile
> > > +++ b/sysdeps/x86_64/fpu/multiarch/Makefile
> > > @@ -1,11 +1,12 @@
> > >  ifeq ($(subdir),math)
> > >  libm-sysdep_routines += s_floor-c s_ceil-c s_floorf-c s_ceilf-c \
> > >                          s_rint-c s_rintf-c s_nearbyint-c s_nearbyintf-c \
> > > -                        s_trunc-c s_truncf-c
> > > +                        s_roundeven-c s_roundevenf-c s_trunc-c s_truncf-c
> > >
> > >  libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
> > >                          s_floorf-sse4_1 s_nearbyint-sse4_1 \
> > > -                        s_nearbyintf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
> > > +                        s_nearbyintf-sse4_1 s_roundeven-sse4_1 \
> > > +                        s_roundevenf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
> > >                          s_trunc-sse4_1 s_truncf-sse4_1
> > >
> > >  libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> > > new file mode 100644
> > > index 0000000000..c7be43cb22
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> > > @@ -0,0 +1,2 @@
> > > +#define __roundeven __roundeven_c
> > > +#include
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> > > new file mode 100644
> > > index 0000000000..6ae8f6b1d3
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> > > @@ -0,0 +1,24 @@
> > > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   .  */
> > > +
> > > +#include
> > > +
> > > +        .section .text.sse4.1,"ax",@progbits
> > > +ENTRY(__roundeven_sse41)
> > > +        roundsd $8, %xmm0, %xmm0
> > > +        ret
> > > +END(__roundeven_sse41)
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> > > new file mode 100644
> > > index 0000000000..d92eda652a
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven.c
> > > @@ -0,0 +1,31 @@
> > > +/* Multiple versions of __roundeven.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   .  */
> > > +
> > > +#include
> > > +
> > > +#define roundeven __redirect_roundeven
> > > +#define __roundeven __redirect___roundeven
> > > +#include
> > > +#undef roundeven
> > > +#undef __roundeven
> > > +
> > > +#define SYMBOL_NAME roundeven
> > > +#include "ifunc-sse4_1.h"
> > > +
> > > +libc_ifunc_redirected (__redirect_roundeven, __roundeven, IFUNC_SELECTOR ());
> > > +libm_alias_double (__roundeven, roundeven)
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> > > new file mode 100644
> > > index 0000000000..72a6e7d1fb
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
> > > @@ -0,0 +1,3 @@
> > > +#undef __roundevenf
> > > +#define __roundevenf __roundevenf_c
> > > +#include
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> > > new file mode 100644
> > > index 0000000000..a76e10807e
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
> > > @@ -0,0 +1,24 @@
> > > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   .  */
> > > +
> > > +#include
> > > +
> > > +        .section .text.sse4.1,"ax",@progbits
> > > +ENTRY(__roundevenf_sse41)
> > > +        roundss $8, %xmm0, %xmm0
> > > +        ret
> > > +END(__roundevenf_sse41)
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c b/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> > > new file mode 100644
> > > index 0000000000..2ee196e68f
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
> > > @@ -0,0 +1,31 @@
> > > +/* Multiple versions of __roundevenf.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   .  */
> > > +
> > > +#include
> > > +
> > > +#define roundevenf __redirect_roundevenf
> > > +#define __roundevenf __redirect___roundevenf
> > > +#include
> > > +#undef roundevenf
> > > +#undef __roundevenf
> > > +
> > > +#define SYMBOL_NAME roundevenf
> > > +#include "ifunc-sse4_1.h"
> > > +
> > > +libc_ifunc_redirected (__redirect_roundevenf, __roundevenf, IFUNC_SELECTOR ());
> > > +libm_alias_float (__roundeven, roundeven)
> > > --
> > > 2.31.1
> > >
> >
> > I would like to backport this patch to release branches.
> > Any comments or objections?
> >
> > --Sunil
>
> I would like to backport this patch to release branch 2.33.
>
> Any comments/suggestions or objections on this?

Fine by me.

>
> commit 1683249d17e14827b6579529742eb895027dfa84
> Author: Shen-Ta Hsieh
> Date:   Mon May 24 09:43:11 2021 +0800
>
>     x86_64: roundeven with sse4.1 support
>
>     This patch adds support for the sse4.1 hardware floating point
>     roundeven.
>
>     Here are some benchmark results on my systems: