From: Sunil Pandey
Date: Sat, 24 Feb 2024 14:23:14 -0800
Subject: Re: [PATCH v2] x86_64: Exclude SSE, AVX and FMA4 variants in libm multiarch
To: "H.J. Lu"
Cc: libc-alpha@sourceware.org
References: <20240224023553.3804579-1-skpgkp2@gmail.com>

On Sat, Feb 24, 2024 at 8:27 AM H.J. Lu wrote:

> On Sat, Feb 24, 2024 at 8:23 AM H.J. Lu wrote:
> >
> > On Fri, Feb 23, 2024 at 6:36 PM Sunil K Pandey wrote:
> > >
> > > When glibc is built with ISA level 3 or higher by default, the resulting
> > > glibc binaries won't run on SSE or FMA4 processors. Exclude SSE, AVX and
> > > FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by
> > > default.
> > >
> > > When glibc is built with ISA level 2 enabled by default, only keep SSE4.1
> > > variant.
> > >
> > > Fixes BZ 31335.
> > >
> > > NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind
> > > doesn't support AVX512 instructions:
> > >
> > > https://bugs.kde.org/show_bug.cgi?id=383010
> > >
> > > Changes from v1:
> > >
> > > Replace AVX2 and FMA feature check with ISA level.
> > > Replace SSE4_1 feature check with ISA level.
> > > ---
> > >  sysdeps/x86/configure                         |  31 ++++
> > >  sysdeps/x86/configure.ac                      |  23 +++
> > >  sysdeps/x86_64/fpu/multiarch/Makefile         | 148 +++++++++---------
> > >  sysdeps/x86_64/fpu/multiarch/e_asin.c         |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/e_atan2.c        |  11 +-
> > >  sysdeps/x86_64/fpu/multiarch/e_exp.c          |  13 +-
> > >  sysdeps/x86_64/fpu/multiarch/e_exp2f.c        |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/e_expf.c         |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/e_log.c          |  13 +-
> > >  sysdeps/x86_64/fpu/multiarch/e_log2.c         |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/e_log2f.c        |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/e_logf.c         |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/e_pow.c          |  13 +-
> > >  sysdeps/x86_64/fpu/multiarch/e_powf.c         |  27 ++--
> > >  sysdeps/x86_64/fpu/multiarch/s_atan.c         |  11 +-
> > >  sysdeps/x86_64/fpu/multiarch/s_ceil-avx.S     |  28 ++++
> > >  sysdeps/x86_64/fpu/multiarch/s_ceil-sse4_1.S  |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_ceil.c         |  21 +--
> > >  sysdeps/x86_64/fpu/multiarch/s_ceilf-avx.S    |  28 ++++
> > >  sysdeps/x86_64/fpu/multiarch/s_ceilf-sse4_1.S |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_ceilf.c        |  21 +--
> > >  sysdeps/x86_64/fpu/multiarch/s_cosf.c         |  11 +-
> > >  sysdeps/x86_64/fpu/multiarch/s_expm1.c        |  11 +-
> > >  sysdeps/x86_64/fpu/multiarch/s_floor-avx.S    |  28 ++++
> > >  sysdeps/x86_64/fpu/multiarch/s_floor-sse4_1.S |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_floor.c        |  21 +--
> > >  sysdeps/x86_64/fpu/multiarch/s_floorf-avx.S   |  28 ++++
> > >  .../x86_64/fpu/multiarch/s_floorf-sse4_1.S    |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_floorf.c       |  21 +--
> > >  sysdeps/x86_64/fpu/multiarch/s_log1p.c        |  11 +-
> > >  .../x86_64/fpu/multiarch/s_nearbyint-avx.S    |  28 ++++
> > >  .../x86_64/fpu/multiarch/s_nearbyint-sse4_1.S |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_nearbyint.c    |  19 ++-
> > >  .../x86_64/fpu/multiarch/s_nearbyintf-avx.S   |  28 ++++
> > >  .../fpu/multiarch/s_nearbyintf-sse4_1.S       |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c   |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/s_rint-avx.S     |  28 ++++
> > >  sysdeps/x86_64/fpu/multiarch/s_rint-sse4_1.S  |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_rint.c         |  21 +--
> > >  sysdeps/x86_64/fpu/multiarch/s_rintf-avx.S    |  28 ++++
> > >  sysdeps/x86_64/fpu/multiarch/s_rintf-sse4_1.S |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_rintf.c        |  21 +--
> > >  .../x86_64/fpu/multiarch/s_roundeven-avx.S    |  28 ++++
> > >  .../x86_64/fpu/multiarch/s_roundeven-sse4_1.S |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_roundeven.c    |  19 ++-
> > >  .../x86_64/fpu/multiarch/s_roundevenf-avx.S   |  28 ++++
> > >  .../fpu/multiarch/s_roundevenf-sse4_1.S       |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_roundevenf.c   |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/s_sin.c          |  19 ++-
> > >  sysdeps/x86_64/fpu/multiarch/s_sincos.c       |  11 +-
> > >  sysdeps/x86_64/fpu/multiarch/s_sincosf.c      |  11 +-
> > >  sysdeps/x86_64/fpu/multiarch/s_sinf.c         |  11 +-
> > >  sysdeps/x86_64/fpu/multiarch/s_tan.c          |  11 +-
> > >  sysdeps/x86_64/fpu/multiarch/s_trunc-avx.S    |  28 ++++
> > >  sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_trunc.c        |  21 +--
> > >  sysdeps/x86_64/fpu/multiarch/s_truncf-avx.S   |  28 ++++
> > >  .../x86_64/fpu/multiarch/s_truncf-sse4_1.S    |  12 ++
> > >  sysdeps/x86_64/fpu/multiarch/s_truncf.c       |  21 +--
> > >  sysdeps/x86_64/fpu/multiarch/w_exp.c          |   7 +-
> > >  sysdeps/x86_64/fpu/multiarch/w_log.c          |   7 +-
> > >  sysdeps/x86_64/fpu/multiarch/w_pow.c          |   7 +-
> > >  62 files changed, 950 insertions(+), 295 deletions(-)
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_ceil-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_ceilf-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_floor-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_floorf-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_nearbyint-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_nearbyintf-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_rint-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_rintf-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_trunc-avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_truncf-avx.S
> > >
> >
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
> > > index e1a490dd98..91ac85012b 100644
> > > --- a/sysdeps/x86_64/fpu/multiarch/Makefile
> > > +++ b/sysdeps/x86_64/fpu/multiarch/Makefile
> > > @@ -1,49 +1,4 @@
> > >  ifeq ($(subdir),math)
> > > -libm-sysdep_routines += \
> > > -  s_ceil-c \
> > > -  s_ceilf-c \
> > > -  s_floor-c \
> > > -  s_floorf-c \
> > > -  s_nearbyint-c \
> > > -  s_nearbyintf-c \
> > > -  s_rint-c \
> > > -  s_rintf-c \
> > > -  s_roundeven-c \
> > > -  s_roundevenf-c \
> > > -  s_trunc-c \
> > > -  s_truncf-c \
> > > -# libm-sysdep_routines
> > > -
> > > -libm-sysdep_routines += \
> > > -  s_ceil-sse4_1 \
> > > -  s_ceilf-sse4_1 \
> > > -  s_floor-sse4_1 \
> > > -  s_floorf-sse4_1 \
> > > -  s_nearbyint-sse4_1 \
> > > -  s_nearbyintf-sse4_1 \
> > > -  s_rint-sse4_1 \
> > > -  s_rintf-sse4_1 \
> > > -  s_roundeven-sse4_1 \
> > > -  s_roundevenf-sse4_1 \
> > > -  s_trunc-sse4_1 \
> > > -  s_truncf-sse4_1 \
> > > -# libm-sysdep_routines
> > > -
> > > -libm-sysdep_routines += \
> > > -  e_asin-fma \
> > > -  e_atan2-fma \
> > > -  e_exp-fma \
> > > -  e_log-fma \
> > > -  e_log2-fma \
> > > -  e_pow-fma \
> > > -  s_atan-fma \
> > > -  s_expm1-fma \
> > > -  s_log1p-fma \
> > > -  s_sin-fma \
> > > -  s_sincos-fma \
> > > -  s_tan-fma \
> > > -# libm-sysdep_routines
> > > -
> > >  CFLAGS-e_asin-fma.c = -mfma -mavx2
> > >  CFLAGS-e_atan2-fma.c = -mfma -mavx2
> > >  CFLAGS-e_exp-fma.c = -mfma -mavx2
> > > @@ -57,23 +12,6 @@ CFLAGS-s_sin-fma.c = -mfma -mavx2
> > >  CFLAGS-s_tan-fma.c = -mfma -mavx2
> > >  CFLAGS-s_sincos-fma.c = -mfma -mavx2
> > >
> > > -libm-sysdep_routines += \
> > > -  s_cosf-sse2 \
> > > -  s_sincosf-sse2 \
> > > -  s_sinf-sse2 \
> > > -# libm-sysdep_routines
> > > -
> > > -libm-sysdep_routines += \
> > > -  e_exp2f-fma \
> > > -  e_expf-fma \
> > > -  e_log2f-fma \
> > > -  e_logf-fma \
> > > -  e_powf-fma \
> > > -  s_cosf-fma \
> > > -  s_sincosf-fma \
> > > -  s_sinf-fma \
> > > -# libm-sysdep_routines
> > > -
> > >  CFLAGS-e_exp2f-fma.c = -mfma -mavx2
> > >  CFLAGS-e_expf-fma.c = -mfma -mavx2
> > >  CFLAGS-e_log2f-fma.c = -mfma -mavx2
> > > @@ -83,17 +21,93 @@ CFLAGS-s_sinf-fma.c = -mfma -mavx2
> > >  CFLAGS-s_cosf-fma.c = -mfma -mavx2
> > >  CFLAGS-s_sincosf-fma.c = -mfma -mavx2
> > >
> > > +# Check if ISA level is 3 or 4
> > > +ifneq (,$(filter $(have-x86-isa-level),3 4))
> > >  libm-sysdep_routines += \
> > >
> > If we add ISA level 5 and compile glibc with ISA level 5, this won't work.
> > It is specially bad for release branches. Glibc release branches should
> > compile properly without any changes when glibc is built with
> > -march=x86-64-v5.
> >
>
> Glibc release branches are OK. But when level 5 support is added, we
> have to change many places in Makefiles where have-x86-isa-level is used.
> I think we should avoid it.
>
>

have-x86-isa-level is used in only one Makefile in this patch:

sysdeps/x86_64/fpu/multiarch/Makefile:ifneq (,$(filter $(have-x86-isa-level),3 4))
sysdeps/x86_64/fpu/multiarch/Makefile:ifeq ($(have-x86-isa-level),baseline)

-- 
> H.J.
>
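
A minimal sketch of one way the conditional discussed above could avoid enumerating levels. It is offered as an illustration of the concern, not as what the patch or glibc does. Only have-x86-isa-level and the value "baseline" appear in the thread. Treating "baseline" and "2" as the complete set of levels below 3 is an assumption, and the $(info) messages and the "all" target are hypothetical scaffolding so the fragment can be run on its own.

```make
# Hypothetical standalone fragment; not taken from the patch.

# Reject an unset level first: an empty value would otherwise filter to
# nothing and be mistaken for "level 3 or above" by the test below.
ifeq (,$(strip $(have-x86-isa-level)))
$(error have-x86-isa-level is not set)
endif

# Assumption: "baseline" and "2" are the only levels below 3.
# The test is true for every other value, so a future level 5 would take
# the first branch without any further Makefile change.
ifeq (,$(filter $(have-x86-isa-level),baseline 2))
$(info ISA level 3 or above: build only the FMA/AVX2 variants)
else
$(info ISA level below 3: keep the lower-ISA variants)
endif

# Empty recipe so that make has a default goal to process.
all: ;
```

Saved under a hypothetical name such as sketch.mk, running "make -f sketch.mk have-x86-isa-level=5" should take the first branch, whereas $(filter $(have-x86-isa-level),3 4) expands to nothing for the value 5. The trade-off is that the list of lower levels must be complete, and an unset variable has to be rejected explicitly as shown.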