From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-yb1-xb2f.google.com (mail-yb1-xb2f.google.com [IPv6:2607:f8b0:4864:20::b2f])
 by sourceware.org (Postfix) with ESMTPS id CA7E7385115D
 for ; Tue, 28 Jun 2022 18:38:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CA7E7385115D
Received: by mail-yb1-xb2f.google.com with SMTP id v185so14189266ybe.8
 for ; Tue, 28 Jun 2022 11:38:37 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=rGrDQ0x/SmJHnN4qD/X4vuSS58GKQBBPq28A/aQM3Zc=;
 b=rXr54jdu3d1pi47E+n9iMFXgRyA/upIwJjIFcSVjVhbMp+NsjMs7QH148tE/Rz6CYa
 JdM8CbRVdxPoRhU8PjN7a3hJBHEKPrd4T/BpAPMRx88LFS7KatXC0+SMELhznJ4W0QGH
 Vzc6netVhHMwVRwIVZPurQD1rGlnVMXF321JMHpnzUxknJkTl1ldcPyoAiFiPHZE0cUJ
 1ejPqNRWb0EqR3skBjSQa8FTNA0nvDgshT01j2DI+se1pB+fFfCyDMJlq9RQ5QPziFQc
 3g4wcuTVHdZBwnt+/iO1vk1GIGln2VrSYG3rL8gOAHQ7SVs6nzg+xotHO4a4jNRdCVQg
 m9JQ==
X-Gm-Message-State: AJIora9wzP+t/FdtncZYxZJdaEPhjaY6+U2s7oO10y2MTNypNQVstZcg
 VDm0j3o+VUWkGqdcbULKzrUKSmvuFn7DK/FlJPhZJ/Uh1Os=
X-Google-Smtp-Source: AGRyM1tKt4dRv/4oA5RF2A4U4/ioNj2Q6iHH/R1qIM31fxsZOz5Uk9UPDEO4WXt9d/rjdxqZ1gwThyz8qyCMppal6Lo=
X-Received: by 2002:a25:b9c3:0:b0:668:a418:13c with SMTP id
 y3-20020a25b9c3000000b00668a418013cmr21884450ybj.498.1656441517203;
 Tue, 28 Jun 2022 11:38:37 -0700 (PDT)
MIME-Version: 1.0
References: <20220628152628.17802-1-goldstein.w.n@gmail.com>
 <20220628152628.17802-2-goldstein.w.n@gmail.com>
In-Reply-To:
From: Noah Goldstein
Date: Tue, 28 Jun 2022 11:38:26 -0700
Message-ID:
Subject: Re: [PATCH v1] x86: Add support for building strstr with explicit ISA level
To: "H.J. Lu"
Cc: GNU C Library, "Carlos O'Donell"
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-8.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE
 autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Tue, 28 Jun 2022 18:38:39 -0000

On Tue, Jun 28, 2022 at 11:35 AM H.J. Lu wrote:
>
> On Tue, Jun 28, 2022 at 11:24 AM Noah Goldstein wrote:
> >
> > On Tue, Jun 28, 2022 at 11:21 AM H.J. Lu wrote:
> > >
> > > On Tue, Jun 28, 2022 at 8:26 AM Noah Goldstein wrote:
> > > >
> > > > Small changes for this function as the generic implementation remains
> > > > the same for all ISA levels.
> > > >
> > > > Only changes are using the X86_ISA_CPU_FEATURE{S}_{USABLE|ARCH}_P
> > > > macros so that some of the checks at least can constant evaluate
> > > > and some comments explaining the ISA constraints on the function.
> > > > ---
> > > >  sysdeps/x86_64/multiarch/ifunc-impl-list.c | 13 +++++++------
> > > >  sysdeps/x86_64/multiarch/strstr.c          | 10 +++++-----
> > > >  2 files changed, 12 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > > > index 0d28319905..a1bff560bc 100644
> > > > --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > > > +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > > > @@ -620,12 +620,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> > > >
> > > >    /* Support sysdeps/x86_64/multiarch/strstr.c.  */
> > > >    IFUNC_IMPL (i, name, strstr,
> > > > -              IFUNC_IMPL_ADD (array, i, strstr,
> > > > -                              (CPU_FEATURE_USABLE (AVX512VL)
> > > > -                               && CPU_FEATURE_USABLE (AVX512BW)
> > > > -                               && CPU_FEATURE_USABLE (AVX512DQ)
> > > > -                               && CPU_FEATURE_USABLE (BMI2)),
> > > > -                              __strstr_avx512)
> > > > +              /* All implementations of strstr are built at all ISA levels.  */
> > > > +              IFUNC_IMPL_ADD (array, i, strstr,
> > > > +                              (CPU_FEATURE_USABLE (AVX512VL)
> > > > +                               && CPU_FEATURE_USABLE (AVX512BW)
> > > > +                               && CPU_FEATURE_USABLE (AVX512DQ)
> > > > +                               && CPU_FEATURE_USABLE (BMI2)),
> > > > +                              __strstr_avx512)
> > > >                IFUNC_IMPL_ADD (array, i, strstr, 1, __strstr_sse2_unaligned)
> > > >                IFUNC_IMPL_ADD (array, i, strstr, 1, __strstr_generic))
> > > >
> > > > diff --git a/sysdeps/x86_64/multiarch/strstr.c b/sysdeps/x86_64/multiarch/strstr.c
> > > > index 2b83199245..3f86bfa5f2 100644
> > > > --- a/sysdeps/x86_64/multiarch/strstr.c
> > > > +++ b/sysdeps/x86_64/multiarch/strstr.c
> > > > @@ -49,13 +49,13 @@ IFUNC_SELECTOR (void)
> > > >    const struct cpu_features *cpu_features = __get_cpu_features ();
> > > >
> > > >    if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_AVX512)
> > > > -      && CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> > > > -      && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> > > > -      && CPU_FEATURE_USABLE_P (cpu_features, AVX512DQ)
> > > > -      && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> > > > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> > > > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> > > > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512DQ)
> > > > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> > > >      return __strstr_avx512;
> > > >
> > > > -  if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Load))
> > > > +  if (X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Load, ))
> > >
> > > Is Fast_Unaligned_Load set on all processors before? If not, we should
> > > revert
> >
> > It's set at ISA level >= 2. AFAICT the reason the bit exists is so that
> > CPUs with slow sse42 can fall back on an unaligned sse2 implementation
> > if it's available as opposed to the generic / often quite expensive aligned
> > sse2 impl.
>
> Is Fast_Unaligned_Load set on Zhaoxin processors?

No, but I believe we agreed to treat that specially only for the
functions where it really mattered. I.e., for memcpy, where this is a
meaningful distinction, we won't use the X86_ISA_CPU_FEATURE... checks;
we will force a runtime check.

>
> > Example in strcmp
> >
> > >
> > > /* Feature(s) enabled when ISA level >= 2.  */
> > > #define Fast_Unaligned_Load_X86_ISA_LEVEL 2
> > >
> > > >      return __strstr_sse2_unaligned;
> > > >
> > > >    return __strstr_generic;
> > > > --
> > > > 2.34.1
> > > >
> > >
> > >
> > > --
> > > H.J.
> >
>
> --
> H.J.
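
To make the distinction above concrete, here is a minimal, self-contained sketch. It is not glibc's code: every `demo_`/`DEMO_` name is hypothetical, and the ISA levels assigned to the two features are assumptions chosen for the example. It only illustrates how a check that compares a feature's ISA level against the build baseline can fold to a constant, while a selector where the bit really matters keeps reading it at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
   glibc macros or structures discussed in the thread.  */

/* Build-time baseline ISA level (assumed to default to 2 here).  */
#ifndef DEMO_MINIMUM_ISA_LEVEL
# define DEMO_MINIMUM_ISA_LEVEL 2
#endif

/* Assumed ISA level at which each demo feature is guaranteed.  */
#define DEMO_FAST_UNALIGNED_LOAD_ISA_LEVEL 2
#define DEMO_AVX512BW_ISA_LEVEL 4

struct demo_cpu_features
{
  bool fast_unaligned_load;
  bool avx512bw;
  bool prefer_no_avx512;
};

/* ISA-aware check: a compile-time constant (true) when the baseline
   already implies the feature, otherwise the runtime bit.  */
#define DEMO_ISA_FEATURE_P(cpu, field, level) \
  (((level) <= DEMO_MINIMUM_ISA_LEVEL) || (cpu)->field)

/* Forced runtime check: always reads the bit.  */
#define DEMO_RUNTIME_FEATURE_P(cpu, field) ((cpu)->field)

enum demo_strstr_impl
{
  DEMO_IMPL_GENERIC,
  DEMO_IMPL_SSE2_UNALIGNED,
  DEMO_IMPL_AVX512
};

enum demo_copy_impl
{
  DEMO_COPY_ALIGNED,
  DEMO_COPY_UNALIGNED
};

/* strstr-like selector: uses the ISA-aware check for both features.  */
enum demo_strstr_impl
demo_select_strstr (const struct demo_cpu_features *cpu)
{
  if (!cpu->prefer_no_avx512
      && DEMO_ISA_FEATURE_P (cpu, avx512bw, DEMO_AVX512BW_ISA_LEVEL))
    return DEMO_IMPL_AVX512;

  if (DEMO_ISA_FEATURE_P (cpu, fast_unaligned_load,
                          DEMO_FAST_UNALIGNED_LOAD_ISA_LEVEL))
    return DEMO_IMPL_SSE2_UNALIGNED;

  return DEMO_IMPL_GENERIC;
}

/* memcpy-like selector: the bit is treated as a meaningful
   distinction, so it is always tested at run time.  */
enum demo_copy_impl
demo_select_copy (const struct demo_cpu_features *cpu)
{
  if (DEMO_RUNTIME_FEATURE_P (cpu, fast_unaligned_load))
    return DEMO_COPY_UNALIGNED;
  return DEMO_COPY_ALIGNED;
}

/* Usage example; the expectations assume the default baseline of 2.  */
void
demo_self_check (void)
{
  /* A CPU description that leaves the fast-unaligned-load bit clear.  */
  struct demo_cpu_features cpu = { false, false, false };

  assert (demo_select_strstr (&cpu) == DEMO_IMPL_SSE2_UNALIGNED);
  assert (demo_select_copy (&cpu) == DEMO_COPY_ALIGNED);
}
```

With the assumed baseline of 2, a CPU description that leaves `fast_unaligned_load` clear still gets the unaligned variant from `demo_select_strstr`, because that check never consults the bit, while `demo_select_copy` honours the bit and picks the aligned variant.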