From: "H.J. Lu"
Date: Tue, 14 Jun 2022 18:08:25 -0700
Subject: Re: [PATCH v1 3/3] x86: Add sse42 implementation to strcmp's ifunc
To: Noah Goldstein
Cc: GNU C Library, "Carlos O'Donell"
In-Reply-To: <20220615002533.1741934-3-goldstein.w.n@gmail.com>

On Tue, Jun 14, 2022 at 5:25 PM Noah Goldstein wrote:
>
> This has been missing since the ifuncs were added.
>
> The performance of SSE4.2 is preferable to SSE2.
>
> Measured on Tigerlake with N = 20 runs.
> Geometric Mean of all benchmarks SSE4.2 / SSE2: 0.906
> ---
>  sysdeps/x86_64/multiarch/strcmp.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c
> index a248c2a6e6..9c1677724c 100644
> --- a/sysdeps/x86_64/multiarch/strcmp.c
> +++ b/sysdeps/x86_64/multiarch/strcmp.c
> @@ -28,6 +28,7 @@
>
>  extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
>  extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse42) attribute_hidden;
>  extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
>  extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
>  extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
> @@ -52,6 +53,10 @@ IFUNC_SELECTOR (void)
>        return OPTIMIZE (avx2);
>      }
>
> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
> +      && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
> +    return OPTIMIZE (sse42);
> +
>    if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Load))
>      return OPTIMIZE (sse2_unaligned);
>
> --
> 2.34.1
>

LGTM.

Thanks.

--
H.J.
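
For readers following the thread, the selection order after this patch can be sketched as a standalone toy in C. This is only an illustration under stated assumptions, not the glibc code: `struct toy_cpu_features`, `enum toy_strcmp_impl` and `toy_select_strcmp` are hypothetical names, the AVX2/EVEX conditions that precede the new check in the real selector are collapsed into a single flag, and the plain SSE2 fallback at the end is assumed because it lies outside the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical stand-in for glibc's cpu_features.  The field names are
   illustrative only and do not match the real structure.  */
struct toy_cpu_features
{
  bool avx2_usable;          /* Assumption: all AVX2/EVEX checks folded into one flag.  */
  bool sse4_2_usable;        /* Models CPU_FEATURE_USABLE_P (..., SSE4_2).  */
  bool slow_sse4_2;          /* Models CPU_FEATURES_ARCH_P (..., Slow_SSE4_2).  */
  bool fast_unaligned_load;  /* Models CPU_FEATURES_ARCH_P (..., Fast_Unaligned_Load).  */
};

enum toy_strcmp_impl
{
  TOY_AVX2,
  TOY_SSE42,
  TOY_SSE2_UNALIGNED,
  TOY_SSE2
};

/* Follows the order of the checks in the patched selector: wider vector
   variants first, then SSE4.2 unless it is flagged as slow, then the
   unaligned SSE2 variant, and finally (assumed) plain SSE2.  */
static enum toy_strcmp_impl
toy_select_strcmp (const struct toy_cpu_features *f)
{
  if (f->avx2_usable)
    return TOY_AVX2;

  if (f->sse4_2_usable && !f->slow_sse4_2)
    return TOY_SSE42;

  if (f->fast_unaligned_load)
    return TOY_SSE2_UNALIGNED;

  return TOY_SSE2;
}
```

In this sketch, a CPU with usable SSE4.2 and fast unaligned loads gets the SSE4.2 variant, while the same CPU with the slow flag set falls through to the unaligned SSE2 variant, which matches the placement of the new check ahead of the Fast_Unaligned_Load test.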