From mboxrd@z Thu Jan 1 00:00:00 1970
From: Noah Goldstein
Date: Mon, 7 Feb 2022 13:40:46 -0600
Subject: Re: [PATCH v1] x86: Remove SSE3 instruction for broadcast in memset.S (SSE2 Only)
To: "H.J. Lu"
Cc: GNU C Library, "Carlos O'Donell"
References: <20220207063854.3324172-1-goldstein.w.n@gmail.com>
List-Id: Libc-alpha mailing list
Content-Type: text/plain; charset="UTF-8"

On Mon, Feb 7, 2022 at 8:33 AM H.J. Lu wrote:
>
> On Mon, Feb 7, 2022 at 4:54 AM H.J. Lu wrote:
> >
> > On Sun, Feb 6, 2022 at 10:39 PM Noah Goldstein wrote:
> > >
> > > commit b62ace2740a106222e124cc86956448fa07abf4d
> > > Author: Noah Goldstein
> > > Date: Sun Feb 6 00:54:18 2022 -0600
> > >
> > > x86: Improve vec generation in memset-vec-unaligned-erms.S
> > >
> > > Revert usage of 'pshufb' in broadcast logic as it is an SSE3
>
> pshufb is an SSSE3, not SSE3, instruction.

Fixed. The commit message is different but V2 is up.

>
> > > instruction and memset.S is restricted to only SSE2 instructions.
> > > ---
> > >  sysdeps/x86_64/memset.S | 19 ++++++++++---------
> > >  1 file changed, 10 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
> > > index ccf036be53..148553cf3d 100644
> > > --- a/sysdeps/x86_64/memset.S
> > > +++ b/sysdeps/x86_64/memset.S
> > > @@ -28,22 +28,23 @@
> > >  #define VMOVU movups
> > >  #define VMOVA movaps
> > >
> > > -# define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
> > > +#define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
> > >    movd d, %xmm0; \
> > > -  pxor %xmm1, %xmm1; \
> > > -  pshufb %xmm1, %xmm0; \
> > > -  movq r, %rax
> > > +  movq r, %rax; \
> > > +  punpcklbw %xmm0, %xmm0; \
> > > +  punpcklwd %xmm0, %xmm0; \
> > > +  pshufd $0, %xmm0, %xmm0
> > >
> > > -# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
> > > +#define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
> > >    movd d, %xmm0; \
> > >    pshufd $0, %xmm0, %xmm0; \
> > >    movq r, %rax
> > >
> > > -# define MEMSET_VDUP_TO_VEC0_HIGH()
> > > -# define MEMSET_VDUP_TO_VEC0_LOW()
> > > +#define MEMSET_VDUP_TO_VEC0_HIGH()
> > > +#define MEMSET_VDUP_TO_VEC0_LOW()
> >
> > What are these changes for?
> >
> > > -# define WMEMSET_VDUP_TO_VEC0_HIGH()
> > > -# define WMEMSET_VDUP_TO_VEC0_LOW()
> > > +#define WMEMSET_VDUP_TO_VEC0_HIGH()
> > > +#define WMEMSET_VDUP_TO_VEC0_LOW()
> >
> > What are these changes for?

Undid them in V2. Realized I had misindented them in my last commit.

> >
> > >  #define SECTION(p) p
> > >
> > > --
> > > 2.25.1
> > >
> >
> >
> > --
> > H.J.
>
>
> --
> H.J.
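
For anyone following along, here is a rough C sketch of the SSE2-only byte broadcast that the quoted MEMSET_SET_VEC0_AND_SET_RETURN hunk switches to (movd, punpcklbw, punpcklwd, pshufd). It is only an illustration: the helper names broadcast_byte_sse2 and check_broadcast are made up for this note and are not part of glibc, and a compiler is free to pick different instructions for the intrinsics.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics only */
#include <string.h>

/* Hypothetical helper, not glibc code: broadcast the low byte of C to
   all 16 bytes of an XMM register without any SSSE3 instruction.  */
static __m128i
broadcast_byte_sse2 (int c)
{
  __m128i v = _mm_cvtsi32_si128 (c);  /* movd: value lands in dword 0 */
  v = _mm_unpacklo_epi8 (v, v);       /* punpcklbw: bytes 0-1 hold the byte */
  v = _mm_unpacklo_epi16 (v, v);      /* punpcklwd: bytes 0-3 hold the byte */
  return _mm_shuffle_epi32 (v, 0);    /* pshufd $0: copy dword 0 to all dwords */
}

/* Hypothetical self-check: the vector must match memset of 16 bytes.  */
static void
check_broadcast (int c)
{
  unsigned char got[16], want[16];
  _mm_storeu_si128 ((__m128i *) got, broadcast_byte_sse2 (c));
  memset (want, c, sizeof want);
  assert (memcmp (got, want, sizeof got) == 0);
}
```

Calling check_broadcast (0xAB) should pass on any x86-64 target, since SSE2 is part of the baseline there. Only the low byte of the argument ends up in the result, which matches how memset treats its int argument.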