From: "H.J. Lu"
Date: Sat, 23 Oct 2021 06:22:09 -0700
Subject: Re: [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
To: Noah Goldstein
Cc: GNU C Library, "Carlos O'Donell"
In-Reply-To: <20211023052647.535991-1-goldstein.w.n@gmail.com>

On Fri, Oct 22, 2021 at 10:26 PM Noah Goldstein wrote:
>
> This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'.
>
> it could potentially be dangerous to use SSE2 if this function is ever
> called without using 'vzeroupper' beforehand. While compilers appear
> to use 'vzeroupper' before function calls if AVX2 has been used, using
> SSE2 here is more brittle. Since it is not absolutely necessary it
> should be avoided.
>
> It costs 2-extra bytes but the extra bytes should only eat into
> alignment padding.
> ---
>  sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> index 2761b54f2e..640f6757fa 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> @@ -561,13 +561,13 @@ L(between_16_31):
>         /* From 16 to 31 bytes. No branch when size == 16. */
>
>         /* Use movups to save code size. */
> -       movups (%rsi), %xmm2
> +       vmovdqu (%rsi), %xmm2
>         VPCMP $4, (%rdi), %xmm2, %k1
>         kmovd %k1, %eax
>         testl %eax, %eax
>         jnz L(return_vec_0_lv)
>         /* Use overlapping loads to avoid branches. */
> -       movups -16(%rsi, %rdx, CHAR_SIZE), %xmm2
> +       vmovdqu -16(%rsi, %rdx, CHAR_SIZE), %xmm2
>         VPCMP $4, -16(%rdi, %rdx, CHAR_SIZE), %xmm2, %k1
>         addl $(CHAR_PER_VEC - (16 / CHAR_SIZE)), %edx
>         kmovd %k1, %eax
> --
> 2.29.2
>

LGTM.

Reviewed-by: H.J. Lu

Thanks.

-- 
H.J.
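
For readers less familiar with the "overlapping loads" trick that the quoted hunk relies on, here is a rough C sketch of the idea for sizes from 16 to 31 bytes. It is an illustration only, not the glibc code: the function name differs_16_to_31 is hypothetical, it uses SSE2 intrinsics rather than the EVEX VPCMP/kmovd sequence in the patch, and it only reports whether the buffers differ instead of computing memcmp's ordered result. Whether the loads are emitted as legacy-SSE or VEX-encoded instructions is up to the compiler flags (for example, GCC emits VEX forms when built with -mavx), which is the encoding choice the patch makes by hand in assembly.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper (not part of glibc): returns nonzero if the two
   buffers differ in their first n bytes.  Requires 16 <= n <= 31.  */
static int differs_16_to_31(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    assert(n >= 16 && n <= 31);

    /* Compare the first 16 bytes with unaligned loads.  */
    __m128i va = _mm_loadu_si128((const __m128i *)pa);
    __m128i vb = _mm_loadu_si128((const __m128i *)pb);
    int eq_head = _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));

    /* Compare the last 16 bytes.  This window overlaps the first one
       whenever n < 32, so no branch on the exact length is needed.  */
    va = _mm_loadu_si128((const __m128i *)(pa + n - 16));
    vb = _mm_loadu_si128((const __m128i *)(pb + n - 16));
    int eq_tail = _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));

    /* Each mask has one bit per byte; 0xFFFF means all 16 bytes matched.  */
    return (eq_head & eq_tail) != 0xFFFF;
}
```

The two 16-byte windows together cover every byte in [0, n), so bytes in the overlap are simply compared twice, which is harmless for an equality check.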