From: Sunil Pandey
Date: Fri, 22 Apr 2022 18:22:18 -0700
Subject: Re: [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
To: "H.J. Lu", libc-stable@sourceware.org
Cc: Noah Goldstein, GNU C Library
References: <20211023052647.535991-1-goldstein.w.n@gmail.com>

On Sat, Oct 23, 2021 at 6:22 AM H.J. Lu via Libc-alpha wrote:
>
> On Fri, Oct 22, 2021 at 10:26 PM Noah Goldstein wrote:
> >
> > This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'.
> >
> > it could potentially be dangerous to use SSE2 if this function is ever
> > called without using 'vzeroupper' beforehand. While compilers appear
> > to use 'vzeroupper' before function calls if AVX2 has been used, using
> > SSE2 here is more brittle. Since it is not absolutely necessary it
> > should be avoided.
> >
> > It costs 2-extra bytes but the extra bytes should only eat into
> > alignment padding.
> > ---
> >  sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> > index 2761b54f2e..640f6757fa 100644
> > --- a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> > +++ b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
> > @@ -561,13 +561,13 @@ L(between_16_31):
> >         /* From 16 to 31 bytes. No branch when size == 16. */
> >
> >         /* Use movups to save code size. */
> > -       movups (%rsi), %xmm2
> > +       vmovdqu (%rsi), %xmm2
> >         VPCMP $4, (%rdi), %xmm2, %k1
> >         kmovd %k1, %eax
> >         testl %eax, %eax
> >         jnz L(return_vec_0_lv)
> >         /* Use overlapping loads to avoid branches. */
> > -       movups -16(%rsi, %rdx, CHAR_SIZE), %xmm2
> > +       vmovdqu -16(%rsi, %rdx, CHAR_SIZE), %xmm2
> >         VPCMP $4, -16(%rdi, %rdx, CHAR_SIZE), %xmm2, %k1
> >         addl $(CHAR_PER_VEC - (16 / CHAR_SIZE)), %edx
> >         kmovd %k1, %eax
> > --
> > 2.29.2
> >
>
> LGTM.
>
> Reviewed-by: H.J. Lu
>
> Thanks.
>
> --
> H.J.

I would like to backport this patch to release branches.
Any comments or objections?

--Sunil
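
For anyone reading the quoted hunk without the full file at hand: the
"overlapping loads" comment in L(between_16_31) refers to comparing the
first 16 bytes and the last 16 bytes of the buffers, so that any size from
16 to 31 is covered by two fixed-width loads with no branch on the length.
Below is a rough sketch of that idea in portable C. It is only an
illustration under assumptions, not the glibc code: the helper names
differ16 and differ_16_to_31 are hypothetical, it uses plain 64-bit loads
rather than vector instructions, and it reports only equal versus
different, not the ordering that memcmp returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 16 bytes at a and b differ,
   0 otherwise.  memcpy is used for the unaligned loads.  */
static int
differ16 (const unsigned char *a, const unsigned char *b)
{
  uint64_t a0, a1, b0, b1;
  memcpy (&a0, a, 8);
  memcpy (&a1, a + 8, 8);
  memcpy (&b0, b, 8);
  memcpy (&b1, b + 8, 8);
  return ((a0 ^ b0) | (a1 ^ b1)) != 0;
}

/* Hypothetical sketch: equality check for sizes 16..31.  The second
   load starts at n - 16, so it overlaps the first one whenever n < 32.
   Bytes in the overlap are compared twice, which is harmless, and no
   branch depends on n.  */
static int
differ_16_to_31 (const void *s1, const void *s2, size_t n)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;
  assert (n >= 16 && n <= 31);
  int head = differ16 (a, b);                   /* bytes 0 .. 15 */
  int tail = differ16 (a + n - 16, b + n - 16); /* bytes n-16 .. n-1 */
  return head | tail;
}
```

When n == 16 both loads read the same 16 bytes, which matches the
"No branch when size == 16" comment in the hunk.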