From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v1] x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
Date: Sat, 23 Oct 2021 01:26:47 -0400
Message-Id: <20211023052647.535991-1-goldstein.w.n@gmail.com>

This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'.

It could potentially be dangerous to use SSE2 if this function is ever
called without using 'vzeroupper' beforehand. While compilers appear
to use 'vzeroupper' before function calls if AVX2 has been used, using
SSE2 here is more brittle. Since it is not absolutely necessary, it
should be avoided.

It costs 2 extra bytes, but the extra bytes should only eat into
alignment padding.
---
 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
index 2761b54f2e..640f6757fa 100644
--- a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
+++ b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
@@ -561,13 +561,13 @@ L(between_16_31):
 	/* From 16 to 31 bytes. No branch when size == 16. */
 
 	/* Use movups to save code size. */
-	movups	(%rsi), %xmm2
+	vmovdqu	(%rsi), %xmm2
 	VPCMP	$4, (%rdi), %xmm2, %k1
 	kmovd	%k1, %eax
 	testl	%eax, %eax
 	jnz	L(return_vec_0_lv)
 	/* Use overlapping loads to avoid branches. */
-	movups	-16(%rsi, %rdx, CHAR_SIZE), %xmm2
+	vmovdqu	-16(%rsi, %rdx, CHAR_SIZE), %xmm2
 	VPCMP	$4, -16(%rdi, %rdx, CHAR_SIZE), %xmm2, %k1
 	addl	$(CHAR_PER_VEC - (16 / CHAR_SIZE)), %edx
 	kmovd	%k1, %eax
-- 
2.29.2