From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 7844)
	id ED9363858C54; Tue, 28 Nov 2023 18:51:28 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org ED9363858C54
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1701197488;
	bh=hQkmc1xIYiyQMh7KCUmvN1QIcim7/5pTCpuu07VrOO8=;
	h=From:To:Subject:Date:From;
	b=g0LDcg68/G9G1x5A5IGY/yyXG6O30vmSK7sGwLJb1EWxvphf03VWjTRssF/ZgpgrE
	 BiBgx2XwTRVi4KVovsxR3pfB9b35gLBe9AgAN/SY1gBLYWITUx2GMML09c7VsytGXo
	 clewc+9LIXvWhlGqN1kQDXTGcPKdR0hr467WFbRg=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Noah Goldstein
To: glibc-cvs@sourceware.org
Subject: [glibc] x86: Only align destination to 1x VEC_SIZE in memset 4x loop
X-Act-Checkin: glibc
X-Git-Author: Noah Goldstein
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 3921c5b40f293c57cb326f58713c924b0662ef59
X-Git-Newrev: 9469261cf1924d350feeec64d2c80cafbbdcdd4d
Message-Id: <20231128185128.ED9363858C54@sourceware.org>
Date: Tue, 28 Nov 2023 18:51:28 +0000 (GMT)
List-Id:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9469261cf1924d350feeec64d2c80cafbbdcdd4d

commit 9469261cf1924d350feeec64d2c80cafbbdcdd4d
Author: Noah Goldstein
Date:   Wed Nov 1 15:30:26 2023 -0500

    x86: Only align destination to 1x VEC_SIZE in memset 4x loop

    Current code aligns to 2x VEC_SIZE. Aligning to 2x has no effect on
    performance other than potentially resulting in an additional
    iteration of the loop.
    1x maintains aligned stores (the only reason to align in this case)
    and doesn't incur any unnecessary loop iterations.
    Reviewed-by: Sunil K Pandey

Diff:
---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 3d9ad49cb9..0f0636b90f 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -293,7 +293,7 @@ L(more_2x_vec):
 	leaq	(VEC_SIZE * 4)(%rax), %LOOP_REG
 #endif
 	/* Align dst for loop. */
-	andq	$(VEC_SIZE * -2), %LOOP_REG
+	andq	$(VEC_SIZE * -1), %LOOP_REG
 	.p2align 4
 L(loop):
 	VMOVA	%VMM(0), LOOP_4X_OFFSET(%LOOP_REG)
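
The reasoning in the commit message can be illustrated with a rough C model of the address arithmetic. This is only a sketch under stated assumptions, not the glibc implementation: VEC_SIZE is fixed at 32, the helper names align_down and loop_iterations are hypothetical, and the loop bound (dst + len - 4 * VEC_SIZE, with the tail stored separately) is a simplification of the real pointer setup. It models a loop that stores four vectors per iteration and checks its bound at the bottom, and counts iterations for a given alignment mask.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_SIZE 32u /* assumption: 32-byte vectors */

/* Round addr down to a multiple of align (a power of two).
   Equivalent to `andq $-align, reg` in the diff.  */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Hypothetical model: count iterations of a loop that stores
   4 * VEC_SIZE bytes per pass, starting from the aligned-down address
   and running until the pointer reaches `end`.  Assumes
   len > 8 * VEC_SIZE so the loop body runs at least once.  */
static unsigned loop_iterations(uint64_t dst, uint64_t len, uint64_t align)
{
    uint64_t end = dst + len - 4 * VEC_SIZE; /* tail handled separately */
    uint64_t p = align_down(dst + 4 * VEC_SIZE, align);
    unsigned n = 0;

    do {
        p += 4 * VEC_SIZE; /* four aligned vector stores would go here */
        n++;
    } while (p < end);

    return n;
}
```

In this model, with dst = 0x1020 and len = 368, loop_iterations returns 1 when align is VEC_SIZE and 2 when align is 2 * VEC_SIZE. Both starting addresses are multiples of VEC_SIZE, so aligned stores remain valid either way; the 2x mask only moves the start back by one vector, which in this case costs an extra pass.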