Date: Wed, 31 Oct 2018 20:23:00 -0000
From: Rich Felker
To: Adhemerval Zanella
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2
Message-ID: <20181031185405.GW5150@brightrain.aerifal.cx>
References: <20181008135950.9113-1-leonardo.sandoval.gonzalez@linux.intel.com> <2e43a120-bd68-7581-4b1e-889d5713b2a6@linaro.org>
In-Reply-To: <2e43a120-bd68-7581-4b1e-889d5713b2a6@linaro.org>

On Wed, Oct 31, 2018 at 03:36:10PM -0300, Adhemerval Zanella wrote:
>
> >
> > diff --git a/sysdeps/x86_64/multiarch/strcat-avx2.S b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > new file mode 100644
> > index 00000000000..b0623564276
> > --- /dev/null
> > +++ b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > @@ -0,0 +1,275 @@
> > +/* strcat with AVX2
>
> Is this really a gain on real work usage comparing to generic strcat (
> (strcpy (dest + strlen (dest), src)) assuming optimized strcpy / strlen?
> Wouldn't be simple and more i-cache friendly to use a custom generic
> implementation that calls AVX2 strcpy/strlen (such as powerpc64 does)?

I second this, and fail to see the advantage of increasing the volume
of asm without a good reason. In this case specifically:

- Improvement over trivial strcpy(dest+strlen(dest),src), assuming
  those functions are optimized, is at best a constant difference in
  overhead, vs the O(m+n) runtime of the operation.

- Use of strcat at all is a major antipattern, typically leading to
  O(n²) time and buffer overflows. Thus optimizing it at all seems
  dubious (further encouraging its use "because it's fast").

Rich
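
For illustration only, here is a minimal C sketch of the two points in the
reply. The names generic_strcat and append_at are hypothetical and appear
nowhere in the patch or in glibc. The first function is the "trivial"
composition of strlen and strcpy, so it inherits whatever optimized versions
the library selects. The second shows one possible alternative to repeated
strcat: the caller keeps a pointer to the end of the string, so each append
costs time proportional to the appended piece only and is checked against
the buffer limit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: strcat expressed through strlen and strcpy.
   The only cost beyond the two library calls is constant overhead. */
char *generic_strcat(char *dest, const char *src)
{
    strcpy(dest + strlen(dest), src);
    return dest;
}

/* Hypothetical helper: bounded append at a known end position.
   `end` points at the current terminating NUL, `limit` points one past
   the last byte of the buffer. Returns the new end pointer, or NULL
   (leaving the buffer unchanged) if `piece` plus its NUL does not fit. */
char *append_at(char *end, const char *limit, const char *piece)
{
    size_t n = strlen(piece);

    if (n >= (size_t)(limit - end))
        return NULL;            /* no room for n bytes plus the NUL */
    memcpy(end, piece, n + 1);  /* copy the piece and its NUL */
    return end + n;
}
```

With repeated strcat, every call rescans the destination from the start,
which is where the quadratic total comes from when building a string out of
many pieces. Carrying the end pointer forward, as append_at does, avoids the
rescan and gives the caller a place to handle the "does not fit" case.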