From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-96877-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 3745 invoked by alias); 31 Oct 2018 18:54:10 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 3721 invoked by uid 89); 31 Oct 2018 18:54:10 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-22.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY,KAM_NUMSUBJECT,RDNS_DYNAMIC,TVD_RCVD_IP autolearn=ham version=3.3.2 spammy=Improvement, Hx-languages-length:1242, n, HContent-Transfer-Encoding:8bit
X-HELO: brightrain.aerifal.cx
Date: Wed, 31 Oct 2018 20:23:00 -0000
From: Rich Felker <dalias@libc.org>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH] x86-64: Optimize strcat/strncat, strcpy/strncpy and
 stpcpy/stpncpy with AVX2
Message-ID: <20181031185405.GW5150@brightrain.aerifal.cx>
References: <20181008135950.9113-1-leonardo.sandoval.gonzalez@linux.intel.com>
 <2e43a120-bd68-7581-4b1e-889d5713b2a6@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <2e43a120-bd68-7581-4b1e-889d5713b2a6@linaro.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: Rich Felker <dalias@aerifal.cx>
X-SW-Source: 2018-10/txt/msg00697.txt.bz2

On Wed, Oct 31, 2018 at 03:36:10PM -0300, Adhemerval Zanella wrote:
>
> > diff --git a/sysdeps/x86_64/multiarch/strcat-avx2.S b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > new file mode 100644
> > index 00000000000..b0623564276
> > --- /dev/null
> > +++ b/sysdeps/x86_64/multiarch/strcat-avx2.S
> > @@ -0,0 +1,275 @@
> > +/* strcat with AVX2
> 
> Is this really a gain in real-world usage compared to a generic strcat
> (strcpy (dest + strlen (dest), src)), assuming optimized strcpy / strlen?
> Wouldn't it be simpler and more i-cache friendly to use a custom generic
> implementation that calls AVX2 strcpy/strlen (such as powerpc64 does)?

I second this, and fail to see the advantage of increasing the volume
of asm without a good reason. In this case specifically:

- Improvement over trivial strcpy(dest+strlen(dest),src), assuming
  those functions are optimized, is at best a constant difference in
  overhead, vs the O(m+n) runtime of the operation.

- Use of strcat at all is a major antipattern, typically leading to
  O(n²) time and buffer overflows. Thus optimizing it at all seems
  dubious (further encouraging its use "because it's fast").
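
To make both points concrete, here is a minimal sketch in C. The names
generic_strcat, join_rescan and join_tracked are made up for illustration
and are not taken from the patch or from glibc; the capacity check in
join_tracked is likewise just one assumed way to avoid the overflow.

```c
#include <string.h>

/* Hypothetical name: strcat expressed through strlen and strcpy.
   If those two are already optimized, a dedicated asm strcat can
   only save a constant amount of call overhead on top of this. */
static char *generic_strcat(char *dest, const char *src)
{
    strcpy(dest + strlen(dest), src);
    return dest;
}

/* The antipattern: every append rescans dest from its start to find
   the terminator, so joining many pieces is quadratic in the total
   length. Nothing here checks that dest is large enough. */
static void join_rescan(char *dest, const char *const *parts, size_t n)
{
    dest[0] = '\0';
    for (size_t i = 0; i < n; i++)
        generic_strcat(dest, parts[i]);
}

/* Linear alternative: remember how many bytes are already in use and
   refuse to write past cap. Returns 0 on success, -1 if the result
   (including the terminator) would not fit. */
static int join_tracked(char *dest, size_t cap,
                        const char *const *parts, size_t n)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    dest[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        if (len >= cap - used)   /* need len + 1 bytes for the NUL */
            return -1;
        memcpy(dest + used, parts[i], len + 1);
        used += len;
    }
    return 0;
}
```

join_rescan and join_tracked produce the same string when the buffer is
big enough; only the second does so without rescanning and without
trusting the caller about the buffer size.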

Rich