From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-92887-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 89216 invoked by alias); 4 Jun 2018 14:14:36 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 89020 invoked by uid 89); 4 Jun 2018 14:14:28 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,KAM_NUMSUBJECT autolearn=no version=3.3.2 spammy=heavy, Haswell, dam, documents
X-HELO: mga04.intel.com
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
Message-ID: <3a3ebd816fd263cc9eb76f904594f4f0105e5c9a.camel@linux.intel.com>
Subject: Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
From: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com>
To: Alexander Monakov <amonakov@ispras.ru>
Cc: "H.J. Lu" <hjl.tools@gmail.com>, GNU C Library
 <libc-alpha@sourceware.org>
Date: Mon, 04 Jun 2018 14:14:00 -0000
In-Reply-To: <alpine.LNX.2.20.13.1806021022140.1892@monopod.intra.ispras.ru>
References: 
	<20180529185339.11541-1-leonardo.sandoval.gonzalez@linux.intel.com>
	  <CAMe9rOpKpR6pOLkxyMuTPBA1zSx4MmYYsTOwHz5pTxjdR57p1A@mail.gmail.com>
	  <alpine.LNX.2.20.13.1806011824140.1892@monopod.intra.ispras.ru>
	 <03bdf89c47880fd0734fc5b82213fc3c98eab372.camel@linux.intel.com>
	 <alpine.LNX.2.20.13.1806021022140.1892@monopod.intra.ispras.ru>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-SW-Source: 2018-06/txt/msg00041.txt.bz2

On Sat, 2018-06-02 at 10:44 +0300, Alexander Monakov wrote:
> On Fri, 1 Jun 2018, Leonardo Sandoval wrote:
> > this is partially true for AVX2 FMA and AVX512. What I am proposing
> > contains none of the latter instructions, just AVX2 without FMA
> > instructions.
> 
> This would address my concern (if true for all CPUs), but ...
> 
> > In the other hand, some microbenchmarks were done to see the
> > benefit of
> > this effort, which is resumed on the commit description but the
> > complete picture is here 
> 
> this does not. The whole point was that frequency behavior means the
> slowdown on programs making *occasional* calls to strcmp will not be
> captured by microbenchmarks. What good is saving dozens of cycles on
> strcmp calls if the remaining program is slowed down by 5%?
> 

Right, perhaps microbenchmarks do not tell us much in this case,
because AVX and non-AVX code are not mixed. Also, if you look at the
patch, the upper ymm bits are cleared (vzeroupper) before returning
from strcmp, so there is no performance penalty from saving these and
then restoring them when other AVX code is called again.

As I said before, using strcmp won't hurt performance at all (the
internal HW perf team confirmed this) because we are not using any
opcode that may drop the frequency.

If you have a test scenario that demonstrates the 5% drop, I would
like to test it and discuss it further.
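
To make the request concrete, here is a minimal sketch of what such a
scenario might look like. It is only an illustration, not something
taken from the patch: busy_work, run_scenario, struct scenario_result
and the iteration counts are hypothetical names and values. The idea is
a scalar workload with occasional strcmp calls, timed with clock(), so
that any slowdown of the surrounding code would show up in the total.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical scalar workload standing in for "the remaining program". */
static uint64_t busy_work(uint64_t seed, unsigned iters)
{
    uint64_t x = seed;
    for (unsigned i = 0; i < iters; i++)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

struct scenario_result {
    double seconds;      /* CPU time spent in the whole loop */
    unsigned cmp_calls;  /* number of strcmp calls actually made */
    uint64_t checksum;   /* keeps the scalar work from being optimized out */
};

/* Runs `rounds` blocks of scalar work and calls strcmp once every
   `period` rounds (period == 0 means never, i.e. the baseline). */
static struct scenario_result run_scenario(unsigned rounds, unsigned period,
                                           const char *a, const char *b)
{
    /* Volatile pointer so the compiler cannot inline or fold the call. */
    int (*volatile cmp)(const char *, const char *) = strcmp;
    struct scenario_result r = { 0.0, 0, 1 };

    assert(a != NULL && b != NULL);

    clock_t start = clock();
    for (unsigned i = 0; i < rounds; i++) {
        r.checksum = busy_work(r.checksum, 1000);
        if (period != 0 && i % period == 0) {
            r.checksum += (uint64_t)(cmp(a, b) != 0);
            r.cmp_calls++;
        }
    }
    r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return r;
}
```

One would call run_scenario with period 0 as a baseline and with a
small period for the mixed case, once with the AVX2 strcmp selected
and once without, and compare the seconds field. A frequency drop
caused by the occasional calls should then appear as a gap well
beyond the cost of the strcmp calls themselves.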

> I was missing that AVX frequency limits kick in only if "heavy"
> operations
> are used -- on recent generations. I'm not sure that's true for
> older, e.g.
> Haswell, generations. Intel's white paper explaining Haswell AVX
> clocks
> makes no distinction of "light" vs. "heavy" operations:
> 
> https://www.intel.com/content/dam/www/public/us/en/documents/white-pa
> pers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf
> 
> Can you please clarify further?
> 
> Alexander