Message-ID: <3a3ebd816fd263cc9eb76f904594f4f0105e5c9a.camel@linux.intel.com>
Subject: Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
From: Leonardo Sandoval
To: Alexander Monakov
Cc: "H.J. Lu", GNU C Library
Date: Mon, 04 Jun 2018 14:14:00 -0000
References: <20180529185339.11541-1-leonardo.sandoval.gonzalez@linux.intel.com> <03bdf89c47880fd0734fc5b82213fc3c98eab372.camel@linux.intel.com>

On Sat, 2018-06-02 at 10:44 +0300, Alexander Monakov wrote:
> On Fri, 1 Jun 2018, Leonardo Sandoval wrote:
> > this is partially true for AVX2 FMA and AVX512. What I am proposing
> > contains none of the latter instructions, just AVX2 without FMA
> > instructions.
>
> This would address my concern (if true for all CPUs), but ...
>
> > In the other hand, some microbenchmarks were done to see the
> > benefit of this effort, which is resumed on the commit description
> > but the complete picture is here
>
> this does not. The whole point was that frequency behavior means the
> slowdown on programs making *occasional* calls to strcmp will not be
> captured by microbenchmarks. What good is saving dozens of cycles on
> strcmp calls if the remaining program is slowed down by 5%?
>

Right, microbenchmarks probably do not tell us much in this case,
because AVX and non-AVX code is not mixed in them. Also, if you look at
the patch, the upper ymm bits are cleared (vzeroupper) before returning
from strcmp, so there is no performance penalty from saving and
restoring them when other AVX code is called again.

As I said before, using this strcmp won't hurt performance at all (our
internal HW performance team confirmed this), because we are not using
any opcode that may drop the frequency. If you have a test scenario
that demonstrates the 5% drop, I would like to test it and discuss it
further.

> I was missing that AVX frequency limits kick in only if "heavy"
> operations are used -- on recent generations. I'm not sure that's
> true for older, e.g. Haswell, generations. Intel's white paper
> explaining Haswell AVX clocks makes no distinction of "light" vs.
> "heavy" operations:
>
> https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf
>
> Can you please clarify further?
>
> Alexander
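
For reference, the trade-off in the quoted question (dozens of cycles saved per strcmp call versus a 5% slowdown of the rest of the program) can be put into a back-of-the-envelope sketch. Everything here is an assumption for illustration: the function names, the 50-cycle saving and the 1e9-cycle workload are hypothetical, and whether any slowdown occurs at all with this patch is exactly the point under discussion. The code only does the arithmetic; it measures nothing.

```python
def net_cycles_saved(strcmp_calls, cycles_saved_per_call,
                     other_cycles, slowdown_fraction):
    """Return net cycles saved (positive) or lost (negative).

    All parameters are hypothetical inputs, not measurements:
    strcmp_calls          -- number of strcmp calls in the program
    cycles_saved_per_call -- cycles the faster strcmp saves per call
    other_cycles          -- cycles spent in the rest of the program
    slowdown_fraction     -- assumed slowdown of the rest (0.05 == 5%)
    """
    gained = strcmp_calls * cycles_saved_per_call
    lost = other_cycles * slowdown_fraction
    return gained - lost


def break_even_calls(cycles_saved_per_call, other_cycles, slowdown_fraction):
    """Number of strcmp calls at which the savings offset the slowdown."""
    return other_cycles * slowdown_fraction / cycles_saved_per_call


if __name__ == "__main__":
    other = 1_000_000_000  # hypothetical: 1e9 cycles outside strcmp
    # Occasional use: 1000 calls, 50 cycles saved each, 5% slowdown elsewhere.
    print(net_cycles_saved(1000, 50, other, 0.05))
    # Calls needed before the faster strcmp pays for such a slowdown.
    print(break_even_calls(50, other, 0.05))
```

With a slowdown_fraction of 0.0, which is what is claimed for AVX2 code without "heavy" instructions, every call is a net gain; the break-even count only matters if a nonzero slowdown is actually demonstrated.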