Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Date: Sat, 02 Jun 2018 07:46:00 -0000
From: Alexander Monakov
To: Leonardo Sandoval
cc: "H.J. Lu", GNU C Library
Subject: Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
In-Reply-To: <03bdf89c47880fd0734fc5b82213fc3c98eab372.camel@linux.intel.com>
References: <20180529185339.11541-1-leonardo.sandoval.gonzalez@linux.intel.com> <03bdf89c47880fd0734fc5b82213fc3c98eab372.camel@linux.intel.com>

On Fri, 1 Jun 2018, Leonardo Sandoval wrote:

> this is partially true for AVX2 FMA and AVX512. What I am proposing
> contains none of the latter instructions, just AVX2 without FMA
> instructions.

This would address my concern (if true for all CPUs), but ...

> In the other hand, some microbenchmarks were done to see the benefit of
> this effort, which is resumed on the commit description but the
> complete picture is here

this does not. The whole point was that frequency behavior means the slowdown
on programs making *occasional* calls to strcmp will not be captured by
microbenchmarks. What good is saving dozens of cycles on strcmp calls if the
remaining program is slowed down by 5%?

I was missing that AVX frequency limits kick in only if "heavy" operations are
used -- on recent generations. I'm not sure that's true for older, e.g. Haswell,
generations. Intel's white paper explaining Haswell AVX clocks makes no
distinction of "light" vs. "heavy" operations:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf

Can you please clarify further?

Alexander
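
For illustration only, here is a back-of-the-envelope sketch of the trade-off
raised above: cycles saved inside strcmp versus cycles lost because the rest of
the program runs at a reduced clock. The function names and every number are
hypothetical placeholders, not measurements. The 5% figure is the one from the
rhetorical question above, and 50 cycles merely stands in for "dozens".

```python
def net_cycles_saved(strcmp_calls, cycles_saved_per_call,
                     other_cycles, slowdown_fraction):
    """Net cycles saved (positive) or lost (negative) over one program run.

    strcmp_calls          -- number of strcmp calls in the run (assumed)
    cycles_saved_per_call -- cycles the faster strcmp saves per call (assumed)
    other_cycles          -- cycles spent in the rest of the program (assumed)
    slowdown_fraction     -- fractional slowdown of that remaining code,
                             e.g. 0.05 for 5% (assumed)
    """
    gained = strcmp_calls * cycles_saved_per_call
    lost = other_cycles * slowdown_fraction
    return gained - lost


def break_even_calls(cycles_saved_per_call, other_cycles, slowdown_fraction):
    """strcmp calls needed before the per-call saving offsets the slowdown."""
    return other_cycles * slowdown_fraction / cycles_saved_per_call


if __name__ == "__main__":
    # Hypothetical program: 1e9 cycles outside strcmp, 10,000 strcmp calls.
    print(net_cycles_saved(10_000, 50, 1_000_000_000, 0.05))
    print(break_even_calls(50, 1_000_000_000, 0.05))
```

With these made-up inputs the program would need on the order of a million
strcmp calls to break even, which is why a microbenchmark that only times
strcmp cannot show the loss for a program that calls it occasionally.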