From: Florian Weimer
To: Alexander Monakov
Cc: Leonardo Sandoval, "H.J. Lu", GNU C Library
Subject: Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
References: <20180529185339.11541-1-leonardo.sandoval.gonzalez@linux.intel.com> <03bdf89c47880fd0734fc5b82213fc3c98eab372.camel@linux.intel.com>
Date: Sat, 02 Jun 2018 11:37:00 -0000
In-Reply-To: (Alexander Monakov's message of "Sat, 2 Jun 2018 10:44:15 +0300 (MSK)")
Message-ID: <87sh651ks7.fsf@mid.deneb.enyo.de>

* Alexander Monakov:

> this does not. The whole point was that frequency behavior means the
> slowdown on programs making *occasional* calls to strcmp will not be
> captured by microbenchmarks. What good is saving dozens of cycles on
> strcmp calls if the remaining program is slowed down by 5%?
>
> I was missing that AVX frequency limits kick in only if "heavy" operations
> are used -- on recent generations. I'm not sure that's true for older, e.g.
> Haswell, generations. Intel's white paper explaining Haswell AVX clocks
> makes no distinction of "light" vs. "heavy" operations:
>
> https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf

This should be easy to measure.
Aren't there perf counters for that? The CORE_POWER.LVL0_TURBO_LICENSE, CORE_POWER.LVL1_TURBO_LICENSE, CORE_POWER.LVL2_TURBO_LICENSE counters? Run the benchmark in parallel with itself, and then with other compute loads, and see which of the counters increase?
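
A rough sketch of how that measurement could be scripted, in Python. It rests on assumptions: that Linux perf exposes the three counters under the lowercase names core_power.lvl0_turbo_license and so on (this depends on the CPU model and perf version, so check "perf list" first), and that "perf stat -x," writes CSV with the count in the first field and the event name in the third. The helper names and the ./bench-strcmp path used below are hypothetical.

```python
import subprocess

# Assumed perf spellings of the CORE_POWER.LVLn_TURBO_LICENSE counters;
# they may be missing on some CPUs (verify with `perf list`).
EVENTS = (
    "core_power.lvl0_turbo_license",
    "core_power.lvl1_turbo_license",
    "core_power.lvl2_turbo_license",
)


def perf_command(benchmark_argv):
    """Build a `perf stat` command line that counts the license events."""
    return ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS),
            "--", *benchmark_argv]


def parse_license_counts(csv_text):
    """Extract {event: count} from `perf stat -x,` output.

    Lines whose value is not a plain integer (for example
    "<not counted>" or "<not supported>") are skipped.
    """
    counts = {}
    for line in csv_text.splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0].strip(), fields[2].strip()
        if event in EVENTS and value.isdigit():
            counts[event] = int(value)
    return counts


def license_shares(counts):
    """Return each event's share of the summed license-level counts."""
    total = sum(counts.values())
    return {ev: (n / total if total else 0.0) for ev, n in counts.items()}


def measure(benchmark_argv):
    """Run the benchmark under perf and return the per-level shares."""
    # perf stat prints its counter summary on stderr by default.
    result = subprocess.run(perf_command(benchmark_argv),
                            stderr=subprocess.PIPE, text=True, check=False)
    return license_shares(parse_license_counts(result.stderr))
```

Calling measure(["./bench-strcmp"]) once alone, once with several copies running in parallel, and once next to another compute load would then show whether the LVL1 or LVL2 share rises above zero.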