From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-92874-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 63182 invoked by alias); 2 Jun 2018 11:37:58 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 63172 invoked by uid 89); 2 Jun 2018 11:37:58 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,KAM_NUMSUBJECT,RCVD_IN_DNSWL_NONE autolearn=no version=3.3.2 spammy=kick, Haswell, Hx-languages-length:1076, intels
X-HELO: albireo.enyo.de
From: Florian Weimer <fw@deneb.enyo.de>
To: Alexander Monakov <amonakov@ispras.ru>
Cc: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com>,  "H.J. Lu" <hjl.tools@gmail.com>,  GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
References: <20180529185339.11541-1-leonardo.sandoval.gonzalez@linux.intel.com>
	<CAMe9rOpKpR6pOLkxyMuTPBA1zSx4MmYYsTOwHz5pTxjdR57p1A@mail.gmail.com>
	<alpine.LNX.2.20.13.1806011824140.1892@monopod.intra.ispras.ru>
	<03bdf89c47880fd0734fc5b82213fc3c98eab372.camel@linux.intel.com>
	<alpine.LNX.2.20.13.1806021022140.1892@monopod.intra.ispras.ru>
Date: Sat, 02 Jun 2018 11:37:00 -0000
In-Reply-To: <alpine.LNX.2.20.13.1806021022140.1892@monopod.intra.ispras.ru>
	(Alexander Monakov's message of "Sat, 2 Jun 2018 10:44:15 +0300
	(MSK)")
Message-ID: <87sh651ks7.fsf@mid.deneb.enyo.de>
MIME-Version: 1.0
Content-Type: text/plain
X-SW-Source: 2018-06/txt/msg00028.txt.bz2

* Alexander Monakov:

> this does not. The whole point was that frequency behavior means the
> slowdown on programs making *occasional* calls to strcmp will not be
> captured by microbenchmarks. What good is saving dozens of cycles on
> strcmp calls if the remaining program is slowed down by 5%?
>
> I was missing that AVX frequency limits kick in only if "heavy" operations
> are used -- on recent generations. I'm not sure that's true for older, e.g.
> Haswell, generations. Intel's white paper explaining Haswell AVX clocks
> makes no distinction of "light" vs. "heavy" operations:
>
> https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf

This should be easy to measure.  Aren't there perf counters for that,
namely CORE_POWER.LVL0_TURBO_LICENSE, CORE_POWER.LVL1_TURBO_LICENSE,
and CORE_POWER.LVL2_TURBO_LICENSE?

You could run the benchmark in parallel with itself, and then alongside
other compute loads, and see which of the counters increase.
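
A rough sketch of how that measurement could be scripted, assuming a
Linux `perf` that exposes the three events under lowercase names (check
`perf list` first; they may not exist on every CPU generation).  The
helper names and the `./bench-strcmp` path in the usage note are
hypothetical, not part of the patch or of glibc's tooling:

```python
import subprocess

# Event names taken from the message above, lowercased as perf usually
# lists them.  Their availability on a given CPU is an assumption.
LICENSE_EVENTS = [
    "core_power.lvl0_turbo_license",
    "core_power.lvl1_turbo_license",
    "core_power.lvl2_turbo_license",
]


def perf_command(benchmark_argv):
    """Build a `perf stat` command line that counts the license events."""
    return (["perf", "stat", "-x,", "-e", ",".join(LICENSE_EVENTS), "--"]
            + list(benchmark_argv))


def parse_counts(csv_text):
    """Parse `perf stat -x,` output into {event: count or None}."""
    counts = {}
    for line in csv_text.splitlines():
        fields = line.split(",")
        # CSV layout: value, unit, event name, run time, percentage, ...
        if len(fields) < 3 or fields[2] not in LICENSE_EVENTS:
            continue
        try:
            counts[fields[2]] = int(fields[0])
        except ValueError:
            # perf prints e.g. "<not supported>" instead of a number
            counts[fields[2]] = None
    return counts


def measure(benchmark_argv):
    """Run the benchmark under perf and return the license counters."""
    # perf stat writes its counts to stderr by default
    result = subprocess.run(perf_command(benchmark_argv),
                            stderr=subprocess.PIPE, text=True)
    return parse_counts(result.stderr)
```

Something like `measure(["./bench-strcmp"])` could then be run once on
an otherwise idle machine and again while other compute loads are
active, comparing which of the three counts grows between the runs.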