From: "H.J. Lu"
Date: Tue, 03 Dec 2019 17:43:00 -0000
Subject: Re: Commit 27d3ce1467990f89126e228559dec8f84b96c60e?
To: "Carlos O'Donell"
Cc: libc-alpha, Florian Weimer

On Tue, Nov 26, 2019 at 7:01 AM Carlos O'Donell wrote:
>
> HJ,
>
> In commit 27d3ce1467990f89126e228559dec8f84b96c60e we stopped
> setting bit_arch_Fast_Copy_Backward for Intel Core processors
> as an optimization to improve performance.
>
> It turns out that this change also improves performance for
> Haswell servers.  Was it the intent of this change to *also*
> improve performance for Haswell?  The comments don't indicate
> this, and I was worried that it might be an unintentional change
> in this case.  The particular CPU was an E5-2650 v3 [1].
>
> If we step back and look at the overall sequence of changes and
> performance, it looks like this:
>
> The performance regression is between this change:
>
>   c3d8dc45c9df199b8334599a6cbd98c9950dba62 - Triggers default:
>   handling + TSX handling.
>   - Causes a 21% lmbench regression for an E5-2650 v3.
>
> and this change (the one we are discussing):
>
>   27d3ce1467990f89126e228559dec8f84b96c60e - Removes
>   bit_arch_Fast_Copy_Backward.
>   - Recovers the performance loss.
>
> My worry is that the two are unrelated, that we have only
> made back the performance at the expense of the other change,
> and that we could be doing better.
>
> As our Intel expert, what do you think is going on here?

My change should be a NOP on Haswell, since Fast_Copy_Backward is
used only in x86_64/multiarch/ifunc-memmove.h:

  if (CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
    {
      if (CPU_FEATURES_CPU_P (cpu_features, ERMS))
        return OPTIMIZE (avx_unaligned_erms);

      return OPTIMIZE (avx_unaligned);
    }

  if (!CPU_FEATURES_CPU_P (cpu_features, SSSE3)
      || CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Copy))
    {
      if (CPU_FEATURES_CPU_P (cpu_features, ERMS))
        return OPTIMIZE (sse2_unaligned_erms);

      return OPTIMIZE (sse2_unaligned);
    }

  if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Copy_Backward))
    return OPTIMIZE (ssse3_back);

  return OPTIMIZE (ssse3);

and AVX_Fast_Unaligned_Load is set on Haswell, so the first branch
is taken and the Fast_Copy_Backward check is never reached.
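To see the dispatch order outside glibc, here is a minimal
standalone sketch (my own approximation, not glibc code) using
GCC's __builtin_cpu_supports.  AVX_Fast_Unaligned_Load and
Fast_Copy_Backward are glibc-internal bits that cannot be queried
directly, so "avx" stands in for the first and the second branch
is simplified to the SSSE3 check:

  /* pick-memmove.c: approximate the ifunc-memmove.h selection
     order.  Compile with: gcc -O2 pick-memmove.c  */
  #include <stdio.h>

  static const char *
  pick_memmove (void)
  {
    /* glibc derives AVX_Fast_Unaligned_Load from AVX support; on
       Haswell this branch wins, so the Fast_Copy_Backward check
       below is never consulted.  (With ERMS, glibc would pick the
       *_erms variants instead.)  */
    if (__builtin_cpu_supports ("avx"))
      return "avx_unaligned";

    if (!__builtin_cpu_supports ("ssse3"))
      return "sse2_unaligned";

    /* Only pre-AVX SSSE3 machines get this far; Fast_Copy_Backward
       itself comes from glibc's CPU model table and cannot be
       tested from here.  */
    return "ssse3 (ssse3_back if Fast_Copy_Backward were set)";
  }

  int
  main (void)
  {
    printf ("memmove family selected: %s\n", pick_memmove ());
    return 0;
  }

On a Haswell box this prints the avx_unaligned family, which is why
removing bit_arch_Fast_Copy_Backward cannot change the selected
memmove there.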
> --
> Cheers,
> Carlos.
>
> [1] https://ark.intel.com/content/www/us/en/ark/products/81705/intel-xeon-processor-e5-2650-v3-25m-cache-2-30-ghz.html

--
H.J.