From: "H.J. Lu"
Date: Tue, 03 Dec 2019 17:43:00 -0000
Subject: Re: Commit 27d3ce1467990f89126e228559dec8f84b96c60e?
To: "Carlos O'Donell"
Cc: libc-alpha, Florian Weimer

On Tue, Nov 26, 2019 at 7:01 AM Carlos O'Donell wrote:
>
> HJ,
>
> In commit 27d3ce1467990f89126e228559dec8f84b96c60e we stopped
> setting bit_arch_Fast_Copy_Backward for Intel Core processors
> as an optimization to improve performance.
>
> It turns out that this change also improves performance for
> Haswell servers.  Was it the intent of this change to *also*
> improve performance for Haswell?  The comments don't indicate
> this, and I was worried that it might be an unintentional change
> in this case.  The particular CPU was an E5-2650 v3 [1].
>
> If we step back and look at the overall sequence of changes and
> performance, it looks like this:
>
> The performance regression is between this change:
>
>   c3d8dc45c9df199b8334599a6cbd98c9950dba62 - Triggers default:
>   handling + TSX handling.
>   - Causes a 21% lmbench regression for an E5-2650 v3.
>
> and this change (the one we are discussing):
>
>   27d3ce1467990f89126e228559dec8f84b96c60e - Removes
>   bit_arch_Fast_Copy_Backward.
>   - Recovers the performance loss.
>
> My worry is that the two are unrelated, that we have only
> made back the performance at the expense of the other change,
> and that we could be doing better.
>
> As our Intel expert, what do you think is going on here?

My change should be a NOP on Haswell, since Fast_Copy_Backward is
used only in x86_64/multiarch/ifunc-memmove.h:

  if (CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
    {
      if (CPU_FEATURES_CPU_P (cpu_features, ERMS))
        return OPTIMIZE (avx_unaligned_erms);

      return OPTIMIZE (avx_unaligned);
    }

  if (!CPU_FEATURES_CPU_P (cpu_features, SSSE3)
      || CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Copy))
    {
      if (CPU_FEATURES_CPU_P (cpu_features, ERMS))
        return OPTIMIZE (sse2_unaligned_erms);

      return OPTIMIZE (sse2_unaligned);
    }

  if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Copy_Backward))
    return OPTIMIZE (ssse3_back);

  return OPTIMIZE (ssse3);

and AVX_Fast_Unaligned_Load is set on Haswell, so the first branch
is taken and the Fast_Copy_Backward check is never reached.
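To see the dispatch order outside glibc, here is a minimal
standalone sketch (my own approximation, not glibc code) using
GCC's __builtin_cpu_supports.  AVX_Fast_Unaligned_Load and
Fast_Copy_Backward are glibc-internal bits that cannot be queried
directly, so "avx" stands in for the first and the second branch
is simplified to the SSSE3 check:

  /* pick-memmove.c: approximate the ifunc-memmove.h selection
     order.  Compile with: gcc -O2 pick-memmove.c  */
  #include <stdio.h>

  static const char *
  pick_memmove (void)
  {
    /* glibc derives AVX_Fast_Unaligned_Load from AVX support; on
       Haswell this branch wins, so the Fast_Copy_Backward check
       below is never consulted.  (With ERMS, glibc would pick the
       *_erms variants instead.)  */
    if (__builtin_cpu_supports ("avx"))
      return "avx_unaligned";

    if (!__builtin_cpu_supports ("ssse3"))
      return "sse2_unaligned";

    /* Only pre-AVX SSSE3 machines get this far; Fast_Copy_Backward
       itself comes from glibc's CPU model table and cannot be
       tested from here.  */
    return "ssse3 (ssse3_back if Fast_Copy_Backward were set)";
  }

  int
  main (void)
  {
    printf ("memmove family selected: %s\n", pick_memmove ());
    return 0;
  }

On a Haswell box this prints the avx_unaligned family, which is why
removing bit_arch_Fast_Copy_Backward cannot change the selected
memmove there.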
> --
> Cheers,
> Carlos.
>
> [1] https://ark.intel.com/content/www/us/en/ark/products/81705/intel-xeon-processor-e5-2650-v3-25m-cache-2-30-ghz.html

--
H.J.