From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20210716023656.670004-1-jason@redhat.com>
 <CADzB+2ktqA=nzYS5zJAxH9TfFexP-OzZw8SoPOd907-qxVbWpg@mail.gmail.com>
 <2136759.qKCeTcHjAi@excalibur>
In-Reply-To: <2136759.qKCeTcHjAi@excalibur>
From: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Fri, 16 Jul 2021 13:20:29 -0400
Message-ID: <CAFUsyf+_rDAdX_2G=cTy_3fSrn04jD+x8XuqHf3JZkosrBPSbA@mail.gmail.com>
Subject: Re: [PATCH] c++: implement C++17 hardware interference size
To: Matthias Kretz <m.kretz@gsi.de>
Cc: gcc-patches List <gcc-patches@gcc.gnu.org>, 
 "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>,
 "libstdc++" <libstdc++@gcc.gnu.org>, 
 GNU C Library <libc-alpha@sourceware.org>

On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz <m.kretz@gsi.de> wrote:

> On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > > Currently the patch does not adjust the values based on -march, as in
> > > JF's proposal.  I'll need more guidance from the ARM/AArch64 maintainers
> > > about how to go about that.  --param l1-cache-line-size is set based on
> > > -mtune, but I don't think we want -mtune to change these ABI-affecting
> > > values.  Are there -march values for which a smaller range than 64-256
> > > makes sense?
>
> As a user who cares about ABI but also cares about maximizing performance
> of builds for a specific HPC setup I'd expect the hardware interference
> size values to be allowed to break ABIs. The point of these values is to
> give me better performance portability (but not necessarily binary
> portability) than my usual "pick 64 as a good average".
>
> Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X
> should be interpreted as "my binary is supposed to be optimized for X, I
> accept inefficiencies on everything that's not X".
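
For reference, a minimal sketch of how user code might consume the value
under discussion instead of hard-coding 64. The names kDestructive, Slot and
bump are hypothetical, and the fallback of 64 is just the "good average"
mentioned above, not a value taken from the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Use the library constant when the implementation provides it; otherwise
// fall back to 64 (an assumption, not a value from the patch).
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kDestructive =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kDestructive = 64;
#endif

// Hypothetical per-thread slot: alignment (and therefore size) depends on
// kDestructive, which is why the value affects ABI when it appears in a
// type that crosses a library boundary.
struct alignas(kDestructive) Slot {
    std::uint64_t value = 0;
};

static_assert(sizeof(Slot) % kDestructive == 0);

// Increment a slot after checking that it really sits on its own boundary.
inline void bump(Slot& s) {
    assert(reinterpret_cast<std::uintptr_t>(&s) % kDestructive == 0);
    ++s.value;
}
```

If the constant changed with -mtune, sizeof(Slot) would change with it,
which is the ABI concern raised in the quoted text.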
>
> On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> > On intel x86 systems with a private L2 cache the spatial prefetcher
> > can cause destructive interference along 128 byte aligned boundaries.
> >
> > https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60
>
> I don't understand how this feature would lead to false sharing. But maybe
> I misunderstand the spatial prefetcher. The first access to one of the two
> cache lines pairs would bring both cache lines to LLC (and possibly L2). If
> a core with a different L2 reads the other cache line the cache line would
> be duplicated; if it writes to it, it would be exclusive to the other
> core's L2. The cache line pairs do not affect each other anymore. Maybe
> there's a minor inefficiency on initial transfer from memory, but isn't
> that all?
>

If two cores that do not share an L2 cache each need exclusive access to a
cache-line, the L2 spatial prefetcher could cause pingponging if those two
cache-lines are adjacent and share the same 128 byte alignment. Say core A
requests line x1 in exclusive; it also gets line x2 (not sure if x2 would be
in shared or exclusive). Core B then requests x2 in exclusive, and it also
gets x1. Regardless of the state in which x1 arrives in core B's private L2
cache, that invalidates the exclusive state of x1 in core A's private L2
cache. If this is done in a loop (say a simple `lock add` loop) it would
cause pingponging of cache-lines x1/x2 between core A's and core B's private
L2 caches.
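
To make the scenario concrete, here is a rough sketch of the two-counter
`lock add` loop with each counter padded out to 128 bytes so the two can
never land in the same 128 byte aligned pair. All names (kPairSize,
PaddedCounter, Counters, same_pair, bump, hammer) are made up for
illustration, and 128 is an assumption based on the behavior described
above. The code only fixes the layout; it does not measure or prove the
pingponging.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Assumption: adjacent 64 byte lines are fetched as a 128 byte aligned pair.
constexpr std::size_t kPairSize = 128;

// One counter per 128 byte aligned block.
struct PaddedCounter {
    alignas(kPairSize) std::atomic<std::uint64_t> value{0};
};

struct Counters {
    PaddedCounter a;  // written only by thread A
    PaddedCounter b;  // written only by thread B
};

static_assert(sizeof(PaddedCounter) == kPairSize);

// True if both addresses fall inside the same 128 byte aligned block.
inline bool same_pair(const void* p, const void* q) {
    auto x = reinterpret_cast<std::uintptr_t>(p);
    auto y = reinterpret_cast<std::uintptr_t>(q);
    return (x / kPairSize) == (y / kPairSize);
}

// Atomic read-modify-write loop; on x86 each fetch_add is a lock-prefixed
// add/xadd, which needs the cache-line in exclusive state.
inline void bump(std::atomic<std::uint64_t>& v, std::uint64_t iters) {
    for (std::uint64_t i = 0; i < iters; ++i)
        v.fetch_add(1, std::memory_order_relaxed);
}

// Run one loop per thread, each on its own counter.
inline void hammer(Counters& c, std::uint64_t iters) {
    assert(!same_pair(&c.a.value, &c.b.value));
    std::thread ta(bump, std::ref(c.a.value), iters);
    std::thread tb(bump, std::ref(c.b.value), iters);
    ta.join();
    tb.join();
}
```

Changing kPairSize to 64 would keep the counters on separate cache-lines but
allow them to share a 128 byte aligned pair, which is the layout where I
would expect the x1/x2 pingponging described above.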


>
> That said. Intel documents the spatial prefetcher exclusively for Sandy
> Bridge. So if you still believe 128 is necessary, set the destructive
> hardware interference size to 64 for all of x86 except -mtune=sandybridge.
>

AFAIK the spatial prefetcher exists on newer x86_64 machines as well.


>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────
>