From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <goldstein.w.n@gmail.com>
Received: from mail-pg1-x52f.google.com (mail-pg1-x52f.google.com
 [IPv6:2607:f8b0:4864:20::52f])
 by sourceware.org (Postfix) with ESMTPS id C1CA539AD02D;
 Fri, 16 Jul 2021 21:23:56 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C1CA539AD02D
Received: by mail-pg1-x52f.google.com with SMTP id y4so11173801pgl.10;
 Fri, 16 Jul 2021 14:23:56 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=ZAnhbMOsSjk8A4YiFrn+Q/hahDIUZ/yboh1tIaBLfY8=;
 b=qFilfvJgANV1OwiPu4SmMDMhBU/+L/aja6y/+CKO+CKDa6dYJsupSrrMmk0E0hDGZ7
 hZ09oE5i1XWWf16wK/2Yipu7BCl7cAn7/5c2m6IGUHO1RlrBPMyo0wSi0dZZ9sPAkJ5h
 qmSqyUIQ/S1PKG9nG5WDmAfL9i0BmFQVM/gwUy87+W42z8NTSXLKqg3shB29hd6FeVOE
 +KwCgVfMIiKocTv6GctrgIN6NSsX4ddqg9w9524iJ8fE7h+IOKwWpkcgtrm6ZcQT9IM4
 QLf/gWiUsBf5X2Cwxh3YT7bFbc4NLuLlsTH02l2+TR6uSPWx28zF1fzuonqDmv9tu4AO
 aWag==
X-Gm-Message-State: AOAM530E/W50rBPinW/bIXZruirETH+3VkMWR/p7+MpqnBLsVDSmVi9B
 Sqf8x85gQ5bhd3FOya8BsJ3MrNK6anLeZ9eiLvg=
X-Google-Smtp-Source: ABdhPJwPcsEQXJzVw5InfkKALx5T4/3ZeXEIGEGfqR1tfwrTjlY4fQJDR7y5G2s6Y8+ZO2eB/9nJSRgyNfLBunTGnCE=
X-Received: by 2002:a62:1d86:0:b029:32a:311a:9595 with SMTP id
 d128-20020a621d860000b029032a311a9595mr12549812pfd.74.1626470635800; Fri, 16
 Jul 2021 14:23:55 -0700 (PDT)
MIME-Version: 1.0
References: <20210716023656.670004-1-jason@redhat.com>
 <2136759.qKCeTcHjAi@excalibur>
 <CAFUsyf+_rDAdX_2G=cTy_3fSrn04jD+x8XuqHf3JZkosrBPSbA@mail.gmail.com>
 <1770208.5S6X66LlFz@excalibur>
In-Reply-To: <1770208.5S6X66LlFz@excalibur>
From: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Fri, 16 Jul 2021 17:23:44 -0400
Message-ID: <CAFUsyfJpjsjD+T1AzDEMj2LQ4gOcSKrD55BeRw+htk4A0=4eZA@mail.gmail.com>
Subject: Re: [PATCH] c++: implement C++17 hardware interference size
To: Matthias Kretz <m.kretz@gsi.de>
Cc: gcc-patches List <gcc-patches@gcc.gnu.org>, 
 "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>,
 "libstdc++" <libstdc++@gcc.gnu.org>, 
 GNU C Library <libc-alpha@sourceware.org>
X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM,
 HTML_MESSAGE, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=unavailable autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: libstdc++@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libstdc++ mailing list <libstdc++.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/libstdc++>,
 <mailto:libstdc++-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/libstdc++/>
List-Post: <mailto:libstdc++@gcc.gnu.org>
List-Help: <mailto:libstdc++-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/libstdc++>,
 <mailto:libstdc++-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Jul 2021 21:23:58 -0000

On Fri, Jul 16, 2021 at 3:37 PM Matthias Kretz <m.kretz@gsi.de> wrote:

> On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> > On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz <m.kretz@gsi.de> wrote:
> > > I don't understand how this feature would lead to false sharing. But maybe I
> > > misunderstand the spatial prefetcher. The first access to one of the two cache
> > > lines pairs would bring both cache lines to LLC (and possibly L2). If a core
> > > with a different L2 reads the other cache line the cache line would be
> > > duplicated; if it writes to it, it would be exclusive to the other core's L2.
> > > The cache line pairs do not affect each other anymore. Maybe there's a minor
> > > inefficiency on initial transfer from memory, but isn't that all?
> >
> > If two cores that do not share an L2 cache need exclusive access to
> > a cache-line, the L2 spatial prefetcher could cause pingponging if those
> > two cache-lines were adjacent and shared the same 128 byte alignment.
> > Say core A requests line x1 in exclusive, it also get line x2 (not sure
> > if x2 would be in shared or exclusive), core B then requests x2 in exclusive,
> > it also gets x1. Irrelevant of the state x1 comes into core B's private L2 cache
> > it invalidates the exclusive state on cache-line x1 in core A's private L2
> > cache. If this was done in a loop (say a simple `lock add` loop) it would cause
> > pingponging on cache-lines x1/x2 between core A and B's private L2 caches.
>
> Quoting the latest ORM: "The following two hardware prefetchers fetched data
> from memory to the L2 cache and last level cache:
> Spatial Prefetcher: This prefetcher strives to complete every cache line
> fetched to the L2 cache with the pair line that completes it to a 128-byte
> aligned chunk."
>
> 1. If the requested cache line is already present on some other core, the
> spatial prefetcher should not get used ("fetched data from memory").
>

I think this is correct, and I was wrong that a request from LLC to L2
would invoke the spatial prefetcher. So there are no issues with 64 bytes.
Sorry for the added confusion!
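
For anyone following along, here is a minimal sketch of what that means
for user code. It assumes the library provides
std::hardware_destructive_interference_size (the feature this patch adds)
and falls back to 64 bytes otherwise. PaddedCounter, Counters,
counter_distance and bump are hypothetical names for illustration, not
anything from the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Use the C++17 constant when the library provides it. The 64-byte
// fallback is an assumption that matches common x86-64 parts.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kDestructive = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kDestructive = 64;
#endif

// Hypothetical helper: one counter per interference-size slot, so two
// counters never share a cache line.
struct alignas(kDestructive) PaddedCounter {
    std::atomic<long> value{0};
};

struct Counters {
    PaddedCounter a;  // intended to be written by one thread
    PaddedCounter b;  // intended to be written by another thread
};

static_assert(alignof(PaddedCounter) == kDestructive,
              "each counter starts on its own slot");
static_assert(sizeof(PaddedCounter) % kDestructive == 0,
              "padding fills the rest of the slot");

// Byte distance between the two counters inside one Counters object.
inline std::size_t counter_distance(const Counters& c) {
    const auto pa = reinterpret_cast<std::uintptr_t>(&c.a);
    const auto pb = reinterpret_cast<std::uintptr_t>(&c.b);
    return static_cast<std::size_t>(pb - pa);
}

// A simple increment loop; on x86 compilers typically emit a locked
// add/xadd for fetch_add, i.e. the `lock add` loop discussed above.
inline void bump(std::atomic<long>& v, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        v.fetch_add(1, std::memory_order_relaxed);
    }
}
```

With a 64-byte value, a and b land on different cache lines but may still
share one 128-byte aligned pair. If point 1 holds, one thread running
bump(c.a.value, n) and another running bump(c.b.value, n) should not
pingpong lines between private L2 caches.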

>
> 2. The section is about data prefetching. It is unclear whether the spatial
> prefetcher applies at all for normal cache line fetches.
>
> 3. The ORM uses past tense ("The following two hardware prefetchers fetched
> data"), which indicates to me that Intel isn't doing this for newer
> generations anymore.


> 4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of cache
> line A and the adjacent cache line B thus is also loaded to LLC. Core 2
> request a read of line B and thus loads line A into LLC. Now both cores have
> both cache lines in LLC. Core 1 writes to line A, which invalidates line A in
> LLC of Core 2 but does not affect line B. Core 2 writes to line B,
> invalidating line A for Core 1. => no false sharing. Where did I get my mental
> cache protocol wrong?


> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────