From: Noah Goldstein
Date: Fri, 16 Jul 2021 17:23:44 -0400
Subject: Re: [PATCH] c++: implement C++17 hardware interference size
To: Matthias Kretz
Cc: gcc-patches List, "Richard Earnshaw (lists)", "libstdc++", GNU C Library
References: <20210716023656.670004-1-jason@redhat.com> <2136759.qKCeTcHjAi@excalibur> <1770208.5S6X66LlFz@excalibur>
In-Reply-To: <1770208.5S6X66LlFz@excalibur>

On Fri, Jul 16, 2021 at 3:37 PM Matthias Kretz wrote:

> On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> > On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz wrote:
> > > I don't understand how this feature would lead to false sharing. But
> > > maybe I misunderstand the spatial prefetcher. The first access to one
> > > of the two cache lines pairs would bring both cache lines to LLC (and
> > > possibly L2). If a core with a different L2 reads the other cache line
> > > the cache line would be duplicated; if it writes to it, it would be
> > > exclusive to the other core's L2.
> > > The cache line pairs do not affect each other anymore. Maybe there's a
> > > minor inefficiency on initial transfer from memory, but isn't that all?
> >
> > If two cores that do not share an L2 cache need exclusive access to
> > a cache-line, the L2 spatial prefetcher could cause pingponging if those
> > two cache-lines were adjacent and shared the same 128-byte alignment.
> > Say core A requests line x1 in exclusive, it also gets line x2 (not sure
> > if x2 would be in shared or exclusive), core B then requests x2 in
> > exclusive, it also gets x1. Regardless of the state x1 comes into core B's
> > private L2 cache, it invalidates the exclusive state on cache-line x1 in
> > core A's private L2 cache.
> > If this was done in a loop (say a simple `lock add` loop) it would cause
> > pingponging on cache-lines x1/x2 between core A and B's private L2 caches.
>
> Quoting the latest ORM: "The following two hardware prefetchers fetched data
> from memory to the L2 cache and last level cache:
> Spatial Prefetcher: This prefetcher strives to complete every cache line
> fetched to the L2 cache with the pair line that completes it to a 128-byte
> aligned chunk."
>
> 1. If the requested cache line is already present on some other core, the
> spatial prefetcher should not get used ("fetched data from memory").

I think this is correct, and I was wrong that a request from LLC to L2
will invoke the spatial prefetcher. So there are no issues with 64 bytes.
Sorry for the added confusion!

> 2. The section is about data prefetching. It is unclear whether the spatial
> prefetcher applies at all for normal cache line fetches.
>
> 3. The ORM uses past tense ("The following two hardware prefetchers fetched
> data"), which indicates to me that Intel isn't doing this for newer
> generations anymore.
>
> 4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of
> cache line A and the adjacent cache line B thus is also loaded to LLC.
> Core 2 requests a read of line B and thus loads line A into LLC. Now both
> cores have both cache lines in LLC. Core 1 writes to line A, which
> invalidates line A in LLC of Core 2 but does not affect line B. Core 2
> writes to line B, invalidating line A for Core 1. => no false sharing.
> Where did I get my mental cache protocol wrong?
>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────
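
For anyone who wants to measure the `lock add` scenario from the thread rather than reason about it, here is a minimal sketch. It is not from the patch or from anyone's actual benchmark. The names `Counters`, `hammer`, `counters_share_chunk` and `same_128_byte_chunk` are made up for illustration, and the 64-byte line size and 128-byte pair size are assumptions about the x86 parts under discussion. Two threads each increment their own counter; with a stride of 64 the counters sit on adjacent lines of one 128-byte aligned chunk, with a stride of 128 they land in different chunks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Assumptions (not taken from the patch): 64-byte cache lines and
// 128-byte aligned line pairs, as described for the spatial prefetcher.
constexpr std::size_t kLine = 64;
constexpr std::size_t kPair = 2 * kLine;

// Two counters placed Stride bytes apart inside a 128-byte aligned block.
// Stride == 64:  x1 and x2 are the two lines of one 128-byte chunk.
// Stride == 128: x1 and x2 fall into different 128-byte chunks.
template <std::size_t Stride>
struct alignas(kPair) Counters {
    static_assert(Stride == kLine || Stride == kPair,
                  "sketch only covers strides of 64 and 128 bytes");
    alignas(Stride) std::atomic<std::uint64_t> x1{0};
    alignas(Stride) std::atomic<std::uint64_t> x2{0};
};

// True if both addresses lie in the same 128-byte aligned chunk.
inline bool same_128_byte_chunk(const void* p, const void* q) {
    return reinterpret_cast<std::uintptr_t>(p) / kPair ==
           reinterpret_cast<std::uintptr_t>(q) / kPair;
}

// Reports whether the two counters of a Counters<Stride> share a chunk.
template <std::size_t Stride>
bool counters_share_chunk() {
    Counters<Stride> c;
    return same_128_byte_chunk(&c.x1, &c.x2);
}

// Each thread increments only its own counter, so no cache line is
// logically shared. Returns the sum of both counters after the threads join.
template <std::size_t Stride>
std::uint64_t hammer(std::uint64_t iters) {
    Counters<Stride> c;
    auto work = [iters](std::atomic<std::uint64_t>& x) {
        for (std::uint64_t i = 0; i < iters; ++i) {
            // An atomic read-modify-write; on x86 this is a lock-prefixed
            // instruction, which needs the line in exclusive state.
            x.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread a(work, std::ref(c.x1));
    std::thread b(work, std::ref(c.x2));
    a.join();
    b.join();
    return c.x1.load() + c.x2.load();
}
```

Timing `hammer<64>(n)` against `hammer<128>(n)` with `std::chrono::steady_clock`, with the two threads pinned to cores that do not share an L2, would show whether the pair line matters on a given machine. If point 1 above holds, I would expect no meaningful difference, but that is something to measure, not something this sketch demonstrates.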