From: Matthias Kretz
To: Noah Goldstein
CC: gcc-patches List, "Richard Earnshaw (lists)", libstdc++, GNU C Library
Subject: Re: [PATCH] c++: implement C++17 hardware interference size
Date: Fri, 16 Jul 2021 21:37:16 +0200
Message-ID: <1770208.5S6X66LlFz@excalibur>
Organization: GSI Helmholtzzentrum für Schwerionenforschung
References: <20210716023656.670004-1-jason@redhat.com> <2136759.qKCeTcHjAi@excalibur>

On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz wrote:
> > I don't understand how this feature would lead to false sharing. But maybe I
> > misunderstand the spatial prefetcher. The first access to one of the two cache
> > lines pairs would bring both cache lines to LLC (and possibly L2). If a core
> > with a different L2 reads the other cache line the cache line would be
> > duplicated; if it writes to it, it would be exclusive to the other core's L2.
> > The cache line pairs do not affect each other anymore. Maybe there's a minor
> > inefficiency on initial transfer from memory, but isn't that all?
>
> If two cores that do not share an L2 cache need exclusive access to
> a cache-line, the L2 spatial prefetcher could cause pingponging if those
> two cache-lines were adjacent and shared the same 128 byte alignment.
> Say core A requests line x1 in exclusive, it also get line x2 (not sure
> if x2 would be in shared or exclusive), core B then requests x2 in exclusive,
> it also gets x1. Irrelevant of the state x1 comes into core B's private L2 cache
> it invalidates the exclusive state on cache-line x1 in core A's private L2
> cache. If this was done in a loop (say a simple `lock add` loop) it would cause
> pingponging on cache-lines x1/x2 between core A and B's private L2 caches.
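
To make the scenario quoted above testable, here is a minimal sketch of the `lock add` loop on two adjacent cache lines, with a control layout that keeps the two counters in different 128-byte chunks. Everything in it is my own illustration and not part of the patch: the names `AdjacentLines`, `SeparatePairs`, `same_chunk` and `run` are hypothetical, and the 64-byte line size and 128-byte pairing are assumptions taken from the description of the spatial prefetcher, not values queried from the hardware.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Assumptions: 64-byte cache lines that the prefetcher pairs into
// 128-byte aligned chunks.
constexpr std::size_t kLine = 64;
constexpr std::size_t kPair = 2 * kLine;

// x1 and x2 sit on adjacent cache lines inside one 128-byte aligned chunk.
struct alignas(kPair) AdjacentLines {
    alignas(kLine) std::atomic<std::uint64_t> x1{0};
    alignas(kLine) std::atomic<std::uint64_t> x2{0};
};

// Control: x1 and x2 each start their own 128-byte aligned chunk.
struct SeparatePairs {
    alignas(kPair) std::atomic<std::uint64_t> x1{0};
    alignas(kPair) std::atomic<std::uint64_t> x2{0};
};

static_assert(sizeof(AdjacentLines) == kPair, "two lines, one chunk");
static_assert(sizeof(SeparatePairs) == 2 * kPair, "one chunk per counter");

// True if both addresses fall into the same 128-byte aligned chunk.
inline bool same_chunk(const void* p, const void* q) {
    auto a = reinterpret_cast<std::uintptr_t>(p);
    auto b = reinterpret_cast<std::uintptr_t>(q);
    return a / kPair == b / kPair;
}

// Two threads each increment their own counter `iters` times and the
// wall-clock time for both to finish is returned. The threads are NOT
// pinned, so whether they run on cores with different L2 caches is up
// to the OS scheduler.
template <class Layout>
std::chrono::nanoseconds run(Layout& l, std::uint64_t iters) {
    // The two threads must work on distinct counters.
    assert(static_cast<const void*>(&l.x1) != static_cast<const void*>(&l.x2));
    auto hammer = [iters](std::atomic<std::uint64_t>& c) {
        for (std::uint64_t i = 0; i < iters; ++i)
            c.fetch_add(1, std::memory_order_relaxed);  // atomic RMW, a locked add on x86
    };
    auto t0 = std::chrono::steady_clock::now();
    std::thread a(hammer, std::ref(l.x1));
    std::thread b(hammer, std::ref(l.x2));
    a.join();
    b.join();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - t0);
}
```

Calling `run` on an `AdjacentLines` object and on a `SeparatePairs` object with the same iteration count gives two timings to compare. For the comparison to say anything about the spatial prefetcher, the two threads would additionally have to be pinned to cores that do not share an L2, which this sketch leaves out because it is platform-specific.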

Quoting the latest Intel Optimization Reference Manual (ORM): "The following
two hardware prefetchers fetched data from memory to the L2 cache and last
level cache:
Spatial Prefetcher: This prefetcher strives to complete every cache line
fetched to the L2 cache with the pair line that completes it to a 128-byte
aligned chunk."

1. If the requested cache line is already present on some other core, the
spatial prefetcher should not get used ("fetched data from memory").

2. The section is about data prefetching. It is unclear whether the spatial
prefetcher applies at all for normal cache line fetches.

3. The ORM uses past tense ("The following two hardware prefetchers fetched
data"), which indicates to me that Intel isn't doing this for newer
generations anymore.

4. If I'm wrong on points 1 & 2, consider this: Core 1 requests a read of cache
line A, and the adjacent cache line B thus is also loaded to LLC. Core 2
requests a read of line B and thus loads line A into LLC. Now both cores have
both cache lines in LLC. Core 1 writes to line A, which invalidates line A in
LLC of Core 2 but does not affect line B. Core 2 writes to line B,
invalidating line B for Core 1. => no false sharing. Where did I get my mental
cache protocol wrong?

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────