From: Noah Goldstein
Date: Fri, 16 Jul 2021 13:20:29 -0400
Subject: Re: [PATCH] c++: implement C++17 hardware interference size
To: Matthias Kretz
Cc: gcc-patches List, "Richard Earnshaw (lists)", "libstdc++", GNU C Library
References: <20210716023656.670004-1-jason@redhat.com> <2136759.qKCeTcHjAi@excalibur>
In-Reply-To: <2136759.qKCeTcHjAi@excalibur>
List-Id: Libc-alpha mailing list

On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz wrote:

> On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > > Currently the patch does not adjust the values based on -march, as in JF's
> > > proposal. I'll need more guidance from the ARM/AArch64 maintainers about
> > > how to go about that. --param l1-cache-line-size is set based on -mtune,
> > > but I don't think we want -mtune to change these ABI-affecting values.
> > > Are there -march values for which a smaller range than 64-256 makes sense?
>
> As a user who cares about ABI but also cares about maximizing performance of
> builds for a specific HPC setup I'd expect the hardware interference size
> values to be allowed to break ABIs. The point of these values is to give me
> better performance portability (but not necessarily binary portability) than
> my usual "pick 64 as a good average".
> Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X should
> be interpreted as "my binary is supposed to be optimized for X, I accept
> inefficiencies on everything that's not X".
>
> On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> > On intel x86 systems with a private L2 cache the spatial prefetcher
> > can cause destructive interference along 128 byte aligned boundaries.
> >
> > https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60
>
> I don't understand how this feature would lead to false sharing.
> But maybe I misunderstand the spatial prefetcher. The first access to one of
> the two cache lines pairs would bring both cache lines to LLC (and possibly
> L2). If a core with a different L2 reads the other cache line the cache line
> would be duplicated; if it writes to it, it would be exclusive to the other
> core's L2. The cache line pairs do not affect each other anymore. Maybe
> there's a minor inefficiency on initial transfer from memory, but isn't that
> all?

If two cores that do not share an L2 cache each need exclusive access to a
cache line, the L2 spatial prefetcher could cause ping-ponging if those two
cache lines are adjacent and fall within the same 128-byte aligned block.

Say core A requests line x1 in exclusive state; it also gets line x2 (not sure
whether x2 would arrive in shared or exclusive state). Core B then requests x2
in exclusive state and also gets x1. Regardless of the state in which x1
arrives in core B's private L2 cache, it invalidates the exclusive state of
x1 in core A's private L2 cache. If this were done in a loop (say a simple
`lock add` loop), it would cause ping-ponging of cache lines x1/x2 between
core A's and core B's private L2 caches.

> That said. Intel documents the spatial prefetcher exclusively for Sandy
> Bridge. So if you still believe 128 is necessary, set the destructive
> hardware interference size to 64 for all of x86 except -mtune=sandybridge.

AFAIK the spatial prefetcher exists on newer x86_64 machines as well.
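
A minimal sketch of the x1/x2 layout described above, assuming 64-byte cache
lines and a prefetcher that pairs lines within one 128-byte aligned block.
All names here (kLineSize, kPairSize, AdjacentLines, SeparatePairs, hammer)
are hypothetical and not taken from the patch; the constants are assumptions
for illustration, not values queried from the hardware:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed sizes for illustration only (not queried from the hardware).
constexpr std::size_t kLineSize = 64;             // one cache line
constexpr std::size_t kPairSize = 2 * kLineSize;  // 128-byte aligned pair

inline std::uintptr_t addr(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// True if both addresses fall in the same 64-byte line.
constexpr bool same_line(std::uintptr_t a, std::uintptr_t b) {
    return a / kLineSize == b / kLineSize;
}

// True if both addresses fall in the same 128-byte aligned block.
constexpr bool same_pair(std::uintptr_t a, std::uintptr_t b) {
    return a / kPairSize == b / kPairSize;
}

// The problematic case: x1 and x2 sit in different lines, so there is no
// classic false sharing, but they share one 128-byte aligned block.
struct alignas(kPairSize) AdjacentLines {
    alignas(kLineSize) std::atomic<long> x1{0};
    alignas(kLineSize) std::atomic<long> x2{0};
};

// Padding each counter to the pair size puts them in different blocks.
struct SeparatePairs {
    alignas(kPairSize) std::atomic<long> x1{0};
    alignas(kPairSize) std::atomic<long> x2{0};
};

static_assert(sizeof(AdjacentLines) == 128);
static_assert(sizeof(SeparatePairs) == 256);

// The per-core loop: with the result unused, compilers for x86 typically
// emit a locked read-modify-write (e.g. `lock add`) for this fetch_add.
inline void hammer(std::atomic<long>& counter, long iterations) {
    assert(iterations >= 0);
    for (long i = 0; i < iterations; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);
}
```

Running hammer() on x1 from one core and on x2 from a core with a different
L2 is the loop in question; whether the AdjacentLines layout is measurably
slower than SeparatePairs depends on the microarchitecture and has to be
measured on the target machine.
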
>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────