From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 35CD93850401 for ; Fri, 16 Jul 2021 02:41:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 35CD93850401 Received: from mail-pl1-f200.google.com (mail-pl1-f200.google.com [209.85.214.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-18-i-_Dp2VGP0K9q92E2oYY9g-1; Thu, 15 Jul 2021 22:41:31 -0400 X-MC-Unique: i-_Dp2VGP0K9q92E2oYY9g-1 Received: by mail-pl1-f200.google.com with SMTP id x17-20020a1709027c11b0290122d280f05bso1872265pll.8 for ; Thu, 15 Jul 2021 19:41:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=/xHSEYOWCPip1Muk7a++4sGU7MPs44XjGr9RyxIlaeI=; b=e2Bn7hXIw9my1UebBt3njgVMFUf5CF0cY9chFVRaDRTQbJY055ctJCPbJCaS8ncODi Barop/jbRqQd0b4Sat6tzTNghglgR68NMv2ruar/7bsQb38JIdIrDRRXH1jr8ahZdkJH x1J1LLfncH6S55/CORwC1rBUfUA94Bp5B+2G+P/oCAj4brxl6hMNEuKdJpBP5YbQADiJ DdX3BkCI/OoTOyWJ8cZIdaQqXBl3RQ7mstESBVfemOw1u/b4oqRZlsLZP24sYm5KthGg 0r1QKAiO8DkcmXmBV3kNrz58nkGl+pqX7Z4rwvwPBEi+4Jh3PzidyTX0h4d1nNtPBREg MLCQ== X-Gm-Message-State: AOAM533IhLCQPLXTIiujXCum23XtgnS5yrsMhV42ZSFnLWgAXHDRqeQA WWdoG7CtwWO0HwxdAGWhxZaZYVoUXkdm5DXdiG08zRKCVR4BJIfmhIfzXJX0L/8d9/Ha01i7i1N Lfg/E3YnqFXttdCesBmI7bPYOtp/X3qD5rTyz X-Received: by 2002:a17:90a:6c61:: with SMTP id x88mr13541500pjj.122.1626403289435; Thu, 15 Jul 2021 19:41:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyQpNrl1FQdZYMccljbuFpGmu4i9sBBV6ec+mh1roEsLu06BQAxaJKO+8Z2LO+b6LqjEzpr2hqCrhohzDWAApo= X-Received: by 2002:a17:90a:6c61:: with SMTP id x88mr13541469pjj.122.1626403288897; Thu, 15 Jul 2021 19:41:28 -0700 (PDT) MIME-Version: 1.0 References: <20210716023656.670004-1-jason@redhat.com> In-Reply-To: <20210716023656.670004-1-jason@redhat.com> From: Jason Merrill Date: Thu, 15 Jul 2021 22:41:17 -0400 Message-ID: Subject: Re: [PATCH] c++: implement C++17 hardware interference size To: gcc-patches List Cc: "libstdc++" , libc-alpha@sourceware.org, "Richard Earnshaw (lists)" X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, HTML_MESSAGE, KAM_LOTSOFHASH, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=unavailable autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Jul 2021 02:41:38 -0000 Adding CCs that got lost in the initial mail. On Thu, Jul 15, 2021 at 10:36 PM Jason Merrill wrote: > The last missing piece of the C++17 standard library is the hardware > intereference size constants. Much of the delay in implementing these has > been due to uncertainty about what the right values are, and even whether > there is a single constant value that is suitable; the destructive > interference size is intended to be used in structure layout, so program > ABIs will depend on it. > > In principle, both of these values should be the same as the target's L1 > cache line size. When compiling for a generic target that is intended to > support a range of target CPUs with different cache line sizes, the > constructive size should probably be the minimum size, and the destructive > size the maximum, unless you are constrained by ABI compatibility with > previous code. > > JF Bastien's implementation proposal is summarized at > https://github.com/itanium-cxx-abi/cxx-abi/issues/74 > > I implement this by adding new --params for the two sizes. Targets need to > override these values in targetm.target_option.override() to support the > feature. > > 64 bytes still seems correct for the x86 family. > > I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a > 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL > target, so I'd think 32/64 would make more sense. > > He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache > line, I've changed that to 64/256. Does that seem right? > > Currently the patch does not adjust the values based on -march, as in JF's > proposal. I'll need more guidance from the ARM/AArch64 maintainers about > how to go about that. --param l1-cache-line-size is set based on -mtune, > but I don't think we want -mtune to change these ABI-affecting values. Are > there -march values for which a smaller range than 64-256 makes sense? > > gcc/ChangeLog: > > * params.opt: Add destructive-interference-size and > constructive-interference-size. > * doc/invoke.texi: Document them. > * config/aarch64/aarch64.c (aarch64_override_options_internal): > Set them. > * config/arm/arm.c (arm_option_override): Set them. > * config/i386/i386-options.c (ix86_option_override_internal): > Set them. > > gcc/c-family/ChangeLog: > > * c.opt: Add -Winterference-size. > * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE > and __GCC_CONSTRUCTIVE_SIZE. > > gcc/cp/ChangeLog: > > * decl.c (cxx_init_decl_processing): Check > --param *-interference-size values. > > libstdc++-v3/ChangeLog: > > * include/std/version: Define __cpp_lib_hardware_interference_size. > * libsupc++/new: Define hardware interference size variables. > > gcc/testsuite/ChangeLog: > > * g++.target/aarch64/interference.C: New test. > * g++.target/arm/interference.C: New test. > * g++.target/i386/interference.C: New test. > --- > gcc/doc/invoke.texi | 22 ++++++++++++++++++ > gcc/c-family/c.opt | 5 ++++ > gcc/params.opt | 15 ++++++++++++ > gcc/c-family/c-cppbuiltin.c | 12 ++++++++++ > gcc/config/aarch64/aarch64.c | 9 ++++++++ > gcc/config/arm/arm.c | 6 +++++ > gcc/config/i386/i386-options.c | 6 +++++ > gcc/cp/decl.c | 23 +++++++++++++++++++ > .../g++.target/aarch64/interference.C | 9 ++++++++ > gcc/testsuite/g++.target/arm/interference.C | 9 ++++++++ > gcc/testsuite/g++.target/i386/interference.C | 8 +++++++ > libstdc++-v3/include/std/version | 3 +++ > libstdc++-v3/libsupc++/new | 10 ++++++-- > 13 files changed, 135 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C > create mode 100644 gcc/testsuite/g++.target/arm/interference.C > create mode 100644 gcc/testsuite/g++.target/i386/interference.C > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index ea8812425e9..f93cb7a20f7 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant > stride. > > This setting is only useful for strides that are known and constant. > > +@item destructive_interference_size > +@item constructive_interference_size > +The values for the C++17 variables > +@code{std::hardware_destructive_interference_size} and > +@code{std::hardware_constructive_interference_size}. The destructive > +interference size is the minimum recommended offset between two > +independent concurrently-accessed objects; the constructive > +interference size is the maximum recommended size of contiguous memory > +accessed together. Typically both will be the size of an L1 cache > +line for the target, in bytes. If the target can have a range of L1 > +cache line sizes, typically the constructive interference size will be > +the small end of the range and the destructive size will be the large > +end. > + > +These values, particularly the destructive size, are intended to be > +used for layout, and thus have ABI impact. The default values can > +change with @samp{-march}, and users should be aware of this and > +perhaps specify these values directly if they need to be ABI > +compatible with code that was compiled with a different @samp{-march}. > +Changing the default values for a generic target should be done > +cautiously. > + > @item loop-interchange-max-num-stmts > The maximum number of stmts in a loop to be interchanged. > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > index 91929706aff..0398faf430a 100644 > --- a/gcc/c-family/c.opt > +++ b/gcc/c-family/c.opt > @@ -722,6 +722,11 @@ Winit-list-lifetime > C++ ObjC++ Var(warn_init_list) Warning Init(1) > Warn about uses of std::initializer_list that can result in dangling > pointers. > > +Winterference-size > +C++ ObjC++ Var(warn_interference_size) Warning Init(1) > +Warn about nonsensical values of --param destructive-interference-size or > +constructive-interference-size. > + > Wimplicit > C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) > Warn about implicit declarations. > diff --git a/gcc/params.opt b/gcc/params.opt > index 92b003e38cb..a81a3ec82f1 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -358,6 +358,21 @@ The maximum code size growth ratio when expanding > into a jump table (in percent) > Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param > Optimization > The size of L1 cache line. > > +-param=destructive-interference-size= > +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param > Optimization > +The minimum recommended offset between two concurrently-accessed objects > to > +avoid additional performance degradation due to contention introduced by > the > +implementation. Typically the L1 cache line size, but can be larger to > +accommodate a variety of target processors with different cache line > sizes. > +C++17 code might use this value in structure layout. > + > +-param=constructive-interference-size= > +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param > Optimization > +The maximum recommended size of contiguous memory occupied by two objects > +accessed with temporal locality by concurrent threads. Typically the L1 > cache > +line size, but can be smaller to accommodate a variety of target > processors with > +different cache line sizes. > + > -param=l1-cache-size= > Common Joined UInteger Var(param_l1_cache_size) Init(64) Param > Optimization > The size of L1 cache. > diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c > index f79f939bd10..a7bf2544533 100644 > --- a/gcc/c-family/c-cppbuiltin.c > +++ b/gcc/c-family/c-cppbuiltin.c > @@ -741,6 +741,18 @@ cpp_atomic_builtins (cpp_reader *pfile) > builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL", > targetm.atomic_test_and_set_trueval); > > + /* Macros for C++17 hardware interference size constants. Either both > or > + neither should be set. */ > + gcc_assert (!param_destruct_interfere_size > + == !param_construct_interfere_size); > + if (param_destruct_interfere_size) > + { > + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", > + param_destruct_interfere_size); > + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", > + param_construct_interfere_size); > + } > + > /* ptr_type_node can't be used here since ptr_mode is only set when > toplev calls backend_init which is not done with -E or pch. */ > psize = POINTER_SIZE_UNITS; > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index f5b25a7f704..b172fcdc93e 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -16297,6 +16297,15 @@ aarch64_override_options_internal (struct > gcc_options *opts) > SET_OPTION_IF_UNSET (opts, &global_options_set, > param_l1_cache_line_size, > aarch64_tune_params.prefetch->l1_cache_line_size); > + /* For a generic AArch64 target, cover the current range of cache line > + sizes. */ > + SET_OPTION_IF_UNSET (opts, &global_options_set, > + param_destruct_interfere_size, > + 256); > + SET_OPTION_IF_UNSET (opts, &global_options_set, > + param_construct_interfere_size, > + 64); > + > if (aarch64_tune_params.prefetch->l2_cache_size >= 0) > SET_OPTION_IF_UNSET (opts, &global_options_set, > param_l2_cache_size, > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 6d781e23ee9..edfa2ad3426 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -3656,6 +3656,12 @@ arm_option_override (void) > SET_OPTION_IF_UNSET (&global_options, &global_options_set, > param_l1_cache_line_size, > current_tune->prefetch.l1_cache_line_size); > + /* For a generic ARM target, JF Bastien proposed using 64 for both. > + ??? Why not 32 for constructive? */ > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > + param_destruct_interfere_size, 64); > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > + param_construct_interfere_size, 64); > if (current_tune->prefetch.l1_cache_size >= 0) > SET_OPTION_IF_UNSET (&global_options, &global_options_set, > param_l1_cache_size, > diff --git a/gcc/config/i386/i386-options.c > b/gcc/config/i386/i386-options.c > index 7cba655595e..3b1b3c838f8 100644 > --- a/gcc/config/i386/i386-options.c > +++ b/gcc/config/i386/i386-options.c > @@ -2571,6 +2571,12 @@ ix86_option_override_internal (bool main_args_p, > SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, > ix86_tune_cost->l2_cache_size); > > + /* 64B is the accepted value for these for all x86. */ > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > + param_destruct_interfere_size, 64); > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > + param_construct_interfere_size, 64); > + > /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. > */ > if (opts->x_flag_prefetch_loop_arrays < 0 > && HAVE_prefetch > diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c > index 01d64a16125..880fb8948a9 100644 > --- a/gcc/cp/decl.c > +++ b/gcc/cp/decl.c > @@ -4732,6 +4732,29 @@ cxx_init_decl_processing (void) > /* Show we use EH for cleanups. */ > if (flag_exceptions) > using_eh_for_cleanups (); > + > + /* Check that the hardware interference sizes are at least > + alignof(max_align_t), as required by the standard. */ > + if (param_destruct_interfere_size) > + { > + int max_align = max_align_t_align () / BITS_PER_UNIT; > + if (param_destruct_interfere_size < max_align) > + error ("%<--param destructive-interference-size=%d%> is less than " > + "%d", param_destruct_interfere_size, max_align); > + else if (param_destruct_interfere_size < param_l1_cache_line_size) > + warning (OPT_Winterference_size, > + "%<--param destructive-interference-size=%d%> " > + "is less than %<--param l1-cache-line-size=%d%>", > + param_destruct_interfere_size, param_l1_cache_line_size); > + if (param_construct_interfere_size < max_align) > + error ("%<--param constructive-interference-size=%d%> is less than > " > + "%d", param_construct_interfere_size, max_align); > + else if (param_construct_interfere_size > param_l1_cache_line_size) > + warning (OPT_Winterference_size, > + "%<--param constructive-interference-size=%d%> " > + "is greater than %<--param l1-cache-line-size=%d%>", > + param_construct_interfere_size, param_l1_cache_line_size); > + } > } > > /* Enter an abi node in global-module context. returns a cookie to > diff --git a/gcc/testsuite/g++.target/aarch64/interference.C > b/gcc/testsuite/g++.target/aarch64/interference.C > new file mode 100644 > index 00000000000..0fc01655223 > --- /dev/null > +++ b/gcc/testsuite/g++.target/aarch64/interference.C > @@ -0,0 +1,9 @@ > +// Test C++17 hardware interference size constants > +// { dg-do compile { target c++17 } } > + > +#include > + > +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent > ones use > +// 128 or even 256. > +static_assert(std::hardware_destructive_interference_size == 256); > +static_assert(std::hardware_constructive_interference_size == 64); > diff --git a/gcc/testsuite/g++.target/arm/interference.C > b/gcc/testsuite/g++.target/arm/interference.C > new file mode 100644 > index 00000000000..34fe8a52bff > --- /dev/null > +++ b/gcc/testsuite/g++.target/arm/interference.C > @@ -0,0 +1,9 @@ > +// Test C++17 hardware interference size constants > +// { dg-do compile { target c++17 } } > + > +#include > + > +// Recent ARM CPUs have a cache line size of 64. Older ones have > +// a size of 32, but I guess they're old enough that we don't care? > +static_assert(std::hardware_destructive_interference_size == 64); > +static_assert(std::hardware_constructive_interference_size == 64); > diff --git a/gcc/testsuite/g++.target/i386/interference.C > b/gcc/testsuite/g++.target/i386/interference.C > new file mode 100644 > index 00000000000..c7b910e3ada > --- /dev/null > +++ b/gcc/testsuite/g++.target/i386/interference.C > @@ -0,0 +1,8 @@ > +// Test C++17 hardware interference size constants > +// { dg-do compile { target c++17 } } > + > +#include > + > +// It is generally agreed that these are the right values for all x86. > +static_assert(std::hardware_destructive_interference_size == 64); > +static_assert(std::hardware_constructive_interference_size == 64); > diff --git a/libstdc++-v3/include/std/version > b/libstdc++-v3/include/std/version > index 27bcd32cb60..d5e155db48b 100644 > --- a/libstdc++-v3/include/std/version > +++ b/libstdc++-v3/include/std/version > @@ -140,6 +140,9 @@ > #define __cpp_lib_filesystem 201703 > #define __cpp_lib_gcd 201606 > #define __cpp_lib_gcd_lcm 201606 > +#ifdef __GCC_DESTRUCTIVE_SIZE > +# define __cpp_lib_hardware_interference_size 201703L > +#endif > #define __cpp_lib_hypot 201603 > #define __cpp_lib_invoke 201411L > #define __cpp_lib_lcm 201606 > diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new > index 3349b13fd1b..7bc67a6cb02 100644 > --- a/libstdc++-v3/libsupc++/new > +++ b/libstdc++-v3/libsupc++/new > @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) > _GLIBCXX_USE_NOEXCEPT { } > } // extern "C++" > > #if __cplusplus >= 201703L > -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER > namespace std > { > +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER > #define __cpp_lib_launder 201606 > /// Pointer optimization barrier [ptr.launder] > template > @@ -205,8 +205,14 @@ namespace std > void launder(const void*) = delete; > void launder(volatile void*) = delete; > void launder(const volatile void*) = delete; > -} > #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER > + > +#ifdef __GCC_DESTRUCTIVE_SIZE > +# define __cpp_lib_hardware_interference_size 201703L > + inline constexpr size_t hardware_destructive_interference_size = > __GCC_DESTRUCTIVE_SIZE; > + inline constexpr size_t hardware_constructive_interference_size = > __GCC_CONSTRUCTIVE_SIZE; > +#endif // __GCC_DESTRUCTIVE_SIZE > +} > #endif // C++17 > > #if __cplusplus > 201703L > > base-commit: f6dde32b9d487dd6e343d0a1e1d1f60783f5e735 > prerequisite-patch-id: 62730bcaf1f07786fd756efb6f3bbd94d778c092 > prerequisite-patch-id: 36fa087f6b261576422f8f3b1638f76e5183c95a > -- > 2.27.0 > >