From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id 30C8E385781F for ; Fri, 16 Jul 2021 02:37:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 30C8E385781F Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-585-l8LxEJjpNiKG1WZQj7l72Q-1; Thu, 15 Jul 2021 22:37:00 -0400 X-MC-Unique: l8LxEJjpNiKG1WZQj7l72Q-1 Received: by mail-qk1-f199.google.com with SMTP id bk12-20020a05620a1a0cb02903b899a4309cso5333250qkb.14 for ; Thu, 15 Jul 2021 19:37:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:mime-version :content-transfer-encoding; bh=4WhcxRKrInp6lz5IgnQFUQDhnjB5OrncccfULpYJE1g=; b=B36KC9H/RPMQ5stU79KF4/6CLfQNC0zcbpgmmuqOG3zBwnWmNlBTj5EUULNi4WNkab e/G4oyY5eKWdI+T6OxHf7x1YpOuHLC4rCaOFLImptxqCgEQsMwLdyTsS7efCzvDYHuOn Rb7CgCmM79Mj4snAqMfVzo/hqLhn+H+6qA6amqQ72O2Rkm5DIJyR3sIIalD4boxrg13D OfRVC+9d5Nsg8YsNLRHnN3OcBvLoh/ZYlnkcvUNfWoYEmNTql5qo8vCtV3zegTil1qdv RSnYlWOaj1YEBc6dCR3/XLNa4wjERk6a24BI/+5sTNI+7h3f8uDA7qBufulS1NkMWNnT jO0A== X-Gm-Message-State: AOAM531fka1/zrP5orx7NgxdA04kprvjmYKZ08HtH5HuxZjMgrQwcs8C uFQ/A6mClatM7p/yqqZRm2v8y0D/tDHBiZoA4dZOosx+azWTSkCkI/+u33gcQnoSXtpIGOwycgW CFRLp/KwNmeBzBu1o7PfJv57F3/20mkHaCJohWuez5AEw7IZtXM2P/FFREVu9wRUhRg== X-Received: by 2002:a05:620a:444d:: with SMTP id w13mr7289966qkp.417.1626403019759; Thu, 15 Jul 2021 19:36:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx6HCzqd1+WjTWzXnjse2PaswExKm0slXiEwlOR/rKtHUzOt3OHmGLt2gJJEDjJKWIDfgPp4A== X-Received: by 2002:a05:620a:444d:: with SMTP id w13mr7289931qkp.417.1626403019209; Thu, 15 Jul 2021 19:36:59 -0700 (PDT) Received: from barrymore.redhat.com (130-44-159-43.s11817.c3-0.arl-cbr1.sbo-arl.ma.cable.rcncustomer.com. [130.44.159.43]) by smtp.gmail.com with ESMTPSA id l6sm3394313qkk.117.2021.07.15.19.36.57 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Jul 2021 19:36:58 -0700 (PDT) From: Jason Merrill To: gcc-patches@gcc.gnu.org Subject: [PATCH] c++: implement C++17 hardware interference size Date: Thu, 15 Jul 2021 22:36:56 -0400 Message-Id: <20210716023656.670004-1-jason@redhat.com> X-Mailer: git-send-email 2.27.0 MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII" X-Spam-Status: No, score=-13.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Jul 2021 02:37:06 -0000 The last missing piece of the C++17 standard library is the hardware intereference size constants. Much of the delay in implementing these has been due to uncertainty about what the right values are, and even whether there is a single constant value that is suitable; the destructive interference size is intended to be used in structure layout, so program ABIs will depend on it. In principle, both of these values should be the same as the target's L1 cache line size. When compiling for a generic target that is intended to support a range of target CPUs with different cache line sizes, the constructive size should probably be the minimum size, and the destructive size the maximum, unless you are constrained by ABI compatibility with previous code. JF Bastien's implementation proposal is summarized at https://github.com/itanium-cxx-abi/cxx-abi/issues/74 I implement this by adding new --params for the two sizes. Targets need to override these values in targetm.target_option.override() to support the feature. 64 bytes still seems correct for the x86 family. I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL target, so I'd think 32/64 would make more sense. He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache line, I've changed that to 64/256. Does that seem right? Currently the patch does not adjust the values based on -march, as in JF's proposal. I'll need more guidance from the ARM/AArch64 maintainers about how to go about that. --param l1-cache-line-size is set based on -mtune, but I don't think we want -mtune to change these ABI-affecting values. Are there -march values for which a smaller range than 64-256 makes sense? gcc/ChangeLog: * params.opt: Add destructive-interference-size and constructive-interference-size. * doc/invoke.texi: Document them. * config/aarch64/aarch64.c (aarch64_override_options_internal): Set them. * config/arm/arm.c (arm_option_override): Set them. * config/i386/i386-options.c (ix86_option_override_internal): Set them. gcc/c-family/ChangeLog: * c.opt: Add -Winterference-size. * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE and __GCC_CONSTRUCTIVE_SIZE. gcc/cp/ChangeLog: * decl.c (cxx_init_decl_processing): Check --param *-interference-size values. libstdc++-v3/ChangeLog: * include/std/version: Define __cpp_lib_hardware_interference_size. * libsupc++/new: Define hardware interference size variables. gcc/testsuite/ChangeLog: * g++.target/aarch64/interference.C: New test. * g++.target/arm/interference.C: New test. * g++.target/i386/interference.C: New test. --- gcc/doc/invoke.texi | 22 ++++++++++++++++++ gcc/c-family/c.opt | 5 ++++ gcc/params.opt | 15 ++++++++++++ gcc/c-family/c-cppbuiltin.c | 12 ++++++++++ gcc/config/aarch64/aarch64.c | 9 ++++++++ gcc/config/arm/arm.c | 6 +++++ gcc/config/i386/i386-options.c | 6 +++++ gcc/cp/decl.c | 23 +++++++++++++++++++ .../g++.target/aarch64/interference.C | 9 ++++++++ gcc/testsuite/g++.target/arm/interference.C | 9 ++++++++ gcc/testsuite/g++.target/i386/interference.C | 8 +++++++ libstdc++-v3/include/std/version | 3 +++ libstdc++-v3/libsupc++/new | 10 ++++++-- 13 files changed, 135 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C create mode 100644 gcc/testsuite/g++.target/arm/interference.C create mode 100644 gcc/testsuite/g++.target/i386/interference.C diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index ea8812425e9..f93cb7a20f7 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant stride. This setting is only useful for strides that are known and constant. +@item destructive_interference_size +@item constructive_interference_size +The values for the C++17 variables +@code{std::hardware_destructive_interference_size} and +@code{std::hardware_constructive_interference_size}. The destructive +interference size is the minimum recommended offset between two +independent concurrently-accessed objects; the constructive +interference size is the maximum recommended size of contiguous memory +accessed together. Typically both will be the size of an L1 cache +line for the target, in bytes. If the target can have a range of L1 +cache line sizes, typically the constructive interference size will be +the small end of the range and the destructive size will be the large +end. + +These values, particularly the destructive size, are intended to be +used for layout, and thus have ABI impact. The default values can +change with @samp{-march}, and users should be aware of this and +perhaps specify these values directly if they need to be ABI +compatible with code that was compiled with a different @samp{-march}. +Changing the default values for a generic target should be done +cautiously. + @item loop-interchange-max-num-stmts The maximum number of stmts in a loop to be interchanged. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 91929706aff..0398faf430a 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -722,6 +722,11 @@ Winit-list-lifetime C++ ObjC++ Var(warn_init_list) Warning Init(1) Warn about uses of std::initializer_list that can result in dangling pointers. +Winterference-size +C++ ObjC++ Var(warn_interference_size) Warning Init(1) +Warn about nonsensical values of --param destructive-interference-size or +constructive-interference-size. + Wimplicit C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) Warn about implicit declarations. diff --git a/gcc/params.opt b/gcc/params.opt index 92b003e38cb..a81a3ec82f1 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -358,6 +358,21 @@ The maximum code size growth ratio when expanding into a jump table (in percent) Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization The size of L1 cache line. +-param=destructive-interference-size= +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization +The minimum recommended offset between two concurrently-accessed objects to +avoid additional performance degradation due to contention introduced by the +implementation. Typically the L1 cache line size, but can be larger to +accommodate a variety of target processors with different cache line sizes. +C++17 code might use this value in structure layout. + +-param=constructive-interference-size= +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization +The maximum recommended size of contiguous memory occupied by two objects +accessed with temporal locality by concurrent threads. Typically the L1 cache +line size, but can be smaller to accommodate a variety of target processors with +different cache line sizes. + -param=l1-cache-size= Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization The size of L1 cache. diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index f79f939bd10..a7bf2544533 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -741,6 +741,18 @@ cpp_atomic_builtins (cpp_reader *pfile) builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL", targetm.atomic_test_and_set_trueval); + /* Macros for C++17 hardware interference size constants. Either both or + neither should be set. */ + gcc_assert (!param_destruct_interfere_size + == !param_construct_interfere_size); + if (param_destruct_interfere_size) + { + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", + param_destruct_interfere_size); + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", + param_construct_interfere_size); + } + /* ptr_type_node can't be used here since ptr_mode is only set when toplev calls backend_init which is not done with -E or pch. */ psize = POINTER_SIZE_UNITS; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f5b25a7f704..b172fcdc93e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16297,6 +16297,15 @@ aarch64_override_options_internal (struct gcc_options *opts) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_line_size, aarch64_tune_params.prefetch->l1_cache_line_size); + /* For a generic AArch64 target, cover the current range of cache line + sizes. */ + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + 256); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + 64); + if (aarch64_tune_params.prefetch->l2_cache_size >= 0) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l2_cache_size, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 6d781e23ee9..edfa2ad3426 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3656,6 +3656,12 @@ arm_option_override (void) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_line_size, current_tune->prefetch.l1_cache_line_size); + /* For a generic ARM target, JF Bastien proposed using 64 for both. + ??? Why not 32 for constructive? */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); if (current_tune->prefetch.l1_cache_size >= 0) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_size, diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 7cba655595e..3b1b3c838f8 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2571,6 +2571,12 @@ ix86_option_override_internal (bool main_args_p, SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, ix86_tune_cost->l2_cache_size); + /* 64B is the accepted value for these for all x86. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 01d64a16125..880fb8948a9 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -4732,6 +4732,29 @@ cxx_init_decl_processing (void) /* Show we use EH for cleanups. */ if (flag_exceptions) using_eh_for_cleanups (); + + /* Check that the hardware interference sizes are at least + alignof(max_align_t), as required by the standard. */ + if (param_destruct_interfere_size) + { + int max_align = max_align_t_align () / BITS_PER_UNIT; + if (param_destruct_interfere_size < max_align) + error ("%<--param destructive-interference-size=%d%> is less than " + "%d", param_destruct_interfere_size, max_align); + else if (param_destruct_interfere_size < param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param destructive-interference-size=%d%> " + "is less than %<--param l1-cache-line-size=%d%>", + param_destruct_interfere_size, param_l1_cache_line_size); + if (param_construct_interfere_size < max_align) + error ("%<--param constructive-interference-size=%d%> is less than " + "%d", param_construct_interfere_size, max_align); + else if (param_construct_interfere_size > param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param constructive-interference-size=%d%> " + "is greater than %<--param l1-cache-line-size=%d%>", + param_construct_interfere_size, param_l1_cache_line_size); + } } /* Enter an abi node in global-module context. returns a cookie to diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C new file mode 100644 index 00000000000..0fc01655223 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include + +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use +// 128 or even 256. +static_assert(std::hardware_destructive_interference_size == 256); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C new file mode 100644 index 00000000000..34fe8a52bff --- /dev/null +++ b/gcc/testsuite/g++.target/arm/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include + +// Recent ARM CPUs have a cache line size of 64. Older ones have +// a size of 32, but I guess they're old enough that we don't care? +static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C new file mode 100644 index 00000000000..c7b910e3ada --- /dev/null +++ b/gcc/testsuite/g++.target/i386/interference.C @@ -0,0 +1,8 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include + +// It is generally agreed that these are the right values for all x86. +static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version index 27bcd32cb60..d5e155db48b 100644 --- a/libstdc++-v3/include/std/version +++ b/libstdc++-v3/include/std/version @@ -140,6 +140,9 @@ #define __cpp_lib_filesystem 201703 #define __cpp_lib_gcd 201606 #define __cpp_lib_gcd_lcm 201606 +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L +#endif #define __cpp_lib_hypot 201603 #define __cpp_lib_invoke 201411L #define __cpp_lib_lcm 201606 diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new index 3349b13fd1b..7bc67a6cb02 100644 --- a/libstdc++-v3/libsupc++/new +++ b/libstdc++-v3/libsupc++/new @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { } } // extern "C++" #if __cplusplus >= 201703L -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER namespace std { +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER #define __cpp_lib_launder 201606 /// Pointer optimization barrier [ptr.launder] template @@ -205,8 +205,14 @@ namespace std void launder(const void*) = delete; void launder(volatile void*) = delete; void launder(const volatile void*) = delete; -} #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER + +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L + inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE; + inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE; +#endif // __GCC_DESTRUCTIVE_SIZE +} #endif // C++17 #if __cplusplus > 201703L base-commit: f6dde32b9d487dd6e343d0a1e1d1f60783f5e735 prerequisite-patch-id: 62730bcaf1f07786fd756efb6f3bbd94d778c092 prerequisite-patch-id: 36fa087f6b261576422f8f3b1638f76e5183c95a -- 2.27.0