From: Noah Goldstein <goldstein.w.n@gmail.com>
To: Jason Merrill <jason@redhat.com>
Cc: gcc-patches List <gcc-patches@gcc.gnu.org>,
"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>,
"libstdc++" <libstdc++@gcc.gnu.org>,
GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH] c++: implement C++17 hardware interference size
Date: Thu, 15 Jul 2021 22:48:52 -0400 [thread overview]
Message-ID: <CAFUsyf+eKUg=8rUEL2FfYnfzxCG=U7giPpZiDwBGRFpw+kwmzQ@mail.gmail.com> (raw)
In-Reply-To: <CADzB+2ktqA=nzYS5zJAxH9TfFexP-OzZw8SoPOd907-qxVbWpg@mail.gmail.com>
On Thu, Jul 15, 2021 at 10:41 PM Jason Merrill via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:
> Adding CCs that got lost in the initial mail.
>
> On Thu, Jul 15, 2021 at 10:36 PM Jason Merrill <jason@redhat.com> wrote:
>
> > The last missing piece of the C++17 standard library is the hardware
> > intereference size constants. Much of the delay in implementing these
> has
> > been due to uncertainty about what the right values are, and even whether
> > there is a single constant value that is suitable; the destructive
> > interference size is intended to be used in structure layout, so program
> > ABIs will depend on it.
> >
> > In principle, both of these values should be the same as the target's L1
> > cache line size. When compiling for a generic target that is intended to
> > support a range of target CPUs with different cache line sizes, the
> > constructive size should probably be the minimum size, and the
> destructive
> > size the maximum, unless you are constrained by ABI compatibility with
> > previous code.
> >
> > JF Bastien's implementation proposal is summarized at
> > https://github.com/itanium-cxx-abi/cxx-abi/issues/74
> >
> > I implement this by adding new --params for the two sizes. Targets need
> to
> > override these values in targetm.target_option.override() to support the
> > feature.
> >
> > 64 bytes still seems correct for the x86 family.
> >
> > I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
> > 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
> > target, so I'd think 32/64 would make more sense.
> >
> > He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
> > line, I've changed that to 64/256. Does that seem right?
> >
> > Currently the patch does not adjust the values based on -march, as in
> JF's
> > proposal. I'll need more guidance from the ARM/AArch64 maintainers about
> > how to go about that. --param l1-cache-line-size is set based on -mtune,
> > but I don't think we want -mtune to change these ABI-affecting values.
> Are
> > there -march values for which a smaller range than 64-256 makes sense?
> >
> > gcc/ChangeLog:
> >
> > * params.opt: Add destructive-interference-size and
> > constructive-interference-size.
> > * doc/invoke.texi: Document them.
> > * config/aarch64/aarch64.c (aarch64_override_options_internal):
> > Set them.
> > * config/arm/arm.c (arm_option_override): Set them.
> > * config/i386/i386-options.c (ix86_option_override_internal):
> > Set them.
> >
> > gcc/c-family/ChangeLog:
> >
> > * c.opt: Add -Winterference-size.
> > * c-cppbuiltin.c (cpp_atomic_builtins): Add
> __GCC_DESTRUCTIVE_SIZE
> > and __GCC_CONSTRUCTIVE_SIZE.
> >
> > gcc/cp/ChangeLog:
> >
> > * decl.c (cxx_init_decl_processing): Check
> > --param *-interference-size values.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/version: Define
> __cpp_lib_hardware_interference_size.
> > * libsupc++/new: Define hardware interference size variables.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.target/aarch64/interference.C: New test.
> > * g++.target/arm/interference.C: New test.
> > * g++.target/i386/interference.C: New test.
> > ---
> > gcc/doc/invoke.texi | 22 ++++++++++++++++++
> > gcc/c-family/c.opt | 5 ++++
> > gcc/params.opt | 15 ++++++++++++
> > gcc/c-family/c-cppbuiltin.c | 12 ++++++++++
> > gcc/config/aarch64/aarch64.c | 9 ++++++++
> > gcc/config/arm/arm.c | 6 +++++
> > gcc/config/i386/i386-options.c | 6 +++++
> > gcc/cp/decl.c | 23 +++++++++++++++++++
> > .../g++.target/aarch64/interference.C | 9 ++++++++
> > gcc/testsuite/g++.target/arm/interference.C | 9 ++++++++
> > gcc/testsuite/g++.target/i386/interference.C | 8 +++++++
> > libstdc++-v3/include/std/version | 3 +++
> > libstdc++-v3/libsupc++/new | 10 ++++++--
> > 13 files changed, 135 insertions(+), 2 deletions(-)
> > create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C
> > create mode 100644 gcc/testsuite/g++.target/arm/interference.C
> > create mode 100644 gcc/testsuite/g++.target/i386/interference.C
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index ea8812425e9..f93cb7a20f7 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant
> > stride.
> >
> > This setting is only useful for strides that are known and constant.
> >
> > +@item destructive_interference_size
> > +@item constructive_interference_size
> > +The values for the C++17 variables
> > +@code{std::hardware_destructive_interference_size} and
> > +@code{std::hardware_constructive_interference_size}. The destructive
> > +interference size is the minimum recommended offset between two
> > +independent concurrently-accessed objects; the constructive
> > +interference size is the maximum recommended size of contiguous memory
> > +accessed together. Typically both will be the size of an L1 cache
> > +line for the target, in bytes. If the target can have a range of L1
> > +cache line sizes, typically the constructive interference size will be
> > +the small end of the range and the destructive size will be the large
> > +end.
> > +
> > +These values, particularly the destructive size, are intended to be
> > +used for layout, and thus have ABI impact. The default values can
> > +change with @samp{-march}, and users should be aware of this and
> > +perhaps specify these values directly if they need to be ABI
> > +compatible with code that was compiled with a different @samp{-march}.
> > +Changing the default values for a generic target should be done
> > +cautiously.
> > +
> > @item loop-interchange-max-num-stmts
> > The maximum number of stmts in a loop to be interchanged.
> >
> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index 91929706aff..0398faf430a 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -722,6 +722,11 @@ Winit-list-lifetime
> > C++ ObjC++ Var(warn_init_list) Warning Init(1)
> > Warn about uses of std::initializer_list that can result in dangling
> > pointers.
> >
> > +Winterference-size
> > +C++ ObjC++ Var(warn_interference_size) Warning Init(1)
> > +Warn about nonsensical values of --param destructive-interference-size
> or
> > +constructive-interference-size.
> > +
> > Wimplicit
> > C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
> > Warn about implicit declarations.
> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index 92b003e38cb..a81a3ec82f1 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -358,6 +358,21 @@ The maximum code size growth ratio when expanding
> > into a jump table (in percent)
> > Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param
> > Optimization
> > The size of L1 cache line.
> >
> > +-param=destructive-interference-size=
> > +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param
> > Optimization
> > +The minimum recommended offset between two concurrently-accessed objects
> > to
> > +avoid additional performance degradation due to contention introduced by
> > the
> > +implementation. Typically the L1 cache line size, but can be larger to
> > +accommodate a variety of target processors with different cache line
> > sizes.
> > +C++17 code might use this value in structure layout.
> > +
> > +-param=constructive-interference-size=
> > +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param
> > Optimization
> > +The maximum recommended size of contiguous memory occupied by two
> objects
> > +accessed with temporal locality by concurrent threads. Typically the L1
> > cache
> > +line size, but can be smaller to accommodate a variety of target
> > processors with
> > +different cache line sizes.
> > +
> > -param=l1-cache-size=
> > Common Joined UInteger Var(param_l1_cache_size) Init(64) Param
> > Optimization
> > The size of L1 cache.
> > diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> > index f79f939bd10..a7bf2544533 100644
> > --- a/gcc/c-family/c-cppbuiltin.c
> > +++ b/gcc/c-family/c-cppbuiltin.c
> > @@ -741,6 +741,18 @@ cpp_atomic_builtins (cpp_reader *pfile)
> > builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
> > targetm.atomic_test_and_set_trueval);
> >
> > + /* Macros for C++17 hardware interference size constants. Either both
> > or
> > + neither should be set. */
> > + gcc_assert (!param_destruct_interfere_size
> > + == !param_construct_interfere_size);
> > + if (param_destruct_interfere_size)
> > + {
> > + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
> > + param_destruct_interfere_size);
> > + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
> > + param_construct_interfere_size);
> > + }
> > +
> > /* ptr_type_node can't be used here since ptr_mode is only set when
> > toplev calls backend_init which is not done with -E or pch. */
> > psize = POINTER_SIZE_UNITS;
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index f5b25a7f704..b172fcdc93e 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -16297,6 +16297,15 @@ aarch64_override_options_internal (struct
> > gcc_options *opts)
> > SET_OPTION_IF_UNSET (opts, &global_options_set,
> > param_l1_cache_line_size,
> >
> aarch64_tune_params.prefetch->l1_cache_line_size);
> > + /* For a generic AArch64 target, cover the current range of cache line
> > + sizes. */
> > + SET_OPTION_IF_UNSET (opts, &global_options_set,
> > + param_destruct_interfere_size,
> > + 256);
> > + SET_OPTION_IF_UNSET (opts, &global_options_set,
> > + param_construct_interfere_size,
> > + 64);
> > +
> > if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
> > SET_OPTION_IF_UNSET (opts, &global_options_set,
> > param_l2_cache_size,
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index 6d781e23ee9..edfa2ad3426 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -3656,6 +3656,12 @@ arm_option_override (void)
> > SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > param_l1_cache_line_size,
> > current_tune->prefetch.l1_cache_line_size);
> > + /* For a generic ARM target, JF Bastien proposed using 64 for both.
> > + ??? Why not 32 for constructive? */
> > + SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > + param_destruct_interfere_size, 64);
> > + SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > + param_construct_interfere_size, 64);
> > if (current_tune->prefetch.l1_cache_size >= 0)
> > SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > param_l1_cache_size,
> > diff --git a/gcc/config/i386/i386-options.c
> > b/gcc/config/i386/i386-options.c
> > index 7cba655595e..3b1b3c838f8 100644
> > --- a/gcc/config/i386/i386-options.c
> > +++ b/gcc/config/i386/i386-options.c
> > @@ -2571,6 +2571,12 @@ ix86_option_override_internal (bool main_args_p,
> > SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
> > ix86_tune_cost->l2_cache_size);
> >
> > + /* 64B is the accepted value for these for all x86. */
> > + SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > + param_destruct_interfere_size, 64);
> > + SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > + param_construct_interfere_size, 64);
> > +
> > /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.
> > */
> > if (opts->x_flag_prefetch_loop_arrays < 0
> > && HAVE_prefetch
> > diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> > index 01d64a16125..880fb8948a9 100644
> > --- a/gcc/cp/decl.c
> > +++ b/gcc/cp/decl.c
> > @@ -4732,6 +4732,29 @@ cxx_init_decl_processing (void)
> > /* Show we use EH for cleanups. */
> > if (flag_exceptions)
> > using_eh_for_cleanups ();
> > +
> > + /* Check that the hardware interference sizes are at least
> > + alignof(max_align_t), as required by the standard. */
> > + if (param_destruct_interfere_size)
> > + {
> > + int max_align = max_align_t_align () / BITS_PER_UNIT;
> > + if (param_destruct_interfere_size < max_align)
> > + error ("%<--param destructive-interference-size=%d%> is less
> than "
> > + "%d", param_destruct_interfere_size, max_align);
> > + else if (param_destruct_interfere_size < param_l1_cache_line_size)
> > + warning (OPT_Winterference_size,
> > + "%<--param destructive-interference-size=%d%> "
> > + "is less than %<--param l1-cache-line-size=%d%>",
> > + param_destruct_interfere_size,
> param_l1_cache_line_size);
> > + if (param_construct_interfere_size < max_align)
> > + error ("%<--param constructive-interference-size=%d%> is less
> than
> > "
> > + "%d", param_construct_interfere_size, max_align);
> > + else if (param_construct_interfere_size >
> param_l1_cache_line_size)
> > + warning (OPT_Winterference_size,
> > + "%<--param constructive-interference-size=%d%> "
> > + "is greater than %<--param l1-cache-line-size=%d%>",
> > + param_construct_interfere_size,
> param_l1_cache_line_size);
> > + }
> > }
> >
> > /* Enter an abi node in global-module context. returns a cookie to
> > diff --git a/gcc/testsuite/g++.target/aarch64/interference.C
> > b/gcc/testsuite/g++.target/aarch64/interference.C
> > new file mode 100644
> > index 00000000000..0fc01655223
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/aarch64/interference.C
> > @@ -0,0 +1,9 @@
> > +// Test C++17 hardware interference size constants
> > +// { dg-do compile { target c++17 } }
> > +
> > +#include <new>
> > +
> > +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent
> > ones use
> > +// 128 or even 256.
> > +static_assert(std::hardware_destructive_interference_size == 256);
> > +static_assert(std::hardware_constructive_interference_size == 64);
> > diff --git a/gcc/testsuite/g++.target/arm/interference.C
> > b/gcc/testsuite/g++.target/arm/interference.C
> > new file mode 100644
> > index 00000000000..34fe8a52bff
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/arm/interference.C
> > @@ -0,0 +1,9 @@
> > +// Test C++17 hardware interference size constants
> > +// { dg-do compile { target c++17 } }
> > +
> > +#include <new>
> > +
> > +// Recent ARM CPUs have a cache line size of 64. Older ones have
> > +// a size of 32, but I guess they're old enough that we don't care?
> > +static_assert(std::hardware_destructive_interference_size == 64);
> > +static_assert(std::hardware_constructive_interference_size == 64);
> > diff --git a/gcc/testsuite/g++.target/i386/interference.C
> > b/gcc/testsuite/g++.target/i386/interference.C
> > new file mode 100644
> > index 00000000000..c7b910e3ada
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/i386/interference.C
> > @@ -0,0 +1,8 @@
> > +// Test C++17 hardware interference size constants
> > +// { dg-do compile { target c++17 } }
> > +
> > +#include <new>
> > +
> > +// It is generally agreed that these are the right values for all x86.
> > +static_assert(std::hardware_destructive_interference_size == 64);
> > +static_assert(std::hardware_constructive_interference_size == 64);
> > diff --git a/libstdc++-v3/include/std/version
> > b/libstdc++-v3/include/std/version
> > index 27bcd32cb60..d5e155db48b 100644
> > --- a/libstdc++-v3/include/std/version
> > +++ b/libstdc++-v3/include/std/version
> > @@ -140,6 +140,9 @@
> > #define __cpp_lib_filesystem 201703
> > #define __cpp_lib_gcd 201606
> > #define __cpp_lib_gcd_lcm 201606
> > +#ifdef __GCC_DESTRUCTIVE_SIZE
> > +# define __cpp_lib_hardware_interference_size 201703L
> > +#endif
> > #define __cpp_lib_hypot 201603
> > #define __cpp_lib_invoke 201411L
> > #define __cpp_lib_lcm 201606
> > diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
> > index 3349b13fd1b..7bc67a6cb02 100644
> > --- a/libstdc++-v3/libsupc++/new
> > +++ b/libstdc++-v3/libsupc++/new
> > @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*)
> > _GLIBCXX_USE_NOEXCEPT { }
> > } // extern "C++"
> >
> > #if __cplusplus >= 201703L
> > -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
> > namespace std
> > {
> > +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
> > #define __cpp_lib_launder 201606
> > /// Pointer optimization barrier [ptr.launder]
> > template<typename _Tp>
> > @@ -205,8 +205,14 @@ namespace std
> > void launder(const void*) = delete;
> > void launder(volatile void*) = delete;
> > void launder(const volatile void*) = delete;
> > -}
> > #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
> > +
> > +#ifdef __GCC_DESTRUCTIVE_SIZE
> > +# define __cpp_lib_hardware_interference_size 201703L
> > + inline constexpr size_t hardware_destructive_interference_size =
> > __GCC_DESTRUCTIVE_SIZE;
> > + inline constexpr size_t hardware_constructive_interference_size =
> > __GCC_CONSTRUCTIVE_SIZE;
> > +#endif // __GCC_DESTRUCTIVE_SIZE
> > +}
> > #endif // C++17
> >
> > #if __cplusplus > 201703L
> >
> > base-commit: f6dde32b9d487dd6e343d0a1e1d1f60783f5e735
> > prerequisite-patch-id: 62730bcaf1f07786fd756efb6f3bbd94d778c092
> > prerequisite-patch-id: 36fa087f6b261576422f8f3b1638f76e5183c95a
> > --
> > 2.27.0
> >
> >
>
On intel x86 systems with a private L2 cache the spatial prefetcher
can cause destructive interference along 128 byte aligned boundaries.
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60
next prev parent reply other threads:[~2021-07-16 2:49 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-16 2:36 Jason Merrill
2021-07-16 2:41 ` Jason Merrill
2021-07-16 2:48 ` Noah Goldstein [this message]
2021-07-16 11:17 ` Jonathan Wakely
2021-07-16 13:27 ` Richard Earnshaw
2021-07-16 13:26 ` Jonathan Wakely
2021-07-16 15:12 ` Matthias Kretz
2021-07-16 15:30 ` Jason Merrill
2021-07-16 16:54 ` Jonathan Wakely
2021-07-16 18:43 ` Jason Merrill
2021-07-16 19:26 ` Matthias Kretz
2021-07-16 19:58 ` Jonathan Wakely
2021-07-17 8:14 ` Matthias Kretz
2021-07-17 13:32 ` Jonathan Wakely
2021-07-17 13:54 ` Matthias Kretz
2021-07-17 21:37 ` Jason Merrill
2021-07-19 9:41 ` Richard Earnshaw
2021-07-20 16:43 ` Jason Merrill
2021-09-10 13:16 ` [PATCH RFC] " Jason Merrill
2021-09-14 7:56 ` Christophe LYON
2021-09-15 10:25 ` Richard Earnshaw
2021-09-15 11:30 ` Christophe Lyon
2021-09-15 12:31 ` Martin Liška
2021-09-15 15:31 ` [pushed] c++: don't warn about internal interference sizes Jason Merrill
2021-09-15 15:36 ` Jeff Law
2021-09-15 15:38 ` Jason Merrill
2021-09-15 16:15 ` Christophe Lyon
2021-09-15 15:35 ` [PATCH RFC] c++: implement C++17 hardware interference size Jason Merrill
2021-07-20 18:05 ` [PATCH] " Thomas Rodgers
2021-07-16 17:20 ` Noah Goldstein
2021-07-16 19:37 ` Matthias Kretz
2021-07-16 21:23 ` Noah Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAFUsyf+eKUg=8rUEL2FfYnfzxCG=U7giPpZiDwBGRFpw+kwmzQ@mail.gmail.com' \
--to=goldstein.w.n@gmail.com \
--cc=Richard.Earnshaw@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jason@redhat.com \
--cc=libc-alpha@sourceware.org \
--cc=libstdc++@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).