From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
To: Richard Henderson <richard.henderson@linaro.org>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>,
"agraf@suse.de" <agraf@suse.de>,
Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
James Greenhalgh <James.Greenhalgh@arm.com>
Subject: Re: [PATCH, AArch64, v3 4/6] aarch64: Add out-of-line functions for LSE atomics
Date: Thu, 05 Sep 2019 10:00:00 -0000
Message-ID: <38e61fc9-6418-1036-058d-2a9d961dac57@foss.arm.com>
In-Reply-To: <20181101214648.29432-5-richard.henderson@linaro.org>
Hi Richard,
On 11/1/18 9:46 PM, Richard Henderson wrote:
> This is the libgcc part of the interface -- providing the functions.
> Rationale is provided at the top of libgcc/config/aarch64/lse.S.
>
>        * config/aarch64/lse-init.c: New file.
>        * config/aarch64/lse.S: New file.
>        * config/aarch64/t-lse: New file.
>        * config.host: Add t-lse to all aarch64 tuples.
> ---
>  libgcc/config/aarch64/lse-init.c |  45 ++++++
>  libgcc/config.host               |   4 +
>  libgcc/config/aarch64/lse.S      | 238 +++++++++++++++++++++++++++++++
>  libgcc/config/aarch64/t-lse      |  44 ++++++
>  4 files changed, 331 insertions(+)
>  create mode 100644 libgcc/config/aarch64/lse-init.c
>  create mode 100644 libgcc/config/aarch64/lse.S
>  create mode 100644 libgcc/config/aarch64/t-lse
>
> diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
> new file mode 100644
> index 00000000000..03b4e1e8ea8
> --- /dev/null
> +++ b/libgcc/config/aarch64/lse-init.c
> @@ -0,0 +1,45 @@
> +/* Out-of-line LSE atomics for AArch64 architecture, Init.
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   Contributed by Linaro Ltd.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> +<http://www.gnu.org/licenses/>. */
> +
> +/* Define the symbol gating the LSE implementations. */
> +extern _Bool __aa64_have_atomics
> +  __attribute__((visibility("hidden"), nocommon));
> +
Bootstrapping this patch series on an Armv8-A system with OOL atomics
enabled by default gave me link errors when building libgomp:
__aa64_have_atomics is undefined.
I haven't followed the series from the start, so maybe I'm missing
something, but I don't see where this variable is supposed to "live".
Removing the 'extern' from here allows the bootstrap to proceed, but it
then fails at a later stage with bizarre errors like:
In file included from build/gencondmd.c:51:
$SRC/gcc/config/aarch64/constraints.md: In function ‘bool satisfies_constraint_S(rtx)’:
$SRC/gcc/config/aarch64/constraints.md:120:10: error: ‘C’ was not declared in this scope; did you mean ‘PC’?
 120 | (define_constraint "Y"
     |         ^
     |         PC
which looks like a miscompilation of sorts.
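To make the link error concrete, here is a minimal sketch of the declaration-versus-definition distinction in C. It is not code from the patch: demo_have_atomics, demo_init_have_atomics and DEMO_HWCAP_ATOMICS are hypothetical names, the bit value is an assumption for the demo, and the hwcap word is passed in as a parameter rather than read with getauxval so that it builds on any host.

```c
#include <stdbool.h>

/* Declaration only: promises that the symbol exists in some object file,
   but reserves no storage.  If this were the only line mentioning the
   variable, every reference to it would stay undefined at link time.  */
extern bool demo_have_atomics;

/* Definition: exactly one translation unit has to provide this so that
   the linker can resolve the references.  */
bool demo_have_atomics;

/* Assumed bit for this demo; real code would take HWCAP_ATOMICS from the
   system headers.  */
#define DEMO_HWCAP_ATOMICS (1UL << 8)

/* Same shape as the constructor in the quoted patch, but the hwcap word
   is a parameter here instead of the result of getauxval (AT_HWCAP).  */
void
demo_init_have_atomics (unsigned long hwcap)
{
  demo_have_atomics = (hwcap & DEMO_HWCAP_ATOMICS) != 0;
}
```

With only the extern line present, the assignment compiles but leaves an undefined reference for the linker, which would match the libgomp failure described above.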
Thanks,
Kyrill
> +/* Disable initialization of __aa64_have_atomics during bootstrap. */
> +#ifndef inhibit_libc
> +# include <sys/auxv.h>
> +
> +/* Disable initialization if the system headers are too old. */
> +# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
> +
> +static void __attribute__((constructor))
> +init_have_atomics (void)
> +{
> +  unsigned long hwcap = getauxval (AT_HWCAP);
> +  __aa64_have_atomics = (hwcap & HWCAP_ATOMICS) != 0;
> +}
> +
> +# endif /* HWCAP */
> +#endif /* inhibit_libc */
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 029f6569caf..7e9a8b6bc8f 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -340,23 +340,27 @@ aarch64*-*-elf | aarch64*-*-rtems*)
>         extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
>         extra_parts="$extra_parts crtfastmath.o"
>         tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> +       tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
>         tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
>         md_unwind_header=aarch64/aarch64-unwind.h
>         ;;
>  aarch64*-*-freebsd*)
>         extra_parts="$extra_parts crtfastmath.o"
>         tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> +       tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
>         tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
>         md_unwind_header=aarch64/freebsd-unwind.h
>         ;;
>  aarch64*-*-fuchsia*)
>         tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> +       tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
>         tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
>         ;;
>  aarch64*-*-linux*)
>         extra_parts="$extra_parts crtfastmath.o"
>         md_unwind_header=aarch64/linux-unwind.h
>         tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
> +       tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
>         tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
>         ;;
>  alpha*-*-linux*)
> diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
> new file mode 100644
> index 00000000000..3e42a6569af
> --- /dev/null
> +++ b/libgcc/config/aarch64/lse.S
> @@ -0,0 +1,238 @@
> +/* Out-of-line LSE atomics for AArch64 architecture.
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   Contributed by Linaro Ltd.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> +<http://www.gnu.org/licenses/>. */
> +
> +/*
> + * The problem that we are trying to solve is operating system deployment
> + * of ARMv8.1-Atomics, also known as Large System Extensions (LSE).
> + *
> + * There are a number of potential solutions for this problem which have
> + * been proposed and rejected for various reasons. To recap:
> + *
> + * (1) Multiple builds. The dynamic linker will examine /lib64/atomics/
> + * if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
> + * However, not all Linux distributions are happy with multiple builds,
> + * and anyway it has no effect on main applications.
> + *
> + * (2) IFUNC. We could put these functions into libgcc_s.so, and have
> + * a single copy of each function for all DSOs. However, ARM is concerned
> + * that the branch-to-indirect-branch that is implied by using a PLT,
> + * as required by IFUNC, is too much overhead for smaller cpus.
> + *
> + * (3) Statically predicted direct branches. This is the approach that
> + * is taken here. These functions are linked into every DSO that uses them.
> + * All of the symbols are hidden, so that the functions are called via a
> + * direct branch. The choice of LSE vs non-LSE is done via one byte load
> + * followed by a well-predicted direct branch. The functions are compiled
> + * separately to minimize code size.
> + */
> +
> +/* Tell the assembler to accept LSE instructions. */
> +      .arch armv8-a+lse
> +
> +/* Declare the symbol gating the LSE implementations. */
> +      .hidden __aa64_have_atomics
> +
> +/* Turn size and memory model defines into mnemonic fragments. */
> +#if SIZE == 1
> +# define S    b
> +# define MASK , uxtb
> +#elif SIZE == 2
> +# define S    h
> +# define MASK , uxth
> +#elif SIZE == 4 || SIZE == 8 || SIZE == 16
> +# define S
> +# define MASK
> +#else
> +# error
> +#endif
> +
> +#if MODEL == 1
> +# define SUFF _relax
> +# define A
> +# define L
> +#elif MODEL == 2
> +# define SUFF _acq
> +# define A    a
> +# define L
> +#elif MODEL == 3
> +# define SUFF _rel
> +# define A
> +# define L    l
> +#elif MODEL == 4
> +# define SUFF _acq_rel
> +# define A    a
> +# define L    l
> +#else
> +# error
> +#endif
> +
> +/* Concatenate symbols. */
> +#define glue2_(A, B)          A ## B
> +#define glue2(A, B)           glue2_(A, B)
> +#define glue3_(A, B, C)       A ## B ## C
> +#define glue3(A, B, C)        glue3_(A, B, C)
> +#define glue4_(A, B, C, D)    A ## B ## C ## D
> +#define glue4(A, B, C, D)     glue4_(A, B, C, D)
> +
> +/* Select the size of a register, given a regno. */
> +#define x(N)                  glue2(x, N)
> +#define w(N)                  glue2(w, N)
> +#if SIZE < 8
> +# define s(N)                 w(N)
> +#else
> +# define s(N)                 x(N)
> +#endif
> +
> +#define NAME(BASE)            glue4(__aa64_, BASE, SIZE, SUFF)
> +#define LDXR                  glue4(ld, A, xr, S)
> +#define STXR                  glue4(st, L, xr, S)
> +
> +/* Temporary registers used. Other than these, only the return value
> +  register (x0) and the flags are modified. */
> +#define tmp0  16
> +#define tmp1  17
> +#define tmp2  15
> +
> +/* Start and end a function. */
> +.macro STARTFN name
> +      .text
> +      .balign 16
> +      .globl \name
> +      .hidden \name
> +      .type  \name, %function
> +\name:
> +.endm
> +
> +.macro ENDFN name
> +      .size  \name, . - \name
> +.endm
> +
> +/* Branch to LABEL if LSE is enabled.
> +   The branch should be easily predicted, in that it will, after constructors,
> +   always branch the same way. The expectation is that systems that implement
> +   ARMv8.1-Atomics are "beefier" than those that omit the extension.
> +   By arranging for the fall-through path to use load-store-exclusive insns,
> +   we aid the branch predictor of the smallest cpus. */
> +.macro JUMP_IF_LSE label
> +      adrp   x(tmp0), __aa64_have_atomics
> +      ldrb   w(tmp0), [x(tmp0), :lo12:__aa64_have_atomics]
> +      cbnz   w(tmp0), \label
> +.endm
> +
> +#ifdef L_cas
> +
> +STARTFN       NAME(cas)
> +      JUMP_IF_LSE    8f
> +
> +#if SIZE < 16
> +#define CAS   glue4(cas, A, L, S)
> +
> +      mov            s(tmp0), s(0)
> +0:    LDXR           s(0), [x2]
> +      cmp            s(0), s(tmp0) MASK
> +      bne            1f
> +      STXR           w(tmp1), s(1), [x2]
> +      cbnz           w(tmp1), 0b
> +1:    ret
> +
> +8:    CAS            w(0), w(1), [x2]
> +      ret
> +
> +#else
> +#define LDXP  glue3(ld, A, xp)
> +#define STXP  glue3(st, L, xp)
> +#define CASP  glue3(casp, A, L)
> +
> +      mov            x(tmp0), x0
> +      mov            x(tmp1), x1
> +0:    LDXP           x0, x1, [x4]
> +      cmp            x0, x(tmp0)
> +      ccmp           x1, x(tmp1), #0, eq
> +      bne            1f
> +      STXP           w(tmp2), x(tmp0), x(tmp1), [x4]
> +      cbnz           w(tmp2), 0b
> +1:    ret
> +
> +8:    CASP           x0, x1, x2, x3, [x4]
> +      ret
> +
> +#endif
> +
> +ENDFN NAME(cas)
> +#endif
> +
> +#ifdef L_swp
> +#define SWP   glue4(swp, A, L, S)
> +
> +STARTFN       NAME(swp)
> +      JUMP_IF_LSE    8f
> +
> +      mov            s(tmp0), s(0)
> +0:    LDXR           s(0), [x1]
> +      STXR           w(tmp1), s(tmp0), [x1]
> +      cbnz           w(tmp1), 0b
> +      ret
> +
> +8:    SWP            w(0), w(0), [x1]
> +      ret
> +
> +ENDFN NAME(swp)
> +#endif
> +
> +#if defined(L_ldadd) || defined(L_ldclr) \
> +    || defined(L_ldeor) || defined(L_ldset)
> +
> +#ifdef L_ldadd
> +#define LDNM  ldadd
> +#define OP    add
> +#elif defined(L_ldclr)
> +#define LDNM  ldclr
> +#define OP    bic
> +#elif defined(L_ldeor)
> +#define LDNM  ldeor
> +#define OP    eor
> +#elif defined(L_ldset)
> +#define LDNM  ldset
> +#define OP    orr
> +#else
> +#error
> +#endif
> +#define LDOP  glue4(LDNM, A, L, S)
> +
> +STARTFN       NAME(LDNM)
> +      JUMP_IF_LSE    8f
> +
> +      mov            s(tmp0), s(0)
> +0:    LDXR           s(0), [x1]
> +      OP             s(tmp1), s(0), s(tmp0)
> +      STXR           w(tmp1), s(tmp1), [x1]
> +      cbnz           w(tmp1), 0b
> +      ret
> +
> +8:    LDOP           s(0), s(0), [x1]
> +      ret
> +
> +ENDFN NAME(LDNM)
> +#endif
> diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse
> new file mode 100644
> index 00000000000..c7f4223cd45
> --- /dev/null
> +++ b/libgcc/config/aarch64/t-lse
> @@ -0,0 +1,44 @@
> +# Out-of-line LSE atomics for AArch64 architecture.
> +# Copyright (C) 2018 Free Software Foundation, Inc.
> +# Contributed by Linaro Ltd.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify it
> +# under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GCC is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3. If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# Compare-and-swap has 5 sizes and 4 memory models.
> +S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas))
> +O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0)))
> +
> +# Swap, Load-and-operate have 4 sizes and 4 memory models
> +S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset))
> +O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1)))
> +
> +LSE_OBJS := $(O0) $(O1)
> +
> +libgcc-objects += $(LSE_OBJS) lse-init$(objext)
> +
> +empty     =
> +space     = $(empty) $(empty)
> +PAT_SPLIT = $(subst _,$(space),$(*F))
> +PAT_BASE  = $(word 1,$(PAT_SPLIT))
> +PAT_N     = $(word 2,$(PAT_SPLIT))
> +PAT_M     = $(word 3,$(PAT_SPLIT))
> +
> +lse-init$(objext): $(srcdir)/config/aarch64/lse-init.c
> +      $(gcc_compile) -c $<
> +
> +$(LSE_OBJS): $(srcdir)/config/aarch64/lse.S
> +      $(gcc_compile) -DL_$(PAT_BASE) -DSIZE=$(PAT_N) -DMODEL=$(PAT_M) -c $<
> --
> 2.17.2
>