Subject: Re: [PATCH v6] aarch64: Add split-stack support
From: Adhemerval Zanella
To: gcc-patches@gcc.gnu.org
Date: Tue, 27 Feb 2018 13:44:00 -0000
Message-ID: <6422b8a8-9503-bbef-0b9d-e765d903ea74@linaro.org>
In-Reply-To: <1518026831-22979-1-git-send-email-adhemerval.zanella@linaro.org>

Ping (with Szabolcs' remarks fixed).

On 07/02/2018 16:07, Adhemerval Zanella wrote:
> Changes from previous version:
>
> - Changed the way __morestack is called to use a branch with link
>   instead of a simple branch.  This allows the use of a call instruction
>   and avoids possible issues with later optimization passes which might
>   see a branch outside the instruction block (as noticed in previous
>   iterations while building a more complex workload such as SPECcpu2006).
>
> - Changed the return address to use the branch with link value and
>   set x12 to save x30.  This simplifies the instructions required
>   to set up/save the return address.
>
> --
>
> This patch adds split-stack support on aarch64 (PR #67877).  As for
> other ports, this patch should be used along with glibc and gold support.
>
> The support is done similarly to other architectures: a split-stack field
> is allocated before the TCB by glibc, a target-specific __morestack
> implementation and helper functions are added in libgcc, and compiler
> support is adjusted (split-stack prologue, va_start for argument
> handling).  I also plan to send the gold support to adjust stack
> allocation across split-stack and default code calls.
>
> The current approach is to set the final stack adjustment using at most
> two instructions (mov/movk), which limits stack allocation to an upper
> limit of 4GB.  The morestack call is non-standard, with x10 holding the
> requested stack pointer, x11 the argument pointer (if required), and x12
> the return continuation address.  Unwinding is handled by a personality
> routine that knows how to find stack segments.
>
> The split-stack prologue on function entry is as follows (this goes
> before the usual function prologue):
>
> function:
>     mrs    x9, tpidr_el0
>     ldur   x9, [x9, -8]
>     mov    x10,
>     movk   x10, #0x0, lsl #16
>     sub    x10, sp, x10
>     mov    x11, sp   # if function has stacked arguments
>     cmp    x9, x10
>     bcc    .LX
> main_fn_entry:
>     [function prologue]
> .LX:
>     bl     __morestack
>     b      main_fn_entry
>
> Notes:
>
> 1. Even if a function does not allocate a stack frame, a split-stack
>    prologue is created.  This avoids issues with tail calls to external
>    symbols, which might require linker adjustment
>    (libgo/runtime/go-varargs.c).
>
> 2. Basic-block reordering (enabled with -O2) will move the split-stack
>    TCB ldur to after the required stack calculation.
>
> 3. Similar to powerpc, when the linker detects a call from split-stack to
>    non-split-stack code, it adds 16k (or more) to the value found in the
>    "allocate" instructions (so non-split-stack code gets a larger stack).
>    The amount is tunable by a linker option.  This feature is only
>    implemented in the GNU gold linker.
>
> 4. AArch64 does not handle >4G stacks initially.  Although it is possible
>    to implement this, limiting to 4G allows the allocation to be
>    materialized with only 2 instructions (mov + movk), thus simplifying
>    the linker adjustments required.  Supporting multiple threads each
>    requiring more than 4G of stack is probably not that important, and
>    likely to OOM at run time.
>
> 5. The TCB support on GLIBC is meant to be included in version 2.28.
>
> 6. Besides the regression tests, I also checked with a SPECcpu2006 run
>    with the additional -fsplit-stack option.  I saw no regressions besides
>    416.gamess, which fails on trunk as well (possibly due to some
>    misconfiguration in my environment).
>
> libgcc/ChangeLog:
>
>     * libgcc/config.host: Use t-stack and t-stack-aarch64 for
>     aarch64*-*-linux.
>     * libgcc/config/aarch64/morestack-c.c: New file.
>     * libgcc/config/aarch64/morestack.S: Likewise.
>     * libgcc/config/aarch64/t-stack-aarch64: Likewise.
>     * libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
>     code.
>
> gcc/ChangeLog:
>
>     * common/config/aarch64/aarch64-common.c
>     (aarch64_supports_split_stack): New function.
>     (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>     * gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
>     macro.
>     * gcc/config/aarch64/aarch64-protos.h: Add
>     aarch64_expand_split_stack_prologue and
>     aarch64_split_stack_space_check.
>     * gcc/config/aarch64/aarch64.c (aarch64_expand_builtin_va_start): Use
>     internal argument pointer instead of virtual_incoming_args_rtx.
>     (morestack_ref): New symbol.
>     (aarch64_load_split_stack_value): New function.
>     (aarch64_expand_split_stack_prologue): Likewise.
>     (aarch64_internal_arg_pointer): Likewise.
>     (aarch64_file_end): Emit the split-stack note sections.
>     (aarch64_split_stack_space_check): New function.
>     (TARGET_ASM_FILE_END): New macro.
>     (TARGET_INTERNAL_ARG_POINTER): Likewise.
>     * gcc/config/aarch64/aarch64.h (aarch64_frame): Add
>     split_stack_arg_pointer to set up the argument pointer when using
>     split-stack.
>     * gcc/config/aarch64/aarch64.md
>     (UNSPEC_STACK_CHECK): New define.
>     (split_stack_prologue): New expand.
>     (split_stack_space_check): Likewise.
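
To illustrate the mov/movk limit from the cover letter, here is a minimal C sketch of how a frame size splits into the two 16-bit immediates. The masks and shifts follow aarch64_expand_split_stack_prologue; the helper names (alloc_lo16, alloc_hi16, materialized_size, fits_in_mov_movk) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Helper names are hypothetical; only the masks and shifts follow the
   patch.  */

/* Immediate for the first instruction (mov x10, #imm16).  */
static uint16_t
alloc_lo16 (int64_t allocate)
{
  return (uint16_t) (allocate & 0xffff);
}

/* Immediate for the second instruction (movk x10, #imm16, lsl #16).  */
static uint16_t
alloc_hi16 (int64_t allocate)
{
  return (uint16_t) ((allocate & 0xffff0000) >> 16);
}

/* Value x10 holds after the two-instruction sequence.  */
static uint64_t
materialized_size (int64_t allocate)
{
  assert (allocate >= 0);
  return (uint64_t) alloc_lo16 (allocate)
         | ((uint64_t) alloc_hi16 (allocate) << 16);
}

/* Nonzero when the two instructions reproduce the frame size exactly.  */
static int
fits_in_mov_movk (int64_t allocate)
{
  return materialized_size (allocate) == (uint64_t) allocate;
}
```

Under these definitions any size up to 4G - 1 round-trips, but a size of exactly 4G does not fit in the pair. If I read the patch correctly, the `allocate > ((int64_t)1 << 32)` test still accepts that value, so `>=` may be what is intended there.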
> --- > gcc/common/config/aarch64/aarch64-common.c | 28 +++- > gcc/config/aarch64/aarch64-linux.h | 2 - > gcc/config/aarch64/aarch64-protos.h | 2 + > gcc/config/aarch64/aarch64.c | 182 ++++++++++++++++++++- > gcc/config/aarch64/aarch64.h | 3 + > gcc/config/aarch64/aarch64.md | 29 ++++ > libgcc/config.host | 1 + > libgcc/config/aarch64/morestack-c.c | 87 ++++++++++ > libgcc/config/aarch64/morestack.S | 254 +++++++++++++++++++++++++++++ > libgcc/config/aarch64/t-stack-aarch64 | 3 + > libgcc/generic-morestack.c | 1 + > 11 files changed, 588 insertions(+), 4 deletions(-) > create mode 100644 libgcc/config/aarch64/morestack-c.c > create mode 100644 libgcc/config/aarch64/morestack.S > create mode 100644 libgcc/config/aarch64/t-stack-aarch64 > > diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c > index 71d3953..cf17e2f 100644 > --- a/gcc/common/config/aarch64/aarch64-common.c > +++ b/gcc/common/config/aarch64/aarch64-common.c > @@ -107,6 +107,33 @@ aarch64_handle_option (struct gcc_options *opts, > } > } > > +/* -fsplit-stack uses a TCB field available on glibc-2.27. GLIBC also > + exports symbol, __tcb_private_ss, to signal it has the field available > + on TCB bloc. This aims to prevent binaries linked against newer > + GLIBC to run on non-supported ones. */ > + > +static bool > +aarch64_supports_split_stack (bool report ATTRIBUTE_UNUSED, > + struct gcc_options *opts ATTRIBUTE_UNUSED) > +{ > +#ifndef TARGET_GLIBC_MAJOR > +#define TARGET_GLIBC_MAJOR 0 > +#endif > +#ifndef TARGET_GLIBC_MINOR > +#define TARGET_GLIBC_MINOR 0 > +#endif > + /* Note: Can't test DEFAULT_ABI here, it isn't set until later. 
*/ > + if (TARGET_GLIBC_MAJOR * 1000 + TARGET_GLIBC_MINOR >= 2026) > + return true; > + > + if (report) > + error ("%<-fsplit-stack%> currently only supported on AArch64 GNU/Linux with glibc-2.27 or later"); > + return false; > +} > + > +#undef TARGET_SUPPORTS_SPLIT_STACK > +#define TARGET_SUPPORTS_SPLIT_STACK aarch64_supports_split_stack > + > struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; > > /* An ISA extension in the co-processor and main instruction set space. */ > @@ -340,4 +367,3 @@ aarch64_rewrite_mcpu (int argc, const char **argv) > } > > #undef AARCH64_CPU_NAME_LENGTH > - > diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h > index bf1327e..1189bfe 100644 > --- a/gcc/config/aarch64/aarch64-linux.h > +++ b/gcc/config/aarch64/aarch64-linux.h > @@ -81,8 +81,6 @@ > } \ > while (0) > > -#define TARGET_ASM_FILE_END file_end_indicate_exec_stack > - > /* Uninitialized common symbols in non-PIE executables, even with > strong definitions in dependent shared libraries, will resolve > to COPY relocated symbol in the executable. See PR65780. 
*/ > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h > index cda2895..20fe10e 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -450,6 +450,8 @@ void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode); > bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx); > void aarch64_split_sve_subreg_move (rtx, rtx, rtx); > void aarch64_expand_prologue (void); > +void aarch64_expand_split_stack_prologue (void); > +void aarch64_split_stack_space_check (rtx, rtx); > void aarch64_expand_vector_init (rtx, rtx); > void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx, > const_tree, unsigned); > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 7c9c6e5..c653755 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -71,6 +71,7 @@ > #include "selftest.h" > #include "selftest-rtl.h" > #include "rtx-vector-builder.h" > +#include "except.h" > > /* This file should be included last. */ > #include "target-def.h" > @@ -12073,7 +12074,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) > /* Emit code to initialize STACK, which points to the next varargs stack > argument. CUM->AAPCS_STACK_SIZE gives the number of stack words used > by named arguments. STACK is 8-byte aligned. */ > - t = make_tree (TREE_TYPE (stack), virtual_incoming_args_rtx); > + t = make_tree (TREE_TYPE (stack), crtl->args.internal_arg_pointer); > if (cum->aapcs_stack_size > 0) > t = fold_build_pointer_plus_hwi (t, cum->aapcs_stack_size * UNITS_PER_WORD); > t = build2 (MODIFY_EXPR, TREE_TYPE (stack), stack, t); > @@ -17351,6 +17352,179 @@ aarch64_select_early_remat_modes (sbitmap modes) > } > } > > +/* -fsplit-stack support. */ > + > +/* A SYMBOL_REF for __morestack. */ > +static GTY(()) rtx morestack_ref; > + > +/* Load split-stack area from thread pointer position. The split-stack is > + allocate just before thread pointer. 
*/ > + > +static rtx > +aarch64_load_split_stack_value (bool use_hard_reg) > +{ > + /* Offset from thread pointer to split-stack area. */ > + const int psso = -8; > + > + rtx ssvalue = use_hard_reg > + ? gen_rtx_REG (Pmode, R9_REGNUM) : gen_reg_rtx (Pmode); > + ssvalue = aarch64_load_tp (ssvalue); > + rtx mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso)); > + emit_move_insn (ssvalue, mem); > + return ssvalue; > +} > + > +/* Emit -fsplit-stack prologue, which goes before the regular function > + prologue. */ > + > +void > +aarch64_expand_split_stack_prologue (void) > +{ > + rtx ssvalue, reg10, reg11, reg12, cc, jump; > + HOST_WIDE_INT allocate; > + rtx_code_label *ok_label; > + rtx_insn *insn; > + > + gcc_assert (flag_split_stack && reload_completed); > + > + /* It limits total maximum stack allocation on 4G so its value can be > + materialized using two instructions at most (movn/movk). It might be > + used by the linker to add some extra space for split calling non split > + stack functions. */ > + allocate = constant_lower_bound (cfun->machine->frame.frame_size); > + if (allocate > ((int64_t)1 << 32)) > + { > + sorry ("Stack frame larger than 4G is not supported for -fsplit-stack"); > + return; > + } > + > + if (morestack_ref == NULL_RTX) > + { > + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); > + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL > + | SYMBOL_FLAG_FUNCTION); > + } > + > + ssvalue = aarch64_load_split_stack_value (true); > + > + /* Always emit two insns to calculate the requested stack, so the linker > + can edit them when adjusting size for calling non-split-stack code. 
*/ > + reg10 = gen_rtx_REG (Pmode, R10_REGNUM); > + emit_insn (gen_rtx_SET (reg10, GEN_INT (allocate & 0xffff))); > + emit_insn (gen_insv_immdi (reg10, GEN_INT (16), > + GEN_INT ((allocate & 0xffff0000) >> 16))); > + emit_insn (gen_sub3_insn (reg10, stack_pointer_rtx, reg10)); > + > + ok_label = gen_label_rtx (); > + > + /* If function uses stacked arguments save the old stack value so morestack > + can return it. */ > + reg11 = gen_rtx_REG (Pmode, R11_REGNUM); > + if (maybe_gt(crtl->args.size, 0) > + || maybe_gt(cfun->machine->frame.saved_varargs_size, 0)) > + emit_move_insn (reg11, stack_pointer_rtx); > + > + /* x12 holds the function entry x30 which will be restored by morestack. */ > + reg12 = gen_rtx_REG (Pmode, R12_REGNUM); > + emit_move_insn (reg12, gen_rtx_REG (Pmode, R30_REGNUM)); > + > + ok_label = gen_label_rtx (); > + cc = aarch64_gen_compare_reg (GEU, reg10, ssvalue); > + jump = gen_rtx_IF_THEN_ELSE (VOIDmode, > + gen_rtx_fmt_ee (GEU, VOIDmode, cc, const0_rtx), > + gen_rtx_LABEL_REF (VOIDmode, ok_label), > + pc_rtx); > + insn = emit_jump_insn (gen_rtx_SET (pc_rtx, jump)); > + JUMP_LABEL (insn) = ok_label; > + /* Mark the jump as very likely to be taken. */ > + add_reg_br_prob_note (insn, profile_probability::very_likely ()); > + > + insn = emit_call_insn (gen_call (gen_rtx_MEM (Pmode, morestack_ref), > + const0_rtx, const0_rtx)); > + > + rtx call_fusage = NULL_RTX; > + use_reg (&call_fusage, reg10); > + use_reg (&call_fusage, reg11); > + use_reg (&call_fusage, reg12); > + add_function_usage_to (insn, call_fusage); > + /* Indicate that this function can't jump to non-local gotos. */ > + make_reg_eh_region_note_nothrow_nononlocal (insn); > + > + emit_label (ok_label); > + LABEL_NUSES (ok_label)++; > +} > + > +/* Implement TARGET_ASM_FILE_END. 
*/ > + > +static void > +aarch64_file_end (void) > +{ > + file_end_indicate_exec_stack (); > + > + if (flag_split_stack) > + { > + file_end_indicate_split_stack (); > + > + switch_to_section (data_section); > + fprintf (asm_out_file, "\t.align 3\n"); > + fprintf (asm_out_file, "\t.quad __libc_tcb_private_ss\n"); > + } > +} > + > +/* Return the internal arg pointer used for function incoming arguments. */ > + > +static rtx > +aarch64_internal_arg_pointer (void) > +{ > + if (flag_split_stack > + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) > + == NULL)) > + { > + if (cfun->machine->frame.split_stack_arg_pointer == NULL_RTX) > + { > + rtx pat; > + > + cfun->machine->frame.split_stack_arg_pointer = gen_reg_rtx (Pmode); > + REG_POINTER (cfun->machine->frame.split_stack_arg_pointer) = 1; > + > + /* Put the pseudo initialization right after the note at the > + beginning of the function. */ > + pat = gen_rtx_SET (cfun->machine->frame.split_stack_arg_pointer, > + gen_rtx_REG (Pmode, R11_REGNUM)); > + push_topmost_sequence (); > + emit_insn_after (pat, get_insns ()); > + pop_topmost_sequence (); > + } > + return plus_constant (Pmode, cfun->machine->frame.split_stack_arg_pointer, > + FIRST_PARM_OFFSET (current_function_decl)); > + } > + return virtual_incoming_args_rtx; > +} > + > +/* Emit -fsplit-stack dynamic stack allocation space check. */ > + > +void > +aarch64_split_stack_space_check (rtx size, rtx label) > +{ > + rtx ssvalue, cc, cmp, jump, temp; > + rtx requested = gen_reg_rtx (Pmode); > + > + /* Load __private_ss from TCB. */ > + ssvalue = aarch64_load_split_stack_value (false); > + > + temp = gen_reg_rtx (Pmode); > + > + /* And compare it with frame pointer plus required stack. */ > + size = force_reg (Pmode, size); > + emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx, size)); > + > + /* Jump to label call if current ss guard is not suffice. 
*/ > + cc = aarch64_gen_compare_reg (GE, temp, ssvalue); > + cmp = gen_rtx_fmt_ee (GEU, VOIDmode, cc, const0_rtx); > + jump = emit_jump_insn (gen_condjump (cmp, cc, label)); > + JUMP_LABEL (jump) = label; > +} > + > /* Target-specific selftests. */ > > #if CHECKING_P > @@ -17423,6 +17597,9 @@ aarch64_run_selftests (void) > #undef TARGET_ASM_FILE_START > #define TARGET_ASM_FILE_START aarch64_start_file > > +#undef TARGET_ASM_FILE_END > +#define TARGET_ASM_FILE_END aarch64_file_end > + > #undef TARGET_ASM_OUTPUT_MI_THUNK > #define TARGET_ASM_OUTPUT_MI_THUNK aarch64_output_mi_thunk > > @@ -17513,6 +17690,9 @@ aarch64_run_selftests (void) > #undef TARGET_FUNCTION_VALUE_REGNO_P > #define TARGET_FUNCTION_VALUE_REGNO_P aarch64_function_value_regno_p > > +#undef TARGET_INTERNAL_ARG_POINTER > +#define TARGET_INTERNAL_ARG_POINTER aarch64_internal_arg_pointer > + > #undef TARGET_GIMPLE_FOLD_BUILTIN > #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index e3c52f6..20ef441 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -675,6 +675,9 @@ struct GTY (()) aarch64_frame > unsigned wb_candidate2; > > bool laid_out; > + > + /* Alternative internal arg pointer for -fsplit-stack. 
*/ > + rtx split_stack_arg_pointer; > }; > > typedef struct GTY (()) machine_function > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 5a2a930..3104ed4 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -169,6 +169,7 @@ > UNSPEC_CLASTB > UNSPEC_FADDA > UNSPEC_REV_SUBREG > + UNSPEC_STACK_CHECK > ]) > > (define_c_enum "unspecv" [ > @@ -6010,6 +6011,34 @@ > (match_operand 1)) > (clobber (reg:CC CC_REGNUM))])]) > > +;; Handle -fsplit-stack > +(define_expand "split_stack_prologue" > + [(const_int 0)] > + "" > +{ > + aarch64_expand_split_stack_prologue (); > + DONE; > +}) > + > +;; If there are operand 0 bytes available on the stack, jump to > +;; operand 1. > +(define_expand "split_stack_space_check" > + [(set (match_dup 2) > + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) > + (set (match_dup 3) > + (minus (reg SP_REGNUM) > + (match_operand 0))) > + (set (match_dup 4) (compare:CC (match_dup 3) (match_dup 2))) > + (set (pc) (if_then_else > + (geu (match_dup 4) (const_int 0)) > + (label_ref (match_operand 1)) > + (pc)))] > + "" > +{ > + aarch64_split_stack_space_check (operands[0], operands[1]); > + DONE; > +}) > + > ;; AdvSIMD Stuff > (include "aarch64-simd.md") > > diff --git a/libgcc/config.host b/libgcc/config.host > index 96d55a4..d6a2d15 100644 > --- a/libgcc/config.host > +++ b/libgcc/config.host > @@ -355,6 +355,7 @@ aarch64*-*-linux*) > md_unwind_header=aarch64/linux-unwind.h > tmake_file="${tmake_file} ${cpu_type}/t-aarch64" > tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm" > + tmake_file="${tmake_file} t-stack aarch64/t-stack-aarch64" > ;; > alpha*-*-linux*) > tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee t-crtfm alpha/t-linux" > diff --git a/libgcc/config/aarch64/morestack-c.c b/libgcc/config/aarch64/morestack-c.c > new file mode 100644 > index 0000000..8de531f > --- /dev/null > +++ b/libgcc/config/aarch64/morestack-c.c > @@ -0,0 +1,87 @@ > +/* AArch64 support for 
-fsplit-stack. > + * Copyright (C) 2018 Free Software Foundation, Inc. > + * > + * This file is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License as published by the > + * Free Software Foundation; either version 3, or (at your option) any > + * later version. > + * > + * This file is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * Under Section 7 of GPL version 3, you are granted additional > + * permissions described in the GCC Runtime Library Exception, version > + * 3.1, as published by the Free Software Foundation. > + * > + * You should have received a copy of the GNU General Public License and > + * a copy of the GCC Runtime Library Exception along with this program; > + * see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > + * . > + */ > + > +#ifndef inhibit_libc > + > +#include > +#include > +#include > +#include "generic-morestack.h" > + > +#define INITIAL_STACK_SIZE 0x4000 > +#define BACKOFF 0x1000 > + > +void __generic_morestack_set_initial_sp (void *sp, size_t len); > +void *__morestack_get_guard (void); > +void __morestack_set_guard (void *); > +void *__morestack_make_guard (void *stack, size_t size); > +void __morestack_load_mmap (void); > + > +/* split-stack area position from thread pointer. */ > +static inline void * > +ss_pointer (void) > +{ > +#define SS_OFFSET (-8) > + return (void*) ((uintptr_t) __builtin_thread_pointer() + SS_OFFSET); > +} > + > +/* Initialize the stack guard when the program starts or when a new > + thread. This is called from a constructor using ctors section. 
*/ > +void > +__stack_split_initialize (void) > +{ > + register uintptr_t* sp __asm__ ("sp"); > + uintptr_t *ss = ss_pointer (); > + *ss = (uintptr_t)sp - INITIAL_STACK_SIZE; > + __generic_morestack_set_initial_sp (sp, INITIAL_STACK_SIZE); > +} > + > +/* Return current __private_ss. */ > +void * > +__morestack_get_guard (void) > +{ > + void **ss = ss_pointer (); > + return *ss; > +} > + > +/* Set __private_ss to ptr. */ > +void > +__morestack_set_guard (void *ptr) > +{ > + void **ss = ss_pointer (); > + *ss = ptr; > +} > + > +/* Return the stack guard value for given stack. */ > +void * > +__morestack_make_guard (void *stack, size_t size) > +{ > + return (void*)((uintptr_t) stack - size + BACKOFF); > +} > + > +/* Make __stack_split_initialize a high priority constructor. */ > +static void (*const ctors []) > + __attribute__ ((used, section (".ctors.65535"), aligned (sizeof (void *)))) > + = { __stack_split_initialize, __morestack_load_mmap }; > + > +#endif /* !defined (inhibit_libc) */ > diff --git a/libgcc/config/aarch64/morestack.S b/libgcc/config/aarch64/morestack.S > new file mode 100644 > index 0000000..59a6391 > --- /dev/null > +++ b/libgcc/config/aarch64/morestack.S > @@ -0,0 +1,254 @@ > +# AArch64 support for -fsplit-stack. > +# Copyright (C) 2018 Free Software Foundation, Inc. > + > +# This file is part of GCC. > + > +# GCC is free software; you can redistribute it and/or modify it under > +# the terms of the GNU General Public License as published by the Free > +# Software Foundation; either version 3, or (at your option) any later > +# version. > + > +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +# WARRANTY; without even the implied warranty of MERCHANTABILITY or > +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +# for more details. 
> + > +# Under Section 7 of GPL version 3, you are granted additional > +# permissions described in the GCC Runtime Library Exception, version > +# 3.1, as published by the Free Software Foundation. > + > +# You should have received a copy of the GNU General Public License and > +# a copy of the GCC Runtime Library Exception along with this program; > +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > +# . > + > +/* Define an entry point visible from C. */ > +#define ENTRY(name) \ > + .globl name; \ > + .type name,%function; \ > + .align 4; \ > + name##: > + > +#define END(name) \ > + .size name,.-name > + > +/* __morestack frame size. */ > +#define MORESTACK_FRAMESIZE 112 > +/* Offset from __morestack frame where the new stack size is saved and > + passed to __generic_morestack. */ > +#define NEWSTACK_SAVE 96 > + > +# Excess space needed to call ld.so resolver for lazy plt resolution. > +# Go uses sigaltstack so this doesn't need to also cover signal frame size. > +#define BACKOFF 0x1000 > +# Large excess allocated when calling non-split-stack code. > +#define NON_SPLIT_STACK 0x100000 > + > +/* split-stack area position from thread pointer. */ > +#define SPLITSTACK_PTR_TP -8 > + > + .text > +ENTRY(__morestack_non_split) > + .cfi_startproc > +# We use a cleanup to restore the TCB split stack field if an exception is > +# through this code. > + sub x10, x10, NON_SPLIT_STACK > + .cfi_endproc > +END(__morestack_non_split) > +# Fall through into __morestack > + > +# This function is called with non-standard calling convention: on entry > +# x10 is the requested stack pointer, x11 is previous stack pointer (if > +# functions has stacked arguments which needs to be restored), and x12 is > +# the caller link register on function entry (which will be restored by > +# morestack when returning to caller). 
The split-stack prologue is in > +# the form: > +# > +# function: > +# mrs x9, tpidr_el0 > +# ldur x9, [x9, #-8] > +# mov x10, > +# movk x10, #0x0, lsl #16 > +# sub x10, sp, x10 > +# mov x11, sp # if function has stacked arguments > +# mov x12, x30 > +# cmp x9, x10 > +# bcc .LX > +# main_fn_entry: > +# [function body] > +# LX: > +# bl __morestack > +# b main_fn_entry > +# > +# The N bit is also restored to indicate that the function is called > +# (so the prologue addition can set up the argument pointer correctly). > + > +ENTRY(__morestack) > +.LFB1: > + .cfi_startproc > + > +#ifdef __PIC__ > + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 > + .cfi_lsda 0x1b,.LLSDA1 > +#else > + .cfi_personality 0x3,__gcc_personality_v0 > + .cfi_lsda 0x3,.LLSDA1 > +#endif > + # Calculate requested stack size. > + sub x10, sp, x10 > + > + # Save parameters > + stp x29, x12, [sp, -MORESTACK_FRAMESIZE]! > + .cfi_def_cfa_offset MORESTACK_FRAMESIZE > + .cfi_offset 29, -MORESTACK_FRAMESIZE > + .cfi_offset 30, -MORESTACK_FRAMESIZE+8 > + add x29, sp, 0 > + .cfi_def_cfa_register 29 > + # Adjust the requested stack size for the frame pointer save. > + stp x0, x1, [x29, 16] > + stp x2, x3, [x29, 32] > + add x10, x10, BACKOFF > + stp x4, x5, [x29, 48] > + stp x6, x7, [x29, 64] > + stp x8, x30, [x29, 80] > + str x10, [x29, 96] > + > + # void __morestack_block_signals (void) > + bl __morestack_block_signals > + > + # void *__generic_morestack (size_t *pframe_size, > + # void *old_stack, > + # size_t param_size) > + # pframe_size: is the size of the required stack frame (the function > + # amount of space remaining on the allocated stack). > + # old_stack: points at the parameters the old stack > + # param_size: size in bytes of parameters to copy to the new stack. > + add x0, x29, NEWSTACK_SAVE > + add x1, x29, MORESTACK_FRAMESIZE > + mov x2, 0 > + bl __generic_morestack > + > + # Start using new stack > + mov sp, x0 > + > + # Set __private_ss stack guard for the new stack. 
> + ldr x9, [x29, NEWSTACK_SAVE] > + add x0, x0, BACKOFF > + sub x0, x0, x9 > +.LEHB0: > + mrs x1, tpidr_el0 > + str x0, [x1, SPLITSTACK_PTR_TP] > + > + # void __morestack_unblock_signals (void) > + bl __morestack_unblock_signals > + > + # Set up for a call to the target function. > + ldp x0, x1, [x29, 16] > + ldp x2, x3, [x29, 32] > + ldp x4, x5, [x29, 48] > + ldp x6, x7, [x29, 64] > + ldp x8, x12, [x29, 80] > + add x11, x29, MORESTACK_FRAMESIZE > + ldr x30, [x29, 8] > + # Indicate __morestack was called. > + cmp x12, 0 > + blr x12 > + > + stp x0, x1, [x29, 16] > + stp x2, x3, [x29, 32] > + stp x4, x5, [x29, 48] > + stp x6, x7, [x29, 64] > + > + bl __morestack_block_signals > + > + # void *__generic_releasestack (size_t *pavailable) > + add x0, x29, NEWSTACK_SAVE > + bl __generic_releasestack > + > + # Reset __private_ss stack guard to value for old stack > + ldr x9, [x29, NEWSTACK_SAVE] > + add x0, x0, BACKOFF > + sub x0, x0, x9 > + > + # Update TCB split stack field > +.LEHE0: > + mrs x1, tpidr_el0 > + str x0, [x1, SPLITSTACK_PTR_TP] > + > + bl __morestack_unblock_signals > + > + # Use old stack again. > + add sp, x29, MORESTACK_FRAMESIZE > + > + ldp x0, x1, [x29, 16] > + ldp x2, x3, [x29, 32] > + ldp x4, x5, [x29, 48] > + ldp x6, x7, [x29, 64] > + ldp x29, x30, [x29] > + > + .cfi_remember_state > + .cfi_restore 30 > + .cfi_restore 29 > + .cfi_def_cfa 31, 0 > + > + ret > + > +# This is the cleanup code called by the stack unwinder when > +# unwinding through code between .LEHB0 and .LEHE0 above. 
> +cleanup: > + .cfi_restore_state > + # Reuse the new stack allocation to save/restore the > + # exception header > + str x0, [x29, NEWSTACK_SAVE] > + # size_t __generic_findstack (void *stack) > + add x0, x29, MORESTACK_FRAMESIZE > + bl __generic_findstack > + sub x0, x29, x0 > + add x0, x0, BACKOFF > + # Restore split-stack guard value > + mrs x1, tpidr_el0 > + str x0, [x1, SPLITSTACK_PTR_TP] > + ldr x0, [x29, NEWSTACK_SAVE] > + b _Unwind_Resume > + .cfi_endproc > +END(__morestack) > + > + .section .gcc_except_table,"a",@progbits > + .align 4 > +.LLSDA1: > + # @LPStart format (omit) > + .byte 0xff > + # @TType format (omit) > + .byte 0xff > + # Call-site format (uleb128) > + .byte 0x1 > + # Call-site table length > + .uleb128 .LLSDACSE1-.LLSDACSB1 > +.LLSDACSB1: > + # region 0 start > + .uleb128 .LEHB0-.LFB1 > + # length > + .uleb128 .LEHE0-.LEHB0 > + # landing pad > + .uleb128 cleanup-.LFB1 > + # no action (ie a cleanup) > + .uleb128 0 > +.LLSDACSE1: > + > + > + .global __gcc_personality_v0 > +#ifdef __PIC__ > + # Build a position independent reference to the personality function. > + .hidden DW.ref.__gcc_personality_v0 > + .weak DW.ref.__gcc_personality_v0 > + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat > + .type DW.ref.__gcc_personality_v0, @object > + .align 3 > +DW.ref.__gcc_personality_v0: > + .size DW.ref.__gcc_personality_v0, 8 > + .quad __gcc_personality_v0 > +#endif > + > + .section .note.GNU-stack,"",@progbits > + .section .note.GNU-split-stack,"",@progbits > + .section .note.GNU-no-split-stack,"",@progbits > diff --git a/libgcc/config/aarch64/t-stack-aarch64 b/libgcc/config/aarch64/t-stack-aarch64 > new file mode 100644 > index 0000000..4babb4e > --- /dev/null > +++ b/libgcc/config/aarch64/t-stack-aarch64 > @@ -0,0 +1,3 @@ > +# Makefile fragment to support -fsplit-stack for aarch64. 
> +LIB2ADD_ST += $(srcdir)/config/aarch64/morestack.S \ > + $(srcdir)/config/aarch64/morestack-c.c > diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c > index 80bfd7f..574f58d 100644 > --- a/libgcc/generic-morestack.c > +++ b/libgcc/generic-morestack.c > @@ -943,6 +943,7 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, > nsp -= 2 * 160; > #elif defined __s390__ > nsp -= 2 * 96; > +#elif defined __aarch64__ > #else > #error "unrecognized target" > #endif >
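
For reference, the guard arithmetic can be modelled in plain C. This is only a sketch under assumptions: make_guard mirrors __morestack_make_guard from morestack-c.c and assumes STACK is the high address of a downward-growing segment; frame_fits is a hypothetical name for the unsigned comparison (GEU) that the prologue emits between sp minus the frame size and the guard.

```c
#include <stddef.h>
#include <stdint.h>

#define BACKOFF 0x1000	/* Same value as in morestack-c.c.  */

/* Guard for a segment, following __morestack_make_guard.  STACK is
   assumed to be the high end of the segment and SIZE its length.  */
static uintptr_t
make_guard (uintptr_t stack, size_t size)
{
  return stack - size + BACKOFF;
}

/* Hypothetical model of the prologue test: the frame fits when
   sp - allocate is still at or above the guard (unsigned compare).
   When it returns 0 the prologue would call __morestack.  */
static int
frame_fits (uintptr_t sp, uintptr_t guard, uint64_t allocate)
{
  uintptr_t requested = sp - (uintptr_t) allocate;
  return requested >= guard;
}
```

With a 0x4000-byte segment ending at 0x100000 the guard is 0xfd000, so a 0x3000-byte frame still fits and a 0x3001-byte frame does not. Note that aarch64_split_stack_space_check builds its compare from temp, which is never assigned, while the computed value is in requested, and uses GE where the prologue uses GEU; that may be worth a second look.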