From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 113974 invoked by alias); 16 Nov 2015 20:01:54 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 113962 invoked by uid 89); 16 Nov 2015 20:01:54 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00,KAM_ASCII_DIVIDERS,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2
X-HELO: smtp.eu.adacore.com
Received: from mel.act-europe.fr (HELO smtp.eu.adacore.com) (194.98.77.210)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS;
 Mon, 16 Nov 2015 20:01:52 +0000
Received: from localhost (localhost [127.0.0.1])
 by filtered-smtp.eu.adacore.com (Postfix) with ESMTP id 1EFDA33169AC;
 Mon, 16 Nov 2015 21:01:49 +0100 (CET)
Received: from smtp.eu.adacore.com ([127.0.0.1])
 by localhost (smtp.eu.adacore.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MFftoQJr9yeZ;
 Mon, 16 Nov 2015 21:01:49 +0100 (CET)
Received: from polaris.localnet (bon31-6-88-161-99-133.fbx.proxad.net [88.161.99.133])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.eu.adacore.com (Postfix) with ESMTPSA id D5ADE33169A0;
 Mon, 16 Nov 2015 21:01:48 +0100 (CET)
From: Eric Botcazou
To: Richard Earnshaw
Cc: gcc-patches@gcc.gnu.org, Ramana Radhakrishnan
Subject: Re: [ARM] Fix PR middle-end/65958
Date: Mon, 16 Nov 2015 20:01:00 -0000
Message-ID: <11225412.5B49tQ9fvN@polaris>
User-Agent: KMail/4.14.9 (Linux/3.16.7-29-desktop; KDE/4.14.9; x86_64; ; )
In-Reply-To: <5638F077.5050105@foss.arm.com>
References: <1478566.ZKXszbaoG4@polaris> <9319219.YanzbaT3s8@polaris> <5638F077.5050105@foss.arm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="nextPart3884145.50uUjTlhAn"
Content-Transfer-Encoding: 7Bit
X-SW-Source: 2015-11/txt/msg01988.txt.bz2

This is a multi-part message in MIME format.

--nextPart3884145.50uUjTlhAn
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Content-length: 1296

> More comments inline.

Revised version attached, which addresses all your comments and in particular
removes the

+#if PROBE_INTERVAL > 4096
+#error Cannot use indexed addressing mode for stack probing
+#endif

compile-time assertion.  It generates the same code for PROBE_INTERVAL == 4096
as before and it generates code that can be assembled for 8192.

Tested on AArch64/Linux, OK for the mainline?


2015-11-16  Tristan Gingold
            Eric Botcazou

        PR middle-end/65958
        * config/aarch64/aarch64-protos.h (aarch64_output_probe_stack_range):
        Declare.
        * config/aarch64/aarch64.md: Declare UNSPECV_BLOCKAGE and
        UNSPEC_PROBE_STACK_RANGE.
        (blockage): New instruction.
        (probe_stack_range): Likewise.
        * config/aarch64/aarch64.c (aarch64_emit_probe_stack_range): New
        function.
        (aarch64_output_probe_stack_range): Likewise.
        (aarch64_expand_prologue): Invoke aarch64_emit_probe_stack_range if
        static builtin stack checking is enabled.
        * config/aarch64/aarch64-linux.h (STACK_CHECK_STATIC_BUILTIN): Define.


2015-11-16  Eric Botcazou

        * gcc.target/aarch64/stack-checking.c: New test.

-- 
Eric Botcazou

--nextPart3884145.50uUjTlhAn
Content-Disposition: attachment; filename="stack-checking.c"
Content-Transfer-Encoding: 7Bit
Content-Type: text/x-csrc; charset="utf-8"; name="stack-checking.c"
Content-length: 281

/* { dg-do run { target { *-*-linux* } } } */
/* { dg-options "-fstack-check" } */

int main(void)
{
  char *p;
  if (1)
    {
      char i[48];
      p = __builtin_alloca(8);
      p[0] = 1;
    }

  if (1)
    {
      char i[48], j[64];
      j[32] = 0;
    }

  return !p[0];
}

--nextPart3884145.50uUjTlhAn
Content-Disposition: attachment; filename="pr65958-3.diff"
Content-Transfer-Encoding: 7Bit
Content-Type: text/x-patch; charset="utf-8"; name="pr65958-3.diff"
Content-length: 8930

Index: config/aarch64/aarch64-linux.h
===================================================================
--- config/aarch64/aarch64-linux.h (revision 230397)
+++ config/aarch64/aarch64-linux.h (working copy)
@@ -88,4 +88,7 @@
 #undef TARGET_BINDS_LOCAL_P
 #define TARGET_BINDS_LOCAL_P default_binds_local_p_2
 
+/* Define this to be nonzero if static stack checking is supported.  */
+#define STACK_CHECK_STATIC_BUILTIN 1
+
 #endif /* GCC_AARCH64_LINUX_H */
Index: config/aarch64/aarch64-protos.h
===================================================================
--- config/aarch64/aarch64-protos.h (revision 230397)
+++ config/aarch64/aarch64-protos.h (working copy)
@@ -340,6 +340,7 @@ void aarch64_asm_output_labelref (FILE *
 void aarch64_cpu_cpp_builtins (cpp_reader *);
 void aarch64_elf_asm_named_section (const char *, unsigned, tree);
 const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
+const char * aarch64_output_probe_stack_range (rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode, const char *);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
Index: config/aarch64/aarch64.c
===================================================================
--- config/aarch64/aarch64.c (revision 230397)
+++ config/aarch64/aarch64.c (working copy)
@@ -62,6 +62,7 @@
 #include "sched-int.h"
 #include "cortex-a57-fma-steering.h"
 #include "target-globals.h"
+#include "common/common-target.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2151,6 +2152,169 @@ aarch64_libgcc_cmp_return_mode (void)
   return SImode;
 }
 
+#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
+
+/* We use the 12-bit shifted immediate arithmetic instructions so values
+   must be multiple of (1 << 12), i.e. 4096.  */
+#if (PROBE_INTERVAL % 4096) != 0
+#error Cannot use simple address calculation for stack probing
+#endif
+
+/* The pair of scratch registers used for stack probing.  */
+#define PROBE_STACK_FIRST_REG  9
+#define PROBE_STACK_SECOND_REG 10
+
+/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
+   inclusive.  These are offsets from the current stack pointer.  */
+
+static void
+aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
+{
+  rtx reg1 = gen_rtx_REG (Pmode, PROBE_STACK_FIRST_REG);
+
+  /* See the same assertion on PROBE_INTERVAL above.  */
+  gcc_assert ((first % 4096) == 0);
+
+  /* See if we have a constant small number of probes to generate.  If so,
+     that's the easy case.  */
+  if (size <= PROBE_INTERVAL)
+    {
+      const HOST_WIDE_INT base = ROUND_UP (size, 4096);
+      emit_set_insn (reg1,
+                     plus_constant (Pmode, stack_pointer_rtx,
+                                    -(first + base)));
+      emit_stack_probe (plus_constant (Pmode, reg1, base - size));
+    }
+
+  /* The run-time loop is made up of 8 insns in the generic case while the
+     compile-time loop is made up of 4+2*(n-2) insns for n # of intervals.  */
+  else if (size <= 4 * PROBE_INTERVAL)
+    {
+      HOST_WIDE_INT i, rem;
+
+      emit_set_insn (reg1,
+                     plus_constant (Pmode, stack_pointer_rtx,
+                                    -(first + PROBE_INTERVAL)));
+      emit_stack_probe (reg1);
+
+      /* Probe at FIRST + N * PROBE_INTERVAL for values of N from 2 until
+         it exceeds SIZE.  If only two probes are needed, this will not
+         generate any code.  Then probe at FIRST + SIZE.  */
+      for (i = 2 * PROBE_INTERVAL; i < size; i += PROBE_INTERVAL)
+        {
+          emit_set_insn (reg1, plus_constant (Pmode, reg1, -PROBE_INTERVAL));
+          emit_stack_probe (reg1);
+        }
+
+      rem = size - (i - PROBE_INTERVAL);
+      if (rem > 256)
+        {
+          const HOST_WIDE_INT base = ROUND_UP (rem, 4096);
+          emit_set_insn (reg1, plus_constant (Pmode, reg1, -base));
+          emit_stack_probe (plus_constant (Pmode, reg1, base - rem));
+        }
+      else
+        emit_stack_probe (plus_constant (Pmode, reg1, -rem));
+    }
+
+  /* Otherwise, do the same as above, but in a loop.  Note that we must be
+     extra careful with variables wrapping around because we might be at
+     the very top (or the very bottom) of the address space and we have
+     to be able to handle this case properly; in particular, we use an
+     equality test for the loop condition.  */
+  else
+    {
+      rtx reg2 = gen_rtx_REG (Pmode, PROBE_STACK_SECOND_REG);
+
+      /* Step 1: round SIZE to the previous multiple of the interval.  */
+
+      HOST_WIDE_INT rounded_size = size & -PROBE_INTERVAL;
+
+
+      /* Step 2: compute initial and final value of the loop counter.  */
+
+      /* TEST_ADDR = SP + FIRST.  */
+      emit_set_insn (reg1,
+                     plus_constant (Pmode, stack_pointer_rtx, -first));
+
+      /* LAST_ADDR = SP + FIRST + ROUNDED_SIZE.  */
+      emit_set_insn (reg2,
+                     plus_constant (Pmode, stack_pointer_rtx,
+                                    -(first + rounded_size)));
+
+
+      /* Step 3: the loop
+
+         do
+           {
+             TEST_ADDR = TEST_ADDR + PROBE_INTERVAL
+             probe at TEST_ADDR
+           }
+         while (TEST_ADDR != LAST_ADDR)
+
+         probes at FIRST + N * PROBE_INTERVAL for values of N from 1
+         until it is equal to ROUNDED_SIZE.  */
+
+      emit_insn (gen_probe_stack_range (reg1, reg1, reg2));
+
+
+      /* Step 4: probe at FIRST + SIZE if we cannot assert at compile-time
+         that SIZE is equal to ROUNDED_SIZE.  */
+
+      if (size != rounded_size)
+        {
+          HOST_WIDE_INT rem = size - rounded_size;
+
+          if (rem > 256)
+            {
+              const HOST_WIDE_INT base = ROUND_UP (rem, 4096);
+              emit_set_insn (reg2, plus_constant (Pmode, reg2, -base));
+              emit_stack_probe (plus_constant (Pmode, reg2, base - rem));
+            }
+          else
+            emit_stack_probe (plus_constant (Pmode, reg2, -rem));
+        }
+    }
+
+  /* Make sure nothing is scheduled before we are done.  */
+  emit_insn (gen_blockage ());
+}
+
+/* Probe a range of stack addresses from REG1 to REG2 inclusive.  These are
+   absolute addresses.  */
+
+const char *
+aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
+{
+  static int labelno = 0;
+  char loop_lab[32];
+  rtx xops[2];
+
+  ASM_GENERATE_INTERNAL_LABEL (loop_lab, "LPSRL", labelno++);
+
+  /* Loop.  */
+  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_lab);
+
+  /* TEST_ADDR = TEST_ADDR + PROBE_INTERVAL.  */
+  xops[0] = reg1;
+  xops[1] = GEN_INT (PROBE_INTERVAL);
+  output_asm_insn ("sub\t%0, %0, %1", xops);
+
+  /* Probe at TEST_ADDR.  */
+  output_asm_insn ("str\txzr, [%0]", xops);
+
+  /* Test if TEST_ADDR == LAST_ADDR.  */
+  xops[1] = reg2;
+  output_asm_insn ("cmp\t%0, %1", xops);
+
+  /* Branch.  */
+  fputs ("\tb.ne\t", asm_out_file);
+  assemble_name_raw (asm_out_file, loop_lab);
+  fputc ('\n', asm_out_file);
+
+  return "";
+}
+
 static bool
 aarch64_frame_pointer_required (void)
 {
@@ -2551,6 +2715,18 @@ aarch64_expand_prologue (void)
   if (flag_stack_usage_info)
     current_function_static_stack_size = frame_size;
 
+  if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
+    {
+      if (crtl->is_leaf && !cfun->calls_alloca)
+        {
+          if (frame_size > PROBE_INTERVAL && frame_size > STACK_CHECK_PROTECT)
+            aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT,
+                                            frame_size - STACK_CHECK_PROTECT);
+        }
+      else if (frame_size > 0)
+        aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size);
+    }
+
   /* Store pairs and load pairs have a range only -512 to 504.  */
   if (offset >= 512)
     {
Index: config/aarch64/aarch64.md
===================================================================
--- config/aarch64/aarch64.md (revision 230397)
+++ config/aarch64/aarch64.md (working copy)
@@ -104,6 +104,7 @@ (define_c_enum "unspec" [
     UNSPEC_MB
     UNSPEC_NOP
     UNSPEC_PRLG_STK
+    UNSPEC_PROBE_STACK_RANGE
     UNSPEC_RBIT
     UNSPEC_SISD_NEG
     UNSPEC_SISD_SSHL
@@ -137,6 +138,7 @@ (define_c_enum "unspecv" [
     UNSPECV_SET_FPCR            ; Represent assign of FPCR content.
     UNSPECV_GET_FPSR            ; Represent fetch of FPSR content.
     UNSPECV_SET_FPSR            ; Represent assign of FPSR content.
+    UNSPECV_BLOCKAGE            ; Represent a blockage
   ]
 )
 
@@ -4851,6 +4853,29 @@ (define_insn "stack_tie"
   [(set_attr "length" "0")]
 )
 
+;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
+;; all of memory.  This blocks insns from being moved across this point.
+
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")
+   (set_attr "type" "block")]
+)
+
+(define_insn "probe_stack_range"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
+                             (match_operand:DI 2 "register_operand" "r")]
+                              UNSPEC_PROBE_STACK_RANGE))]
+  ""
+{
+  return aarch64_output_probe_stack_range (operands[0], operands[2]);
+}
+  [(set_attr "length" "32")]
+)
+
 ;; Named pattern for expanding thread pointer reference.
 (define_expand "get_thread_pointerdi"
   [(match_operand:DI 0 "register_operand" "=r")]

--nextPart3884145.50uUjTlhAn--
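
For readers following the offset arithmetic in aarch64_emit_probe_stack_range,
here is a rough standalone sketch of the three strategies (single probe,
unrolled sequence, run-time loop).  It is only a model under stated
assumptions: model_probe_offsets and MODEL_ROUND_UP are hypothetical names
that do not exist in GCC, the probe interval is passed as a parameter instead
of coming from STACK_CHECK_PROBE_INTERVAL_EXP, and the function records
distances below the stack pointer instead of emitting RTL.

```c
#include <assert.h>
#include <stddef.h>

/* Round X up to a multiple of A, where A is a power of two.  */
#define MODEL_ROUND_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical model, not GCC code: store in OUT the distances below SP
   that would be probed for the range FIRST to FIRST+SIZE, mirroring the
   three cases of the patch.  Returns the number of probes.  REG1 and REG2
   hold offsets relative to SP (negative means below SP).  */
size_t
model_probe_offsets (long first, long size, long interval,
                     long *out, size_t cap)
{
  size_t n = 0;
  long reg1, reg2, rem, i;

  /* Same constraints as the patch, plus a power-of-two interval so that
     the mask in the loop case is valid.  */
  assert (first % 4096 == 0 && interval % 4096 == 0 && size > 0);
  assert ((interval & (interval - 1)) == 0);

  if (size <= interval)
    {
      /* Move by a multiple of 4096, then probe with a small positive
         displacement back up to FIRST + SIZE.  */
      long base = MODEL_ROUND_UP (size, 4096L);
      reg1 = -(first + base);
      assert (n < cap);
      out[n++] = -(reg1 + (base - size));
    }
  else if (size <= 4 * interval)
    {
      /* Unrolled case: one probe per interval, then the remainder.  */
      reg1 = -(first + interval);
      assert (n < cap);
      out[n++] = -reg1;
      for (i = 2 * interval; i < size; i += interval)
        {
          reg1 -= interval;
          assert (n < cap);
          out[n++] = -reg1;
        }
      rem = size - (i - interval);
      assert (n < cap);
      out[n++] = -(reg1 - rem);
    }
  else
    {
      /* Loop case: equality test against the rounded end address.  */
      long rounded_size = size & -interval;
      reg1 = -first;
      reg2 = -(first + rounded_size);
      do
        {
          reg1 -= interval;
          assert (n < cap);
          out[n++] = -reg1;
        }
      while (reg1 != reg2);
      if (size != rounded_size)
        {
          assert (n < cap);
          out[n++] = -(reg2 - (size - rounded_size));
        }
    }
  return n;
}
```

With first = 12288, size = 10000 and interval = 4096 the model reports probes
at 16384, 20480 and 22288 bytes below SP.  The last probe always lands on
FIRST + SIZE, and every register adjustment is a multiple of 4096, which is
what the compile-time check on PROBE_INTERVAL in the patch relies on.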