From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 04 Jul 2016 12:20:00 -0000
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
To: Jeff Law, gcc-patches@gcc.gnu.org, Andreas Krebbel, Ulrich Weigand,
	Andreas Arnez
Subject: Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
Reply-To: vogt@linux.vnet.ibm.com
References: <20151125125610.GA19687@linux.vnet.ibm.com>
 <20160506093747.GA22977@linux.vnet.ibm.com>
 <20160506094415.GA23043@linux.vnet.ibm.com>
 <03ecb4c1-3d2f-0ff7-1110-7519a5106d5a@redhat.com>
 <20160623154814.GA23280@linux.vnet.ibm.com>
 <20160624123044.GA12401@linux.vnet.ibm.com>
In-Reply-To: <20160624123044.GA12401@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="IS0zKkzwUGydFO0o"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20160704121955.GA9654@linux.vnet.ibm.com>

--IS0zKkzwUGydFO0o
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-length: 970

Version 4 with the following change:

* Rebased on top of the "Minor cleanup to allocate_dynamic_stack_space"
  patch.

The "Drop excess size used for run time allocated stack variables."
patch needs an update because it touches the same code as the patch in
this message.

Ran the testsuite on s390x biarch, s390 and x86_64.

On Fri, Jun 24, 2016 at 01:30:44PM +0100, Dominik Vogt wrote:
> > The only open question I'm aware of is the
> > stack-usage-2.c test.  I guess foo3() will not generate
> >
> >   stack usage might be ... bytes
> >
> > on any target anymore, and using alloca() with a constant size
> > results in "unbounded".  It's unclear to me whether that message
> > is ever generated, and if so, how to trigger it.

This point is still open.  If nobody has more comments Andreas will
commit the (afaik already approved) patch soon and we can clean up the
test case in a follow up patch.
Ciao

Dominik ^_^  ^_^

--

Dominik Vogt
IBM Germany

--IS0zKkzwUGydFO0o
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=0001-v4-ChangeLog
Content-length: 708

gcc/ChangeLog

	* cfgexpand.c (expand_stack_vars): Implement dynamic stack space
	allocation in the prologue.
	* explow.c (get_dynamic_stack_base): New function to return an
	address expression for the dynamic stack base.
	(get_dynamic_stack_size): New function to do the required dynamic
	stack space size calculations.
	(allocate_dynamic_stack_space): Use new functions.
	(align_dynamic_address): Move some code from
	allocate_dynamic_stack_space to new function.
	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.

gcc/testsuite/ChangeLog

	* gcc.target/s390/warn-dynamicstack-1.c: New test.
	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
	* gcc.dg/stack-layout-dynamic-1.c: New test.

--IS0zKkzwUGydFO0o
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="0001-v4-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch"
Content-length: 16024

>From 83fafd37883e1af3deb2ff13b9fcaefc9d3b7c7e Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the prologue

... and place it in the virtual stack vars area, if the platform supports
it.  On S/390 this saves adjusting the stack pointer twice and forcing the
frame pointer into existence.  It also removes the warning with
-mwarn-dynamicstack that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure
to 16 bytes at run time using inefficient code and issuing a bogus
warning.
---
 gcc/cfgexpand.c                               |  21 +-
 gcc/explow.c                                  | 226 ++++++++++++++-------
 gcc/explow.h                                  |   8 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c          |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c     |  17 ++
 6 files changed, 209 insertions(+), 81 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 56ef71d..f0ef80f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1052,7 +1052,9 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_allocsize = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
       /* If there were any, allocate space.  */
       if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
+	{
+	  large_allocsize = GEN_INT (large_size);
+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	}
     }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	    continue;
+
+	  if (large_allocsize && ! large_allocation_done)
+	    {
+	      /* Allocate space in the virtual stack vars area in the
+		 prologue.  */
+	      HOST_WIDE_INT loffset;
+
+	      loffset = alloc_stack_frame_space
+		(INTVAL (large_allocsize),
+		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
+	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_allocation_done = true;
+	    }
 	  gcc_assert (large_base != NULL);
 
 	  large_alloc += alignb - 1;
diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..d505e98 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1146,82 +1146,55 @@ record_new_stack_level (void)
   update_sjlj_context ();
 }
 
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
-   Any required stack pointer alignment is preserved.
+  return target;
+}
 
-   SIZE is an rtx representing the size of the area.
+/* Return an rtx through *PSIZE, representing the size of an area of memory to
+   be dynamically pushed on the stack.
+
+   *PSIZE is an rtx representing the size of the area.
 
    SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
-   parameter may be zero.  If so, a proper value will be extracted
+   parameter may be zero.  If so, a proper value will be extracted
    from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
    of memory.
-   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
-   stack space allocated by the generated code cannot be added with itself
-   in the course of the execution of the function.  It is always safe to
-   pass FALSE here and the following criterion is sufficient in order to
-   pass TRUE: every path in the CFG that starts at the allocation point and
-   loops to it executes the associated deallocation code.  */
-
-rtx
-allocate_dynamic_stack_space (rtx size, unsigned size_align,
-			      unsigned required_align, bool cannot_accumulate)
+   If PSTACK_USAGE_SIZE is not NULL it points to a value that is increased for
+   the additional size returned.  */
+void
+get_dynamic_stack_size (rtx *psize, unsigned size_align,
+			unsigned required_align,
+			HOST_WIDE_INT *pstack_usage_size)
 {
-  HOST_WIDE_INT stack_usage_size = -1;
-  rtx_code_label *final_label;
-  rtx final_target, target;
-  unsigned extra;
-
-  /* If we're asking for zero bytes, it doesn't matter what we point
-     to since we can't dereference it.  But return a reasonable
-     address anyway.  */
-  if (size == const0_rtx)
-    return virtual_stack_dynamic_rtx;
-
-  /* Otherwise, show we're calling alloca or equivalent.  */
-  cfun->calls_alloca = 1;
-
-  /* If stack usage info is requested, look into the size we are passed.
-     We need to do so this early to avoid the obfuscation that may be
-     introduced later by the various alignment operations.  */
-  if (flag_stack_usage_info)
-    {
-      if (CONST_INT_P (size))
-	stack_usage_size = INTVAL (size);
-      else if (REG_P (size))
-	{
-	  /* Look into the last emitted insn and see if we can deduce
-	     something for the register.  */
-	  rtx_insn *insn;
-	  rtx set, note;
-	  insn = get_last_insn ();
-	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
-	    {
-	      if (CONST_INT_P (SET_SRC (set)))
-		stack_usage_size = INTVAL (SET_SRC (set));
-	      else if ((note = find_reg_equal_equiv_note (insn))
-		       && CONST_INT_P (XEXP (note, 0)))
-		stack_usage_size = INTVAL (XEXP (note, 0));
-	    }
-	}
-
-      /* If the size is not constant, we can't say anything.  */
-      if (stack_usage_size == -1)
-	{
-	  current_function_has_unbounded_dynamic_stack_size = 1;
-	  stack_usage_size = 0;
-	}
-    }
+  unsigned extra = 0;
+  rtx size = *psize;
 
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
   if (CONST_INT_P (size))
     {
       unsigned HOST_WIDE_INT lsb;
@@ -1255,8 +1228,9 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       size = plus_constant (Pmode, size, extra);
       size = force_operand (size, NULL_RTX);
 
-      if (flag_stack_usage_info)
-	stack_usage_size += extra;
+      if (flag_stack_usage_info && pstack_usage_size)
+	*pstack_usage_size += extra;
 
       if (extra && size_align > BITS_PER_UNIT)
 	size_align = BITS_PER_UNIT;
@@ -1278,13 +1252,89 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     {
       size = round_push (size);
 
-      if (flag_stack_usage_info)
+      if (flag_stack_usage_info && pstack_usage_size)
	{
	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	  *pstack_usage_size =
+	    (*pstack_usage_size + align - 1) / align * align;
	}
     }
 
+  *psize = size;
+}
+
+/* Return an rtx representing the address of an area of memory dynamically
+   pushed on the stack.
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.
+
+   SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
+   parameter may be zero.
+   If so, a proper value will be extracted
+   from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code.  */
+
+rtx
+allocate_dynamic_stack_space (rtx size, unsigned size_align,
+			      unsigned required_align, bool cannot_accumulate)
+{
+  HOST_WIDE_INT stack_usage_size = -1;
+  rtx_code_label *final_label;
+  rtx final_target, target;
+
+  /* If we're asking for zero bytes, it doesn't matter what we point
+     to since we can't dereference it.  But return a reasonable
+     address anyway.  */
+  if (size == const0_rtx)
+    return virtual_stack_dynamic_rtx;
+
+  /* Otherwise, show we're calling alloca or equivalent.  */
+  cfun->calls_alloca = 1;
+
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage_info)
+    {
+      if (CONST_INT_P (size))
+	stack_usage_size = INTVAL (size);
+      else if (REG_P (size))
+	{
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx_insn *insn;
+	  rtx set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (CONST_INT_P (SET_SRC (set)))
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && CONST_INT_P (XEXP (note, 0)))
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
+	}
+    }
+
+  get_dynamic_stack_size (&size, size_align, required_align, &stack_usage_size);
+
   target = gen_reg_rtx (Pmode);
 
   /* The size is supposed to be fully adjusted at this point so record it
@@ -1447,19 +1497,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }
 
-  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-     but we know it can't.  So add ourselves and then do
-     TRUNC_DIV_EXPR.  */
-  target = expand_binop (Pmode, add_optab, target,
-			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
-				       Pmode),
-			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			  gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			  NULL_RTX, 1);
-  target = expand_mult (Pmode, target,
-			gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			NULL_RTX, 1);
+  target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1469,6 +1507,38 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..e12f90c 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,14 @@ extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Calculate the necessary size of a constant dynamic stack allocation from the
+   size of the variable area.  */
+extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
+
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
new file mode 100644
index 0000000..6dc17af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
@@ -0,0 +1,14 @@
+/* Verify that run time aligned local variables are allocated in the prologue
+   in one pass together with normal local variables.  */
+/* { dg-do compile } */
+/* { dg-options "-O0 -fomit-frame-pointer" } */
+
+extern void bar (void *, void *, void *);
+void foo (void)
+{
+  int i;
+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
+}
+/* { dg-final { scan-assembler-not "cfi_def_cfa_register" } } */
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index 86e2a65..1a9e7f3 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -16,7 +16,9 @@ int foo2 (void) /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or statically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
diff --git a/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
new file mode 100644
index 0000000..66913f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
@@ -0,0 +1,17 @@
+/* Check that the stack pointer is decreased only once in a function with
+   runtime aligned stack variables and -mwarn-dynamicstack does not generate a
+   warning.  */
+
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -mwarn-dynamicstack" } */
+
+extern int bar (char *pl);
+
+int foo (long size)
+{
+  char __attribute__ ((aligned(16))) l = size;
+
+  return bar (&l);
+}
+
+/* { dg-final { scan-assembler-times "%r15,-" 1 } } */
-- 
2.3.0

--IS0zKkzwUGydFO0o--