From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 129217 invoked by alias); 26 Jul 2016 15:53:30 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 129207 invoked by uid 89); 26 Jul 2016 15:53:30 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=resulted, H*m:linux, vogt, dominik X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Tue, 26 Jul 2016 15:53:28 +0000 Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.11/8.16.0.11) with SMTP id u6QFiIB5003245 for ; Tue, 26 Jul 2016 11:53:26 -0400 Received: from e06smtp14.uk.ibm.com (e06smtp14.uk.ibm.com [195.75.94.110]) by mx0b-001b2d01.pphosted.com with ESMTP id 24e5wbyfgs-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 26 Jul 2016 11:53:25 -0400 Received: from localhost by e06smtp14.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 26 Jul 2016 16:53:24 +0100 Received: from d06dlp02.portsmouth.uk.ibm.com (9.149.20.14) by e06smtp14.uk.ibm.com (192.168.101.144) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 26 Jul 2016 16:53:22 +0100 X-IBM-Helo: d06dlp02.portsmouth.uk.ibm.com X-IBM-MailFrom: vogt@linux.vnet.ibm.com X-IBM-RcptTo: gcc-patches@gcc.gnu.org;law@redhat.com Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by d06dlp02.portsmouth.uk.ibm.com (Postfix) with ESMTP id 1352B219005F; Tue, 26 Jul 2016 16:52:49 +0100 (BST) Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id u6QFrKeo19070978; Tue, 26 Jul 2016 15:53:22 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id AF27D52075; Tue, 26 Jul 2016 15:49:07 +0100 (BST) Received: from oc5510024614.ibm.com (unknown [9.145.21.190]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 4D02252051; Tue, 26 Jul 2016 15:49:07 +0100 (BST) Received: by oc5510024614.ibm.com (Postfix, from userid 500) id E132317DC9; Tue, 26 Jul 2016 17:53:18 +0200 (CEST) Date: Tue, 26 Jul 2016 15:53:00 -0000 From: Dominik Vogt To: Jeff Law , gcc-patches@gcc.gnu.org, Andreas Krebbel Subject: Re: [PATCH 2/2][v4] Drop excess size used for run time allocated stack variables. Reply-To: vogt@linux.vnet.ibm.com Mail-Followup-To: vogt@linux.vnet.ibm.com, Jeff Law , gcc-patches@gcc.gnu.org, Andreas Krebbel References: <20160429221242.GA2205@linux.vnet.ibm.com> <20160503141753.GA17351@linux.vnet.ibm.com> <20160525133054.GA6938@linux.vnet.ibm.com> <20160525133241.GB6938@linux.vnet.ibm.com> <0d821ecb-89bb-06e4-fbce-501e59d418f8@redhat.com> <20160623095741.GA9623@linux.vnet.ibm.com> <59557a2d-0a6c-46d5-8e0a-02b7845c8734@redhat.com> <20160722095522.GA24126@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="UugvWAfsgieZRqgk" Content-Disposition: inline In-Reply-To: <20160722095522.GA24126@linux.vnet.ibm.com> User-Agent: Mutt/1.5.20 (2009-12-10) X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16072615-0016-0000-0000-00000214095A X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16072615-0017-0000-0000-0000226A29D9 Message-Id: <20160726155317.GA5718@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-07-26_10:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1607260170 X-SW-Source: 2016-07/txt/msg01730.txt.bz2 --UugvWAfsgieZRqgk Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-length: 2611 Finally a patch that works and is simple. Bootstrapped and regression tested on s390, s390x biarch and x86_64. The new patch exploits the known alignment of (stack pointer + STACK_DYNAMIC_OFFSET) as described earlier (see below). I think that is the right way to get rid of the extra allocation. It took a long time to understand the problem. As the patch triggers a bug in the fortran compiler, the der_type.f90 test case may fail on some targets if this patch is used without the fortran fix that I've posted in another thread. (The patch also contains a fix for a typo in a comment in the patched function.) See ChangeLog for a full description of the new patch. Since the patch is all new, we're not going to commit it without a new OK. > Actually I was goind to abandon the patch in its current state. > :-) We talked about it internally and concluded that the problem > is really this: > > * get_dynamic_stack_size is passed a SIZE of a data block (which > is allocated elsewhere), the SIZE_ALIGN of the SIZE (i.e. the > alignment of the underlying memory units (e.g. 32 bytes split > into 4 times 8 bytes = 64 bit alignment) and the > REQUIRED_ALIGN of the data portion of the allocated memory. > * Assuming the function is called with SIZE = 2, SIZE_ALIGN = 8 > and REQUIRED_ALIGN = 64 it first adds 7 bytes to SIZE -> 9. > This is what is needed to have two bytes 8-byte-aligned at some > memory location without any known alignment. > * Finally round_push is called to round up SIZE to a multiple of > the stack slot size. > > The key to understanding this is that the function assumes that > STACK_DYNMAIC_OFFSET is completely unknown at the time its called > and therefore it does not make assumptions about the alignment of > STACKPOINTER + STACK_DYNMAIC_OFFSET. The latest patch simply > hard-codes that SP + SDO is supposed to be aligned to at least > stack slot size (and does that in a very complicated way). Since > there is no guarantee that this is the case on all targets, the > patch is broken. It may miscalculate a SIZE that is too small in > some cases. > > However, on many targets there is some guarantee about the > alignment of SP + SDO even if the actual value of SDO is unknown. > On s390x it's always 8-byte-aligned (stack slot size). So the > right fix should be to add knowledge about the target's guaranteed > alignment of SP + SDO to the function. I'm right now testing a > much simpler patch that uses > REGNO_POINTER_ALIGN(VIRTUAL_STACK_DYNAMIC_REGNUM) as the > alignment. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany --UugvWAfsgieZRqgk Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=0001-v4-ChangeLog Content-length: 193 gcc/ChangeLog * explow.c (get_dynamic_stack_size): Take known alignment of stack pointer + STACK_DYNAMIC_OFFSET into account when calculating the size needed. Correct a typo in a comment. --UugvWAfsgieZRqgk Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0001-v4-Reduce-size-allocated-for-run-time-allocated-stack-v.patch" Content-length: 2399 >From 7d7a6b7fdb189759eb11b96c93ee4adfa8608a97 Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Fri, 29 Apr 2016 08:36:59 +0100 Subject: [PATCH] Reduce size allocated for run time allocated stack variables. The present calculation sometimes led to more stack memory being used than necessary with alloca. First, (STACK_BOUNDARY -1) would be added to the allocated size: size = plus_constant (Pmode, size, extra); size = force_operand (size, NULL_RTX); Then round_push was called and added another (STACK_BOUNDARY - 1) before rounding down to a multiple of STACK_BOUNDARY. On s390x this resulted in adding 14 before rounding down for "x" in the test case pr36728-1.c. The problem was that get_dynamic_stack_size did not take into account that the target might guarantee some alignment of (stack_pointer + STACK_DYNAMIC_OFFSET) even if the value of STACK_DYNAMIC_OFFSET is not known yet. --- gcc/explow.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/gcc/explow.c b/gcc/explow.c index a345690..f97a214 100644 --- a/gcc/explow.c +++ b/gcc/explow.c @@ -1224,9 +1224,15 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align, example), so we must preventively align the value. We leave space in SIZE for the hole that might result from the alignment operation. */ - extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT; - size = plus_constant (Pmode, size, extra); - size = force_operand (size, NULL_RTX); + unsigned known_align = REGNO_POINTER_ALIGN (VIRTUAL_STACK_DYNAMIC_REGNUM); + if (known_align == 0) + known_align = BITS_PER_UNIT; + if (required_align > known_align) + { + extra = (required_align - known_align) / BITS_PER_UNIT; + size = plus_constant (Pmode, size, extra); + size = force_operand (size, NULL_RTX); + } if (flag_stack_usage_info && pstack_usage_size) *pstack_usage_size += extra; @@ -1235,7 +1241,7 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align, size_align = BITS_PER_UNIT; /* Round the size to a multiple of the required stack alignment. - Since the stack if presumed to be rounded before this allocation, + Since the stack is presumed to be rounded before this allocation, this will maintain the required alignment. If the stack grows downward, we could save an insn by subtracting -- 2.3.0 --UugvWAfsgieZRqgk--