From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [google] remove redundant push {lr} for -mthumb (issue4441050)
From: Richard Earnshaw
To: Carrot Wei
Cc: reply@codereview.appspotmail.com, dougkwan@google.com, gcc-patches@gcc.gnu.org
References: <20110419094104.B5D2A20562@guozhiwei.sha.corp.google.com> <1303217757.17819.66.camel@e102346-lin.cambridge.arm.com>
Date: Wed, 20 Apr 2011 09:27:00 -0000
Message-Id: <1303289306.2645.10.camel@e102346-lin.cambridge.arm.com>
X-SW-Source: 2011-04/txt/msg01622.txt.bz2

On Wed, 2011-04-20 at 16:26 +0800, Carrot Wei wrote:
> On Tue, Apr 19, 2011 at 8:55 PM, Richard Earnshaw wrote:
> >
> > On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
> >> Reload pass tries to determine the stack frame, so it needs to check the
> >> push/pop lr optimization opportunity. One of the criteria is if there is any
> >> far jump inside the function.
> >> Unfortunately at this time gcc can't decide each
> >> instruction's length and basic block layout, so it can't know the offset of
> >> a jump. To be conservative it assumes every jump is a far jump. So any jump
> >> in a function will prevent this push/pop lr optimization.
> >>
> >> To enable the push/pop lr optimization in reload pass, I compute the possible
> >> maximum length of the function body. If the length is not large enough, far
> >> jump is not necessary, so we can safely do push/pop lr optimization.
> >>
> >> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
> >>
> >> This patch is for google/main.
> >>
> >> 2011-04-19 Guozhi Wei
> >>
> >> Google ref 40255.
> >> * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
> >> (estimate_function_length): New function.
> >> (thumb_far_jump_used_p): No far jump is needed in short function.
> >>
> >
> > Setting aside for the moment Richi's issue with hot/cold sections, this
> > isn't safe. Firstly get_attr_length() doesn't return the worst case
> > length; and secondly, it doesn't take into account the size of reload
> > insns that are still on the reloads stack -- these are only emitted
> > right at the end of the reload pass. Both of these would need to be
> > addressed before this can be safely done.
> >
> > It's worth noting here that in the dim and distant past we used to try
> > to estimate the size of the function and eliminate redundant saves of
> > R14, but the code had to be removed because it was too fragile; but it
> > looks like some vestiges of the code are still in the compiler.
> >
> > A slightly less optimistic approach, but one that is much safer is to
> > scan the function after reload has completed and see if we can avoid
> > having to push LR. We can do this if:
> >
> I guess "less optimistic" is relative to the ideal optimization
> situation, I believe it is still much better than current result.
> Do you think if arm_reorg() is appropriate place to do this?
>

Making the decision in a single pass would certainly be the best
approach; and arm_reorg is certainly going to come after all other major
code re-arrangements. Indeed, you should probably do this after the
minipool placement so that you can be sure that these don't bulk up the
body of the function too much.

As you are doing the elimination late on in the compilation you can do a
better job of estimation by calling shorten_branches() to work out the
precise length of each insn. Then you can simply scan over the insns to
work out if there is a branch that still needs r14.

R.
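
The scan suggested above can be illustrated with a toy model. This is only
a sketch under stated assumptions, not GCC code: struct toy_insn,
toy_address() and toy_far_jump_needed() are hypothetical names, and a plain
array stands in for the insn chain. It assumes every insn length is already
final (as it would be after shorten_branches()), that a far jump is what
forces LR to be saved, and that only the Thumb-1 16-bit unconditional B
(reach -2048..+2046 bytes from the branch address + 4) is of interest.
Conditional branches, which have a shorter reach, are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one insn whose length is already final. */
struct toy_insn {
  int length;         /* size in bytes */
  int branch_target;  /* index of the target insn, or -1 if not a branch */
};

/* Reach of a Thumb-1 unconditional B, measured from branch address + 4. */
enum { THUMB1_B_MIN = -2048, THUMB1_B_MAX = 2046 };

/* Byte offset of insn INDEX from the start of the function. */
long toy_address(const struct toy_insn *insns, size_t index)
{
  long addr = 0;
  for (size_t i = 0; i < index; i++)
    addr += insns[i].length;
  return addr;
}

/* Return true if some branch cannot reach its target with a short B,
   i.e. a far jump is still needed somewhere in the function. */
bool toy_far_jump_needed(const struct toy_insn *insns, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    if (insns[i].branch_target < 0)
      continue;  /* not a branch */
    size_t target = (size_t) insns[i].branch_target;
    assert(target < n);
    long offset = toy_address(insns, target) - (toy_address(insns, i) + 4);
    if (offset < THUMB1_B_MIN || offset > THUMB1_B_MAX)
      return true;
  }
  return false;
}
```

If toy_far_jump_needed() returns false, every branch in the model reaches
its target with a short B, so no jump in the function would need r14. The
quadratic address recomputation keeps the sketch short; a real pass would
use the addresses already recorded for each insn.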