From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 32392 invoked by alias); 6 Nov 2008 18:00:43 -0000 Received: (qmail 32357 invoked by uid 22791); 6 Nov 2008 18:00:41 -0000 X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (66.187.233.31) by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 06 Nov 2008 18:00:05 +0000 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id mA6Hu2cn007048; Thu, 6 Nov 2008 12:56:02 -0500 Received: from omfg.slc.redhat.com (vpn-13-102.rdu.redhat.com [10.11.13.102]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id mA6Htx2S005268; Thu, 6 Nov 2008 12:56:00 -0500 Message-ID: <49132F0A.7070306@redhat.com> Date: Thu, 06 Nov 2008 18:25:00 -0000 From: Jeff Law User-Agent: Thunderbird 2.0.0.16 (X11/20080723) MIME-Version: 1.0 To: luisgpm@linux.vnet.ibm.com CC: Andrew Pinski , sje@cup.hp.com, Richard Henderson , Peter Bergner , gcc-patches@gcc.gnu.org, paolo.carlini@oracle.com Subject: Re: Patch to fix gcc.c-torture/compile/20010102-1.c on IA64 HP-UX References: <200809161651.m8GGpEM19079@lucas.cup.hp.com> <1221596846.17787.28.camel@hpsje.cup.hp.com> <48D0248E.5030505@redhat.com> <1221601143.17787.40.camel@hpsje.cup.hp.com> <48D2B834.7040701@redhat.com> <1221847674.4972.27.camel@hpsje.cup.hp.com> <48D953DC.6080200@redhat.com> <1222202669.19545.1.camel@hpsje.cup.hp.com> <1223062006.612.13.camel@gargoyle> <48E6BB7D.8000809@redhat.com> <1224182072.8846.94.camel@gargoyle> <48F78BD3.5080907@redhat.com> <1225370489.23514.9.camel@gargoyle> In-Reply-To: <1225370489.23514.9.camel@gargoyle> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2008-11/txt/msg00243.txt.bz2 Luis Machado wrote: > On Thu, 2008-10-16 at 12:45 -0600, Jeff Law wrote: > >> Luis Machado wrote: >> >>> On Fri, 2008-10-03 at 17:46 -0700, Andrew Pinski wrote: >>> >>> >>>> On Fri, Oct 3, 2008 at 5:40 PM, Jeff Law wrote: >>>> >>>> >>>>> I'd be hard pressed to see how this patch could cause any kind of >>>>> performance regression since it's merely propagating information to >>>>> >>>>> >>>> the >>>> >>>> >>>>> backend that was getting lost in regrename and I don't see that the >>>>> >>>>> >>>> ppc uses >>>> >>>> >>>>> REG_POINTER in its backend. >>>>> >>>>> >>>> PPC does not use it but the way lwx is used is [base+index] which >>>> should really be done that way, otherwise it will have some stalls in >>>> some cases (on Power 6). >>>> >>>> Thanks, >>>> Andrew Pinski >>>> >>>> >>> After further investigation, there is some light to this problem. >>> >>> There appears to be a bad interaction between REG_POINTER and RTL alias >>> analysis, as we see below. >>> >>> The epilogue for these two functions from vortex (Mem_GetAddr, >>> Mem_GetWord) are the following, for both revisions: >>> >>> Mem_GetAddr (revision 140615): >>> >>> .L3: >>> mr 3,29 >>> mr 4,30 >>> bl Ut_PrintErr >>> lwz 7,36(1) >>> lwz 26,8(1) >>> * lis 8,Test@ha >>> lwz 27,12(1) >>> lwz 28,16(1) >>> mtlr 7 >>> mr 0,3 >>> lwz 29,20(1) >>> lwz 30,24(1) >>> mr 3,0 >>> lwz 31,28(1) >>> * stw 0,Test@l(8) >>> addi 1,1,32 >>> blr >>> >>> Mem_GetAddr (revision 140616): >>> >>> .L3: >>> mr 3,29 >>> mr 4,30 >>> bl Ut_PrintErr >>> * lis 8,Test@ha >>> mr 0,3 >>> * stw 0,Test@l(8) >>> mr 3,0 >>> lwz 7,36(1) >>> lwz 26,8(1) >>> lwz 27,12(1) >>> lwz 28,16(1) >>> mtlr 7 >>> lwz 29,20(1) >>> lwz 30,24(1) >>> lwz 31,28(1) >>> addi 1,1,32 >>> blr >>> >>> >>> Mem_GetWord (revisions 140615 and 140616): >>> >>> .L8: >>> lwz 5,0(31) >>> li 0,1 >>> cmpwi 6,5,0 >>> bne- 6,.L9 >>> lwz 10,36(1) >>> lwz 26,8(1) >>> * lis 11,Test@ha >>> mr 3,0 >>> lwz 27,12(1) >>> lwz 28,16(1) >>> mtlr 10 >>> * stw 0,Test@l(11) >>> lwz 29,20(1) >>> lwz 30,24(1) >>> lwz 31,28(1) >>> addi 1,1,32 >>> blr >>> >>> Clearly, the code generation for Mem_GetWord is unaffected by the >>> changes from revision 140616. >>> >>> The problem happens with Mem_GetAddr/revision 140616. We can see that >>> "lis 8,Test@ha" and "stw 0,Test@l(8)" are very close to each other, >>> leading to possible stalls in the pipeline, which doesn't happen with >>> the code generated by revision 140615. >>> >>> The positioning of "lis" and "stw" we see in revision 140616 is due to a >>> number of dependencies created wrt the following lwz instructions, so >>> the "stw" needs to execute before that, avoiding the possibility to move >>> "stw" further down. >>> >>> So we're left with the question of why these dependencies are being >>> created. >>> >>> Digging further, revision 140616 enabled some registers to store a value >>> meaning its contents are, in fact, a pointer, and we access this data >>> with REG_POINTER(...). >>> >>> Looking at this specific code inside alias.c:find_base_term: >>> >>> if (REG_P (tmp1) && REG_POINTER (tmp1)) >>> return find_base_term (tmp1); >>> >>> if (REG_P (tmp2) && REG_POINTER (tmp2)) >>> return find_base_term (tmp2); >>> >>> Since up until revision 140615 we didn't store the specific information >>> mentioned above, we just flew by these conditions and returned either >>> tmp1 or tmp2, not going into the recursive call. >>> >>> With revision 140616, we actually have the pointer-in-reg information, >>> and thus we go into the recursive call to either "find_base_term (tmp1)" >>> or "find_base_term (tmp2)". >>> >>> This should work as expected, as it really does for Mem_GetWord, but for >>> Mem_GetAddr, the recursive call to "find_base_term (tmp1)" returns NULL. >>> >>> This leads us back to "init_alias_analysis", where we actually fill up >>> the reg_base_value vector that is used, inside find_base_term, to return >>> the base term we want. >>> >>> Keeping it short, the 8th entry in that vector, that should've contained >>> the correct base term, is blank due to the 8th register being defined >>> and set before the "stw" instruction in the epilogue. This specific >>> situation led to all the others, including the performance degradation. >>> >>> This also relates to the -fno-strict-aliasing flag, which sets aliases >>> to 0 and we need to do extra work to sort out the conflicting pieces. >>> Without -fno-strict-aliasing, the code is generated correctly. >>> >>> So, it seems the RTL alias analysis framework in GCC is not designed to >>> handle REG_POINTER" on hard registers. I would like to suggest that we >>> return to Peter Bergner's proposed solution. >>> >>> >> Then it's the RTL alias analysis that needs to be fixed -- Steve's >> solution to regrename is the correct solution. >> >> Jeff >> > > Considering that the current RTL analysis code was not thought to handle > this scenario, wouldn't a proper fix to the RTL code together with the > REG_POINTER solution be appropriate? > It would actually be helpful to see the RTL in question to help guide suggestions for how to fix the aliasing code. For the code to have the effect you've described, I think the RTL must have had a form like (minus/plus/lo_sum (reg) (symbol_ref)) Where (reg) is a hard register marked as a pointer. If that's the case, then it seems to me you can get the behaviour you want with a fairly simple change. if (REG_P (tmp1) && REG_POINTER (tmp1)) { rtx base = find_base_term (tmp1) if (base) return base; } And similarly for tmp2. Basically this would cause us to fall through those two tests and not return NULL too early in the case you've described. We'd then fall into the code immediately after which will see if the base term is a named object or special address and if so, it'll use the named objected/special address. Can you try that and see if it helps. If it doesn't, then you're going to need to provide more information (RTL) causing the problematical behaviour. jeff