From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 20079 invoked by alias); 16 Oct 2008 18:52:19 -0000 Received: (qmail 20067 invoked by uid 22791); 16 Oct 2008 18:52:18 -0000 X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (66.187.233.31) by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 16 Oct 2008 18:51:42 +0000 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id m9GIldWX008408; Thu, 16 Oct 2008 14:47:39 -0400 Received: from omfg.slc.redhat.com (vpn-12-169.rdu.redhat.com [10.11.12.169]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id m9GIlSvn025473; Thu, 16 Oct 2008 14:47:31 -0400 Message-ID: <48F78BD3.5080907@redhat.com> Date: Thu, 16 Oct 2008 22:02:00 -0000 From: Jeff Law User-Agent: Thunderbird 2.0.0.16 (X11/20080723) MIME-Version: 1.0 To: luisgpm@linux.vnet.ibm.com CC: Andrew Pinski , sje@cup.hp.com, Richard Henderson , Peter Bergner , gcc-patches@gcc.gnu.org, paolo.carlini@oracle.com Subject: Re: Patch to fix gcc.c-torture/compile/20010102-1.c on IA64 HP-UX References: <200809161651.m8GGpEM19079@lucas.cup.hp.com> <1221596846.17787.28.camel@hpsje.cup.hp.com> <48D0248E.5030505@redhat.com> <1221601143.17787.40.camel@hpsje.cup.hp.com> <48D2B834.7040701@redhat.com> <1221847674.4972.27.camel@hpsje.cup.hp.com> <48D953DC.6080200@redhat.com> <1222202669.19545.1.camel@hpsje.cup.hp.com> <1223062006.612.13.camel@gargoyle> <48E6BB7D.8000809@redhat.com> <1224182072.8846.94.camel@gargoyle> In-Reply-To: <1224182072.8846.94.camel@gargoyle> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2008-10/txt/msg00723.txt.bz2 Luis Machado wrote: > On Fri, 2008-10-03 at 17:46 -0700, Andrew Pinski wrote: > >> On Fri, Oct 3, 2008 at 5:40 PM, Jeff Law wrote: >> >>> I'd be hard pressed to see how this patch could cause any kind of >>> performance regression since it's merely propagating information to >>> >> the >> >>> backend that was getting lost in regrename and I don't see that the >>> >> ppc uses >> >>> REG_POINTER in its backend. >>> >> PPC does not use it but the way lwx is used is [base+index] which >> should really be done that way, otherwise it will have some stalls in >> some cases (on Power 6). >> >> Thanks, >> Andrew Pinski >> > > After further investigation, there is some light to this problem. > > There appears to be a bad interaction between REG_POINTER and RTL alias > analysis, as we see below. > > The epilogue for these two functions from vortex (Mem_GetAddr, > Mem_GetWord) are the following, for both revisions: > > Mem_GetAddr (revision 140615): > > .L3: > mr 3,29 > mr 4,30 > bl Ut_PrintErr > lwz 7,36(1) > lwz 26,8(1) > * lis 8,Test@ha > lwz 27,12(1) > lwz 28,16(1) > mtlr 7 > mr 0,3 > lwz 29,20(1) > lwz 30,24(1) > mr 3,0 > lwz 31,28(1) > * stw 0,Test@l(8) > addi 1,1,32 > blr > > Mem_GetAddr (revision 140616): > > .L3: > mr 3,29 > mr 4,30 > bl Ut_PrintErr > * lis 8,Test@ha > mr 0,3 > * stw 0,Test@l(8) > mr 3,0 > lwz 7,36(1) > lwz 26,8(1) > lwz 27,12(1) > lwz 28,16(1) > mtlr 7 > lwz 29,20(1) > lwz 30,24(1) > lwz 31,28(1) > addi 1,1,32 > blr > > > Mem_GetWord (revisions 140615 and 140616): > > .L8: > lwz 5,0(31) > li 0,1 > cmpwi 6,5,0 > bne- 6,.L9 > lwz 10,36(1) > lwz 26,8(1) > * lis 11,Test@ha > mr 3,0 > lwz 27,12(1) > lwz 28,16(1) > mtlr 10 > * stw 0,Test@l(11) > lwz 29,20(1) > lwz 30,24(1) > lwz 31,28(1) > addi 1,1,32 > blr > > Clearly, the code generation for Mem_GetWord is unaffected by the > changes from revision 140616. > > The problem happens with Mem_GetAddr/revision 140616. We can see that > "lis 8,Test@ha" and "stw 0,Test@l(8)" are very close to each other, > leading to possible stalls in the pipeline, which doesn't happen with > the code generated by revision 140615. > > The positioning of "lis" and "stw" we see in revision 140616 is due to a > number of dependencies created wrt the following lwz instructions, so > the "stw" needs to execute before that, avoiding the possibility to move > "stw" further down. > > So we're left with the question of why these dependencies are being > created. > > Digging further, revision 140616 enabled some registers to store a value > meaning its contents are, in fact, a pointer, and we access this data > with REG_POINTER(...). > > Looking at this specific code inside alias.c:find_base_term: > > if (REG_P (tmp1) && REG_POINTER (tmp1)) > return find_base_term (tmp1); > > if (REG_P (tmp2) && REG_POINTER (tmp2)) > return find_base_term (tmp2); > > Since up until revision 140615 we didn't store the specific information > mentioned above, we just flew by these conditions and returned either > tmp1 or tmp2, not going into the recursive call. > > With revision 140616, we actually have the pointer-in-reg information, > and thus we go into the recursive call to either "find_base_term (tmp1)" > or "find_base_term (tmp2)". > > This should work as expected, as it really does for Mem_GetWord, but for > Mem_GetAddr, the recursive call to "find_base_term (tmp1)" returns NULL. > > This leads us back to "init_alias_analysis", where we actually fill up > the reg_base_value vector that is used, inside find_base_term, to return > the base term we want. > > Keeping it short, the 8th entry in that vector, that should've contained > the correct base term, is blank due to the 8th register being defined > and set before the "stw" instruction in the epilogue. This specific > situation led to all the others, including the performance degradation. > > This also relates to the -fno-strict-aliasing flag, which sets aliases > to 0 and we need to do extra work to sort out the conflicting pieces. > Without -fno-strict-aliasing, the code is generated correctly. > > So, it seems the RTL alias analysis framework in GCC is not designed to > handle REG_POINTER" on hard registers. I would like to suggest that we > return to Peter Bergner's proposed solution. > Then it's the RTL alias analysis that needs to be fixed -- Steve's solution to regrename is the correct solution. Jeff