From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 18003 invoked by alias); 14 Nov 2007 11:17:45 -0000 Received: (qmail 17993 invoked by uid 22791); 14 Nov 2007 11:17:44 -0000 X-Spam-Check-By: sourceware.org Received: from py-out-1112.google.com (HELO py-out-1112.google.com) (64.233.166.179) by sourceware.org (qpsmtpd/0.31) with ESMTP; Wed, 14 Nov 2007 11:17:40 +0000 Received: by py-out-1112.google.com with SMTP id a29so3070866pyi for ; Wed, 14 Nov 2007 03:17:38 -0800 (PST) Received: by 10.65.54.9 with SMTP id g9mr18923167qbk.1195039058556; Wed, 14 Nov 2007 03:17:38 -0800 (PST) Received: by 10.65.232.20 with HTTP; Wed, 14 Nov 2007 03:17:38 -0800 (PST) Message-ID: <84fc9c000711140317o5abefc88n1c01e562d46baed0@mail.gmail.com> Date: Wed, 14 Nov 2007 12:41:00 -0000 From: "Richard Guenther" To: "Diego Novillo" Subject: Re: Fix PR 33870 Cc: gcc-patches@gcc.gnu.org In-Reply-To: <84fc9c000711140255m22880809w7611ad878918bf96@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <84fc9c000711080245n4f52e669gbcecaa3485f55161@mail.gmail.com> <4739C673.5080905@google.com> <84fc9c000711130808s5ecc11fcp1df00f20f379b76c@mail.gmail.com> <84fc9c000711140255m22880809w7611ad878918bf96@mail.gmail.com> X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2007-11/txt/msg00785.txt.bz2 On Nov 14, 2007 11:55 AM, Richard Guenther wrote: > > On Nov 13, 2007 5:08 PM, Richard Guenther wrote: > > > > On Nov 13, 2007 4:44 PM, Diego Novillo wrote: > > > Richard Guenther wrote: > > > > > > > taking the address of result.x.pDirty causes the SFT to be marked as > > > > in substructure and the problem re-surfaces. > > > > > > Yes. Thanks for the test case. The original patch did not take into > > > account that the nesting level of the memory reference is also important. > > > > > > There are two problems here: > > > > > > 1- Given an arbitrary memory expression M dereferencing a pointer P into > > > a nested structure, the offset computed for M is the offset starting at > > > P. However, the offsets computed for the original SFTs are offsets from > > > the top of the pointed-to memory object O. So, when we recognize this > > > situation, we must adjust the offsets. This was the situation > > > recognized and fixed by the original patch. > > > > > > The bug was in the detection of whether P is "pointing to the middle of > > > an object". This is where the nesting level comes in to play. Suppose > > > that M was the expression p_3->b[3] and the original pointed-to object > > > was v.x.y: > > > > > > struct { > > > ... > > > struct { > > > ... > > > struct { > > > ... > > > int b[4]; > > > } y; > > > } x; > > > } v; > > > > > > p_3 = &v.x.y; > > > > > > If alias analysis created SFTs for this structure, the array 'b' will > > > have 4 SFTs associated with it. Say SFT.12, SFT.13, SFT.14 and SFT.15. > > > However, only SFT.12 will be added to the alias set of p_3's name tag. > > > > > > So, when we see the expression 'p_3->b[3]', we compute an offset of 96, > > > but we need SFT.14 which will probably have an offset much higher than > > > that. So, we adjust 96 by the offset of SFT.12 and get to SFT.14. > > > > > > In the original patch, we recognized this situation by asking whether > > > the SFT.12 was inside a nested structure. However, it may happen that > > > the pointer was actually pointing to the base of the original object > > > 'v'. In which case, we would have the expression 'p_3->x.y.b[3]', which > > > also takes us to SFT.12. > > > > > > But in this case, the offset we computed from 'p_3->x.y.b[3]' is > > > *already* the right offset, so adding the offset of SFT.12 to it gives > > > us the wrong offset. So, what this patch does is determine the nesting > > > level of each SFT and the nesting level of the memory expression. We > > > only need to adjust the offset of an SFT if the nesting level of the > > > memory expression is lower than the nesting level of the SFT that we are > > > analyzing. > > > > > > > > > 2- The second problem we have with this scheme is that we rely on the > > > presence of key SFTs to adjust offsets when we detect references to > > > nested aggregates. This is something we cannot easily undo because > > > inside the operand scanner we cannot really start doing use-def chain > > > chasing to determine the original pointed-to objects. This work has > > > already been done by alias analysis, so duplicating this in the operand > > > scanner would be wasteful. > > > > > > This means that this scheme cannot tolerate these key SFTs to be > > > partitioned away. If they are partitioned, then inside the operand > > > scanner is essentially impossible to determine where in the structure is > > > an arbitrary memory expression referring to. This is shown in the new > > > test gcc.dg/tree-ssa/alias-16.c. > > > > > > One way to deal with this would be to use the pointed-to set for the > > > base pointer instead of the alias set of the name tag. This has the > > > problem that we'd miss the compile-time benefits of partitioning, as we > > > would be traversing the original sets instead of the partitioned > > > (smaller) alias sets. Also, we may want to get rid of the duplicate > > > pointed-to/alias sets in the future (not sure if they're both worth > > > keeping). > > > > > > The other way of dealing with this is to recognize which SFTs are > > > important to not partition and make sure they are never partitioned. > > > During alias analysis, if a nested SFT is added to the alias set of a > > > pointer, then it is marked as unpartitionable. The partitioner, in > > > turn, gives these SFTs a very high score and makes sure that they are > > > never added to a partition. The effects of this are/should be > > > negligible as it only involves a single SFT and nested structures. > > > > > > The new test alias-16.c tests for these edge cases. > > > > > > Tested on x86_64 and ppc64. Committed to trunk. > > > > Thanks. Now, you should be able to do all this without introducing > > all the nesting-level stuff if you make all pointed-to SFTs unpartitionable. > > > > In fact, > > > > > - if (SFT_IN_NESTED_STRUCT (var)) > > > + if (full_ref > > > + && SFT_NESTING_LEVEL (var) > 0 > > > + && ref_nesting_level (full_ref) < SFT_NESTING_LEVEL (var)) > > > { > > > /* Since VAR is an SFT inside a nested structure, the OFFSET > > > computed by get_ref_base_and_extent is the offset from the > > > - start of the immediately containing structure. However, to > > > - find out what other SFTs are affected by this reference, we > > > - need to know the offsets starting at the root structure in > > > - the nesting hierarchy. > > > + start of the immediately containing structure. If VAR is an > > > + SFT inside a nested structure, then FULL_REF may be a > > > + reference to the structure immediately enclosing SFT, and so > > > + OFFSET will be the offset from the start of the immediately > > > + enclosing structure. > > > + > > > + However, to find out what other SFTs are affected by this > > > + reference, we need to know the offsets starting at the root > > > + structure in the nesting hierarchy. > > > > > > For instance, given the following structure: > > > > This looks like a pure workaround for that you do not retain the offset > > zero SFT as unpartitionable. Otherwise you'd enter this function with > > it (if it is pointed-to), and you can always add the pointed-to SFT offset. > > > > I believe your patch makes it more twisted than it needs to be (it doesn't > > even look like an optimization, as you are scanning for pointed-to offset > > zero multiple times, for all pointed-to SFTs with a nesting level > > > than the ref). Oh, and as you basically go back towards my first fix that was reverted this patch causes compile-time regressions and we hit internal compiler error: in ssa_operand_alloc, at tree-ssa-operands.c:484 more often as well. (I believe the latter is because the max-aliased-vops limit is a per-function one while a per-stmt one would be more useful, though I see that it is difficult to estimate per-stmt VOP reduction during partitioning. But as we do the limit per-function we are not efficient in preventing this ICE) Richard.