To: "Daniel Berlin"
Cc: "Diego Novillo", "Mark Mitchell", "Robert Dewar",
    "Ian Lance Taylor", "Richard Guenther",
    gcc-patches@gcc.gnu.org, gcc@gcc.gnu.org
Subject: Re: Designs for better debug info in GCC
References: <4aca3dc20712161712w1133fb96qd66be0e9a0bb1716@mail.gmail.com>
    <4766B8E5.60500@google.com>
    <4766DF5C.1020802@google.com>
    <47671BF4.5050704@google.com>
    <4aca3dc20712181519rb637c16oea78bbcc18abe097@mail.gmail.com>
From: Alexandre Oliva
Date: Wed, 19 Dec 2007 04:35:00 -0000
In-Reply-To: <4aca3dc20712181519rb637c16oea78bbcc18abe097@mail.gmail.com>
    (Daniel Berlin's message of "Tue, 18 Dec 2007 18:19:31 -0500")

On Dec 18, 2007, "Daniel Berlin" wrote:

>> int c = z;
>> whatever0(c);
>> c = x;

> Because you have added information you have no way of knowing.
> How exactly did you compute that the call *definitely sets c to the
> value of z_0*, and definitely sets the value of c to x_2.

Err... I guess you're thinking memory, global variables, alias
analysis and that sort of stuff. None of this applies to gimple
registers, which is all the annotations are about.

Yes, aliasing, memory references and must- and may-alias do play a
role at the time of turning the annotations into equivalence classes,
once those classes contain memory locations other than the stack
slots allocated to gimple regs that couldn't get hardware registers.
These don't seem too hard to handle conservatively (removing even
may-alias assignment destinations from equivalence classes, as well as
non-local memory references at function calls and volatile asms), at
the expense of incompleteness in debug information, or in a more lax
way, at the potential expense of correctness. I still don't know
exactly where to draw the line here; this note-propagation algorithm
is one that I haven't completely figured out yet.

> However, value equivalence does not imply location equivalence, and all
> of our debug formats deal with locations of variables, except for
> constants.

DWARF enables arbitrary value expressions too. There's some
discussion about lvalue vs rvalue in the document, and this is also
something that will take some experimenting. I'm not entirely sure
where to draw the line, and I'm not entirely sure there is a perfect
answer.
For example, consider that a variable's home is a stack slot, but for
a loop in which it's not modified, it's held in a register. Clearly
in this case the correct representation is for the variable to be in
both locations, both as lvalues. But if the variable is further
copied to other variables or locations, these additional locations
probably shouldn't be regarded as the same variable any more; at most,
as rvalues, but maybe not even that. And then, if for some particular
instruction, the variable in the register needs to be copied to a
different register class, then it is correct to state that, between
the copy and the use, the variable is held in all three locations.

I'm still trying to figure out how to deal with overlaps between
variables, deciding whether locations are to be handled as lvalues or
rvalues, this sort of stuff. It is indeed a difficult problem.

> IE If you translate this directly into DWARF3, as written, you will
> claim that c and x_4 have the same location (since dwarf does not let
> you say "it has the same value as x, but not the same location"),

Yeah. The $1M question is, when two variables are coalesced into one,
does this mean we now have two variables sharing the same location, or
do we just use the rvalue of one (which?) for the other?

Isn't this like talking about body and spirit of variables? After
optimization, I'm not even sure that talking about location (body) of
variables makes much sense.

An important part of the design process was to distinguish between
source-level variables and implementation-level variables. Our naming
of stack slots or pseudos as variables is just a mnemonic artifact for
us compiler engineers, to simplify debugging. Which variables they
actually represent depends a lot on optimization decisions, perhaps
even more than on the original code.

So I talk about binding a source-level variable to a value, rather
than to a location.
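
The stack-slot-plus-register scenario above, together with the
conservative treatment of function calls mentioned earlier, can be
sketched roughly as follows. This is only an illustrative model in
Python, not GCC code: `ValueClass`, `Tracker`, the location names
(`sp+8`, `r3`, `f1`, `mem:g`) and the `mem:` prefix convention are all
hypothetical, and the real note-propagation algorithm is, as said,
not settled.

```python
from dataclasses import dataclass, field


@dataclass
class ValueClass:
    """Hypothetical equivalence class: all locations holding one value."""
    locations: set = field(default_factory=set)


class Tracker:
    """Toy model: source variables bind to values, values live in locations."""

    def __init__(self):
        self.binding = {}  # source variable name -> ValueClass
        self.holder = {}   # location name -> ValueClass it currently holds

    def bind(self, var, loc):
        """Bind a source-level variable to the value currently held in loc."""
        self.binding[var] = self.holder.setdefault(loc, ValueClass({loc}))

    def clobber(self, loc):
        """loc is overwritten with an unrelated value: drop it from its class."""
        cls = self.holder.pop(loc, None)
        if cls is not None:
            cls.locations.discard(loc)

    def copy(self, src, dst):
        """dst = src: dst leaves its old class and joins the class of src."""
        if src == dst:
            return
        self.clobber(dst)
        cls = self.holder.setdefault(src, ValueClass({src}))
        cls.locations.add(dst)
        self.holder[dst] = cls

    def call(self, is_nonlocal_memory):
        """Conservative assumption: a call may change any non-local memory."""
        for loc in [l for l in self.holder if is_nonlocal_memory(l)]:
            self.clobber(loc)

    def locations_of(self, var):
        """Every location that currently holds the value bound to var."""
        return set(self.binding[var].locations)


t = Tracker()
t.bind("c", "sp+8")    # the variable's home is a stack slot
t.copy("sp+8", "r3")   # held in a register for the loop
t.copy("r3", "f1")     # copied to a different register class for one insn
print(sorted(t.locations_of("c")))  # ['f1', 'r3', 'sp+8']
```

Between the copy and the use, all three locations hold the value;
calling `t.clobber("f1")` afterwards shrinks the class back to the
stack slot and `r3`. Whether each surviving location should be
reported as an lvalue or only as an rvalue is exactly the part this
sketch leaves out.
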
Then, we figure out the locations that hold the value, what other
variables do, how they overlap, maybe how they're used, and then
figure out which locations should be assigned to each source variable.
Tricky.

The only certainty I have right now is that the annotations I've
proposed enable us to keep track of values. Distributing locations in
equivalence classes to different user variables is an open problem,
and there are various possible solutions that could make sense, and
that would be arguably correct.

> if all you want is the values you compute above, on SSA, you can
> easily use a lattice to compute the same values you are going to
> compute as you update the annotations on the fly.

This sounds interesting, but I don't quite follow what you mean. Can
you elaborate, maybe give some examples?

> Tracking which values *definitely represent user values* is actually
> quite easy at the tree level, and doesn't require any IR modification.

But is the binding of user variables to user values for specified
ranges part of this representation too? I don't see that it is, and
this is the gap I'm trying to fill with the debug annotations.

> It may be worth doing at the RTL level, however, where the solution
> requires making up program points at each definition site and
> computing the dataflow problem in terms of them.

/me mumbles something about RTL-SSA, that Jeff Law started working on
before we took this turn into Tree-SSA.

I'm sort of having to introduce some limited form of SSA in RTL to
infer global equivalence classes out of the annotations, in the RTL
var-tracking pass. Fun...

If only we had stuck to a single IR... (No personal preference, I
like both, but I'd rather not have to duplicate work so as to deal
with both)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}