From mboxrd@z Thu Jan  1 00:00:00 1970
To: "Daniel Berlin" <dberlin@dberlin.org>
Cc: "Diego Novillo" <dnovillo@google.com>,         "Mark Mitchell" <mark@codesourcery.com>,         "Robert Dewar" <dewar@adacore.com>,         "Ian Lance Taylor" <iant@google.com>,         "Richard Guenther" <richard.guenther@gmail.com>,         gcc-patches@gcc.gnu.org, gcc@gcc.gnu.org
Subject: Re: Designs for better debug info in GCC
References: <or4pg114h5.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4aca3dc20712161712w1133fb96qd66be0e9a0bb1716@mail.gmail.com> 	<ord4t51a6i.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4766B8E5.60500@google.com> 	<or1w9l12x2.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4766DF5C.1020802@google.com> 	<orlk7szv07.fsf@oliva.athome.lsd.ic.unicamp.br> 	<47671BF4.5050704@google.com> 	<orwsrcy5vc.fsf@oliva.athome.lsd.ic.unicamp.br> 	<orir2wxw7n.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4aca3dc20712181519rb637c16oea78bbcc18abe097@mail.gmail.com>
From: Alexandre Oliva <aoliva@redhat.com>
Errors-To: aoliva@oliva.athome.lsd.ic.unicamp.br
Date: Wed, 19 Dec 2007 04:35:00 -0000
In-Reply-To: <4aca3dc20712181519rb637c16oea78bbcc18abe097@mail.gmail.com> (Daniel Berlin's message of "Tue\, 18 Dec 2007 18\:19\:31 -0500")
Message-ID: <orr6hjtik0.fsf@oliva.athome.lsd.ic.unicamp.br>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2007-12/txt/msg00557.txt.bz2

On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

>> int c = z;
>> whatever0(c);
>> c = x;

> Because you have added information you have no way of knowing.
> How exactly did you compute that the call *definitely sets c to the
> value of z_0*, and definitely sets the value of c to x_2.

Err...  I guess you're thinking of memory, global variables, alias
analysis and that sort of stuff.

None of this applies to gimple registers, which is all the annotations
are about.

Yes, aliasing, memory references and must- and may-alias do play a
role when the annotations are turned into equivalence classes, if
memory locations show up in those classes other than the stack slots
allocated to gimple regs that couldn't get hardware registers.
These don't seem too hard to handle conservatively (removing
even may-alias assignment destinations from equivalence classes, as
well as non-local memory references at function calls and volatile
asms), at the expense of incompleteness in debug information, or in a
more lax way, at the potential expense of correctness.  I still don't
know exactly where to draw the line here; this note-propagation
algorithm is one that I haven't completely figured out yet.
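
As a rough illustration of the conservative option only (a sketch, not
GCC code), an equivalence class can be modelled as a set of locations
that is pruned at stores and calls.  Every name here (Loc, EquivClass,
on_store, on_call) and the may_alias callback are hypothetical, and
call-clobbered registers are deliberately not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class Loc:
    """A place that may hold a value (hypothetical model, not a GCC type)."""
    name: str
    kind: str  # "reg", "stack" (local spill slot) or "mem" (other memory)


class EquivClass:
    """Set of locations currently believed to hold one and the same value."""

    def __init__(self, locs: Iterable[Loc]) -> None:
        self.locs: Set[Loc] = set(locs)

    def on_store(self, dest: Loc, may_alias: Callable[[Loc, Loc], bool]) -> None:
        # Conservative: drop the destination itself, plus every memory
        # location that merely *may* alias it.
        self.locs = {
            l for l in self.locs
            if l != dest and not (l.kind == "mem" and may_alias(l, dest))
        }

    def on_call(self) -> None:
        # Conservative: a call (or volatile asm) may change any memory
        # that is not a local spill slot, so forget those locations.
        self.locs = {l for l in self.locs if l.kind != "mem"}
```

The cost is exactly the one mentioned above: a location that in fact
still holds the value may be dropped, so the debug information becomes
less complete, but never claims something that might be false.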

> However, value equivalence does not imply location equivalence, and all
> of our debug formats deal with locations of variables, except for
> constants.

DWARF enables arbitrary value expressions too.  There's some
discussion about lvalue vs rvalue in the document, and this is also
something that will take some experimenting.  I'm not entirely sure
where to draw the line, and I'm not entirely sure there is a perfect
answer.

For example, consider that a variable's home is a stack slot, but for
a loop in which it's not modified, it's held in a register.  Clearly
in this case the correct representation is for the variable to be in
both locations, both as lvalues.

But if the variable is further copied to other variables or locations,
these additional locations probably shouldn't be regarded as the same
variable any more; at most, as rvalues, but maybe not even that.

And then, if for some particular instruction, the variable in the
register needs to be copied to a different register class, then it is
correct to state that, between the copy and the use, the variable is
held in all three locations.
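
The stack slot / register / second register example can be pictured as
a list of overlapping address ranges, one per location.  This is a
minimal sketch under made-up assumptions: the addresses, the location
names and the locations_at helper are all hypothetical, and nothing
here is meant to mirror how var-tracking stores its data.

```python
from typing import List, Tuple

# (start_pc, end_pc, location): the location holds the variable for
# start_pc <= pc < end_pc.  Ranges are allowed to overlap.
Range = Tuple[int, int, str]


def locations_at(ranges: List[Range], pc: int) -> List[str]:
    """Return every location that holds the variable at address pc."""
    return sorted(loc for start, end, loc in ranges if start <= pc < end)


# Made-up addresses for the example in the text.
ranges: List[Range] = [
    (0, 100, "stack[fp-8]"),  # home slot, valid throughout
    (20, 80, "reg r3"),       # loop in which the variable is not modified
    (40, 44, "reg f1"),       # copy to another register class, until the use
]

print(locations_at(ranges, 10))  # only the stack slot
print(locations_at(ranges, 30))  # stack slot and r3
print(locations_at(ranges, 42))  # all three locations
```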

I'm still trying to figure out how to deal with overlaps between
variables, deciding whether locations are to be handled as lvalues or
rvalues, this sort of stuff.  It is indeed a difficult problem.

> IE If you translate this directly into DWARF3, as written, you will
> claim that c and x_4 have the same location (since dwarf does not let
> you say "it has the same value as x, but not the same location"),

Yeah.  The $1M question is, when two variables are coalesced into one,
does this mean we now have two variables sharing the same location, or
do we just use the rvalue of one (which?) for the other?  Isn't this
like talking about body and spirit of variables?  After optimization,
I'm not even sure that talking about location (body) of variables makes
much sense.

An important part of the design process was to distinguish between
source-level variables and implementation-level variables.  Our naming
of stack slots or pseudos as variables is just a mnemonic artifact for
us compiler engineers, to simplify debugging.  Which variables they
actually represent depends a lot on optimization decisions, perhaps
even more than on the original code.

So I talk about binding a source-level variable to a value, rather
than to a location.  Then, we figure out the locations that hold the
value, what other variables do, how they overlap, maybe how they're
used, and then figure out which locations should be assigned to each
source variable.  Tricky.
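
To picture the two-step binding (variable to value, then value to
locations), here is a small sketch.  It is an illustration only: the
dictionaries, the value name "v_z" and both helper functions are
hypothetical, and it does not decide how a shared location should be
split between variables, which is the open part.

```python
from typing import Dict, Set


def locations_for_vars(var_value: Dict[str, str],
                       value_locs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each source variable to every location holding its bound value."""
    return {var: set(value_locs.get(val, set()))
            for var, val in var_value.items()}


def shared_locations(var_locs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Locations claimed by more than one variable: the overlaps to resolve."""
    owners: Dict[str, Set[str]] = {}
    for var, locs in var_locs.items():
        for loc in locs:
            owners.setdefault(loc, set()).add(var)
    return {loc: vs for loc, vs in owners.items() if len(vs) > 1}


# After "int c = z;" both c and z are bound to the same value, here
# called "v_z", which is held in a single register.
var_value = {"c": "v_z", "z": "v_z"}
value_locs = {"v_z": {"reg r1"}}

var_locs = locations_for_vars(var_value, value_locs)
print(shared_locations(var_locs))  # r1 is claimed by both c and z
```

Whether r1 should then be reported as the location of c, of z, of
both, or only as an rvalue for one of them is the question left open.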

The only certainty I have right now is that the annotations I've
proposed enable us to keep track of values.  Distributing locations in
equivalence classes to different user variables is an open problem,
and there are various possible solutions that could make sense, and
that would be arguably correct.

> if all you want is the values you compute above, on SSA, you can
> easily use a lattice to compute the same values you are going to
> compute as you update the annotations on the fly.

This sounds interesting, but I don't quite follow what you mean.  Can
you elaborate, maybe give some examples?

> Tracking which values *definitely represent user values* is actually
> quite easy at the tree level, and doesn't require any IR modification.

But is the binding of user variables to user values for specified
ranges part of this representation too?  I don't see that it is, and
this is the gap I'm trying to fill with the debug annotations.

> It may be worth doing at the RTL level, however, where the solution
> requires making up program points at each definition site and
> computing the dataflow problem in terms of them.

/me mumbles something about RTL-SSA, that Jeff Law started working on
before we took this turn into Tree-SSA.  I'm sort of having to
introduce some limited form of SSA in RTL to infer global equivalence
classes out of the annotations, in the RTL var-tracking pass.  Fun...
If only we had stuck to a single IR...  (No personal preference, I
like both, but I'd rather not have to duplicate work so as to deal
with both)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}