From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-141992-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 4958 invoked by alias); 12 Nov 2007 17:54:15 -0000
Received: (qmail 4945 invoked by uid 22791); 12 Nov 2007 17:54:14 -0000
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (66.187.233.31)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 12 Nov 2007 17:54:12 +0000
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) 	by mx1.redhat.com (8.13.8/8.13.1) with ESMTP id lACHq9aJ008267; 	Mon, 12 Nov 2007 12:52:09 -0500
Received: from pobox.corp.redhat.com (pobox.corp.redhat.com [10.11.255.20]) 	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id lACHq8K0024427; 	Mon, 12 Nov 2007 12:52:08 -0500
Received: from livre.oliva.athome.lsd.ic.unicamp.br (vpn-248-70.boston.redhat.com [10.13.248.70]) 	by pobox.corp.redhat.com (8.13.1/8.13.1) with ESMTP id lACHq5VT006464; 	Mon, 12 Nov 2007 12:52:06 -0500
Received: from livre.oliva.athome.lsd.ic.unicamp.br (localhost.localdomain [127.0.0.1]) 	by livre.oliva.athome.lsd.ic.unicamp.br (8.14.1/8.13.8) with ESMTP id lACHq38Z008089; 	Mon, 12 Nov 2007 15:52:03 -0200
Received: (from aoliva@localhost) 	by livre.oliva.athome.lsd.ic.unicamp.br (8.14.1/8.13.5/Submit) id lACHq236008088; 	Mon, 12 Nov 2007 15:52:02 -0200
To: Mark Mitchell <mark@codesourcery.com>
Cc: Ian Lance Taylor <iant@google.com>,         Richard Guenther <richard.guenther@gmail.com>, gcc-patches@gcc.gnu.org,         gcc@gcc.gnu.org
Subject: Re: Designs for better debug info in GCC
References: <or4pg114h5.fsf@oliva.athome.lsd.ic.unicamp.br> 	<84fc9c000711050327x74845c78ya18a3329fcf9e4d2@mail.gmail.com> 	<or7ikuv6ex.fsf_-_@oliva.athome.lsd.ic.unicamp.br> 	<m3d4umrq04.fsf@localhost.localdomain> 	<orfxzhuazm.fsf@oliva.athome.lsd.ic.unicamp.br> 	<m3mytpr7fr.fsf@localhost.localdomain> 	<4732519C.6070802@codesourcery.com> 	<or640dfg4g.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4733554D.4040402@codesourcery.com> 	<or8x586ymz.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4737BBBF.3080400@codesourcery.com>
From: Alexandre Oliva <aoliva@redhat.com>
Errors-To: aoliva@oliva.athome.lsd.ic.unicamp.br
In-Reply-To: <4737BBBF.3080400@codesourcery.com> (Mark Mitchell's message of "Sun\, 11 Nov 2007 18\:34\:39 -0800")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)
Date: Mon, 12 Nov 2007 18:22:00 -0000
Message-ID: <oroddzxsfy.fsf@oliva.athome.lsd.ic.unicamp.br>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2007-11/txt/msg00322.txt.bz2

On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> (We may already have lost some information, though.  For example, given:

>   i = 3;
>   f(i);
>   i = 7;
>   i = 2;
>   g(i);

> we may well have lost the "i = 7" assignment, so "i" might appear to
> have the value "3" right before we assign "2" to it, if we were to
> generate debug information right then.)

Yup.  And even if we could somehow preserve that information, there
wouldn't be any code to attach that information to.  There might be
uses for empty-range locations in debug information, but I can't think
of any.  Can anyone?  It's something we could try to preserve, and
with my design it would be quite easy to do so, but unless it's useful
for some purpose, I think we could just do away with it.

> The reason I want to make that assumption is that the part of this where
> the representation is in question is once we reach RTL, right?

I'm not sure what is in question at all.  I've proposed a design to
preserve debug information throughout compilation.  Other designs on
the table differ both in tree and rtl levels, and in the potential
quality and correctness of the debug information they can produce.

> I guess I still don't really understand what you're doing at the RTL
> level.

It's no different, except that instead of a DEBUG_STMT it's a
DEBUG_INSN, with the TREE expression converted to an RTL expression.

/me mumbles something about the silliness of keeping two completely
different yet nearly-isomorphic internal representations for
statements/instructions.

> What I don't understand is how it's actually going to work.  What
> are the notes you're inserting?

They're always of the form

  DEBUG user-variable = expression

where DEBUG stands for a DEBUG_STMT or a DEBUG_INSN, user-variable is
a tree that represents the user variable, and expression is a TREE or
RTL (depending on which representation we're in) that evaluates to the
value the user-variable is expected to hold at that point in the
program.
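
As a rough illustration of that shape, here is a toy Python model.  It
is not GCC code: the names DebugBind, Const, Reg and render are all
hypothetical, chosen only to show a note that pairs a user variable
with an expression and generates no code of its own.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy expression nodes; GCC's real TREE and RTL nodes are far richer.
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Reg:
    name: str


Expr = Union[Const, Reg]


@dataclass(frozen=True)
class DebugBind:
    """Models 'DEBUG user-variable = expression'.

    The note emits no code; it only records which expression holds the
    value of the user variable at this point in the program.
    """
    var: str
    expr: Expr


def render(note: DebugBind) -> str:
    """Print the note in the RTL-like notation used in this message."""
    e = note.expr
    if isinstance(e, Const):
        text = f"(const_int {e.value})"
    else:
        text = f"(reg {e.name})"
    return f"(debug {note.var} {text})"


print(render(DebugBind("i", Const(3))))
```

The same note type serves both levels in this sketch; only the kind
of expression it carries would differ between the tree and RTL forms.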

> Do they just say "here is an RTL expression for computing the value of
> user-variable V at this point in the program"?

In RTL, yes.

> Why does it make sense to have that, rather than notes on
> instructions that say what effect the instruction has on user
> variables?

Few instructions need such notes, so the proposal of growing SET by
33% doesn't quite appeal to me.  And then, optimizations move
instructions around, but I don't think they should move the assignment
notes around, for they should reflect the structure of the source
program, rather than the mangled representation that the optimizers
turn it into.

That said, growing SET to carry a list of the user variables (or
components thereof) that it assigns to could be made to work, to some
extent.  But when you optimize away such a SET, you'd still have to
keep the note around, so it's not clear to me that adding code all
over to keep the notes in place when the SETs go away or are juggled
around would buy us anything.  It would be just a redundant notation
for what the stand-alone note already conveys: more complexity for no
actual advantage.

To make it concrete, consider that your example above could have become:

(set (reg i) (const_int 3)) ;; assigns to i
(set (reg P1) (reg i))
(call (mem f))
(set (reg i) (const_int 7)) ;; assigns to i
(set (reg i) (const_int 2)) ;; assigns to i
(set (reg P1) (reg i))
(call (mem g))

which could then have been optimized to:

(set (reg P1) (const_int 3))
(call (mem f))
(set (reg P1) (const_int 2))
(call (mem g))

and then you wouldn't have any debug information left for variable i.

With the notes I propose, on the other hand, you'd be left with:

(debug i (const_int 3))
(set (reg P1) (const_int 3))
(call (mem f))
(debug i (const_int 7)) ;; may be dropped, as discussed above
(debug i (const_int 2))
(set (reg P1) (const_int 2))
(call (mem g))

even if no register at all ends up allocated for i.  And if there were
uses of i that followed the assignment to 7, to which the constant
could be propagated, you'd still be left with the annotation to
indicate that i has a new value at the correct point.
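
The effect can be sketched with a small Python simulation.  This is a
toy model under stated assumptions, not how GCC's passes work: the
tuple encoding and the optimize and values_at_calls helpers are
hypothetical, the code is straight-line, and every value stored in i
is a constant.

```python
# Each instruction is (opcode, destination, source).  Hypothetical encoding.
stream = [
    ("debug", "i", 3),
    ("set", "i", 3),
    ("set", "P1", "i"),
    ("call", "f", None),
    ("debug", "i", 7),
    ("set", "i", 7),
    ("debug", "i", 2),
    ("set", "i", 2),
    ("set", "P1", "i"),
    ("call", "g", None),
]


def optimize(insns, var):
    """Propagate the constant held in `var` into its uses and delete
    every set to `var`.  Debug notes are left exactly where they are."""
    out, known = [], None
    for op, dst, src in insns:
        if op == "set" and dst == var:
            known = src          # remember the constant, drop the set
            continue
        if op == "set" and src == var:
            src = known          # replace the use with the constant
        out.append((op, dst, src))
    return out


def values_at_calls(insns, var):
    """Report the value the debug notes give `var` at each call."""
    seen, current = {}, None
    for op, dst, src in insns:
        if op == "debug" and dst == var:
            current = src
        elif op == "call":
            seen[dst] = current
    return seen


optimized = optimize(stream, "i")
for insn in optimized:
    print(insn)
print(values_at_calls(optimized, "i"))
```

No set to i survives in the optimized stream, yet the notes still say
that i is 3 at the call to f and 2 at the call to g.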

> As a meta-question, have you or anyone else on the list looked at the
> literature (IEEE/ACM, etc.) or how other compilers handle these problems?

I couldn't find much information about other compilers, but I've seen a
number of (mostly dated) articles and US patents.  In fact, I'm
particularly concerned that US Patent 6091896 covers the design
proposed by Richi, that involves annotating the instructions
themselves.  I believe the independent, stand-alone annotations I
propose escape the patent claims.

That said, if anyone knows of articles that could be of use, I'd love
to hear about them.  It's not like my research was exhaustive.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}