From: Alexandre Oliva
To: Mark Mitchell
Cc: Ian Lance Taylor, Richard Guenther, gcc-patches@gcc.gnu.org, gcc@gcc.gnu.org
Subject: Re: Designs for better debug info in GCC
Date: Mon, 12 Nov 2007 18:22:00 -0000
In-Reply-To: <4737BBBF.3080400@codesourcery.com> (Mark Mitchell's message of "Sun, 11 Nov 2007 18:34:39 -0800")
References: <84fc9c000711050327x74845c78ya18a3329fcf9e4d2@mail.gmail.com> <4732519C.6070802@codesourcery.com> <4733554D.4040402@codesourcery.com> <4737BBBF.3080400@codesourcery.com>

On Nov 12, 2007, Mark Mitchell wrote:

> (We may already have lost some information, though.  For example, given:

>   i = 3;
>   f(i);
>   i = 7;
>   i = 2;
>   g(i);

> we may well have lost the "i = 7" assignment, so "i" might appear to
> have the value "3" right before we assign "2" to it, if we were to
> generate debug information right then.)

Yup.  And even if we could somehow preserve that information, there
wouldn't be any code to attach it to.

There might be uses for empty-range locations in debug information,
but I can't think of any.  Can anyone?  It's something we could try
to preserve, and with my design it would be quite easy to do so, but
unless it's useful for some purpose, I think we could just do away
with it.

> The reason I want to make that assumption is that the part of this where
> the representation is in question is once we reach RTL, right?

I'm not sure what is in question at all.  I've proposed a design to
preserve debug information throughout compilation.  Other designs on
the table differ at both the tree and RTL levels, and in the potential
quality and correctness of the debug information they can produce.

> I guess I still don't really understand what you're doing at the RTL
> level.

It's no different, except that instead of a DEBUG_STMT it's a
DEBUG_INSN, with the TREE expression converted to an RTL expression.

/me mumbles something about the silliness of keeping two completely
different yet nearly-isomorphic internal representations for
statements/instructions.

> What I don't understand is how it's actually going to work.  What
> are the notes you're inserting?

They're always of the form

  DEBUG user-variable = expression

where DEBUG stands for a DEBUG_STMT or a DEBUG_INSN, user-variable is
a tree that represents the user variable, and expression is a TREE or
RTL expression (depending on which representation we're in) that
evaluates to the value the user variable is expected to hold at that
point in the program.

> Do they just say "here is an RTL expression for computing the value of
> user-variable V at this point in the program"?

In RTL, yes.

> Why does it make sense to have that, rather than notes on
> instructions that say what affect the instruction has on user
> variables?

Few instructions need such notes, so the proposal of growing SET by
33% doesn't quite appeal to me.  And then, optimizations move
instructions around, but I don't think they should move the
assignment notes around, for the notes should reflect the structure
of the source program, rather than the mangled representation that
the optimizers turn it into.

That said, growing SET to add to it a list of variables (or components
thereof) that the value is assigned to could be made to work, to some
extent.  But when you optimize away such a set, you'd still have to
keep the note around, so it's not clear to me that adding code all
over to keep the notes in place when the SETs go away or are juggled
around would bring us any advantage.  It would be just a redundant
notation for what the note would already convey, so it brings
complexity for no actual advantage.

To make it concrete, consider that your example above could have
become:

  (set (reg i) (const_int 3)) ;; assigns to i
  (set (reg P1) (reg i))
  (call (mem f))
  (set (reg i) (const_int 7)) ;; assigns to i
  (set (reg i) (const_int 2)) ;; assigns to i
  (set (reg P1) (reg i))
  (call (mem g))

which could then have been optimized to:

  (set (reg P1) (const_int 3))
  (call (mem f))
  (set (reg P1) (const_int 2))
  (call (mem g))

and then you wouldn't have any debug information left for variable i.

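A toy model may make the loss described above easier to see. The sketch below is not GCC code: `Set`, `Call`, the `assigns` field, and both passes are hypothetical names invented for illustration. It assumes straight-line code, that each call reads only `P1`, and that calls clobber no registers. It runs constant propagation followed by dead-set elimination over the annotated sequence and shows that every "assigns to i" annotation disappears along with the SET that carried it.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Set:
    dest: str                      # destination register name
    src: Union[int, str]           # integer constant or source register name
    assigns: Optional[str] = None  # hypothetical "assigns to <user var>" annotation


@dataclass(frozen=True)
class Call:
    target: str
    arg_reg: str = "P1"  # toy assumption: every call reads only P1


def propagate_constants(insns):
    """Forward pass: replace register reads by known constant values."""
    known = {}
    out = []
    for insn in insns:
        if isinstance(insn, Set):
            src = insn.src
            if isinstance(src, str):
                src = known.get(src, src)
            if isinstance(src, int):
                known[insn.dest] = src
            else:
                known.pop(insn.dest, None)
            out.append(Set(insn.dest, src, insn.assigns))
        else:
            out.append(insn)
    return out


def remove_dead_sets(insns):
    """Backward pass: drop sets whose destination is never read afterwards."""
    live = set()
    kept = []
    for insn in reversed(insns):
        if isinstance(insn, Call):
            live.add(insn.arg_reg)
            kept.append(insn)
        elif insn.dest in live:
            live.discard(insn.dest)
            if isinstance(insn.src, str):
                live.add(insn.src)
            kept.append(insn)
        # Otherwise the set is dead and is dropped, annotation included.
    kept.reverse()
    return kept


original = [
    Set("i", 3, assigns="i"),
    Set("P1", "i"),
    Call("f"),
    Set("i", 7, assigns="i"),
    Set("i", 2, assigns="i"),
    Set("P1", "i"),
    Call("g"),
]

optimized = remove_dead_sets(propagate_constants(original))
surviving_annotations = [
    insn.assigns for insn in optimized
    if isinstance(insn, Set) and insn.assigns is not None
]

for insn in optimized:
    print(insn)
print("annotations left:", surviving_annotations)
```

In this model the annotation lives and dies with its instruction, so a scheme of this kind would need extra code in each pass to rescue it.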
With the notes I propose, by contrast, you'd be left with:

  (debug i (const_int 3))
  (set (reg P1) (const_int 3))
  (call (mem f))
  (debug i (const_int 7)) ;; may be dropped, as discussed above
  (debug i (const_int 2))
  (set (reg P1) (const_int 2))
  (call (mem g))

even if no register at all ends up allocated for i.  And if there were
uses of i that followed the assignment to 7, to which the constant
could be propagated, you'd still be left with the annotation to
indicate that i has a new value at the correct point.

> As a meta-question, have you or anyone else on the list looked at the
> literature (IEEE/ACM, etc.) or how other compilers handle these problems?

I couldn't find much information about other compilers, but I've seen
a number of (mostly dated) articles and US patents.  In fact, I'm
particularly concerned that US Patent 6091896 covers the design
proposed by Richi, which involves annotating the instructions
themselves.  I believe the independent, stand-alone annotations I
propose escape the patent claims.

That said, if anyone knows of articles that could be of use, I'd love
to hear about them.  It's not like my research was exhaustive.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}
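As a rough illustration of the `(debug i ...)` sequence earlier in this message, the sketch below derives value ranges for `i` from stand-alone debug notes. It is a toy model, not GCC's implementation: `Debug`, `Insn`, and `value_ranges` are hypothetical names, and addresses are simply counts of real instructions, on the assumption that a debug note occupies no code. Under those assumptions, the `i = 7` note yields an empty range, which matches the remark that there is no code to attach it to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Debug:
    var: str    # user variable the note describes
    value: int  # value the variable holds from this point on


@dataclass(frozen=True)
class Insn:
    text: str   # a real instruction; it occupies one address in this toy model


def value_ranges(stream, var):
    """Return (start, end, value) triples, half-open, for one user variable."""
    ranges = []
    addr = 0        # address of the next real instruction
    current = None  # (start address, value) of the binding being tracked
    for item in stream:
        if isinstance(item, Debug):
            if item.var == var:
                if current is not None:
                    ranges.append((current[0], addr, current[1]))
                current = (addr, item.value)
        else:
            addr += 1
    if current is not None:
        ranges.append((current[0], addr, current[1]))
    return ranges


stream = [
    Debug("i", 3),
    Insn("(set (reg P1) (const_int 3))"),
    Insn("(call (mem f))"),
    Debug("i", 7),
    Debug("i", 2),
    Insn("(set (reg P1) (const_int 2))"),
    Insn("(call (mem g))"),
]

all_ranges = value_ranges(stream, "i")
# Empty ranges cover no instruction, so an emitter could discard them.
nonempty_ranges = [r for r in all_ranges if r[0] < r[1]]

print(all_ranges)
print(nonempty_ranges)
```

Whether empty ranges are kept or discarded is a policy choice in this model, which is the question the message raises about empty-range locations.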