From mboxrd@z Thu Jan  1 00:00:00 1970
To: Alexandre Oliva <aoliva@redhat.com>
Cc: gcc@gcc.gnu.org
Subject: Re: Designs for better debug info in GCC
References: <or4pg114h5.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4737BF2C.70408@codesourcery.com> 	<or7iknzazd.fsf@oliva.athome.lsd.ic.unicamp.br> 	<47388599.2040701@codesourcery.com> 	<ory7cok8u6.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4749DE66.1090602@codesourcery.com> <4756B02D.9010302@google.com> 	<ory7bv3adb.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4aca3dc20712151903r46c9eceane35edb92d08240ac@mail.gmail.com> 	<oraboaero7.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4aca3dc20712161712w1133fb96qd66be0e9a0bb1716@mail.gmail.com> 	<ord4t51a6i.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4766B8E5.60500@google.com> 	<or1w9l12x2.fsf@oliva.athome.lsd.ic.unicamp.br> 	<4766DF5C.1020802@google.com> 	<orlk7szv07.fsf@oliva.athome.lsd.ic.unicamp.br> 	<47671BF4.5050704@google.com> 	<orwsrcy5vc.fsf@oliva.athome.lsd.ic.unicamp.br> 	<orir2wxw7n.fsf@oliva.athome.lsd.ic.unicamp.br> 	<m3lk7s0ym4.fsf@localhost.localdomain> 	<or8x3ruyum.fsf@oliva.athome.lsd.ic.unicamp.br>
From: Ian Lance Taylor <iant@google.com>
Date: Wed, 19 Dec 2007 18:41:00 -0000
In-Reply-To: <or8x3ruyum.fsf@oliva.athome.lsd.ic.unicamp.br>
Message-ID: <m3r6hiy37d.fsf@localhost.localdomain>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2007-12/txt/msg00580.txt.bz2

Alexandre Oliva <aoliva@redhat.com> writes:

> You snipped (skipped?) one aspect of the reasoning on why it is
> appropriate.  Of course this doesn't prove it's the best possibility,
> but I haven't seen evidence of why it isn't.

You will find it easier to demonstrate the worth of your proposal if
you act publicly as though your interlocutors are people of good
will, even when it doesn't seem that way to you, and omit
interjections like "(skipped?)".  Assuming the goal is to get this
into mainline gcc, you have to convince us, not the other way around.
The first step in convincing people in this forum is not to irritate
them.


> Now, if you tell me that information about i_0 and j_2 is
> backward-propagated to the top of the function, where x and y are set
> up, I introduce say zero-initialization for i and j before probe1()
> (an actual function call, mind you), and then this representation is
> provably broken.

To be sure we are on the same page, I think your argument here is that
with this code:

int f(int x, int y) {
  int i = 0, j = 0;

  probe1();
  i = x;
  j = y;
  probe2();
  if (x < y)
    i += y;
  else
    j -= x;
  probe3();
  return g (i, j);
}

if I set a breakpoint just before the call to probe2(), and I print
the values of 'i' and 'j', I should get the values of 'x' and 'y'.
That is, you want to emit a DWARF variable note at that point that the
value of 'i' can be found in the location corresponding to 'x'.

Of course there are no actual instructions between the calls to
probe1() and probe2().  If I use gdb's "finish" command out of
probe1(), what values should I see for 'i' and 'j' at that point?
Arguably I am now before the assignment statements, and should see '0'
and '0', the values that 'i' and 'j' have before they are changed.  Of
course, this is the same location as the breakpoint before probe2(),
and we can't see both '0'/'0' and 'x'/'y'.  So it seems to me that
this situation is actually somewhat ambiguous.  I don't see an
obviously correct answer.
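
For anyone who wants to look at this in a debugger, here is a
self-contained version of the function above.  The bodies of the
probes and of g are stand-ins I made up (the example does not define
them), and the noinline attribute is only there so that the probes
stay real function calls at -O2.  Treat it as a sketch for
experimenting, not as a statement of what any particular compiler
emits.

```c
/* Hypothetical stand-ins: the example does not define the probes or g.  */
static volatile int probes_hit;

__attribute__ ((noinline)) void probe1 (void) { probes_hit++; }
__attribute__ ((noinline)) void probe2 (void) { probes_hit++; }
__attribute__ ((noinline)) void probe3 (void) { probes_hit++; }
__attribute__ ((noinline)) int g (int i, int j) { return i * 100 + j; }

int f (int x, int y)
{
  int i = 0, j = 0;

  probe1 ();
  /* After optimization there may be no instructions for the next two
     statements, so "just after probe1" and "just before probe2" can be
     the same address.  */
  i = x;
  j = y;
  probe2 ();
  if (x < y)
    i += y;
  else
    j -= x;
  probe3 ();
  return g (i, j);
}
```

Call f from a small main, build with something like "gcc -O2 -g",
then compare what 'i' and 'j' print after "finish" out of probe1()
with what they print at a breakpoint on the probe2() line.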

Setting that aside, seeing the values 'x' and 'y' would probably be
more useful in practice, even if the other possibility is not wrong.
I think the general issue you are describing is how to handle an
assignment which appears in user code but which has been eliminated
during optimization.

You are certainly correct: the scheme I was outlining does not address
deleted assignments.

It seems to me that such eliminated assignments are inherently
ambiguous.  If the assignment is gone, then there is a point in the
generated code where the variable logically has both the old and the
new values.  I assume that the debugger can only display one value.
Which one should it be?

Your representation clearly makes a choice.  What makes it a
principled choice?  Consider a series of assignments to a local
variable, and suppose that all the assignments are deleted because
they are unused.  Are there dependencies between the DEBUG notes which
keep them in the right order?
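
To make the question concrete, here is the kind of function I have in
mind.  The name and the expressions are made up for illustration.
None of the stores to 't' is ever read, so an optimizer is free to
delete all three, and only notes of some kind would be left to say
what 't' holds at each line.

```c
int series (int a)
{
  int t;

  t = a + 1;   /* dead: overwritten before any use */
  t = a + 2;   /* dead: overwritten before any use */
  t = a + 3;   /* dead: never read */
  return a;    /* the result does not depend on t at all */
}
```

If the three notes are not ordered with respect to each other, a
debugger stepping through could report 'a + 3' before 'a + 2', even
though the source never says that.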

One way to make a principled choice is to consider the line notes we
are going to emit with the debugging information.  Presumably we do
not have the goal of emitting correct debug information in between
line notes--e.g., when using the "stepi" command in gdb.  Our goal is
to emit correct debug information at the points where a debugger would
naturally stop--the notes for where a line starts.

I wonder whether it would be feasible for the debug info generation to
work from the assignments in the source code as generated by the
frontend.  For each assignment, we would find the corresponding line
note.  Then we would look at the right hand side, and try to identify
where that value could be found at that point in the program.  This
would be a variant of our current variable tracking pass.  I haven't
thought about this enough to know whether it would really work.
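
As a toy model of what I mean (not GCC code, and every name in it is
hypothetical), think of the output as a table with one entry per
source assignment, recording the line and where the right hand side
can be found.  The lookup below only handles straight-line code, and
it assumes that stopping at a line means stopping before that line's
assignment has run.  Line numbers are relative to the start of f in
the example earlier in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: where the right hand side of an assignment lives.  */
enum loc_kind { LOC_CONSTANT, LOC_SAME_AS_VAR };

struct assign_note
{
  int line;            /* line of the source assignment */
  const char *var;     /* variable being assigned */
  enum loc_kind kind;
  long constant;       /* used when kind == LOC_CONSTANT */
  const char *other;   /* used when kind == LOC_SAME_AS_VAR */
};

/* Entries for f: "int i = 0, j = 0;" on line 2, "i = x;" on line 5,
   "j = y;" on line 6.  */
static const struct assign_note f_notes[] = {
  { 2, "i", LOC_CONSTANT, 0, NULL },
  { 2, "j", LOC_CONSTANT, 0, NULL },
  { 5, "i", LOC_SAME_AS_VAR, 0, "x" },
  { 6, "j", LOC_SAME_AS_VAR, 0, "y" },
};

/* Return the entry for the last assignment to VAR on a line strictly
   before LINE, or NULL if VAR has not been assigned yet.  */
const struct assign_note *
lookup_note (const struct assign_note *notes, size_t n,
             const char *var, int line)
{
  const struct assign_note *best = NULL;

  for (size_t k = 0; k < n; k++)
    if (strcmp (notes[k].var, var) == 0
        && notes[k].line < line
        && (best == NULL || notes[k].line >= best->line))
      best = &notes[k];
  return best;
}
```

With this table, a stop at line 7 (the probe2() call) reports 'i' as
being found wherever 'x' is, while a stop at line 5 still reports the
constant 0.  Conditionals and loops would need something closer to a
dataflow problem, which is why I call it a variant of variable
tracking.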


> > It is of course true that optimized code will move around
> > unpredictably, and your proposal doesn't handle that.
> 
> It handles that in that a variable will be regarded as being assigned
> to a value when execution crosses the debug stmt/insn originally
> inserted right after the assignment.  This is by design, but I realize
> now I forgot to mention this in the design document.
> 
> The idea is that, debug insns get high priority in scheduling.
> However, since they mention the assignment just before them, if the
> assignment is just moved earlier, without an intervening scheduling
> barrier, then the debug instruction will follow it.  If the assignment
> is removed, then the debug insn can legitimately be moved up to the
> point where the assignment, if remaining, might have been moved up to.
> However, if the assignment is moved to a separate basic block, say out
> of a loop or a conditional, then we don't want the debug insn to move
> with it: such that hoisting and commonizing are regarded as setting
> temporaries, and the value is only "committed" to the variable if we
> get to the point where the assignment would take place.

That will only work correctly if sched-deps.c introduces dependencies
between debug insns and real insns.  Otherwise, debug insns will move
ahead of real insns which change their values.  If you introduce those
dependencies, I don't understand how you will avoid changing the
scheduler's behaviour in the presence of debug insns.  How did you work
around that problem?


> >> Testing for accuracy and completeness of debug information can be best
> >> accomplished using a debugging environment.
> 
> > Of course this is very unsatisfactory without an automated testsuite.
> 
> Err...  I didn't say the testing through a debugging environment
> wouldn't be automated.  My plan is to use something along the lines of
> the GDB testsuite scripts, but whether to use GDB or some other
> debugging or monitoring infrastructure is a tiny implementation detail
> that I haven't worried about at all.  The basic idea is to script the
> inspection of variables and verify that the obtained values are the
> expected ones, or that variables are defensibly unavailable at the
> inspection points.

Personally, I would like to see that testsuite first.  That will give
us an operational definition to aim for, rather than a theoretical
discussion which I find to be ambiguous.  And it will avoid the
problem of turning the testsuite into a regression testsuite rather
than an accuracy testsuite.  But of course I'm not doing the work.

Ian