Re: Designs for better debug info in GCC

public inbox for gcc@gcc.gnu.org

* Re: Designs for better debug info in GCC
       [not found]                       ` <m37ij64mwt.fsf@localhost.localdomain.suse.lists.egcs>
@ 2007-12-23  0:52                         ` Andi Kleen
  2007-12-23  1:32                           ` Daniel Jacobowitz
  0 siblings, 1 reply; 189+ messages in thread
From: Andi Kleen @ 2007-12-23  0:52 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

Ian Lance Taylor <iant@google.com> writes:

> I'm in favor of implementing this.  

Yes it would be great.

> As I'm sure you know, the GNU
> binutils 

Actually binutils only barely supports debuginfo. AFAIK
objcopy is the only tool that knows anything about them.

> and gdb already support using a single separate file for
> debugging information.

That does not solve that problem because all that data still
has to be copied. In the current setup even two times
(.o -> exe -> objcopy to debuginfo and then another strip
which is another partial write). 

I assume that copying phase is the problem people are complaining 
about and debuginfo makes it even worse now.
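
As a rough sketch of the copying steps described above (not taken from any
distribution's build scripts), the usual separate-debuginfo sequence can be
written out as the command lines it runs. The helper name and the
"myprog" file name are hypothetical; the objcopy and strip options are the
standard binutils ones. Each step reads or rewrites the linked executable,
which is where the extra IO comes from.

```python
def split_debuginfo_commands(exe, debug_file=None):
    """Return the binutils commands that split debug info out of `exe`.

    Hypothetical helper for illustration only: it builds the command
    lines but does not run them.
    """
    debug_file = debug_file or exe + ".debug"
    return [
        # 1. Copy only the debug sections into a separate file.
        ["objcopy", "--only-keep-debug", exe, debug_file],
        # 2. Rewrite the executable without its debug sections.
        ["strip", "--strip-debug", exe],
        # 3. Record the debug file's name in a .gnu_debuglink section.
        ["objcopy", "--add-gnu-debuglink=" + debug_file, exe],
    ]


for cmd in split_debuginfo_commands("myprog"):
    print(" ".join(cmd))
```

All of this happens after the linker has already copied the debug sections
from the .o files into the executable once.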

> well during development for a program which is normally run on the
> same system on which it is developed.  It doesn't help much when the
> program must be run on a different system--it's possible to use
> gdbserver, but awkward.  And it doesn't help at all when it is
> sometimes necessary to debug executables which have been built and
> distributed widely.

The Linux distributions have debuginfo rpms that work
fine for that. But it does not solve the link time IO problem.

-Andi

* Re: Designs for better debug info in GCC
  2007-12-23  0:52                         ` Designs for better debug info in GCC Andi Kleen
@ 2007-12-23  1:32                           ` Daniel Jacobowitz
  2007-12-23  1:36                             ` Andi Kleen
  0 siblings, 1 reply; 189+ messages in thread
From: Daniel Jacobowitz @ 2007-12-23  1:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ian Lance Taylor, gcc

On Sat, Dec 22, 2007 at 11:49:23PM +0100, Andi Kleen wrote:
> > As I'm sure you know, the GNU
> > binutils 
> 
> Actually binutils only barely supports debuginfo. AFAIK
> objcopy is the only tool that knows anything about them.

I don't know why you say that.  ld knows a bit about debugging
sections, and how to read .debug_line for errors; objdump knows how to
decode debug info, as does readelf; strip knows how to remove it;
objcopy how to copy and separate it.

> The Linux distributions have debuginfo rpms that work
> fine for that. But it does not solve the link time IO problem.

FWIW, in the paragraph you were responding to Ian was talking about
the Darwin system, not the GNU one.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: Designs for better debug info in GCC
  2007-12-23  1:32                           ` Daniel Jacobowitz
@ 2007-12-23  1:36                             ` Andi Kleen
  2007-12-23  5:55                               ` Daniel Jacobowitz
  0 siblings, 1 reply; 189+ messages in thread
From: Andi Kleen @ 2007-12-23  1:36 UTC (permalink / raw)
  To: Andi Kleen, Ian Lance Taylor, gcc

> I don't know why you say that.  ld knows a bit about debugging
> sections, and how to read .debug_line for errors; objdump knows how to
> decode debug info, as does readelf; strip knows how to remove it;
> objcopy how to copy and separate it.

Sorry, I mean separate debuginfo, as Ian was referring to.

I actually had a patch once to hack it into objdump for -S and
also into addr2line but it was somewhat ugly and still 
had some problems and I didn't submit it. 

-Andi

* Re: Designs for better debug info in GCC
  2007-12-23  1:36                             ` Andi Kleen
@ 2007-12-23  5:55                               ` Daniel Jacobowitz
  0 siblings, 0 replies; 189+ messages in thread
From: Daniel Jacobowitz @ 2007-12-23  5:55 UTC (permalink / raw)
  To: gcc

On Sun, Dec 23, 2007 at 02:33:44AM +0100, Andi Kleen wrote:
> > I don't know why you say that.  ld knows a bit about debugging
> > sections, and how to read .debug_line for errors; objdump knows how to
> > decode debug info, as does readelf; strip knows how to remove it;
> > objcopy how to copy and separate it.
> 
> Sorry, I mean separate debuginfo, as Ian was referring to.
> 
> I actually had a patch once to hack it into objdump for -S and
> also into addr2line but it was somewhat ugly and still 
> had some problems and I didn't submit it. 

Oh, I see.  Yes, only BFD and GDB know much about it.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: Designs for better debug info in GCC
       [not found]                                                             ` <y0my7baigdf.fsf@ton.toronto.redhat.com>
@ 2008-01-01 17:31                                                               ` Richard Guenther
  0 siblings, 0 replies; 189+ messages in thread
From: Richard Guenther @ 2008-01-01 17:31 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Ian Lance Taylor, Alexandre Oliva, gcc

On Jan 1, 2008 12:39 AM, Frank Ch. Eigler <fche@redhat.com> wrote:
>
> "Richard Guenther" <richard.guenther@gmail.com> writes:
>
> > [...]  I chose to ignore this problem and say we debug the optimized
> > program, not the source as far as life ranges are concerned. [...]
>
> Yes, and this choice has a certain pragmatism.  However, it seems to
> miss the basic observation that what drives debugging are the
> programmer-user's needs, not the compiler-writer's needs.  A
> programmer-user is primarily interested in his source, and I bet most
> would prefer not to think about optimization artifacts at all.  It
> would be a disservice to amplify the visibility of the latter.

While I would generally agree with you, if you look at what debugging
information could be retained at higher optimization levels you might
want to reconsider.  The fewer optimizations are applied to the program,
the closer 'source level debug info' and 'optimized program debug info'
are, so I believe this pragmatism is the right thing to do.  But of course
only real-life testing will tell.

Richard.

* Re: Designs for better debug info in GCC
  2007-12-21  1:54                                                                         ` Ian Lance Taylor
       [not found]                                                                           ` <orprx0izhp.fsf@oliva.athome.lsd.ic.unicamp.br>
  2007-12-21  2:11                                                                           ` Alexandre Oliva
@ 2007-12-31 19:39                                                                           ` Alexandre Oliva
  2 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-31 19:39 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: janis187, gcc

On Dec 20, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Right, which will significantly increase debugging size as you add two
> more notes around many lines.

FWIW, I've just got powerpc64-linux-gnu to pass bootstrap-debug and
bootstrap4-debug/-g0 (i.e., all host and target libraries pass
compare-debug when compiled with -g0 and -g2
-fvar-tracking-assignments).  I did
bootstrap4-debug/-fno-var-tracking-assignments as well, for comparison
purposes.  Here are the total sizes:

1487400 target libs at -g0
2239140 target libs with -g2 -fno-var-tracking-assignments
2190176 target libs with -g2 -fvar-tracking-assignments

So, with the new infrastructure in place, debug info gets smaller.  I
haven't evaluated its quality yet (e.g., the compiler may be losing
track of where variables are too often).  Also, the compiler is still
missing the improved version of var-tracking to keep track of all
copies of user variable values, which is expected to grow debug info.
But at least at this point it doesn't seem like the approach is
hopeless.
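
To put the three totals above in proportion, here is a small worked
calculation. It assumes, as a simplification, that the whole difference from
the -g0 build is debug information; the variable and function names are
illustrative only.

```python
# Totals as reported above (units as given in the message).
SIZE_G0 = 1487400          # target libs at -g0
SIZE_G2_NO_VTA = 2239140   # -g2 -fno-var-tracking-assignments
SIZE_G2_VTA = 2190176      # -g2 -fvar-tracking-assignments


def debug_overhead(total, baseline=SIZE_G0):
    """Size attributed to debug info, assuming everything above -g0 is debug data."""
    return total - baseline


overhead_no_vta = debug_overhead(SIZE_G2_NO_VTA)
overhead_vta = debug_overhead(SIZE_G2_VTA)
saving = overhead_no_vta - overhead_vta

print("debug overhead without VTA:", overhead_no_vta)
print("debug overhead with VTA:   ", overhead_vta)
print("saving:                    ", saving)
print("saving as share of debug overhead: %.1f%%" % (100.0 * saving / overhead_no_vta))
print("saving as share of total size:     %.1f%%" % (100.0 * saving / SIZE_G2_NO_VTA))
```

Under that assumption the debug overhead shrinks by roughly 6.5%, or about
2.2% of the total -g2 size.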

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-12-18 16:22                                                         ` Ian Lance Taylor
  2007-12-18 16:28                                                           ` Robert Dewar
  2007-12-19  4:30                                                           ` Alexandre Oliva
@ 2007-12-31 16:55                                                           ` Richard Guenther
       [not found]                                                             ` <y0my7baigdf.fsf@ton.toronto.redhat.com>
  2 siblings, 1 reply; 189+ messages in thread
From: Richard Guenther @ 2007-12-31 16:55 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, gcc

On 18 Dec 2007 08:13:55 -0800, Ian Lance Taylor <iant@google.com> wrote:
> Alexandre Oliva <aoliva@redhat.com> writes:
>
> >       A plan to fix local variable debug information in GCC
> >
> >               by Alexandre Oliva <aoliva@redhat.com>
> >
> >                          2007-12-18 draft
>
> Thank you for writing this.  It makes an enormous difference.

Indeed.

>
> > == Goals
>
> I note that you don't say anything about the other big problem with
> debugging optimized code, which is that the debugger jumps around all
> over the place.  That is fine, of course.
>
>
> > Once this is established, a possible representation becomes almost
> > obvious: statements (in trees) or instructions (in rtl) that assert,
> > to the variable tracker, that a user variable or member is represented
> > by a given expression:
> >
> >   # DEBUG var expr
> >
> > By var, we mean a tree expression that denotes a user variable, for
> > now.  We envision trivially extending it to support components of
> > variables in the future.
>
> While you say that this is almost obvious, it still isn't obvious at
> all to me.  You consider trees and RTL together, but I don't see why
> that is appropriate.
>
> My biggest concern at the tree level is the significantly increased
> memory usage and the introduction of a sort of a weak pointer to
> values.  Since DEBUG statements shouldn't interfere with
> optimizations, we need to explicitly ignore them in things like
> has_single_use.  But since our data structures need to be coherent, we
> can not ignore them when we actually eliminate SSA names.  That seems
> sort of complicated.
>
> In SSA form it seems very natural to provide a set of associations
> with user variables for each GIMPLE variable.  Since the GIMPLE
> variables never change, these associations never change.  We have to
> get them right when we create a new GIMPLE variable and when we
> eliminate a GIMPLE variable.  While this obviously requires some work,
> to me it seems less intrusive than the notion of weak references.

This is what we do on the var-mappings-branch.  One obvious thing is
that SSA form doesn't help you to track the reverse of VAR = cst.  But
of course it's easy to do another reverse mapping from constants to vars.

A similar reverse mapping can be done for SET insns on RTL.
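
A toy sketch of the side-table idea discussed above, for illustration only:
it is not the var-mappings-branch code, and every name in it (VarMap,
associate, replace, assign_const) is hypothetical. Each SSA name carries a
set of user variables, the sets are merged when one SSA name is eliminated
in favour of another, and a separate reverse map covers VAR = cst, where
there is no SSA name to hang the association on.

```python
from collections import defaultdict


class VarMap:
    """Toy side table: SSA name -> user variables it represents."""

    def __init__(self):
        self.by_ssa = defaultdict(set)    # e.g. "a_1" -> {"a"}
        self.by_const = defaultdict(set)  # reverse map, e.g. 42 -> {"c"}

    def associate(self, ssa_name, user_var):
        """Record that ssa_name holds the value of user_var."""
        self.by_ssa[ssa_name].add(user_var)

    def assign_const(self, user_var, value):
        """Record VAR = cst in the reverse map from constants to variables."""
        self.by_const[value].add(user_var)

    def replace(self, old, new):
        """old is eliminated in favour of new: move its associations over."""
        self.by_ssa[new] |= self.by_ssa.pop(old, set())

    def vars_for(self, ssa_name):
        """User variables currently associated with ssa_name, sorted."""
        return sorted(self.by_ssa.get(ssa_name, ()))


m = VarMap()
m.associate("a_1", "a")
m.associate("b_2", "b")
m.replace("b_2", "a_1")   # e.g. b_2 = a_1 was copy-propagated away
m.assign_const("c", 42)   # c = 42
print(m.vars_for("a_1"), sorted(m.by_const[42]))
```

The point of the sketch is that the associations only need updating where
SSA names are created or eliminated, not at every use.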

> By the way, we shouldn't confuse the source code live range of the
> variable with the annotations on the GIMPLE variables.  That will get
> us into the mapping of source code lines to optimized code.  It is of
> course true that optimized code will move around unpredictably, and
> your proposal doesn't handle that.  I don't see it as a flaw that it
> will be possible to view user variables outside of their source code
> range.

I think this is where Alexandre's approach _might_ work.  (It at least
produces loads of funny DEBUG_INSNs ...)  I chose to ignore this
problem and say we debug the optimized program, not the source
as far as life ranges are concerned.

> In any case, RTL is different.  We can't reasonably associate
> annotations with pseudo-registers, because they change during the
> function.  The obvious choices are to annotate SET statements, or to
> annotate insns, or to introduce a DEBUG insn as you suggest.  It's not

We "annotate" SET insns by adding a bitmap argument to track user
variables it sets.  That seems to work nicely.

> obvious to me why a DEBUG insn is superior to a REG_NOTE attached to
> an insn.  The problem with DEBUG insns is of course that the RTL code
> is very sensitive to new insns, and also the additional memory usage.
> You discuss those, but it's not obvious to me why your proposed
> solution is the best one.
>
>
> > Testing for accuracy and completeness of debug information can be best
> > accomplished using a debugging environment.
>
> Of course this is very unsatisfactory without an automated testsuite.

I was thinking of pulling in the gdb testsuite harness into gcc...

Richard.

* Re: Designs for better debug info in GCC
  2007-12-17 20:43                                             ` Alexandre Oliva
  2007-12-17 21:20                                               ` Diego Novillo
@ 2007-12-31 14:45                                               ` Richard Guenther
  1 sibling, 0 replies; 189+ messages in thread
From: Richard Guenther @ 2007-12-31 14:45 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Daniel Berlin, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, gcc-patches, gcc, Michael Matz

On Dec 17, 2007 9:28 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:
>
> > On 12/17/07 12:51, Alexandre Oliva wrote:
> >> I guess I'm to blame, for having naïvely put the code out without as
> >> much as a design and goals document
>
> > Yes, you are.
>
> Wow, thanks.  At least we agree on something! ;-)
>
> > You need to provide such a document now.
>
> Can't I instead provide it when it's ready?
>
> You know, it wasn't me who asked to have the thing developed in the
> open.  I didn't push it out just so that people who didn't want to
> understand it could beat on it before it was ready to defend itself.
> I put it out because there was an offer for contribution.

Yeah - that was me...

Fact is we had a discussion about debug information earlier this year, from
which I took the conclusion that most people would appreciate an on-the-side
representation to address the most limiting design issue of GCC's tree
representation (only one variable per SSA_NAME to track).

So I had the impression you worked in that direction and offered help.  Now,
you seem to have come to the conclusion that this approach would not help your
goal and started on a different route.  The "mistake" was maybe not to revive
the former discussion, based on your findings, and elaborate on your goals
before starting this.  (I realize this is the way development for GCC works
most of the time, but this is not what I consider good practice for open
source development.)

Now - I think your goal is valid, and the choice of implementation might even be
the best one for it.  But we (the GCC community) have not yet decided if the
combination of "your goal" and "this best implementation" is what we want.
(I haven't decided myself either ;))

So my suggestion for you is to continue with your implementation and produce a
white paper about your design (which you ideally would present during the next
GCC summit, where we should have a discussion on this topic in some form).

We (myself and Matz) will continue to implement what is "our goal" (because we
internally committed to it, and to see limitations or problems with the
approach) and possibly will also present its outcome at the summit.

Thanks,
Richard.

* Re: Designs for better debug info in GCC
  2007-12-22  0:07                                                                                     ` Ian Lance Taylor
  2007-12-22  0:09                                                                                       ` Andrew Pinski
@ 2007-12-23 17:40                                                                                       ` Frank Ch. Eigler
  1 sibling, 0 replies; 189+ messages in thread
From: Frank Ch. Eigler @ 2007-12-23 17:40 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, gcc

Ian Lance Taylor <iant@google.com> writes:

> [...]  Because a compiler that generates incorrect instructions is
> completely useless for all users.

Surely you overstate this: gcc has always included a generous serving
of incorrect-code-generation bugs.

> A compiler that generates incorrect debug information, or no debug
> information at all, or debug information which is randomly correct
> and incorrect, is still quite useful for many users.  Evidence: gcc
> today.

Indeed.

> [...] Like it or not, the large size of debug information is a
> serious issue for many people.

It is profoundly ironic that, despite the great bulk of this data, its
quality has severe enough blemishes that people can't justify
installing/using it.  If it were a little larger but significantly
more complete/correct, perhaps the cost/benefit judgement would swing
around.

Coincidentally, we (several RH engineers) are working on dwarf data
compression.

- FChE

* Re: Designs for better debug info in GCC
  2007-12-22 11:44                                                                                           ` Chris Lattner
@ 2007-12-22 21:27                                                                                             ` Ian Lance Taylor
  0 siblings, 0 replies; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-22 21:27 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Andrew Pinski, Alexandre Oliva, gcc

Chris Lattner <clattner@apple.com> writes:

> If debug info size and link time is really such a serious problem for
> so many users, perhaps people developing the gnu toolchain should
> investigate an extension like this.

I'm in favor of implementing this.  As I'm sure you know, the GNU
binutils and gdb already support using a single separate file for
debugging information.

But these approaches do not solve all problems.  The technique works
well during development for a program which is normally run on the
same system on which it is developed.  It doesn't help much when the
program must be run on a different system--it's possible to use
gdbserver, but awkward.  And it doesn't help at all when it is
sometimes necessary to debug executables which have been built and
distributed widely.

Ian

* Re: Designs for better debug info in GCC
  2007-12-22 13:33                                                                                     ` Andrew Haley
@ 2007-12-22 17:11                                                                                       ` Robert Dewar
  0 siblings, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-12-22 17:11 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Alexandre Oliva, Ian Lance Taylor, gcc

Andrew Haley wrote:

> We know you don't understand, but that isn't likely to change.  Would
> it not surely be better to cease this pointless argument and get on
> with the job of improving debuginfo?  This absolutist position you
> seem to have adopted isn't helping.
> 
> If we could talk about "better" and "worse" rather than "correct" and
> "incorrect" we'd get much further.

I very much agree. Everyone is in favor of better debug information
if it is not too costly; we won't really see whether it is too costly
until we get some real data. But trying to argue for this in terms
of standards and conformance is a real red herring; the proper argument
for any improvement to debug information is utility.

* Re: Designs for better debug info in GCC
  2007-12-21 22:46                                                                                   ` Alexandre Oliva
  2007-12-22  0:07                                                                                     ` Ian Lance Taylor
  2007-12-22  7:38                                                                                     ` Robert Dewar
@ 2007-12-22 13:33                                                                                     ` Andrew Haley
  2007-12-22 17:11                                                                                       ` Robert Dewar
  2 siblings, 1 reply; 189+ messages in thread
From: Andrew Haley @ 2007-12-22 13:33 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, gcc

Alexandre Oliva writes:
 > On Dec 21, 2007, Ian Lance Taylor <iant@google.com> wrote:
 > 
 > 
 > > Alexandre, I have to say that in my opinion absurd arguments like this
 > > do not strengthen your position.
 > 
 > I'm sorry that you feel that way, but I don't understand why you and
 > so many others apply different compliance standards to debug
 > information.

We know you don't understand, but that isn't likely to change.  Would
it not surely be better to cease this pointless argument and get on
with the job of improving debuginfo?  This absolutist position you
seem to have adopted isn't helping.

If we could talk about "better" and "worse" rather than "correct" and
"incorrect" we'd get much further.

Andrew.

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

* Re: Designs for better debug info in GCC
  2007-12-22  3:16                                                                                         ` Andrew Pinski
@ 2007-12-22 11:44                                                                                           ` Chris Lattner
  2007-12-22 21:27                                                                                             ` Ian Lance Taylor
  0 siblings, 1 reply; 189+ messages in thread
From: Chris Lattner @ 2007-12-22 11:44 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Ian Lance Taylor, Alexandre Oliva, gcc

On Dec 21, 2007, at 4:09 PM, Andrew Pinski wrote:

> On 12/21/07, Andrew Pinski <pinskia@gmail.com> wrote:
>> On 21 Dec 2007 16:02:38 -0800, Ian Lance Taylor <iant@google.com> wrote:
>>> Like it or not, the large size of debug information is a serious issue
>>> for many people.
>>
>> Link times are hurt by large size of debugging information.  I have
>> many many complaints from some users of the PS3 toolchain that link
>> times are huge and from my investigation, found the size of the
>> debugging info contributed to most (if not all) of the increased link
>> times.
>
> I forgot to mention the increase in debugging information about
> prologue and epilogue (made by RTH) between 4.0.2 and 4.1.1 made the
> link time increase a huge amount.

It's worth noting that not all systems store debug information in  
executables.  On Mac OS 10.5, the linker leaves debug info in the .o  
files instead of copying it into the executable.  As such, size of  
debug info doesn't significantly affect link-time or executable size  
(but it can obviously affect time to launch the debugger).  I'm sure  
there are other systems that do similar things.

If debug info size and link time is really such a serious problem for  
so many users, perhaps people developing the gnu toolchain should  
investigate an extension like this.

-Chris

* Re: Designs for better debug info in GCC
  2007-12-21 22:46                                                                                   ` Alexandre Oliva
  2007-12-22  0:07                                                                                     ` Ian Lance Taylor
@ 2007-12-22  7:38                                                                                     ` Robert Dewar
  2007-12-22 13:33                                                                                     ` Andrew Haley
  2 siblings, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-12-22  7:38 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, gcc

Alexandre Oliva wrote:

> I'm sorry that you feel that way, but I don't understand why you and
> so many others apply different compliance standards to debug
> information.  Why do you regard compiler output that causes systems to
> fail because they process incorrect debug information as any more
> acceptable than compiler output that causes systems to fail because
> they process incorrect instructions?

Incorrect debug output does not cause systems to fail in any
reasonable development methodology. It is simply a nuisance.
After all you can perfectly well develop an application without
a debugger at all if you have to, but you have to have correct
code being generated or things are MUCH harder.

I am all in favor of getting the debug information as accurate as
possible, but I agree with others who feel that this excessive
rhetoric is damaging the cause of achieving this. If you don't
understand why different compliance standards are applied in
the two cases, then there is something major you are missing.

> Just so that you, who don't care so much about the correctness of this
> information yet, can shave off some bytes from your object files?  Why
> shouldn't you use an option such as -gimme-just-what-I-need-no-more or
> -fsck-up-my-debug-info-I-dont-care-about-standards instead?

I am beginning to think this is a lost cause if you persist in
taking this flippant attitude, and fail to understand the basis
of the real concerns about what you propose.

* Re: Designs for better debug info in GCC
  2007-12-22  0:09                                                                                       ` Andrew Pinski
@ 2007-12-22  3:16                                                                                         ` Andrew Pinski
  2007-12-22 11:44                                                                                           ` Chris Lattner
  0 siblings, 1 reply; 189+ messages in thread
From: Andrew Pinski @ 2007-12-22  3:16 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, gcc

On 12/21/07, Andrew Pinski <pinskia@gmail.com> wrote:
> On 21 Dec 2007 16:02:38 -0800, Ian Lance Taylor <iant@google.com> wrote:
> > Like it or not, the large size of debug information is a serious issue
> > for many people.
>
> Link times are hurt by large size of debugging information.  I have
> many many complaints from some users of the PS3 toolchain that link
> times are huge and from my investigation, found the size of the
> debugging info contributed to most (if not all) of the increased link
> times.

I forgot to mention the increase in debugging information about
prologue and epilogue (made by RTH) between 4.0.2 and 4.1.1 made the
link time increase a huge amount.

This is just an example of where increased debugging information hurts
development time.

Thanks,
Andrew Pinski

* Re: Designs for better debug info in GCC
  2007-12-22  0:07                                                                                     ` Ian Lance Taylor
@ 2007-12-22  0:09                                                                                       ` Andrew Pinski
  2007-12-22  3:16                                                                                         ` Andrew Pinski
  2007-12-23 17:40                                                                                       ` Frank Ch. Eigler
  1 sibling, 1 reply; 189+ messages in thread
From: Andrew Pinski @ 2007-12-22  0:09 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, gcc

On 21 Dec 2007 16:02:38 -0800, Ian Lance Taylor <iant@google.com> wrote:
> Like it or not, the large size of debug information is a serious issue
> for many people.

Link times are hurt by large size of debugging information.  I have
many many complaints from some users of the PS3 toolchain that link
times are huge and from my investigation, found the size of the
debugging info contributed to most (if not all) of the increased link
times.

Thanks,
Andrew Pinski

* Re: Designs for better debug info in GCC
  2007-12-21 22:46                                                                                   ` Alexandre Oliva
@ 2007-12-22  0:07                                                                                     ` Ian Lance Taylor
  2007-12-22  0:09                                                                                       ` Andrew Pinski
  2007-12-23 17:40                                                                                       ` Frank Ch. Eigler
  2007-12-22  7:38                                                                                     ` Robert Dewar
  2007-12-22 13:33                                                                                     ` Andrew Haley
  2 siblings, 2 replies; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-22  0:07 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> > Alexandre, I have to say that in my opinion absurd arguments like this
> > do not strengthen your position.
> 
> I'm sorry that you feel that way, but I don't understand why you and
> so many others apply different compliance standards to debug
> information.  Why do you regard compiler output that causes systems to
> fail because they process incorrect debug information as any more
> acceptable than compiler output that causes systems to fail because
> they process incorrect instructions?

Because a compiler that generates incorrect instructions is completely
useless for all users.  A compiler that generates incorrect debug
information, or no debug information at all, or debug information
which is randomly correct and incorrect, is still quite useful for
many users.  Evidence: gcc today.

I have to say that I find your arguments along these lines to be so
absurd as to be nearly incomprehensible.  gcc does not exist to adhere
to standards.  It exists to provide a service to its users.  I and so
many others apply different compliance standards to debug information
because that is appropriate for our user base.

> Do you just not see how serious the problem is, or just not care about
> the growing number of tools and people who need the information to be
> standard-compliant?

Do you just not see that your false dichotomies have nothing to do
with the real usage of gcc in the real world?  Is anybody out there
saying that we should absolutely not improve the debug information?
No, of course not.  All serious people are in favor of improving the
debug information.  We are just saying that for debug information it
is appropriate to weigh different user needs.  Those needs include
compilation time and size of generated files.  This is not true for
correctness of generated code.  There is no such weighing in that
area; the generated code must be correct or the compiler is completely
useless.

> > What we sacrifice in these cases is the ability to sometimes get a
> > correct view of at most two or three local variables being modified in
> > the exact statement being executed at the time of the signal.
> 
> Aren't you forgetting that complex statements and scheduling can make
> it much worse than this?  In fact, that there can be very many "active
> statements" at any single point in the code (and this is even more
> critical on some architectures such as IA64), and that, in these
> cases, your suggested notion of "line notes" is pretty much
> meaningless, for they will be present between pretty much every pair
> of statements anyway?

Fortunately not every single instruction is going to change a user
visible variable.  But, yes, that is a potential issue.  We will have
to see what the effect is on debug information size.

> > Moreover, a tool which reads the debug information can determine that
> > it is looking at instructions in the middle of the statement, and that
> > therefore the known locations of local variables need not be correct.
> > So in fact we don't even lose the ability to get a correct view.  What
> > we lose is the ability to in some cases see a value which actually is
> > available, but which the debugging tool can not prove to be available.
> 
> Feel like proposing this "relaxed mode" to the DWARF standardization
> committee?  At least an annotation that tells debug info consumers not
> to trust fully the information encoded there, because it's only valid
> at instructions marked with the "is_stmt" flag, or some such.

No, my personal interest in standardization of debugging information
is near-zero.

> Why do you want -g to generate incorrect debug information, and force
> debug information consumers that have use cases different than yours,
> and distributors of such debug information, to decide between changing
> their build procedures to get what the compiler should have long given
> them, or living with unreliable information?

I guess it must be because I'm an extremist who only cares about
one thing, and I have no interest in considering issues that other
people might care about.  What other possible explanation could there
be?

> Just so that you, who don't care so much about the correctness of this
> information yet, can shave off some bytes from your object files?  Why
> shouldn't you use an option such as -gimme-just-what-I-need-no-more or
> -fsck-up-my-debug-info-I-dont-care-about-standards instead?

First, we add the option.  Second, we see what the results look like.
Third, we decide what the default should be.

Like it or not, the large size of debug information is a serious issue
for many people.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-21 19:32                                                                                 ` Ian Lance Taylor
@ 2007-12-21 22:46                                                                                   ` Alexandre Oliva
  2007-12-22  0:07                                                                                     ` Ian Lance Taylor
                                                                                                       ` (2 more replies)
  0 siblings, 3 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-21 22:46 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

On Dec 21, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:
>> On Dec 21, 2007, Ian Lance Taylor <iant@google.com> wrote:
>> 
>> >> Why would code, essential for debug information consumers that are
>> >> part of larger systems to work correctly, deserve any less attention
>> >> to correctness?
>> 
>> > Because for most people the use of debug information is to use it in a
>> > debugger.
>> 
>> Emitting incorrect debug information that most people wouldn't use
>> anyway is like breaking only the template instantiations that most
>> people wouldn't use anyway.
>> 
>> Would you defend the latter position?

> Alexandre, I have to say that in my opinion absurd arguments like this
> do not strengthen your position.

I'm sorry that you feel that way, but I don't understand why you and
so many others apply different compliance standards to debug
information.  Why do you regard compiler output that causes systems to
fail because they process incorrect debug information as any more
acceptable than compiler output that causes systems to fail because
they process incorrect instructions?

Do you just not see how serious the problem is, or just not care about
the growing number of tools and people who need the information to be
standard-compliant?

> What we sacrifice in these cases is the ability to sometimes get a
> correct view of at most two or three local variables being modified in
> the exact statement being executed at the time of the signal.

Aren't you forgetting that complex statements and scheduling can make
it much worse than this?  In fact, that there can be very many "active
statements" at any single point in the code (and this is even more
critical on some architectures such as IA64), and that, in these
cases, your suggested notion of "line notes" is pretty much
meaningless, for they will be present between pretty much every pair
of statements anyway?

> Programmers can reasonably select a trade-off between larger debug
> information size and the ability to correctly inspect local
> variables when they asynchronously examine a program.

I don't have a problem with permitting people to make this trade-off,
as long as the information we generate is still arguably correct
(i.e., not necessarily in what I understand as correct), even if it is
incomplete.  I just don't see where to draw a line that makes sense to
me.

> Moreover, a tool which reads the debug information can determine that
> it is looking at instructions in the middle of the statement, and that
> therefore the known locations of local variables need not be correct.
> So in fact we don't even lose the ability to get a correct view.  What
> we lose is the ability to in some cases see a value which actually is
> available, but which the debugging tool can not prove to be available.

Feel like proposing this "relaxed mode" to the DWARF standardization
committee?  At least an annotation that tells debug info consumers not
to trust fully the information encoded there, because it's only valid
at instructions marked with the "is_stmt" flag, or some such.

> It appears to me that you think that there is a binary choice between
> debugging information that is correct by your definition and debugging
> information that is incorrect.  That is a false dichotomy.  There are
> many gradations of debugging information that are useful.  For
> example, I don't know what your position on -g1 is, but certainly many
> people find it to be useful and practical, just as many people find
> -g0 and -g2 to be useful and practical.  Presumably some people also
> find -g3 to be useful, although I don't know any of them myself.
> Correctness of debugging information is not a binary characteristic.

But this paragraph above is not about correctness, it's about
completeness.  -g0 is less complete than -g1 is less complete than -g2
is less complete than -g3.  They all have their uses, but they can all
be compliant with the debug information standards, because what they
leave out is optional information.

What you're proposing is something else.  It's not about leaving out
information that is specified as optional in the standard.  It's about
emitting information, rather than leaving it out, and emitting it in a
way that is non-compliant with the standard, which makes it misleading
and error-prone to debug information consumers that have no reason to
suspect it might be wrong.

And all this just because emitting correct and more complete
information would make it larger, but we don't even know by how much.

What are you trying to accomplish?

Why do you want -g to generate incorrect debug information, and force
debug information consumers that have use cases different than yours,
and distributors of such debug information, to decide between changing
their build procedures to get what the compiler should have long given
them, or living with unreliable information?

Just so that you, who don't care so much about the correctness of this
information yet, can shave off some bytes from your object files?  Why
shouldn't you use an option such as -gimme-just-what-I-need-no-more or
-fsck-up-my-debug-info-I-dont-care-about-standards instead?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-21 18:12                                                                               ` Alexandre Oliva
@ 2007-12-21 19:32                                                                                 ` Ian Lance Taylor
  2007-12-21 22:46                                                                                   ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-21 19:32 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> On Dec 21, 2007, Ian Lance Taylor <iant@google.com> wrote:
> 
> >> Why would code, essential for debug information consumers that are
> >> part of larger systems to work correctly, deserve any less attention
> >> to correctness?
> 
> > Because for most people the use of debug information is to use it in a
> > debugger.
> 
> Emitting incorrect debug information that most people wouldn't use
> anyway is like breaking only the template instantiations that most
> people wouldn't use anyway.
> 
> Would you defend the latter position?

Alexandre, I have to say that in my opinion absurd arguments like this
do not strengthen your position.  I think they make it weaker, because
it encourages people like me--the people you have to convince--to
write you off as somebody more interested in rhetoric than in actual
thought.

> > Even the use you mentioned of doing backtraces only requires adding
> > the notes around function calls, not around every line, unless you
> > enable -fnon-call-exceptions.
> 
> Asynchronous signals, anyone?
> 
> Asynchronous attachment to processes for inspection?
> 
> Inspection at random points in time?

What we sacrifice in these cases is the ability to sometimes get a
correct view of at most two or three local variables being modified in
the exact statement being executed at the time of the signal.  When I
say "correct view" here I mean that sometimes the tools will see the
wrong value for a variable, when the truth is that they should see
that the variable's value is unavailable.  We do not sacrifice
anything about the ability to look at variables declared in functions
higher up in the stack frame.  Programmers can reasonably select a
trade-off between larger debug information size and the ability to
correctly inspect local variables when they asynchronously examine a
program.

Moreover, a tool which reads the debug information can determine that
it is looking at instructions in the middle of the statement, and that
therefore the known locations of local variables need not be correct.
So in fact we don't even lose the ability to get a correct view.  What
we lose is the ability to in some cases see a value which actually is
available, but which the debugging tool can not prove to be available.
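
A minimal sketch of that check, assuming a simplified, already-decoded
line table.  The struct layout, the function name and the sample
addresses are hypothetical; a real consumer would obtain the rows by
running the DWARF line number program, which carries an is_stmt flag
per row.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, already-decoded row of a line table. */
struct line_row {
    unsigned long address;  /* address of the first instruction of the row */
    unsigned line;          /* source line number */
    bool is_stmt;           /* true if this address begins a statement */
};

/* Return true if `pc` is the start of a statement, i.e. a point where
   variable locations are promised to be correct under the "correct at
   line notes" scheme.  Anywhere else the consumer should treat the
   recorded locations as possibly stale. */
bool at_statement_start(const struct line_row *rows, size_t n,
                        unsigned long pc)
{
    for (size_t i = 0; i < n; i++)
        if (rows[i].address == pc && rows[i].is_stmt)
            return true;
    return false;
}

/* Made-up table: line 10 starts at 0x100 and continues at 0x104,
   line 11 starts at 0x10c. */
static const struct line_row sample_rows[] = {
    { 0x100, 10, true  },
    { 0x104, 10, false },
    { 0x10c, 11, true  },
};
```

With this table, a tool stopped at 0x104 or 0x108 would know it is in
the middle of line 10 and could report affected locals as "possibly
unavailable" rather than showing a wrong value.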

> > If you want to work on supporting this controlled by an option (-g4?),
> > that is fine with me.
> 
> So, how would you document -g2?  Generate debug information that is
> thoroughly broken, but that is hopefully good enough for some limited
> and dated scenarios of debugging?
> 
> And, more importantly, how would you go about introducing something
> that provides more meaningful information than the current
> (non-?)design does, but that discards just the right amount of
> information so as to keep debug information just barely enough for
> debugging, but without discarding too much?
> 
> In other words, how do you draw the line, algorithmically speaking?

I already told you one perfectly good place to draw the line: make
variable location information correct at line notes.  That suffices
for many practical uses.  And I already said that I'm willing to see
an option to permit more precise debugging information.

It appears to me that you think that there is a binary choice between
debugging information that is correct by your definition and debugging
information that is incorrect.  That is a false dichotomy.  There are
many gradations of debugging information that are useful.  For
example, I don't know what your position on -g1 is, but certainly many
people find it to be useful and practical, just as many people find
-g0 and -g2 to be useful and practical.  Presumably some people also
find -g3 to be useful, although I don't know any of them myself.
Correctness of debugging information is not a binary characteristic.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-21  5:10                                                                             ` Ian Lance Taylor
@ 2007-12-21 18:12                                                                               ` Alexandre Oliva
  2007-12-21 19:32                                                                                 ` Ian Lance Taylor
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-21 18:12 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: janis187, gcc

On Dec 21, 2007, Ian Lance Taylor <iant@google.com> wrote:

>> Why would code, essential for debug information consumers that are
>> part of larger systems to work correctly, deserve any less attention
>> to correctness?

> Because for most people the use of debug information is to use it in a
> debugger.

Emitting incorrect debug information that most people wouldn't use
anyway is like breaking only the template instantiations that most
people wouldn't use anyway.

Would you defend the latter position?

> Even the use you mentioned of doing backtraces only requires adding
> the notes around function calls, not around every line, unless you
> enable -fnon-call-exceptions.

Asynchronous signals, anyone?

Asynchronous attachment to processes for inspection?

Inspection at random points in time?

Debugging is changing.  Please stop assuming the only use for debug
information is for interactive debugging sessions like those provided
by GDB.  Debug information specifications/standards should be on par
with language, ABI and ISA specifications/standards.

> If you want to work on supporting this controlled by an option (-g4?),
> that is fine with me.

So, how would you document -g2?  Generate debug information that is
thoroughly broken, but that is hopefully good enough for some limited
and dated scenarios of debugging?

And, more importantly, how would you go about introducing something
that provides more meaningful information than the current
(non-?)design does, but that discards just the right amount of
information so as to keep debug information just barely enough for
debugging, but without discarding too much?

In other words, how do you draw the line, algorithmically speaking?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-21  2:11                                                                           ` Alexandre Oliva
  2007-12-21  3:16                                                                             ` Robert Dewar
@ 2007-12-21  5:10                                                                             ` Ian Lance Taylor
  2007-12-21 18:12                                                                               ` Alexandre Oliva
  1 sibling, 1 reply; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-21  5:10 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: janis187, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> On Dec 20, 2007, Ian Lance Taylor <iant@google.com> wrote:
> 
> > Right, which will significantly increase debugging size as you add two
> > more notes around many lines.
> 
> If that's the price to avoid debug information consumers getting
> incorrect values...
> 
> Would you argue for a position such as:
> 
>   we can't go on expanding C++ templates for every conceivable type
>   users instantiate them, this would make applications too large.
>   let's try to figure out some way to reuse template expansions, even
>   if some programs break, because it's more important to keep programs
>   small than to enable them to behave correctly
> 
> ?

No, that would be an obviously stupid position to take.  I don't
understand why you even say such a thing.


> Why would code, essential for debug information consumers that are
> part of larger systems to work correctly, deserve any less attention
> to correctness?

Because for most people the use of debug information is to use it in a
debugger.  And for those people, correct information at line positions
suffices.

Even the use you mentioned of doing backtraces only requires adding
the notes around function calls, not around every line, unless you
enable -fnon-call-exceptions.

If you want to work on supporting this controlled by an option (-g4?),
that is fine with me.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-21  2:11                                                                           ` Alexandre Oliva
@ 2007-12-21  3:16                                                                             ` Robert Dewar
  2007-12-21  5:10                                                                             ` Ian Lance Taylor
  1 sibling, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-12-21  3:16 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, janis187, gcc

Alexandre Oliva wrote:
> On Dec 20, 2007, Ian Lance Taylor <iant@google.com> wrote:
> 
>> Right, which will significantly increase debugging size as you add two
>> more notes around many lines.
> 
> If that's the price to avoid debug information consumers getting
> incorrect values...

It may be an unacceptable price; the cost of an executable going
from 50 megabytes to 80 megabytes can be the difference between
handling the situation being practical and impractical.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-21  1:54                                                                         ` Ian Lance Taylor
       [not found]                                                                           ` <orprx0izhp.fsf@oliva.athome.lsd.ic.unicamp.br>
@ 2007-12-21  2:11                                                                           ` Alexandre Oliva
  2007-12-21  3:16                                                                             ` Robert Dewar
  2007-12-21  5:10                                                                             ` Ian Lance Taylor
  2007-12-31 19:39                                                                           ` Alexandre Oliva
  2 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-21  2:11 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: janis187, gcc

On Dec 20, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Right, which will significantly increase debugging size as you add two
> more notes around many lines.

If that's the price to avoid debug information consumers getting
incorrect values...

Would you argue for a position such as:

  we can't go on expanding C++ templates for every conceivable type
  users instantiate them, this would make applications too large.
  let's try to figure out some way to reuse template expansions, even
  if some programs break, because it's more important to keep programs
  small than to enable them to behave correctly

?

Why would code, essential for debug information consumers that are
part of larger systems to work correctly, deserve any less attention
to correctness?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20 21:38                                                                       ` Alexandre Oliva
@ 2007-12-21  1:54                                                                         ` Ian Lance Taylor
       [not found]                                                                           ` <orprx0izhp.fsf@oliva.athome.lsd.ic.unicamp.br>
                                                                                             ` (2 more replies)
  0 siblings, 3 replies; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-21  1:54 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: janis187, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> On Dec 20, 2007, Ian Lance Taylor <iant@google.com> wrote:
> 
> > It is technically feasible but problematic for other reasons.
> >     i = i * m + ((i / j) + k) / n;
> > On a two register machine like the x86 i will change several times
> > during that calculation.
> 
> No.  The register used to hold its initial value will.  Keep in mind
> the separation between user variables and implementation locations.
> The user variable 'i' is only supposed to change when the assignment
> operation is performed (even if only at a theoretical level), when
> the final value of the RHS is available and stored in the location
> then assigned to hold the value of variable 'i'.

OK, fair enough.

> Now, it is possible that the previous value of 'i' becomes unavailable
> while the expression is evaluated.  Then, in order to represent this
> correctly, we just have to note that 'i' is no longer available as
> soon as all locations holding its original value are clobbered, and
> that it's available again when its new location holds the assigned
> value.

Right, which will significantly increase debugging size as you add two
more notes around many lines.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20 16:52                                                                     ` Ian Lance Taylor
@ 2007-12-20 21:38                                                                       ` Alexandre Oliva
  2007-12-21  1:54                                                                         ` Ian Lance Taylor
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-20 21:38 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: janis187, gcc

On Dec 20, 2007, Ian Lance Taylor <iant@google.com> wrote:

> It is technically feasible but problematic for other reasons.
>     i = i * m + ((i / j) + k) / n;
> On a two register machine like the x86 i will change several times
> during that calculation.

No.  The register used to hold its initial value will.  Keep in mind
the separation between user variables and implementation locations.
The user variable 'i' is only supposed to change when the assignment
operation is performed (even if only at a theoretical level), when
the final value of the RHS is available and stored in the location
then assigned to hold the value of variable 'i'.

Now, it is possible that the previous value of 'i' becomes unavailable
while the expression is evaluated.  Then, in order to represent this
correctly, we just have to note that 'i' is no longer available as
soon as all locations holding its original value are clobbered, and
that it's available again when its new location holds the assigned
value.
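
A rough sketch of what such a record could look like, assuming a
hypothetical location-list model in which each entry says "over this
address range, the variable lives in this register".  The struct, the
function name, the register numbers and the addresses are all made up
for illustration; this is not the encoding GCC emits.

```c
#include <stddef.h>

/* Hypothetical location-list entry: over the half-open address range
   [start, end), the variable's value is held in register `reg`. */
struct loc_range {
    unsigned long start;
    unsigned long end;
    int reg;
};

/* Return the register holding the variable at `pc`, or -1 if no entry
   covers `pc`, which means the value is unavailable at that point. */
int lookup_location(const struct loc_range *list, size_t n,
                    unsigned long pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= list[i].start && pc < list[i].end)
            return list[i].reg;
    return -1;
}

/* Made-up entries for `i` around
       i = i * m + ((i / j) + k) / n;
   The old value sits in register 0 until that register is clobbered
   at 0x10; the new value is in register 3 from 0x28 onwards.  The gap
   in between is where 'i' is unavailable. */
static const struct loc_range i_locations[] = {
    { 0x00, 0x10, 0 },
    { 0x28, 0x40, 3 },
};
```

The two entries bracketing the gap are the kind of extra notes whose
size cost is being debated: a consumer asking about 0x20 gets
"unavailable" instead of whatever happens to be in register 0.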

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20 16:44                                                                       ` Ian Lance Taylor
@ 2007-12-20 20:42                                                                         ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-20 20:42 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Daniel Berlin, Diego Novillo, Mark Mitchell, Robert Dewar,
	Richard Guenther, gcc-patches, gcc

On Dec 20, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:
>> > How do i know i need to change this DEBUG expression.
>> 
>> As reassoc looks for sets of variables it can freely mess with, it
>> should take note of variables that are used in debug annotations in
>> addition to the kind of single (?) non-debug uses it's interested in,
>> such that, when it modifies these variables, the annotations can be
>> compensated for.

> The question is how it finds them efficiently, without doing a scan of
> all instructions.

It must keep track of variables it can mess with, so it might as well
take notes about those it has to be more careful about.

*Or* it can just introduce new temporaries, rename the uses and leave
the original sets behind for "garbage collection" AKA dead code
elimination, like I said.

One is more implementation work, the other is potentially more
wasteful in terms of memory use.  Neither looks particularly hard to me.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20  8:00                                                               ` Alexandre Oliva
  2007-12-20  8:01                                                                 ` Alexandre Oliva
@ 2007-12-20 17:02                                                                 ` Ian Lance Taylor
  1 sibling, 0 replies; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-20 17:02 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> > And it will avoid the problem of turning the testsuite into a
> > regression testsuite rather than an accuracy testsuite.
> 
> Sorry, I don't understand what you mean here.

It's not a major point.

When one adds a testsuite to working code, it is natural to write
tests that expect to see what the code generates.  The risk is that
any change to the code causes the test to fail.  This is the essence
of a regression testsuite.  For an example, see the linker testsuite
in the binutils.  Practically any change to the linker, correct or
not, causes some tests to fail.

An accuracy testsuite is one written independently of the code.  It
tests for the specific features that are desired, rather than testing
for what the code currently does.

Of course you can write an accuracy testsuite with working code.  It's
just much easier to write a regression testsuite, and it's easy to
backslide into that.
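
To make the distinction concrete, here is a small sketch with a
made-up function under test.  All names are hypothetical and this has
nothing to do with the actual GCC or binutils testsuites.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical code under test: describes the sum of two numbers. */
void describe_sum(int a, int b, char *buf, size_t len)
{
    snprintf(buf, len, "sum: %d", a + b);
}

/* Regression-style test: expects exactly what the code prints today.
   Rewording the message, correct or not, makes it fail. */
int regression_test(void)
{
    char buf[64];
    describe_sum(2, 3, buf, sizeof buf);
    return strcmp(buf, "sum: 5") == 0;
}

/* Accuracy-style test: checks only the desired feature, namely that
   the number reported is the right sum, wherever it appears. */
int accuracy_test(void)
{
    char buf[64];
    describe_sum(2, 3, buf, sizeof buf);
    const char *digits = strpbrk(buf, "-0123456789");
    return digits != NULL && strtol(digits, NULL, 10) == 5;
}
```

Both tests pass against the current code, but only the second would
survive a harmless change of the output format.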

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20  6:10                                                                   ` Alexandre Oliva
@ 2007-12-20 16:52                                                                     ` Ian Lance Taylor
  2007-12-20 21:38                                                                       ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-20 16:52 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: janis187, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> On Dec 19, 2007, Ian Lance Taylor <iant@google.com> wrote:
> 
> > For some things, sure, but we are just talking about the values in
> > user visible variables stored in registers.  There is no way we can
> > make that information be correct between line notes.
> 
> Err...  I think there is, and one way to do it is with the design I've
> proposed.  Do you have anything to back up your implied assertion that
> the design can't accomplish this?

It is technically feasible but problematic for other reasons.
    i = i * m + ((i / j) + k) / n;
On a two register machine like the x86 i will change several times
during that calculation.  You could issue debug notes making it
correct at every machine instruction.  But that would balloon the
amount of debug info that we generate, for near-zero gain in real
usability of the debugger.  We already generate huge amounts of debug
info--a typical C++ executable has more debug info than text and data
combined.  Increasing the amount of debug info significantly, for
little gain, is contraindicated.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20  5:16                                                                     ` Alexandre Oliva
@ 2007-12-20 16:44                                                                       ` Ian Lance Taylor
  2007-12-20 20:42                                                                         ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-20 16:44 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Diego Novillo, Mark Mitchell, Robert Dewar,
	Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> > How do i know i need to change this DEBUG expression.
> 
> As reassoc looks for sets of variables it can freely mess with, it
> should take note of variables that are used in debug annotations in
> addition to the kind of single (?) non-debug uses it's interested in,
> such that, when it modifies these variables, the annotations can be
> compensated for.

The question is how it finds them efficiently, without doing a scan of
all instructions.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20  8:00                                                               ` Alexandre Oliva
@ 2007-12-20  8:01                                                                 ` Alexandre Oliva
  2007-12-20 17:02                                                                 ` Ian Lance Taylor
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-20  8:01 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

On Dec 20, 2007, Alexandre Oliva <aoliva@redhat.com> wrote:

> I'm addressing this in a bit more detail in a revised version of the
> spec, that I intend to publish in the GCC wiki RSN.

http://gcc.gnu.org/wiki/Var_Tracking_Assignments

Here's a diff between the version I posted a couple of days ago and
the one before I started adjusting formatting for better rendering in
the wiki.

Index: debug-var-loc.txt
===================================================================
RCS file: /home/aoliva/.cvs/txt/free-software/gcc/debug-var-loc.txt,v
retrieving revision 1.2
retrieving revision 1.4
diff -u -p -d -u -p -r1.2 -r1.4
--- debug-var-loc.txt	18 Dec 2007 08:03:42 -0000	1.2
+++ debug-var-loc.txt	20 Dec 2007 07:32:56 -0000	1.4
@@ -34,54 +34,66 @@ optimization passes discard information 
 emit correct and complete variable location lists.
 
 Coalescing, scalarizing, substituting, propagating, and many other
-transformations prevent the late-running variable tracker from doing
-an accurate job.  By the time it runs, many variables no longer show
-up in the retained annotations, although they're still conceptually
-available.
+transformations prevent the late-running variable tracker from doing a
+complete or even accurate job.  By the time it runs, many variables no
+longer show up in the retained annotations, although they're still
+conceptually available.
 
-The variable tracker can't tell when a user variable overlaps with
-another, and it can't tell when a variable is overwritten, if the
-assignment is optimized away.  These limitations are inherent to a
-model based on inspecting actual code and trying to make inferences
-from that.  In order to be able to represent not only what remained in
-the code, but also what was optimized, combined or otherwise
-apparently-removed, additional information needs to be kept around.
+The variable tracker can't handle sharing of a location by multiple
+user variables, multiple active locations for the same variable, and
+it can't tell when a variable is overwritten, if the assignment is
+optimized away.  This last limitation is inherent to a model based on
+inspecting only actual code, and trying to make inferences from that.
+In order to be able to represent not only what remained in the code,
+but also what was optimized, combined or otherwise apparently-removed,
+additional information needs to be kept around.
 
 This paper describes an approach to maintain this information.
 
 
 == Goals
 
-* Ensure that, for every user variable for which we emit debug
-information, the information is correct, i.e., if it says the value of
-a variable at a certain instruction is at certain locations, or is a
-known constant, then the variable must not be at any other location at
-that point, and the locations or values must match reasonable
-expectations based on source code inspection.
+=== Correctness
 
-* Defining "reasonable expectations" is tricky, for code reordering
-typical of optimization can make room for numerous surprises.  I don't
-have a precise definition for this yet, but very clearly to me saying
-that a variable holds a value that it couldn't possibly hold (e.g.,
-because it is only assigned that value in a code path that is
-knowingly not taken) is a very clear indication that something is
-amiss.  The general guiding rule is, if we aren't sure the information
-is correct (or we're sure it isn't), we shouldn't pretend that it is.
+Ensure that, for every user variable for which we emit debug
+information, the information is correct, i.e., if it provides
+locations or value expressions for a variable in a certain range of
+instructions, then, for all instructions in that range, the values
+specified in the debug information must match the value the user
+variable is bound to.
 
-* Try to ensure that, if the value of a variable is a known constant
+We say a variable is bound to a value when control flow crosses a
+theoretical instruction placed at the point of the program in which
+the user variable is, or should have been, assigned that value.
+This theoretical instruction is maintained roughly in place regardless
+of optimizations that move, remove or otherwise optimize any code
+generated to implement the source-level variable modification.  More
+details below, in the "scheduling and reordering" section.
+
+
+=== Completeness
+
+Try to ensure that, if the value of a variable is a known constant at
+a certain point in the program, this information is present in debug
+information.
+
+Try to ensure that, if the value of a variable is available at any
+location, or computable from values available at any other locations
 at a certain point in the program, this information is present in
 debug information.
 
-* Try to ensure that, if the value of a variable is available or
-computable at any location at a certain point in the program, this
-information is present in debug information.
 
-* Stop missing optimizations for the sake of preserving debug
-information.
+=== Run-time efficiency
 
-* Avoid using additional memory and CPU cycles that would be needed
-only for debug information when compiling without generating debug
-information
+Stop missing optimizations for the sake of preserving variable
+location debug information.
+
+
+=== Compile-time efficiency
+
+Avoid using additional memory and CPU cycles that would be needed only
+to generate debug information when compiling without generating debug
+information.
 
 
 == Internal Representation
@@ -118,9 +130,9 @@ most optimization passes, be handled jus
 Once this is established, a possible representation becomes almost
 obvious: statements (in trees) or instructions (in rtl) that assert,
 to the variable tracker, that a user variable or member is represented
-by a given expression:
+by a given expression, or that bind a user variable to a value:
 
-  # DEBUG var expr
+  # DEBUG var => expr
 
 By var, we mean a tree expression that denotes a user variable, for
 now.  We envision trivially extending it to support components of
@@ -128,103 +140,204 @@ variables in the future.
 
 By expr, we mean a tree or rtl expression that computes the value of
 the variable at the point in which the statement or instruction
-appears in the program.  A special value needs to be specified for
-each representation that denotes a location or value that cannot be
-determined or represented in debug information, for example, the
-location of a variable that was completely optimized away.  It might
-be useful to represent the expression as a list of expressions, and to
-distinguish lvalues from rvalues, but for now let's keep this simple.
+appears in the program, and that the variable is expected to hold
+until (i) execution crosses another such annotation for that variable,
+or (ii) the value becomes no longer computable, because all locations
+containing it or usable to compute it are no longer provably usable to
+compute it.  For example, if the variable is bound to the value of a
+certain hardware register, and the register is subsequently modified,
+but the bound value is not known to be available elsewhere, then the
+variable is regarded as unavailable at that point.
+
+A special value needs to be specified for each debug annotation
+representation that denotes an unavailable variable.  Although in some
+cases this condition can be detected implicitly, as described above,
+in others we must be able to describe that, at the point of the
+binding, the value that should be bound to the variable is not
+available, for example, because it was completely optimized away and
+it's not even computable any more, or because the compiler has been
+unable to represent or to keep track of the expected value of the
+variable at that point.
+
+Also, it might be useful to represent the expression as a list of
+expressions, to establish larger equivalence classes to begin with and
+to get better resistance against complete loss of values.  It may also
+be useful to distinguish lvalues from rvalues in the representation, but
+for now we're keeping it simpler, to see if we can make do without the
+additional complexity.
 
 
 == Generating debug information
 
 Generating initial annotations when entering SSA is early enough in
 the translation that the program will still reflect very reliably the
-original source code.  Annotations are only generated for user
-variables that are GIMPLE registers, i.e., variables that represent
-scalar values and that never have their address taken.  Other kinds of
-variables don't have varying locations, so we don't need to worry
-about them.
+original source code.  We will only emit such annotations for user
+variables that are GIMPLE registers, i.e., variables that are present in
+the source code, that are not addressable and that hold scalar values.
+Addressable or non-scalar user variables don't have varying locations,
+so we don't need these annotations to generate correct debug
+information for them.
 
-After every assignment to such a variable, we emit a DEBUG statement
-that will preserve, throughout compilation, the information that, at
-that point, the assigned variable was represented by that expression.
-So, after turning an assignment such as the following into SSA form,
-we emit the debug statement below right after it:
+As optimizations transform the code, the initially-trivial mapping
+between such user variables and implementation locations gets more and
+more fuzzy.  Even when the compiler retains mnemonic names that
+resemble user variable names for such implementation locations (GIMPLE
+registers, RTL pseudos, hardware registers and stack slots), it is
+important to keep in mind that source- and implementation concepts are
+in different name spaces, and that the implementation locations cannot
+be assumed to remain associated with the user variables they were
+initially named after.
+
+The purpose of the annotations is precisely to establish a mapping
+from user variables to implementation concepts without preventing
+optimizations.  The choice of focusing not so much on locations, but
+rather on values, is intended to minimize the impact of optimizations
+on the ability to represent the value a variable holds, which is what
+debug information consumers are most often interested in.  Actual
+locations are a slightly secondary issue, that we expect to be able to
+infer from the value binding annotations, but that may require more
+explicit annotations, as mentioned above.
+
+After every assignment to user variables that are GIMPLE registers, we
+emit a DEBUG statement that will preserve, throughout compilation, the
+information that, at that point, the user variable was bound to the
+value of that expression.  So, after putting an assignment such as the
+following in SSA form, we emit the debug statement below right after
+it:
 
   x_1 = whatever;
-  # DEBUG x x_1
+  # DEBUG x => x_1
 
-Likewise, at control flow merge points, for each PHI node we introduce
-in the SSA representation, we emit an annotation:
+Likewise, at control flow merge points, for each PHI node associated
+with a user variable we introduce in the SSA representation, we emit
+an annotation:
 
   # x_4 = PHI <x_1(3), x_2(4), x_3(7)>;
-  # DEBUG x x_4
+  # DEBUG x => x_4
 
 Then, we let tree optimizers do their jobs.  Whenever they rename,
 renumber, coalesce, combine or otherwise optimize a variable, they
-will automatically update debug statements that mention them as well.
+will most likely automatically update debug statements that mention
+them as well.
 
 In the rare cases in which the presence of such a statement might
 prevent an optimization, we need to adjust the optimizer code such
 that the optimization is not prevented.  This most often amounts to
-skipping or otherwise ignoring debug statements.  In a few very rare
-cases, special code might be needed to adjust debug statements
-manually.
+skipping or otherwise ignoring debug statements.  In a few rare cases,
+additional code might be needed specifically to adjust debug
+statements.
 
-After transformation to RTL, the representation needs translation, but
-conceptually it's still the same: a mapping from variable to
-expression.  Again, optimizers will most often adjust debug
-instructions automatically.
+During conversion to RTL, the debug statements also decay to debug
+instructions, and the tree value expressions are trivially converted
+to RTL.  Conceptually, however, it's still the same representation: a
+binding from user variable to expression.  RTL optimizers will most
+often adjust debug instructions automatically.
 
-The exceptions can be handled at no cost: the test for whether an
-element of the instruction stream is an instruction or some kind of
-note, that never needs updating, is a range test, in its optimized
+The exceptions can often be handled at no cost: the test for whether
+an element of the instruction stream is an instruction or some kind of
+note (that never needs updating) is a range test, in its optimized
 form.  By placing the identifier for a debug instruction at one of the
-limits of this range, testing for both ranges requires identical code,
-except for the constants.
+limits of this range, testing for ranges that include or exclude debug
+instructions requires identical code, except for the constants.
 
 Since most code that tests for INSN_P and handles instructions can and
 should match debug instructions as well, in order to keep them up to
 date, we extend INSN_P so as to match debug instructions, and modify
-the exceptions, that need to skip debug instructions, by using an
-alternate test, with the same meaning as the original definition of
-INSN_P.  These simple and non-intrusive changes are relatively common,
-but still, by far, the exception rather than the rule.
+the code in the exceptions, that need to skip debug instructions, by
+using an alternate test, with the same meaning as the original
+definition of INSN_P.  These simple and non-intrusive changes are
+relatively common, but still, by far, the exception rather than the
+rule.  As in tree level, there are transformations that require
+special handling of debug annotations, but these are even rarer.
 
 When optimizations are completed, including register allocation and
-scheduling, it is time to pick up the debug instructions and emit
-debug information out of them.  Conceptually, the debug instructions
-represent points of assignment, at which a user variable ought to
-evaluate to the annotated expression, maintained throughout
-compilation.  However, when the value of a variable is live at more
-than one location, it is important to note it, such that, if a
-debugging session attempts to modify the variable, all copies are
-modified.
+scheduling, it is time to take the data collected in debug
+instructions and emit debug information out of them.  Conceptually,
+the debug instructions represent points of assignment, at which a user
+variable ought to evaluate to the annotated expression, maintained
+throughout compilation.  However, when the value of a user variable is
+available at more than one location (think, for example, stack
+variable temporarily held also in a register), it is important to note
+it, such that, if a debugging session attempts to modify the variable,
+all copies are modified.
 
 The idea is to use some mechanism to determine equivalent expressions
 throughout a function (say some variant of Global Value Numbering).
 At debug instructions, we assert that the value of the named variable
-is in the equivalence class represented by the expression.  As we scan
+is in the equivalence class the expression belongs to.  As we scan
 basic blocks forward and find that expressions in an equivalence class
 are modified, we remove them from the equivalence class, and thus from
-the list of available locations for the variable.  When such
-expressions are further copied, we add them to equivalence classes.
-At function calls and volatile asm statements, we remove
-non-function-private memory slots from equivalence classes.  At
-function calls, we also remove call-clobbered registers from
-equivalence classes.  When no live expression remains in the
-equivalence class that represents a variable, it is understood that
-its value is no longer available.  At basic block confluences, we
-combine information from the end states of the incoming blocks and the
-debug statements added as a side effect of PHI nodes.
+the list of available locations for the variables that hold that
+value.  When members of an equivalence class are copied, we add the
+copies to the equivalence class.  At function calls and volatile asm
+statements, we remove non-function-private memory slots from
+equivalence classes.  At function calls, we also remove call-clobbered
+registers from all equivalence classes.  When no live expression
+remains in the equivalence class that represents a variable, it is
+understood that its value is no longer available.  At basic block
+confluences, we combine information from the end states of the
+incoming blocks and the block-entry debug statements that had been
+added after PHI nodes earlier.
 
-The end result is accurate debug information.  Also, except for
-transformations that require special handling to update debug
-annotations properly, debug information should come out as complete as
+When multiple variables are held in the same equivalence class, some
+care must be taken to determine which locations can be used as
+modifiable copies of a variable and which hold incidental copies.
+More investigation is needed to design strategies to make this
+partitioning, such that the end result is accurate debug information.
+
+Also, except for transformations that require special handling to
+update debug annotations properly but that haven't been improved
+accordingly, debug information should come out as complete as
 possible.
 
 
+== Scheduling and reordering
+
+Optimizing code involves a lot of moving code around.  Basic block
+reordering, loop unrolling, and other forms of code duplication,
+movement or removal that affect placement of sequences of
+instructions, but not so much the instructions to be executed in a
+given execution path, have no effect on the debug information
+annotations presented in this article.  When moving, duplicating or
+removing code along these lines, debug annotations can be regarded
+just like regular instructions.
+
+Other than that, debug annotations should generally remain in place,
+serving as guides for what would amount to the natural execution order
+of the program, regardless of optimizations that reorder instructions
+or move instructions out of loops or conditionals.
+
+For example, if we move to an unconditional block a computation that
+was only to be performed inside a conditional, the debug annotation
+that binds the variable to the conditionally-computed value should
+remain in the conditional block, unless it is completely eliminated.
+Likewise, if some computation is hoisted out of a loop, the debug
+annotation should remain in the loop, where the user expects the
+assignment to take place.
+
+Moving a computation to an earlier point shouldn't require
+modification in subsequent debug annotations, but moving it to a later
+point may, especially when the move crosses the annotation.  For
+example, if an assignment instruction, say x = y, is moved past the
+end of a loop, debug annotations that refer to x in their expressions
+probably need to have it replaced with y, such that the binding
+remains with the same value in spite of the assignment move.
+
+Transformations that reorder instructions within a single block, such
+as instruction scheduling, don't require modification of annotations.
+Debug annotations should be maintained after the assignments they
+refer to, if the assignments are still nearby, and this is trivially
+accomplished through scheduling dependencies.  Other than that, debug
+annotations should generally have high scheduling priority, such that
+they are kept right after the corresponding assignment, or moved early
+when an assignment was hoisted out of a loop.  That said, reordering
+debug annotations may be undesirable and surprising at times.  Also,
+care must be taken to not schedule too early bindings for values that
+are completely optimized away: because these have no dependencies,
+they might be moved too early, to the point of making the range of the
+previous binding an empty range.
+
+
 == Testability
 
 Since debug annotations are added early, and, in most cases,
@@ -240,9 +353,9 @@ maintaining debug annotations throughout
 them away at the end.  This is undesirable, for it would slow down
 compilation without debug information and waste memory while at that.
 
-Therefore, we've built testing mechanisms into the compiler to detect
-cases in which the presence of debug annotations would cause code
-changes.
+Therefore, we've built testing mechanisms into the compiler build
+machinery to detect cases in which the presence of debug annotations
+would cause code changes.
 
 The bootstrap-debug Makefile target, by default, compiles the second
 bootstrap stage without debug information, and the third bootstrap
@@ -285,11 +398,13 @@ or whether the value is available or com
 missing, is a harder problem, but it's not part of the accuracy test,
 but rather of the completeness test.
 
-The completeness score for an unoptimized program might very often be
+A completeness score for an unoptimized program might very often be
 unachievable for optimized programs, not because the compiler is doing
 a poor job at maintaining debug information, but rather because the
-compiler is doing a good job at optimizing it, to the point that it is
-no longer possible to determine the value of the inspected variable.
+compiler is doing a good job at optimizing it, to the point that no
+possibility remains of computing the value of certain variables at
+certain points in the program.  This should be taken into account when
+designing completeness tests.
 
 
 == Concerns
@@ -303,14 +418,16 @@ bit.
 In order to generate correct debug information, more information needs
 to be retained throughout compilation.  The only way to arrange for
 debug information to not require any additional memory is to waste
-memory when not generating debug information.  But this is
-undesirable.
+memory when not generating debug information.  But this is probably
+undesirable, even if it would minimize the risks of debug annotations
+affecting optimizations and modifying the generated code.
 
 Therefore, the better debug information we want, the more memory
 overhead we're going to have to tolerate.
 
 Of course at times we can trade memory for efficiency, using more
-computationally expensive representations that are more compact.
+computationally expensive representations that are more compact, when
+we can't have both compactness and efficiency.
 
 At other times, we may trade memory for maintainability.  For example,
 instead of emitting annotations as soon as we enter SSA mode, we could
@@ -319,29 +436,31 @@ modified an SSA assignment for which we 
 annotation.  Additional memory would be needed to mark assignments
 that should have gained annotations but haven't, and care must be
 taken to make sure that transformations aren't made without leaving a
-correct debug statement in place.  It is not clear that this would
-save significant memory, for a large fraction of relevant assignments
-are modified or moved anyway, so it might very well be a
-maintainability loss and a performance penalty for no measurable
-memory gains.
+correct (even if still implied) debug annotation in place.  It is not
+clear that this would save significant memory, for a large fraction of
+relevant assignments are probably modified or moved anyway, so it
+might turn out to be a maintainability and performance loss for small
+memory gains.  More investigation is required to determine whether
+this is indeed the case.
 
-Worst case, we may trade memory for debug information quality: if
-memory use of this scheme is too high for some scenario, one can
-disable debug information annotations through a command line option,
-or disable debug information altogether.
+Worst case, a user may trade memory for debug information quality: if
+the memory use of this scheme turns out to be too high for some
+scenario, the user can disable debug information annotations through a
+command line option, or disable debug information altogether.
 
 
 === Intrusiveness
 
 Given that nearly all compiler transformations would require
 reflection in debug information, any solution that doesn't take
-advantage of this fact is bound to require changes all over the place.
+advantage of this fact is bound to require changes all over the
+compiler.
 
 Perhaps not so much for Tree-SSA passes, that are relatively
 well-behaved and use a narrow API to make transformations, but very
 clearly so for RTL passes, that very often modify instructions in
 place, and at times even reuse locations assigned to user variables as
-temporaries.
+temporaries (the same is true of tree-ssa-reassoc, FWIW).
 
 Even when we do use the strength of optimizers to maintain debug
 information up to date, there are exceptions in which detailed
@@ -378,7 +497,8 @@ below.
 Worrying about the representation of debug annotations as statements
 or instructions, rather than notes, is missing the fact that, most of
 the time, we do want them to be updated just like statements and
-instructions.
+instructions, rather than handled like notes, that never need
+updating.
 
 Worrying about the representation of debug annotations in-line, rather
 than an on-the-side representation, is a valid concern, but it's
@@ -400,19 +520,21 @@ generates actually matches the executabl
 complete as viable.
 
 The goal is not to disable optimizations so as to preserve variables
-or code, such that it can be represented in debug information and
+or code, such that they could be represented in debug information and
 provide for a debugging experience more like that of code that is not
-optimized.
-
-If debug information disables any optimization, that's a bug that
-needs fixing.
+optimized.  If debug information disables any optimization, that's a
+bug that needs fixing.  Preventing optimizations that lower the
+quality of debug information is a separate feature, and one that will
+benefit from this work, but that won't be accomplished through this
+work.
 
-Now, while testing this design, a number of opportunities for
-optimization that GCC missed were detected and fixed, others were
-merely detected, and at least one optimization shortcoming kept in
-place in order to get better debug information could be removed, for
-the new debug information infrastructure enables the optimization to
-be applied in its fullest extent.
+It is worth mentioning that, while testing the implementation of this
+design, a number of opportunities for optimization that GCC missed
+were detected and fixed, others were merely detected so far, and at
+least one optimization shortcoming kept in place in order to get
+better debug information could be removed, for the new debug
+information infrastructure enables the optimization to be applied in
+its fullest extent.
 
 
 == Examples
@@ -449,7 +571,9 @@ print the correct values for i if we kee
 In this case, before the call to h, not only the assignment to i was
 dead, but also the value of the incoming argument x had already been
 clobbered.  If i had been assigned to another constant instead, debug
-information could easily represent this.
+information could easily represent this, through an extension to DWARF
+version 3 that enables location lists to contain value expressions, in
+addition to location expressions.
 
 Another example that covers PHI nodes and conditionals:
 
@@ -491,7 +615,8 @@ x2 (int x, int y, int z)
 
 Note how, without debug annotations, c is only initialized just before
 the call to whatever4.  At all other points, the value of c would be
-unavailable to the debugger, possibly even wrong.
+unavailable to the debugger, possibly even wrong, if prior assignments
+to c had survived optimization.
 
 If we were to annotate the SSA definitions forward-propagated into c
 versions as applying to c, we'd end up with all of x_2, y_3 and z_0
@@ -506,23 +631,23 @@ x2 (int x, int y, int z)
   int c;
   # bb 1
   c_4 = z_0(D);
-  # DEBUG c c_4
+  # DEBUG c => c_4
   whatever0(c_4);
   c_5 = x_2(D);
-  # DEBUG c c_5
+  # DEBUG c => c_5
   whatever1();
   if (some_condition)
     {
       # bb 2
       whatever2();
       c_6 = y_3(D);
-      # DEBUG c c_6
+      # DEBUG c => c_6
       whatever3();
     }
   
   # bb 3
   # c_1 = PHI <c_5(D)(1), c_6(D)(2)>
-  # DEBUG c c_1
+  # DEBUG c => c_1
   whatever4(c_1);
 }
 
@@ -533,20 +658,20 @@ x2 (int x, int y, int z)
 {
   int c;
   # bb 1
-  # DEBUG c z_0(D)
+  # DEBUG c => z_0(D)
   whatever0(z_0(D));
-  # DEBUG c x_2(D)
+  # DEBUG c => x_2(D)
   whatever1();
   if (some_condition)
     {
       # bb 2
       whatever2();
-      # DEBUG y_3(D)
+      # DEBUG c => y_3(D)
       whatever3();
     }
   # bb 3
   # c_1 = PHI <x_2(D)(1), y_3(D)(2)>;
-  # DEBUG c c_1
+  # DEBUG c => c_1
   whatever4(c_1);
 }
 
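As an illustration only (this is not part of the diff or of the wiki
text), the forward scan described under "Generating debug information"
could be modeled roughly as below.  It is a minimal sketch that uses
value numbers as a stand-in for equivalence classes.  All names
(scan_state, scan_bind, NLOCS and so on) are hypothetical, locations
and variables are just small indices, and memory slots and basic block
confluences are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NLOCS 8    /* hypothetical: registers and stack slots, by index */
#define NVARS 4    /* hypothetical: user variables, by index */
#define UNBOUND 0u /* value number meaning "no known value" */

struct scan_state {
  unsigned next_vn;       /* next fresh value number */
  unsigned loc_vn[NLOCS]; /* value number each location currently holds */
  unsigned var_vn[NVARS]; /* value number each user variable is bound to */
};

/* Block entry: every location holds a distinct unknown value, and no
   variable is bound.  */
static void scan_init(struct scan_state *s)
{
  s->next_vn = 1;
  for (size_t l = 0; l < NLOCS; l++)
    s->loc_vn[l] = s->next_vn++;
  for (size_t v = 0; v < NVARS; v++)
    s->var_vn[v] = UNBOUND;
}

/* LOC is modified: it leaves whatever equivalence class it was in.  */
static void scan_clobber(struct scan_state *s, size_t loc)
{
  assert(loc < NLOCS);
  s->loc_vn[loc] = s->next_vn++;
}

/* DST = SRC: the copy joins the equivalence class of SRC.  */
static void scan_copy(struct scan_state *s, size_t dst, size_t src)
{
  assert(dst < NLOCS && src < NLOCS);
  s->loc_vn[dst] = s->loc_vn[src];
}

/* "# DEBUG var => loc": VAR is bound to the value LOC holds right now.  */
static void scan_bind(struct scan_state *s, size_t var, size_t loc)
{
  assert(var < NVARS && loc < NLOCS);
  s->var_vn[var] = s->loc_vn[loc];
}

/* Debug annotation carrying the special "unavailable" value.  */
static void scan_unbind(struct scan_state *s, size_t var)
{
  assert(var < NVARS);
  s->var_vn[var] = UNBOUND;
}

/* Function call: every location in CLOBBERED (a bit mask) is modified.  */
static void scan_call(struct scan_state *s, unsigned clobbered)
{
  for (size_t l = 0; l < NLOCS; l++)
    if (clobbered & (1u << l))
      scan_clobber(s, l);
}

/* Store in OUT the locations that still hold VAR's value and return
   how many there are.  Zero means the variable is unavailable here.  */
static size_t scan_locations(const struct scan_state *s, size_t var,
                             size_t out[NLOCS])
{
  assert(var < NVARS);
  size_t n = 0;
  if (s->var_vn[var] == UNBOUND)
    return 0;
  for (size_t l = 0; l < NLOCS; l++)
    if (s->loc_vn[l] == s->var_vn[var])
      out[n++] = l;
  return n;
}
```

Binding a variable to location 1, copying 1 into 2, and then
clobbering 1 leaves location 2 as the only place holding the value;
a call that clobbers 2 makes the variable unavailable.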

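To make the INSN_P range-test argument from the text above concrete,
here is a small sketch.  The enumerators and function names are made
up for the illustration and do not claim to match GCC's actual rtx
codes; the only point is that, with the debug code placed at one limit
of the instruction range, the test that includes debug instructions
and the one that excludes them differ only in a constant.

```c
#include <assert.h>

/* Hypothetical codes for elements of the instruction stream.  */
enum stream_code {
  CODE_NOTE,       /* notes never need updating */
  CODE_BARRIER,
  CODE_DEBUG_INSN, /* sits at the lower limit of the instruction range */
  CODE_INSN,
  CODE_JUMP_INSN,
  CODE_CALL_INSN
};

/* The layout assumption both tests below rely on.  */
static_assert(CODE_DEBUG_INSN + 1 == CODE_INSN,
              "debug code must sit right below the real instructions");

/* Extended test: matches real and debug instructions alike.  */
static int insn_p(enum stream_code c)
{
  return c >= CODE_DEBUG_INSN && c <= CODE_CALL_INSN;
}

/* Same meaning as the original test: skips debug instructions.  */
static int nondebug_insn_p(enum stream_code c)
{
  return c >= CODE_INSN && c <= CODE_CALL_INSN;
}
```

Passes that must keep annotations up to date would use the first
test; the exceptions that must ignore them would use the second.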

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-19 18:41                                                             ` Ian Lance Taylor
  2007-12-19 19:00                                                               ` Daniel Jacobowitz
  2007-12-19 19:53                                                               ` Janis Johnson
@ 2007-12-20  8:00                                                               ` Alexandre Oliva
  2007-12-20  8:01                                                                 ` Alexandre Oliva
  2007-12-20 17:02                                                                 ` Ian Lance Taylor
  2 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-20  8:00 UTC
  To: Ian Lance Taylor; +Cc: gcc

On Dec 19, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:
>> You snipped (skipped?) one aspect of the reasoning on why it is
>> appropriate.  Of course this doesn't prove it's the best possibility,
>> but I haven't seen evidence of why it isn't.

> You will find it easier to demonstrate the worth of your proposal if
> you act publicly as though your interlocutors are people of good
> will, even when it doesn't seem that way to you, and omit
> interjections like "(skipped?)".

Sorry, I didn't mean it in a demeaning tone.  I realize I should have
been more careful, given the heat of the debate, for which I
apologize.

It just so happens that I'm just used to having texts I write skimmed
through rather than read in detail, so, when someone makes a point
that appears to disregard something that I write about, I tend to
assume that the person missed the portion in which I discussed it.
That was what the 'skipped?' was about.  I know I tend to pack too
much information in small spaces when I write (and I'm not proud of
it, mind you :-), so having readers miss points I did try to address
is unfortunately quite common.

Again, I apologize for not realizing this could be interpreted in a
different way than the one I meant.  It was indeed inappropriate.

> To be sure we are on the same page, I think your argument here is that
> with this code:

> int f(int x, int y) {
>   int i = 0, j = 0;

>   probe1();
>   i = x;
>   j = y;
>   probe2();
>   if (x < y)
>     i += y;
>   else
>     j -= x;
>   probe3();
>   return g (i, j);
> }

> if I set a breakpoint just before the call to probe2(), and I print
> the values of 'i' and 'j', I should get the values of 'x' and 'y'.
> That is, you want to emit a DWARF variable note at that point that the
> value of 'i' can be found in the location corresponding to 'x'.

Yep.  That would be correct and complete.  It would also be
acceptable, but undesirable, to emit information to the effect that
the locations of 'i' and 'j' are unknown at those points; for this
would be correct, even if incomplete.
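
For illustration, here is roughly where the bindings for this example
would sit, written as comments in the "# DEBUG var => expr" notation
of the spec.  This is only a sketch: real annotations would refer to
SSA names rather than source expressions, and the probe functions, g
and check_f below are stand-ins invented so that the fragment
compiles.

```c
#include <assert.h>

static int probes; /* counts probe calls; stand-in for real side effects */
static void probe1(void) { probes++; }
static void probe2(void) { probes++; }
static void probe3(void) { probes++; }

/* Hypothetical stand-in for the g called by the example.  */
static int g(int i, int j) { return i * 100 + j; }

int f(int x, int y)
{
  int i = 0, j = 0;
  /* # DEBUG i => 0 */
  /* # DEBUG j => 0 */

  probe1();
  i = x;
  /* # DEBUG i => x */
  j = y;
  /* # DEBUG j => y */
  probe2(); /* here the debugger should report i == x and j == y */
  if (x < y) {
    i += y;
    /* # DEBUG i => x + y */
  } else {
    j -= x;
    /* # DEBUG j => y - x */
  }
  /* merge point: one binding per PHI node for i and for j */
  probe3();
  return g(i, j);
}

/* Checks that the commented bindings agree with what f computes.  */
static void check_f(void)
{
  assert(f(1, 2) == g(3, 2));  /* x < y: i becomes x + y */
  assert(f(5, 2) == g(5, -3)); /* otherwise: j becomes y - x */
}
```

Even if the copies into i and j are optimized away, the bindings
before probe2() can still say that the values live wherever x and y
do.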

> Of course there are no actual instructions between the calls to
> probe1() and probe2().  If I use gdb's "finish" command out of
> probe1(), what values should I see for 'i' and 'j' at that point?
> Arguably I am now before the assignment statements, and should see '0'
> and '0', the values that 'i' and 'j' have before they are changed.  Of
> course, this is the same location as the breakpoint before probe2(),
> and we can't see both '0'/'0' and 'x'/'y'.  So it seems to me that
> this situation is actually somewhat ambiguous.  I don't see an
> obviously correct answer.

Dan has dealt with this point, but, if it floats your boat, you can
disregard any hope of getting it right between probe1() and probe2(),
since there aren't instructions in between them, and focus on getting
it right at probe2() or while probe2() is active in a lower stack
frame.

> I think the general issue you are describing is how to handle an
> assignment which appears in user code but which has been eliminated
> during optimization.

Yes, this is a way to describe it.

I'm addressing this in a bit more detail in a revised version of the
spec, that I intend to publish in the GCC wiki RSN.

> It seems to me that such eliminated assignments are inherently
> ambiguous.  If the assignment is gone, then there is a point in the
> generated code where the variable logically has both the old and the
> new values.  I assume that the debugger can only display one value.
> Which one should it be?

I don't think this characterization is correct.  There are points that
are logically before the removed assignment, and there are points that
are logically after it.  If we actually emitted a nop for the removed
assignment, then we could single-step through it and observe the
change in the logical variable even though no observable change
occurred in the program state (other than the advance of the PC past
this nop).  Except that, in the implementation plan I have in mind,
the observable change would quite often be from "unknown value" to
"assigned value", because the location holding the previous value will
likely have already been overwritten when we reach the debug insn.

> Consider a series of assignments to a local variable, and suppose
> that all the assignments are deleted because they are unused.  Are
> there dependencies between the DEBUG notes which keep them in the
> right order?

There ought to be, for sure, such that the last one prevails.
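
As a rough illustration of "the last one prevails", here is a small
sketch in C.  The debug_note structure and binding_after are
hypothetical names, and values are plain strings purely for
readability; the only point is that the result depends on the notes
staying in program order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical debug note: "from here on, VAR holds VALUE".  */
struct debug_note {
  const char *var;
  const char *value;
};

/* Walk NOTES in program order and return the value bound to VAR after
   the last of them, or NULL if VAR was never bound.  A later note for
   the same variable overrides any earlier one.  */
static const char *
binding_after (const struct debug_note *notes, size_t n, const char *var)
{
  const char *cur = NULL;
  assert (notes != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (strcmp (notes[i].var, var) == 0)
      cur = notes[i].value;
  return cur;
}
```

If a pass were free to swap two notes for the same variable, the
function above would report a stale value, which is why the relative
order has to be preserved even when the assignments themselves are
gone.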

> Presumably we do not have the goal of emitting correct debug
> information in between line notes

I do.  Stack traces, for one, are seldom taken at line note
boundaries, for stack frames other than the top active one.  If we
didn't have correct debug information at those points, monitors
wouldn't be able to do a correct job.  Going from that to backtraces
that cross signal handling frames makes it only slightly more complex,
from a theoretical standpoint.  I.e., I don't see that solving the
problem such that it addresses the apparently-simpler requirement
would take significantly less implementation effort than solving the
apparently-more-complex requirement.

> I wonder whether it would be feasible for the debug info generation to
> work from the assignments in the source code as generated by the
> frontend.  For each assignment, we would find the corresponding line
> note.  Then we would look at the right hand side, and try to identify
> where that value could be found at that point in the program.  This
> would be a variant of our current variable tracking pass.  I haven't
> thought about this enough to know whether it would really work.

I've been giving something along these lines some thought, but it's a
bit more elaborate, and I'm not ready to present even a draft of my
thoughts on this topic.  And I unfortunately may have to discuss it
with lawyers before I can do anything concrete about it.

> That will only work correctly if sched-deps.c introduces dependencies
> between debug insns and real insns.

Yep, it does; have a look at the vta branch.  In fact, sched is the
pass that has given me the most headaches to get bootstrap-debug to
pass.

> If you introduce those dependencies, I don't understand how you will
> avoid changing the scheduler's behaviour in the presence of debug
> insns.  How did you work around that problem?

Debug insns don't use any actual machine resources, and they sort of
always fit, so the scheduler can accept them as soon as they become
ready, without changing any other internal state.  I haven't
introduced explicit deps among debug insns, because I get the
impression that they're implied by the original instruction order and
the fact that, if two debug insns become simultaneously ready, there's
nothing that would reorder them (sorting is stable).
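
The idea can be sketched with a toy in-order issue model in C.  This is
not the code in sched-deps.c or on the vta branch; the insn structure
and schedule() are hypothetical, and the model assumes insns arrive
already in the order they become ready.

```c
#include <assert.h>
#include <stddef.h>

struct insn {
  int id;
  int is_debug;   /* nonzero for a debug insn */
  int cycle;      /* filled in by schedule() */
};

/* Toy list scheduler: INSNS are already in the order they become ready,
   and at most WIDTH real insns issue per cycle.  Debug insns are
   accepted as soon as they are reached and consume no issue slot, so
   they cannot delay a real insn.  The original order is preserved,
   which also keeps debug insns ordered among themselves.  */
static void
schedule (struct insn *insns, size_t n, int width)
{
  assert (width > 0);
  int cycle = 0, used = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!insns[i].is_debug)
        {
          if (used == width)
            {
              cycle++;
              used = 0;
            }
          used++;
        }
      insns[i].cycle = cycle;
    }
}
```

Scheduling the same real insns with and without interleaved debug
insns gives each real insn the same cycle, which is the property
bootstrap-debug checks for at a much larger scale.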

That said, I'm pretty sure I still have some scheduling issues to sort
out.  Trying to get bootstrap-debug to pass on ppc64 and ia64 has
exposed a number of scheduling issues, but IIRC almost all of them
were in the machine-specific scheduling code, that needed adjusting to
tolerate debug insns without internal state changes.  But I may still
be missing additional tweaks to the machine-independent scheduling
code.

> Personally, I would like to see that testsuite first.  That will give
> us an operational definition to aim for, rather than a theoretical
> discussion which I find to be ambiguous.

The two examples at the end of the design document are sort of meant
as a starting point for the testsuite.  As we discuss further
interesting examples, I'll probably add them, if not to the document,
to some collection of interesting debug info testcases.

I'm not ready to spend time figuring out the precise incantations to
automate these tests yet, but contributions along these lines would
obviously be welcome.  As for myself, I need to complete the design of
the GVN-like algorithm to turn RTL debug insns into var tracking
notes, which is currently underspecified.  Once that's done, we'll be
able to start testing things more seriously, and polishing the
heuristics that are going to be needed to decide between lvalue
location or rvalue for variables, partitioning lvalues that happen to
be in the same value equivalence classes into different user
variables, this sort of stuff.  I think this will take some
experimentation to get a reasonable idea of what is right, or at least
reasonable.

> And it will avoid the problem of turning the testsuite into a
> regression testsuite rather than an accuracy testsuite.

Sorry, I don't understand what you mean here.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-19 21:17                                                                 ` Ian Lance Taylor
@ 2007-12-20  6:10                                                                   ` Alexandre Oliva
  2007-12-20 16:52                                                                     ` Ian Lance Taylor
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-20  6:10 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: janis187, gcc

On Dec 19, 2007, Ian Lance Taylor <iant@google.com> wrote:

> For some things, sure, but we are just talking about the values in
> user visible variables stored in registers.  There is no way we can
> make that information be correct between line notes.

Err...  I think there is, and one way to do it is with the design I've
proposed.  Do you have anything to back up your implied assertion that
the design can't accomplish this?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-19 21:11                                                                   ` Daniel Berlin
@ 2007-12-20  5:16                                                                     ` Alexandre Oliva
  2007-12-20 16:44                                                                       ` Ian Lance Taylor
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-20  5:16 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

>> Now, if z_5 were present in a debug expression, then it would need
>> adjusting.  No different from the adjusting need for any other
>> instruction in which z_5 was present, though.

> uh, but if you don't adjust in the fixed examples, DEBUG(x, x_4) will
> give an invalid value.

My point was that optimizers already had to know how to adjust things
such that it doesn't break code.

Now, in this optimization, it takes additional liberties with existing
variables because it sees they're only used within the sequence.
IMHO, it would be more appropriate to introduce alternate temporaries,
rather than reusing SSA names for different purposes, in this case.
If this approach was taken, the debug annotations referring to a
no-longer-defined SSA name would be recognized as invalid, and the
variable binding would be removed (i.e., turned into a "value unknown"
annotation).  Or, if we left the definitions in place, even though
they're dead, the same code that cleans up undefined SSA names could
recognize these SSA names as unused except in debug information and
substitute them for their values, maintaining accurate and complete
debug information.

But can we do better without introducing more SSA names and keeping
assignments around that are known to be dead?  Yes, with some
additional effort, see below.

> How do I know I need to change this DEBUG expression?

As reassoc looks for sets of variables it can freely mess with, it
should take note of variables that are used in debug annotations in
addition to the kind of single (?) non-debug uses it's interested in,
such that, when it modifies these variables, the annotations can be
compensated for.

OTOH, if the compiler performs reassoc on user variables today, it
means we do get mangled debug information for such variables already,
and they get incorrect values.  So, even if we didn't address this
problem right away, it wouldn't be much of a regression.

But, of course, not dealing with it breaks the goal of having correct
debug information, so it ought to be dealt with properly.

Do you happen to have a yummy testcase handy that I could use to
trigger this kind of transformation in ways that affect the value of
user variables?

Thanks in advance,

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-19 19:53                                                               ` Janis Johnson
@ 2007-12-19 21:17                                                                 ` Ian Lance Taylor
  2007-12-20  6:10                                                                   ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-19 21:17 UTC (permalink / raw)
  To: janis187; +Cc: Alexandre Oliva, gcc

Janis Johnson <janis187@us.ibm.com> writes:

> On Wed, 2007-12-19 at 10:00 -0800, Ian Lance Taylor wrote:
> > One way to make a principled choice is to consider the line notes we
> > are going to emit with the debugging information.  Presumably we do
> > not have the goal of emitting correct debug information in between
> > line notes--e.g., when using the "stepi" command in gdb.  Our goal is
> > to emit correct debug information at the points where a debugger would
> > naturally stop--the notes for where a line starts.
> 
> Debugging in between line notes is important for core files and
> when moving up and down the call stack, so at such locations the
> debugger needs to at least know whether debug information is
> reliable or not.

For some things, sure, but we are just talking about the values in
user visible variables stored in registers.  There is no way we can
make that information be correct between line notes.

Ian


* Re: Designs for better debug info in GCC
  2007-12-19 20:00                                                                 ` Alexandre Oliva
@ 2007-12-19 21:11                                                                   ` Daniel Berlin
  2007-12-20  5:16                                                                     ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Daniel Berlin @ 2007-12-19 21:11 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/19/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
>
> > Here is the easy one:
>
> > z_5 = a_3 + b_3
> > x_4 = z_5 + c_3
>
> > DEBUG(x, x_4)
>
>
> > Reassoc may transform this into:
>
>
> > z_5 = c_3 + b_3
> > x_4 = z_5 + a_3
>
> > DEBUG(x, x_4)
>
> > Now x has the wrong value.
>
> As Andrew said, no, it doesn't.
>
Yes, I corrected it later.
You didn't address the other one, which is much harder and does
require addressing by you.


> Now, if z_5 were present in a debug expression, then it would need
> adjusting.  No different from the adjusting need for any other
> instruction in which z_5 was present, though.
uh, but if you don't adjust in the fixed examples, DEBUG(x, x_4) will
give an invalid value.

You can cause this value to change without ever changing x_4, and
do so legally.
How do I know I need to change this DEBUG expression?


* Re: Designs for better debug info in GCC
  2007-12-19 20:00                                                                 ` Andrew MacLeod
@ 2007-12-19 20:40                                                                   ` Daniel Berlin
  0 siblings, 0 replies; 189+ messages in thread
From: Daniel Berlin @ 2007-12-19 20:40 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On 12/19/07, Andrew MacLeod <amacleod@redhat.com> wrote:
>
> > It gets worse, however
> >
> > c_3 = a_1 + b_2
> > z_5 = c_3 + d_9
> > x_4 = z_5 + e_10
> > DEBUG(x, x_4)
> > y_7 = x_4 + f_11
> > z_8 =  y_7 + g_12
> > ->
> >
> > c_3 = a_1 + b_2
> > z_5 = c_3 + g_12
> > x_4 = z_5 + e_10
> > DEBUG(x, x_4)
> > y_7 = x_4 + f_11
> > z_8 = y_7 + d_9
> >
> >
> > x_4 now no longer represents the value of x, but we haven't directly
> > changed x_4, its immediate users, or the statements that immediately
> > make up its defining values.
> >
> >
>
> This does seem more troublesome. Reassociation shuffles things around
> without changing the LHS presumably because it has looked at the uses
> and knows there are no uses outside the expression, so it can manipulate
> them however it wants. It elects not to create new temps since it knows
> the old ones aren't being used elsewhere, so why waste new entries.

Yes.

>
> So if it was aware of the debug stmt, there would be a use of x_4
> outside the expression, and it would no longer do the same reassociation.

Either that, or you would have to hunt all the uses of every single
thing in the chain to see if any were debug expressions, and if the
value is going to change.

>
> Is that the gist of it?
Yes

>
> Andrew
>


* Re: Designs for better debug info in GCC
  2007-12-19 19:13                                                               ` Alexandre Oliva
@ 2007-12-19 20:11                                                                 ` Daniel Jacobowitz
  0 siblings, 0 replies; 189+ messages in thread
From: Daniel Jacobowitz @ 2007-12-19 20:11 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Wed, Dec 19, 2007 at 05:02:52PM -0200, Alexandre Oliva wrote:
> That said...  I can no longer find the equivalent of
> DW_CFA_val_expression in DW_OP_*s that could be used in location
> expressions.  I just *knew* it was there, but I guess I just imagined
> it.  This is embarrassing.

I am pretty sure such an extension has already been proposed.  Might
want to check with the committee (see dwarf.org).

-- 
Daniel Jacobowitz
CodeSourcery


* Re: Designs for better debug info in GCC
  2007-12-19  6:18                                                             ` Daniel Berlin
  2007-12-19 16:01                                                               ` Daniel Berlin
@ 2007-12-19 20:03                                                               ` Alexandre Oliva
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-19 20:03 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> On 12/19/07, Alexandre Oliva <aoliva@redhat.com> wrote:
>> On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
>> 
>> > Consider PRE alone,
>> 
>> > If your debug statement strategy is "move debug statements when we
>> > insert code that is equivalent"
>> 
>> Move?  Debug statements don't move, in general.  I'm not sure what you
>> have in mind, but I sense some disconnect here.

> Okay, so if you aren't going to move them, you have to erase them when
> you move statements around.

Why?  They still represent the point of binding between user variable
and value.

> How were you going to generate the initial set of debug annotations?

It's in the document: after each assignment to user variable, and at
PHI nodes for user variables.  The debug statement means the variable
holds that value from that point on until conflicting information
arises (i.e., another debug statement for the same variable, or a
control flow merge with different values for the same variable).
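
The control flow merge case can be sketched in C as follows.  This is
only an illustration under an assumption the text doesn't spell out,
namely that a conflict at a merge leaves the variable "unknown" until
some later debug statement (say, the one at a PHI node) rebinds it.
merge_bindings is a hypothetical name, and a binding is modelled as
the name of the value held, with NULL meaning unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Combine the bindings of one user variable arriving from two
   predecessors.  Agreement survives the merge; anything else (a
   disagreement, or an unknown on either side) yields unknown.  */
static const char *
merge_bindings (const char *from_a, const char *from_b)
{
  if (from_a != NULL && from_b != NULL && strcmp (from_a, from_b) == 0)
    return from_a;
  return NULL;
}
```

So merging "x_1" with "x_1" keeps "x_1", while merging "x_1" with
"x_3" gives NULL, which is exactly the spot where the debug statement
at the PHI node is needed to restore a usable binding.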

> How were you going to update it if you saw a statement was updated to
> say x_5 = x_4 instead of x_5 = x_3 + x_2.

No update needed, if x_5 is the value of interest.  I'm not sure
that's what you're asking, though.

> So then how will using your debug annotations and updating them come
> out any different than say performing a value numbering pass where you
> also associate user variables with the ssa names (IE alongside our
> value numbers), and propagate them around as well?

First, debug annotations may be at different points than the
corresponding SSA definitions, because the same SSA definition may be
bound to different variables at different ranges.

Second, debug annotations may contain more complex expressions than a
single SSA name, and there may not be any SSA name that represents the
value of these expressions left.  For example, given:

  x_3 = a_1 + b_2;
  # DEBUG x => x_3
  foo();

if we find that x_3 is unused elsewhere, we can drop it without
discarding debug information about the value of x at that point

  # DEBUG x => a_1 + b_2
  foo();

such that, if we stop at the call and print x, we get the expected
value, even though the actual variable was optimized away.
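
A minimal sketch of that substitution in C, using a toy expression
type.  The structures and propagate_into_debug are hypothetical and
far simpler than GCC's trees; they only show the step "before dropping
the dead definition, copy its right-hand side into the debug
statement".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy expression: either a plain name (op == 0, name in
   LHS) or a binary operation "LHS op RHS" on two names.  */
struct expr {
  char op;
  const char *lhs, *rhs;
};

/* name = rhs  */
struct assign { const char *name; struct expr rhs; };

/* # DEBUG var => value  */
struct debug_bind { const char *var; struct expr value; };

/* Before deleting the otherwise-unused assignment DEAD, substitute its
   right-hand side into BIND if BIND refers to the name being deleted.
   Returns nonzero if BIND was rewritten.  */
static int
propagate_into_debug (struct debug_bind *bind, const struct assign *dead)
{
  assert (bind != NULL && dead != NULL);
  if (bind->value.op == 0 && strcmp (bind->value.lhs, dead->name) == 0)
    {
      bind->value = dead->rhs;
      return 1;
    }
  return 0;
}
```

Applied to "# DEBUG x => x_3" with the dead assignment
"x_3 = a_1 + b_2", the bind becomes "# DEBUG x => a_1 + b_2", matching
the example above.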

> At the end, you could emit DEBUG(user var, ssa name) right after each
> SSA_NAME_DEF_STMT for all user vars in the user var set for ssa name.

This doesn't work.  Consider:

  a_2 = whatever1;
  b_4 = whatever2;

  x_1 = a_2;
  probe();

  if (condition) {
    probe();
    x_3 = b_4;
    probe();
  }

  x_5 = PHI <x_1(!condition), x_3(condition)>;
  probe();

Now, if you optimize it and apply the debug stmt generation
technique you suggested, this is what you get:

  T_2 = whatever1;
  # DEBUG a => T_2
  # DEBUG x => T_2
  T_4 = whatever2;
  # DEBUG b => T_4
  # DEBUG x => T_4

  probe();

  if (condition) {
    probe();
    probe();
  }

  T_5 = PHI <T_2(!condition), T_4(condition)>
  # DEBUG x => T_5
  probe();

What do you get if you print x at each of the probe points?
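
To spell the comparison out, here is a sketch in C that records x at
each probe point, once following the source and once following the
notes as placed above.  The trace structure and function names are
made up for illustration, and plain ints stand in for whatever1 and
whatever2.

```c
#include <assert.h>

#define MAX_PROBES 4

struct trace { int n; int x[MAX_PROBES]; };

static void
probe (struct trace *t, int x)
{
  assert (t->n < MAX_PROBES);
  t->x[t->n++] = x;
}

/* What the source says x is at each probe.  */
static struct trace
run_source (int a, int b, int condition)
{
  struct trace t = { 0, { 0 } };
  int x = a;
  probe (&t, x);
  if (condition)
    {
      probe (&t, x);
      x = b;
      probe (&t, x);
    }
  probe (&t, x);
  return t;
}

/* What a debugger would report from notes emitted right after each
   definition: "x => T_4" overrides "x => T_2" before any probe runs.  */
static struct trace
run_naive_notes (int a, int b, int condition)
{
  struct trace t = { 0, { 0 } };
  int x = a;                    /* # DEBUG x => T_2 */
  x = b;                        /* # DEBUG x => T_4 */
  probe (&t, x);
  if (condition)
    {
      probe (&t, x);
      probe (&t, x);
    }
  x = condition ? b : a;        /* # DEBUG x => T_5 */
  probe (&t, x);
  return t;
}
```

With a = 1, b = 2 and the condition true, the source gives 1, 1, 2, 2
while the notes give 2, 2, 2, 2; with the condition false, the source
gives 1, 1 and the notes give 2, 1.  Only the probe after the PHI is
right in both cases.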

> I don't see why you believe user variables/bindings are special and
> can't be propagated in this manner,

It's not that I don't believe it, it's just that just being able to
propagate them is not enough.  We must also take the binding point
into account.

Now, as I wrote to Ian last night, if we just add a binding point
annotation to this mix, then we have sufficient information:

  T_2 = whatever1;
  # DEBUG a => T_2 here
  # DEBUG x => T_2 at P1
  T_4 = whatever2;
  # DEBUG b => T_4 here
  # DEBUG x => T_4 at P2

  probe();
  # DEBUG point P1

  if (condition) {
    probe();
    # DEBUG point P2
    probe();
  }

  T_5 = PHI <T_2(!condition), T_4(condition)>
  # DEBUG x => T_5
  probe();

I still don't see how, in this notation, we'd represent something like
"at this point, the value of this user variable is unknown".  Any
ideas?

Also, this strategy works for the nice and well-behaved Tree SSA
optimization passes.  For RTL, that is far less abstract, especially
after register allocation, I don't see that we can rely on such a
simple strategy.  But, in a way, I hope I'm wrong ;-)

>> > #3 is a dataflow problem, and not something you want to do every time
>> > you insert a call.

>> I'm not sure what you mean by "inserting calls".  We don't do that.

> Sure we do.
> We will definitely insert new calls when we PRE const/pure calls, or
> calls we determine to be movable to the point we want to move them

I think of that as moving, rather than inserting.  That said, I still
don't quite see what you're getting at.  Calls don't mess with gimple
registers of their callers, ever, so it appears to me that inserting a
call in the tree level is a NOP in terms of debug information
annotations.

> I'm not sure why you believe all the calls that we end up with in the
> IR are actually in the source (or even implied by it).

Conceptually, they are, kind-a sort of :-)  Except perhaps for
profiling calls, that are meant to be fully transparent anyway.
Others are more akin to inlining, or using a call for convenience
rather than expanding a copy or something to that effect.

>> But I'm not computing that in trees.  I'm just collecting and
>> maintaining data points for var-tracking, all the way from the tree
>> level.

> Okay, then for trees,  why bother tracking it when you can compute it
> right before translation with the same accuracy you can if you update
> it every time you make statement changes?

Just because we still haven't found a reliable way to do so that
doesn't drop essential information for correct debug info.  If we do,
I'll be delighted to immediately drop the proposed debug annotations
in the tree level.  And in the RTL level as well.

>> And debug information is not just about the values, it's about
>> mapping variables to values and locations.

> You have no locations at the tree level,

?!?  Locations as in point of execution, rather than DWARF locations,
is what I mean.

> and i've explicitly said what
> i said applies to the tree level :)

Indeed ;-)

>> So, we can't infer all the
>> information we need.

> Again, i believe we can at the tree level.

Good, let's keep on it.  How about you use something like the example
above to explain how to accomplish it?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-19 16:01                                                               ` Daniel Berlin
  2007-12-19 16:29                                                                 ` Andrew MacLeod
  2007-12-19 20:00                                                                 ` Andrew MacLeod
@ 2007-12-19 20:00                                                                 ` Alexandre Oliva
  2007-12-19 21:11                                                                   ` Daniel Berlin
  2 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-19 20:00 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> Here is the easy one:

> z_5 = a_3 + b_3
> x_4 = z_5 + c_3

> DEBUG(x, x_4)


> Reassoc may transform this into:


> z_5 = c_3 + b_3
> x_4 = z_5 + a_3

> DEBUG(x, x_4)

> Now x has the wrong value.

As Andrew said, no, it doesn't.

Now, if z_5 were present in a debug expression, then it would need
adjusting.  No different from the adjusting need for any other
instruction in which z_5 was present, though.  That's what I mean when
I talk about letting the optimizers do their job on debug instructions
too.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-19 16:01                                                               ` Daniel Berlin
  2007-12-19 16:29                                                                 ` Andrew MacLeod
@ 2007-12-19 20:00                                                                 ` Andrew MacLeod
  2007-12-19 20:40                                                                   ` Daniel Berlin
  2007-12-19 20:00                                                                 ` Alexandre Oliva
  2 siblings, 1 reply; 189+ messages in thread
From: Andrew MacLeod @ 2007-12-19 20:00 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc


> It gets worse, however
>
> c_3 = a_1 + b_2
> z_5 = c_3 + d_9
> x_4 = z_5 + e_10
> DEBUG(x, x_4)
> y_7 = x_4 + f_11
> z_8 =  y_7 + g_12
> ->
>
> c_3 = a_1 + b_2
> z_5 = c_3 + g_12
> x_4 = z_5 + e_10
> DEBUG(x, x_4)
> y_7 = x_4 + f_11
> z_8 = y_7 + d_9
>
>
> x_4 now no longer represents the value of x, but we haven't directly
> changed x_4, its immediate users, or the statements that immediately
> make up its defining values.
>
>   

This does seem more troublesome. Reassociation shuffles things around 
without changing the LHS presumably because it has looked at the uses 
and knows there are no uses outside the expression, so it can manipulate 
them however it wants. It elects not to create new temps since it knows 
the old ones aren't being used elsewhere, so why waste new entries.
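
Plugging numbers into the quoted example shows the effect.  The sketch
below, in C, just transcribes the two statement sequences; the chain
structure and function names are made up for illustration.

```c
#include <assert.h>

/* One field per SSA name in the quoted example.  */
struct chain { int c_3, z_5, x_4, y_7, z_8; };

/* The sequence before reassociation.  */
static struct chain
chain_before (int a_1, int b_2, int d_9, int e_10, int f_11, int g_12)
{
  struct chain r;
  r.c_3 = a_1 + b_2;
  r.z_5 = r.c_3 + d_9;
  r.x_4 = r.z_5 + e_10;         /* DEBUG(x, x_4) */
  r.y_7 = r.x_4 + f_11;
  r.z_8 = r.y_7 + g_12;
  return r;
}

/* The sequence after reassociation: d_9 and g_12 have traded places.  */
static struct chain
chain_after (int a_1, int b_2, int d_9, int e_10, int f_11, int g_12)
{
  struct chain r;
  r.c_3 = a_1 + b_2;
  r.z_5 = r.c_3 + g_12;
  r.x_4 = r.z_5 + e_10;         /* DEBUG(x, x_4) */
  r.y_7 = r.x_4 + f_11;
  r.z_8 = r.y_7 + d_9;
  return r;
}
```

With a_1 = 1, b_2 = 2, d_9 = 10, e_10 = 3, f_11 = 4 and g_12 = 100,
both versions end with z_8 = 120, but x_4 is 16 before and 106 after,
so a debug statement still pointing at x_4 reports the wrong x.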

So if it was aware of the debug stmt, there would be a use of x_4 
outside the expression, and it would no longer do the same reassociation.

Is that the gist of it?

Andrew


* Re: Designs for better debug info in GCC
  2007-12-19 18:41                                                             ` Ian Lance Taylor
  2007-12-19 19:00                                                               ` Daniel Jacobowitz
@ 2007-12-19 19:53                                                               ` Janis Johnson
  2007-12-19 21:17                                                                 ` Ian Lance Taylor
  2007-12-20  8:00                                                               ` Alexandre Oliva
  2 siblings, 1 reply; 189+ messages in thread
From: Janis Johnson @ 2007-12-19 19:53 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, gcc

On Wed, 2007-12-19 at 10:00 -0800, Ian Lance Taylor wrote:
> One way to make a principled choice is to consider the line notes we
> are going to emit with the debugging information.  Presumably we do
> not have the goal of emitting correct debug information in between
> line notes--e.g., when using the "stepi" command in gdb.  Our goal is
> to emit correct debug information at the points where a debugger would
> naturally stop--the notes for where a line starts.

Debugging in between line notes is important for core files and
when moving up and down the call stack, so at such locations the
debugger needs to at least know whether debug information is
reliable or not.

Janis 


* Re: Designs for better debug info in GCC
  2007-12-19 16:29                                                                 ` Andrew MacLeod
@ 2007-12-19 19:25                                                                   ` Daniel Berlin
  0 siblings, 0 replies; 189+ messages in thread
From: Daniel Berlin @ 2007-12-19 19:25 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On 12/19/07, Andrew MacLeod <amacleod@redhat.com> wrote:
> Daniel Berlin wrote:
> >
> > Here is the easy one:
> >
> > z_5 = a_3 + b_3
> > x_4 = z_5 + c_3
> >
> > DEBUG(x, x_4)
> >
> >
> > Reassoc may transform this into:
> >
> >
> > z_5 = c_3 + b_3
> > x_4 = z_5 + a_3
> >
> > DEBUG(x, x_4)
> >
> > Now x has the wrong value.
> >
> ??
>
> x_4 looks like it has the value 'a_3 + b_3 + c_3' in both examples to
> me, although computed in different orders...
>
> so isn't that still the right value?

Yes, sorry, you have to add one more set of adds below and move one so
you can make it have a different value

You get the general idea though :)
Reassoc knows they are all only used in each other, and that it is
okay to change their intermediate value as long as the last thing in
the chain retains its value (which it does since they are all
commutative operations)
>
> Andrew
>


* Re: Designs for better debug info in GCC
  2007-12-19 16:12                                                             ` Daniel Berlin
@ 2007-12-19 19:13                                                               ` Alexandre Oliva
  2007-12-19 20:11                                                                 ` Daniel Jacobowitz
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-19 19:13 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> On 12/18/07, Alexandre Oliva <aoliva@redhat.com> wrote:

>> Dwarf enables arbitrary value expressions too.
> Well, uh, no.

> The only way to directly specify the value of a variable is for
> constants. DW_AT_const_value does not allow location descriptions.

DW_AT_const_value is irrelevant for location lists.  It's DW_OP_* that
I'm talking about.

That said...  I can no longer find the equivalent of
DW_CFA_val_expression in DW_OP_*s that could be used in location
expressions.  I just *knew* it was there, but I guess I just imagined
it.  This is embarrassing.

At this point, there are three options available:

- go back to the drawing board

- discard altogether expressions that don't represent lvalues (maybe
  don't even keep track of them)

- introduce a DWARF extension that enables value expressions to be
  used in location lists (say DW_OP_value, DW_OP_temp_location, or
  even DW_OP_self_location (*))

(*) maps value to a virtual location that, if dereferenced, evaluates
to the value.  Could be "easily" implemented through a virtual
out-of-range base address, plus the offset that represents the value
on dereference, but there are many other ways to implement this in
debug information consumers.
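
To make the footnote a bit more concrete, here is a rough sketch of how a
consumer might treat such a virtual location.  Everything in it is
hypothetical: VIRTUAL_BASE, the function names, and the assumption that
no real target address is at or above the base are for illustration only,
and none of it is part of DWARF or of any existing debugger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical out-of-range base: assume no real target address
   is ever at or above this value.  */
#define VIRTUAL_BASE UINT64_C (0xffff000000000000)

/* Producer side: map a computed value to a virtual location.  */
static uint64_t
value_to_virtual_location (uint32_t value)
{
  return VIRTUAL_BASE + value;
}

/* Consumer side: dereference a location.  A virtual location
   evaluates to the value encoded in its offset; a real one is read
   from target memory through the caller-supplied callback.  */
static uint64_t
deref_location (uint64_t loc, uint64_t (*read_mem) (uint64_t))
{
  if (loc >= VIRTUAL_BASE)
    return loc - VIRTUAL_BASE;
  assert (read_mem != 0);
  return read_mem (loc);
}
```

Dereferencing value_to_virtual_location (v) never touches target memory,
which is the property the footnote is after.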

> I'm still curious where you think it describes value expressions for
> variables other than constants

Me too :-)  :-(

Thanks for drawing my attention to this incorrect assumption I made
about DWARF location lists.

> i'd support such an extension

Cool.  Do you happen to know the procedure to propose DWARF standard
extensions?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 18:41                                                             ` Ian Lance Taylor
@ 2007-12-19 19:00                                                               ` Daniel Jacobowitz
  2007-12-19 19:53                                                               ` Janis Johnson
  2007-12-20  8:00                                                               ` Alexandre Oliva
  2 siblings, 0 replies; 189+ messages in thread
From: Daniel Jacobowitz @ 2007-12-19 19:00 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, gcc

On Wed, Dec 19, 2007 at 10:00:38AM -0800, Ian Lance Taylor wrote:
> int f(int x, int y) {
>   int i = 0, j = 0;
> 
>   probe1();
>   i = x;
>   j = y;
>   probe2();

> Of course there are no actual instructions between the calls to
> probe1() and probe2().  If I use gdb's "finish" command out of
> probe1(), what values should I see for 'i' and 'j' at that point?
> Arguably I am now before the assignment statements, and should see '0'
> and '0', the values that 'i' and 'j' have before they are changed.  Of
> course, this is the same location as the breakpoint before probe2(),
> and we can't see both '0'/'0' and 'x'/'y'.  So it seems to me that
> this situation is actually somewhat ambiguous.  I don't see an
> obviously correct answer.

For once, I do.  As far as a debugger dares to distinguish, any
location is always the beginning of the next instruction, not the end
of the preceding instruction.  If you want to see the zeroes, stop in
probe1 and say "up" instead of "finish".

A hypothetical -Og which placed observation points between statements
would probably need a minimum of one nop per source line.  Similarly
for observation points at sequence points.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19  4:30                                                           ` Alexandre Oliva
@ 2007-12-19 18:41                                                             ` Ian Lance Taylor
  2007-12-19 19:00                                                               ` Daniel Jacobowitz
                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-19 18:41 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> You snipped (skipped?) one aspect of the reasoning on why it is
> appropriate.  Of course this doesn't prove it's the best possibility,
> but I haven't seen evidence of why it isn't.

You will find it easier to demonstrate the worth of your proposal if
you act publicly as though your interlocutors are people of good
will, even when it doesn't seem that way to you, and omit
interjections like "(skipped?)".  Assuming the goal is to get this
into mainline gcc, you have to convince us, not the other way around.
The first step in convincing people in this forum is not to irritate
them.

> Now, if you tell me that information about i_0 and j_2 is
> backward-propagated to the top of the function, where x and y are set
> up, I introduce say zero-initialization for i and j before probe1()
> (an actual function call, mind you), and then this representation is
> provably broken.

To be sure we are on the same page, I think your argument here is that
with this code:

int f(int x, int y) {
  int i = 0, j = 0;

  probe1();
  i = x;
  j = y;
  probe2();
  if (x < y)
    i += y;
  else
    j -= x;
  probe3();
  return g (i ,j);
}

if I set a breakpoint just before the call to probe2(), and I print
the values of 'i' and 'j', I should get the values of 'x' and 'y'.
That is, you want to emit a DWARF variable note at that point that the
value of 'i' can be found in the location corresponding to 'x'.

Of course there are no actual instructions between the calls to
probe1() and probe2().  If I use gdb's "finish" command out of
probe1(), what values should I see for 'i' and 'j' at that point?
Arguably I am now before the assignment statements, and should see '0'
and '0', the values that 'i' and 'j' have before they are changed.  Of
course, this is the same location as the breakpoint before probe2(),
and we can't see both '0'/'0' and 'x'/'y'.  So it seems to me that
this situation is actually somewhat ambiguous.  I don't see an
obviously correct answer.

Setting that aside, seeing the values 'x' and 'y' would probably be
more useful in practice, even if the other possibility is not wrong.
I think the general issue you are describing is how to handle an
assignment which appears in user code but which has been eliminated
during optimization.

You are certainly correct: the scheme I was outlining does not address
deleted assignments.

It seems to me that such eliminated assignments are inherently
ambiguous.  If the assignment is gone, then there is a point in the
generated code where the variable logically has both the old and the
new values.  I assume that the debugger can only display one value.
Which one should it be?

Your representation clearly makes a choice.  What makes it a
principled choice?  Consider a series of assignments to a local
variable, and suppose that all the assignments are deleted because
they are unused.  Are there dependencies between the DEBUG notes which
keep them in the right order?

One way to make a principled choice is to consider the line notes we
are going to emit with the debugging information.  Presumably we do
not have the goal of emitting correct debug information in between
line notes--e.g., when using the "stepi" command in gdb.  Our goal is
to emit correct debug information at the points where a debugger would
naturally stop--the notes for where a line starts.

I wonder whether it would be feasible for the debug info generation to
work from the assignments in the source code as generated by the
frontend.  For each assignment, we would find the corresponding line
note.  Then we would look at the right hand side, and try to identify
where that value could be found at that point in the program.  This
would be a variant of our current variable tracking pass.  I haven't
thought about this enough to know whether it would really work.

> > It is of course true that optimized code will move around
> > unpredictably, and your proposal doesn't handle that.
> 
> It handles that in that a variable will be regarded as being assigned
> to a value when execution crosses the debug stmt/insn originally
> inserted right after the assignment.  This is by design, but I realize
> now I forgot to mention this in the design document.
> 
> The idea is that, debug insns get high priority in scheduling.
> However, since they mention the assignment just before them, if the
> assignment is just moved earlier, without an intervening scheduling
> barrier, then the debug instruction will follow it.  If the assignment
> is removed, then the debug insn can legitimately be moved up to the
> point where the assignment, if remaining, might have been moved up to.
> However, if the assignment is moved to a separate basic block, say out
> of a loop or a conditional, then we don't want the debug insn to move
> with it: such that hoisting and commonizing are regarded as setting
> temporaries, and the value is only "committed" to the variable if we
> get to the point where the assignment would take place.

That will only work correctly if sched-deps.c introduces dependencies
between debug insns and real insns.  Otherwise, debug insns will move
ahead of real insns which change their values.  If you introduce those
dependencies, I don't understand how you will avoid changing the
scheduler's behaviour in the presence of debug insns.  How did you work
around that problem?

> >> Testing for accuracy and completeness of debug information can be best
> >> accomplished using a debugging environment.
> 
> > Of course this is very unsatisfactory without an automated testsuite.
> 
> Err...  I didn't say the testing through a debugging environment
> wouldn't be automated.  My plan is to use something along the lines of
> the GDB testsuite scripts, but whether to use GDB or some other
> debugging or monitoring infrastructure is a tiny implementation detail
> that I haven't worried about at all.  The basic idea is to script the
> inspection of variables and verify that the obtained values are the
> expected ones, or that variables are defensibly unavailable at the
> inspection points.

Personally, I would like to see that testsuite first.  That will give
us an operational definition to aim for, rather than a theoretical
discussion which I find to be ambiguous.  And it will avoid the
problem of turning the testsuite into a regression testsuite rather
than an accuracy testsuite.  But of course I'm not doing the work.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 16:01                                                               ` Daniel Berlin
@ 2007-12-19 16:29                                                                 ` Andrew MacLeod
  2007-12-19 19:25                                                                   ` Daniel Berlin
  2007-12-19 20:00                                                                 ` Andrew MacLeod
  2007-12-19 20:00                                                                 ` Alexandre Oliva
  2 siblings, 1 reply; 189+ messages in thread
From: Andrew MacLeod @ 2007-12-19 16:29 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Daniel Berlin wrote:
>
> Here is the easy one:
>
> z_5 = a_3 + b_3
> x_4 = z_5 + c_3
>
> DEBUG(x, x_4)
>
>
> Reassoc may transform this into:
>
>
> z_5 = c_3 + b_3
> x_4 = z_5 + a_3
>
> DEBUG(x, x_4)
>
> Now x has the wrong value.
>   
??

x_4 looks like it has the value 'a_3 + b_3 + c_3' in both examples to 
me, although computed in different orders...

so isn't that still the right value?
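
A small sketch of the two orderings, for anyone who wants to check this by
hand.  It assumes unsigned operands so that wraparound is well defined;
the struct and function names are made up for the example.

```c
/* Values of the two SSA names after both additions.  */
struct sums { unsigned z_5, x_4; };

/* Original order: z_5 = a_3 + b_3; x_4 = z_5 + c_3.  */
static struct sums
before_reassoc (unsigned a_3, unsigned b_3, unsigned c_3)
{
  struct sums s;
  s.z_5 = a_3 + b_3;
  s.x_4 = s.z_5 + c_3;
  return s;
}

/* Reassociated order: z_5 = c_3 + b_3; x_4 = z_5 + a_3.  */
static struct sums
after_reassoc (unsigned a_3, unsigned b_3, unsigned c_3)
{
  struct sums s;
  s.z_5 = c_3 + b_3;
  s.x_4 = s.z_5 + a_3;
  return s;
}
```

With a_3 = 1, b_3 = 2, c_3 = 4, x_4 is 7 either way; only the
intermediate z_5 differs (3 versus 6).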

Andrew

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19  4:35                                                           ` Alexandre Oliva
@ 2007-12-19 16:12                                                             ` Daniel Berlin
  2007-12-19 19:13                                                               ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Daniel Berlin @ 2007-12-19 16:12 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/18/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
>
> >> int c = z;
> >> whatever0(c);
> >> c = x;
>
> > Because you have added information you have no way of knowing.
> > How exactly did you compute that the call *definitely sets c to the
> > value of z_0*, and definitely sets the value of c to x_2.
>
> Err...  I guess you're thinking memory, global variables, alias
> analysis and that sort of stuff.
>

Yes, i mixed your examples up, i apologize.

> None of this applies to gimple registers, which is all the annotations
> are about.
>
>
> > However, value equivalence does not imply location equivalence, and all
> > of our debug formats deal with locations of variables, except for
> > constants.
>
> Dwarf enables arbitrary value expressions too.
Well, uh, no.

The only way to directly specify the value of a variable is for
constants. DW_AT_const_value does not allow location descriptions.

"An entry describing a variable or formal parameter whose value is
constant and not
represented by an object in the address space of the program, or an
entry describing a named
constant, does not have a location attribute. Such entries have a
DW_AT_const_value
attribute, whose value may be a  string or any of the constant data or
data block forms, as
appropriate for the representation of the variable's value. The value
of this attribute is the
actual constant value of the variable, represented as it would be on
the target architecture.
"

There are no other provisions in DWARF for describing the value of a
variable, it is expected you describe their locations using
DW_AT_location (which gives you the full power of location
descriptions, but requires you to return a location, not a value)
> There's some
> discussion about lvalue vs rvalue in the document, and this is also
> something that will take some experimenting.  I'm not entirely sure
> where to draw the line, and I'm not entirely sure there is a perfect
> answer.
I'm still curious where you think it describes value expressions for
variables other than constants (which again, can't use the location
description language)

Again, i'd support such an extension, but it does not currently exist.
Rest answers in other message.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19  6:18                                                             ` Daniel Berlin
@ 2007-12-19 16:01                                                               ` Daniel Berlin
  2007-12-19 16:29                                                                 ` Andrew MacLeod
                                                                                   ` (2 more replies)
  2007-12-19 20:03                                                               ` Alexandre Oliva
  1 sibling, 3 replies; 189+ messages in thread
From: Daniel Berlin @ 2007-12-19 16:01 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/19/07, Daniel Berlin <dberlin@dberlin.org> wrote:
> On 12/19/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> > On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
> >
> > > Consider PRE alone,
> >
> > > If your debug statement strategy is "move debug statements when we
> > > insert code that is equivalent"
> >
> > Move?  Debug statements don't move, in general.  I'm not sure what you
> > have in mind, but I sense some disconnect here.
>
> OKay, so if you aren't going to move them, you have to erase them when
> you move statements around.
>

Besides this, how do you plan on handling the following situations
(both of which reassoc performs *right now*).  These are the
relatively easy ones

Here is the easy one:

z_5 = a_3 + b_3
x_4 = z_5 + c_3

DEBUG(x, x_4)


Reassoc may transform this into:


z_5 = c_3 + b_3
x_4 = z_5 + a_3

DEBUG(x, x_4)

Now x has the wrong value.

At least in this case, you can tell which DEBUG statement to eliminate
easily (it is an immediate use of x_4)

It gets worse, however

c_3 = a_1 + b_2
z_5 = c_3 + d_9
x_4 = z_5 + e_10
DEBUG(x, x_4)
y_7 = x_4 + f_11
z_8 =  y_7 + g_12
->

c_3 = a_1 + b_2
z_5 = c_3 + g_12
x_4 = z_5 + e_10
DEBUG(x, x_4)
y_7 = x_4 + f_11
z_8 = y_7 + d_9


x_4 now no longer represents the value of x, but we haven't directly
changed x_4, its immediate users, or the statements that immediately
make up its defining values.

How do you propose we figure out which DEBUG statements we may have
affected without doing all kinds of walks?

(This is of course, a more general problem of how do i find which
debug statements are reached by my transformation without doing linear
walks)
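
The longer chain can be checked with a sketch like the following.  It
assumes unsigned operands (so wraparound is defined), and the names
eval_chain and struct chain are invented for the example; it only
evaluates the two statement sequences above.

```c
/* The two values of interest: x_4 (what DEBUG binds x to) and z_8
   (the last value in the chain).  */
struct chain { unsigned x_4, z_8; };

static struct chain
eval_chain (unsigned a_1, unsigned b_2, unsigned d_9, unsigned e_10,
            unsigned f_11, unsigned g_12, int reassociated)
{
  /* Reassoc swaps d_9 and g_12 between the second and the last
     addition; everything else is unchanged.  */
  unsigned second = reassociated ? g_12 : d_9;
  unsigned last = reassociated ? d_9 : g_12;
  struct chain r;
  unsigned c_3 = a_1 + b_2;
  unsigned z_5 = c_3 + second;
  r.x_4 = z_5 + e_10;
  unsigned y_7 = r.x_4 + f_11;
  r.z_8 = y_7 + last;
  return r;
}
```

With inputs 1, 2, 4, 8, 16, 32 the final z_8 is 63 in both versions, but
x_4 is 15 before the transformation and 43 after it, even though the
statement defining x_4 is textually identical.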

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19  6:07                                                           ` Alexandre Oliva
@ 2007-12-19  6:18                                                             ` Daniel Berlin
  2007-12-19 16:01                                                               ` Daniel Berlin
  2007-12-19 20:03                                                               ` Alexandre Oliva
  0 siblings, 2 replies; 189+ messages in thread
From: Daniel Berlin @ 2007-12-19  6:18 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/19/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
>
> > Consider PRE alone,
>
> > If your debug statement strategy is "move debug statements when we
> > insert code that is equivalent"
>
> Move?  Debug statements don't move, in general.  I'm not sure what you
> have in mind, but I sense some disconnect here.

OKay, so if you aren't going to move them, you have to erase them when
you move statements around.

>
> > because our equivalence is based on value equivalence, not location
> > equivalence.  We only guarantee it has the same value as the
> > whatever it is a copy of at that point, not that it has the same
> > location.

This  is just a problem with an initial state and some propagation at
each statement.
How were you going to generate the initial set of debug annotations?
This is how you get your initial state for your dataflow problem
How were you going to update it if you saw a statement was updated to
say x_5 = x_4 instead of x_5 = x_3 + x_2.
The same operation you perform to update your annotations when you see
 x_5 = x_4 works whether you started with x_5 = x_3 + x_2 or not (it
better, or else your updating will give different results for the same
IR depending on how you got there, which is *incredibly* bad).

So then how will using your debug annotations and updating them come
out any different than say performing a value numbering pass where you
also associate user variables with the ssa names (IE alongside our
value numbers), and propagate them around as well?

If you want to associate multiple user variables with a single SSA
definition point, you can do that as well (use union instead of copy).
You can do whatever you think is best at phi nodes (empty set if user
var sets are not equal, or union them or intersect them).

At the end, you could emit DEBUG(user var, ssa name) right after each
SSA_NAME_DEF_STMT for all user vars in the user var set for ssa name.

The right DEBUG statements would then appear at the points you can
guarantee the user variable has the same *value* as the gimple
register you've said it does.
From there, it is up to you to do what you like with the result.

(it's late, so i may have described/ calculated the dataflow problem
backwards, but you get the idea)
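
Roughly, the propagation described above could look like this.  It is
only a sketch under assumptions: user variable sets are modelled as
bitmasks, and the type and function names are invented, not anything in
the GCC tree.

```c
#include <stdint.h>

/* One bit per user variable; a set of user variables is a bitmask.  */
typedef uint32_t uservar_set;

enum meet_kind { MEET_INTERSECT, MEET_UNION, MEET_EMPTY_IF_DIFFERENT };

/* Copy such as x_5 = x_4: the destination keeps its own user
   variables and also gains those of the source ("union instead of
   copy").  */
static uservar_set
transfer_copy (uservar_set dest, uservar_set src)
{
  return dest | src;
}

/* PHI node: combine the sets of two incoming arguments using
   whichever policy is judged best.  */
static uservar_set
meet_phi (uservar_set a, uservar_set b, enum meet_kind kind)
{
  switch (kind)
    {
    case MEET_INTERSECT:
      return a & b;
    case MEET_UNION:
      return a | b;
    default:
      /* Empty set unless both arguments agree exactly.  */
      return a == b ? a : 0;
    }
}
```

After the problem reaches a fixed point, each bit set for an SSA name
would correspond to one DEBUG(user var, ssa name) emitted after its
defining statement.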

This is, after all, more or less what PRE does for its value
numbering. It computes which things have the same value at what points
in the program, then uses this after computing some more dataflow
problems that say where this implies reuse.

I don't see why you believe user variables/bindings are special and
can't be propagated in this manner, given that you can't depend on the
type of statement change that has occurred, only what the IR looks
like after the statement change.  Otherwise, again, the same IR and
source may have different debug annotations depending on the set of
changes you applied to get that IR from the initial IR, which is not
good for the standard reasons [maintainability, determinism,
reproducibility, etc].
>
> > #3 is a dataflow problem, and not something you want to do every time
> > you insert a call.
>
> I'm not sure what you mean by "inserting calls".  We don't do that.

Sure we do.
We will definitely insert new calls when we PRE const/pure calls, or
calls we determine to be movable to the point we want to move them
(using call clobbered results, etc).
This will insert calls in latch blocks, above loops, in branch conditions
This is not just movement.
It is insertion of calls that did not exist in the source code at a
given point, but are allowed to be executed at that point in the
source code anyway.

> Calls are present in the source code (even when implied by stuff like
> TLS, OpenMP or builtins such as memcpy), and they're either kept
> around, eliminated or inlined.
No, we can and will insert new calls.
Not just for PRE, but for profiling, devirtualization, struct reorg, SRA, etc
struct reorg inserts new mallocs and frees
profiling inserts profiling calls
devirt will insert branches and new calls to replace virtual function calls
SRA will insert memcpys to and from structures that were not there in
user source before.
i could go on if you like.
I'm not sure why you believe all the calls that we end up with in the
IR are actually in the source (or even implied by it).

>
> But I'm not computing that in trees.  I'm just collecting and
> maintaining data points for var-tracking, all the way from the tree
> level.
Okay, then for trees,  why bother tracking it when you can compute it
right before translation with the same accuracy you can if you update
it every time you make statement changes?
>
> > All you have done is annotated the IR in some places to make explicit
> > some bits in the dataflow problem that you could infer anyway.
>
> Now, this is not true.  I could infer values, yes, but I couldn't
> infer the variables they relate to, nor the point of binding

See above.

>  And
> debug information is not just about the values, it's about mapping
> variables to values and locations.

You have no locations at the tree level, and i've explicitly said what
i said applies to the tree level :)
> So, we can't infer all the
> information we need.

Again, i believe we can at the tree level.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 23:19                                                         ` Daniel Berlin
@ 2007-12-19  6:07                                                           ` Alexandre Oliva
  2007-12-19  6:18                                                             ` Daniel Berlin
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-19  6:07 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> Consider PRE alone,

> If your debug statement strategy is "move debug statements when we
> insert code that is equivalent"

Move?  Debug statements don't move, in general.  I'm not sure what you
have in mind, but I sense some disconnect here.

> because our equivalence is based on value equivalence, not location
> equivalence.  We only guarantee it has the same value as the
> whatever it is a copy of at that point, not that it has the same
> location.

This sounds perfect to me.  I'm concerned about values.  Locations are
an implementation detail.  The thing to keep in mind is that what was
originally a single user variable may end up mangloptimized into
multiple stack slots, registers, with multiple simultaneously-live
versions.  Trying to pretend that any of these represent the user
variable sounds like a recipe for madness to me.  So I focus on values
instead, and then on trying to recover locations based on binding and
sharing of values.

> How do i say debug info for some variable is now dead, we have no idea
> what it is right now?

For annotations, look for VAR_DEBUG_VALUE_NOVALUE in tree.h and
VAR_LOC_UNKNOWN_P in rtl.h, in the VTA branch.

For dwarf location lists, you just refrain from emitting locations for
a given range.
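
As a sketch of what that means on the consumer side (the struct layout,
LOC_UNAVAILABLE, and the function name are assumptions made for the
example, not a real debugger interface): a PC that falls in a gap
between entries simply has no location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One location-list entry: the variable lives at LOC for PCs in
   [lo, hi).  LOC is an opaque token in this sketch.  */
struct loc_entry { uint64_t lo, hi; int loc; };

#define LOC_UNAVAILABLE (-1)

static int
lookup_location (const struct loc_entry *list, size_t n, uint64_t pc)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (list[i].lo <= list[i].hi);
      if (pc >= list[i].lo && pc < list[i].hi)
        return list[i].loc;
    }
  /* No entry covers PC: report the variable as unavailable rather
     than showing a stale value.  */
  return LOC_UNAVAILABLE;
}
```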

> How do I figure out which debug statements need to be modified when
> you introduce new memory operations?

None.  By definition, debug annotations are only about variables that
are not addressable.  Those that are are fixed at a single location,
so there's no reason to track them in a fancy way.

> If i insert a new call
> DEBUG(x, x_3): 1
> x_3 = x

> foo() // May modify x and *&x)

> y = x_3

> Now you have two problems.

You're talking about a real problem, but your example is misguided.
Let me give you a real problem scenario.

(set (reg <T>) (<whatever>))
(var_location x (reg <T>))
(set (mem <addr>) (reg <T>))
(set (reg <T>) (<somethingelse>))
(call (mem (symbol_ref foo)))

So, at the var_location debug_insn, we know that x is in reg <T>.
That's stored at *addr, so now we might be able to use it as an
additional location for x.  And then, when reg is modified, we remove
T from the equivalence class, and then the only location holding the
value of x is *addr.  Then, a function call, that might modify *addr.

So, do we decide that x is no longer available after the call, or do
we hope *addr still represents it?

The thing to remember is that the annotations are only about gimple
regs.  This means calls don't modify them, ever.  But we still have to
decide whether *addr represents x or not.

My thoughts are leaning towards looking at the memory address or other
memory attributes to tell whether it's an addressable stack slot or
not.  If it's addressable, remove it from the equivalence class at the
call, so the equivalence class becomes empty, and the variable is
regarded as dead.  If it's not addressable (a pseudo assigned to
memory), then we can keep it, even if x is actually dead past the
call.

What we'll see is that, if x is not dead after the call, the compiler
will arrange to preserve its value in one such local non-addressable
stack slot, and it will probably extend the equivalence class again
after the call, as the pseudo is restored.  Or the pseudo will be
temporarily assigned to a call-saved register, which, for being
call-saved, won't be removed from equivalence classes at call
instructions.  Whereas, if x is dead and its value was just copied to
some random memory location, then we may as well flag it as dead at
the call site, where the memory location may be modified.

So, it all works out nicely, because we know we're only dealing with
gimple regs.
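
A minimal sketch of the call-site rule I have in mind, not actual VTA
code.  The enum, struct, and function names are invented, and treating
call-clobbered registers as removed at calls is an assumption I'm adding
for the sketch; the rest follows the reasoning above.

```c
#include <stddef.h>

enum loc_kind
{
  LOC_REG_CALL_CLOBBERED,
  LOC_REG_CALL_SAVED,
  LOC_MEM_ADDRESSABLE,
  LOC_MEM_SPILL_SLOT   /* non-addressable slot holding a pseudo */
};

struct location { enum loc_kind kind; int id; };

/* Locations currently believed to hold the value of one variable.  */
struct equiv_class { struct location locs[8]; size_t n; };

/* A call may write addressable memory and call-clobbered registers;
   call-saved registers and private spill slots survive it.  */
static int
survives_call (const struct location *l)
{
  return l->kind == LOC_REG_CALL_SAVED || l->kind == LOC_MEM_SPILL_SLOT;
}

/* Drop every location a call may have changed.  Returns how many
   are left; zero means the variable is regarded as dead.  */
static size_t
equiv_class_at_call (struct equiv_class *ec)
{
  size_t kept = 0;
  for (size_t i = 0; i < ec->n; i++)
    if (survives_call (&ec->locs[i]))
      ec->locs[kept++] = ec->locs[i];
  ec->n = kept;
  return kept;
}
```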

volatile asms make this slightly trickier, because they're totally
unpredictable.  I'm thinking it's safe to simply remove addressable
memory locations from equivalence classes at them, just for safety,
but I don't have it completely figured out.

> #3 is a dataflow problem, and not something you want to do every time
> you insert a call.

I'm not sure what you mean by "inserting calls".  We don't do that.
Calls are present in the source code (even when implied by stuff like
TLS, OpenMP or builtins such as memcpy), and they're either kept
around, eliminated or inlined.

(digression intended to be funny: this "inserting a call" discussion
reminds me of those impossible initial conditions in electromagnetism
textbook exercises, such as uniform magnetic fields in which charged
particles suddenly appear ;-)

> If your answer is #1 or #2, then what you are really doing is
> computing roughly the same dataflow problem var-location does, except
> on trees and with a different meet-operation.

I am actually computing the same dataflow problem of var-tracking.
That's the whole point.  But I'm giving it more information, to enable
it to track more variables.  And it needs to deal with multiple
concurrent locations for the same variable, and multiple variables in
the same locations, which are "slight" complications.  But you're
right, in the end it's the same problem.

But I'm not computing that in trees.  I'm just collecting and
maintaining data points for var-tracking, all the way from the tree
level.

> var-location generates incorrect info not because it represents
> something fundamentally different than you are (it doesn't), it falls
> down because it uses union as the meet operation.

> It says "oh, i don't know which of these locations is right, it must
> be both of them".

However, it can't deal with parallel locations, so this is at odds
with your statement.  I haven't got 'round to studying the exact
dataflow algorithm var-tracking uses, I just figured I needed to do
something along these lines.  Maybe it does need tweaking, if I end up
using it.  I'm not sure yet it's going to make sense to use it for the
more detailed tracking of copying that I'm going to have to do.

> If you changed the meet operation to "oh, i don't know which of these
> locations is right, it must be none of them", and did a little more
> work you would infer the same info as yours *at the tree level*

Intersection sounds like the right approach to me.  I assumed
var-tracking did this, except for unknowns.  It's a bit trickier than
this because var-tracking has to deal with a lot of incomplete
information.  But at least for vta values, we are going to have a
complete picture, so we can be stricter when it comes to gimple reg
variables.

Now, whether the fact that we could infer the very same values at the
tree level is relevant, I don't know.  The tree level is neither
source level nor the final executable code, so unless we can establish
useful mappings from the tree level to both source level and final
executable code, this information is of little use, no matter how true
it is.

> Nothing you have proposed is fundamentally going to give you better info.

Except for what tree transformations currently discard, such as the
points of the program in which variables are bound to values.  This is
indeed one of the elements that the annotations are trying to
preserve, that the compiler has not cared about preserving.  (The
other being expressions that end up not computed at run time, but that
could still be computed by a debugger based on state available
elsewhere)

> All you have done is annotated the IR in some places to make explicit
> some bits in the dataflow problem that you could infer anyway.

Now, this is not true.  I could infer values, yes, but I couldn't
infer the variables they relate to, nor the point of binding.  And
debug information is not just about the values, it's about mapping
variables to values and locations.  So, we can't infer all the
information we need.

> There is absolutely no reason what you are trying to do needs to
> modify the tree IR at all to achieve exactly the same accuracy of
> debug info as your design proposes at the tree level.

So far these claims have been unconvincing.  I still get the feeling
that you're missing some aspects of the problem, but I invite you to
show me how the information available in the current IR could be used
to generate accurate debug information for the two examples in the
design document.  Even if we leave the RTL aspect of it aside for a
moment.  I certainly wouldn't mind having to generate annotations only
when we move from Trees to RTL, but I can't imagine how we'd
reintroduce bindings at points that are not marked in the tree level,
for variables that are (partially or entirely) gone from the tree IR.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 23:31                                                         ` Daniel Berlin
@ 2007-12-19  4:35                                                           ` Alexandre Oliva
  2007-12-19 16:12                                                             ` Daniel Berlin
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-19  4:35 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

>> int c = z;
>> whatever0(c);
>> c = x;

> Because you have added information you have no way of knowing.
> How exactly did you compute that the call *definitely sets c to the
> value of z_0*, and definitely sets the value of c to x_2.

Err...  I guess you're thinking memory, global variables, alias
analysis and that sort of stuff.

None of this applies to gimple registers, which is all the annotations
are about.

Yes, aliasing, memory references and must- and may-alias do play a
role at the time of turning the annotations into equivalence classes,
when memory locations that are not stack slots allocated to gimple
regs that couldn't get hardware registers show up in the equivalence
classes.  These don't seem too hard to handle conservatively (removing
even may-alias assignment destinations from equivalence classes, as
well as non-local memory references at function calls and volatile
asms), at the expense of incompleteness in debug information, or in a
more lax way, at the potential expense of correctness.  I still don't
know exactly where to draw the line here, this note-propagation
algorithm is one that I haven't completely figured out yet.
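
To make the conservative option a bit more concrete, here is a minimal
sketch.  It is not code from the vta branch: loc_kind, equiv_class,
is_nonlocal_mem and drop_if are made-up names, and the sketch assumes an
equivalence class is just a small array of locations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location kinds; not GCC's actual representation.  */
enum loc_kind { LOC_REG, LOC_STACK_SLOT, LOC_NONLOCAL_MEM };

struct loc { enum loc_kind kind; int id; };

/* An equivalence class: locations believed to hold the same value.  */
struct equiv_class { struct loc locs[8]; size_t n; };

/* Predicate for the call / volatile asm case: non-local memory may
   have been overwritten behind our back.  */
int is_nonlocal_mem(const struct loc *l)
{
  return l->kind == LOC_NONLOCAL_MEM;
}

/* Remove every location for which PRED returns nonzero, keeping the
   order of the survivors.  Returns the number of locations dropped.  */
size_t drop_if(struct equiv_class *ec, int (*pred)(const struct loc *))
{
  size_t kept = 0, dropped = 0;

  assert(ec->n <= sizeof ec->locs / sizeof ec->locs[0]);
  for (size_t i = 0; i < ec->n; i++) {
    if (pred(&ec->locs[i]))
      dropped++;
    else
      ec->locs[kept++] = ec->locs[i];
  }
  ec->n = kept;
  return dropped;
}
```

A may-alias store would go through the same drop_if with a different
predicate.  Dropping too much only costs completeness; dropping too
little is what risks incorrect debug information.  Call-clobbered
registers are deliberately not modeled here.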

> However, value equivalence does not imply location equivalence, and all
> of our debug formats deal with locations of variables, except for
> constants.

Dwarf enables arbitrary value expressions too.  There's some
discussion about lvalue vs rvalue in the document, and this is also
something that will take some experimenting.  I'm not entirely sure
where to draw the line, and I'm not entirely sure there is a perfect
answer.

For example, consider that a variable's home is a stack slot, but for
a loop in which it's not modified, it's held in a register.  Clearly
in this case the correct representation is for the variable to be in
both locations, both as lvalues.

But if the variable is further copied to other variables or locations,
these additional locations probably shouldn't be regarded as the same
variable any more; at most, as rvalues, but maybe not even that.

And then, if for some particular instruction, the variable in the
register needs to be copied to a different register class, then it is
correct to state that, between the copy and the use, the variable is
held in all three locations.

I'm still trying to figure out how to deal with overlaps between
variables, deciding whether locations are to be handled as lvalues or
rvalues, this sort of stuff.  It is indeed a difficult problem.

> IE If you translate this directly into DWARF3, as written, you will
> claim that c and x_4 have the same location (since dwarf does not let
> you say "it has the same value as x, but not the same location"),

Yeah.  The $1M question is, when two variables are coalesced into one,
does this mean we now have two variables sharing the same location, or
do we just use the rvalue of one (which?) for the other?  Isn't this
like talking about body and spirit of variables?  After optimization,
I'm not even sure that talking about location (body) of variables makes
much sense.

An important part of the design process was to distinguish between
source-level variables and implementation-level variables.  Our naming
of stack slots or pseudos as variables is just a mnemonic artifact for
us compiler engineers, to simplify debugging.  Which variables they
actually represent depends a lot on optimization decisions, perhaps
even more than on the original code.

So I talk about binding a source-level variable to a value, rather
than to a location.  Then, we figure out the locations that hold the
value, what other variables do, how they overlap, maybe how they're
used, and then figure out which locations should be assigned to each
source variable.  Tricky.

The only certainty I have right now is that the annotations I've
proposed enable us to keep track of values.  Distributing locations in
equivalence classes to different user variables is an open problem,
and there are various possible solutions that could make sense, and
that would be arguably correct.

> if all you want is the values you compute above, on SSA, you can
> easily use a lattice to compute the same values you are going to
> compute as you update the annotations on the fly.

This sounds interesting, but I don't quite follow what you mean.  Can
you elaborate, maybe give some examples?

> Tracking which values *definitely represent user values* is actually
> quite easy at the tree level, and doesn't require any IR modification.

But is the binding of user variables to user values for specified
ranges part of this representation too?  I don't see that it is, and
this is the gap I'm trying to fill with the debug annotations.

> It may be worth doing at the RTL level, however, where the solution
> requires making up program points at each definition site and
> computing the dataflow problem in terms of them.

/me mumbles something about RTL-SSA, that Jeff Law started working on
before we took this turn into Tree-SSA.  I'm sort of having to
introduce some limited form of SSA in RTL to infer global equivalence
classes out of the annotations, in the RTL var-tracking pass.  Fun...
If only we had stuck to a single IR...  (No personal preference, I
like both, but I'd rather not have to duplicate work so as to deal
with both)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 16:22                                                         ` Ian Lance Taylor
  2007-12-18 16:28                                                           ` Robert Dewar
@ 2007-12-19  4:30                                                           ` Alexandre Oliva
  2007-12-19 18:41                                                             ` Ian Lance Taylor
  2007-12-31 16:55                                                           ` Richard Guenther
  2 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-19  4:30 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

On Dec 18, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:
>> A plan to fix local variable debug information in GCC
>> 
>> by Alexandre Oliva <aoliva@redhat.com>
>> 
>> 2007-12-18 draft

> Thank you for writing this.  It makes an enormous difference.

NP.  Thanks for the encouragement.

>> == Goals

> I note that you don't say anything about the other big problem with
> debugging optimized code, which is that the debugger jumps around all
> over the place.

Yep, it's a separate project, that I'm somewhat interested in, and
maybe somewhat easy to fix with judicious use of is_stmt notes, but
it's not my top priority ATM.

>> Once this is established, a possible representation becomes almost
>> obvious: statements (in trees) or instructions (in rtl) that assert,
>> to the variable tracker, that a user variable or member is represented
>> by a given expression:
>> 
>> # DEBUG var expr
>> 
>> By var, we mean a tree expression that denotes a user variable, for
>> now.  We envision trivially extending it to support components of
>> variables in the future.

> While you say that this is almost obvious, it still isn't obvious at
> all to me.  You consider trees and RTL together, but I don't see why
> that is appropriate.

You snipped (skipped?) one aspect of the reasoning on why it is
appropriate.  Of course this doesn't prove it's the best possibility,
but I haven't seen evidence of why it isn't.

> My biggest concern at the tree level is the significantly increased
> memory usage

One of the first measurements we had from my code was from Richi, who
said it didn't increase it too much.

> and the introduction of a sort of a weak pointer to
> values.  Since DEBUG statements shouldn't interfere with
> optimizations, we need to explicitly ignore them in things like
> has_single_use.

That's probably the easiest part, and it's already done.

> But since our data structures need to be coherent, we can not ignore
> them when we actually eliminate SSA names.  That seems sort of
> complicated.

It's not.  The code to do this is ready.  After I got bootstrap-debug
to pass on x86_64-linux-gnu, I don't recall needing any further
changes in the tree passes for i386-linux-gnu, and none of the
ia64-linux-gnu or ppc64-linux-gnu fixes I've made so far (most to
their machine-dependent schedulers) required changes in the tree
passes either.  So, we can safely count that as easy and maintainable.
Looking at the patches in the vta branch for the tree infrastructure
will give you a very good idea of the involved effort.

> In SSA form it seems very natural to provide a set of associations
> with user variables for each GIMPLE variable.

Yes.  This provides for a simple AND WRONG representation (but not
hopeless, see below, after the sample code).

We went through some of this already.  You can't recover the
information with something that throws away information about the
point of assignment.  Even the basic block of assignment is lost.  You
can't generate correct debug information with this.

The limitation of approaches like this is addressed in passing in the
examples, but I didn't want to carry discussions about broken designs
that I thought we'd already left behind into the concise design
document.

> Since the GIMPLE variables never change, these associations never
> change.  We have to get them right when we create a new GIMPLE
> variable and when we eliminate a GIMPLE variable.

Maybe you can show us how to represent the annotations for the two
trivial examples I've chosen in the paper, to show that the compiler
can stand a chance of generating correct debug information.

> Of course this means that we are keeping the debug information in a
> reversed form.

This is not such a big deal; it would just lose some in completeness,
and it would probably carry around lots of useless notes.  The real
problem is that it loses essential information for correct debug
information generation.

> Instead of saying that a user variable is associated with an
> expression in terms of GIMPLE variables, we will say that a GIMPLE
> variable is associated with an expression in terms of user
> variables.

Let me see if I understand what you have in mind.  Given:

int f(int x, int y) {
  int i, j;

  probe1();
  i = x;
  j = y;
  probe2();
  if (x < y)
    i += y;
  else
    j -= x;
  probe3();
  return g (i, j);
}

we'd SSAify it into something like:

int f(int x, int y) {
  int i;
  int j;
  int T;

  probe1();
  i_0 = x_1(D); /* i */
  j_2 = y_3(D); /* j */
  probe2();
  if (x_1(D) < y_3(D))

    i_4 = i_0 + y_3(D); /* i */

  else
    j_5 = j_2 - x_1(D); /* j */

  i_6 = PHI <i_4(bb_then), i_0(bb_else)> /* i */
  j_7 = PHI <j_2(bb_then), j_5(bb_else)> /* j */
  probe3();
  T_8 = g (i_6, j_7);
  return T_8;
}

And I can see that setting breakpoints at the probe points would get
you correct values for i and j.  In fact, these annotations, so far,
are no different from what we already have today.

But then, if we optimize this just a little bit, I can't quite tell
what we'd get to enable correct debug information:

int f(int x, int y) {
  int i;
  int j;
  int T;

  probe1();
  /* p1: ??? i, j */
  probe2();
  if (x_1(D) < y_3(D))

    i_4 = x_1(D) + y_3(D); /* i */

  else
    j_5 = y_3(D) - x_1(D); /* j */

  i_6 = PHI <i_4(bb_then), x_1(D)(bb_else)> /* i */
  j_7 = PHI <y_3(D)(bb_then), j_5(bb_else)> /* j */
  probe3();
  T_8 = g (i_6, j_7);
  return T_8;
}

Now, if you tell me that information about i_0 and j_2 is
backward-propagated to the top of the function, where x and y are set
up, I introduce say zero-initialization for i and j before probe1()
(an actual function call, mind you), and then this representation is
provably broken.

And, if you tell me that you just discard that information, then at
probe2() the variables will appear to be uninitialized (or
zero-initialized after the change), and again the representation is
wrong.

If you tell me that you keep notes at those points to tell debug
information that at probe2() both variables have unknown values, then
you may get correct debug information, but you're willfully making it
incomplete for an extremely common scenario (this example is
intentionally made similar to a scenario after one pass of inlining
into f, where i and j were former arguments to the inlined function).

If you tell me that you keep notes at that point that indicate the
expected values of i and j, then you've reached the representation I
propose.

If you tell me you keep different notes between probe1() and probe2(),
that just tell the point at which i and j receive the values of x and
y, but the annotations are still attached to the SSA assignment, then
this stands a chance of generating correct debug information.
Something like:

  x_1(D) /* x starting at entry point, and also i starting at p1 */
  y_3(D) /* y starting at entry point, and also j starting at p1 */

Maybe these annotations interspersed in the code might be easier to
handle.  I hadn't considered this before.  It's worth investigating.
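
A rough sketch of how such range-qualified annotations could be
queried, under my own simplifying assumptions: program points are plain
integers on a straight-line path (0 is the entry point, 1 is p1), and
the names binding, bindings and ssa_name_for are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One annotation: SSA_NAME stands for USER_VAR from point START on.  */
struct binding { const char *ssa_name; const char *user_var; int start; };

/* The two annotated definitions from the example above.  */
static const struct binding bindings[] = {
  { "x_1(D)", "x", 0 }, { "x_1(D)", "i", 1 },
  { "y_3(D)", "y", 0 }, { "y_3(D)", "j", 1 },
};

/* Return the SSA name bound to USER_VAR at POINT, i.e. the binding
   with the latest start that is not after POINT, or NULL if the
   variable is not bound yet.  Control flow is ignored.  */
const char *ssa_name_for(const char *user_var, int point)
{
  const struct binding *best = NULL;

  assert(user_var != NULL);
  for (size_t k = 0; k < sizeof bindings / sizeof bindings[0]; k++) {
    const struct binding *b = &bindings[k];
    if (strcmp(b->user_var, user_var) == 0 && b->start <= point
        && (best == NULL || b->start > best->start))
      best = b;
  }
  return best ? best->ssa_name : NULL;
}
```

With this table, i has no binding at the entry point and maps to
x_1(D) from p1 on, which is what a breakpoint at probe2() would need.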

But I still haven't got your proposal entirely clear.  I don't quite
see how this would handle transformations other than trivial
substitutions.

Can you perhaps give examples of how you'd get from trivial
annotations to more complex, potentially ambiguous expressions, as
optimization passes make complex transformations?  Maybe what you have
in mind is something along the lines of induction variables, that loop
optimizers would have to annotate explicitly, is that so?

> It is of course true that optimized code will move around
> unpredictably, and your proposal doesn't handle that.

It handles that in that a variable will be regarded as being assigned
to a value when execution crosses the debug stmt/insn originally
inserted right after the assignment.  This is by design, but I realize
now I forgot to mention this in the design document.

The idea is that debug insns get high priority in scheduling.
However, since they mention the assignment just before them, if the
assignment is just moved earlier, without an intervening scheduling
barrier, then the debug instruction will follow it.  If the assignment
is removed, then the debug insn can legitimately be moved up to the
point where the assignment, if remaining, might have been moved up to.
However, if the assignment is moved to a separate basic block, say out
of a loop or a conditional, then we don't want the debug insn to move
with it: such that hoisting and commonizing are regarded as setting
temporaries, and the value is only "committed" to the variable if we
get to the point where the assignment would take place.

Neat, eh?

I'll add something to this effect to the design document.
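
Here is a toy model of that "commit" behavior, written as ordinary C
rather than as compiler code.  It assumes a source fragment like
"i = 1; if (cond) i = a * b;" in which the multiplication was hoisted
above the conditional; dbg_view, debug_bind and run are names invented
for this sketch.

```c
#include <assert.h>

/* What a debugger would report for one user variable.  */
struct dbg_view { int known; int value; };

/* Effect of execution crossing "# DEBUG var expr": commit the value.  */
static void debug_bind(struct dbg_view *v, int value)
{
  v->known = 1;
  v->value = value;
}

/* Runs the optimized fragment.  Stores the debugger's view of i at an
   inspection point after the hoisted computation in *AFTER_HOIST, and
   returns the view at the end of the fragment.  */
struct dbg_view run(int cond, int a, int b, struct dbg_view *after_hoist)
{
  struct dbg_view i = { 0, 0 };

  assert(after_hoist != 0);
  debug_bind(&i, 1);       /* # DEBUG i 1 */
  int tmp = a * b;         /* hoisted: only sets a temporary */
  *after_hoist = i;        /* i is still 1 here, whatever tmp holds */
  if (cond)
    debug_bind(&i, tmp);   /* # DEBUG i tmp stays at the original point */
  return i;
}
```

The hoisted computation never changes what the debugger reports for i;
only crossing the debug statement does.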

> I don't see it as a flaw that it will be possible to view user
> variables outside of their source code range.

Agreed.  Extending the range of a (variable value) binding to a point
in which the variable wouldn't exist (yet or any more) without
optimization is fine, but extending the range of such a binding across
an assignment, even an optimized-away one, isn't.

> It's not obvious to me why a DEBUG insn is superior to a REG_NOTE
> attached to an insn.

Mainly because we won't want to always move the note along with the
insn.  A REG_NOTE isn't unambiguous for parallel sets, but there are
ways around that.

As written in the document, combining the debug annotation with an
assignment is doable and not discarded from the plan, but at some
point the note may need to be detached, and then it's not clear to me
that the potential memory savings of this combination are worth the
additional maintenance burden of splitting them out on demand, which
is my greatest concern.

On top of that, after splitting, all the maintenance burden (no matter
how small) of dealing with stand-alone debug annotations would have to
be undertaken anyway, so it appears to me that the combination would
just add complexity.  But then again, I'm not sure about it, so I
haven't ruled it out; the design is open to it.

> The problem with DEBUG insns is of course that the RTL code
> is very sensitive to new insns, and also the additional memory usage.
> You discuss those, but it's not obvious to me why your proposed
> solution is the best one.

I can't assert it's the best, no matter how hard I've worked on this
design.  I've presented my thoughts (or at least as many of them as I
could remember; I may have forgotten some along the way ;-), and I've
shown why other designs presented before didn't solve the problem I
had to solve, as far as I could tell.

Your annotations along with the point-marking notes are an approach I
hadn't considered before, and I'm pretty sure I don't quite follow how
this would work to the fullest extent, but at first sight it appears
to me that it might work.  So let's look further into it.

>> Testing for accuracy and completeness of debug information can be best
>> accomplished using a debugging environment.

> Of course this is very unsatisfactory without an automated testsuite.

Err...  I didn't say the testing through a debugging environment
wouldn't be automated.  My plan is to use something along the lines of
the GDB testsuite scripts, but whether to use GDB or some other
debugging or monitoring infrastructure is a tiny implementation detail
that I haven't worried about at all.  The basic idea is to script the
inspection of variables and verify that the obtained values are the
expected ones, or that variables are defensibly unavailable at the
inspection points.
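
The check itself could be as small as the sketch below.  This is only
an illustration of the pass/fail rule, with made-up names (verdict,
observation, expectation, check); how the observation is obtained from
GDB or another tool is left out entirely.

```c
#include <assert.h>

enum verdict { PASS, FAIL };

/* What the debugger reported at one inspection point.  */
struct observation { int available; long value; };

/* What the test expects there.  MAY_BE_UNAVAILABLE is set when the
   variable is defensibly optimized away at that point.  */
struct expectation { long value; int may_be_unavailable; };

/* A reported value must always be the expected one; "unavailable" is
   accepted only where the test says it is defensible.  */
enum verdict check(struct expectation e, struct observation o)
{
  assert(o.available == 0 || o.available == 1);
  if (!o.available)
    return e.may_be_unavailable ? PASS : FAIL;
  return o.value == e.value ? PASS : FAIL;
}
```

Note that a wrong value fails even where unavailability would have been
accepted: incompleteness can be tolerated, incorrectness cannot.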

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  8:39                                                       ` Alexandre Oliva
                                                                           ` (2 preceding siblings ...)
  2007-12-18 23:19                                                         ` Daniel Berlin
@ 2007-12-18 23:31                                                         ` Daniel Berlin
  2007-12-19  4:35                                                           ` Alexandre Oliva
  3 siblings, 1 reply; 189+ messages in thread
From: Daniel Berlin @ 2007-12-18 23:31 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

>
> It is desirable to be able to represent constants and other
> optimized-away values, rather than stating variables have values they
> can no longer have:
>
> int
> x1 (int x)
> {
>   int i;
>
>   i = 2;
>   f(i);
>   i = x;
>   h();
>   i = 7;
>   g(i);
> }
>
> Even if variable i is completely optimized away, a debugger can still
> print the correct values for i if we keep annotations such as:

>
>   (debug (var_location i (const_int 2)))
>   (set (reg arg0) (const_int 2))
>   (call (mem (symbol_ref f)))
>   (debug (var_location i unknown))
>   (call (mem (symbol_ref h)))
>   (debug (var_location i (const_int 7)))
>   (set (reg arg0) (const_int 7))
>   (call (mem (symbol_ref g)))
>
> In this case, before the call to h, not only the assignment to i was
> dead, but also the value of the incoming argument x had already been
> clobbered.  If i had been assigned to another constant instead, debug
> information could easily represent this.
>
> Another example that covers PHI nodes and conditionals:
>
> int
> x2 (int x, int y, int z)
> {
>   int c = z;
>   whatever0(c);
>   c = x;
>   whatever1();
>   if (some_condition)
>     {
>       whatever2();
>       c = y;
>       whatever3();
>     }
>   whatever4(c);
> }
>
> With SSA infrastructure, this program can be optimized to:
>
> int
> x2 (int x, int y, int z)
> {
>   int c;
>   # bb 1
>   whatever0(z_0(D));
>   whatever1();
>   if (some_condition)
>     {
>       # bb 2
>       whatever2();
>       whatever3();
>     }
>   # bb 3
>   # c_1 = PHI <x_2(D)(1), y_3(D)(2)>;
>   whatever4(c_1);
> }
>
> Note how, without debug annotations, c is only initialized just before
> the call to whatever4.  At all other points, the value of c would be
> unavailable to the debugger, possibly even wrong.
>
> If we were to annotate the SSA definitions forward-propagated into c
> versions as applying to c, we'd end up with all of x_2, y_3 and z_0

If you forward propagate any annotations, ever,
> applied to c throughout the entire function, in the absence of
> additional markers.
>
> Now, with the annotations proposed in this paper, what is initially:
>
> int
> x2 (int x, int y, int z)
> {
>   int c;
>   # bb 1
>   c_4 = z_0(D);
>   # DEBUG c z_0(D)
>   whatever0(z_0(D));
>   # DEBUG c x_2(D)
>   whatever1();

> and then, at every one of the inspection points, we get the correct
> value for variable c.
Because you have added information you have no way of knowing.
How exactly did you compute that the call *definitely sets c to the
value of z_0*, and definitely sets the value of c to x_2.

This must be "may-information", because we don't know what the call does.

Ignoring this (the solution is to not assume anything at calls,
because you run the risk of getting the wrong answer at meet points
later on!) your scheme is sufficient to get correct values, but not
correct locations.

However, value equivalence does not imply location equivalence, and all
of our debug formats deal with locations of variables, except for
constants.

IE If you translate this directly into DWARF3, as written, you will
claim that c and x_4 have the same location (since dwarf does not let
you say "it has the same value as x, but not the same location"), and
thus incorrectly represent that p *x_4=5 modifies c if i were to do it
in the debugger.  Because of the may-problem, you will also claim the
same value/location for c and x_2, which you can't prove is right,
because you don't know what whatever1/2 actually does.

if all you want is the values you compute above, on SSA, you can
easily use a lattice to compute the same values you are going to
compute as you update the annotations on the fly.

(This is because it is a flow sensitive problem, and you want the flow
answers at each unique definition point, which SSA neatly provides,
except for calls, where you could hang it off the vops).

Tracking which values *definitely represent user values* is actually
quite easy at the tree level, and doesn't require any IR modification.

It may be worth doing at the RTL level, however, where the solution
requires making up program points at each definition site and
computing the dataflow problem in terms of them.
--Dan

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  8:39                                                       ` Alexandre Oliva
  2007-12-18 13:15                                                         ` Diego Novillo
  2007-12-18 16:22                                                         ` Ian Lance Taylor
@ 2007-12-18 23:19                                                         ` Daniel Berlin
  2007-12-19  6:07                                                           ` Alexandre Oliva
  2007-12-18 23:31                                                         ` Daniel Berlin
  3 siblings, 1 reply; 189+ messages in thread
From: Daniel Berlin @ 2007-12-18 23:19 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/18/07, Alexandre Oliva <aoliva@redhat.com> wrote:

> Then, we let tree optimizers do their jobs.  Whenever they rename,
> renumber, coalesce, combine or otherwise optimize a variable, they
> will automatically update debug statements that mention them as well.
>
Speaking only about the tree level, in this entire email
I make no representations about the RTL level ;)

This is much harder than you give it credit for, unless you plan on
throwing out all the info at elimination points.

Consider PRE alone, which makes new statements that are combinations
of old ones, and eliminates tons of variables in favor of them.

If your debug statement strategy is "move debug statements when we
insert code that is equivalent", it won't work, because our
equivalence is based on value equivalence, not location equivalence.
We only guarantee it has the same value as whatever it is a copy
of at that point, not that it has the same location.

So you will lose info every time PRE makes an insertion, unless you
make serious modifications to PRE.

This is not to mention the data you lose if you just throw it away at
elimination points.

Let's take another problem.

How do i say debug info for some variable is now dead, we have no idea
what it is right now?
How do I figure out which debug statements need to be modified when
you introduce new memory operations?

When you pass something by address, you get vops.
The vops are not variables, and have no relation to the original
variable (they can be partitions containing more variables).

If i have

DEBUG(x, x_3)
x_3 = x; // Read from global

y = x_3;
....

If i insert a new call
DEBUG(x, x_3): 1
x_3 = x

foo() // May modify x (and *&x)

y = x_3

Now you have two problems.

It is no longer true, at the point of y = x_3, that DEBUG (x, x_3) holds.
In fact, x_3 may no longer have any relation to x.
You have three choices:
1. Either destroy the DEBUG(x, x_3) losing valuable and correct info
2. Add a new DEBUG (x, unknown)
3. Figure out which debug statements are reached by your call

#3 is a dataflow problem, and not something you want to do every time
you insert a call.

If your answer is #1 or #2, then what you are really doing is
computing roughly the same dataflow problem var-location does, except
on trees and with a different meet-operation.

var-location generates incorrect info not because it represents
something fundamentally different than you are (it doesn't), it falls
down because it uses union as the meet operation.

It says "oh, i don't know which of these locations is right, it must
be both of them".

If you changed the meet operation to "oh, i don't know which of these
locations is right, it must be none of them", and did a little more
work you would infer the same info as yours *at the tree level*
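
Roughly, the two meet operations differ as in the sketch below.  This
assumes location sets are bitmasks (bit k set means "the variable may
be in location k"); locset, meet_union and meet_none are illustrative
names, not what var-tracking actually uses, and unvisited predecessors
are not modeled.

```c
#include <assert.h>

/* Bit k set: the variable is claimed to live in location k.
   The empty set (0) means "location unknown".  */
typedef unsigned locset;

/* Union meet: when predecessors disagree, claim every location.
   This is the behavior that can produce wrong info.  */
locset meet_union(locset a, locset b)
{
  return a | b;
}

/* "None of them" meet: when predecessors disagree, claim nothing.
   Only a set both predecessors agree on survives the merge.  */
locset meet_none(locset a, locset b)
{
  return a == b ? a : 0;
}
```

At a merge where one path has the variable in location 0 and the other
in location 1, meet_union claims both and meet_none claims neither.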

Nothing you have proposed is fundamentally going to give you better info.
All you have done is annotated the IR in some places to make explicit
some bits in the dataflow problem that you could infer anyway.  It
is provable you can infer them with a simple lattice and
associated value, *unless you are going to start guessing* (which you
have said you don't want to do because it can generate incorrect
info).

There is absolutely no reason what you are trying to do needs to
modify the tree IR at all to achieve exactly the same accuracy of
debug info as your design proposes at the tree level.  You could
simply compute the global dataflow problem.

The RTL level is harder, of course.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 13:29                                                       ` Robert Dewar
@ 2007-12-18 22:15                                                         ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18 22:15 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Joe Buck, Geert Bosch, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:
>> On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

>>> OK, so you are agreeing that good debuggability is impossible
>>> with all the optimizations in place, so once again, let's have
>>> an optimization level that optimizes as far as possible without
>>> harming debuggability.

>> It's just that changing optimizations is precisely *against* the goals
>> of my current project.  So, don't expect significant efforts to this
>> end from me at this time.

> But you can't achieve the above criterion with your approach.

Actually, you can.  My approach is about ensuring the mapping between
the location of source and implementation variables is correct.  This
is orthogonal to how much optimization you make.

If you optimize more, more values or locations may become unavailable,
but this is not about correctness (what fraction of the annotations
point at locations that hold the correct value), and it's not even
about completeness (what fraction of the source variables are
represented at all locations they are available), it's just about
theoretical completeness (what fraction of the source variables are
represented at all locations they would be available without
optimization).
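
Spelled out as arithmetic, the three measures could look like the
sketch below.  The counts struct and its field names are assumptions
made for illustration; nothing here says how such counts would be
collected, and treating an empty denominator as 1.0 is my own choice.

```c
#include <assert.h>

/* Hypothetical tallies over (variable, program point) pairs.  */
struct counts {
  unsigned annotations;      /* locations claimed by debug info */
  unsigned correct;          /* ...of which hold the correct value */
  unsigned represented;      /* pairs the debug info covers */
  unsigned available_opt;    /* pairs available in the optimized code */
  unsigned available_noopt;  /* pairs available without optimization */
};

/* Ratio helper; an empty denominator counts as vacuously perfect.  */
static double ratio(unsigned num, unsigned den)
{
  assert(num <= den);
  return den == 0 ? 1.0 : (double) num / (double) den;
}

double correctness(const struct counts *c)
{
  return ratio(c->correct, c->annotations);
}

double completeness(const struct counts *c)
{
  return ratio(c->represented, c->available_opt);
}

double theoretical_completeness(const struct counts *c)
{
  return ratio(c->represented, c->available_noopt);
}
```

More aggressive optimization shrinks available_opt relative to
available_noopt, so only the last ratio is expected to drop.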

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 16:31                                                             ` Andrew Haley
  2007-12-18 16:42                                                               ` Robert Dewar
@ 2007-12-18 17:12                                                               ` Richard Kenner
  1 sibling, 0 replies; 189+ messages in thread
From: Richard Kenner @ 2007-12-18 17:12 UTC (permalink / raw)
  To: aph; +Cc: aoliva, dewar, gcc, iant

> Short of putting a barrier at every sequence point, how would you stop
> the debugger from jumping all over the place?  I'm assuming that you
> do want the debugger to show what is actually going on, not fake it.

You could, for example, add a -Og option that says "don't do any
optimizations that will move instructions between lines".

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 16:42                                                               ` Robert Dewar
@ 2007-12-18 17:04                                                                 ` Andrew Haley
  0 siblings, 0 replies; 189+ messages in thread
From: Andrew Haley @ 2007-12-18 17:04 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Ian Lance Taylor, Alexandre Oliva, gcc

Robert Dewar writes:
 > Andrew Haley wrote:
 > >  > I don't think it is fine, we have constant complaints from our
 > >  > users about this. I think we definitely need an optimization
 > >  > level that avoids this.
 > > 
 > > Short of putting a barrier at every sequence point, how would you stop
 > > the debugger from jumping all over the place?  I'm assuming that you
 > > do want the debugger to show what is actually going on, not fake it.
 > 
 > Note that putting a barrier at every sequence point is exactly what
 > Geert proposed, and I think we really need an optimization level
 > that does the equivalent of this. It is also needed for effective
 > source-object traceability for certification purposes. Yes, you
 > can use -O0, but the trouble is that we generate so much rubbish
 > at this level, much worse than competitive compilers with "optimization
 > off", and the sheer amount of object code makes the traceability
 > analysis harder (and makes executables unnecessarily huge).

I agree.  It's a really interesting idea and should be fairly easy to
prototype.

Andrew.

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 16:32                                                             ` Daniel Jacobowitz
@ 2007-12-18 16:44                                                               ` Robert Dewar
  0 siblings, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-12-18 16:44 UTC (permalink / raw)
  To: Robert Dewar, Ian Lance Taylor, Alexandre Oliva, gcc

Daniel Jacobowitz wrote:
> On Tue, Dec 18, 2007 at 11:22:12AM -0500, Robert Dewar wrote:

>> I don't think it is fine, we have constant complaints from our
>> users about this. I think we definitely need an optimization
>> level that avoids this.
> 
> It's fine because it's not the problem he's working on.  We don't have
> to fix everything at once!

Fair enough, I am all in favor of improving all aspects of
debuggability :-)
> 

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 16:31                                                             ` Andrew Haley
@ 2007-12-18 16:42                                                               ` Robert Dewar
  2007-12-18 17:04                                                                 ` Andrew Haley
  2007-12-18 17:12                                                               ` Richard Kenner
  1 sibling, 1 reply; 189+ messages in thread
From: Robert Dewar @ 2007-12-18 16:42 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Ian Lance Taylor, Alexandre Oliva, gcc

Andrew Haley wrote:
>  > I don't think it is fine, we have constant complaints from our
>  > users about this. I think we definitely need an optimization
>  > level that avoids this.
> 
> Short of putting a barrier at every sequence point, how would you stop
> the debugger from jumping all over the place?  I'm assuming that you
> do want the debugger to show what is actually going on, not fake it.

Note that putting a barrier at every sequence point is exactly what
Geert proposed, and I think we really need an optimization level
that does the equivalent of this. It is also needed for effective
source-object traceability for certification purposes. Yes, you
can use -O0, but the trouble is that we generate so much rubbish
at this level, much worse than competitive compilers with "optimization
off", and the sheer amount of object code makes the traceability
analysis harder (and makes executables unnecessarily huge).
> 
> Andrew.
> 

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 16:28                                                           ` Robert Dewar
  2007-12-18 16:31                                                             ` Andrew Haley
@ 2007-12-18 16:32                                                             ` Daniel Jacobowitz
  2007-12-18 16:44                                                               ` Robert Dewar
  1 sibling, 1 reply; 189+ messages in thread
From: Daniel Jacobowitz @ 2007-12-18 16:32 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Ian Lance Taylor, Alexandre Oliva, gcc

On Tue, Dec 18, 2007 at 11:22:12AM -0500, Robert Dewar wrote:
>>> == Goals
>>
>> I note that you don't say anything about the other big problem with
>> debugging optimized code, which is that the debugger jumps around all
>> over the place.  That is fine, of course.
>
> I don't think it is fine, we have constant complaints from our
> users about this. I think we definitely need an optimization
> level that avoids this.

It's fine because it's not the problem he's working on.  We don't have
to fix everything at once!

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 16:28                                                           ` Robert Dewar
@ 2007-12-18 16:31                                                             ` Andrew Haley
  2007-12-18 16:42                                                               ` Robert Dewar
  2007-12-18 17:12                                                               ` Richard Kenner
  2007-12-18 16:32                                                             ` Daniel Jacobowitz
  1 sibling, 2 replies; 189+ messages in thread
From: Andrew Haley @ 2007-12-18 16:31 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Ian Lance Taylor, Alexandre Oliva, gcc

Robert Dewar writes:
 > Ian Lance Taylor wrote:
 > > Alexandre Oliva <aoliva@redhat.com> writes:
 > > 
 > >> 	A plan to fix local variable debug information in GCC
 > >>
 > >> 		by Alexandre Oliva <aoliva@redhat.com>
 > >>
 > >> 			   2007-12-18 draft
 > > 
 > > Thank you for writing this.  It makes an enormous difference.
 > > 
 > > 
 > >> == Goals
 > > 
 > > I note that you don't say anything about the other big problem with
 > > debugging optimized code, which is that the debugger jumps around all
 > > over the place.  That is fine, of course.
 > 
 > I don't think it is fine, we have constant complaints from our
 > users about this. I think we definitely need an optimization
 > level that avoids this.

Short of putting a barrier at every sequence point, how would you stop
the debugger from jumping all over the place?  I'm assuming that you
do want the debugger to show what is actually going on, not fake it.

Andrew.

-- 
Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, UK
Registered in England and Wales No. 3798903

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 16:22                                                         ` Ian Lance Taylor
@ 2007-12-18 16:28                                                           ` Robert Dewar
  2007-12-18 16:31                                                             ` Andrew Haley
  2007-12-18 16:32                                                             ` Daniel Jacobowitz
  2007-12-19  4:30                                                           ` Alexandre Oliva
  2007-12-31 16:55                                                           ` Richard Guenther
  2 siblings, 2 replies; 189+ messages in thread
From: Robert Dewar @ 2007-12-18 16:28 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, gcc

Ian Lance Taylor wrote:
> Alexandre Oliva <aoliva@redhat.com> writes:
> 
>> 	A plan to fix local variable debug information in GCC
>>
>> 		by Alexandre Oliva <aoliva@redhat.com>
>>
>> 			   2007-12-18 draft
> 
> Thank you for writing this.  It makes an enormous difference.
> 
> 
>> == Goals
> 
> I note that you don't say anything about the other big problem with
> debugging optimized code, which is that the debugger jumps around all
> over the place.  That is fine, of course.

I don't think it is fine, we have constant complaints from our
users about this. I think we definitely need an optimization
level that avoids this.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  8:39                                                       ` Alexandre Oliva
  2007-12-18 13:15                                                         ` Diego Novillo
@ 2007-12-18 16:22                                                         ` Ian Lance Taylor
  2007-12-18 16:28                                                           ` Robert Dewar
                                                                             ` (2 more replies)
  2007-12-18 23:19                                                         ` Daniel Berlin
  2007-12-18 23:31                                                         ` Daniel Berlin
  3 siblings, 3 replies; 189+ messages in thread
From: Ian Lance Taylor @ 2007-12-18 16:22 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> 	A plan to fix local variable debug information in GCC
> 
> 		by Alexandre Oliva <aoliva@redhat.com>
> 
> 			   2007-12-18 draft

Thank you for writing this.  It makes an enormous difference.

> == Goals

I note that you don't say anything about the other big problem with
debugging optimized code, which is that the debugger jumps around all
over the place.  That is fine, of course.

> Once this is established, a possible representation becomes almost
> obvious: statements (in trees) or instructions (in rtl) that assert,
> to the variable tracker, that a user variable or member is represented
> by a given expression:
> 
>   # DEBUG var expr
> 
> By var, we mean a tree expression that denotes a user variable, for
> now.  We envision trivially extending it to support components of
> variables in the future.

While you say that this is almost obvious, it still isn't obvious at
all to me.  You consider trees and RTL together, but I don't see why
that is appropriate.

My biggest concern at the tree level is the significantly increased
memory usage and the introduction of a sort of a weak pointer to
values.  Since DEBUG statements shouldn't interfere with
optimizations, we need to explicitly ignore them in things like
has_single_use.  But since our data structures need to be coherent, we
can not ignore them when we actually eliminate SSA names.  That seems
sort of complicated.

In SSA form it seems very natural to provide a set of associations
with user variables for each GIMPLE variable.  Since the GIMPLE
variables never change, these associations never change.  We have to
get them right when we create a new GIMPLE variable and when we
eliminate a GIMPLE variable.  While this obviously requires some work,
to me it seems less intrusive than the notion of weak references.

Of course this means that we are keeping the debug information in a
reversed form.  Instead of saying that a user variable is associated
with an expression in terms of GIMPLE variables, we will say that a
GIMPLE variable is associated with an expression in terms of user
variables.  We will have to reverse the latter expression to get the
correct debug information.  Of course in some cases this will be
impossible, as when a GIMPLE variable is associated with a sum of user
variables; presumably in those cases you would have to drop the DEBUG
statement anyhow.
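
To make the "reversed form" concrete, here is a minimal sketch of the inversion step.  It is a toy model only: the tuple encoding of expressions, the "plus_const" case and the function name are assumptions made up for this illustration, not anything in GCC.

```python
def invert(assoc):
    """Turn 'GIMPLE name -> expression over user variables' into
    'user variable -> expressions over GIMPLE names'.

    Expressions are hypothetical tuples:
      ("var", "x")            the GIMPLE name is a plain copy of x
      ("plus_const", "i", 4)  the GIMPLE name equals i + 4
      anything else           treated as not invertible
    """
    user_to_gimple = {}
    dropped = []
    for gimple_name, expr in assoc.items():
        if expr[0] == "var":
            user_to_gimple.setdefault(expr[1], []).append(gimple_name)
        elif expr[0] == "plus_const":
            # gimple = user + c  implies  user = gimple - c
            _, user, c = expr
            user_to_gimple.setdefault(user, []).append(f"{gimple_name} - {c}")
        else:
            # e.g. a sum of two user variables: no single variable can be
            # recovered, so the association has to be dropped.
            dropped.append(gimple_name)
    return user_to_gimple, dropped


assoc = {
    "x_1": ("var", "x"),
    "t_7": ("plus_const", "i", 4),
    "s_9": ("sum", "a", "b"),
}
recovered, dropped = invert(assoc)
```

In this example x and i can be recovered, while the association for s_9 is dropped, which corresponds to the "sum of user variables" case above.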

By the way, we shouldn't confuse the source code live range of the
variable with the annotations on the GIMPLE variables.  That will get
us into the mapping of source code lines to optimized code.  It is of
course true that optimized code will move around unpredictably, and
your proposal doesn't handle that.  I don't see it as a flaw that it
will be possible to view user variables outside of their source code
range.

In any case, RTL is different.  We can't reasonably associate
annotations with pseudo-registers, because they change during the
function.  The obvious choices are to annotate SET statements, or to
annotate insns, or to introduce a DEBUG insn as you suggest.  It's not
obvious to me why a DEBUG insn is superior to a REG_NOTE attached to
an insn.  The problem with DEBUG insns is of course that the RTL code
is very sensitive to new insns, and also the additional memory usage.
You discuss those, but it's not obvious to me why your proposed
solution is the best one.

> Testing for accuracy and completeness of debug information can be best
> accomplished using a debugging environment.

Of course this is very unsatisfactory without an automated testsuite.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 13:15                                                         ` Diego Novillo
@ 2007-12-18 15:06                                                           ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18 15:06 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 18, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 12/18/07 03:07, Alexandre Oliva wrote:
>> Rats, this below-the-waistline attack really got me annoyed.

> I'm sorry you feel that way, it was not meant as a personal attack,
> though it was rather brusque.  I was getting tired of asking for the
> same thing over and over again.

>> So, what do you say now?

> Thank you.  Now I have something concrete to read and comment on.

You already had it.  Really.  You just didn't feel like reading and
commenting on it, for whatever reason I can't understand, which is why
you kept asking for what you already had over and over again.

Anyhow...  I expect your feedback, err...  "now" ;-P :-D

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  7:56                                                     ` Alexandre Oliva
@ 2007-12-18 13:29                                                       ` Robert Dewar
  2007-12-18 22:15                                                         ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Robert Dewar @ 2007-12-18 13:29 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Joe Buck, Geert Bosch, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

Alexandre Oliva wrote:
> On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

>> OK, so you are agreeing that good debuggability is impossible
>> with all the optimizations in place, so once again, let's have
>> an optimization level that optimizes as far as possible without
>> harming debuggability.
> 
> I don't oppose such an optimization level, even though I don't know
> that we agree on what "good debuggability" stands for.

My definition is that it should be indistinguishable from -O0
except that I could live without being able to modify variables.
> 
> It's just that changing optimizations is precisely *against* the goals
> of my current project.  So, don't expect significant efforts to this
> end from me at this time.

But you can't achieve the above criterion with your approach.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  8:39                                                       ` Alexandre Oliva
@ 2007-12-18 13:15                                                         ` Diego Novillo
  2007-12-18 15:06                                                           ` Alexandre Oliva
  2007-12-18 16:22                                                         ` Ian Lance Taylor
                                                                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 189+ messages in thread
From: Diego Novillo @ 2007-12-18 13:15 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/18/07 03:07, Alexandre Oliva wrote:

> Rats, this below-the-waistline attack really got me annoyed.

I'm sorry you feel that way, it was not meant as a personal attack, 
though it was rather brusque.  I was getting tired of asking for the 
same thing over and over again.

> So, what do you say now?

Thank you.  Now I have something concrete to read and comment on.


Diego.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  5:17                                                     ` Alexandre Oliva
  2007-12-18  8:06                                                       ` Kai Henningsen
@ 2007-12-18  8:39                                                       ` Alexandre Oliva
  2007-12-18 13:15                                                         ` Diego Novillo
                                                                           ` (3 more replies)
  1 sibling, 4 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18  8:39 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

[-- Attachment #1: Type: text/plain, Size: 753 bytes --]

On Dec 18, 2007, Alexandre Oliva <aoliva@redhat.com> wrote:

> On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:
>> On 12/17/07 19:50, Alexandre Oliva wrote:
>>> Now, since you're so interested in it and you've already read the
>>> various perspectives on the issue that I listed in my yesterday's
>>> e-mail to you, would you help me improve this document, by letting me
>>> know what you believe to be missing from the selected postings on
>>> design strategies, rationales and goals:

>> No.  I am not interested in organizing your thoughts for you.

> Wow, nice shot!

Rats, this below-the-waistline attack really got me annoyed.

So annoyed that I spent the night writing up this consolidated design
document.

So, what do you say now?

[-- Attachment #2: debug-var-loc.txt --]
[-- Type: text/plain, Size: 22558 bytes --]

	A plan to fix local variable debug information in GCC

		by Alexandre Oliva <aoliva@redhat.com>

			   2007-12-18 draft

== Introduction

The DWARF Debugging Information Format, version 3, determines the ways
a compiler can communicate the location of user variables at run time
to debug information consumers such as debuggers, program analysis
tools, run-time monitors, etc.

One possibility is that the location of a variable is fixed throughout
the execution of a function.  This is generally good enough for
unoptimized programs.

However, for optimized programs, the location of a variable can vary.
The variable may be live for some parts of a function, even in
multiple locations simultaneously.  At other parts, it may be
completely unavailable, or it may still be computable even if no
location actually holds its value.  The encoding, in these cases, can
be a location list: tuples with possibly-overlapping ranges of
instructions, and location expressions that determine a location or a
value for the variable.
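
As a rough illustration of the location-list idea, the sketch below models one variable's list as possibly-overlapping address ranges, each paired with a location expression.  This is a toy model, not the DWARF encoding: the class, the field names and the expression strings are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocEntry:
    start: int  # first instruction address covered (inclusive)
    end: int    # end of the range (exclusive)
    expr: str   # opaque stand-in for a location expression


def locations_at(loclist, pc):
    """Return every location expression that is valid at address pc."""
    return [entry.expr for entry in loclist if entry.start <= pc < entry.end]


# Hypothetical location list for a single user variable.
x_loclist = [
    LocEntry(0x10, 0x30, "register A"),
    LocEntry(0x20, 0x40, "stack slot -8"),  # overlaps the entry above
    LocEntry(0x50, 0x60, "known value 42"),  # computable, held nowhere
]

one_copy = locations_at(x_loclist, 0x18)
two_copies = locations_at(x_loclist, 0x28)
unavailable = locations_at(x_loclist, 0x48)
```

An empty result corresponds to the "completely unavailable" case, and a result with more than one entry to a variable that is live in multiple locations simultaneously.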

Historically, GCC started with the simpler, fixed-location model.  In
fact, back then, there weren't debug information formats that could
represent anything better than this.

More recently, GCC gained code to keep track of varying locations, and
to emit debug information accordingly.  Unfortunately, very many
optimization passes discard information that would be necessary to
emit correct and complete variable location lists.

Coalescing, scalarizing, substituting, propagating, and many other
transformations prevent the late-running variable tracker from doing
an accurate job.  By the time it runs, many variables no longer show
up in the retained annotations, although they're still conceptually
available.

The variable tracker can't tell when a user variable overlaps with
another, and it can't tell when a variable is overwritten, if the
assignment is optimized away.  These limitations are inherent to a
model based on inspecting actual code and trying to make inferences
from that.  In order to be able to represent not only what remained in
the code, but also what was optimized, combined or otherwise
apparently-removed, additional information needs to be kept around.

This paper describes an approach to maintain this information.

== Goals

* Ensure that, for every user variable for which we emit debug
information, the information is correct, i.e., if it says the value of
a variable at a certain instruction is at certain locations, or is a
known constant, then the variable must not be at any other location at
that point, and the locations or values must match reasonable
expectations based on source code inspection.

* Defining "reasonable expectations" is tricky, for code reordering
typical of optimization can make room for numerous surprises.  I don't
have a precise definition for this yet, but to me, saying that a
variable holds a value that it couldn't possibly hold (e.g., because
it is only assigned that value in a code path that is known not to be
taken) is a very clear indication that something is
amiss.  The general guiding rule is, if we aren't sure the information
is correct (or we're sure it isn't), we shouldn't pretend that it is.

* Try to ensure that, if the value of a variable is a known constant
at a certain point in the program, this information is present in
debug information.

* Try to ensure that, if the value of a variable is available or
computable at any location at a certain point in the program, this
information is present in debug information.

* Stop missing optimizations for the sake of preserving debug
information.

* Avoid using additional memory and CPU cycles that would be needed
only for debug information when compiling without generating debug
information.

== Internal Representation

For historical reasons, GCC has two completely different, even if
nearly isomorphic, internal representations: trees and RTL.  This
decision has required a lot of code to be duplicated for low-level
manipulation and simplification of each of these representations.

Since tracking variables and their values must start early to ensure
correctness, and be carried throughout the complete optimization
process, it might seem tempting to introduce yet another
representation for debug information, decaying both isomorphic
representations into a single debug information representation.  The
drawbacks would be additional duplication of internal representation
manipulation code, and the possibility of increasing memory use out of
the need for representing information in yet another format.

Another concern is that even the simplest compiler transformations may
need to be reflected in debug information.  This might indicate a need
for modifying every point of transformation in every optimization pass
so as to propagate information into the debug information
representation.  This is undesirable, because it would be very
intrusive.

But then, keeping references to the correct values, expressions or
variables, as transformations are made, is precisely what optimization
passes have to do to perform their jobs correctly.  Finding a way to
take advantage of this is a very non-intrusive way of keeping debug
information accurate.  In fact, most transformations wouldn't need any
changes whatsoever: uses of variables in debug information can, in
most optimization passes, be handled just like any other uses.

Once this is established, a possible representation becomes almost
obvious: statements (in trees) or instructions (in rtl) that assert,
to the variable tracker, that a user variable or member is represented
by a given expression:

  # DEBUG var expr

By var, we mean a tree expression that denotes a user variable, for
now.  We envision trivially extending it to support components of
variables in the future.

By expr, we mean a tree or rtl expression that computes the value of
the variable at the point in which the statement or instruction
appears in the program.  A special value needs to be specified for
each representation that denotes a location or value that cannot be
determined or represented in debug information, for example, the
location of a variable that was completely optimized away.  It might
be useful to represent the expression as a list of expressions, and to
distinguish lvalues from rvalues, but for now let's keep this simple.

== Generating debug information

Generating initial annotations when entering SSA is early enough in
the translation that the program will still reflect very reliably the
original source code.  Annotations are only generated for user
variables that are GIMPLE registers, i.e., variables that represent
scalar values and that never have their address taken.  Other kinds of
variables don't have varying locations, so we don't need to worry
about them.

After every assignment to such a variable, we emit a DEBUG statement
that will preserve, throughout compilation, the information that, at
that point, the assigned variable was represented by that expression.
So, after turning an assignment such as the following into SSA form,
we emit the debug statement below right after it:

  x_1 = whatever;
  # DEBUG x x_1

Likewise, at control flow merge points, for each PHI node we introduce
in the SSA representation, we emit an annotation:

  # x_4 = PHI <x_1(3), x_2(4), x_3(7)>;
  # DEBUG x x_4

Then, we let tree optimizers do their jobs.  Whenever they rename,
renumber, coalesce, combine or otherwise optimize a variable, they
will automatically update debug statements that mention them as well.
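
The following sketch illustrates why debug statements can ride along with ordinary transformations.  It is a toy model under stated assumptions: statements are hypothetical (kind, lhs, rhs) tuples with a single operand, and neither function corresponds to actual GCC code.

```python
def annotate(stmts, user_vars):
    """After each assignment to an SSA version of a user variable
    (a name such as 'x_1'), emit a ('debug', var, ssa_name) statement."""
    out = []
    for stmt in stmts:
        out.append(stmt)
        kind, lhs, _rhs = stmt
        base = lhs.rsplit("_", 1)[0]
        if kind == "assign" and base in user_vars:
            out.append(("debug", base, lhs))
    return out


def replace_uses(stmts, old, new):
    """Generic 'replace every use of old with new', as a copy-propagation
    pass might do.  A debug statement's operand is just another use, so
    it is updated by the same loop, with no debug-specific code."""
    return [(kind, lhs, new if rhs == old else rhs) for kind, lhs, rhs in stmts]


# x = a; y = x;   in SSA form, with a_5 defined elsewhere
ssa = [("assign", "x_1", "a_5"), ("assign", "y_2", "x_1")]
annotated = annotate(ssa, {"x", "y"})
propagated = replace_uses(annotated, "x_1", "a_5")
```

After propagation, the assignment to x_1 has no remaining uses and could be deleted as dead code, yet the statement ("debug", "x", "a_5") still records where the value of x can be found.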

In the rare cases in which the presence of such a statement might
prevent an optimization, we need to adjust the optimizer code such
that the optimization is not prevented.  This most often amounts to
skipping or otherwise ignoring debug statements.  In a few very rare
cases, special code might be needed to adjust debug statements
manually.

After transformation to RTL, the representation needs translation, but
conceptually it's still the same: a mapping from variable to
expression.  Again, optimizers will most often adjust debug
instructions automatically.

The exceptions can be handled at no cost: the test for whether an
element of the instruction stream is an instruction or some kind of
note, that never needs updating, is a range test, in its optimized
form.  By placing the identifier for a debug instruction at one of the
limits of this range, testing for both ranges requires identical code,
except for the constants.

Since most code that tests for INSN_P and handles instructions can and
should match debug instructions as well, in order to keep them up to
date, we extend INSN_P so as to match debug instructions, and modify
the exceptions, that need to skip debug instructions, by using an
alternate test, with the same meaning as the original definition of
INSN_P.  These simple and non-intrusive changes are relatively common,
but still, by far, the exception rather than the rule.
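
A small sketch of the range-test trick follows.  The enumeration order and the name nondebug_insn_p are assumptions chosen for illustration and do not reproduce GCC's actual code numbering; the point is only that placing the debug code at one end of the instruction range makes both tests the same comparison with different constants.

```python
from enum import IntEnum


class Code(IntEnum):
    # Hypothetical ordering: the debug code sits right next to the
    # contiguous range of real instruction codes.
    NOTE = 0
    BARRIER = 1
    DEBUG_INSN = 2
    INSN = 3
    JUMP_INSN = 4
    CALL_INSN = 5
    CODE_LABEL = 6


def insn_p(code):
    """Extended test: real instructions and debug instructions."""
    return Code.DEBUG_INSN <= code <= Code.CALL_INSN


def nondebug_insn_p(code):
    """Alternate test with the original meaning: real instructions only."""
    return Code.INSN <= code <= Code.CALL_INSN


matches_both = [c.name for c in Code if insn_p(c) and nondebug_insn_p(c)]
matches_only_extended = [c.name for c in Code if insn_p(c) and not nondebug_insn_p(c)]
```

Passes that must skip debug instructions would switch to the second test; everything else keeps using the first and updates debug instructions for free.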

When optimizations are completed, including register allocation and
scheduling, it is time to pick up the debug instructions and emit
debug information out of them.  Conceptually, the debug instructions
represent points of assignment, at which a user variable ought to
evaluate to the annotated expression, maintained throughout
compilation.  However, when the value of a variable is live at more
than one location, it is important to note it, such that, if a
debugging session attempts to modify the variable, all copies are
modified.

The idea is to use some mechanism to determine equivalent expressions
throughout a function (say some variant of Global Value Numbering).
At debug instructions, we assert that the value of the named variable
is in the equivalence class represented by the expression.  As we scan
basic blocks forward and find that expressions in an equivalence class
are modified, we remove them from the equivalence class, and thus from
the list of available locations for the variable.  When such
expressions are further copied, we add them to equivalence classes.
At function calls and volatile asm statements, we remove
non-function-private memory slots from equivalence classes.  At
function calls, we also remove call-clobbered registers from
equivalence classes.  When no live expression remains in the
equivalence class that represents a variable, it is understood that
its value is no longer available.  At basic block confluences, we
combine information from the end states of the incoming blocks and the
debug statements added as a side effect of PHI nodes.
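
The forward scan over a single basic block can be sketched as below.  This is a simplified model, not the proposed implementation: locations are plain strings (memory slots prefixed with "mem:"), equivalence classes are represented by value ids, and confluences and volatile asm statements are left out.  All names are hypothetical.

```python
import itertools


class Tracker:
    def __init__(self, call_clobbered=(), private_mem=()):
        self.value_of_loc = {}  # location -> id of the value it holds
        self.value_of_var = {}  # user variable -> id of its value
        self.fresh = itertools.count()
        self.call_clobbered = set(call_clobbered)
        self.private_mem = set(private_mem)

    def _value(self, loc):
        if loc not in self.value_of_loc:
            self.value_of_loc[loc] = next(self.fresh)
        return self.value_of_loc[loc]

    def debug(self, var, loc):
        """Debug instruction: var is in the class represented by loc."""
        self.value_of_var[var] = self._value(loc)

    def copy(self, dst, src):
        """dst now holds the same value as src: it joins src's class."""
        self.value_of_loc[dst] = self._value(src)

    def modify(self, loc):
        """loc is overwritten with something unknown: it leaves its class."""
        self.value_of_loc[loc] = next(self.fresh)

    def call(self):
        """Function call: drop call-clobbered registers and any memory
        slot that is not private to the function."""
        for loc in list(self.value_of_loc):
            escapes = loc.startswith("mem:") and loc not in self.private_mem
            if loc in self.call_clobbered or escapes:
                self.modify(loc)

    def locations(self, var):
        """Locations currently known to hold var's value (may be empty)."""
        value = self.value_of_var.get(var)
        return sorted(l for l, v in self.value_of_loc.items() if v == value)


t = Tracker(call_clobbered={"r0", "r1"}, private_mem={"mem:sp+8"})
t.debug("x", "r0")
t.copy("r5", "r0")
t.copy("mem:sp+8", "r0")
after_copies = t.locations("x")
t.modify("r5")
t.call()
after_call = t.locations("x")
t.modify("mem:sp+8")
at_end = t.locations("x")
```

When the last location is overwritten, the list becomes empty, which models the point at which the variable's value is no longer available.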

The end result is accurate debug information.  Also, except for
transformations that require special handling to update debug
annotations properly, debug information should come out as complete as
possible.

== Testability

Since debug annotations are added early, and, in most cases,
maintained up-to-date by the same code that optimizers use to maintain
executable code up-to-date, debug annotations are likely to remain
accurate throughout compilation.

The risk of this approach is that the annotations get in the way of
optimizations, thus causing executable code to vary depending on
whether or not debug information is to be generated.  The risk of
varying code could be removed at the expense of generating and
maintaining debug annotations throughout compilation and just throwing
them away at the end.  This is undesirable, for it would slow down
compilation without debug information and waste memory while at that.

Therefore, we've built testing mechanisms into the compiler to detect
cases in which the presence of debug annotations would cause code
changes.

The bootstrap-debug Makefile target, by default, compiles the second
bootstrap stage without debug information, and the third bootstrap
stage with it, and then compares all object files after stripping
them, a process that discards all debug information.

Furthermore, bootstrap4-debug, after bootstrap-debug and
prepare-bootstrap4-debug-lib-g0, rebuilds all target libraries without
debug information, and compares them with the stage3 target libraries,
built with debug information.

At the time of this writing, both tests pass on platforms
x86_64-linux-gnu and i686-linux-gnu, and ppc64-linux-gnu and
ia64-linux-gnu are getting close.

Additional testing mechanisms should be built in, to exercise a wider
range of internal GCC behaviors and extensions, for example, by
comparing the compiler output with and without debug information while
compiling all of its testsuite.

Even if testing mechanisms fail to catch an error, the generation of
debug annotations is controlled by a command-line option, such that
any code changes caused by it can be easily avoided, at the expense of
the quality of the debug information.

Testing for accuracy and completeness of debug information can be best
accomplished using a debugging environment.  For example, writing
programs of increasing complexity, adding function-call or asm probe
points to stabilize the internal execution state, and then examining
the state of the program at these probe points in a debugger, shall
let us know how accurate and how complete variable location
information is.

Measuring accuracy is easy: if you ask for the value of a variable,
and get a value other than the expected one, there's a bug in the
compiler.  If you get "unavailable", this can still be regarded as
accurate, for locations are always optional.  However, it might be
incomplete.  Telling whether the variable was indeed optimized away,
or whether the value is available or computable but the information is
missing, is a harder problem; it belongs to the completeness test
rather than to the accuracy test.
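
One way to picture the two measurements is the sketch below.  It assumes probe-point observations are available as (reported, expected) pairs, with None standing for "unavailable"; the scoring formula is an illustrative assumption, not a metric defined by this proposal.

```python
def classify(reported, expected):
    """Classify one debugger observation at a probe point."""
    if reported is None:
        return "unavailable"  # still accurate, but possibly incomplete
    return "accurate" if reported == expected else "inaccurate"


def score(observations):
    """Count accuracy bugs and compute a naive completeness ratio."""
    results = [classify(reported, expected) for reported, expected in observations]
    return {
        "bugs": results.count("inaccurate"),
        "completeness": results.count("accurate") / len(results),
    }


# Four hypothetical observations: two correct, one unavailable, one wrong.
summary = score([(1, 1), (None, 2), (3, 4), (5, 5)])
```

Any nonzero "bugs" count indicates a compiler defect, whereas a completeness ratio below 1 may simply reflect values that optimization made unrecoverable.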

The completeness score for an unoptimized program might very often be
unachievable for optimized programs, not because the compiler is doing
a poor job at maintaining debug information, but rather because the
compiler is doing a good job at optimizing it, to the point that it is
no longer possible to determine the value of the inspected variable.

== Concerns

=== Memory consumption

Keeping more information around requires more memory; information
theory tells us that there's only so much information you can fit in a
bit.

In order to generate correct debug information, more information needs
to be retained throughout compilation.  The only way to arrange for
debug information to not require any additional memory is to waste
memory when not generating debug information.  But this is
undesirable.

Therefore, the better debug information we want, the more memory
overhead we're going to have to tolerate.

Of course at times we can trade memory for efficiency, using more
computationally expensive representations that are more compact.

At other times, we may trade memory for maintainability.  For example,
instead of emitting annotations as soon as we enter SSA mode, we could
emit them on demand, i.e., whenever we deleted, moved or significantly
modified an SSA assignment for which we would have emitted a debug
annotation.  Additional memory would be needed to mark assignments
that should have gained annotations but haven't, and care must be
taken to make sure that transformations aren't made without leaving a
correct debug statement in place.  It is not clear that this would
save significant memory, for a large fraction of relevant assignments
are modified or moved anyway, so it might very well be a
maintainability loss and a performance penalty for no measurable
memory gains.

Worst case, we may trade memory for debug information quality: if
memory use of this scheme is too high for some scenario, one can
disable debug information annotations through a command line option,
or disable debug information altogether.

=== Intrusiveness

Given that nearly all compiler transformations would require
reflection in debug information, any solution that doesn't take
advantage of this fact is bound to require changes all over the place.

Perhaps not so much for Tree-SSA passes, which are relatively
well-behaved and use a narrow API to make transformations, but very
clearly so for RTL passes, which very often modify instructions in
place, and at times even reuse locations assigned to user variables as
temporaries.

Even when we do use the strength of optimizers to maintain debug
information up to date, there are exceptions in which detailed
knowledge about the transformation taking place enables us to adjust
the annotations properly, if possible, or to discard location
information for the variable otherwise.

It is just not possible to hope that information can be maintained
accurate throughout compilation without any effort from optimizers, or
even through a trivial API for a debug information generator.  A
number of the exceptions that require detailed knowledge about the
ongoing transformation would be indistinguishable from other common
transformations that would have very different effects on debug
information.  At this point, any expectations of lower intrusiveness
by use of such an API vanish.

By letting optimizers do their jobs on debug annotations, and handling
exceptions only at the few locations where they are needed, trivially
in most such cases, we keep intrusiveness at a minimum.

Of course we could get even lower intrusiveness by accepting errors in
debug information, or accepting to generate different code depending
on debug information command-line options.  But these options
shouldn't be considered seriously.

=== Complexity

The annotations are conceptually trivial and they can be immediately
handled by optimizers.  It is hard to imagine a simpler design that
would still enable us to get right cases such as those in the examples
below.

Worrying about the representation of debug annotations as statements
or instructions, rather than notes, is missing the fact that, most of
the time, we do want them to be updated just like statements and
instructions.

Worrying about the representation of debug annotations in-line, rather
than an on-the-side representation, is a valid concern, but it's
addressed by the testability of the design, and the in-line
representation is highly advantageous, not only for using optimizers
to keep debug information accurate, but also for doing away with the
need for yet another internal representation and all the efforts into
maintaining it accurate.

=== Optimizations

Correct and more complete debugging information isn't supposed to
disable optimizations.  Keep in mind that enabling debug information
isn't supposed to modify the executable code in any way whatsoever.

The goal is to ensure that whatever debug information the compiler
generates actually matches the executable code, and that it is as
complete as viable.

The goal is not to disable optimizations so as to preserve variables
or code, such that it can be represented in debug information and
provide for a debugging experience more like that of code that is not
optimized.

If debug information disables any optimization, that's a bug that
needs fixing.

Now, while testing this design, a number of opportunities for
optimization that GCC missed were detected and fixed, others were
merely detected, and at least one optimization shortcoming kept in
place in order to get better debug information could be removed, for
the new debug information infrastructure enables the optimization to
be applied in its fullest extent.

== Examples

It is desirable to be able to represent constants and other
optimized-away values, rather than stating variables have values they
can no longer have:

int
x1 (int x)
{
  int i;

  i = 2;
  f(i);
  i = x;
  h();
  i = 7;
  g(i);
}

Even if variable i is completely optimized away, a debugger can still
print the correct values for i if we keep annotations such as:

  (debug (var_location i (const_int 2)))
  (set (reg arg0) (const_int 2))
  (call (mem (symbol_ref f)))
  (debug (var_location i unknown))
  (call (mem (symbol_ref h)))
  (debug (var_location i (const_int 7)))
  (set (reg arg0) (const_int 7))
  (call (mem (symbol_ref g)))

In this case, before the call to h, not only was the assignment to i
dead, but the value of the incoming argument x had also already been
clobbered.  If i had been assigned to another constant instead, debug
information could easily represent this.
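
To illustrate how a consumer might read the annotations above, here is
a small model in C.  It is not GCC code: the types and the function
lookup_note are hypothetical, positions simply count the eight lines of
the listing from zero, and it only handles straight-line code.

```c
#include <assert.h>
#include <stddef.h>

enum loc_kind { LOC_UNKNOWN, LOC_CONST };

/* One debug annotation for a variable: from position POS onwards the
   variable holds VALUE (if LOC_CONST) or has no known location.  */
struct var_note
{
  int pos;
  enum loc_kind kind;
  int value;
};

/* Annotations for i in x1, keyed by line of the listing above.  */
const struct var_note x1_notes[] = {
  { 0, LOC_CONST, 2 },    /* (debug (var_location i (const_int 2))) */
  { 3, LOC_UNKNOWN, 0 },  /* (debug (var_location i unknown)) */
  { 5, LOC_CONST, 7 },    /* (debug (var_location i (const_int 7))) */
};

/* Return the annotation in effect at POS: the last one at or before
   it.  NOTES must be sorted by position.  */
struct var_note
lookup_note (const struct var_note *notes, size_t n, int pos)
{
  struct var_note result = { pos, LOC_UNKNOWN, 0 };
  assert (notes != NULL || n == 0);
  for (size_t k = 0; k < n && notes[k].pos <= pos; k++)
    result = notes[k];
  return result;
}
```

Looking up position 2 (the call to f) yields the constant 2, position
4 (the call to h) yields unknown, and position 7 (the call to g)
yields the constant 7.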

Another example that covers PHI nodes and conditionals:

int
x2 (int x, int y, int z)
{
  int c = z;
  whatever0(c);
  c = x;
  whatever1();
  if (some_condition)
    {
      whatever2();
      c = y;
      whatever3();
    }
  whatever4(c);
}

With SSA infrastructure, this program can be optimized to:

int
x2 (int x, int y, int z)
{
  int c;
  # bb 1
  whatever0(z_0(D));
  whatever1();
  if (some_condition)
    {
      # bb 2
      whatever2();
      whatever3();
    }
  # bb 3
  # c_1 = PHI <x_2(D)(1), y_3(D)(2)>;
  whatever4(c_1);
}

Note how, without debug annotations, c is only initialized just before
the call to whatever4.  At all other points, the value of c would be
unavailable to the debugger, possibly even wrong.

If we were to annotate the SSA definitions forward-propagated into c
versions as applying to c, we'd end up with all of x_2, y_3 and z_0
applied to c throughout the entire function, in the absence of
additional markers.

Now, with the annotations proposed in this paper, what is initially:

int
x2 (int x, int y, int z)
{
  int c;
  # bb 1
  c_4 = z_0(D);
  # DEBUG c c_4
  whatever0(c_4);
  c_5 = x_2(D);
  # DEBUG c c_5
  whatever1();
  if (some_condition)
    {
      # bb 2
      whatever2();
      c_6 = y_3(D);
      # DEBUG c c_6
      whatever3();
    }

  # bb 3
  # c_1 = PHI <c_5(1), c_6(2)>
  # DEBUG c c_1
  whatever4(c_1);
}

is optimized into:

int
x2 (int x, int y, int z)
{
  int c;
  # bb 1
  # DEBUG c z_0(D)
  whatever0(z_0(D));
  # DEBUG c x_2(D)
  whatever1();
  if (some_condition)
    {
      # bb 2
      whatever2();
      # DEBUG c y_3(D)
      whatever3();
    }
  # bb 3
  # c_1 = PHI <x_2(D)(1), y_3(D)(2)>;
  # DEBUG c c_1
  whatever4(c_1);
}

and then, at every one of the inspection points, we get the correct
value for variable c.
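
For checking this by hand, the example can be turned into a harness
that records which value c ought to have at each inspection point.
The sketch below is an assumption-laden stand-in: inspect, expected_c
and n_logged are hypothetical names, the whateverN calls are replaced
by inspect, and some_condition becomes a parameter.  Passing the
expected value keeps it live, so this form exercises accuracy rather
than completeness.

```c
#include <assert.h>

#define MAX_LOG 8

int expected_c[MAX_LOG];  /* value c should have at each stop */
int n_logged;

/* Hypothetical inspection point: break here and print c one frame up,
   then compare against EXPECTED.  noinline is a GCC extension.  */
__attribute__((noinline)) void
inspect (int expected)
{
  assert (n_logged < MAX_LOG);
  expected_c[n_logged++] = expected;
}

int
x2 (int x, int y, int z, int some_condition)
{
  int c = z;
  inspect (c);      /* stands in for whatever0(c): c == z */
  c = x;
  inspect (x);      /* whatever1(): c == x */
  if (some_condition)
    {
      inspect (x);  /* whatever2(): c is still x */
      c = y;
      inspect (y);  /* whatever3(): c == y */
    }
  inspect (c);      /* whatever4(c) */
  return c;
}
```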

== Conclusion

This design enables a compiler to emit variable location debug
information that complies with the DWARF version 3 standard, and that
is likely to be as complete as theoretically possible, with an
implementation that is conceptually simple, relatively easy to
introduce, trivial to test and easy to maintain in the long run.  That
it wastes neither memory nor CPU cycles when compiling without debug
information is a welcome bonus.

[-- Attachment #3: Type: text/plain, Size: 250 bytes --]

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-18  6:16                                               ` Robert Dewar
@ 2007-12-18  8:09                                                 ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18  8:09 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Geert Bosch, Joe Buck, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:
>> Yes, I've considered something along these lines, but decided against
>> it, for we can't afford for debug information to affect executable
>> code generation in any way whatsoever, and we don't want to pessimize
>> optimized code when compiling without -g just so that compiling with
>> -g would get us the same code.

> I disagree, I think it would be fine to degrade -O1 slightly to achieve
> full debuggability,

Sure.  But this is just not relevant to my project of getting GCC to
emit correct (and, ideally, as complete as possible) variable location
information, no matter what the optimization level.

My goal is not so much about aiming at a perfect debugging experience,
but rather at making sure that what the compiler encodes in debug
information actually reflects the code it produced.

This will surely benefit a future full debuggability project, of
course.  But, as much as I see value in perfect debuggability at some
new optimization level, my current task is to get correct and more
complete variable location information at vanilla-build optimization
levels, i.e., at -O2 -g.

It is possible to do much better than what we do now, and it appears
to me that it's even possible to do much better than my current plan.
But I need to get this task wrapped up before I can spend further time
figuring out how to make it even better.

In either case, it probably won't be like -O0, for optimizations are
performed that make it impossible, and I'm not supposed to sacrifice
them for the sake of better debug information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-18  5:17                                                     ` Alexandre Oliva
@ 2007-12-18  8:06                                                       ` Kai Henningsen
  2007-12-18  8:39                                                       ` Alexandre Oliva
  1 sibling, 0 replies; 189+ messages in thread
From: Kai Henningsen @ 2007-12-18  8:06 UTC (permalink / raw)
  To: gcc

On Tue, Dec 18, 2007 at 02:38:31AM -0200, Alexandre Oliva wrote:

> Would reformatting these and stamping a title on top make it worthy of
> your interest?

Actually, I think that *would* help (though, of course, it's impossible
to predict if it would help *enough*).

I've noticed before (though this thread is a particularly extreme
example) that GCC developers seem no more immune than other people to
ignoring what's in a mail message (or news article) they're replying
to, even up to ignoring the carefully-selected part they're quoting.

I don't claim to understand it (nor to be completely immune to it
myself), but I'm no longer surprised by it. Disappointed, but not
surprised.

Anyway, the point is that this seems much rarer when the subject is
*not* in the inbox or a newsgroup. For whatever reason, people apply
their reading skills differently in different situations.

So, my advice would be:

1. Wait a while, so people have time to calm down.

2. Reformat and reorganize the stuff.

3. Put it in an obviously different format - say, give a link to a PDF,
instead of putting it in a mail to this list.

Oh, and it probably wouldn't hurt to give a short summary of what you
did to the various optimizers, including mentioning "no change", *after*
you know that that actually works. (For a work in progress, people seem
to often disbelieve such claims, however well justified ... at least, if
they're already looking hard for arguments against it, however
spurious.)

And no, I have no idea why this particular discussion degenerated so
badly, and similar others didn't. Your style of argumentation may not
have been perfect, but the same can be said for many other people here,
and it doesn't always seem to lead to a meltdown. Maybe it depends on
unpredictable factors like the mood people are in when they go reading
their mail.


* Re: Designs for better debug info in GCC
  2007-12-18  7:45                                                   ` Robert Dewar
@ 2007-12-18  7:56                                                     ` Alexandre Oliva
  2007-12-18 13:29                                                       ` Robert Dewar
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18  7:56 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Joe Buck, Geert Bosch, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:
>> Yep.  Sometimes code just is optimized away.  Can't stop that without
>> harming optimizations.

> OK, so you are agreeing that good debuggability is impossible
> with all the optimizations in place, so once again, let's have
> an optimization level that optimizes as far as possible without
> harming debuggability.

I don't oppose such an optimization level, even though I don't know
that we agree on what "good debuggability" stands for.

It's just that changing optimizations is precisely *against* the goals
of my current project.  So, don't expect significant efforts to this
end from me at this time.

>> If dwarf line number programs were smarter, we could perhaps encode
>> multiple lines for the same instruction, along with conditions to tell
>> when the instruction applies to such or such lines, and even more
>> fancy stuff like that.  But line number programs don't let us express
>> this in Dwarf3.

> So, that's not an option.

Yup.  Best we can do right now is to emit the condition line number.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-18  4:40                                                 ` Alexandre Oliva
@ 2007-12-18  7:45                                                   ` Robert Dewar
  2007-12-18  7:56                                                     ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Robert Dewar @ 2007-12-18  7:45 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Joe Buck, Geert Bosch, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

Alexandre Oliva wrote:

> Yep.  Sometimes code just is optimized away.  Can't stop that without
> harming optimizations.

OK, so you are agreeing that good debuggability is impossible
with all the optimizations in place, so once again, let's have
an optimization level that optimizes as far as possible without
harming debuggability.
> 
> If dwarf line number programs were smarter, we could perhaps encode
> multiple lines for the same instruction, along with conditions to tell
> when the instruction applies to such or such lines, and even more
> fancy stuff like that.  But line number programs don't let us express
> this in Dwarf3.

So, that's not an option.



* Re: Designs for better debug info in GCC
  2007-12-18  1:24                                             ` Alexandre Oliva
  2007-12-18  2:02                                               ` Joe Buck
@ 2007-12-18  6:16                                               ` Robert Dewar
  2007-12-18  8:09                                                 ` Alexandre Oliva
  1 sibling, 1 reply; 189+ messages in thread
From: Robert Dewar @ 2007-12-18  6:16 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Geert Bosch, Joe Buck, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

Alexandre Oliva wrote:

> Yes, I've considered something along these lines, but decided against
> it, for we can't afford for debug information to affect executable
> code generation in any way whatsoever, and we don't want to pessimize
> optimized code when compiling without -g just so that compiling with
> -g would get us the same code.

I disagree, I think it would be fine to degrade -O1 slightly to achieve
full debuggability, and of course -g cannot affect the generated code.
If indeed

a) it is possible to get perfect debuggability without any pessimization
b) that includes unexpected jumping around
c) everyone agrees on how to achieve a) and b)
d) this is implemented

then fine, but in the absence of these conditions, if we need to
pessimize -O1 code slightly to achieve this, that's OK by me. If
it really worries people, introduce a -Og that achieves this. In
my experience people use -O1 not because they are very performance
sensitive (those folk use -O2), but because -O0 is so horrible,
that they need something better than that for production delivery.


* Re: Designs for better debug info in GCC
  2007-12-18  1:14                                                   ` Diego Novillo
@ 2007-12-18  5:17                                                     ` Alexandre Oliva
  2007-12-18  8:06                                                       ` Kai Henningsen
  2007-12-18  8:39                                                       ` Alexandre Oliva
  0 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18  5:17 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 12/17/07 19:50, Alexandre Oliva wrote:
>> Now, since you're so interested in it and you've already read the
>> various perspectives on the issue that I listed in my yesterday's
>> e-mail to you, would you help me improve this document, by letting me
>> know what you believe to be missing from the selected postings on
>> design strategies, rationales and goals:

> No.  I am not interested in organizing your thoughts for you.

Wow, nice shot!

So tell me, what part of what you've read in the selected bibliography
seemed not organized for you?  Maybe that's what I have to work on
first.

> I am interested in reading a single, concise and well organized design
> document that you produce for all of us to understand what you want to
> do.

You got that already, except now I'm no longer sure you've actually
read it.  Have you?

You got the goals.  You got the way I intend to get there, in two
levels of detail.  You got examples that show why the goals can't be
achieved in other simpler ways.  You got various justifications for
the representation I've chosen.

Would reformatting these and stamping a title on top make it worthy of
your interest?

I really don't see what else you might want, and if the above isn't
enough, then my rephrasing it all into a single document still
wouldn't be enough.  I'd be just wasting my time, and yours.

So, please do tell me, what is it that you're still missing?  Note
that I can't promise to deliver, but I can't possibly give you what
you want unless you help me figure out what it is.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-18  2:02                                               ` Joe Buck
@ 2007-12-18  4:40                                                 ` Alexandre Oliva
  2007-12-18  7:45                                                   ` Robert Dewar
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18  4:40 UTC (permalink / raw)
  To: Joe Buck
  Cc: Geert Bosch, Daniel Berlin, Diego Novillo, Mark Mitchell,
	Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 17, 2007, Joe Buck <Joe.Buck@synopsys.COM> wrote:

> On Mon, Dec 17, 2007 at 11:11:46PM -0200, Alexandre Oliva wrote:
>> Line number information has a well-defined meaning: it ought to
>> represent the source code line that best represents the source-code
>> construct that ended up implemented using that instruction.

> You implicitly assume that such a source code line exists.

Actually, no.  I'm not sure where you got that impression, and how you
came to the conclusion that I'd assign line numbers the way you have.
To me, when you hoist something that is present in both blocks of a
conditional, it probably makes more sense to give it the line number
of the conditional, rather than that of either block.  But I won't
pretend to have thought very hard about this particular issue.  For
the time being, I'm focusing my efforts on local variable locations.

Anyhow, very clearly you don't want to mark such hoisted-out
computation as is_stmt.  This should eliminate at least the solvable
problem you're worried about.

>   out = a + b;
>   if (!cond)
>     out += c;
>   return out;

> Furthermore, there isn't a place to put a breakpoint that will
> trigger only for the case where cond is true, as you can on
> unoptimized code.

Yep.  Sometimes code just is optimized away.  Can't stop that without
harming optimizations.

If dwarf line number programs were smarter, we could perhaps encode
multiple lines for the same instruction, along with conditions to tell
when the instruction applies to such or such lines, and even more
fancy stuff like that.  But line number programs don't let us express
this in Dwarf3.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-18  1:24                                             ` Alexandre Oliva
@ 2007-12-18  2:02                                               ` Joe Buck
  2007-12-18  4:40                                                 ` Alexandre Oliva
  2007-12-18  6:16                                               ` Robert Dewar
  1 sibling, 1 reply; 189+ messages in thread
From: Joe Buck @ 2007-12-18  2:02 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Geert Bosch, Daniel Berlin, Diego Novillo, Mark Mitchell,
	Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Mon, Dec 17, 2007 at 11:11:46PM -0200, Alexandre Oliva wrote:
> Line number information has a well-defined meaning: it ought to
> represent the source code line that best represents the source-code
> construct that ended up implemented using that instruction.

You implicitly assume that such a source code line exists.
Consider something like

int func(bool cond, int a, int b, int c)
{
  int out;
  if (cond)
    out = a + b;
  else
    out = a + b + c;
  return out;
}

The optimizer might produce something that structurally resembles

  out = a + b;
  if (!cond)
    out += c;
  return out;

If you set a breakpoint on the addition of a and b, it will trigger
regardless of the value of cond.  Furthermore, there isn't a place
to put a breakpoint that will trigger only for the case where cond
is true, as you can on unoptimized code.  So you need to choose
between natural debugging and optimization.
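
The two shapes can be written out side by side to confirm they compute
the same result.  In this sketch func_hoisted is a hand-written
stand-in for what an optimizer might produce, not actual compiler
output, and check_equivalent is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* As written in the source: a + b appears once in each arm.  */
int
func (bool cond, int a, int b, int c)
{
  int out;
  if (cond)
    out = a + b;
  else
    out = a + b + c;
  return out;
}

/* Hand-hoisted form: the single a + b runs on both paths, so no
   instruction remains that executes only when cond is true.  */
int
func_hoisted (bool cond, int a, int b, int c)
{
  int out = a + b;
  if (!cond)
    out += c;
  return out;
}

/* Abort if the two forms disagree for the given inputs.  */
void
check_equivalent (bool cond, int a, int b, int c)
{
  assert (func (cond, a, b, c) == func_hoisted (cond, a, b, c));
}
```

The results agree for either value of cond; what differs is only
which source line the shared addition can honestly be attributed to.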


* Re: Designs for better debug info in GCC
  2007-12-17  8:20                                           ` Geert Bosch
@ 2007-12-18  1:24                                             ` Alexandre Oliva
  2007-12-18  2:02                                               ` Joe Buck
  2007-12-18  6:16                                               ` Robert Dewar
  0 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18  1:24 UTC (permalink / raw)
  To: Geert Bosch
  Cc: Joe Buck, Daniel Berlin, Diego Novillo, Mark Mitchell,
	Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 17, 2007, Geert Bosch <bosch@adacore.com> wrote:

> We could conceptually have inspection points between each source
> statement and declaration, which would roughly correspond to a
> use of all memory and all source variables, whether in memory or
> in registers.
> These inspection points would be considered potentially trapping.

Yes, I've considered something along these lines, but decided against
it, for we can't afford for debug information to affect executable
code generation in any way whatsoever, and we don't want to pessimize
optimized code when compiling without -g just so that compiling with
-g would get us the same code.

> Also, since no user-visible state can be modified by speculatively
> executed instructions such as loads, such instructions should not
> be tagged with their original source location information.

Line number information has a well-defined meaning: it ought to
represent the source code line that best represents the source-code
construct that ended up implemented using that instruction.

To address what we have in mind, there's an additional annotation on
top of line number information: the is_stmt flag.  This is what we
should use to tell debuggers what the best instruction is to set a
breakpoint at a certain line number or so, and for debuggers to be
able to step line by line more seamlessly.
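
As a rough sketch of how a debugger could use the flag: the table
below is a simplified model of line table rows, with made-up addresses
in demo_rows, and pick_breakpoint is a hypothetical policy (first
is_stmt row for the line), not the behavior of any particular
debugger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row of a simplified line table.  DWARF line number programs
   produce rows carrying an address, a line and an is_stmt flag, among
   other registers omitted here.  */
struct line_row
{
  unsigned long address;
  unsigned line;
  bool is_stmt;
};

/* Made-up rows, sorted by address: line 5 has instructions scheduled
   before and after its recommended breakpoint location.  */
const struct line_row demo_rows[] = {
  { 0x10, 5, false },
  { 0x14, 5, true },
  { 0x18, 6, true },
  { 0x1c, 5, false },
};

/* Hypothetical policy: break at the first row for LINE that is marked
   is_stmt.  Returns NULL if the line has no such row.  */
const struct line_row *
pick_breakpoint (const struct line_row *rows, size_t n, unsigned line)
{
  assert (rows != NULL || n == 0);
  for (size_t k = 0; k < n; k++)
    if (rows[k].line == line && rows[k].is_stmt)
      return &rows[k];
  return NULL;
}
```

With these rows, a breakpoint on line 5 lands at 0x14 rather than at
the earlier instruction at 0x10, which still carries line 5 but is not
flagged as a statement boundary.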

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-18  1:01                                                 ` Alexandre Oliva
@ 2007-12-18  1:14                                                   ` Diego Novillo
  2007-12-18  5:17                                                     ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Diego Novillo @ 2007-12-18  1:14 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/17/07 19:50, Alexandre Oliva wrote:

> Now, since you're so interested in it and you've already read the
> various perspectives on the issue that I listed in my yesterday's
> e-mail to you, would you help me improve this document, by letting me
> know what you believe to be missing from the selected postings on
> design strategies, rationales and goals:

No.  I am not interested in organizing your thoughts for you.

I am interested in reading a single, concise and well organized design 
document that you produce for all of us to understand what you want to do.

Take your time.  It doesn't need to be now.


Diego.


* Re: Designs for better debug info in GCC
  2007-12-17 21:20                                               ` Diego Novillo
@ 2007-12-18  1:01                                                 ` Alexandre Oliva
  2007-12-18  1:14                                                   ` Diego Novillo
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-18  1:01 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 12/17/07 15:28, Alexandre Oliva wrote:
>>> You need to provide such a document now.
>> 
>> Can't I instead provide it when it's ready?

> Of course.

Thanks,

Now, since you're so interested in it and you've already read the
various perspectives on the issue that I listed in my yesterday's
e-mail to you, would you help me improve this document, by letting me
know what you believe to be missing from the selected postings on
design strategies, rationales and goals:

http://gcc.gnu.org/ml/gcc/2007-11/msg00229.html (goals)
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00160.html (initial plan)
http://gcc.gnu.org/ml/gcc/2007-11/msg00261.html (detailed plan)
http://gcc.gnu.org/ml/gcc/2007-11/msg00317.html (example)
http://gcc.gnu.org/ml/gcc/2007-11/msg00590.html (more example)
http://gcc.gnu.org/ml/gcc/2007-11/msg00176.html (design rationale)
http://gcc.gnu.org/ml/gcc/2007-11/msg00177.html (clarification)

I could then focus on these missing aspects too, in addition to the
ones I already have, while designing the best form to present the
ideas.

Thanks in advance,

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-17 20:43                                             ` Alexandre Oliva
@ 2007-12-17 21:20                                               ` Diego Novillo
  2007-12-18  1:01                                                 ` Alexandre Oliva
  2007-12-31 14:45                                               ` Richard Guenther
  1 sibling, 1 reply; 189+ messages in thread
From: Diego Novillo @ 2007-12-17 21:20 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/17/07 15:28, Alexandre Oliva wrote:

>> You need to provide such a document now.
> 
> Can't I instead provide it when it's ready?

Of course.


Diego.


* Re: Designs for better debug info in GCC
  2007-12-17 18:02                                           ` Diego Novillo
@ 2007-12-17 20:43                                             ` Alexandre Oliva
  2007-12-17 21:20                                               ` Diego Novillo
  2007-12-31 14:45                                               ` Richard Guenther
  0 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-17 20:43 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 12/17/07 12:51, Alexandre Oliva wrote:
>> I guess I'm to blame, for having naïvely put the code out without as
>> much as a design and goals document

> Yes, you are.

Wow, thanks.  At least we agree on something! ;-)

> You need to provide such a document now.

Can't I instead provide it when it's ready?

You know, it wasn't me who asked to have the thing developed in the
open.  I didn't push it out just so that people who didn't want to
understand it could beat on it before it was ready to defend itself.
I put it out because there was an offer for contribution.

> I can't see how you'll be able to incorporate your implementation
> without a convincing design.

Agreed, I don't see how this would be doable for any but the most
trivial patches.

> The barrier is probably going to be higher.
> You raised too much controversy, so I have my doubts about your
> simplicity claims.

Oh, nice!  *I* raised too much controversy.  So people first ask me to
put the code out such that they can peek at it and help, then most
refrain from peeking at it because it's not ready and some who do
raise some concerns that are not reflected by the code, and then
everyone doubts I've taken those concerns into account and demand a
design document that will no more than just repeat the information
that's already out there but that people fail to take into account.

And then, this is a technical discussion, so historical controversy
shouldn't play any role in it, if people were rational about it.

Now, can you please explain to me how the efforts of repeating myself
one more time, rather than completing the implementation, are going to
make it any more likely that people who have already made up their
minds based on groundless fears will be convinced?

If you really think it would be worth it, can you point out what
you feel to be missing in the consolidated documentation I posted
upthread, in response to your request?  I'd be happy to fill in the
blanks, if you're willing to listen.  But I wouldn't be happy to waste
more time.

(This is not to say that the document won't ever be produced; it's to
say that I'm not to work on it right now.  I have other deliverables ahead
of it.)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-17  5:38                                         ` Joe Buck
  2007-12-17  8:20                                           ` Geert Bosch
@ 2007-12-17 18:33                                           ` Alexandre Oliva
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-17 18:33 UTC (permalink / raw)
  To: Joe Buck
  Cc: Daniel Berlin, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Dec 16, 2007, Joe Buck <Joe.Buck@synopsys.COM> wrote:

> However, since preserving accurate debug information
> has a cost, I think it would be better to turn -O1, not -O2, into the
> mode that Alexandre wants, where debug information is preserved.

In terms of memory, that's true, it does have a cost, for we have to
keep more information around.  That's one of the reasons why I'm
implementing this all under the control of a command-line option: you
can selectively enable or disable it, regardless of the level of
optimization.  If we want to make it default for -O1, but not for -O2,
sure, that works.

But this won't make much of a difference in terms of code change.
Except for the fact that we could simply leave alone the passes that
are only executed at -O2 or higher (which is not worth it, given that
I've already done the small work needed for them to keep debug info
accurate), most of the passes will still keep the information
accurate, nearly all of them without any code changes whatsoever.

So, doing this only for -O1 seems like a waste, given that -O2 is the
most common optimization level, and it's most often accompanied by -g.

> Trying to rework all optimizations to keep perfect debug information
> is going to take forever and make the compiler worse.

This statement is easy to make and to believe, but my approach is
proving it false, given a design that took this concern into account.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-17 17:59                                         ` Alexandre Oliva
@ 2007-12-17 18:02                                           ` Diego Novillo
  2007-12-17 20:43                                             ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Diego Novillo @ 2007-12-17 18:02 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/17/07 12:51, Alexandre Oliva wrote:

> I guess I'm to blame, for having naïvely put the code out without as
> much as a design and goals document

Yes, you are.

You need to provide such a document now.  I can't see how you'll be able 
to incorporate your implementation without a convincing design.

The barrier is probably going to be higher.  You raised too much 
controversy, so I have my doubts about your simplicity claims.


Diego.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-17  1:27                                       ` Daniel Berlin
  2007-12-17  5:38                                         ` Joe Buck
@ 2007-12-17 17:59                                         ` Alexandre Oliva
  2007-12-17 18:02                                           ` Diego Novillo
  1 sibling, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-17 17:59 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 16, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

>> It is obvious that you misunderstood what I want, and how intrusive
>> the approach is.

> Yes Alexandre, everyone who disagrees with you must not understand!

My conclusion is not based on disagreement, but rather on the faulty
arguments presented during the discussion.

For example, when you took the argument that every transformation had
effects on debug information, and used that to conclude that every
transformation would need difficult changes to generate correct debug
information, you left out from your reasoning a major strength of the
design, that I had mentioned in the e-mail you responded to: that the
optimizers already perform the transformations we need to keep debug
information accurate.

So, by missing or misunderstanding an essential part of the thought
process that went into the design, you came to a false conclusion
about it.

> That's really the problem here.
> None of us understand but you.

I guess I'm to blame, for having naïvely put the code out without as
much as a design and goals document, such that people started looking
at it without actually understanding what it was about, and at the
same time drawing conclusions about it based on hunches rather than on
solid logical grounds.

At this point, we have a scenario in which people have already jumped
to their conclusions, and whatever I say requires a much higher
threshold to be listened to and accepted.  It's quite unfortunate that
psychological factors take such a large role in the making of
technical decisions, and I naïvely assumed this wouldn't raise so much
rejection, for being such a simple and well thought-out design.  Oh,
well...  Something to avoid next time...

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-17  5:38                                         ` Joe Buck
@ 2007-12-17  8:20                                           ` Geert Bosch
  2007-12-18  1:24                                             ` Alexandre Oliva
  2007-12-17 18:33                                           ` Alexandre Oliva
  1 sibling, 1 reply; 189+ messages in thread
From: Geert Bosch @ 2007-12-17  8:20 UTC (permalink / raw)
  To: Joe Buck
  Cc: Daniel Berlin, Alexandre Oliva, Diego Novillo, Mark Mitchell,
	Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 16, 2007, at 20:27, Joe Buck wrote:
> I have some sympathy for going in Alexandre's direction, in that it
> would be nice to have a mode that provided optimization as well as
> accurate debugging.  However, since preserving accurate debug information
> has a cost, I think it would be better to turn -O1, not -O2, into the
> mode that Alexandre wants, where debug information is preserved.  Trying
> to rework all optimizations to keep perfect debug information is going
> to take forever and make the compiler worse.

Right, at the moment -O1 is far too much like -O2.
There is room for an optimization mode that is mostly local,
scales well for large programs and allows for high-quality debug
information. Fortunately, these goals seem all to match.

We could conceptually have inspection points between each source
statement and declaration, which would roughly correspond to a
use of all memory and all source variables, whether in memory or
in registers.
These inspection points would be considered potentially trapping.

This approach would still allow some scheduling. For example, loads
and arithmetic operations that are known not to trap could still
be done early. On the other hand, when breaking at any statement,
all variables can be printed.

Also, since no user-visible state can be modified by speculatively
executed instructions such as loads, such instructions should not
be tagged with their original source location information.
This would prevent the very annoying and unhelpful jumping around
the program during debugging.

The method I describe here, which roughly corresponds to the semantics
of Ada's "pragma Inspection_Point", seems relatively easy to implement
using an empty "asm" or similar.
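
As a rough illustration of the empty-"asm" idea, here is a minimal C sketch
using GCC's extended asm syntax. The macro name INSPECTION_POINT is
hypothetical, and unlike the argument-less form described above it names a
single variable. The input operand makes the compiler treat the variable as
used at that point, and the "memory" clobber makes it assume memory may be
read or written there. On its own this does not change what debug
information gets emitted; it only keeps the value from being optimized away.

```c
/* Hypothetical macro (not part of GCC): an empty asm statement that
   claims to read `var` and to touch memory.  The compiler must have the
   value of `var` available here (in a register, in memory, or as a
   constant) and must not move memory accesses across this point. */
#define INSPECTION_POINT(var) \
    __asm__ __volatile__("" : : "g"(var) : "memory")

int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        INSPECTION_POINT(total);  /* counts as a use of total, so it stays live */
    }
    return total;
}
```

The asm body is empty, so no instruction is emitted for the inspection
point itself; the cost is only in the optimizations it inhibits.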

   -Geert

PS. For convenience, I'm including a snippet of the Ada 2005 standard,
     the full version of which is freely available on the web.

H.3.2 Pragma Inspection_Point

1     An occurrence of a pragma Inspection_Point identifies a set of objects
each of whose values is to be available at the point(s) during program
execution corresponding to the position of the pragma in the compilation unit.
The purpose of such a pragma is to facilitate code validation.

                                    Syntax

2     The form of a pragma Inspection_Point is as follows:

3       pragma Inspection_Point[(object_name {, object_name})];

                                Legality Rules

4     A pragma Inspection_Point is allowed wherever a declarative_item or
statement is allowed. Each object_name shall statically denote the declaration
of an object.

                               Static Semantics

5/2   An inspection point is a point in the object code corresponding to the
occurrence of a pragma Inspection_Point in the compilation unit. An object is
inspectable at an inspection point if the corresponding pragma
Inspection_Point either has an argument denoting that object, or has no
arguments and the declaration of the object is visible at the inspection
point.

                               Dynamic Semantics

6     Execution of a pragma Inspection_Point has no effect.

                          Implementation Requirements

7     Reaching an inspection point is an external interaction with respect to
the values of the inspectable objects at that point (see 1.1.3).

                          Documentation Requirements

8     For each inspection point, the implementation shall identify a mapping
between each inspectable object and the machine resources (such as memory
locations or registers) from which the object's value can be obtained.

       NOTES

9/2   7  The implementation is not allowed to perform "dead store
       elimination" on the last assignment to a variable prior to a point
       where the variable is inspectable. Thus an inspection point has the
       effect of an implicit read of each of its inspectable objects.

10    8  Inspection points are useful in maintaining a correspondence between
       the state of the program in source code terms, and the machine state
       during the program's execution. Assertions about the values of program
       objects can be tested in machine terms at inspection points. Object
       code between inspection points can be processed by automated tools to
       verify programs mechanically.

11    9  The identification of the mapping from source program objects to
       machine resources is allowed to be in the form of an annotated object
       listing, in human-readable or tool-processable form.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-17  1:27                                       ` Daniel Berlin
@ 2007-12-17  5:38                                         ` Joe Buck
  2007-12-17  8:20                                           ` Geert Bosch
  2007-12-17 18:33                                           ` Alexandre Oliva
  2007-12-17 17:59                                         ` Alexandre Oliva
  1 sibling, 2 replies; 189+ messages in thread
From: Joe Buck @ 2007-12-17  5:38 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Sun, Dec 16, 2007 at 08:12:07PM -0500, Daniel Berlin wrote:
> > It is obvious that you misunderstood what I want, and how intrusive
> > the approach is.
> >
> 
> Yes Alexandre, everyone who disagrees with you must not understand!
> That's really the problem here.
> None of us understand but you.

I have some sympathy for going in Alexandre's direction, in that it
would be nice to have a mode that provided optimization as well as
accurate debugging.  However, since preserving accurate debug information
has a cost, I think it would be better to turn -O1, not -O2, into the
mode that Alexandre wants, where debug information is preserved.  Trying
to rework all optimizations to keep perfect debug information is going
to take forever and make the compiler worse.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-16 12:47                                     ` Alexandre Oliva
@ 2007-12-17  1:27                                       ` Daniel Berlin
  2007-12-17  5:38                                         ` Joe Buck
  2007-12-17 17:59                                         ` Alexandre Oliva
  0 siblings, 2 replies; 189+ messages in thread
From: Daniel Berlin @ 2007-12-17  1:27 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

> It is obvious that you misunderstood what I want, and how intrusive
> the approach is.
>

Yes Alexandre, everyone who disagrees with you must not understand!
That's really the problem here.
None of us understand but you.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-15 22:51                                 ` Alexandre Oliva
  2007-12-16  6:27                                   ` Daniel Berlin
@ 2007-12-16 22:20                                   ` Mark Mitchell
  1 sibling, 0 replies; 189+ messages in thread
From: Mark Mitchell @ 2007-12-16 22:20 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Robert Dewar, Ian Lance Taylor, Richard Guenther,
	gcc-patches, gcc

Alexandre Oliva wrote:

>> Yes, please.  I would very much like to see an abstract design
>> document on what you are trying to accomplish.
> 
> Other than the ones I've already posted, here's one:
> 
> http://dwarfstd.org/Dwarf3Std.php
> 
> Seriously.  There is a standard for this stuff. 

That's the specification for the encoding format.  I agree with you that
emitting incorrect debugging information, in the sense of declaring that
the location of a variable is in one place, even though its value is not
available in that place, is bad.  In -O0 code, I consider it a serious bug.

In -O2 code, I think it's still a bug, but with our current
infrastructure, we may have little choice: we either deny all knowledge
of the variable's location, or give one that's sometimes incorrect.
Which alternative is better depends on what you're trying to do with the
information; for interactive debugging, mostly-right is probably better
than nothing, whereas for some programmatic activities, the opposite may
be true.

If your goal is to avoid the information ever being wrong -- without
worrying about whether it is complete -- there is of course a trivial
solution: do not emit the information.  That is not a serious
suggestion, but it does provide a path to a serious suggestion, which I
gave earlier: conservatively emit location information you provide based
on what you can prove at the time you generate debugging information.
For example, if the value of "x" is in a register, and you cross a call
which might clobber that register value, then emit debugging information
that says that at that point the value is unavailable.  You could
probably do this kind of thing with relatively few changes to the GCC
internal representation; you would run a pass before debug-information
generation that attempted to prove dataflow properties about variables
and told you where values could reliably be found.
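
The conservative pass described above can be sketched in C for the simplest
possible case: straight-line code and a single variable "x". This is only a
toy model under stated assumptions, not GCC code. The instruction kinds, the
track_x function and the rule that registers 0..3 are call-clobbered are all
hypothetical, and OP_OTHER is assumed never to overwrite the register that
holds x.

```c
#include <stddef.h>

/* Toy instruction kinds (hypothetical, not GCC's representation). */
enum op { OP_SET_REG, OP_CALL, OP_OTHER };

struct insn {
    enum op op;
    int reg;            /* for OP_SET_REG: register that now holds x */
};

#define NO_REG (-1)     /* location of x cannot be proven */

/* Hypothetical target convention: registers 0..3 are clobbered by calls. */
static int call_clobbered(int reg) { return reg >= 0 && reg <= 3; }

/* For each instruction i, store in loc[i] the register proven to hold x
   after instruction i executes, or NO_REG if nothing can be proven. */
void track_x(const struct insn *code, size_t n, int *loc)
{
    int cur = NO_REG;
    for (size_t i = 0; i < n; i++) {
        switch (code[i].op) {
        case OP_SET_REG:
            cur = code[i].reg;          /* x is now known to live here */
            break;
        case OP_CALL:
            if (call_clobbered(cur))
                cur = NO_REG;           /* the call may have destroyed it */
            break;
        case OP_OTHER:
            break;                      /* assumed not to touch x's register */
        }
        loc[i] = cur;
    }
}
```

A debug-information generator built on such a pass would emit a location
only for the instructions where loc[i] is not NO_REG, and report the value
as unavailable elsewhere. Real code would need a proper dataflow analysis
over the control-flow graph rather than a single linear walk.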

Your earlier messages, however, suggest that you are trying to do
something harder: emit information that is essentially both complete (in
the sense of providing as much information as possible about the
locations and values of variables) and correct (in the sense of never
giving incorrect information).  If you want to do that, you're going to
have to answer the harder questions, like "what line number corresponds
to this address?" and "what should the debugging information say that
the value of a variable is when it has been optimized away?"

If that's still your goal, then pointing at the DWARF3 specification
doesn't help.  Diego and I are asking you to confront these fundamental
questions about what information you want to provide and what the
correctness criteria are.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-16  6:27                                   ` Daniel Berlin
@ 2007-12-16 12:47                                     ` Alexandre Oliva
  2007-12-17  1:27                                       ` Daniel Berlin
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-16 12:47 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 16, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> There is no portion of the DWARF3 spec which requires you output
> information that is correct or useful. The same way the C standard
> does not require you to write correct programs, only valid ones, the
> DWARF3 spec does not require you to output correct information, only
> information that is encoded properly.

But if a C compiler translated programs to garbage, that would be
wrong.  By the same reasoning, if a Dwarf producer created garbage,
that would be wrong.

It's true that most of Dwarf 3 attributes are optional.  But when it
says "if you output this attribute, its operand must be such and
such", if you output the attribute with operands that don't match the
specification, that's a bug.

> It is certainly a goal of DWARF3 to allow producers to provide correct
> info

Exactly.  And where's the permission to provide incorrect info, rather
than merely leaving it out?

>> I've heard this "intrusiveness" argument be pointed out so many times,
>> by so many people that claim to not have been able to keep up with the
>> thread, and who claim to have not looked at the patches at all, that
>> I'm more and more convinced it's just fear of the unknown than any
>> actual rational evaluation of the impact of the changes.

> Well, no.
> You yourself have shown it to be intrusiveness in the extreme, in the
> very next paragraphs!

> "
> At some point you have to face reality and see that such information
> isn't kept around by magic, it takes some effort, and this effort is
> needed at every location where there are changes that might affect
> debug information.  And that's pretty much everywhere. "

> So, everywhere needs to change. That's pretty intrusive, no?

No.  Looks like selective attention, because you're leaving out the
part in which I discussed using the strength of the optimizers against
the problem, by letting them do what they are already used to on the
debug information too.

If we add a new RTL code or a new TREE code, is that intrusive because
now every optimization pass will deal with the new node types in very
much the same way they've dealt with other similar node types forever?
Of course not.

And if we have to add a few exceptions here and there to deal with the
specifics of this new node type, does that become too intrusive then?
I don't think so.

Then what's the fuss about the new node types?  Do you want to count
the number of places in which INSN_P remains there, lexically
unchanged, and compare with the number of places in which I've added a
!DEBUG_INSN_P after it?
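
The pattern under discussion can be shown with a toy model in C. The names
below only echo INSN_P and DEBUG_INSN_P; the enum, the predicates and the
two counting functions are hypothetical illustrations and not GCC's actual
definitions. The point is that a widened predicate lets generic walks pick
up the new node kind without any change, while the few places that must
ignore it add one extra test.

```c
#include <stddef.h>

/* Toy instruction kinds; not GCC's real rtx codes. */
enum kind { K_NOTE, K_DEBUG_INSN, K_INSN, K_JUMP_INSN, K_CALL_INSN };

static int debug_insn_p(enum kind k) { return k == K_DEBUG_INSN; }

/* Widened predicate: debug insns fall inside the "insn" range. */
static int insn_p(enum kind k) { return k >= K_DEBUG_INSN && k <= K_CALL_INSN; }

/* A generic walk: lexically unchanged, it now also visits debug insns. */
size_t count_insns(const enum kind *stream, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (insn_p(stream[i]))
            count++;
    return count;
}

/* An exception: something like a cost estimate must not be influenced by
   debug insns, so it adds the extra test after the original predicate. */
size_t count_real_insns(const enum kind *stream, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (insn_p(stream[i]) && !debug_insn_p(stream[i]))
            count++;
    return count;
}
```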

> Having to stop and think at every point in an optimization about the
> debug info,

Well, sorry, writing compilers is hard.  You have to think about
several things at the same time.  Shall we just go shopping instead?

I'm trying to make it as simple as possible.  The fact that nearly
100% of the code is unchanged seems to indicate to me that it's not
such a bad approach, but if you want something that just magically
works, you're in for much disappointment.

> (having to stop and think about debug info at every single point of
> every single optimization).

Information doesn't come out of thin air, and thin air doesn't
keep information accurate just because we wish it did.  We have
to work to create and update the information throughout compilation,
at every transformation, and my reasoning is precisely that optimizers
already do this all the time, so why not use them for what we need?

> You don't need to be this intrusive to stop outputting the
> incorrect info we do.

What do you have to back your statement up?

Let me help you: sure we don't.  We can just refrain from outputting
any debug information whatsoever.  Then, it will be compliant with the
standard.  But it won't be useful.

>> I've never seen this documented as such, and we've never worked toward
>> these stated goals.

> Who is we?
> I certainly have worked exactly towards these goals.
> As have almost all the authors of the current debugging info
> framework.

Oh, wow, I guess I just wasn't welcome into the club, because I didn't
get the guidelines book.  How unfortunate, now I have to give up my
plan of doing better and abide by the unpublished and undocumented
goals of some small cabal.  Or do I?

> If you look in the mailing list archives, you will even discover Diego
> is not the first one to have exactly this viewpoint about what should and
> should not be debuggable, and that the community has consistently
> worked towards exactly the viewpoint Diego describes.

I've seen several different viewpoints from "the community".

> Anyway, I give up on reading this thread.  It has turned into a mess.
> You really need to step back

Oh, do I?  Why is that?

> and see that you have not achieved any sort of consensus of what
> levels of optimization should be how debuggable,

Why would I expect to get any consensus on that?  I haven't even
tried, and I won't.  This is not what the issue is about.  The issue
is about not emitting incorrect information.  Better debuggability for
all levels of optimization will be a side effect of achieving that,
and it will be achievable incrementally once we have an actual
framework that enables us to take steps in this direction without
introducing further regressions.

> I certainly wouldn't agree that we should take such intrusive steps to
> make -O2 -g as debuggable as you want,

It is obvious that you misunderstood what I want, and how intrusive
the approach is.

> I'd much rather see us do what we can easily, and drop any info that
> ends up being incorrect.

So what's your plan to find out what's incorrect?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-15 22:51                                 ` Alexandre Oliva
@ 2007-12-16  6:27                                   ` Daniel Berlin
  2007-12-16 12:47                                     ` Alexandre Oliva
  2007-12-16 22:20                                   ` Mark Mitchell
  1 sibling, 1 reply; 189+ messages in thread
From: Daniel Berlin @ 2007-12-16  6:27 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/15/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec  5, 2007, Diego Novillo <dnovillo@google.com> wrote:
>
> > On 11/25/07 3:43 PM, Mark Mitchell wrote:
>
> >> My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
> >> GCC developer with an interest in improving the compiler in the same way
> >> that you're trying to do) is that you stop writing code and start
> >> writing a paper about what you're trying to do.
> >>
> >> Ignore the implementation.  Describe the problem in detail.  Narrow its
> >> scope if necessary.  Describe the success criteria in detail.  Ideally,
> >> the success criteria are mechanically checkable properties: i.e., given
> >> a C program as input, and optimized code + debug information as output,
> >> it should be possible to algorithmically prove whether the output is
> >> correct.
>
> > Yes, please.  I would very much like to see an abstract design
> > document on what you are trying to accomplish.
>
> Other than the ones I've already posted, here's one:
>
> http://dwarfstd.org/Dwarf3Std.php
>
> Seriously.  There is a standard for this stuff.  My ultimate goal in
> this project is that we comply with it
Comply with it how?

There is no portion of the DWARF3 spec which requires you output
information that is correct or useful. The same way the C standard
does not require you to write correct programs, only valid ones, the
DWARF3 spec does not require you to output correct information, only
information that is encoded properly.

It is certainly a goal of DWARF3 to allow producers to provide correct
info (as witnessed by one of the listed goals: "Debugging
information must provide consumers a way to find the location of
program variables, determine the bounds of dynamic arrays and
strings, and possibly to find the base address of a subroutine's
stack frame or the return address of a subroutine. Furthermore, to
meet the needs of recent computer architectures and optimization
techniques, debugging information must be able to describe the
location of an object whose location changes over the object's
lifetime.")
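
To make "an object whose location changes over the object's lifetime"
concrete, here is a simplified C model of a location list and its lookup.
It is a sketch only: the struct layout, the enum and the function
find_location are hypothetical and do not reproduce the DWARF encoding.
Each entry covers a half-open address range, and a program counter that
falls in no range means no location is described for the variable there.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical location description. */
enum where { IN_REGISTER, IN_FRAME };

struct loc_range {
    uintptr_t lo, hi;   /* covers addresses lo <= pc < hi */
    enum where where;
    int value;          /* register number, or frame offset */
};

/* Return the entry describing the variable at `pc`, or NULL if the
   list says nothing about it at that address. */
const struct loc_range *find_location(const struct loc_range *list,
                                      size_t n, uintptr_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= list[i].lo && pc < list[i].hi)
            return &list[i];
    return NULL;
}
```

In this model, a producer that cannot prove where a value lives at some
address simply leaves that address out of every range.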

If you search the entire spec for the word "correct", you will find it
3 times.  If you search for "must", you will discover they all relate
to encoding or the goals of the standard.

It may be entirely useless to output incorrect information, and in
fact, worse than useless.
It is, however, compliant, as long as it is encoded properly.

I have to say, this is typical of the argumentation you have used thus
far in this thread, and honestly, it's not winning you any points.

That said, nobody here believes we should output useless or incorrect
info, even though we could.  A lot of people appear to disagree with
you about the best way to do it, and in fact, about what we should be
trying to provide users in what cases.

>
> What part of intrusiveness are you concerned about?  The change of
> INSN_P such that it covers DEBUG_INSN_P too in the supported range?
> Or the few changes that revert to the original INSN_P, in the few
> exceptions in which DEBUG_INSN_P is not to be handled as an INSN?
>
> I've heard this "intrusiveness" argument be pointed out so many times,
> by so many people that claim to not have been able to keep up with the
> thread, and who claim to have not looked at the patches at all, that
> I'm more and more convinced it's just fear of the unknown than any
> actual rational evaluation of the impact of the changes.

Well, no.
You yourself have shown it to be intrusiveness in the extreme, in the
very next paragraphs!

"
At some point you have to face reality and see that such information
isn't kept around by magic, it takes some effort, and this effort is
needed at every location where there are changes that might affect
debug information.  And that's pretty much everywhere. "

So, everywhere needs to change. That's pretty intrusive, no?

"Sure, this might require a little bit more thinking in some
optimizations.  But in my experience fixing up the tree and rtl passes
that needed tweaking, the additional thinking needed is a no-brainer
in most cases; in a few, you have to work a bit harder to keep
information around rather than simply noting it as unavailable. "

Having to stop and think at every point in an optimization about the
debug info, having to deal with debug info at every single point of
change, and then your other patches
This is intrusive as well (having to stop and think about debug
info at every single point of every single optimization).

You don't need to be this intrusive to stop outputting the
incorrect info we do.

> I've never seen this documented as such, and we've never worked toward
> these stated goals.

Who is we?
I certainly have worked exactly towards these goals.
As have almost all the authors of the current debugging info
framework.  The reason it is the way it is is that these were, in fact,
*exactly the goals we were working towards*.
As for not documented, a lot of gcc is not documented.
If you look in the mailing list archives, you will even discover Diego
is not the first one to have exactly this viewpoint about what should and
should not be debuggable, and that the community has consistently
worked towards exactly the viewpoint Diego describes.

Anyway, I give up on reading this thread.  It has turned into a mess.
You really need to step back and see that you have not achieved any
sort of consensus of what levels of optimization should be how
debuggable, before you start telling everyone their approach isn't as
good as yours.

I certainly wouldn't agree that we should take such intrusive steps to
make -O2 -g as debuggable as you want.  I'd much rather see us do what
we can do easily, and drop any info that ends up being incorrect.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-05 14:21                               ` Diego Novillo
  2007-12-05 22:10                                 ` Joe Buck
@ 2007-12-15 22:51                                 ` Alexandre Oliva
  2007-12-16  6:27                                   ` Daniel Berlin
  2007-12-16 22:20                                   ` Mark Mitchell
  1 sibling, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-15 22:51 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Mark Mitchell, Robert Dewar, Ian Lance Taylor, Richard Guenther,
	gcc-patches, gcc

On Dec  5, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 11/25/07 3:43 PM, Mark Mitchell wrote:

>> My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
>> GCC developer with an interest in improving the compiler in the same way
>> that you're trying to do) is that you stop writing code and start
>> writing a paper about what you're trying to do.
>> 
>> Ignore the implementation.  Describe the problem in detail.  Narrow its
>> scope if necessary.  Describe the success criteria in detail.  Ideally,
>> the success criteria are mechanically checkable properties: i.e., given
>> a C program as input, and optimized code + debug information as output,
>> it should be possible to algorithmically prove whether the output is
>> correct.

> Yes, please.  I would very much like to see an abstract design
> document on what you are trying to accomplish.

Other than the ones I've already posted, here's one:

http://dwarfstd.org/Dwarf3Std.php

Seriously.  There is a standard for this stuff.  My ultimate goal in
this project is that we comply with it, at least as far as emitting
debug information for location of variables is concerned.

Here are some relevant postings on design strategies, rationales and
goals:

http://gcc.gnu.org/ml/gcc/2007-11/msg00229.html (goals)
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00160.html (initial plan)
http://gcc.gnu.org/ml/gcc/2007-11/msg00261.html (detailed plan)
http://gcc.gnu.org/ml/gcc/2007-11/msg00317.html (example)
http://gcc.gnu.org/ml/gcc/2007-11/msg00590.html (more example)
http://gcc.gnu.org/ml/gcc/2007-11/msg00176.html (design rationale)
http://gcc.gnu.org/ml/gcc/2007-11/msg00177.html (clarification)

> I would like to see exactly what Mark is asking for.  Perhaps a
> presentation in next year's Summit?

Sure, if there's interest, I could sure plan on doing that.  I could
use sponsors, BTW; I haven't discussed this with my employer, and
writing articles and presenting speeches are not part of this
assignment I was given.  Anyhow, by the time of the next year's
Summit, I hope this is mostly old news.

> I don't think I understand the goal of the project.

Follow the standard, as in (1) emit debug information that is correct
(standard-compliant), as in, if we emit some piece of debug
information, it reflects reality, rather than being a sometimes
distant approximation of some past reality long destroyed by some
optimization pass, and (2) emit debug information that is more
complete, as in, we currently fail to emit a lot of debug information
that we could, because we lose track of the location of variables as
optimization passes fail to maintain the needed information to do so.

> "Correct debugging info" means little, particularly if you say that
> it's not debuggers that you are thinking about.

Thinking of the debuggers is a mistake.  We don't think of specific
compilers when reading a programming language standard.  We don't
think of specific processors when reading an ISA or ABI specification.
Even when we read documentation specific to a processor, we still
don't think of its internal implementation details in order to write a
compiler for it; even the scheduling properties are abstracted out in
the design specification and optimization guidelines.

When someone finds that the compiler deviates from one of these
standards, we just cite chapter and verse of the relevant standard,
and people see there's a bug.

Why should debug information standards be treated any differently?

> It's certainly worrisome that your implementation seems to be
> intrusive to the point of brittleness.

What part of intrusiveness are you concerned about?  The change of
INSN_P such that it covers DEBUG_INSN_P too in the supported range?
Or the few changes that revert to the original INSN_P, in the few
exceptions in which DEBUG_INSN_P is not to be handled as an INSN?
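
To make the shape of that predicate change concrete, here is a toy
sketch in C.  The enum and the toy_* functions are invented for
illustration; they are not the definitions from rtl.h or from the
patch.  They only show one predicate being widened to cover a
debug-only instruction kind, with a narrower predicate kept for the
few callers that must not see it.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented instruction kinds for a toy IR; not GCC's rtx codes.  */
enum toy_code { TOY_INSN, TOY_JUMP, TOY_CALL, TOY_DEBUG, TOY_NOTE };

/* True only for the debug-only pseudo-instruction.  */
bool toy_debug_insn_p (enum toy_code c)
{
  return c == TOY_DEBUG;
}

/* The narrower, original meaning: instructions that generate code.  */
bool toy_nondebug_insn_p (enum toy_code c)
{
  return c == TOY_INSN || c == TOY_JUMP || c == TOY_CALL;
}

/* The widened predicate: code-generating or debug-only.  */
bool toy_insn_p (enum toy_code c)
{
  return toy_nondebug_insn_p (c) || toy_debug_insn_p (c);
}

/* Most callers would use toy_insn_p; the exceptions would use
   toy_nondebug_insn_p.  */
void toy_predicates_selfcheck (void)
{
  assert (toy_insn_p (TOY_DEBUG));
  assert (!toy_nondebug_insn_p (TOY_DEBUG));
  assert (toy_insn_p (TOY_CALL));
  assert (!toy_insn_p (TOY_NOTE));
}
```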

I've heard this "intrusiveness" argument be pointed out so many times,
by so many people that claim to not have been able to keep up with the
thread, and who claim to have not looked at the patches at all, that
I'm more and more convinced it's just fear of the unknown than any
actual rational evaluation of the impact of the changes.

Seriously.  Have a look at the patches and tell me what in them you
regard as intrusive.

We're talking about infrastructure here, needed to fix GCC's
carelessness about maintaining a mapping between source and
implementation concepts that went on for years and years, while
optimizations were added and debug information was degraded.

At some point you have to face reality and see that such information
isn't kept around by magic, it takes some effort, and this effort is
needed at every location where there are changes that might affect
debug information.  And that's pretty much everywhere.  Even if we had
consistent interfaces to make some changes, such as variable renaming,
substitution, etc, this would only cover a small amount of the data a
debug info generator would need: it needs higher-level information
than that, especially in rtl, where transformations, for historical
reasons, are messier than in the tree IL.

So, the approach I've taken is to use the strength of the problem
against itself: take advantage of the fact that optimizers already
know how to perform transformations they need to do in order to keep
things consistent, and represent debug information in a way that, to
them, will look just like any other use, so they will adjust it
likewise.  And then, on top of that, handle the few exceptions, in
which the optimizer needs to do something cleverer, because the
transformation it performs wouldn't work when say there's more than
one use or so.
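
A minimal sketch of that idea, in C, assuming a made-up three-statement
IR (struct toy_stmt and the toy_* helpers are hypothetical and are not
GCC data structures).  A debug bind reads a register like any other
use, so the generic substitution a pass already performs keeps it up
to date; a separate safety net resets binds whose register goes away
without a replacement.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_REG (-1)

enum toy_kind { TOY_COPY, TOY_USE, TOY_DEBUG_BIND };

struct toy_stmt {
  enum toy_kind kind;
  int dest;         /* TOY_COPY: register written; otherwise TOY_NO_REG */
  int src;          /* register read; TOY_NO_REG on a bind: unavailable */
  const char *var;  /* TOY_DEBUG_BIND: user variable name, else NULL */
};

/* Generic substitution: rewrite every read of OLD_REG to NEW_REG.
   Debug binds are not special-cased; they are adjusted like any use.  */
void toy_replace_uses (struct toy_stmt *s, size_t n, int old_reg, int new_reg)
{
  for (size_t k = 0; k < n; k++)
    if (s[k].src == old_reg)
      s[k].src = new_reg;
}

/* Safety net: DEAD_REG lost its definition and nothing replaces it, so
   binds that still name it become "unavailable" instead of going stale.  */
void toy_reset_debug_binds (struct toy_stmt *s, size_t n, int dead_reg)
{
  for (size_t k = 0; k < n; k++)
    if (s[k].kind == TOY_DEBUG_BIND && s[k].src == dead_reg)
      s[k].src = TOY_NO_REG;
}

void toy_debug_bind_selfcheck (void)
{
  /* r2 = r1; variable i is bound to r2; r2 is used.  */
  struct toy_stmt s[] = {
    { TOY_COPY, 2, 1, NULL },
    { TOY_DEBUG_BIND, TOY_NO_REG, 2, "i" },
    { TOY_USE, TOY_NO_REG, 2, NULL },
  };

  /* Copy propagation replaces reads of r2 with r1, bind included.  */
  toy_replace_uses (s, 3, 2, 1);
  assert (s[1].src == 1 && s[2].src == 1);

  /* If r1 later disappears with no replacement, only the bind is reset.  */
  toy_reset_debug_binds (s, 3, 1);
  assert (s[1].src == TOY_NO_REG);
  assert (s[2].src == 1);
}
```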

> Will every new optimization need to think about debug information
> from scratch and refrain from doing certain transformations?

Refraining from doing certain transformations would be wrong.  We
don't want debug information to affect code generation, and we don't
want it to reduce the amount of optimization you can make.  So, you
optimize away, and if you find that you can't keep track of debug
information, you mark stuff as unavailable, or, most likely, the
safety nets in place will do that for you, rather than taking the
current approach, in which we silently corrupt debug information.

Sure, this might require a little bit more thinking in some
optimizations.  But in my experience fixing up the tree and rtl passes
that needed tweaking, the additional thinking needed is a no-brainer
in most cases; in a few, you have to work a bit harder to keep
information around rather than simply noting it as unavailable.  But
it has never required optimizations to be disabled, and it must not do
so.  In fact, in a few cases, I noticed we were missing trivial
optimizations and fixed them.

> In my simplistic view of this problem, I've always had the idea that
> -O0 -g means "full debugging bliss", -O1 -g means "tolerable
> debugging" (symbols shouldn't disappear, for instance, though they do
> now) and -O2 -g means "you can probably know what line+function you're
> executing".

I've never seen this documented as such, and we've never worked toward
these stated goals.  However, I see that, underlying all of this, we
should be concerned about emitting debug information that is correct,
i.e., never emit information that says the location of FOO is BAR
while it's actually at BAZ.

I've seen many people (including myself, in a distant past) claiming
that imprecise information is better than no information.  I've
learned better.  Debug information consumers are often equipped
with heuristics to fill in common gaps in debug information.

But if the information is there, and wrong, the heuristics that might
very well have worked are disabled in favor of the incorrect
information, and then the whole system (debuggers, monitors, etc,
along with the program) misbehaves.

And then, even when heuristics don't exist and the information is
gone, it's better to tell the user "I don't know how to get you that"
than to hand them something other than what they need (e.g., an
incorrect variable location).

> But you seem to be addressing other problems.  And it even seems to me
> that you want debugging information that is capable of deconstructing
> arbitrary transformations done by the optimizers.

No.  I don't see where this notion came from, but it appears to be
quite widespread.  Omitting certain pieces of debug information is
almost always correct, since most debug info attributes are optional.
But emitting information that doesn't reflect the program is always
incorrect.

So, if you perform an arbitrary transformation that is too hard to
represent in debug information, that's fine, just throw the
information away.  The debug information might become less complete,
and therefore less useful, but at least it won't induce errors
elsewhere.

The parallel I draw is that emitting an optional piece of debug
information is like applying an optional optimization.  If it's
correct, and it's not too expensive, go for it.  But if it's going to
get you the wrong output, it's broken, so don't do it.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-15 20:32             ` Alexandre Oliva
@ 2007-12-15 21:41               ` Robert Dewar
  0 siblings, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-12-15 21:41 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

Alexandre Oliva wrote:
> On Nov 24, 2007, Robert Dewar <dewar@adacore.com> wrote:
> 
>> Alexandre Oliva wrote:
> 
>>> Besides, the Ada RTS compiles differently with -g than without -g,
>>> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
>>> but me seems to care.
> 
>> We certainly care about this, and appreciate efforts to fix it!
> 
> Should be fixed now, FWIW.

Good to hear, definitely worthwhile!
That's an important invariant.
> 

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25  0:39           ` Robert Dewar
@ 2007-12-15 20:32             ` Alexandre Oliva
  2007-12-15 21:41               ` Robert Dewar
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-12-15 20:32 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

On Nov 24, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:

>> Besides, the Ada RTS compiles differently with -g than without -g,
>> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
>> but me seems to care.

> We certainly care about this, and appreciate efforts to fix it!

Should be fixed now, FWIW.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-05 14:21                               ` Diego Novillo
@ 2007-12-05 22:10                                 ` Joe Buck
  2007-12-15 22:51                                 ` Alexandre Oliva
  1 sibling, 0 replies; 189+ messages in thread
From: Joe Buck @ 2007-12-05 22:10 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Mark Mitchell, Alexandre Oliva, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Wed, Dec 05, 2007 at 09:05:33AM -0500, Diego Novillo wrote:
> In my simplistic view of this problem, I've always had the idea that -O0 
> -g means "full debugging bliss", -O1 -g means "tolerable debugging" 
> (symbols shouldn't disappear, for instance, though they do now) and -O2 
> -g means "you can probably know what line+function you're executing".

I'd be happy enough if the state of -O1 -g debugging were improved,
perhaps using some of Alexandre's ideas so that it could be "full
debugging bliss" with some optimization as well.  Speeding up the
compile/test/debug/modify cycle would result.  We could then have fast
but fully debuggable code at -O1, and even faster code at -O2 not
constrained by the requirement of, as Diego says, "deconstructing
arbitrary transformations done by the optimizers". 

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-26  6:10                             ` Mark Mitchell
@ 2007-12-05 14:21                               ` Diego Novillo
  2007-12-05 22:10                                 ` Joe Buck
  2007-12-15 22:51                                 ` Alexandre Oliva
  0 siblings, 2 replies; 189+ messages in thread
From: Diego Novillo @ 2007-12-05 14:21 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Alexandre Oliva, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 11/25/07 3:43 PM, Mark Mitchell wrote:

> My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
> GCC developer with an interest in improving the compiler in the same way
> that you're trying to do) is that you stop writing code and start
> writing a paper about what you're trying to do.
> 
> Ignore the implementation.  Describe the problem in detail.  Narrow its
> scope if necessary.  Describe the success criteria in detail.  Ideally,
> the success criteria are mechanically checkable properties: i.e., given
> a C program as input, and optimized code + debug information as output,
> it should be possible to algorithmically prove whether the output is
> correct.

Yes, please.  I would very much like to see an abstract design document 
on what you are trying to accomplish.  I have been trying to follow this 
thread but I've gotten lost.  It's full of implementation details, 
rhetoric and high-level discussion.

I would like to see exactly what Mark is asking for.  Perhaps a 
presentation in next year's Summit?  I don't think I understand the goal 
of the project.  "Correct debugging info" means little, particularly if 
you say that it's not debuggers that you are thinking about.

It's certainly worrisome that your implementation seems to be intrusive 
to the point of brittleness.  Will every new optimization need to think 
about debug information from scratch and refrain from doing certain 
transformations?

In my simplistic view of this problem, I've always had the idea that -O0 
-g means "full debugging bliss", -O1 -g means "tolerable debugging" 
(symbols shouldn't disappear, for instance, though they do now) and -O2 
-g means "you can probably know what line+function you're executing".

But you seem to be addressing other problems.  And it even seems to me 
that you want debugging information that is capable of deconstructing 
arbitrary transformations done by the optimizers.  But I think I'm just 
lost in this thread, so a high-level design document would be perfect to
expose your ideas.

Diego.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-27 18:33                               ` Michael Matz
@ 2007-11-27 20:37                                 ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-27 20:37 UTC (permalink / raw)
  To: Michael Matz
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 27, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Mon, 26 Nov 2007, Alexandre Oliva wrote:

>> >> And then, you have to tweak everything else to keep the note that
>> >> replaced the set up to date as you further optimize the code.
>> 
>> > No.  remove_insn() would replace the SET with a note.
>> 
>> What information would this note convey?

> Oh my, sorry for adding confusion to the topic: I meant to write "would 
> _not_ replace the SET with a note".

Aah, ok.  So, you do indeed completely lose track of the crucial
differences between the two cases for the removal of a SET: not only
of their implications, but also of where they ought to take effect.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-27  7:31                             ` Alexandre Oliva
@ 2007-11-27 18:33                               ` Michael Matz
  2007-11-27 20:37                                 ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Michael Matz @ 2007-11-27 18:33 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Hi,

On Mon, 26 Nov 2007, Alexandre Oliva wrote:

> >> And then, you have to tweak everything else to keep the note that
> >> replaced the set up to date as you further optimize the code.
> 
> > No.  remove_insn() would replace the SET with a note.
> 
> What information would this note convey?

Oh my, sorry for adding confusion to the topic: I meant to write "would 
_not_ replace the SET with a note".


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-26 18:19                           ` Michael Matz
@ 2007-11-27  7:31                             ` Alexandre Oliva
  2007-11-27 18:33                               ` Michael Matz
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-27  7:31 UTC (permalink / raw)
  To: Michael Matz
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 26, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Fri, 23 Nov 2007, Alexandre Oliva wrote:

>> On Nov 13, 2007, Michael Matz <matz@suse.de> wrote:
>> 
>> > The nice thing is, that there are only few places which really get rid of 
>> > SETs: remove_insn.  You have to tweak that to keep the information around, 
>> > not much else (though that claim remains to be proven :) ).
>> 
>> And then, you have to tweak everything else to keep the note that
>> replaced the set up to date as you further optimize the code.

> No.  remove_insn() would replace the SET with a note.

What information would this note convey?

> After all, there must have been a reason for the SET to be deleted:
> the destination is dead, hence whatever user-variables were
> associated with it also are dead.

Not quite.  The destination could be merely redundant.  And the
difference is crucial.

If you delete a copy (or some other redundant computation; you don't
seem to handle this case) that would install in a variable a value
that is available elsewhere, and then adjust the uses of the variable
such that they use the value from elsewhere, you ought to note that the
variable holds that value, and at that point.

If you delete a computation because the result is completely unused,
then you ought to note that you no longer know the value of the
variable (or, ideally, that the variable would hold the result of that
computation if there was code to compute it).

In both cases, you ought to note that earlier values of the variable
are no longer current at that point.
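
The two cases can be sketched in C as follows.  This is only an
illustration: struct toy_bind_note and toy_note_for_removed_set are
hypothetical names, not part of GCC or of either proposal.  The note
left at the point of the removed SET either binds the variable to the
register that still holds the equivalent value, or records that the
value is unavailable; either way it supersedes earlier bindings from
that point on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Why the SET was removed.  */
enum toy_removal { TOY_SET_REDUNDANT, TOY_SET_UNUSED };

/* What is recorded at the program point where the SET used to be.  */
struct toy_bind_note {
  const char *var;  /* user variable the SET was assigning */
  bool available;   /* false: value unknown from this point on */
  int reg;          /* register holding the value, if available */
};

/* Build the note for a removed SET of VAR.  EQUIVALENT_REG is only
   consulted when the SET was redundant.  */
struct toy_bind_note
toy_note_for_removed_set (const char *var, enum toy_removal why,
                          int equivalent_reg)
{
  struct toy_bind_note note = { var, false, -1 };

  if (why == TOY_SET_REDUNDANT)
    {
      /* The value lives on elsewhere: say where.  */
      note.available = true;
      note.reg = equivalent_reg;
    }
  /* TOY_SET_UNUSED: nothing computes the value, so it stays unavailable.  */
  return note;
}

void toy_removed_set_selfcheck (void)
{
  struct toy_bind_note a = toy_note_for_removed_set ("i", TOY_SET_REDUNDANT, 7);
  struct toy_bind_note b = toy_note_for_removed_set ("i", TOY_SET_UNUSED, 7);

  assert (a.available && a.reg == 7);
  assert (!b.available);
}
```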

In both cases, the notion of "at that point" is crucial, especially
when you deal with conditional assignments.  You don't want to make it
seem like a conditional assignment applies when the condition doesn't
hold.  Consider:

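#include <stdbool.h>

/* Opaque calls: points at which to inspect i from foo's frame.  */
void p1(void);
void p2(void);
void p3(int);

void foo(bool p, int x, int y) {
  int i = x;

  p1();

  if (p)
    i = y;

  p2();

  i++;

  p3(i);
}

int main() {
  foo (false, 3, 5);
}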

At p1()'s caller's frame, you want i to hold the value 3.  At p2()'s,
you want i to still hold the value 3.  At p3(int)'s, it should be 4.

Now, if you change the program such that p is true, then at p1 i is
still 3, but at p2 it ought to be 5, and at p3(int)'s it should be 6.

How do you get that if you drop the assignments on the floor, or even
if you replace the assignments with notes that don't keep the correct
values associated not only with the names, but also with the points in
the program?
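
As an executable restatement of the values expected above, here is a
small C sketch.  It is not compiler code; struct toy_expected_i and
toy_expected_values are made-up names.  The function follows the
source-level semantics of foo() and records what a debugger should
report for i in foo's frame at each of the three calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Value of i expected in foo's frame when stopped in p1, p2 and p3.  */
struct toy_expected_i {
  int at_p1;
  int at_p2;
  int at_p3;
};

/* Walk foo()'s source semantics, recording i at each call site.  */
struct toy_expected_i toy_expected_values (bool p, int x, int y)
{
  struct toy_expected_i e;
  int i = x;

  e.at_p1 = i;   /* i = x has taken effect */
  if (p)
    i = y;       /* the binding changes only on this path */
  e.at_p2 = i;
  i++;
  e.at_p3 = i;
  return e;
}

void toy_expected_values_selfcheck (void)
{
  struct toy_expected_i f = toy_expected_values (false, 3, 5);
  struct toy_expected_i t = toy_expected_values (true, 3, 5);

  assert (f.at_p1 == 3 && f.at_p2 == 3 && f.at_p3 == 4);
  assert (t.at_p1 == 3 && t.at_p2 == 5 && t.at_p3 == 6);
}
```

Debug information that reports any other value at one of these points
would be wrong under this model; reporting nothing would merely be
incomplete.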

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-26 18:10                         ` Michael Matz
@ 2007-11-27  3:48                           ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-27  3:48 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

On Nov 26, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Fri, 23 Nov 2007, Alexandre Oliva wrote:

>> Yep.  Nowhere does that bug report request parameters to be forced live.  

> Not in that bug report perhaps, but we got requests for exactly this, i.e. 
> to be able to introspect all parameters of all functions, be they inlined 
> or not, at all time.  I think that's a reasonable request even (which in 
> some situations comes at a cost).

Fair enough.  And we agree this is not about debug info, it's about
limiting optimizations, so this is indeed a different problem from the
one I was asked to address.

>> 2. function is inlined, the argument is unused and thus optimized
>> away, but the function does some other useful computation
>> 
>> At the inlined entry point, we have a note that binds the argument to
>> its expected value.  As we transform the program and optimize away the
>> argument, we retain and update the note,

> As far as possible.  If it's not possible you lose (with our 
> requirements).

If the argument is completely removed, yes, you won't be able to get
to it by merely improving debug information.  You actually have to
change the generated code.

>> If the value of a variable is completely optimized away at a point in 
>> the program, the correct representation for its location at that point 
>> is an empty set.

> I think this is academic.  If a value is dead, but happens to lie in a 
> place which isn't yet overwritten with something else, it is harmless to 
> reveal this value.  It's the "last" value the variable had.  If OTOH the 
> place _is_ already overwritten then it's important that we _don't_ say the 
> dead variable lies therein.

Exactly.  Full agreement.  I wasn't talking about the *location* of
the variable, or the variable itself.  I was talking about the value.
And I wrote "completely optimized away", not "dead".  Liveness has
very little to do with this issue.

The only catch is that, once a variable should be *expected* to hold a
different value, if debug information still claims the variable still
holds the old value it shouldn't hold any more, just because the value
happens to be around and the assignment of the new value could be
optimized away, then I'd say debug information is incorrect.

> So, for me correctness is defined a bit different than for you:
> 1) if location L contains value X, then debug info should say so (as much 
>    as possible, i.e. here the quality of the info comes into play)
> 2) if location L does not contain value X, debug info should not say that 
>    it does.  This is the correctness part.

Your definition is exactly what I've been trying to communicate.  It
looks like we're in complete agreement as to the goals and the two
different metrics (1 being completeness, 2 being correctness).  So
either there's some other underlying difference or you'll soon realize
that the simple SSA name<->variable mapping is insufficient to get you
correctness.
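
The two metrics can be written down as a tiny classifier.  This is a
sketch in C with invented names (struct toy_report, toy_judge), under
the assumption that for each stop point we know the value the user is
entitled to expect and what the debug information reports there.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_verdict {
  TOY_CORRECT_AND_COMPLETE,    /* reports the expected value */
  TOY_CORRECT_BUT_INCOMPLETE,  /* reports nothing: "unavailable" */
  TOY_INCORRECT                /* reports some other value */
};

/* What the debug information says about a variable at one point.  */
struct toy_report {
  bool available;
  int value;       /* meaningful only if available */
};

/* Metric 2 (correctness) fails only when a wrong value is reported;
   metric 1 (completeness) fails when nothing is reported.  */
enum toy_verdict toy_judge (struct toy_report r, int expected)
{
  if (!r.available)
    return TOY_CORRECT_BUT_INCOMPLETE;
  return r.value == expected ? TOY_CORRECT_AND_COMPLETE : TOY_INCORRECT;
}

void toy_judge_selfcheck (void)
{
  struct toy_report right = { true, 2 };
  struct toy_report none = { false, 0 };
  struct toy_report stale = { true, 1 };

  assert (toy_judge (right, 2) == TOY_CORRECT_AND_COMPLETE);
  assert (toy_judge (none, 2) == TOY_CORRECT_BUT_INCOMPLETE);
  assert (toy_judge (stale, 2) == TOY_INCORRECT);
}
```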

> Where we differ in opinion (I think) is, when location L doesn't contain 
> value X anymore.  For you it's when X becomes dead.  For me it's when X is 
> dead and when location L is overwritten (with something different than X).  

For me, it's when X is overwritten.  That's the point at which the
user is entitled to expect the variable to no longer hold its previous
value (assuming they're different).

Consider this program:

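/* Opaque calls: breakpoint locations from which to inspect i.  */
void p1(void);
void p2(int);
void p3(void);

void foo(int x) {
  int i;

  i = x;
  p1();
  i++;
  p2(i);
  i++;
  p3();
}

int main() {
  foo(1);
}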

If you set a breakpoint in p1(), go up one frame and print i, you
should ideally get 1 (although "unavailable" is always correct, even
if undesirable).  If you set a breakpoint in p2(int), you should get
2, but "unavailable" is quite likely in the presence of optimization,
depending on the calling conventions.  If you set a breakpoint in
p3(), you should get 3, but "unavailable" is quite likely, given that
the value is not even computed, and it's based on a value that is dead
and thus may have been overwritten.

Getting any other values at any of these points would be a bug in the
compiler.
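
One way to picture this is a per-variable list of address ranges, in
the spirit of a location list, where an address covered by no entry
means "unavailable".  The C sketch below is a simplification with
hypothetical names (struct toy_loc_entry, toy_lookup) and made-up
addresses; it stores values directly, whereas real debug information
would describe locations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry: within addresses [lo, hi) the variable has VALUE.  */
struct toy_loc_entry {
  unsigned lo;
  unsigned hi;
  int value;
};

/* Look up the variable at address PC.  Returns false, leaving *OUT
   untouched, when no entry covers PC: the variable is unavailable.  */
bool toy_lookup (const struct toy_loc_entry *list, size_t n,
                 unsigned pc, int *out)
{
  for (size_t k = 0; k < n; k++)
    if (pc >= list[k].lo && pc < list[k].hi)
      {
        *out = list[k].value;
        return true;
      }
  return false;
}

void toy_lookup_selfcheck (void)
{
  /* i in foo(1): known around the calls to p1 and p2, and deliberately
     not described around the call to p3.  Addresses are invented.  */
  const struct toy_loc_entry i_list[] = {
    { 0x10, 0x20, 1 },
    { 0x20, 0x30, 2 },
  };
  int v = 0;

  assert (toy_lookup (i_list, 2, 0x14, &v) && v == 1);
  assert (toy_lookup (i_list, 2, 0x24, &v) && v == 2);
  assert (!toy_lookup (i_list, 2, 0x34, &v));
}
```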

Does this sound sound to you?

Did you somehow get the impression that the SSA<->names mapping can
get you correct results?

>> Accuracy comes first.  If we ever emit debug information saying 'this
>> variable is here' for a point in the program in which it's in fact
>> elsewhere

> I agree here ...

>> or unavailable, that's a bug to be fixed.

> ... and disagree here.  If a value is dead it's not necessarily 
> unavailable in my world.

I never said "dead", you did.  I said "unavailable", and by that I
don't mean "dead", I really mean "unavailable".  The value I'm talking
about is not "whatever was last assigned to something that resembles
the variable after numerous optimizations" but rather "a value the
user might expect the variable to hold at that point in the program",
given some user tolerance to reordering and other optimizations.

One reason I use separate functions for the breakpoint locations is
precisely because at those points users are entitled to expect the
state of the program to be stable, i.e., there isn't a lot of
reordering or other surprises that a compiler can introduce across
function calls that are by themselves in a statement.

Another reason is that I still don't have a good answer for breakpoint
locations at other points in the program that are less stable across
optimizations, and I can't quite describe what I think users are
entitled to expect at such other points.  But the infrastructure
needed to bring great improvements even in this regard is being set in
place by getting them correct at stable points such as function calls.

That said, I'm putting some thought into getting better debug
information in these less stable points, but making it completely
unsurprising in spite of optimizations isn't the task I was assigned.
Making it correct and far more complete is.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-26 18:36 J.C. Pizarro
@ 2007-11-26 18:55 ` J.C. Pizarro
  0 siblings, 0 replies; 189+ messages in thread
From: J.C. Pizarro @ 2007-11-26 18:55 UTC (permalink / raw)
  To: gcc, Alexandre Oliva

On Nov 26, 2007, J.C. Pizarro <jcpiza@gmail.com> wrote:
>  ...,  last access data for elimination from bigger cache, etc. }

I'm sorry, it's date, not data:

...,  last access date for elimination from bigger cache, etc. }

Sincerely, J.C.Pizarro

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
@ 2007-11-26 18:36 J.C. Pizarro
  2007-11-26 18:55 ` J.C. Pizarro
  0 siblings, 1 reply; 189+ messages in thread
From: J.C. Pizarro @ 2007-11-26 18:36 UTC (permalink / raw)
  To: gcc, Alexandre Oliva

On Nov 26, 2007, "Alexandre Oliva" <aoliva@redhat.com> wrote:
> On Nov 26, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:
> > On Nov 26, 2007 7:57 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> >> On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:
> >>
> >> > No, hashing is fine, but doing walks over a hashtable when your algorithm
> >> > depends on ordering is not.
> >>
> >> Point.
> >>
> >> > I have patches to fix the instance of walking over all referenced
> >> > vars.  Which is in the case of UIDs using bitmaps and a walk over a
> >> > bitmap (which ensures walks in UID order).
> >>
> >> Why is such memory and CPU overhead better than avoiding the
> >> divergence of UIDs in the first place?
>
> > Actually my patches should be an overall memory savings.
>
> Err...  I don't see how using a bitmap in addition to a hashtable can
> save memory over using only a hashtable.  Or are you saying you do
> away with the hashtables?  I can see that this is possible and
> desirable.
>
> > But, as you (and me and others) look at bugs that happen because of
> > UID divergence, it is easier to use UIDs in a way that guarantees
> > that generated code does not change in such cases.
>
> Agreed, this property is desirable.  But I wouldn't say it is enough.
> Ensuring UIDs remain constant across compilations has helped
> tremendously in locating other compilation divergences, for comparing
> debug dumps becomes much easier.  So, even if we use algorithms that
> don't depend on UIDs remaining constant across compilations, I believe
> it is highly desirable that we keep them constant across compilations.
>
> > Otherwise what's the point in using UIDs?
>
> There are several different reasons for having UIDs, some of which
> could be having some unique identifier for an object, even in the
> presence of a moving garbage collector; being able to create
> fully-ordered sets of objects; being able to easily identify objects
> across a single compilation; being able to easily identify objects
> even across multiple compilations; and I'm sure it's possible to come
> up with other reasons that would justify the idea of UIDs on their
> own.
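
For readers following the bitmap argument quoted above, a minimal C
sketch of walking a set in UID order.  The names are invented and a
single 64-bit word stands in for a real bitmap; the point is only
that the visiting order depends on the UIDs themselves, not on
insertion order or on hash table layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_UID 64

/* Store the UIDs present in SET into OUT in ascending order and
   return how many there are.  */
size_t toy_walk_in_uid_order (uint64_t set, unsigned out[TOY_MAX_UID])
{
  size_t n = 0;

  for (unsigned uid = 0; uid < TOY_MAX_UID; uid++)
    if (set & (UINT64_C (1) << uid))
      out[n++] = uid;
  return n;
}

void toy_walk_selfcheck (void)
{
  uint64_t set = 0;
  unsigned out[TOY_MAX_UID];

  /* Insert in arbitrary order.  */
  set |= UINT64_C (1) << 40;
  set |= UINT64_C (1) << 3;
  set |= UINT64_C (1) << 17;

  size_t n = toy_walk_in_uid_order (set, out);
  assert (n == 3);
  assert (out[0] == 3 && out[1] == 17 && out[2] == 40);
}
```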

Hashtables? Bitmaps? Why not use a database manager?

Maintaining UIDs only in memory isn't a good idea if many recompilations
have to be made.

I have a better idea: maintain the UIDs (unique identifiers of objects)
across compilations using the fastest database manager, "Tokyo Cabinet"
( http://tokyocabinet.sourceforge.net/ ), which is LGPLed, instead of
"Berkeley DB".

Use MD4 (128), MD5 (128) or SHA1 (160) as the one-way hashing function
for UIDs.

The implementor must decide how to encode the input to the UID hashing
function and how to store the UID's complex object in the DB.

The info to store for each object, as an example:

{ UID, compiled version of GCC, name of object, path of object, creation date,
  type of object, namespace that is using, number of parameters, options
  passed to GCC,  last access data for elimination from bigger cache, etc. }
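
A minimal sketch of the idea, under stated assumptions: a plain Python dict
stands in for the key-value database (no Tokyo Cabinet API is used here), the
field names are invented for illustration, and JSON is just one possible
canonical encoding of the hash input.

```python
import hashlib
import json

def object_uid(gcc_version, name, path, kind, options):
    """Hash a canonical encoding of the fields that identify an object."""
    # JSON gives an unambiguous encoding, so ("ab", "c") and ("a", "bc")
    # cannot collide the way plain string concatenation would.
    canonical = json.dumps([gcc_version, name, path, kind, list(options)])
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

# A plain dict stands in for the on-disk key-value database (assumption).
store = {}

def remember(gcc_version, name, path, kind, options):
    """Store one record under its UID and return the UID."""
    uid = object_uid(gcc_version, name, path, kind, options)
    store[uid] = {
        "gcc_version": gcc_version,
        "name": name,
        "path": path,
        "kind": kind,
        "options": list(options),
    }
    return uid

# Illustrative values only.
uid = remember("4.3.0", "real_lookup", "fs/namei.c", "function", ["-O2", "-g"])
```

The same inputs always give the same UID, so a later compilation can look
the record up again without any state being kept in memory.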

Sincerely, J.C.Pizarro

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-23 23:56                         ` Alexandre Oliva
@ 2007-11-26 18:19                           ` Michael Matz
  2007-11-27  7:31                             ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Michael Matz @ 2007-11-26 18:19 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Hi,

On Fri, 23 Nov 2007, Alexandre Oliva wrote:

> On Nov 13, 2007, Michael Matz <matz@suse.de> wrote:
> 
> > The nice thing is, that there are only few places which really get rid of 
> > SETs: remove_insn.  You have to tweak that to keep the information around, 
> > not much else (though that claim remains to be proven :) ).
> 
> And then, you have to tweak everything else to keep the note that
> replaced the set up to date as you further optimize the code.

No.  remove_insn() would replace the SET with a note.  It would look at 
other SETs where the information could be put in which is lost.  After 
all, there must have been a reason for the SET to be deleted: the 
destination is dead, hence whatever user-variables were associated with it 
also are dead.  (if they also lie in other places, those are not 
affected).  So it's okay to completely get rid of the SET and decl 
associations.

One special case of the above is, when a SET is deleted which is a copy, 
where the LHS was associated with some variables, but the RHS was not.  
From that point on we can (under certain circumstances) associate the RHS 
with the decls (by changing its initial SET).
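
A toy model of the copy case, only to make the bookkeeping concrete.  The
`Set` class and this `remove_insn` are hypothetical Python stand-ins, not
GCC's RTL or its real remove_insn(), and the "certain circumstances" are
reduced here to "the defining SET of the RHS has no decls yet".

```python
from dataclasses import dataclass, field

@dataclass
class Set:
    dest: str                                  # register written by the SET
    src: str                                   # register or opaque expression
    decls: set = field(default_factory=set)    # user variables held in dest
    is_copy: bool = False
    deleted: bool = False                      # True once replaced by a note

def remove_insn(insns, index):
    """Turn insns[index] into a note, moving its decls to the copy source."""
    victim = insns[index]
    victim.deleted = True
    if victim.is_copy and victim.decls:
        # Look backwards for the SET that defines the copied register.
        for earlier in reversed(insns[:index]):
            if not earlier.deleted and earlier.dest == victim.src:
                if not earlier.decls:
                    earlier.decls |= victim.decls
                break
    # The destination is dead, so its own association is dropped.
    victim.decls = set()

insns = [Set("r1", "a + b"), Set("r2", "r1", {"x"}, is_copy=True)]
remove_insn(insns, 1)
```

After the call the first SET carries `x`, so the variable is still
described even though the copy is gone.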

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-26 12:38                       ` Richard Guenther
@ 2007-11-26 18:10                         ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-26 18:10 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 26, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:

> On Nov 26, 2007 7:57 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
>> On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:
>> 
>> > No, hashing is fine, but doing walks over a hashtable when your algorithm
>> > depends on ordering is not.
>> 
>> Point.
>> 
>> > I have patches to fix the instance of walking over all referenced
>> > vars.  Which is in the case of UIDs using bitmaps and a walk over a
>> > bitmap (which ensures walks in UID order).
>> 
>> Why is such memory and CPU overhead better than avoiding the
>> divergence of UIDs in the first place?

> Actually my patches should be an overall memory savings.

Err...  I don't see how using a bitmap in addition to a hashtable can
save memory over using only a hashtable.  Or are you saying you do
away with the hashtables?  I can see that this is possible and
desirable.

> But, as you (and me and others) look at bugs that happen because of
> UID divergence, it is easier to use UIDs in a way that guarantees
> that generated code does not change in such cases.

Agreed, this property is desirable.  But I wouldn't say it is enough.
Ensuring UIDs remain constant across compilations has helped
tremendously in locating other compilation divergences, for comparing
debug dumps becomes much easier.  So, even if we use algorithms that
don't depend on UIDs remaining constant across compilations, I believe
it is highly desirable that we keep them constant across compilations.

> Otherwise what's the point in using UIDs?

There are several different reasons for having UIDs, some of which
could be having some unique identifier for an object, even in the
presence of a moving garbage collector; being able to create
fully-ordered sets of objects; being able to easily identify objects
across a single compilation; being able to easily identify objects
even across multiple compilations; and I'm sure it's possible to come
up with other reasons that would justify the idea of UIDs on their
own.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24  4:58                       ` Alexandre Oliva
@ 2007-11-26 18:10                         ` Michael Matz
  2007-11-27  3:48                           ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Michael Matz @ 2007-11-26 18:10 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

Hi,

On Fri, 23 Nov 2007, Alexandre Oliva wrote:

> Yep.  Nowhere does that bug report request parameters to be forced live.  

Not in that bug report perhaps, but we got requests for exactly this, i.e. 
to be able to introspect all parameters of all functions, be they inlined 
or not, at all time.  I think that's a reasonable request even (which in 
some situations comes at a cost).

> 2. function is inlined, the argument is unused and thus optimized
> away, but the function does some other useful computation
> 
> At the inlined entry point, we have a note that binds the argument to
> its expected value.  As we transform the program and optimize away the
> argument, we retain and update the note,

As far as possible.  If it's not possible you lose (with our 
requirements).

> > For us it also happened in the kernel in namei.c, where real_lookup is 
> > inlined sometimes, and it's arguments are missing.  That might or 
> > might not be reversible functions, so your scheme perhaps would have 
> > helped there.  But generally it won't solve the problem for good.
> 
> It looks like you're trying to solve a different problem.

We work on two fronts:
1) increasing the precision of debug information
2) forcing values live

Our branch, and our ssa-name<->user-name map (and the SET<->decls 
association) is concerned with the first topic.  The second topic can be 
implemented (or hacked) already now, but will potentially be more useful 
when we also have (1).  So, as in your branch, we are not trying to limit 
optimizers to reach the goal, that's the concern of (2), and happens 
somewhere else.

> I'm trying to get GCC to emit debug information that correctly matches
> the instructions it generated.
> 
> If the value of a variable is completely optimized away at a point in 
> the program, the correct representation for its location at that point 
> is an empty set.

I think this is academic.  If a value is dead, but happens to lie in a 
place which isn't yet overwritten with something else, it is harmless to 
reveal this value.  It's the "last" value the variable had.  If OTOH the 
place _is_ already overwritten then it's important that we _don't_ say the 
dead variable lies therein.

So, for me correctness is defined a bit differently than for you:
1) if location L contains value X, then debug info should say so (as much 
   as possible, i.e. here the quality of the info comes into play)
2) if location L does not contain value X, debug info should not say that 
   it does.  This is the correctness part.

Where we differ in opinion (I think) is, when location L doesn't contain 
value X anymore.  For you it's when X becomes dead.  For me it's when X is 
dead and when location L is overwritten (with something different than X).  
I think for users there is no practical difference between our approaches, 
but there's a higher cost of implementation for your definition.
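
The two definitions can be put side by side in a small sketch.  Program
points are plain integers here and the policy names are made up for
illustration; it assumes the overwrite happens no earlier than the death.

```python
def claimed_range(stored_at, dead_at, overwritten_at, policy):
    """Half-open range [start, end) of program points at which debug info
    says that location L holds variable X."""
    if policy == "until_dead":
        end = dead_at                 # stop claiming once X is dead
    elif policy == "until_overwritten":
        end = overwritten_at          # keep claiming until L is clobbered
    else:
        raise ValueError(policy)
    return range(stored_at, end)

# X is stored at point 10, dead from point 15, and L is overwritten at 20.
strict = claimed_range(10, 15, 20, "until_dead")
lenient = claimed_range(10, 15, 20, "until_overwritten")
```

Between points 15 and 19 only the second policy still reports a location,
and in that window L really does still contain the last value of X.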

> > Then I'm probably still confused what problem you're actually trying to 
> > solve.  If you don't want to be sure you get precise location information 
> > 100% of the time, then what percentage are you required to get?
> 
> Accuracy comes first.  If we ever emit debug information saying 'this
> variable is here' for a point in the program in which it's in fact
> elsewhere

I agree here ...

> or unavailable, that's a bug to be fixed.

... and disagree here.  If a value is dead it's not necessarily 
unavailable in my world.  I think a world requiring this (and hence the 
constraints you were given) is unreasonable.

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-26 11:37                     ` Alexandre Oliva
@ 2007-11-26 12:38                       ` Richard Guenther
  2007-11-26 18:10                         ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Richard Guenther @ 2007-11-26 12:38 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 26, 2007 7:57 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:
>
> > No, hashing is fine, but doing walks over a hashtable when your algorithm
> > depends on ordering is not.
>
> Point.
>
> > I have patches to fix the instance of walking over all referenced
> > vars.  Which is in the case of UIDs using bitmaps and a walk over a
> > bitmap (which ensures walks in UID order).
>
> Why is such memory and CPU overhead better than avoiding the
> divergence of UIDs in the first place?

Actually my patches should be an overall memory savings.  But, as you (and
me and others) look at bugs that happen because of UID divergence, it is
easier to use UIDs in a way that guarantees that generated code does not
change in such cases.  Otherwise what's the point in using UIDs?  If you
later do hashtable walks anyway you can hash on the pointer as well.

So, IMHO an algorithm should produce the same result if, instead of an
ordered set of UIDs M { u1, u2, u3 }, an ordered set M' { u1', u2', u3' } is
used, where the element correspondence is u1 : u1', u2 : u2', u3 : u3',
independent of the actual values uN or the differences between values
uN - uM.
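
As a sketch of that property (the decl names, UID values and the toy
modulo-based hash table are assumptions for illustration, not GCC data
structures): a walk in sorted UID order is unaffected by gaps in the
numbering, while a walk in bucket order is not.

```python
def walk_sorted(decls):
    """Visit decls in UID order; only the relative order of UIDs matters."""
    return [name for uid, name in sorted((uid, name) for name, uid in decls.items())]

def walk_buckets(decls, nbuckets=8):
    """Visit decls in the bucket order of a toy hash table keyed on UID."""
    buckets = [[] for _ in range(nbuckets)]
    for name, uid in decls.items():
        buckets[uid % nbuckets].append(name)
    return [name for bucket in buckets for name in bucket]

without_g = {"a": 1, "b": 2, "c": 3}
with_g = {"a": 1, "b": 5, "c": 9}   # same order, extra UIDs allocated in between

same_sorted = walk_sorted(without_g) == walk_sorted(with_g)
same_buckets = walk_buckets(without_g) == walk_buckets(with_g)
```

Here `same_sorted` is true and `same_buckets` is false: the bucket walk
visits `c` before `b` once the UIDs have gaps, which is the kind of
divergence being discussed.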

Anything else is a bug.  And compensating for those bugs in other places
by trying to preserve the exact values of UIDs is broken (and in this case,
as it delays memory optimization, actually bad).

Just my few euro-cents,
Richard.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25  2:36                   ` Richard Guenther
@ 2007-11-26 11:37                     ` Alexandre Oliva
  2007-11-26 12:38                       ` Richard Guenther
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-26 11:37 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:

> No, hashing is fine, but doing walks over a hashtable when your algorithm
> depends on ordering is not.

Point.

> I have patches to fix the instance of walking over all referenced
> vars.  Which is in the case of UIDs using bitmaps and a walk over a
> bitmap (which ensures walks in UID order).

Why is such memory and CPU overhead better than avoiding the
divergence of UIDs in the first place?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24  4:31                           ` Alexandre Oliva
@ 2007-11-26  6:10                             ` Mark Mitchell
  2007-12-05 14:21                               ` Diego Novillo
  0 siblings, 1 reply; 189+ messages in thread
From: Mark Mitchell @ 2007-11-26  6:10 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:

>> You're again trying to make this a binary-value question.  Why?
> 
> Because in my mind, when we agree there is a bug, then a fix for it
> is easier to swallow even if it makes the compiler spend more
> resources, whereas a mere quality-of-implementation issue is subject
> to quite different standards.

Unfortunately, not all questions are black-and-white.  I don't think
you're going to get consensus that this issue is as important to fix as
wrong-code (in the traditional sense) problems.  So, arguing about
whether this is a "correctness issue" isn't very productive.

Neither is arguing that there is now some urgent need for machine-usable
debugging information in a way that there wasn't before.  Machines have
been using debugging information for various purposes other than
interactive debugging for ages.  But, they've always had to deal with
the kinds of problems that you're encountering, especially with
optimized code.

I think that at this point you're doing research.  I don't think we have
a well-defined notion of what exactly debugging information should be
for optimized code.  Robert Dewar's definition of -O1 as doing
optimizations that don't interfere with debugging is coherent (though
informal, of course), but you're asking for something more: full
optimization, and, somehow, accurate debugging information in the
presence of that.  I'm all for research, and the thinking that you're
doing is unquestionably valuable.  But, you're pushing hard for a
particular solution and that may be premature at this point.

Debugging information just isn't rich enough to describe the full
complexity of the optimization transformations.  There's no great way to
assign a line number to an instruction that was created by the compiler
when it inserted code on some flow-graph edge.  You can't get exact
information about variable lifetimes because the scope doesn't start at
a particular point in the generated code in the same way that it does in
the source code.

My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
GCC developer with an interest in improving the compiler in the same way
that you're trying to do) is that you stop writing code and start
writing a paper about what you're trying to do.

Ignore the implementation.  Describe the problem in detail.  Narrow its
scope if necessary.  Describe the success criteria in detail.  Ideally,
the success criteria are mechanically checkable properties: i.e., given
a C program as input, and optimized code + debug information as output,
it should be possible to algorithmically prove whether the output is
correct.

For example, how do you define the correctness of debug information for
a variable's location at a given PC?  Perhaps we want to say that giving
the answer "no information available" is always correct, but that saying
"the value is here" when it's not is incorrect; that gives us a
conservative fallback.  How do you define the point in the source
program given a PC?  If the value of "x" changes on line 100, and we're
at an instruction which corresponds to line 101, are we guaranteed to see
the changed value?  Or is seeing the previous value OK?  What about some
intermediate value if "x" is being changed byte-by-byte?  What about a
garbage value if the compiler happens to optimize by throwing away the
old value of "x" before assigning a new one?
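
One possible reading of the conservative criterion, as a sketch.  The data
layout (a dict from PC to claimed location, and a dict from PC to the set of
locations that really hold the value) is an assumption made for illustration;
it does not answer the harder questions about which source point a PC
corresponds to.

```python
UNKNOWN = None   # stands for "no information available"

def wrong_claims(claims, truth):
    """Return the PCs at which the debug info names a wrong location.

    claims: pc -> location name, or UNKNOWN
    truth:  pc -> set of locations that really hold the variable's value
    """
    wrong = []
    for pc, claimed in sorted(claims.items()):
        if claimed is UNKNOWN:
            continue                      # always acceptable
        if claimed not in truth.get(pc, set()):
            wrong.append(pc)              # "the value is here" but it is not
    return wrong

claims = {0: "r1", 4: UNKNOWN, 8: "r2"}
truth = {0: {"r1"}, 4: {"r1"}, 8: {"r3"}}
errors = wrong_claims(claims, truth)
```

At PC 4 the debug info is merely incomplete, which passes; at PC 8 it points
at the wrong register, which is reported.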

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25  1:21                 ` Alexandre Oliva
@ 2007-11-25  2:36                   ` Richard Guenther
  2007-11-26 11:37                     ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Richard Guenther @ 2007-11-25  2:36 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 25, 2007 12:28 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:
>
> > Generated code shouldn't change if we allocate extra DECL_UIDs, but
> > only possibly if we change DECL_UID ordering.  (If that is the
> > problem, as I remember your analysis)
>
> That is indeed the problem, but I'm not sure your requirement is
> feasible.  If we permit DECL_UID divergence, it means we can't use
> DECL_UID for hashing any more.  Since they already stand for hashable
> proxies for the decl pointers, I don't see what we'd gain by
> introducing yet another hashable uid that's stable across -g.
>
> What do you suggest us to use for hashing?  Or do you suggest us to do
> away with hashing and use sorted set or map data structures?

No, hashing is fine, but doing walks over a hashtable when your algorithm
depends on ordering is not.  I have patches to fix the instance of walking
over all referenced vars.  Which is in the case of UIDs using bitmaps and
a walk over a bitmap (which ensures walks in UID order).
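
A small sketch of why a bitmap walk gives UID order regardless of insertion
order.  A Python integer stands in for the bitmap here; this is not GCC's
bitmap implementation.

```python
def bitmap_set(bitmap, uid):
    """Return the bitmap with the bit for uid set."""
    return bitmap | (1 << uid)

def bitmap_walk(bitmap):
    """Yield the set bits from lowest to highest, i.e. in UID order."""
    uid = 0
    while bitmap:
        if bitmap & 1:
            yield uid
        bitmap >>= 1
        uid += 1

referenced = 0
for uid in (9, 2, 5):        # inserted in arbitrary order
    referenced = bitmap_set(referenced, uid)

order = list(bitmap_walk(referenced))
```

The walk visits 2, 5, 9 whatever order the UIDs were added in.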

Richard.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 22:34               ` Richard Guenther
@ 2007-11-25  1:21                 ` Alexandre Oliva
  2007-11-25  2:36                   ` Richard Guenther
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-25  1:21 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:

> Generated code shouldn't change if we allocate extra DECL_UIDs, but
> only possibly if we change DECL_UID ordering.  (If that is the
> problem, as I remember your analysis)

That is indeed the problem, but I'm not sure your requirement is
feasible.  If we permit DECL_UID divergence, it means we can't use
DECL_UID for hashing any more.  Since they already stand for hashable
proxies for the decl pointers, I don't see what we'd gain by
introducing yet another hashable uid that's stable across -g.

What do you suggest us to use for hashing?  Or do you suggest us to do
away with hashing and use sorted set or map data structures?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 20:21         ` Alexandre Oliva
  2007-11-24 20:48           ` Bernd Schmidt
  2007-11-24 21:24           ` Richard Kenner
@ 2007-11-25  0:39           ` Robert Dewar
  2007-12-15 20:32             ` Alexandre Oliva
  2 siblings, 1 reply; 189+ messages in thread
From: Robert Dewar @ 2007-11-25  0:39 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

Alexandre Oliva wrote:

> Besides, the Ada RTS compiles differently with -g than without -g,
> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
> but me seems to care.

We certainly care about this, and appreciate efforts to fix it!
Robert Dewar. We = all the GNAT folks.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 22:01             ` Alexandre Oliva
  2007-11-24 22:34               ` Richard Guenther
@ 2007-11-25  0:20               ` Alexandre Oliva
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-25  0:20 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

On Nov 24, 2007, Alexandre Oliva <aoliva@redhat.com> wrote:

> On Nov 24, 2007, Bernd Schmidt <bernds_cb1@t-online.de> wrote:
>> Alexandre Oliva wrote:
>>> And then, despite the consensus that GCC must not generate different
>>> code with and without -g, the patch that fixes one such regression has
>>> been lingering for months, and the patch that introduced the
>>> regression hasn't been reverted either.

>> Pointers?

> Regression introduced here:

> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01745.html

> first reported here:

> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00127.html

> last proposed patch here:

> http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00608.html

I take it back that this patch wasn't approved.  Mark had approved it
on Nov 5, I didn't want to check it in before going on a trip and,
when I returned, I forgot about the approval because it was in an
unrelated thread.  http://gcc.gnu.org/ml/gcc/2007-11/msg00139.html

I'll shortly check in that one and a bunch of others that also got
approval but that I deferred until my return.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 22:12                         ` Alexandre Oliva
@ 2007-11-24 22:42                           ` Richard Kenner
  0 siblings, 0 replies; 189+ messages in thread
From: Richard Kenner @ 2007-11-24 22:42 UTC (permalink / raw)
  To: aoliva; +Cc: Joe.Buck, gcc-patches, gcc, iant, mark, richard.guenther

> The piece of the puzzle we're still missing is how to get debuggers
> clever enough to decide where to set a breakpoint.  Nowadays, debuggers
> (at least those I'm familiar with) tend to set breakpoints at the
> lowest-numbered PC corresponding to a given source line number.  While
> this is useful at times, at other times what you want is the lowest PC
> after all instructions corresponding to the previous line, because at
> that point you know all the state of the previous line should be stable
> and hopefully still observable.  Or something along these lines.  I don't
> have a complete solution for this problem.  It's very far from trivial,
> and I don't see that debug information can carry enough information for
> the compiler to aid the debugger in selecting where to place breakpoints
> in this regard.

Or you want the first instruction of that line that shows the actual flow
of control.  Or sometimes other things, as you say.

A few of us were discussing this issue in person last week and we strongly
agree with your characterization that it's very far from trivial.  The
consensus we came to is that the compiler should continue associating the
original line number with each instruction that came from it, but perhaps
should also provide additional, not-yet-defined annotations to allow the
debugger to be able to provide various different types of breakpoints,
corresponding to various purposes the programmer is using the breakpoints
for.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 22:01             ` Alexandre Oliva
@ 2007-11-24 22:34               ` Richard Guenther
  2007-11-25  1:21                 ` Alexandre Oliva
  2007-11-25  0:20               ` Alexandre Oliva
  1 sibling, 1 reply; 189+ messages in thread
From: Richard Guenther @ 2007-11-24 22:34 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 24, 2007 9:19 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Nov 24, 2007, Bernd Schmidt <bernds_cb1@t-online.de> wrote:
>
> > Alexandre Oliva wrote:
> >> And then, despite the consensus that GCC must not generate different
> >> code with and without -g, the patch that fixes one such regression has
> >> been lingering for months, and the patch that introduced the
> >> regression hasn't been reverted either.
>
> > Pointers?
>
> Regression introduced here:
>
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01745.html
>
> first reported here:
>
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00127.html
>
> last proposed patch here:
>
> http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00608.html

Well - it's a workaround for a bug that's elsewhere.  Generated code
shouldn't change if we allocate extra DECL_UIDs, but only possibly if we
change DECL_UID ordering.  (If that is the problem, as I remember your
analysis)

Richard.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 20:08                       ` Joe Buck
@ 2007-11-24 22:12                         ` Alexandre Oliva
  2007-11-24 22:42                           ` Richard Kenner
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24 22:12 UTC (permalink / raw)
  To: Joe Buck
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Joe Buck <Joe.Buck@synopsys.COM> wrote:

> consider dropping observable points where the states will not match.

We can't really do that.  The line number mapping is from PC to line
number, regardless of how far into the execution or earlier lines the
code is.  Omitting certain mappings from PC to line numbers would be
wrong.

The piece of the puzzle we're still missing is how to get debuggers
clever enough to decide where to set a breakpoint.  Nowadays,
debuggers (at least those I'm familiar with) tend to set breakpoints
at the lowest-numbered PC corresponding to a given source line number.
While this is useful at times, at other times what you want is the
lowest PC after all instructions corresponding to the previous line,
because at that point you know all the state of the previous line
should be stable and hopefully still observable.  Or something along
these lines.  I don't have a complete solution for this problem.  It's
very far from trivial, and I don't see that debug information can
carry enough information for the compiler to aid the debugger in
selecting where to place breakpoints in this regard.
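
To make the two choices concrete, here is a sketch over a made-up line
table in which the scheduler has interleaved lines 10 and 11.  The table
layout and function names are assumptions, and "after all instructions of
the previous line" is read as "the first PC of the wanted line that follows
the last PC of the previous line".

```python
def first_pc(line_table, line):
    """Lowest PC attributed to the given line (what debuggers tend to pick)."""
    return min(pc for pc, ln in line_table if ln == line)

def first_pc_after_previous_line(line_table, line, prev_line):
    """Lowest PC of `line` after every instruction of `prev_line`."""
    last_prev = max((pc for pc, ln in line_table if ln == prev_line), default=-1)
    candidates = [pc for pc, ln in line_table if ln == line and pc > last_prev]
    return min(candidates) if candidates else None

# (pc, source line) pairs.
line_table = [(0, 10), (4, 11), (8, 10), (12, 11), (16, 12)]

early = first_pc(line_table, 11)
stable = first_pc_after_previous_line(line_table, 11, 10)
```

With this table `early` is 4, where part of line 10 has not run yet, and
`stable` is 12, where all of line 10 has completed.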

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 20:48           ` Bernd Schmidt
@ 2007-11-24 22:01             ` Alexandre Oliva
  2007-11-24 22:34               ` Richard Guenther
  2007-11-25  0:20               ` Alexandre Oliva
  0 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24 22:01 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

On Nov 24, 2007, Bernd Schmidt <bernds_cb1@t-online.de> wrote:

> Alexandre Oliva wrote:
>> And then, despite the consensus that GCC must not generate different
>> code with and without -g, the patch that fixes one such regression has
>> been lingering for months, and the patch that introduced the
>> regression hasn't been reverted either.

> Pointers?

Regression introduced here:

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01745.html

first reported here:

http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00127.html

last proposed patch here:

http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00608.html

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 21:24           ` Richard Kenner
@ 2007-11-24 21:55             ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24 21:55 UTC (permalink / raw)
  To: Richard Kenner
  Cc: gcc-patches, gcc, iant, mark, richard.guenther, stevenb.gcc

On Nov 24, 2007, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:

>> Besides, the Ada RTS compiles differently with -g than without -g,
>> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
>> but me seems to care.

> That's weird.  Except on Windows, VxWorks, and VMS, there's almost
> no code in that file.

Yep.  On GNU/Linux, the difference is precisely that, when compiling
with -g, you get the variables that represent the file open modes to
the output, while compiling without -g they're completely optimized
away.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 20:21         ` Alexandre Oliva
  2007-11-24 20:48           ` Bernd Schmidt
@ 2007-11-24 21:24           ` Richard Kenner
  2007-11-24 21:55             ` Alexandre Oliva
  2007-11-25  0:39           ` Robert Dewar
  2 siblings, 1 reply; 189+ messages in thread
From: Richard Kenner @ 2007-11-24 21:24 UTC (permalink / raw)
  To: aoliva; +Cc: gcc-patches, gcc, iant, mark, richard.guenther, stevenb.gcc

> Besides, the Ada RTS compiles differently with -g than without -g,
> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
> but me seems to care.

That's weird.  Except on Windows, VxWorks, and VMS, there's almost
no code in that file.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 20:21         ` Alexandre Oliva
@ 2007-11-24 20:48           ` Bernd Schmidt
  2007-11-24 22:01             ` Alexandre Oliva
  2007-11-24 21:24           ` Richard Kenner
  2007-11-25  0:39           ` Robert Dewar
  2 siblings, 1 reply; 189+ messages in thread
From: Bernd Schmidt @ 2007-11-24 20:48 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

Alexandre Oliva wrote:

> And then, despite the consensus that GCC must not generate different
> code with and without -g, the patch that fixes one such regression has
> been lingering for months, and the patch that introduced the
> regression hasn't been reverted either.

Pointers?


Bernd

-- 
This footer brought to you by insane German lawmakers.
Analog Devices GmbH      Wilhelm-Wagenfeld-Str. 6      80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 20:11         ` Alexandre Oliva
@ 2007-11-24 20:46           ` Richard Guenther
  0 siblings, 0 replies; 189+ messages in thread
From: Richard Guenther @ 2007-11-24 20:46 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Steven Bosscher, Mark Mitchell, Ian Lance Taylor, gcc-patches, gcc

On Nov 24, 2007 4:00 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Nov 24, 2007, "Steven Bosscher" <stevenb.gcc@gmail.com> wrote:
>
> > And it has to handle this new case everywhere.
>
> I've already explained why this isn't true.  It's not even close to
> being true.  In fact, I've chosen this representation *precisely*
> because I reasoned it would lead to the least global impact.  Of
> course you can refuse to believe that and point at the changes I had
> to make as alleged counter-proof, failing to notice how many other
> locations I haven't had to change and that just work because adjusting
> other instructions after transformations is precisely what all
> transformation passes already do.

It also makes some things easier - for example during inlining of a function
body we re-map all DECLs in the inlined copy.  With an on-the-side
representation you have to ensure to make the same mapping explicitly,
with DEBUG_INSNs the mapping is automatically done during the copying
of the IL.  A similar problem with using SSA_NAME definition points to
store information is using the renamer to rename a variable that already
has SSA_NAMES (which is IMHO bogus, as we do not detect the erroneous
case of overlapping life-ranges - but ignore that for now) - in this case you
need some magic to transfer the on-the-side debug information from the
old SSA_NAMEs to the new ones (where possible).
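
The inlining point can be sketched with a toy model. This is only an
illustration under stated assumptions: toy_stmt, toy_copy_body and
TOY_MAX_DECLS are hypothetical names, not GCC's tree or inliner API.
When debug bindings sit in the statement stream, the same copy loop
that renames ordinary statements renames them too, and no separate
side-table walk is needed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DECLS 8

/* Hypothetical toy statement: it mentions one declaration, by index.
   A debug statement binds that declaration to a value but emits no
   code.  */
struct toy_stmt
{
  int decl;
  bool is_debug;
};

/* Copy N statements from SRC to DST, renaming each declaration through
   REMAP, roughly as an inliner would for the inlined copy.  Debug
   statements take the same path as every other statement.  */
static void
toy_copy_body (const struct toy_stmt *src, struct toy_stmt *dst, int n,
               const int remap[TOY_MAX_DECLS])
{
  for (int i = 0; i < n; i++)
    {
      assert (src[i].decl >= 0 && src[i].decl < TOY_MAX_DECLS);
      dst[i] = src[i];
      dst[i].decl = remap[src[i].decl];
    }
}
```

With an on-the-side representation, the equivalent of the REMAP lookup
would have to be repeated explicitly for every table that mentions a
declaration.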

Just to mention a few problems we are running into ;)

Richard.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 15:18       ` Richard Kenner
@ 2007-11-24 20:21         ` Alexandre Oliva
  2007-11-24 20:48           ` Bernd Schmidt
                             ` (2 more replies)
  0 siblings, 3 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24 20:21 UTC (permalink / raw)
  To: Richard Kenner
  Cc: gcc-patches, gcc, iant, mark, richard.guenther, stevenb.gcc

On Nov 24, 2007, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:

>> Yes, catching all such cases hasn't been trivial.  If we miss some,
>> then what happens is that -O2 -g -fvar-tracking-assignments outputs
>> different executable code than -O2.

> But that's a very serious type of bug because it means you have
> situations where a program fails and you can't debug it because when
> you turn on debugging information, it doesn't fail anymore.  We need
> to make an absolute rule that this *cannot* happen and luckily this is
> one of the easiest types of errors to protect against.

I agree completely.  That's why I've gone to such great lengths to
ensure these errors are easily testable in my implementation, and to
put all my changes under control of a command-line option.  Then, you
can still get (poorer) debug information by disabling (or not
enabling) this option.

And then, despite the consensus that GCC must not generate different
code with and without -g, the patch that fixes one such regression has
been lingering for months, and the patch that introduced the
regression hasn't been reverted either.

Besides, the Ada RTS compiles differently with -g than without -g,
such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
but me seems to care.

I'm sure I'm going to find other differences between -g and -g0 once I
fix this and bootstrap4-debug gets past this point and builds other
target libraries.  I'm not looking forward to the discussions that
will ensue if any fixes for these problems imply any costs whatsoever,
given the experience I've had with the SSA-coalescing and the
optimize-basic-blocks issues that are all about debug information
versus optimization :-(

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 16:07       ` Steven Bosscher
@ 2007-11-24 20:11         ` Alexandre Oliva
  2007-11-24 20:46           ` Richard Guenther
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24 20:11 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 24, 2007, "Steven Bosscher" <stevenb.gcc@gmail.com> wrote:

> On Nov 24, 2007 5:54 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
>> > Apparently, you can't treat DEBUG_INSN just like any other normal
>> > insn.
>> 
>> Obviously not.  They're weaker uses than anything else.  We haven't
>> had any such thing in the compiler before.

> So we get a "third way".  GCC has insns and notes, and now it gets a
> third object to deal with in the insns stream.

Not quite.  It's an insn.  But it is different in some ways.  It's not
unheard of.  Asm insns are also different in some ways.  USEs and
CLOBBERs too.  Delayed-branch instruction groups too.

It would be great if infrastructure for weak uses was already in
place, but if it's needed (we haven't determined that, but I'm
convinced there's no better way) and it isn't there, then it has to be
put in.

> And it has to handle this new case everywhere.

I've already explained why this isn't true.  It's not even close to
being true.  In fact, I've chosen this representation *precisely*
because I reasoned it would lead to the least global impact.  Of
course you can refuse to believe that and point at the changes I had
to make as alleged counter-proof, failing to notice how many other
locations I haven't had to change and that just work because adjusting
other instructions after transformations is precisely what all
transformation passes already do.

> I didn't say "complex conditionals" but ugly conditionals ;-)
> I mean all the "INSN_P && ! DEBUG_INSN_P" conditionals.

Oh, that's easy: NON_DEBUG_INSN_P can simplify that.  There are, what,
a few dozen such tests in the compiler right now, compared with
the hundreds of tests for INSN_P and a few tens of tests for
DEBUG_INSN_P.  I didn't think it was worth creating yet another macro,
but if you find this so unacceptable, maybe I can rework it.

Would you prefer NON_DEBUG_INSN_P, or would you prefer the original
INSN_P and all uses thereof to be spelled differently, just to keep
the few objectionable INSN_P && ! DEBUG_INSN_P tests more beautiful?
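
For illustration, the predicate under discussion could look roughly
like the sketch below.  The toy_code enumeration and the TOY_ macros
are hypothetical stand-ins, not the definitions in GCC's rtl.h; the
only point is that a debug insn still satisfies the INSN_P-style test,
and a single extra macro names the "INSN_P && ! DEBUG_INSN_P" case.

```c
#include <assert.h>

/* Toy stand-ins for insn codes; not the real rtl.h definitions.  */
enum toy_code
{
  TOY_NOTE,
  TOY_INSN,
  TOY_JUMP_INSN,
  TOY_CALL_INSN,
  TOY_DEBUG_INSN
};

struct toy_insn
{
  enum toy_code code;
};

/* A debug insn is still an insn, so the INSN_P-style test accepts it.  */
#define TOY_DEBUG_INSN_P(X) ((X)->code == TOY_DEBUG_INSN)
#define TOY_INSN_P(X) ((X)->code != TOY_NOTE)

/* One name for the "INSN_P && ! DEBUG_INSN_P" test.  */
#define TOY_NON_DEBUG_INSN_P(X) (TOY_INSN_P (X) && !TOY_DEBUG_INSN_P (X))

/* Count the insns that a pass ignoring debug insns would look at.  */
static int
count_non_debug_insns (const struct toy_insn *insns, int n)
{
  assert (n >= 0);
  int count = 0;
  for (int i = 0; i < n; i++)
    if (TOY_NON_DEBUG_INSN_P (&insns[i]))
      count++;
  return count;
}
```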

>> Sufficient for what?  Efforts towards what?  Generating more incorrect
>> debug information just for the sake of it?  Adding more debug
>> information while breaking some that's just fine now?  Is that really
>> progress?

> Ah, there you go again with this extremist pro-debug-info stance.  How
> can one argue with you when you keep ridiculing other points of view
> using ridiculous arguments?  Who said anything about "generating more
> incorrect information just for the sake of it"?

Getting even the trivial cases wrong and dismissing those without
realizing how things would fall apart in the big picture looks like
"generating more incorrect information just for the sake of it" to me.
Now, maybe it's not.  Maybe it's just human behavior, a wish that some
simpler solution will take care of a problem and that the simple
counter-examples I've pointed out are rare situations.  I don't see
that they are.  I've put a lot of thought into this problem, I've been
working on it for quite a long time, and I've fallen into many of the
traps that I pointed out, and avoided several others.

I realize I come off as arrogant when I feel cornered by a majority
that obviously hasn't spent enough time on the issue to realize the
obvious-to-me major problems with the alternatives that are on the
table.  I realize in such situations I often react in ways that are
detrimental to the points I'm trying to make.  I realize this doesn't
help.  I hope people can see through the mess of proposal-name-calling
that this is turning into.

> The "for the sake of it" part is just offensive.

I agree, and I apologize for that.  It's been a very frustrating
debate.

> You seem to imply that people are arguing gcc should emit wrong debug
> information on purpose.

That's how it feels to me when the claims come up that it's not a
matter of correctness, or that it's not important to get it right.

> Your colleague expressed perfectly how I define "sufficiently good
> debug info":

> "It needs to be good enough
> that a semi-knowledgable person or a dumb but heuristic-laden program
> that processes debugging info can nevertheless extract reliable
> information."
> (http://gcc.gnu.org/ml/gcc/2007-11/msg00581.html)

I'm very happy you agree with him.  Unfortunately, you appear to be
focusing on the sloppiness afforded by the wording "good enough", and
assuming that this can be pushed beyond the point of "extract
*reliable* information", which is the key operative qualifier here.

If it's "good enough" for other purposes, but it's not possible to
"extract reliable information from debugging info", then we don't
satisfy the predicate above.

That's why I'm aiming at correctness (it's reliable) rather than
completeness (optimizations can discard stuff).

> Here is another "extremist" point of view:

> Correctness for an optimization algorithm means that it does not miss
> optimization opportunities that it is designed to catch.  Therefore if
> an optimization algorithm implementation misses an optimization that
> it should catch, then this is a correctness issue.
> ;-)

I happen to agree, indeed, but it's a correctness issue of the
implementation, not a correctness issue of the compiler output, which
is what I'm talking about when I speak of correctness issues.

> You said you now get the same code with and without
> -fvar-tracking-assignments on your branch.  Can you also prove that
> the branch does not introduce new missed optimizations wrt. the latest
> revision that you merged from the trunk?

I could, and that's a very good idea (thanks!), but it will be easier
to do that after my next merge, when there won't be fixes for missed
optimizations, that I detected with my testing, missing from the
baseline.

After all such missed optimizations are in the trunk, I intend to
merge that into the branch and compare mergepoint and branch for
compiler output changes other than in debug information.  If there are
any changes (extremely unlikely), these are bugs that I'll have to
fix.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 15:08     ` Alexandre Oliva
  2007-11-24 15:18       ` Richard Kenner
@ 2007-11-24 16:07       ` Steven Bosscher
  2007-11-24 20:11         ` Alexandre Oliva
  1 sibling, 1 reply; 189+ messages in thread
From: Steven Bosscher @ 2007-11-24 16:07 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 24, 2007 5:54 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> > Apparently, you can't treat DEBUG_INSN just like any other normal
> > insn.
>
> Obviously not.  They're weaker uses than anything else.  We haven't
> had any such thing in the compiler before.

So we get a "third way".  GCC has insns and notes, and now it gets a
third object to deal with in the insns stream.  And it has to handle
this new case everywhere.  To me it seems that your approach will not
help to make GCC easier to work with and understand.  Unless there are
compelling reasons to do this, I think this is a step in the wrong
direction.

> > but for the moment I fear you're just going to see a lot of
> > duplication of ugly conditionals
>
> Your fear is understandable but not justified.  Go look at the
> patches.  x86_64-linux-gnu now bootstraps and produces exactly the
> same code with and without -fvar-tracking-assignments.  And no complex
> conditionals were needed.  The most I've needed so far was to ignore
> debug insns at certain spots.

I didn't say "complex conditionals" but ugly conditionals ;-)
I mean all the "INSN_P && ! DEBUG_INSN_P" conditionals.  There seem to
be a lot of those, and it's not immediately obvious where and when
you'd need them.

> > and bugs where such conditionals are forgotten/overlooked/missing.
>
> See above.  One of the reasons for the approach I've taken is that
> such cases will, in the worst case, cause missed optimizations, not
> incorrect compiler output.

Ah! More on that later.

> > And the benefit, well, let's just say I'm not convinced that less
> > elaborate efforts are not sufficient.
>
> Sufficient for what?  Efforts towards what?  Generating more incorrect
> debug information just for the sake of it?  Adding more debug
> information while breaking some that's just fine now?  Is that really
> progress?

Ah, there you go again with this extremist pro-debug-info stance.  How
can one argue with you when you keep ridiculing other points of view
using ridiculous arguments?  Who said anything about "generating more
incorrect information just for the sake of it"?  I don't think anyone
did.  The "for the sake of it" part is just offensive. You seem to imply
that people are arguing gcc should emit wrong debug information on
purpose.  Please step out of your own world of thoughts for a second,
and try to understand that other people can have a different but
nevertheless reasonable point of view.

I think it is impossible to get perfect debug info after very complex
code transformations.  And because of that, I also think it is
reasonable to not get perfect debug info in less complex cases.  Your
colleague expressed perfectly how I define "sufficiently good debug
info":

"It needs to be good enough
that a semi-knowledgable person or a dumb but heuristic-laden program
that processes debugging info can nevertheless extract reliable
information."
(http://gcc.gnu.org/ml/gcc/2007-11/msg00581.html)

Note how this "good enough" does not imply "correctness at all cost".

Here is another "extremist" point of view:

Correctness for an optimization algorithm means that it does not miss
optimization opportunities that it is designed to catch.  Therefore if
an optimization algorithm implementation misses an optimization that
it should catch, then this is a correctness issue.
;-)

You said you now get the same code with and without
-fvar-tracking-assignments on your branch.  Can you also prove that
the branch does not introduce new missed optimizations wrt. the latest
revision that you merged from the trunk?

Gr.
Steven

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 15:08     ` Alexandre Oliva
@ 2007-11-24 15:18       ` Richard Kenner
  2007-11-24 20:21         ` Alexandre Oliva
  2007-11-24 16:07       ` Steven Bosscher
  1 sibling, 1 reply; 189+ messages in thread
From: Richard Kenner @ 2007-11-24 15:18 UTC (permalink / raw)
  To: aoliva; +Cc: gcc-patches, gcc, iant, mark, richard.guenther, stevenb.gcc

> Yes, catching all such cases hasn't been trivial.  If we miss some,
> then what happens is that -O2 -g -fvar-tracking-assignments outputs
> different executable code than -O2.

But that's a very serious type of bug because it means you have
situations where a program fails and you can't debug it because when
you turn on debugging information, it doesn't fail anymore.  We need
to make an absolute rule that this *cannot* happen and luckily this is
one of the easiest types of errors to protect against.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 10:27   ` Steven Bosscher
@ 2007-11-24 15:08     ` Alexandre Oliva
  2007-11-24 15:18       ` Richard Kenner
  2007-11-24 16:07       ` Steven Bosscher
  0 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24 15:08 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 23, 2007, "Steven Bosscher" <stevenb.gcc@gmail.com> wrote:

>> So, what's this prejudice against debug insns?  Why do you regard them
>> as notes rather than insns?

> What worries me is that GCC will have to special-case DEBUG_INSN
> everywhere where it looks at INSNs.

This is just not true.  Anywhere that simply wants to update insns for
the effects of other transformations won't have to do that.  Only
places in which we need the weak-use semantics of debug_insns need to
give them special treatment.  Not because they're not insns, but
because they're weak uses, i.e., uses that shouldn't interfere with
optimizations.
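
The weak-use semantics described above can be sketched with a toy
dead-code pass.  Everything here is a hypothetical model (toy_insn,
toy_dce and TOY_NO_REG are made-up names, and the real branch works on
RTL, not on this structure).  A read by a debug insn does not keep a
value alive; when the defining insn is deleted, the debug binding is
reset to "unavailable" instead of blocking the deletion.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NO_REG (-1)

/* Hypothetical toy insn: sets register DEF and reads register USE
   (either may be TOY_NO_REG).  IS_DEBUG marks a debug binding, whose
   read of USE is only a weak use.  DELETED marks insns removed by the
   pass.  */
struct toy_insn
{
  int def;
  int use;
  bool is_debug;
  bool deleted;
};

/* Return true if some remaining non-debug insn reads REG.  */
static bool
has_strong_use (const struct toy_insn *insns, int n, int reg)
{
  for (int i = 0; i < n; i++)
    if (!insns[i].deleted && !insns[i].is_debug && insns[i].use == reg)
      return true;
  return false;
}

/* Delete non-debug insns whose result is neither LIVE_OUT nor strongly
   used, iterating to a fixed point.  Debug bindings that referred to a
   deleted value are reset to TOY_NO_REG.  Returns the number of
   non-debug insns left, i.e. the "generated code".  */
static int
toy_dce (struct toy_insn *insns, int n, int live_out)
{
  assert (n >= 0);
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int i = 0; i < n; i++)
        {
          struct toy_insn *insn = &insns[i];
          if (insn->deleted || insn->is_debug || insn->def == TOY_NO_REG)
            continue;
          if (insn->def == live_out
              || has_strong_use (insns, n, insn->def))
            continue;
          insn->deleted = true;
          changed = true;
          /* The value is gone: weak uses become "optimized out".  */
          for (int j = 0; j < n; j++)
            if (insns[j].is_debug && insns[j].use == insn->def)
              insns[j].use = TOY_NO_REG;
        }
    }
  int remaining = 0;
  for (int i = 0; i < n; i++)
    if (!insns[i].is_debug && !insns[i].deleted)
      remaining++;
  return remaining;
}
```

In this model, running toy_dce on the same stream with and without the
debug bindings leaves the same non-debug insns.  A pass that forgot the
is_debug test in has_strong_use would keep an extra insn alive, which
is the "different code with -g" failure, not wrong code.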

Yes, catching all such cases hasn't been trivial.  If we miss some,
then what happens is that -O2 -g -fvar-tracking-assignments outputs
different executable code than -O2.  Everything still works just fine,
we eventually get a bug report, we fix it and move on.

This is *much* better than starting out with notes, that nearly
nothing cares about, and try to add code to update the notes as code
transformations are performed.  In this case, we get incorrect,
non-functional compiler output unless we catch absolutely all bugs
upfront.

> Apparently, you can't treat DEBUG_INSN just like any other normal
> insn.

Obviously not.  They're weaker uses than anything else.  We haven't
had any such thing in the compiler before.

> but for the moment I fear you're just going to see a lot of
> duplication of ugly conditionals

Your fear is understandable but not justified.  Go look at the
patches.  x86_64-linux-gnu now bootstraps and produces exactly the
same code with and without -fvar-tracking-assignments.  And no complex
conditionals were needed.  The most I've needed so far was to ignore
debug insns at certain spots.

It's true that in a number of situations this is an oversimplified
course of action, and some additional effort might be needed to
actually update the debug insns when they would have interfered with
optimizations.  Time will tell, I guess.  So far, it doesn't look like
it's been a problem, and I don't foresee these duplicated or ugly
conditionals you fear.

> and bugs where such conditionals are forgotten/overlooked/missing.

See above.  One of the reasons for the approach I've taken is that
such cases will, in the worst case, cause missed optimizations, not
incorrect compiler output.

> And the benefit, well, let's just say I'm not convinced that less
> elaborate efforts are not sufficient.

Sufficient for what?  Efforts towards what?  Generating more incorrect
debug information just for the sake of it?  Adding more debug
information while breaking some that's just fine now?  Is that really
progress?

> (And to be perfectly honest, I think GCC has bigger issues to solve
> than getting perfect debug info -- such as getting compile times of a
> linux kernel down ;-))

Compile speed is a quality of implementation issue.  Output
correctness and standard compliance comes first in my book.

And then, I'm supposed to fix this correctness problem, not other
issues that others might find more important.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-23  2:30                   ` Richard Guenther
  2007-11-23 23:40                     ` Frank Ch. Eigler
@ 2007-11-24 13:52                     ` Robert Dewar
  1 sibling, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-11-24 13:52 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Frank Ch. Eigler, Mark Mitchell, David Edelsohn,
	Ian Lance Taylor, Alexandre Oliva, gcc-patches, gcc

Richard Guenther wrote:
> On Nov 22, 2007 8:22 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
>> Mark Mitchell <mark@codesourcery.com> writes:
>>
>>> [...]
>>>>      Who is "we"?  What better debugging are GCC users demanding?  What
>>>> debugging difficulties are they experiencing?  Who is that set of users?
>>>> What functional changes would improve those cases?  What is the cost of
>>>> those improvements in complexity, maintainability, compile time, object
>>>> file size, GDB start-up time, etc.?
>>> That's what I'm asking.  First and foremost, I want to know what,
>>> concretely, Alexandre is trying to achieve, beyond "better debugging
>>> info for optimized code".  Until we understand that, I don't see how we
>>> can sensibly debate any methods of implementation, possible costs, etc.
>> It may be asking to belabour the obvious.  GCC users do not want to
>> have to compile with "-O0 -g" just to debug during development (or
>> during crash analysis *after deployment*!).  Developers would like to
>> be able to place breakpoints anywhere by reference to the source code,
>> and would like to access any variables logically present there.
>> Developers will accept that optimized code will by its nature make
>> some of these fuzzy, but incorrect data must be and incomplete data
>> should be minimized.
>>
>> That they put up with the status quo at all is a historical artifact
>> of being told so long not to expect any better.
> 
> As it is (without serious overhead) impossible to do both, you either have
> to live with possibly incorrect but elaborate or incomplete but correct
> debug information for optimized code.  Choose one ;)

I don't think you can use the phrase "serious overhead" without rather
extensive statistics. To me, -O1 should be reasonably debuggable, as it
always was back in earlier gcc days. It is nice that -O1 is somewhat
more efficient than it was in those earlier days, but not nice enough
to warrant a severe regression in debug capabilities. To me anyone who
is so concerned about performance as to really appreciate this
difference will likely be using -O2 anyway.

The trouble is that we have set as the criterion for -O1 all the
optimizations that are reasonably cheap in compile time. I think
it is essential that there be an optimization level that means

All the optimizations that are reasonably cheap to implement
and that do not impact debugging information significantly
(except I would say it is OK to impact the ability to change
variables).

For me it would be fine for -O1 to mean that but if there is a
consensus that an extra level (-Od or whatever) is worthwhile
that's fine by me.

I find working on the Ada front end that it used to be that I could
always use -O1, OK for debugging, and OK for performance. Now I have
to switch between -O0 for debugging, and then I use -O2 for performance
(for me, the debuggability of -O1 and -O2 are equivalent in this
context, both hopeless, so I might as well use -O2). So I no longer
use -O1 at all (the extra compile time for -O2 is negligible on my
fast notebook).

> 
> What we (Matz and myself) are trying to do is provide elaborate debug
> information with the chance of wrong (I'd call it superfluous, or extra)
> debug information.  Alexandre seems to aim at the world-domination
> solution (with the serious overhead in terms of implementation and
> verboseness).
> 
> Richard.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-23 23:40 ` Alexandre Oliva
@ 2007-11-24 10:27   ` Steven Bosscher
  2007-11-24 15:08     ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Steven Bosscher @ 2007-11-24 10:27 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 23, 2007 9:45 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> So, yes, debug stmts and insns are notes in the sense that they don't
> output code.  Like USE insns, labels, empty asm insns and other
> UNSPECs.  But wait, those are insns, not notes.  And they do generate
> code, just not in the .text section, but rather in .debug sections.

All of them relate to code generation though.  Without them, we create
wrong code.  I'm aware of how you feel about debug info and
correctness and so on.

> So, what's this prejudice against debug insns?  Why do you regard them
> as notes rather than insns?

What worries me is that GCC will have to special-case DEBUG_INSN
everywhere where it looks at INSNs.  One can already see some of that
happening on your branch.  Apparently, you can't treat DEBUG_INSN just
like any other normal insn.

What I see happening with your DEBUG_INSN approach, is that all passes
that use NEXT_INSN/PREV_INSN will have to special-case DEBUG_INSN in
addition to the NOTE_P or INSN_P checks that they already have.  I
have seen too many bugs with passes that forgot to look through notes
to feel comfortable about adding another
not-a-note-but-also-not-an-insn like thing to the insn stream. The
fact that DEBUG_INSN also has real operands that are not really real
operands is bound to confuse the matter even more.  Life with proper
insn and operands iterators for RTL would be so much easier, but for
the moment I fear you're just going to see a lot of duplication of
ugly conditionals and bugs where such conditionals are
forgotten/overlooked/missing.

So to summarize: I'm just worried your approach is going to make GCC
even slower, buggier, more difficult to maintain and more difficult to
understand and modify.  And the benefit, well, let's just say I'm not
convinced that less elaborate efforts are not sufficient.

(And to be perfectly honest, I think GCC has bigger issues to solve
than getting perfect debug info -- such as getting compile times of a
linux kernel down ;-))

Gr.
Steven

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-13 14:22                     ` Michael Matz
@ 2007-11-24  4:58                       ` Alexandre Oliva
  2007-11-26 18:10                         ` Michael Matz
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24  4:58 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

On Nov 13, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Mon, 12 Nov 2007, Alexandre Oliva wrote:

>> With the design I've proposed, it is possible to compute the value of i, 

> No.  Only if the function is reversible.

Of course.  I meant it for that particular case.  The generalization
is obvious, but I didn't mean it would be always possible.

>> As I wrote before, I'm not aware of any systemtap bug report about a
>> situation in which an argument was actually optimized away.

> I think it all started from PR23551.

Yep.  Nowhere does that bug report request parameters to be forced
live.  What it does request is that parameters that are not completely
optimized away be present in debug information.

Now, consider these cases:

1. function is not inlined

At its entry point, we bind the argument to the register or stack slot
in which the argument is live.  Worst case, it's clobbered at the
entry point instruction itself, because it's entirely unused.  By
emitting a live range from the entry point to the death point, we're
emitting accurate and complete debug information for the argument.  We
win.

2. function is inlined, the argument is unused and thus optimized
away, but the function does some other useful computation

At the inlined entry point, we have a note that binds the argument to
its expected value.  As we transform the program and optimize away the
argument, we retain and update the note, such that we can still
represent the value of the inlined argument for as long as it's
available.

3. function is inlined and completely optimized away

No instruction remains in which the argument is in scope, so we might
as well refrain from emitting location information for it.  Even
though we can figure out where the value lives, there's no code to
attach this information to.  So there's no place to set a breakpoint
on to inspect the variable location anyway.

> For us it also happened in the kernel in namei.c, where real_lookup
> is inlined sometimes, and its arguments are missing.  That might or
> might not be reversible functions, so your scheme perhaps would have
> helped there.  But generally it won't solve the problem for good.

It looks like you're trying to solve a different problem.

I'm not trying to find a way to ensure that arguments are live.

I'm trying to get GCC to emit debug information that correctly matches
the instructions it generated.

If the value of a variable is completely optimized away at a point in
the program, the correct representation for its location at that
point is an empty set.
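
As a rough sketch of what "an empty set" means for a consumer, consider
a simplified location list.  The structure and function names below are
hypothetical and this is not the DWARF encoding; it only assumes that a
variable's location is described by address ranges, and that an address
covered by no range means the value is unavailable there.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNAVAILABLE (-1)

/* Hypothetical, simplified location-list entry: the variable lives in
   register REG for addresses in the half-open range [LO, HI).  */
struct toy_loc_range
{
  unsigned long lo;
  unsigned long hi;
  int reg;
};

/* Return the register holding the variable at PC, or TOY_UNAVAILABLE
   if no range covers PC (the "empty set" case).  */
static int
toy_lookup_location (const struct toy_loc_range *list, size_t n,
                     unsigned long pc)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (list[i].lo <= list[i].hi);
      if (pc >= list[i].lo && pc < list[i].hi)
        return list[i].reg;
    }
  return TOY_UNAVAILABLE;
}
```

An argument that is live only from the entry point to its death point
(case 1) would get a single range here; a fully optimized-away inlined
function (case 3) would get no ranges at all.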

>> I wouldn't go as far as stopping the optimization just so that systemtap 
>> can monitor the code.

> Like I said, at some point you have to or accept that some code remains to 
> be not introspectable.

Yep.  It's easy enough to tweak the code to keep a variable live, if
you absolutely need it.  But this is not something I'm working to get
the compiler to do by itself.  Quite the opposite, in fact.  I'm going
to set the compiler free to perform some optimizations that it
currently refrains from performing for the sake of debug information,
when the conflict is only apparent because of past implementation
decisions that I'm working to fix.

> Then I'm probably still confused what problem you're actually trying to 
> solve.  If you don't want to be sure you get precise location information 
> 100% of the time, then what percentage are you required to get?

Accuracy comes first.  If we ever emit debug information saying 'this
variable is here' for a point in the program in which it's in fact
elsewhere or unavailable, that's a bug to be fixed.

Completeness comes second.  If we could have emitted debug information
saying 'the value of this variable is here' for a point in the
program, and we instead claim the variable is unavailable at that
point, that's an improvement that can be made.
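
The two criteria can be stated as a small check, under the assumption
(made only for this sketch) that a location is a register number or a
TOY_UNAVAILABLE marker.  The names are hypothetical; no such checker is
described in this thread.

```c
#include <assert.h>

#define TOY_UNAVAILABLE (-1)

enum toy_verdict
{
  TOY_OK,          /* The claim matches where the value really is.  */
  TOY_INCOMPLETE,  /* Claimed unavailable, but the value exists.  */
  TOY_INACCURATE   /* Claimed a location that is wrong: a bug.  */
};

/* Compare the location CLAIMED by debug info at some program point
   with the ACTUAL location of the value at that point.  */
static enum toy_verdict
toy_check_claim (int claimed, int actual)
{
  assert (claimed >= TOY_UNAVAILABLE && actual >= TOY_UNAVAILABLE);
  if (claimed == actual)
    return TOY_OK;
  if (claimed == TOY_UNAVAILABLE)
    return TOY_INCOMPLETE;
  return TOY_INACCURATE;
}
```

Claiming "unavailable" for a value that really is unavailable counts as
correct here, which matches the ordering above: accuracy first,
completeness second.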

> And how do you measure this?

Good question.  The implementation approach I've taken, that exposes
debug annotations as actual code, starts out with 100% accuracy
(that's the theory, anyway, otherwise generated code would change,
and, even though we still don't have a complete framework to ensure
code doesn't change, if it does, then at least debug information will
model the change accurately), and we can then grow completeness
incrementally.

> Or is the task rather "emit better debug info"?

Nope.  That's a secondary goal that will be achieved as we get
accurate and sufficiently complete debug information.  I don't have
completeness goals set, but I have reasons to expect we're going to
get much better results than we have now without too much additional
effort.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 18:09                         ` Mark Mitchell
@ 2007-11-24  4:31                           ` Alexandre Oliva
  2007-11-26  6:10                             ` Mark Mitchell
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24  4:31 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Alexandre Oliva wrote:
>> On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
>> 
>>> Clearly, for some users, incorrect debugging information on optimized
>>> code is not a terribly big deal.  It's certainly less important to many
>>> users than that the program get the right answer.  On the other hand,
>>> there are no doubt users where, whether for debugging, certification, or
>>> whatever, it's vitally important that the debugging information meet
>>> some standard of accuracy.
>> 
>> How is this different from a port of the compiler for a CPU that few
>> people care about?  That many users couldn't care less whether the
>> compiler output on that port works at all doesn't make it any less of
>> a correctness issue.

> You're again trying to make this a binary-value question.  Why?

Because in my mind, when we agree there is a bug, then a fix for it
is easier to swallow even if it makes the compiler spend more
resources, whereas a mere quality-of-implementation issue is subject
to quite different standards.

> Lots of things are "a correctness issue".  But, some categories tend to
> be worse than others.  There is certainly a qualitative difference in
> the severity of a defect that results in the compiler generating code
> that computes the wrong answer and a defect that results in the compiler
> generating wrong debugging information for optimized code.

That depends a lot on whether your application uses the
incorrect compiler output or not.

If the compiler produces incorrect code, but your application doesn't
ever exercise that error, would you argue for leaving the bug unfixed?

These days, applications are built that depend on the correctness of
the compiler output in certain sections that historically weren't all
that functionally essential, namely, the meta-information sections
that we got used to calling debug information.

I.e., these days, applications exercise the "code paths" that formerly
weren't exercised.  This exposes bugs in the compiler.  Worse: bugs
that we have no infrastructure to test, and that we don't even agree
are actual bugs, because the standards that specify the "ISA and ABI"
in which such code ought to be output are apparently regarded as
irrelevant by some.

Just because their perception is distorted by a single use of such
information, which involves a high amount of human interaction, and
humans are able to tolerate and adapt to error conditions.

But as more and more uses of such information are actual production
systems rather than humans behind debuggers, such errors can no longer
be tolerated, because when the debug output is wrong, the system
breaks.  It's that simple.  It's really no different from any other
compiler bug.

> Let's put it this way: if a user has to choose whether the compiler will
> (a) generate code that runs correctly for their application, or (b)
> generate debugging information that's accurate, which one will they choose?

(a), for sure.  But bear in mind that, when the application's correct
execution depends on the correctness of debugging information, then
(a) implies (b).

> But what's the point of this argument?  It sounds like you're trying to
> argue that debug info for optimized code is a correctness issue, and
> therefore we should work as hard on it as we would on code-generation
> bugs.

I'm working hard on it.  I'm not asking others to join me.  I'm just
asking people to understand how serious a problem it is, and that,
even though fixing these bugs may have a cost, it's bugs we're talking
about, it's incorrect compiler output that causes applications to
break, not mere inconvenience for debuggers.

> I'd like better debugging for optimized code, but I'm certainly more
> concerned that (a) we generate correct, fast code when optimizing,
> and (b) we generate good debugging information when not optimizing.

This just goes to show that you're not concerned with the kind of
application that *depends* on correct debug information for
functioning.  And it's not debuggers I'm talking about here.

That's a reasonable point of view.  Maybe the GCC community can decide
that the debug information it produces is just for (poor) consumption
by debug programs, and that we have no interest in *complying* with
the debug information standards that document the debug information
that other applications depend on.  And I mean *complying* with the
standards, rather than merely outputting whatever seems to be easy and
approximately close to what the standard mandates.

I just hope the GCC community doesn't make this decision, and that it
accepts fixes to these bugs even when they impose some overhead,
especially when such overhead can be easily avoided with command-line
options, or even is disabled by default (because debug info is not
emitted by default, after all).

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-13 10:50                       ` Mark Mitchell
@ 2007-11-24  4:05                         ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24  4:05 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 13, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Alexandre Oliva wrote:
>>> What I don't understand is how it's actually going to work.  What
>>> are the notes you're inserting?
>> 
>> They're always of the form
>> 
>> DEBUG user-variable = expression

> Good, I understand that now.

> Why is this better than associating user variables with assignments?

I've already explained that, but let me try to sum it up again.

If we annotate assignments, then not only do the annotations move
around along with assignments (I don't think that's desirable), but
when we optimize such assignments away, the annotations are either
dropped or have to stand on their own.

Since dropping annotations and moving them around are precisely
opposed to the goal of making debug information accurate, then keeping
the annotations in place and enabling them to stand on their own is
the right thing to do.

Now, since we have to enable them to stand on their own, then we're
faced with the following decision: either we make that the canonical
annotation representation all the way from the beginning, or we
piggyback the annotations on assignments until they're moved or
removed, at which point they become stand-alone annotations.  The
former seems much more maintainable and simpler to deal with, and I
don't see that there's a significant memory or performance penalty to
this.

>> That said, growing SET to add to it a list of variables (or components
>> thereof) that the variable is assigned to could be made to work, to
>> some extent.  But when you optimize away such a set, you'd still have
>> to keep the note around

> Why?  It seems to me that if we're no longer doing the assignment, then
> the location where the value of the user variable can be found (if any)
> is not changing at this point.

The thing is that the *location* of the user variable is changing at
that point.  Either because its previous value was unavailable, or
because it had remained only at a different location.  Only at the
point of the assignment should we associate the variable with the
location that holds its current value.

>> (set (reg i) (const_int 3)) ;; assigns to i
>> (set (reg P1) (reg i))
>> (call (mem f))
>> (set (reg i) (const_int 7)) ;; assigns to i
>> (set (reg i) (const_int 2)) ;; assigns to i
>> (set (reg P1) (reg i))
>> (call (mem g))
>> 
>> could have been optimized to:
>> 
>> (set (reg P1) (const_int 3))
>> (call (mem f))
>> (set (reg P1) (const_int 2))
>> (call (mem g))
>> 
>> and then you wouldn't have any debug information left for variable i.

> Actually, you would, in the method I'm making up.  In particular, both
> of the first two lines in the top example (setting "i" and setting "P1")
> would be marked as providing the value of the user variable "i".

Yes, this works in this very simple case.  But it doesn't when i is
assigned, at different points, to the values of two separate
variables, that are live and initialized much earlier in the program.
Using the method you seem to be envisioning would extend the life of
the binding of variable 'i' to the life of the two other variables,
ending up with two overlapping and conflicting live ranges for i, or
it would have to drop one in favor of the other.  You can't possibly
retain correct (non-overlapping) live ranges for both unless you keep
notes at the points of assignment.

To make the example clear, consider:

(set (reg x [x]) ???1)
(set (reg y [y]) ???2)
(set (reg i [i]) (reg x [x]))
(set (reg P1) (reg i))
(call (mem f))
(set (reg i [i]) (reg y [y]))
(call (mem g))
(set (reg P1) (reg i))
(call (mem f))

if it gets optimized to:

(set (reg P1 [x, i]) ???1)
(set (reg y [y, i]) ???2)
(call (mem f))
(call (mem g))
(set (reg P1) (reg y))
(call (mem f))

then we lose.  There's no way you can emit debug information for i
based on these annotations such that, at the call to g, the value of i
is correct.  Even if you annotate the copy from y to P1, you still
won't have it right, and, worse, you won't even be able to tell that,
before the call to g, i should have held a different value.  So you'll
necessarily emit incorrect debug information for this case: you'll
state i still holds a value at a point in which it shouldn't hold that
value any more.  This is worse than stating you don't know what the
value of i is.
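
To make the contrast concrete, here is a minimal sketch in C of how
stand-alone annotations would behave on the optimized stream above.
It is a toy model, not GCC code: the names insn, OP_DEBUG_BIND and
location_at_call are made up for illustration, and the position of
each bind is my assumption of where the DEBUG notes would end up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy instruction kinds; this models the idea only, not GCC's RTL. */
enum op { OP_SET, OP_CALL, OP_DEBUG_BIND };

struct insn {
    enum op op;
    const char *a; /* SET: destination; CALL: callee; DEBUG_BIND: user variable */
    const char *b; /* SET: source; CALL: unused; DEBUG_BIND: location of the value */
};

/* The optimized stream, with stand-alone binds kept at the points where
   the source program assigned x, y and i (hypothetical placement). */
const struct insn optimized[] = {
    { OP_SET,        "P1", "???1" },
    { OP_DEBUG_BIND, "x",  "P1"   },
    { OP_SET,        "y",  "???2" },
    { OP_DEBUG_BIND, "y",  "y"    },
    { OP_DEBUG_BIND, "i",  "P1"   },  /* was: i = x */
    { OP_CALL,       "f",  NULL   },
    { OP_DEBUG_BIND, "i",  "y"    },  /* was: i = y */
    { OP_CALL,       "g",  NULL   },
    { OP_SET,        "P1", "y"    },
    { OP_CALL,       "f",  NULL   },
};
const size_t optimized_len = sizeof optimized / sizeof optimized[0];

/* Location of VAR just before the NTH call (0-based).  Returns NULL when
   VAR is unbound, its location was overwritten, or there is no such call. */
const char *location_at_call(const struct insn *code, size_t len,
                             const char *var, int nth)
{
    const char *loc = NULL;
    int calls = 0;

    assert(code != NULL && var != NULL);
    for (size_t k = 0; k < len; k++) {
        switch (code[k].op) {
        case OP_DEBUG_BIND:
            if (strcmp(code[k].a, var) == 0)
                loc = code[k].b;
            break;
        case OP_SET:
            /* Overwriting the bound location invalidates the binding. */
            if (loc != NULL && strcmp(loc, code[k].a) == 0)
                loc = NULL;
            break;
        case OP_CALL:
            if (calls++ == nth)
                return loc;
            break;
        }
    }
    return NULL;
}
```

With the binds kept in place, i is found in P1 at the first call to f
and in y at the call to g, and x becomes unavailable (rather than
wrong) once P1 is overwritten.  The sketch ignores call clobbers and
everything else a real implementation would have to track.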

> What I'm suggesting is that this is something akin to a dataflow
> problem.  We start by marking user variables, in the original TREE
> representation.  Then, any time we copy the value of a user variable, we
> know that what we're doing is providing another place where we can find
> the value of that user variable.  Then, when generating debug
> information, for every program region, we can find the location(s) where
> the value of the user variable is available, and we can output any one
> of those locations for the debugger.

That's exactly what I have in mind.

> This method gives us accurate debug information, in the sense that if we
> say that the value of V is at location X, then it is in fact there, and
> the value there is a value assigned to V.  It does not necessarily give
> us complete information, though, in that there may be times when the
> value is somewhere and we don't realize it.  Like, if:

>   x = y + 3;
>   f(x);

> is optimized to:

>   f(y + 3)

> Then, right before the call to "f", we might not know that the value of
> "x" is available, or we might say that "x" has a previous value.

It's not just a previous value.  It can be an arbitrarily wrong value too.
Consider again the conditional case:

foo (int x, int y, int z)
{
  int c = z;
  whatever0(c);
  c = x;
  whatever1();
  if (some_condition)
    {
      whatever2();
      c = y;
      whatever3();
    }
  whatever4(c);
}

In the tree representation, the assignments to c just go away, in
favor of a PHI node that takes x from the !some_condition block and y
from the some_condition block.

So, you could recover the correct value for c at the PHI node, but
since the other assignments are all dropped, you can at best figure
out that you don't know the value held by c between whatever1() and
the PHI node, and at worst claim that it's z or x or y, or even both x
and y, depending on how you update the notes.

> method I've proposed will say that the value is unavailable [when
> it's a constant and the assignment is optimized away]

I don't see how, unless you keep a note saying at least that the
variable was modified to an unknown value at that point.

> I don't see that as an unreasonable limitation when debugging
> optimized code, but that's open for debate.

If it did that reliably, then it would be a reasonable limitation,
indeed, for it would be accurate, even if incomplete.  It would no
longer be a correctness issue, just a quality of implementation issue.
But then, I'm yet to understand how you'd generate debug info to note
that the value is unavailable if you don't keep notes around to
indicate the point of the assignment that was optimized away.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 22:43                       ` Ian Lance Taylor
@ 2007-11-24  1:44                         ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-24  1:44 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Mark Mitchell, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:

>> And then, optimizations move instructions around, but I don't think
>> they should move the assignment notes around, for they should
>> reflect the structure of the source program, rather than the
>> mangled representation that the optimizers turn it into.

> I'm not sure I follow this.  If the equivalent of some source code
> line is hoisted out of a loop, shouldn't the user variable assignments
> follow it?

Why should it?  The user is entitled to expect the variable to be set
to that value at the right point in the program, no earlier than that.
Before the assignment point in the program, we ought to note that the
variable holds its previous value, or that its previous value is no
longer available.  But noting it holds a value it should only hold at
a later point doesn't seem right to me.

Consider, again, the example:

f(int x, int y) {
  int c;

  c = x;
  do_something_with_c();

  c = y;
  do_something_with_c();
}

If we optimize away the assignments c=x and c=y, and just use x and y
instead (assume c is not otherwise modified), what should we note in
debug info?  Should we pretend that c is dead all over, just because
it was optimized away?  Should we note that it's live in both x and y
registers/stack slots?  Or should we vary its location between x and
y, at the assignment points, as expected by the user?
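
As a rough sketch of the third option, the debug info for c could be
pictured as a list of address ranges, each naming where the value
lives.  The C below is a toy model: the struct, the function names and
the addresses 0x10, 0x20 and 0x30 are all hypothetical, chosen only to
show ranges that switch at the assignment points and never overlap.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a toy location list: the variable lives in WHERE
   for every pc with lo <= pc < hi. */
struct loc_range {
    unsigned lo, hi;
    const char *where;
};

/* Hypothetical addresses: c = x takes effect at 0x10, c = y at 0x20,
   and the function ends at 0x30. */
const struct loc_range c_locations[] = {
    { 0x10, 0x20, "location of x" },
    { 0x20, 0x30, "location of y" },
};
const size_t c_locations_len = sizeof c_locations / sizeof c_locations[0];

/* Where the variable lives at PC, or NULL if no range covers PC. */
const char *loc_lookup(const struct loc_range *list, size_t len, unsigned pc)
{
    assert(list != NULL);
    for (size_t k = 0; k < len; k++)
        if (pc >= list[k].lo && pc < list[k].hi)
            return list[k].where;
    return NULL;
}

/* Returns 1 when no two ranges overlap, 0 otherwise. */
int ranges_disjoint(const struct loc_range *list, size_t len)
{
    assert(list != NULL);
    for (size_t a = 0; a < len; a++)
        for (size_t b = a + 1; b < len; b++)
            if (list[a].lo < list[b].hi && list[b].lo < list[a].hi)
                return 0;
    return 1;
}
```

Before 0x10 the lookup finds nothing, which matches c being
uninitialized there; claiming that c lives in both x and y over the
whole function would instead fail the disjointness check.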

Now, what if f() is inlined into a loop, such that c could be
versioned and the assignments to it could be hoisted, because x and y
don't vary?  Should this then change the debug information generated
for variable c from the IMHO correct points to the loop entry points?

> After the scheduler has run over a large basic block, the
> structure of the source program is gone.

The mapping becomes more difficult, yes.  But the structure of the
source program remains untouched, in the source program.  And debug
information is about mapping source concepts to implementation
concepts.  So we should try to map source concepts that remain in the
implementation to the remaining implementation concepts.

> Side note: I think it would be unwise to discuss specific patents on
> this public mailing list.  I think that where we have specific patent
> concerns, the steering committee should raise them on a telephone call
> with the FSF and/or the SFLC.  If you have concerns about a specific
> patent, I recommend that you telephone some member of the SC, or send
> e-mail directly to that person.

That makes sense.  I hadn't actually seen that patent before the day I
mentioned it, and I still haven't got 'round to reading it.  I just
thought it would be wise to inform people about the danger of going
down that path, but now I realize it may not have been wise at all.
Sorry for not thinking about it.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-23 23:40                     ` Frank Ch. Eigler
@ 2007-11-23 23:56                       ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-23 23:56 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Richard Guenther, Mark Mitchell, David Edelsohn,
	Ian Lance Taylor, gcc-patches, gcc

On Nov 23, 2007, "Frank Ch. Eigler" <fche@redhat.com> wrote:

>> > It may be asking to belabour the obvious.  GCC users do not want to
>> > have to compile with "-O0 -g" just to debug during development [...]
>> > Developers will accept that optimized code will by its nature make
>> > some of these fuzzy, but incorrect data must be and incomplete data
                                                    ^avoided?
>> > should be minimized. [...]

Richard Guenther replied:

>> As it is (without serious overhead) impossible to do both,

Is it?  

>> you either have to live with possibly incorrect but elaborate or
>> incomplete but correct debug information for optimized code.

You have proof of that?

>> Choose one ;)

As in, command line options?  Or are we going to make a choice and
impose that on all our users, as if it fit all?

Frank followed up:

>> What we (Matz and myself) are trying to do is provide elaborate
>> debug information with the chance of wrong (I'd call it superfluous,
>> or extra) debug information.

It's not just superfluous or extra.  Your approach actively regresses
debug information for some cases, while it's arguable whether it
actually improves others.

> That ("world-domination") seems an overly unkind characterization

+1

It would be like myself pointing out that, for every problem, there's
a solution that's simple, elegant and wrong ;-)

Given the problems with sequential live ranges being made parallel and
conflicting, values subject to conditions being made unconditional,
and overwritten values remaining noted as live, I wouldn't think the
characterization above would be unfair, but I'd managed to resist it
so far.

I don't think pulling the blanket such that it covers your face while
it uncovers your feet is the way to go.  It's even worse, because
then, with your face covered, you won't even see that your feet are
uncovered ;-)

Regressions are bad, and this proposed approach guarantees
regressions, while it might fix a few trivial cases.  This is not
enough for me.  I'm not just hacking up a quick fix for a
poorly-worded problem.  I'm doing actual software engineering here,
trying to get GCC to comply with existing debug info standards.

> It does not seem to me like there is
> substantial disagreement over the ideal of correct

Unfortunately, that is indeed up for debate.  There are even those who
dispute that there's any correctness issue involved.  Most other
approaches are actually overreaching in completeness, trading
correctness for more information, as if more unreliable information
was any better than no information at all.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-13 15:46                       ` Michael Matz
@ 2007-11-23 23:56                         ` Alexandre Oliva
  2007-11-26 18:19                           ` Michael Matz
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-23 23:56 UTC (permalink / raw)
  To: Michael Matz
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 13, 2007, Michael Matz <matz@suse.de> wrote:

> The nice thing is, that there are only few places which really get rid of 
> SETs: remove_insn.  You have to tweak that to keep the information around, 
> not much else (though that claim remains to be proven :) ).

And then, you have to tweak everything else to keep the note that
replaced the set up to date as you further optimize the code.  So what
was the point of adding the note to the SET, again?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-13  7:52 Steven Bosscher
@ 2007-11-23 23:40 ` Alexandre Oliva
  2007-11-24 10:27   ` Steven Bosscher
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-23 23:40 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, "Steven Bosscher" <stevenb.gcc@gmail.com> wrote:

> DEBUG_INSN in RTL (with one noteworthy difference, namely that having
> note-like GIMPLE statements is a totally new concept

Not quite.  There were codeless gimple constructs before (think
labels, for one).  Or empty asm statements.  But then, I'm not sure
what you mean by note-like; maybe it's something else.  As I explained
before, debug insns and debug stmts are more like code than like
notes, because notes generally don't need adjusting as code is
modified elsewhere, whereas code does.  And debug insns and stmts
definitely need adjusting like regular insns.

> while DEBUG_INSN is just a wannabe-real-insn INSN_NOTE).

Except for this tiny detail that INSN_NOTEs are never adjusted as code
is modified, because in general they don't even contain RTL.
VAR_LOCATION is a recent exception, and it used to be introduced so
late precisely because there's no infrastructure to keep notes
up-to-date as code transformations are performed.

So, yes, debug stmts and insns are notes in the sense that they don't
output code.  Like USE insns, labels, empty asm insns and other
UNSPECs.  But wait, those are insns, not notes.  And they do generate
code, just not in the .text section, but rather in .debug sections.

So, what's this prejudice against debug insns?  Why do you regard them
as notes rather than insns?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-23  2:30                   ` Richard Guenther
@ 2007-11-23 23:40                     ` Frank Ch. Eigler
  2007-11-23 23:56                       ` Alexandre Oliva
  2007-11-24 13:52                     ` Robert Dewar
  1 sibling, 1 reply; 189+ messages in thread
From: Frank Ch. Eigler @ 2007-11-23 23:40 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mark Mitchell, David Edelsohn, Ian Lance Taylor, Alexandre Oliva,
	gcc-patches, gcc

Hi -

(BTW, sorry for reopening this old thread if people are sick & tired of it.)

> > Mark Mitchell <mark@codesourcery.com> writes:
> > > [...]
> > > That's what I'm asking.  First and foremost, I want to know what,
> > > concretely, Alexandre is trying to achieve, beyond "better debugging
> > > info for optimized code".  [...]
> >
> > It may be asking to belabour the obvious.  GCC users do not want to
> > have to compile with "-O0 -g" just to debug during development [...]
> > Developers will accept that optimized code will by its nature make
> > some of these fuzzy, but incorrect data must be and incomplete data
> > should be minimized. [...]
> 
> As it is (without serious overhead) impossible to do both, you either have
> to live with possibly incorrect but elaborate or incomplete but correct
> debug information for optimized code.  Choose one ;)

I did say "minimized", not "eliminated".  It needs to be good enough
that a semi-knowledgeable person or a dumb but heuristic-laden program
that processes debugging info can nevertheless extract reliable
information.

> What we (Matz and myself) are trying to do is provide elaborate
> debug information with the chance of wrong (I'd call it superfluous,
> or extra) debug information.

(I will need to reread the thread to see what this extra information
can do in terms of misleading users or tools, such as giving incorrect
variable values/locations.  I'd appreciate a link if you have one
handy.)

> Alexandre seems to aim at the world-domination solution (with the
> serious overhead in terms of implementation and verboseness).

That ("world-domination") seems an overly unkind characterization - we
could simply say he's trying an exhaustive, straining-to-be-correct
solution.

It seems to me that we will shortly see the actual impacts of both of
these approaches in terms of compiler complexity as well as any
improvements in data quality.  It does not seem to me like there is
substantial disagreement over the ideal of correct and to a lesser
extent complete information, so let's see the implementations and then
compare.

- FChE

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-23  2:20                 ` Frank Ch. Eigler
@ 2007-11-23  2:30                   ` Richard Guenther
  2007-11-23 23:40                     ` Frank Ch. Eigler
  2007-11-24 13:52                     ` Robert Dewar
  0 siblings, 2 replies; 189+ messages in thread
From: Richard Guenther @ 2007-11-23  2:30 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Mark Mitchell, David Edelsohn, Ian Lance Taylor, Alexandre Oliva,
	gcc-patches, gcc

On Nov 22, 2007 8:22 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
>
> Mark Mitchell <mark@codesourcery.com> writes:
>
> > [...]
> >>      Who is "we"?  What better debugging are GCC users demanding?  What
> >> debugging difficulties are they experiencing?  Who is that set of users?
> >> What functional changes would improve those cases?  What is the cost of
> >> those improvements in complexity, maintainability, compile time, object
> >> file size, GDB start-up time, etc.?
> >
> > That's what I'm asking.  First and foremost, I want to know what,
> > concretely, Alexandre is trying to achieve, beyond "better debugging
> > info for optimized code".  Until we understand that, I don't see how we
> > can sensibly debate any methods of implementation, possible costs, etc.
>
> It may be asking to belabour the obvious.  GCC users do not want to
> have to compile with "-O0 -g" just to debug during development (or
> during crash analysis *after deployment*!).  Developers would like to
> be able to place breakpoints anywhere by reference to the source code,
> and would like to access any variables logically present there.
> Developers will accept that optimized code will by its nature make
> some of these fuzzy, but incorrect data must be and incomplete data
> should be minimized.
>
> That they put up with the status quo at all is a historical artifact
> of being told so long not to expect any better.

As it is (without serious overhead) impossible to do both, you either have
to live with possibly incorrect but elaborate or incomplete but correct
debug information for optimized code.  Choose one ;)

What we (Matz and myself) are trying to do is provide elaborate debug
information with the chance of wrong (I'd call it superfluous, or extra)
debug information.  Alexandre seems to aim at the world-domination
solution (with the serious overhead in terms of implementation and
verboseness).

Richard.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  5:01               ` Mark Mitchell
  2007-11-08  5:15                 ` Alexandre Oliva
@ 2007-11-23  2:20                 ` Frank Ch. Eigler
  2007-11-23  2:30                   ` Richard Guenther
  1 sibling, 1 reply; 189+ messages in thread
From: Frank Ch. Eigler @ 2007-11-23  2:20 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: David Edelsohn, Ian Lance Taylor, Alexandre Oliva,
	Richard Guenther, gcc-patches, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> [...]
>> 	Who is "we"?  What better debugging are GCC users demanding?  What
>> debugging difficulties are they experiencing?  Who is that set of users?
>> What functional changes would improve those cases?  What is the cost of
>> those improvements in complexity, maintainability, compile time, object
>> file size, GDB start-up time, etc.?
>
> That's what I'm asking.  First and foremost, I want to know what,
> concretely, Alexandre is trying to achieve, beyond "better debugging
> info for optimized code".  Until we understand that, I don't see how we
> can sensibly debate any methods of implementation, possible costs, etc.

It may be asking to belabour the obvious.  GCC users do not want to
have to compile with "-O0 -g" just to debug during development (or
during crash analysis *after deployment*!).  Developers would like to
be able to place breakpoints anywhere by reference to the source code,
and would like to access any variables logically present there.
Developers will accept that optimized code will by its nature make
some of these fuzzy, but incorrect data must be and incomplete data
should be minimized.

That they put up with the status quo at all is a historical artifact
of being told so long not to expect any better.

- FChE

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 18:22                     ` Alexandre Oliva
                                         ` (2 preceding siblings ...)
  2007-11-13 10:50                       ` Mark Mitchell
@ 2007-11-13 15:46                       ` Michael Matz
  2007-11-23 23:56                         ` Alexandre Oliva
  3 siblings, 1 reply; 189+ messages in thread
From: Michael Matz @ 2007-11-13 15:46 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Hi,

On Mon, 12 Nov 2007, Alexandre Oliva wrote:

> > Why does it make sense to have that, rather than notes on instructions 
> > that say what effect the instruction has on user variables?
> 
> Few instructions need such notes, so the proposal of growing SET by 33% 
> doesn't quite appeal to me.

Though I haven't produced hard numbers yet, that every SET now contains 
an additional pointer is less of an issue than one might think.  There 
only ever exists one RTL body at each point in time, hence the memory use 
for RTL is vastly dominated by the memory use of GIMPLE, which exists for 
all functions at the same time.

Having this annotation in the SET is just the esthetically most pleasing 
place.  If you do it with notes on insns you have issues with multi-set 
insns, and you have to move them around in case you change the insns.  
Putting them in the SET itself keeps them up-to-date nearly automatically 
(of course you still have to touch them once in a while).

> That said, growing SET to add to it a list of variables (or components
> thereof) that the variable is assigned to could be made to work, to
> some extent.  But when you optimize away such a set, you'd still have
> to keep the note around, so it's not clear to me that adding code all
> over to maintain the notes in place when the SETs go away or are
> juggled around would bring us any advantage.

The nice thing is, that there are only few places which really get rid of 
SETs: remove_insn.  You have to tweak that to keep the information around, 
not much else (though that claim remains to be proven :) ).

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 18:17                   ` Alexandre Oliva
@ 2007-11-13 14:22                     ` Michael Matz
  2007-11-24  4:58                       ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Michael Matz @ 2007-11-13 14:22 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

Hi,

On Mon, 12 Nov 2007, Alexandre Oliva wrote:

> With the design I've proposed, it is possible to compute the value of i, 

No.  Only if the function is reversible.  There are many which aren't:

static inline int foo(int i)
{
  return i % 10;
}
int foobar(int j)
{
  return foo(j % 20);
}
int main(int argc, char **argv)
{
  return foobar(argc);
}

If foo is inlined and foobar simplified (to return j%10), the value for 
'i' (j % 20) can not be recovered anymore.  Hence for a 100% solution (and 
for systemtap you want that) you have no choice but to force the value to 
be live, e.g. by a volatile asm or the like.
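
A minimal sketch of the "force it live" workaround mentioned above,
assuming GCC's extended asm syntax.  The macro name KEEP_LIVE and the
check() helper are hypothetical, not anything from GCC or systemtap.
An empty volatile asm that takes i as a register input obliges the
compiler to compute j % 20 at that point even after inlining; whether
the emitted debug info then describes that register is a separate
question.

```c
#include <assert.h>

/* Hypothetical helper: an empty asm with `v` as a register input.
   The compiler must materialize the value here, so it cannot fold it
   away, at the cost of blocking the simplification to j % 10.  */
#define KEEP_LIVE(v) __asm__ volatile ("" : : "r" (v))

static inline int foo(int i)
{
  KEEP_LIVE(i);        /* i == j % 20 has to exist in a register here */
  return i % 10;
}

int foobar(int j)
{
  return foo(j % 20);
}

/* Sanity check: forcing i live must not change the result, since
   (j % 20) % 10 == j % 10 for every int j.  */
void check(int j)
{
  assert(foobar(j) == j % 10);
}
```

The price is exactly the one discussed in this thread: the optimizer
can no longer reduce foobar to a single j % 10.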

> As I wrote before, I'm not aware of any systemtap bug report about a
> situation in which an argument was actually optimized away.

I think it all started from PR23551.  For us it also happened in the 
kernel in namei.c, where real_lookup is inlined sometimes, and its 
arguments are missing.  That might or might not be reversible functions, 
so your scheme perhaps would have helped there.  But generally it won't 
solve the problem for good.

> I wouldn't go as far as stopping the optimization just so that systemtap 
> can monitor the code.

Like I said, at some point you either have to, or accept that some code 
remains non-introspectable.

> > at which point you have to force the value of 'i' being live, if you 
> > want to be sure that systemtap works in all cases.
> 
> I don't want to be sure of that.  At least that was not the problem I 
> was asked to solve.

Then I'm probably still confused about what problem you're actually trying to 
solve.  If you don't want to be sure you get precise location information 
100% of the time, then what percentage are you required to get?  And how 
do you measure this?  Or is the task rather "emit better debug info"?  But 
that can be done also in our scheme, so why is there a need for DEBUG_INSN 
if it can't solve the systemtap problem for good?

> And, indeed, it's not solvable without disabling optimizations.

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 18:22                     ` Alexandre Oliva
  2007-11-12 20:08                       ` Joe Buck
  2007-11-12 22:43                       ` Ian Lance Taylor
@ 2007-11-13 10:50                       ` Mark Mitchell
  2007-11-24  4:05                         ` Alexandre Oliva
  2007-11-13 15:46                       ` Michael Matz
  3 siblings, 1 reply; 189+ messages in thread
From: Mark Mitchell @ 2007-11-13 10:50 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:

>> What I don't understand is how it's actually going to work.  What
>> are the notes you're inserting?
> 
> They're always of the form
> 
>   DEBUG user-variable = expression

Good, I understand that now.

Why is this better than associating user variables with assignments?  In
other words, if we have:

  X = E;

where X is the location in which a user variable V is presently being
stored, we could just put a note on the assignment that says "assigns to
user variable V".  If X is, for example, a hard register, and we're now
clobbering the value of a user variable V (so that the value of the
variable is no longer available there), we can add a note that says
"clobbers user variable V".  (The value might still be available
somewhere else; we can figure that out by seeing if any instruction that
is annotated as setting V dominates this instruction, without an
intervening clobbering of that location.)

> That said, growing SET to add to it a list of variables (or components
> thereof) that the variable is assigned to could be made to work, to
> some extent.  But when you optimize away such a set, you'd still have
> to keep the note around

Why?  It seems to me that if we're no longer doing the assignment, then
the location where the value of the user variable can be found (if any)
is not changing at this point.

> (set (reg i) (const_int 3)) ;; assigns to i
> (set (reg P1) (reg i))
> (call (mem f))
> (set (reg i) (const_int 7)) ;; assigns to i
> (set (reg i) (const_int 2)) ;; assigns to i
> (set (reg P1) (reg i))
> (call (mem g))
> 
> could have been optimized to:
> 
> (set (reg P1) (const_int 3))
> (call (mem f))
> (set (reg P1) (const_int 2))
> (call (mem g))
> 
> and then you wouldn't have any debug information left for variable i.

Actually, you would, in the method I'm making up.  In particular, both
of the first two lines in the top example (setting "i" and setting "P1")
would be marked as providing the value of the user variable "i".  The
first line obviously has the value of "i", so we would have a "value of
i" note.  The second would also have a "value of i" note because it's
copying a value with such a note.

What I'm suggesting is that this is something akin to a dataflow
problem.  We start by marking user variables, in the original TREE
representation.  Then, any time we copy the value of a user variable, we
know that what we're doing is providing another place where we can find
the value of that user variable.  Then, when generating debug
information, for every program region, we can find the location(s) where
the value of the user variable is available, and we can output any one
of those locations for the debugger.  Now, of course, we can generate
more compact information by trying to use the same location as often as
possible, but that's just an optimization problem.
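
A toy sketch of that dataflow view, for straight-line code only.
Everything here (the insn struct, transfer, find_location, and the
one-variable-per-location simplification) is made up for illustration
and is not GCC's representation; a real pass would also have to merge
states at control-flow joins.

```c
#include <assert.h>

enum { NLOC = 4, NONE = -1 };   /* toy machine with locations 0..3 */

struct insn {
  int dst;       /* location written */
  int src;       /* location copied from, or NONE if not a plain copy */
  int sets_var;  /* "assigns to user variable V" note, or NONE */
};

/* holds[l] is the user variable whose current value lives in l.  */
void init(int holds[NLOC])
{
  for (int l = 0; l < NLOC; l++)
    holds[l] = NONE;
}

void transfer(int holds[NLOC], struct insn in)
{
  assert(in.dst >= 0 && in.dst < NLOC);
  assert(in.src == NONE || (in.src >= 0 && in.src < NLOC));

  /* Read the source's note first, so dst == src is harmless.  */
  int copied = (in.src != NONE) ? holds[in.src] : NONE;

  if (in.sets_var != NONE) {
    /* V gets a new value: older copies no longer hold the current one.  */
    for (int l = 0; l < NLOC; l++)
      if (holds[l] == in.sets_var)
        holds[l] = NONE;
    holds[in.dst] = in.sets_var;
  } else {
    /* A copy carries the note along; anything else clobbers it.  */
    holds[in.dst] = copied;
  }
}

/* Any location holding var is a valid answer; return the first.  */
int find_location(const int holds[NLOC], int var)
{
  for (int l = 0; l < NLOC; l++)
    if (holds[l] == var)
      return l;
  return NONE;
}
```

Feeding it "set reg i, assigns to i" followed by "copy reg i to P1"
leaves both locations marked, so clobbering reg i afterwards still
leaves P1 as a place to find the value.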

This method gives us accurate debug information, in the sense that if we
say that the value of V is at location X, then it is in fact there, and
the value there is a value assigned to V.  It does not necessarily give
us complete information, though, in that there may be times when the
value is somewhere and we don't realize it.  Like, if:

  x = y + 3;
  f(x);

is optimized to:

  f(y + 3)

Then, right before the call to "f", we might not know that the value of
"x" is available, or we might say that "x" has a previous value.

As a special case of incompleteness, this fails utterly with respect to
variables whose values are constants if those variables are then
optimized away.  If there's no location holding the constant, then the
method I've proposed will say that the value is unavailable -- rather
than cleverly telling the debugger that the value is a constant.  I
don't see that as an unreasonable limitation when debugging optimized
code, but that's open for debate.

I'm not claiming this is better than what you're suggesting.  I'm just
throwing it out there.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
@ 2007-11-13  7:52 Steven Bosscher
  2007-11-23 23:40 ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Steven Bosscher @ 2007-11-13  7:52 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Alexandre Oliva, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

cf. http://gcc.gnu.org/ml/gcc/2007-11/msg00293.html
Mark Mitchell wrote:
> The reason I want to make that assumption is that the part of this where
> the representation is in question is once we reach RTL, right?

The representation in GIMPLE should also be discussed IMVHO. For
GIMPLE Alex has invented DEBUG_STMT, which has the same properties as
DEBUG_INSN in RTL (with one noteworthy difference, namely that having
note-like GIMPLE statements is a totally new concept while DEBUG_INSN
is just a wannabe-real-insn INSN_NOTE).

Gr.
Steven

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 18:22                     ` Alexandre Oliva
  2007-11-12 20:08                       ` Joe Buck
@ 2007-11-12 22:43                       ` Ian Lance Taylor
  2007-11-24  1:44                         ` Alexandre Oliva
  2007-11-13 10:50                       ` Mark Mitchell
  2007-11-13 15:46                       ` Michael Matz
  3 siblings, 1 reply; 189+ messages in thread
From: Ian Lance Taylor @ 2007-11-12 22:43 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Mark Mitchell, Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> > Why does it make sense to have that, rather than notes on
> > instructions that say what effect the instruction has on user
> > variables?
> 
> Few instructions need such notes, so the proposal of growing SET by
> 33% doesn't quite appeal to me.

We could add a note to the relevant instructions.  We don't need to
change the SET representation.  That approach would only increase
memory usage for relevant instructions.

> And then, optimizations move
> instructions around, but I don't think they should move the assignment
> notes around, for they should reflect the structure of the source
> program, rather than the mangled representation that the optimizers
> turn it into.

I'm not sure I follow this.  If the equivalent of some source code
line is hoisted out of a loop, shouldn't the user variable assignments
follow it?  After the scheduler has run over a large basic block, the
structure of the source program is gone.  Are we going to somehow try
to retain it in the debugging information?  Does that make sense?

Side note: I think it would be unwise to discuss specific patents on
this public mailing list.  I think that where we have specific patent
concerns, the steering committee should raise them on a telephone call
with the FSF and/or the SFLC.  If you have concerns about a specific
patent, I recommend that you telephone some member of the SC, or send
e-mail directly to that person.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 18:22                     ` Alexandre Oliva
@ 2007-11-12 20:08                       ` Joe Buck
  2007-11-24 22:12                         ` Alexandre Oliva
  2007-11-12 22:43                       ` Ian Lance Taylor
                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 189+ messages in thread
From: Joe Buck @ 2007-11-12 20:08 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Mon, Nov 12, 2007 at 03:52:01PM -0200, Alexandre Oliva wrote:
> On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
> 
> > (We may already have lost some information, though.  For example, given:
> 
> >   i = 3;
> >   f(i);
> >   i = 7;
> >   i = 2;
> >   g(i);
> 
> > we may well have lost the "i = 7" assignment, so "i" might appear to
> > have the value "3" right before we assign "2" to it, if we were to
> > generate debug information right then.)
> 
> Yup.  And even if we could somehow preserve that information, there
> wouldn't be any code to attach that information to.  There might be
> uses for empty-range locations in debug information, but I can't think
> of any.  Can anyone?  It's something we could try to preserve, and
> with my design it would be quite easy to do so, but unless it's useful
> for some purpose, I think we could just do away with it.

If we drop the "i = 7" assignment, then a debugger could have a consistent
view of what is going on if, given

   i = 3;  // line 10
   f(i);   // line 11
   i = 7;  // line 12
   i = 2;  // line 13
   g(i);   // line 14

"next" would step from line 10, to 11, to 12, to 14.  We would not be able
to stop after the execution of a no-longer-existing statement; if we could
stop at the beginning of line 13, it would imply that line 12 has run and
line 13 has not, which does not reflect what the optimized code is doing.

We don't do it this way at the moment; we would be able to set a
breakpoint at line 13.  But perhaps the right way to think about your
project, Alexandre, is to make things match up at the point where the gdb
user can observe the state, and consider dropping observable points where
the states will not match.
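
A rough sketch of that rule, under the assumption that we only track
one variable i holding small non-negative constants.  The struct and
the function observable_stops are hypothetical; nothing like this
exists in gdb or GCC as written.  A line is kept as a stopping point
only if the value the optimized code holds for i equals the value the
source semantics give at the start of that line.

```c
#include <assert.h>
#include <stdbool.h>

enum { UNSET = -1 };  /* "no value assigned yet" */

struct line {
  int lineno;
  int assigns;   /* value the source line stores to i, or UNSET */
  bool emitted;  /* false if the optimizer deleted the store */
};

/* Fills stops[] with the line numbers where source and optimized
   views of i agree; returns how many there are.  */
int observable_stops(const struct line *prog, int n, int *stops)
{
  assert(prog && stops && n >= 0);
  int src_i = UNSET, opt_i = UNSET, count = 0;

  for (int k = 0; k < n; k++) {
    /* State is compared *before* the line runs.  */
    if (src_i == opt_i)
      stops[count++] = prog[k].lineno;

    if (prog[k].assigns != UNSET) {
      src_i = prog[k].assigns;
      if (prog[k].emitted)
        opt_i = prog[k].assigns;
    }
  }
  return count;
}
```

With lines 10 to 14 from the example and the store on line 12 marked
as not emitted, the function keeps 10, 11, 12 and 14 and drops 13.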

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 10:55                   ` Mark Mitchell
@ 2007-11-12 18:22                     ` Alexandre Oliva
  2007-11-12 20:08                       ` Joe Buck
                                         ` (3 more replies)
  0 siblings, 4 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-12 18:22 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> (We may already have lost some information, though.  For example, given:

>   i = 3;
>   f(i);
>   i = 7;
>   i = 2;
>   g(i);

> we may well have lost the "i = 7" assignment, so "i" might appear to
> have the value "3" right before we assign "2" to it, if we were to
> generate debug information right then.)

Yup.  And even if we could somehow preserve that information, there
wouldn't be any code to attach that information to.  There might be
uses for empty-range locations in debug information, but I can't think
of any.  Can anyone?  It's something we could try to preserve, and
with my design it would be quite easy to do so, but unless it's useful
for some purpose, I think we could just do away with it.

> The reason I want to make that assumption is that the part of this where
> the representation is in question is once we reach RTL, right?

I'm not sure what is in question at all.  I've proposed a design to
preserve debug information throughout compilation.  Other designs on
the table differ both in tree and rtl levels, and in the potential
quality and correctness of the debug information they can produce.

> I guess I still don't really understand what you're doing at the RTL
> level.

It's no different, except that instead of a DEBUG_STMT it's a
DEBUG_INSN, with the TREE expression converted to an RTL expression.

/me mumbles something about the silliness of keeping two completely
different yet nearly-isomorphic internal representations for
statements/instructions.

> What I don't understand is how it's actually going to work.  What
> are the notes you're inserting?

They're always of the form

  DEBUG user-variable = expression

where DEBUG stands for a DEBUG_STMT or a DEBUG_INSN, user-variable is
a tree that represents the user variable, and expression is a TREE or
RTL (depending on which representation we're in) that evaluates to the
value the user-variable is expected to hold at that point in the
program.

> Do they just say "here is an RTL expression for computing the value of
> user-variable V at this point in the program"?

In RTL, yes.

> Why does it make sense to have that, rather than notes on
> instructions that say what effect the instruction has on user
> variables?

Few instructions need such notes, so the proposal of growing SET by
33% doesn't quite appeal to me.  And then, optimizations move
instructions around, but I don't think they should move the assignment
notes around, for they should reflect the structure of the source
program, rather than the mangled representation that the optimizers
turn it into.

That said, growing SET to add to it a list of variables (or components
thereof) that the variable is assigned to could be made to work, to
some extent.  But when you optimize away such a set, you'd still have
to keep the note around, so it's not clear to me that adding code all
over to maintain the notes in place when the SETs go away or are
juggled around would bring us any advantage.  It would be just a
redundant notation for what the note would already convey, so it just
brings complexity for no actual advantage.

To make it concrete, consider that your example above could have become:

(set (reg i) (const_int 3)) ;; assigns to i
(set (reg P1) (reg i))
(call (mem f))
(set (reg i) (const_int 7)) ;; assigns to i
(set (reg i) (const_int 2)) ;; assigns to i
(set (reg P1) (reg i))
(call (mem g))

which could have been optimized to:

(set (reg P1) (const_int 3))
(call (mem f))
(set (reg P1) (const_int 2))
(call (mem g))

and then you wouldn't have any debug information left for variable i.

whereas with the notes I propose, you'd be left with:

(debug i (const_int 3))
(set (reg P1) (const_int 3))
(call (mem f))
(debug i (const_int 7)) ;; may be dropped, as discussed above
(debug i (const_int 2))
(set (reg P1) (const_int 2))
(call (mem g))

even if no register at all ends up allocated for i.  And if there were
uses of i that followed the assignment to 7, to which the constant
could be propagated, you'd still be left with the annotation to
indicate that i has a new value at the correct point.
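
To illustrate why the stand-alone notes survive, here is a toy model
(not the actual DEBUG_INSN data structure; the names insn, DEBUG_BIND
and values_at_calls are invented for this sketch, and only constant
bindings are handled).  The value of i at each call is recovered from
the debug bindings alone, so it does not matter that no SET of (reg i)
is left in the stream.

```c
#include <assert.h>
#include <stddef.h>

enum kind { SET, CALL, DEBUG_BIND };

struct insn {
  enum kind kind;
  int var;    /* DEBUG_BIND: user variable id; otherwise unused */
  int value;  /* DEBUG_BIND: constant bound to var; SET: constant stored */
};

/* For each CALL, record what `var` is bound to at that point, or
   `unknown` if no binding has been seen.  Returns the number of
   calls; `out` must have room for that many entries.  */
int values_at_calls(const struct insn *s, size_t n,
                    int var, int unknown, int *out)
{
  assert(s != NULL && out != NULL);
  int current = unknown;
  int calls = 0;

  for (size_t k = 0; k < n; k++) {
    if (s[k].kind == DEBUG_BIND && s[k].var == var)
      current = s[k].value;
    else if (s[k].kind == CALL)
      out[calls++] = current;
    /* SETs are ignored on purpose: they may be rewritten or deleted
       without affecting the bindings.  */
  }
  return calls;
}
```

Run over the optimized stream above, this reports 3 at the call to f
and 2 at the call to g, whether or not the (debug i (const_int 7))
note is dropped.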

> As a meta-question, have you or anyone else on the list looked at the
> literature (IEEE/ACM, etc.) or how other compilers handle these problems?

I couldn't find much information about other compilers, but I've seen a
number of (mostly dated) articles and US patents.  In fact, I'm
particularly concerned that US Patent 6091896 covers the design
proposed by Richi, that involves annotating the instructions
themselves.  I believe the independent, stand-alone annotations I
propose escape the patent claims.

That said, if anyone knows of articles that could be of use, I'd love
to hear about them.  It's not like my research was exhaustive.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09 14:23                 ` Michael Matz
@ 2007-11-12 18:17                   ` Alexandre Oliva
  2007-11-13 14:22                     ` Michael Matz
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-12 18:17 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

On Nov  9, 2007, Michael Matz <matz@suse.de> wrote:

> static inline int foo(int i)
> {
>   return i-1;
> }

> int foobar(int j)
> {
>   return foo(j+2);
> }

> int main(int argc, char **argv)
> {
>   return foobar(argc);
> }
> ------------------------------------

> And similar examples.  Depending on circumstances the formal argument 'i' 
> of "foo" might be optimized away.

With the design I've proposed, it is possible to compute the value of
i, for the end result is live, which ensures that the inputs used to
compute i are not completely optimized away.  This means at any point
in the execution of foo it is possible to compute i based on the
inputs (argc or j) or the outputs (the return values of foo, foobar
and main), no matter how much inlining takes place.  Now, it is
perfectly possible that foo is completely optimized away, such that no
instruction remains in the scope in which i is live.  In this case,
it's debatable whether i still remains, but we could still emit debug
information for it if we wanted to.

> If you want to use systemtap to show the actual arguments for all
> calls to foo, even the inlined ones, then you somehow have to make
> sure that the value of 'i' itself is not optimized away.

As I wrote before, I'm not aware of any systemtap bug report about a
situation in which an argument was actually optimized away.  I
wouldn't go as far as stopping the optimization just so that systemtap
can monitor the code.  I'm not working on changing optimization to
improve debugging, I'm working on fixing debug information such that
it matches optimizations that occur.

> at which point you have to force the value of 'i' being live, if you
> want to be sure that systemtap works in all cases.

I don't want to be sure of that.  At least that was not the problem I
was asked to solve.  And, indeed, it's not solvable without disabling
optimizations.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 18:05                       ` Alexandre Oliva
@ 2007-11-12 18:09                         ` Mark Mitchell
  2007-11-24  4:31                           ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Mark Mitchell @ 2007-11-12 18:09 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:
> On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
> 
>> Clearly, for some users, incorrect debugging information on optimized
>> code is not a terribly big deal.  It's certainly less important to many
>> users than that the program get the right answer.  On the other hand,
>> there are no doubt users where, whether for debugging, certification, or
>> whatever, it's vitally important that the debugging information meet
>> some standard of accuracy.
> 
> How is this different from a port of the compiler for a CPU that few
> people care about?  That many users couldn't care less whether the
> compiler output on that port works at all doesn't make it any less of
> a correctness issue.

You're again trying to make this a binary-value question.  Why?

Lots of things are "a correctness issue".  But, some categories tend to
be worse than others.  There is certainly a qualitative difference in
the severity of a defect that results in the compiler generating code
that computes the wrong answer and a defect that results in the compiler
generating wrong debugging information for optimized code.

The impact on a user affected by the first problem is likely very
severe: the application does not run correctly.  The impact on a user
affected by the second problem is likely less severe: the debugger
doesn't work as well, or some other external tool doesn't work as well.

Let's put it this way: if a user has to choose whether the compiler will
(a) generate code that runs correctly for their application, or (b)
generate debugging information that's accurate, which one will they choose?

But what's the point of this argument?  It sounds like you're trying to
argue that debug info for optimized code is a correctness issue, and
therefore we should work as hard on it as we would on code-generation
bugs.  I don't find that argument persuasive.  I'd like better debugging
for optimized code, but I'm certainly more concerned that (a) we
generate correct, fast code when optimizing, and (b) we generate good
debugging information when not optimizing.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 12:59                     ` Mark Mitchell
@ 2007-11-12 18:05                       ` Alexandre Oliva
  2007-11-12 18:09                         ` Mark Mitchell
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-12 18:05 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Clearly, for some users, incorrect debugging information on optimized
> code is not a terribly big deal.  It's certainly less important to many
> users than that the program get the right answer.  On the other hand,
> there are no doubt users where, whether for debugging, certification, or
> whatever, it's vitally important that the debugging information meet
> some standard of accuracy.

How is this different from a port of the compiler for a CPU that few
people care about?  That many users couldn't care less whether the
compiler output on that port works at all doesn't make it any less of
a correctness issue.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09  2:09               ` Robert Dewar
@ 2007-11-12 17:52                 ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-12 17:52 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Michael Matz, Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:

>>> 1. I don't think we should care much about the ability to
>>> *SET* values of variables in optimized code.
>> 
>> Indeed.  We should care about correctness of debug information, and
>> then this ability will come naturally ;-)

> Not really, there are optimizations that will still allow
> reading the value of a variable, but not setting it,

Indeed.  I was thinking implementation-level variables, rather than
source-level variables.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09 12:22                   ` Robert Dewar
@ 2007-11-12 12:59                     ` Mark Mitchell
  2007-11-12 18:05                       ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Mark Mitchell @ 2007-11-12 12:59 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Ian Lance Taylor, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Robert Dewar wrote:
> Ian Lance Taylor wrote:
>> Alexandre Oliva <aoliva@redhat.com> writes:
>>
>>> So...  The compiler is outputting code that tells other tools where to
>>> look for certain variables at run time, but it's putting incorrect
>>> information there.  How can you possibly argue that this is not a code
>>> correctness issue?
>>
>> I don't see any point to going around this point again, so I'll just
>> note that I disagree.
> 
> Well I very much agree.

The trick is that we're being asked to give a binary answer ("is it a
correctness issue?") when it's not really a binary issue.

Clearly, for some users, incorrect debugging information on optimized
code is not a terribly big deal.  It's certainly less important to many
users than that the program get the right answer.  On the other hand,
there are no doubt users where, whether for debugging, certification, or
whatever, it's vitally important that the debugging information meet
some standard of accuracy.

Part of my concern with this whole discussion is that we seem to be
saying we want the debugging information to be better, but not saying
very clearly what the requirements on better are.  Are we going to
consider it a bug if the value of a variable is unavailable, but the
debugging information says it is available?  (Yes, this seems like a bug
to me.)  What if an old value is available, but a simple-minded reading
of the program would have now assigned a new value?  (No, I wouldn't
consider this a bug.)  What if the value is available in two places, and
we only describe one of them?  (No, I wouldn't consider this a bug.)
What if the value is available, but we say that it isn't because we lost
track of it at some point?  (I would say "it depends".)

We could certainly track user variables through SSA and RTL, at least
insofar as knowing that some REGs refer to SSA names that refer to user
VAR_DECLs.  We can use dataflow analysis to compute where those values
(might) die.  Thus, we can probably do a reasonable job of guaranteeing
that when we say a variable is somewhere, it is in fact in that place.

I don't yet understand what else we're trying to do.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09  9:10                 ` Alexandre Oliva
@ 2007-11-12 10:55                   ` Mark Mitchell
  2007-11-12 18:22                     ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Mark Mitchell @ 2007-11-12 10:55 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:

> 1. introduce, early in compilation (when entering SSA), annotations
> that map user-level variables whose location may vary throughout their
> lifetime to implementation-level variables or expressions at every
> point of assignment and PHI joins.
> 
> 2. keep those annotations accurate throughout compilation, without
> letting them interfere with optimizations, but making sure they are
> kept up-to-date or marked untrackable.
> 
> 3. in var-tracking, starting from the expressions in the annotations
> and their equivalent expressions computed with a dataflow-globalized
> cse analysis, emit traditional var-tracking var_location notes for all
> variables.  For variables that didn't start out as gimple regs, the
> current debug info behavior should be preserved.
> 
>> I think that most of the goals boil down to making sure that, at any
>> point in the program, the debug information for a variable meets the
>> following criteria:
> 
>> (a) if the variable has not been optimized away, gives the location
>> where that variable's current value can be found, or
>> (b) if the variable has been optimized away, and the value is not a
>> constant, says that the value is not available, or
>> (c) if the variable has been optimized away, but is a constant, says
>> what the constant value is
> 
> yes, except that instead of constant and constant value, I'd put it as
> 'computable expression from other live values'.
> 
> And I'd say "locations" rather than just "location".

I agree; those are generalizations, of which my bullets are a needlessly
constrained special case.  (Of course, we can gradually approach
"computable" by starting with "constant", and then adding more and more
refinement, if we like.)

>> But, how are we going to track this information?  Algorithmically, what
>> needs to change in the compiler to maintain this state?
> 
> Most optimization passes must already update uses of gimple or pseudo
> regs they modify, so these will be taken care of automatically (which
> is why I chose this representation).

For the purposes of this discussion, let's assume that upon exit from
SSA we still have the information we need.  In particular, we know which
SSA names correspond to which user variables.  That tells us how to get
the values of user variables at the points where their values are
available, and also tells us when those variables do not have their
values available.

(We may already have lost some information, though.  For example, given:

  i = 3;
  f(i);
  i = 7;
  i = 2;
  g(i);

we may well have lost the "i = 7" assignment, so "i" might appear to
have the value "3" right before we assign "2" to it, if we were to
generate debug information right then.)

The reason I want to make that assumption is that the part of this where
the representation is in question is once we reach RTL, right?

I guess I still don't really understand what you're doing at the RTL
level.  I understand the objectives.  I understand some of the things
you're claiming as virtues of DEBUG_INSN.  What I don't understand is
how it's actually going to work.  What are the notes you're inserting?
Do they just say "here is an RTL expression for computing the value of
user-variable V at this point in the program"?  Why does it make sense
to have that, rather than notes on instructions that say what effect the
instruction has on user variables?  (For example, "this SET makes the
value of V unavailable".  Or "this SET makes the value of the V
available in the destination register"?)

As a meta-question, have you or anyone else on the list looked at the
literature (IEEE/ACM, etc.) or how other compilers handle these problems?

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09 18:40                 ` Daniel Jacobowitz
@ 2007-11-09 19:02                   ` Robert Dewar
  0 siblings, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-11-09 19:02 UTC (permalink / raw)
  To: gcc-patches, gcc

Daniel Jacobowitz wrote:

> Careful.  Eliminating reads from memory messes up debugger
> modification of variables, unless you can explain to the debugger that
> the variable is currently in both locations - this has been discussed
> but AFAIK there is no representation for it yet.  Changing the memory
> location won't change the next operation that thinks it's in the
> register.  Changing the register will be lost later.

I still think that changing memory locations is a marginal capability
compared to reading them, and that it is fine if this capability is
impacted by even low level optimization.
> 


^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09  2:13               ` Joe Buck
@ 2007-11-09 18:40                 ` Daniel Jacobowitz
  2007-11-09 19:02                   ` Robert Dewar
  0 siblings, 1 reply; 189+ messages in thread
From: Daniel Jacobowitz @ 2007-11-09 18:40 UTC (permalink / raw)
  To: gcc-patches, gcc

[Can we pick just gcc@ or just gcc-patches@ please?]

On Thu, Nov 08, 2007 at 05:11:24PM -0800, Joe Buck wrote:
> Debugging would be just as easy and natural if -O0 only made sure that
> values of variables are written out to memory at positions where the
> user can set a breakpoint; the code doesn't need to preserve every
> operation exactly as written, or read variables in from memory that
> are already in registers.  Kind of an -O0.5 would be more desirable
> than what we have now.

Careful.  Eliminating reads from memory messes up debugger
modification of variables, unless you can explain to the debugger that
the variable is currently in both locations - this has been discussed
but AFAIK there is no representation for it yet.  Changing the memory
location won't change the next operation that thinks it's in the
register.  Changing the register will be lost later.
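
The hazard can be sketched with a toy model, assuming a hypothetical
struct that stands for one user variable living in both a stack slot
and a register (this is an illustration, not how GCC or gdb represent
anything):

```c
/* Hypothetical model: one user variable with two homes. */
struct two_homes {
  int mem;  /* the stack slot */
  int reg;  /* the register copy the generated code reads next */
};

/* The program assigns: the value is stored and also kept in a register. */
void program_assign(struct two_homes *x, int v)
{
  x->mem = v;
  x->reg = v;
}

/* A debugger that was only told about the memory location. */
void debugger_poke_mem(struct two_homes *x, int v) { x->mem = v; }

/* A debugger that was only told about the register. */
void debugger_poke_reg(struct two_homes *x, int v) { x->reg = v; }

/* The next operation: the load from memory was eliminated, so it
   uses the register. */
int next_use(const struct two_homes *x) { return x->reg; }

/* Later code, after the register has been reused, reloads from memory. */
int later_reload(struct two_homes *x)
{
  x->reg = x->mem;
  return x->reg;
}
```

With x assigned 1, poking 42 into memory leaves next_use() returning 1;
poking 42 into the register makes next_use() return 42, but
later_reload() goes back to 1.  Either single-location write is
incomplete, which is the case for a two-location representation.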

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 18:18               ` Alexandre Oliva
@ 2007-11-09 14:23                 ` Michael Matz
  2007-11-12 18:17                   ` Alexandre Oliva
  0 siblings, 1 reply; 189+ messages in thread
From: Michael Matz @ 2007-11-09 14:23 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

Hi,

On Thu, 8 Nov 2007, Alexandre Oliva wrote:

> > If you want to be really sure no arguments disappear (necessary for 
> > instance for meaningful use of systemtap) you also need to inhibit 
> > some transformations,
> 
> I'm not aware of any situations in which we must force an argument not 
> to disappear.  All of the problems I'm aware of are those in which the 
> argument is there, we're just missing debug information for it.  If you 
> have information about needs for preserving arguments that are actually 
> dead, please send it my way.

------------------------------------
static inline int foo(int i)
{
  return i-1;
}

int foobar(int j)
{
  return foo(j+2);
}

int main(int argc, char **argv)
{
  return foobar(argc);
}
------------------------------------

And similar examples.  Depending on circumstances the formal argument 'i' 
of "foo" might be optimized away.  If you want to use systemtap to show 
the actual arguments for all calls to foo, even the inlined ones, then you 
somehow have to make sure that the value of 'i' itself is not optimized 
away.  Again, in this specific case, due to the simplicity of the involved 
expression, it would theoretically be possible to express this with just 
DWARF expressions (relating to the formal argument 'j' of foobar).  In 
more complicated situations that's not possible anymore, at which point 
you have to force the value of 'i' being live, if you want to be sure that 
systemtap works in all cases.
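
For the simple case above, the relation can be sketched as follows.
foobar_optimized and recover_i are hypothetical names: the first is
what the optimizer may reduce foobar to, the second is the computation
a location expression relative to 'j' would have to encode.

```c
/* foo and foobar as in the example above. */
static inline int foo(int i)
{
  return i - 1;
}

int foobar(int j)
{
  return foo(j + 2);
}

/* What foobar may become after inlining and folding: the formal
   argument 'i' of foo is never materialized anywhere. */
int foobar_optimized(int j)
{
  return j + 1;
}

/* The value of 'i' is still computable from 'j', which is live. */
int recover_i(int j)
{
  return j + 2;
}
```

Here recover_i(5) is 7 and foo(7) equals foobar_optimized(5), so 'i' can
be described without keeping it live.  Once the argument expression
depends on values that are themselves gone, no such function of live
values exists any more.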

> > during the next months, i.e. improve code quality at -O0 at least to a 
> > point it was in the 3.x line of GCC.
> 
> Aah, I guess the problem here is all the gimple-introduced temps,
> right?  That our current -O0 is more like -O-1? :-)

Indeed :)  Perhaps also doing a simple DCE and local regalloc, none of 
which inhibits debugging.

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09 12:31                   ` Seongbae Park (박성배, 朴成培)
@ 2007-11-09 12:42                     ` Robert Dewar
  0 siblings, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-11-09 12:42 UTC (permalink / raw)
  To: Seongbae Park (박성배, 朴成培)
  Cc: Ian Lance Taylor, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Seongbae Park (박성배, 朴成培) wrote:
> Most people
> fall in this camp
> and this is what gcc has implemented. This camp doesn't want to change the code
> so that they can get better debugging information.

This is definitely not the case. At least among our users, very few fall
into this camp. But in any case I think we all agree that there should 
be a mode in which this is the emphasis.
> 
> Of course, the real world is somewhere in between, but in practice,
> most people fall in the latter group
> (aka performance crowd).

You must live in a strange world.  After all, think about it: lots of
people find Java quite fine, even though it throws away a lot of
performance.

> Of course, another possible opinion would be to ignore the debuggability crowd
> on the ground that they are not important or big.

Actually I think big serious users with programs in the millions of
lines category are much more likely to be in the "debuggability" crowd.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09  1:26                 ` Ian Lance Taylor
  2007-11-09 12:22                   ` Robert Dewar
@ 2007-11-09 12:31                   ` Seongbae Park (박성배, 朴成培)
  2007-11-09 12:42                     ` Robert Dewar
  1 sibling, 1 reply; 189+ messages in thread
From: Seongbae Park (박성배, 朴成培) @ 2007-11-09 12:31 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

I think both sides are talking past each other, partially because two
different goals are in mind.
IMHO, there are two extremes when it comes to the so-called debugging
of optimized code.

One camp wants full debuggability (let's call them the debuggability
crowd) - which means
they want to know the value of any valid program state anywhere, and
want to set breakpoints anywhere
and even be able to change the program state anywhere, as if there were
an assignment at the point
where the debugger stopped the program. This camp still wants better
performance (like everyone else)
but they don't want to sacrifice debuggability for performance,
because they rely on it.

The other camp is the performance crowd: they want the absolute
best performance
but they still want as much debug information as possible. Most people
fall in this camp
and this is what gcc has implemented. This camp doesn't want to change the code
so that they can get better debugging information.

Of course, the real world is somewhere in between, but in practice,
most people fall in the latter group
(aka performance crowd).
Alexandre's proposal would make it possible to make the debuggability
crowd happy
at some unknown compile-time/runtime cost and maintenance cost.

Richard's proposal (from what I can understand)
would make the performance crowd happy, since it would be
less costly to implement than Alexandre's and would provide
incrementally better debugging information
than we have currently,
but it doesn't seem that it would make the debuggability crowd happy
(or at least the extremists among the debuggability crowd).

So I think the difference in opinion isn't so much whether Alexandre's
proposal is good or bad,
but rather whether we aim to make the debuggability crowd happy or the
performance crowd happy
or both.
Ideally we should serve both groups of users,
but there's non-trivial ongoing maintenance cost for having two
different approaches.

So I'd like to ask both Alexandre and Richard
whether they each can satisfy the other camp,
that is, Alexandre to come up with a way to tweak his proposal so that
it is possible to keep the compile time cost comparable to what is
right now with similar or better debug information,
and with reasonable maintenance cost,
and Richard whether his proposal can satisfy the debuggability crowd.
Of course, another possible opinion would be to ignore the debuggability crowd
on the ground that they are not important or big.
I personally think it's a mistake to do so, but you may disagree on that point.

Seongbae

On 08 Nov 2007 12:50:17 -0800, Ian Lance Taylor <iant@google.com> wrote:
> Alexandre Oliva <aoliva@redhat.com> writes:
>
> > So...  The compiler is outputting code that tells other tools where to
> > look for certain variables at run time, but it's putting incorrect
> > information there.  How can you possibly argue that this is not a code
> > correctness issue?
>
> I don't see any point to going around this point again, so I'll just
> note that I disagree.
>
>
> > >> >> > We've fixed many many bugs and misoptimizations over the years due to
> > >> >> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> > >> >> > we've made in the past.
> > >> >>
> > >> >> That's a valid concern.  However, per this reasoning, we might as well
> > >> >> push every operand in our IL to separate representations, because
> > >> >> there have been so many bugs and misoptimizations over the years,
> > >> >> especially when the representation didn't make transformations
> > >> >> trivially correct.
> > >>
> > >> > Please don't use strawman arguments.
> > >>
> > >> It's not, really.  A reference to an object within a debug stmt or
> > >> insn is very much like any other operand, in that most optimizer
> > >> passes must keep them up to date.  If you argue for pushing them
> > >> outside the IL, why would any other operands be different?
> >
> > > I think you misread me.  I didn't argue for pushing debugging
> > > information outside the IL.  I argued against a specific
> > > implementation--DEBUG_INSN--based on our experience with similar
> > > implementations.
> >
> > Do you remember any other notes that contained actual rtx expressions
> > and expected optimization passes to keep them accurate?
>
> No.
>
> > Do you think
> > we'd gain anything by moving them to a separate, out-of-line
> > representation?
>
> I don't know.  I don't see such a proposal on the table, and I don't
> have one myself, so I don't know how to evaluate it.
>
> Ian
>

-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com"

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09  1:26                 ` Ian Lance Taylor
@ 2007-11-09 12:22                   ` Robert Dewar
  2007-11-12 12:59                     ` Mark Mitchell
  2007-11-09 12:31                   ` Seongbae Park (박성배, 朴成培)
  1 sibling, 1 reply; 189+ messages in thread
From: Robert Dewar @ 2007-11-09 12:22 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Ian Lance Taylor wrote:
> Alexandre Oliva <aoliva@redhat.com> writes:
> 
>> So...  The compiler is outputting code that tells other tools where to
>> look for certain variables at run time, but it's putting incorrect
>> information there.  How can you possibly argue that this is not a code
>> correctness issue?
> 
> I don't see any point to going around this point again, so I'll just
> note that I disagree.

Well I very much agree. If you are writing certified code, then a number
of evidence producing tools rely on the debugging information, and it is
a problem if this information is incorrect.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 20:51                 ` Andrew Pinski
  2007-11-09  1:11                   ` Alexandre Oliva
@ 2007-11-09 11:28                   ` Robert Dewar
  1 sibling, 0 replies; 189+ messages in thread
From: Robert Dewar @ 2007-11-09 11:28 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Alexandre Oliva, David Edelsohn, Mark Mitchell, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

Andrew Pinski wrote:

> I have to ask, do you want an optimizing compiler or one which
> generates full debugging information???? 

Both!

I would like modes which do the following

a) reasonable amount of optimization that does not interfere too much
with debugging. The old GCC 3 -O1 was a close approximation to this
(certainly a closer approximation than the current -O1).

b) all possible optimizations even if debuggability is compromised

That's a perfectly reasonable request, and we used to be pretty
close to having it, but now -O1 has really degraded as a solution
to a). Yes, it's somewhat more efficient, but I suspect that the
small minority of those interested in the last bit of performance
are using -O2 anyway, so I doubt many people get much benefit from
the improved performance of -O1 code. On the other hand lots of
people are negatively affected by the degrading of debugging in
-O1 mode.
> Because there are trade off
> here really.  The reason behind the extra inlining is because it
> improves code generation.  I don't know about you but in some area of
> coding, they need the extra speed/size reductions that inlining of non
> user marked functions.  I have plenty of code which needs the speed
> help that the extra inlining helps (remember some developers don't want
> to change the code that much to have the optimizing compiler do its
> work).

Obviously you don't want a lot of inlining unless the debugger can
handle inlining properly if your interest is in being able to debug!
> 
> Remember dwarf3 is not really a standard about meta-information, it
> just mentions how it is represented if it exists.

But consumers want a debugger that works, without having to take the
hit of huge volumes of code at -O0
> 
> -- Pinski


^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 20:07               ` Mark Mitchell
  2007-11-08 20:14                 ` David Daney
@ 2007-11-09  9:10                 ` Alexandre Oliva
  2007-11-12 10:55                   ` Mark Mitchell
  1 sibling, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-09  9:10 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Alexandre Oliva wrote:
>> On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
>> 
>>> Until we all know what we're trying to do
>> 
>> Here's what I am trying to do:

> I think these are laudable goals, but you didn't really provide the
> information I wanted.

Oh, you didn't want goals.  Design and implementation plans more
detailed than
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00160.html, I suppose.
Ok, let's see...

1. introduce, early in compilation (when entering SSA), annotations
that map user-level variables whose location may vary throughout their
lifetime to implementation-level variables or expressions at every
point of assignment and PHI joins.

2. keep those annotations accurate throughout compilation, without
letting them interfere with optimizations, but making sure they are
kept up-to-date or marked untrackable.

3. in var-tracking, starting from the expressions in the annotations
and their equivalent expressions computed with a dataflow-globalized
cse analysis, emit traditional var-tracking var_location notes for all
variables.  For variables that didn't start out as gimple regs, the
current debug info behavior should be preserved.

> I think that most of the goals boil down to making sure that, at any
> point in the program, the debug information for a variable meets the
> following criteria:

> (a) if the variable has not been optimized away, gives the location
> where that variable's current value can be found, or
> (b) if the variable has been optimized away, and the value is not a
> constant, says that the value is not available, or
> (c) if the variable has been optimized away, but is a constant, says
> what the constant value is

yes, except that instead of constant and constant value, I'd put it as
'computable expression from other live values'.

And I'd say "locations" rather than just "location".

> But, how are we going to track this information?  Algorithmically, what
> needs to change in the compiler to maintain this state?

Most optimization passes must already update uses of gimple or pseudo
regs they modify, so these will be taken care of automatically (which
is why I chose this representation).  Optimization passes that move
assignments to an earlier point in the program don't need any
modification.  Those that move them to a later point will often move
them past their debug notes.  This means the debug notes need
updating, but it also means that, in the absence of fixes, the debug
notes most likely will stand in the way of the transformation, so
testing that the debug notes don't change optimization behavior ought
to catch these.

Transformations that copy or move blocks will retain the annotations,
so this should "just work".  Transformations that delete blocks might
be a bit of a problem, if they delete important debug annotations.  So
far, the only case I've noticed of such behavior is in ifcvt, in which
an if-then-assign-else-assign set of blocks is turned into a single
if-then-else assignment.  This particular case is covered by the PHI
statement that is placed in the entry point of the block that joins
the then and the else.
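
As a source-level sketch of that ifcvt case (the comments marking debug
annotations are only illustrative placements, and the function names
are made up for this example):

```c
/* Before if-conversion: two blocks, each with its own assignment. */
int branchy(int cond, int a, int b)
{
  int x;

  if (cond)
    x = a;   /* annotation in the "then" block: x is a */
  else
    x = b;   /* annotation in the "else" block: x is b */

  /* join point: annotation for the PHI, x is PHI(a, b) */
  return x;
}

/* After if-conversion: the two blocks, and the annotations inside
   them, are gone.  Only the annotation at the join point remains,
   and it still describes x correctly. */
int converted(int cond, int a, int b)
{
  int x = cond ? a : b;

  /* join point: annotation for the PHI survives here */
  return x;
}
```

Both functions return the same value for any cond, a and b; what
changes is only how many program points carry a binding for x.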

On architectures that support longer blocks with conditional-execution
of arbitrary instructions (arm, ia64), I'm not sure how to handle the
debug notes.  It seems to me that, with the current design, the
variable may be regarded as untrackable after the first conditional
assignment within the combined blocks, but at the join point there
will be a debug annotation corresponding to the PHI join that will
take care of getting a correct location for the variable again.

I don't have plans in place for any other kind of situation, but it
appears to me that the notion of using assignments and joins as fixed
points is solid, and I'm pretty sure any surprises can be overcome.

Of course software pipelining and other kinds of loop transformations
will yield debug information that's not exactly easy to grasp, but
this would be true of any representation.  When the compiler messes
too much with the code, there's very little one can do to make
execution resemble that of sequential execution.

I'm also thinking debug info consumers would probably enjoy some means
to tell a point at which all side effects present in a certain source
line have been completed.  But these are mostly orthogonal issues, so
I won't delve into them right now.

> For example, we need some way for an optimization pass to tell the
> rest of the compiler that a variable was completely eliminated.

In the design I'm proposing, there's no need for anything explicit
like this.  This would require global information, which is
undesirable, especially for optimizers that operate locally.  What
they'd have to do when they throw away a value that a debug annotation
relies on is to replace that value with something equivalent, if they
can, or to mark that particular annotation as untrackable.  Then, if
all annotations associated with a variable are untrackable, we know it
was completely optimized away.  But if any assignments remained
trackable, we can (and should, even though we don't have to) still
issue debug information for that.
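
The rule in the paragraph above can be sketched with a toy model.  The
struct and function names are hypothetical and are not the data
structures in the patch; the point is only that "completely optimized
away" is derived from the per-annotation state rather than recorded
globally by any pass.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one debug annotation: a user variable bound
   to an expression, or marked untrackable. */
struct debug_annotation {
  const char *user_var;
  const char *expr;      /* NULL means untrackable */
};

/* Called by a pass that throws away the value the annotation relied
   on: substitute an equivalent expression if one is known, otherwise
   pass NULL to mark the annotation untrackable. */
void value_discarded(struct debug_annotation *a, const char *equivalent)
{
  a->expr = equivalent;
}

/* Derived, never stored: a variable is completely optimized away
   exactly when none of its annotations is still trackable (which
   includes the case where it has no annotations at all). */
int optimized_away(const struct debug_annotation *notes, size_t n,
                   const char *user_var)
{
  size_t k;

  for (k = 0; k < n; k++)
    if (strcmp(notes[k].user_var, user_var) == 0 && notes[k].expr != NULL)
      return 0;
  return 1;
}
```

A local optimizer only ever calls something like value_discarded() on
the annotation in front of it; the debug info generator is the one
that scans all annotations at the end.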

Besides, optimization passes don't deal with user variables.  They
deal with implementation-level variables, that initially resemble user
variables, but that quickly diverge.  Optimization passes shouldn't
have to care about user variables.  In my proposal, all they have to
do is to adjust expressions (that happen to be known to evaluate to
what user variables are expected to hold) such that they retain the
same value in spite of transformations they perform, or are marked as
untrackable if that's impossible or too difficult.  For the
optimizers, all that matters is the expressions, and they already have
to deal with these all over anyway.  It's the debug info generator
that deals with user-level variables, taking into account whatever the
optimizers tell it about how to determine the location of user
variables throughout the program.

> What changes will need to be made throughout the compiler to keep
> track of the state?

Very few, so far.  Pretty much all of the changes that I had to make
were to prevent the notes from disabling optimizations; very few of
them required updating of debug notes beyond whatever the optimization
pass would have done by default.  That said, I have no means to test
automatically that updates to debug annotations are being performed
correctly, but since optimizers as a rule have to update all uses of
whatever they mess with, I have reasons to believe that they do it
correctly, precisely because the debug notes look so much like regular
uses to them.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 17:48             ` Alexandre Oliva
  2007-11-09  2:09               ` Robert Dewar
@ 2007-11-09  2:13               ` Joe Buck
  2007-11-09 18:40                 ` Daniel Jacobowitz
  1 sibling, 1 reply; 189+ messages in thread
From: Joe Buck @ 2007-11-09  2:13 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Robert Dewar, Michael Matz, Richard Guenther, gcc-patches, gcc

On Thu, Nov 08, 2007 at 02:36:57PM -0200, Alexandre Oliva wrote:
> > 3. The quality of code at -O0 is really terrible
> 
> That's a feature, no?

Actually it's a misfeature, in that it's worse than it needs to
be, and it's worse in ways that increase the time required to produce it
(since a larger volume of code then has to be handled by the back end,
assembler, and linker).

Debugging would be just as easy and natural if -O0 only made sure that
values of variables are written out to memory at positions where the
user can set a breakpoint; the code doesn't need to preserve every
operation exactly as written, or read variables in from memory that
are already in registers.  Kind of an -O0.5 would be more desirable
than what we have now.

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 17:48             ` Alexandre Oliva
@ 2007-11-09  2:09               ` Robert Dewar
  2007-11-12 17:52                 ` Alexandre Oliva
  2007-11-09  2:13               ` Joe Buck
  1 sibling, 1 reply; 189+ messages in thread
From: Robert Dewar @ 2007-11-09  2:09 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Michael Matz, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:
> On Nov  8, 2007, Robert Dewar <dewar@adacore.com> wrote:
> 
>> My general feelings on this subject:
> 
>> 1. I don't think we should care much about the ability to
>> *SET* values of variables in optimized code.
> 
> Indeed.  We should care about correctness of debug information, and
> then this ability will come naturally ;-)

Not really, there are optimizations that will still allow
reading the value of a variable, but not setting it, and
I think it is just fine to do these optimizations. For
instance if we have

    b = a;

the optimizer may not do a copy, it may simply know that
b and a values are in the same place. This does not stand
in the way of reading the value, but it does make it
impossible to write a or b.
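
A toy model of that situation, assuming a hypothetical struct that
stands for the single location both variables ended up in:

```c
/* Hypothetical model: after the copy is optimized out, 'a' and 'b'
   share one home. */
struct shared_home {
  int slot;
};

/* Reading either variable is unambiguous. */
int read_a(const struct shared_home *h) { return h->slot; }
int read_b(const struct shared_home *h) { return h->slot; }

/* A debugger "set b" has only the shared slot to write to, so it
   silently changes 'a' as well. */
void debugger_set_b(struct shared_home *h, int v) { h->slot = v; }
```

After debugger_set_b(&h, 9), read_a(&h) also returns 9, which is not
what a user who typed "set b = 9" asked for.  Reading stays correct;
only writing is lost.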

Similarly, if the optimizer does test replacement, and
knows that the value of a can be obtained by evaluating
some expression, the debugger can read the value, but
may not be able to set it.
> 
>> 3. The quality of code at -O0 is really terrible
> 
> That's a feature, no?
> 


^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09  0:08               ` Alexandre Oliva
@ 2007-11-09  1:26                 ` Ian Lance Taylor
  2007-11-09 12:22                   ` Robert Dewar
  2007-11-09 12:31                   ` Seongbae Park (박성배, 朴成培)
  0 siblings, 2 replies; 189+ messages in thread
From: Ian Lance Taylor @ 2007-11-09  1:26 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> So...  The compiler is outputting code that tells other tools where to
> look for certain variables at run time, but it's putting incorrect
> information there.  How can you possibly argue that this is not a code
> correctness issue?

I don't see any point to going around this point again, so I'll just
note that I disagree.


> >> >> > We've fixed many many bugs and misoptimizations over the years due to
> >> >> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> >> >> > we've made in the past.
> >> >> 
> >> >> That's a valid concern.  However, per this reasoning, we might as well
> >> >> push every operand in our IL to separate representations, because
> >> >> there have been so many bugs and misoptimizations over the years,
> >> >> especially when the representation didn't make transformations
> >> >> trivially correct.
> >> 
> >> > Please don't use strawman arguments.
> >> 
> >> It's not, really.  A reference to an object within a debug stmt or
> >> insn is very much like any other operand, in that most optimizer
> >> passes must keep them up to date.  If you argue for pushing them
> >> outside the IL, why would any other operands be different?
> 
> > I think you misread me.  I didn't argue for pushing debugging
> > information outside the IL.  I argued against a specific
> > implementation--DEBUG_INSN--based on our experience with similar
> > implementations.
> 
> Do you remember any other notes that contained actual rtx expressions
> and expected optimization passes to keep them accurate?

No.

> Do you think
> we'd gain anything by moving them to a separate, out-of-line
> representation?

I don't know.  I don't see such a proposal on the table, and I don't
have one myself, so I don't know how to evaluate it.

Ian

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 20:51                 ` Andrew Pinski
@ 2007-11-09  1:11                   ` Alexandre Oliva
  2007-11-09 11:28                   ` Robert Dewar
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-09  1:11 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: David Edelsohn, Mark Mitchell, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, "Andrew Pinski" <pinskia@gmail.com> wrote:

> On 11/7/07, Alexandre Oliva <aoliva@redhat.com> wrote:

>> I'm personally getting numerous requests for debug information
>> correctness and better completeness from debug info consumers such as
>> gdb, frysk and systemtap.  GCC's eagerness to inline functions, even
>> ones never declared as inline, and its eagerness to corrupt the
>> meta-information associated with them, causes these tools to
>> malfunction in very many situations.  And it's all GCC's fault, for
>> generating code that is not standards-compliant in the
>> meta-information sections of its output.

> I have to ask, do you want an optimizing compiler or one which
> generates full debugging information????

I want both.  That's the whole point of this project I'm in.

> Because there are trade off here really.

For a superficial look at the problem, they might look like
trade-offs.  But the assumption that it's impossible to get both is
incorrect.  It takes work, but it's not impossible.

> The reason behind the extra inlining is because it
> improves code generation.

I don't see how you got the impression that I might be arguing against
the inlining, as it looks like you did.  I'm not.  I'm arguing against
the corruption of meta-information associated with them.  That's just
laziness on our part.

> Remember dwarf3 is not really a standard about meta-information, it
> just mentions how it is represented if it exists.

That's what meta-information is.  One of the problems is that we often
fail to represent information that does exist.  A more serious problem
is that we often represent such information incorrectly, making it
seem like things that don't exist do, and that things are at different
locations from those in which they actually are.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 189+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 19:23             ` Ian Lance Taylor
@ 2007-11-09  0:08               ` Alexandre Oliva
  2007-11-09  1:26                 ` Ian Lance Taylor
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-09  0:08 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Ian Lance Taylor <iant@google.com> wrote:

> However, I don't think your arguments that this is
> an issue comparable to code correctness are valid.

It *is* code correctness.  Say, if the linker emitted incorrect
addresses in an executable, but the kernel and dynamic loader didn't
rely on those addresses, would it not still be a bug in the linker?
And then, if those tools started relying on those addresses and
exposed the problem, would it be right to tell them they must not rely
on them because they were broken in the past and we don't feel like
correcting the linker?

So...  The compiler is outputting code that tells other tools where to
look for certain variables at run time, but it's putting incorrect
information there.  How can you possibly argue that this is not a code
correctness issue?

> Incorrect generated code is a fatal problem in a compiler.
> Incorrect debugging information is a quality of implementation
> issue.

Incomplete debugging information is a quality of implementation issue, just
like missed optimizations.

Incorrect compiler output is a bug.  Claiming it's not just because
tools you happen to rely on don't care about that piece of information
won't make it any less of a bug.  It may make it a less important bug
for some time, but it's still a bug.

>> >> > We've fixed many many bugs and misoptimizations over the years due to
>> >> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
>> >> > we've made in the past.
>> >> 
>> >> That's a valid concern.  However, per this reasoning, we might as well
>> >> push every operand in our IL to separate representations, because
>> >> there have been so many bugs and misoptimizations over the years,
>> >> especially when the representation didn't make transformations
>> >> trivially correct.
>> 
>> > Please don't use strawman arguments.
>> 
>> It's not, really.  A reference to an object within a debug stmt or
>> insn is very much like any other operand, in that most optimizer
>> passes must keep them up to date.  If you argue for pushing them
>> outside the IL, why would any other operands be different?

> I think you misread me.  I didn't argue for pushing debugging
> information outside the IL.  I argued against a specific
> implementation--DEBUG_INSN--based on our experience with similar
> implementations.

Do you remember any other notes that contained actual rtx expressions
and expected optimization passes to keep them accurate?

All notes (as in matching NOTE_P) I remember didn't really contain rtx
expressions.  The first exception I remember is VAR_LOCATION, and this
one explicitly does *not* want to be updated, for it's generated so
late in the process.

Conversely, REG_NOTES do contain rtx, and they often have to be
updated, so that's the right representation for them.  Do you think
we'd gain anything by moving them to a separate, out-of-line
representation?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08  5:44               ` Alexandre Oliva
  2007-11-08 18:37                 ` Alexandre Oliva
@ 2007-11-08 20:51                 ` Andrew Pinski
  2007-11-09  1:11                   ` Alexandre Oliva
  2007-11-09 11:28                   ` Robert Dewar
  1 sibling, 2 replies; 189+ messages in thread
From: Andrew Pinski @ 2007-11-08 20:51 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: David Edelsohn, Mark Mitchell, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

First off I would like to say I did not want to reply but I guess I am
going to because of some false information spreading around about what
GCC as a compiler is.

On 11/7/07, Alexandre Oliva <aoliva@redhat.com> wrote:

> I'm personally getting numerous requests for debug information
> correctness and better completeness from debug info consumers such as
> gdb, frysk and systemtap.  GCC's eagerness to inline functions, even
> ones never declared as inline, and its eagerness to corrupt the
> meta-information associated with them, causes these tools to
> malfunction in very many situations.  And it's all GCC's fault, for
> generating code that is not standards-compliant in the
> meta-information sections of its output.

I have to ask, do you want an optimizing compiler or one which
generates full debugging information?  Because there are trade-offs
here, really.  The reason behind the extra inlining is that it
improves code generation.  I don't know about you, but in some areas
of coding, people need the extra speed/size reductions that come from
inlining functions the user did not mark as inline.  I have plenty of
code which needs the speed that the extra inlining provides (remember
some developers don't want to change the code that much to have the
optimizing compiler do its work).

Remember DWARF-3 is not really a standard about meta-information; it
just specifies how it is represented if it exists.

-- Pinski


* Re: Designs for better debug info in GCC
  2007-11-08 20:14                 ` David Daney
@ 2007-11-08 20:41                   ` Mark Mitchell
  0 siblings, 0 replies; 189+ messages in thread
From: Mark Mitchell @ 2007-11-08 20:41 UTC (permalink / raw)
  To: David Daney
  Cc: Alexandre Oliva, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

David Daney wrote:

>> (a) if the variable has not been optimized away, gives the location
>> where that variable's current value can be found, or
>> (b) if the variable has been optimized away, and the value is not a
>> constant, says that the value is not available, or
> 
> Perhaps if the variable has been optimized away *but* it is possible to
> calculate its value by examining the state of the program, then we can
> emit the expression needed to calculate its value in the debugging
> information as well.

Yes, that's a good addition.  To be clear, I'm not trying to set the
goals here; I'm just trying to make sure we have a clear set of
objectives and a plan to get there.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: Designs for better debug info in GCC
  2007-11-08 20:07               ` Mark Mitchell
@ 2007-11-08 20:14                 ` David Daney
  2007-11-08 20:41                   ` Mark Mitchell
  2007-11-09  9:10                 ` Alexandre Oliva
  1 sibling, 1 reply; 189+ messages in thread
From: David Daney @ 2007-11-08 20:14 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Alexandre Oliva, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Mark Mitchell wrote:
> Alexandre Oliva wrote:
>> On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
>>
>>> Until we all know what we're trying to do
>> Here's what I am trying to do:
> 
> I think these are laudable goals, but you didn't really provide the
> information I wanted.  In particular, I'd like to drill down from
> goals (like "ensure that, for every user variable for which we emit
> debug information, the information is correct") to concrete problems.
> 
> I think that most of the goals boil down to making sure that, at any
> point in the program, the debug information for a variable meets the
> following criteria:
> 
> (a) if the variable has not been optimized away, gives the location
> where that variable's current value can be found, or
> (b) if the variable has been optimized away, and the value is not a
> constant, says that the value is not available, or

Perhaps if the variable has been optimized away *but* it is possible to 
calculate its value by examining the state of the program, then we can 
emit the expression needed to calculate its value in the debugging 
information as well.

I may be missing something, but it seems that may be part of Alexandre's 
plan as well.


> (c) if the variable has been optimized away, but is a constant, says
> what the constant value is


* Re: Designs for better debug info in GCC
  2007-11-08  6:39             ` Alexandre Oliva
  2007-11-08 19:13               ` Alexandre Oliva
@ 2007-11-08 20:07               ` Mark Mitchell
  2007-11-08 20:14                 ` David Daney
  2007-11-09  9:10                 ` Alexandre Oliva
  1 sibling, 2 replies; 189+ messages in thread
From: Mark Mitchell @ 2007-11-08 20:07 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:
> On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
> 
>> Until we all know what we're trying to do
> 
> Here's what I am trying to do:

I think these are laudable goals, but you didn't really provide the
information I wanted.  In particular, I'd like to drill down from
goals (like "ensure that, for every user variable for which we emit
debug information, the information is correct") to concrete problems.

I think that most of the goals boil down to making sure that, at any
point in the program, the debug information for a variable meets the
following criteria:

(a) if the variable has not been optimized away, gives the location
where that variable's current value can be found, or
(b) if the variable has been optimized away, and the value is not a
constant, says that the value is not available, or
(c) if the variable has been optimized away, but is a constant, says
what the constant value is

Is that right?  (Note "at any point" above; it might be that the
variable is present in r0 for a while, and then optimized away, and then
present at *0xdeadbeef for a while, and then has the constant value 7.)

If so, how are you proposing to accomplish that?  It's easy enough to
design a representation (whether in the instruction stream, or on the
side) that says "from instruction A to instruction B, the value is in
this location".  So, I don't think we need to worry about that just yet.
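
A minimal sketch of such a representation, assuming a side table of
PC ranges per variable.  The names (loc_kind, loc_range,
find_location) and the addresses are hypothetical and only
illustrate the r0 / optimized away / *0xdeadbeef / constant 7
sequence above; this is not an existing GCC data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds of answer a debug info consumer can get. */
enum loc_kind { LOC_REGISTER, LOC_MEMORY, LOC_CONSTANT };

/* For PCs in [start, end), the variable is described by kind/value.
   value is a register number, a memory address, or the constant.  */
struct loc_range {
    uintptr_t start;
    uintptr_t end;
    enum loc_kind kind;
    uint64_t value;
};

/* Return the entry covering pc, or NULL if no entry covers it, which
   means "the value is not available at this point".  */
static const struct loc_range *
find_location(const struct loc_range *list, size_t n, uintptr_t pc)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pc >= list[i].start && pc < list[i].end)
            return &list[i];
    return NULL;
}

/* The example above, with made-up instruction addresses. */
static const struct loc_range example_var[] = {
    { 0x100, 0x120, LOC_REGISTER, 0 },           /* in r0 */
    /* 0x120..0x140: no entry, the variable is optimized away */
    { 0x140, 0x160, LOC_MEMORY,   0xdeadbeef },  /* at *0xdeadbeef */
    { 0x160, 0x180, LOC_CONSTANT, 7 },           /* constant 7 */
};
```

A lookup at 0x130 returns NULL, so a consumer would report the
variable as unavailable rather than guess.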

But, how are we going to track this information?  Algorithmically, what
needs to change in the compiler to maintain this state?

For example, we need some way for an optimization pass to tell the rest
of the compiler that a variable was completely eliminated.  (Perhaps,
for example, because all uses of the variable were eliminated.)  So,
maybe we need a debug_var_eliminated API.  Then, every pass that blows
away variables can call this function, which can make whatever notations
on the VAR_DECL are required.  I'm not claiming that's the right
approach, but I'd like to understand the plan at that kind of level.
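
To make the question concrete, here is a rough sketch of what such a
hook could look like.  Everything in it is an assumption for
illustration: toy_var is a stand-in and not GCC's VAR_DECL, and
debug_var_eliminated does not exist in GCC today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a variable declaration; not GCC's VAR_DECL. */
struct toy_var {
    const char *name;
    bool eliminated;            /* set once a pass removes the variable */
    const char *eliminated_by;  /* name of the first pass that did so */
};

/* Hypothetical hook: a pass calls this when it removes VAR entirely.
   Only the first elimination is recorded.  */
static void
debug_var_eliminated(struct toy_var *var, const char *pass_name)
{
    assert(var != NULL);
    if (!var->eliminated) {
        var->eliminated = true;
        var->eliminated_by = pass_name;
    }
}

/* What a debug info emitter might ask before describing a location. */
static bool
toy_var_has_location(const struct toy_var *var)
{
    assert(var != NULL);
    return !var->eliminated;
}
```

The emitter would then say "not available" for any variable that was
marked, instead of pointing at a stale location.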

What changes will need to be made throughout the compiler to keep track
of the state?

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: Designs for better debug info in GCC
  2007-11-08  5:14           ` Alexandre Oliva
  2007-11-08 18:28             ` Alexandre Oliva
@ 2007-11-08 19:23             ` Ian Lance Taylor
  2007-11-09  0:08               ` Alexandre Oliva
  1 sibling, 1 reply; 189+ messages in thread
From: Ian Lance Taylor @ 2007-11-08 19:23 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> On Nov  7, 2007, Ian Lance Taylor <iant@google.com> wrote:
> 
> >> Does it really matter?  Do we compromise standards compliance (and so
> >> violently, while at that) in any aspect of the compiler?
> 
> > What standards are you talking about?
> 
> Debug information standards such as DWARF-3.

...

> Incorrectness in the compiler output is always a bug.  No matter how
> hard it is to implement, or how resource-intensive the solution is,
> arguing that we've made a trade-off and decided to generate wrong
> output for this case is not a clever decision.

I'm sorry, I've thought about it, but I don't buy this argument.  I'm
certainly willing to talk about improving debug information for
optimized code, and clearly it is more important to more people than I
initially thought.  However, I don't think your arguments that this is
an issue comparable to code correctness are valid.  Incorrect
generated code is a fatal problem in a compiler.  Incorrect debugging
information is a quality of implementation issue.


> >> > We've fixed many many bugs and misoptimizations over the years due to
> >> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> >> > we've made in the past.
> >> 
> >> That's a valid concern.  However, per this reasoning, we might as well
> >> push every operand in our IL to separate representations, because
> >> there have been so many bugs and misoptimizations over the years,
> >> especially when the representation didn't make transformations
> >> trivially correct.
> 
> > Please don't use strawman arguments.
> 
> It's not, really.  A reference to an object within a debug stmt or
> insn is very much like any other operand, in that most optimizer
> passes must keep them up to date.  If you argue for pushing them
> outside the IL, why would any other operands be different?

I think you misread me.  I didn't argue for pushing debugging
information outside the IL.  I argued against a specific
implementation--DEBUG_INSN--based on our experience with similar
implementations.

Ian


* Re: Designs for better debug info in GCC
  2007-11-08  6:39             ` Alexandre Oliva
@ 2007-11-08 19:13               ` Alexandre Oliva
  2007-11-08 20:07               ` Mark Mitchell
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08 19:13 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Until we all know what we're trying to do

Here's what I am trying to do:

1. Ensure that, for every user variable for which we emit debug
information, the information is correct, i.e., if it says the value of
a variable at a certain instruction is at certain locations, or is a
known constant, then the variable must not be at any other location at
that point, and the locations or values must match reasonable
expectations based on source code inspection.

2. Defining "reasonable expectations" is tricky, for code reordering
typical of optimization can make room for numerous surprises.  I don't
have a precise definition for this yet, but it seems clear to me that
saying a variable holds a value that it couldn't possibly hold (e.g.,
because it is only assigned that value in a code path that is known
not to be taken) is an indication that something is amiss.  The
general guiding rule is, if we aren't sure the information is correct
(or we're sure it isn't), we shouldn't pretend that it is.

3. Try to ensure that, if the value of a variable is a known constant
at a certain point in the program, this information is present in
debug information.

4. Try to ensure that, if the value of a variable is available at any
location at a certain point in the program, this information is
present in debug information.

5. Stop missing optimizations for the sake of improving debug
information.

6. Avoid using additional memory and CPU cycles that would be needed
only for debug information when compiling without generating debug
information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08  5:15                 ` Alexandre Oliva
@ 2007-11-08 18:44                   ` Alexandre Oliva
  0 siblings, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08 18:44 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: David Edelsohn, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> First and foremost, I want to know what, concretely, Alexandre is
> trying to achieve, beyond "better debugging info for optimized
> code".

I'm not really going for "better".  I'm going for "correct" first,
while making room for "better", and hopefully already getting better,
in the process.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08  5:44               ` Alexandre Oliva
@ 2007-11-08 18:37                 ` Alexandre Oliva
  2007-11-08 20:51                 ` Andrew Pinski
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08 18:37 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, David Edelsohn <dje@watson.ibm.com> wrote:

> 	Who is "we"?  What better debugging are GCC users demanding?  What
> debugging difficulties are they experiencing?

I, for one, miss the arguments of inlined functions, a lot.

The reason for that is that arguments are currently optimized away
right from the start.  Even if they weren't, since they're initialized
with a trivial copy, at least their initial value (quite often
preserved throughout compilation) would be lost right away.

On top of that, we currently regard arguments and variables of
non-inlined functions as special, and we prevent a number of
optimizations with them, in order to be able to generate slightly
better debug information for them.  (As for arguments and variables of
inlined functions, we happily drop them on the floor right away.)
This is not only inconsistent, it's also harmful, because we're
trading performance and compile-time memory for slightly better but
still incorrect, incomplete and unreliable debug information.

> Who is that set of users?

I'm personally getting numerous requests for debug information
correctness and better completeness from debug info consumers such as
gdb, frysk and systemtap.  GCC's eagerness to inline functions, even
ones never declared as inline, and its eagerness to corrupt the
meta-information associated with them, causes these tools to
malfunction in very many situations.  And it's all GCC's fault, for
generating code that is not standards-compliant in the
meta-information sections of its output.

> What functional changes would improve those cases?  What is the cost of
> those improvements in complexity, maintainability, compile time, object
> file size, GDB start-up time, etc.?

Before I spend hours describing the little I can foresee about this,
how much of this really matters, given that it's mostly a matter of
correctness, rather than mere trade-offs?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08  5:14           ` Alexandre Oliva
@ 2007-11-08 18:28             ` Alexandre Oliva
  2007-11-08 19:23             ` Ian Lance Taylor
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08 18:28 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Ian Lance Taylor <iant@google.com> wrote:

>> Does it really matter?  Do we compromise standards compliance (and so
>> violently, while at that) in any aspect of the compiler?

> What standards are you talking about?

Debug information standards such as DWARF-3.

> I'm not aware of any standard for debuggability of optimized code.

I'm talking about standards that specify how a compiler should encode
meta-information about how source code concepts map to the code it
generated.  See, for example, section 2.6 in the Dwarf-3
specification.  It talks very little about optimization, but it does
discuss what a DW_AT_location, if present, means.  It doesn't say
anything like: "if a variable is available at a certain location most
of the time, you can emit a DW_AT_location that refers to that
location".  It says:

  Debugging information must provide consumers a way to find the
  location of program variables, determine the bounds of dynamic
  arrays and strings, and possibly to find the base address of a
  subroutine’s stack frame or the return address of a subroutine

See, it's not about debuggers, it's about consumers.  It's an
obligation, not really an option (that said, DW_AT_location *is*
optional).

  1. Location expressions, which are a language independent
     representation of addressing rules of arbitrary complexity built
     from DWARF expressions. They are sufficient for describing the
     location of any object as long as its lifetime is either static
     or the same as the lexical block that owns it, and it does not
     move throughout its lifetime.

  2. Location lists, which are used to describe objects that have a
     limited lifetime or change their location throughout their
     lifetime.

Nowhere does it state that, "if the compiler can't quite keep track of
the location of a variable, it can be sloppy and emit just whatever is
simpler or appears to make sense".

  Address ranges may overlap. When they do, they describe a situation
  in which an object exists simultaneously in more than one place. If
  all of the address ranges in a given location list do not
  collectively cover the entire range over which the object in
  question is defined, it is assumed that the object is not available
  for the portion of the range that is not covered.

So, it does make room for *some* sloppiness, after all.  That's what I
refer to as "incompleteness of debug information".  If we fail to keep
track of where an object is, it's sort-of ok (although undesirable) to
emit debug information that omits the location of the object in
certain program regions where it might be live.

However, it is not standard-compliant to emit information stating that
the object is available at certain locations if it is NOT really
there, or if it is available elsewhere, in addition to or instead of
the locations we've emitted.  That's what I refer to as "incorrectness
of debug information".
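
The distinction can be sketched as a small classifier.  This is only
an illustration under my own assumptions: each possible location of
an object at a given PC is one bit in a mask, and the names
(debug_verdict, classify_debug_info) are made up for the example.

```c
#include <assert.h>

enum debug_verdict { DEBUG_CORRECT, DEBUG_INCOMPLETE, DEBUG_INCORRECT };

/* emitted: locations the debug info claims for the object at this PC.
   actual:  locations where the object really is at this PC.
   Bit i set means "location i".  */
static enum debug_verdict
classify_debug_info(unsigned emitted, unsigned actual)
{
    if (emitted == actual)
        return DEBUG_CORRECT;
    if (emitted == 0)
        return DEBUG_INCOMPLETE;  /* we said nothing; tolerable */
    return DEBUG_INCORRECT;       /* we said something untrue */
}
```

Note that claiming location 0x1 when the object lives in both 0x1
and 0x2 falls under "incorrect" here, matching the "in addition to
or instead of" wording above.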

Incorrectness in the compiler output is always a bug.  No matter how
hard it is to implement, or how resource-intensive the solution is,
arguing that we've made a trade-off and decided to generate wrong
output for this case is not a clever decision.

Incompleteness is a completely different issue.  This is where we
*can* afford to make trade-offs.  Just like we can decide to omit
certain optimizations, or to not carry them out to the greatest
possible extent, or to experiment with various different heuristics,
we could afford to emit incomplete debug information, it's "just" a
quality of implementation issue.  But not incorrect debug information,
that's just a bug.

> gcc's users are definitely calling for a faster compiler.  Are they
> calling for better debuggability of optimized code?

This is not just about debuggability, as I've tried to explain all the
way from the beginning of the discussion, maybe a couple of months
ago.  Debug information is not just about debuggers any more.  There
are good reasons why the DWARF-3 standard says "consumers" rather than
"debuggers".  It's no longer just a matter of convenience, of
"recompile with -O0 if you want to debug it".  It's a matter of
correctness, for various monitoring tools now rely on this
meta-information, and rightfully so.

>> > We've fixed many many bugs and misoptimizations over the years due to
>> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
>> > we've made in the past.
>> 
>> That's a valid concern.  However, per this reasoning, we might as well
>> push every operand in our IL to separate representations, because
>> there have been so many bugs and misoptimizations over the years,
>> especially when the representation didn't make transformations
>> trivially correct.

> Please don't use strawman arguments.

It's not, really.  A reference to an object within a debug stmt or
insn is very much like any other operand, in that most optimizer
passes must keep them up to date.  If you argue for pushing them
outside the IL, why would any other operands be different?

> As I understand your proposal, it materializes variables which were
> otherwise omitted from the generated program.  It doesn't address the
> other issues with debugging optimized code, like bouncing around
> between program lines.  Is that correct?  What else does your proposal
> do?

All it does is to try to carry information about what value the user
is entitled to expect a variable to hold at each point in the program
throughout compilation.  Such that, even if the compiler doesn't
retain something that represents only that variable through to the end
of the compilation, we still have information about where, or at least
what, its value is, if it is available anywhere, such that we can
include this piece of data in the debug information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08 16:32             ` Michael Matz
@ 2007-11-08 18:18               ` Alexandre Oliva
  2007-11-09 14:23                 ` Michael Matz
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08 18:18 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Michael Matz <matz@suse.de> wrote:

> If you want to be really sure no arguments disappear (necessary for
> instance for meaningful use of systemtap) you also need to inhibit
> some transformations,

I'm not aware of any situations in which we must force an argument not
to disappear.  All of the problems I'm aware of are those in which the
argument is there, we're just missing debug information for it.  If
you have information about needs for preserving arguments that are
actually dead, please send it my way.

> This is a problem on its own.  We're planning to work on this sometime
> in the coming months, i.e. improve code quality at -O0 at least to the
> point where it was in the 3.x line of GCC.

Aah, I guess the problem here is all the gimple-introduced temps,
right?  That our current -O0 is more like -O-1? :-)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08 15:13           ` Robert Dewar
  2007-11-08 16:11             ` H.J. Lu
  2007-11-08 16:32             ` Michael Matz
@ 2007-11-08 17:48             ` Alexandre Oliva
  2007-11-09  2:09               ` Robert Dewar
  2007-11-09  2:13               ` Joe Buck
  2 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08 17:48 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Michael Matz, Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Robert Dewar <dewar@adacore.com> wrote:

> My general feelings on this subject:

> 1. I don't think we should care much about the ability to
> *SET* values of variables in optimized code.

Indeed.  We should care about correctness of debug information, and
then this ability will come naturally ;-)

> 3. The quality of code at -O0 is really terrible

That's a feature, no?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08 11:22         ` Michael Matz
  2007-11-08 15:13           ` Robert Dewar
@ 2007-11-08 16:37           ` Alexandre Oliva
  1 sibling, 0 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08 16:37 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Wed, 7 Nov 2007, Alexandre Oliva wrote:

>> > x and y at the appropriate part.  Whatever holds 'x' at a point (SSA 
>> > name, pseudo or mem) will also mention that it holds 'c'.  At a later 
>> > point whichever holds 'y' will also mention that it holds 'c'.
>> 
>> I.e., there will be two parallel locations throughout the entire 
>> function that hold the value of 'c'.

> No.  For some PC locations the location of 'c' will happen to be the same 
> as the one holding 'x', and for a different set of PC locations it will be 
> the one also holding 'y'.

So we're in agreement.  What you say is how it ought to be done, what
I did was to point out that the representation proposed by richi will
be unable to do the right thing.

>> f(int x /* but also c */, int y /* but also c */) { /* other vars */

> "int x /* but also c */, int y /* but also c */" implies that x == y 
> already

No, per the posted design (assuming I understood it correctly) it just
implies that, at some point in the program, an assignment 'c = x' was
optimized away, and that at some other point in the program, an
assignment 'c = y' was optimized away.

>> do_something_with(x, ...); // doesn't touch x or y
>> do_something_else_with(y, ...); // doesn't touch x or y
>> 
>> Now, what will you get if you 'print c' in the debugger (or if any
>> other debug info evaluator needs to tell what the value of user
>> variable c is) at a point within do_something_with(c,...) or
>> do_something_else_with(c)?

> ... so the answer would be "whatever is in that common place for x,y and 
> c".

And once we removed the incorrect assumption you made, that 'x == y',
what do you get?

> How come that f::c is actually set to p$x?

It was in the original source code, was it not?  p$x was passed to f()
as x, and then x was copied to c.

> I don't see any assignment and in fact no declaration for c in f.
> If you had one _that_ would be the place were the connection between
> p$x and 'c' would have been made and everything would fall in place.

Since there is a declaration of c in the original source-level f (the
only one that matters, as far as debug information is concerned), can
you please expand on how you'd get everything to fall in place?

> It's not possible that p$x _and_ p$y are f()::c.1 at the same time,

Exactly

> so the above examples are all somehow invalid.

It's the bitmap debug info representation that makes them nonsensical.

> int f(int y) {
>   int x = 2 * y;
>   return x + 2;
> }

> If the compiler forward-props 2*y into the single use and simplifies:

>   return (y+1)*2;

> then the value 2*y is never actually calculated anymore, not in any 
> register, not in any local variable, nowhere.  There's no way debug 
> information could generally rectify this loss of information.

Actually, while y is live, debug information could encode that x is
2*y, even if the value is not computed at run time.  So your statement
is quite an exaggeration.
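
A minimal sketch of that idea, assuming a toy postfix expression
format loosely in the spirit of DWARF expressions.  The opcode names
and toy_eval are hypothetical, not real DWARF operations or a GCC
interface; the point is only that a consumer can recompute x from y
even though the generated code never computes 2*y.

```c
#include <assert.h>
#include <stddef.h>

enum toy_op { TOY_PUSH_Y, TOY_PUSH_CONST, TOY_MUL };

struct toy_insn {
    enum toy_op op;
    long operand;   /* used by TOY_PUSH_CONST only */
};

/* Evaluate a postfix expression, given the live value of y. */
static long
toy_eval(const struct toy_insn *prog, size_t n, long y)
{
    long stack[16];
    size_t sp = 0;

    for (size_t i = 0; i < n; i++) {
        switch (prog[i].op) {
        case TOY_PUSH_Y:
            assert(sp < 16);
            stack[sp++] = y;
            break;
        case TOY_PUSH_CONST:
            assert(sp < 16);
            stack[sp++] = prog[i].operand;
            break;
        case TOY_MUL:
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
            sp--;
            break;
        }
    }
    assert(sp == 1);
    return stack[0];
}

/* What the debug info for x could carry while y is live: x = y * 2. */
static const struct toy_insn x_expr[] = {
    { TOY_PUSH_Y, 0 }, { TOY_PUSH_CONST, 2 }, { TOY_MUL, 0 },
};

/* The function after forward propagation and simplification. */
static int
f_optimized(int y)
{
    return (y + 1) * 2;
}
```

With y == 5, evaluating x_expr yields 10, and f_optimized(5) is
exactly that value plus 2, as in the source.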

> In case of more complicated expressions that's not possible anymore
> and you lose.

Yep.  If the value is unavailable, debug information should say so,
rather than pointing at something else.

> Forcing some values live is possible,

But undesirable.  I'm not trying to do that.  Actually, I'm working
hard to make sure it doesn't happen.

> So, our mapping is as accurate as yours.

Not at all, and you made that point yourself, twice, in a single
e-mail.

> It seems in your branch you also force some values live IIUC.

Nope.  Any values that are forced live by debug annotations are bugs
to be fixed.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08 15:13           ` Robert Dewar
  2007-11-08 16:11             ` H.J. Lu
@ 2007-11-08 16:32             ` Michael Matz
  2007-11-08 18:18               ` Alexandre Oliva
  2007-11-08 17:48             ` Alexandre Oliva
  2 siblings, 1 reply; 189+ messages in thread
From: Michael Matz @ 2007-11-08 16:32 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Hi,

On Thu, 8 Nov 2007, Robert Dewar wrote:

> significantly degraded -O1 debugging. I have found for
> instance that debugging the GNAT compiler itself, -O1
> used to be perfectly fine, but now far too many arguments
> and variables disappear.

Yes.  That problem is addressed by Alexandre's approach and by ours.  If 
you want to be really sure no arguments disappear (necessary for instance 
for meaningful use of systemtap) you also need to inhibit some 
transformations, which can be done under a certain option (which might or 
might not be on by default for -O1).

> 3. The quality of code at -O0 is really terrible compared
> to the competition (at least in the case of Ada), and
> large scale programs are just too big at -O0 to be
> practical (there is a big difference between a 50
> megabyte image and a 100 megabyte image).

This is a problem of its own.  We're planning to work on this sometime 
during the next few months, i.e. to improve code quality at -O0 at least 
to the point where it was in the 3.x line of GCC.

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-08 15:13           ` Robert Dewar
@ 2007-11-08 16:11             ` H.J. Lu
  2007-11-08 16:32             ` Michael Matz
  2007-11-08 17:48             ` Alexandre Oliva
  2 siblings, 0 replies; 189+ messages in thread
From: H.J. Lu @ 2007-11-08 16:11 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Michael Matz, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

On Thu, Nov 08, 2007 at 08:59:18AM -0500, Robert Dewar wrote:
> 2. It is much more important to have reasonable debugging
> for most users than the last mile of optimization. For me
> we should ensure that -O1 is still reasonably debuggable.
> The switch to GCC 4, at least in the Ada context, has
> significantly degraded -O1 debugging. I have found for
> instance that debugging the GNAT compiler itself, -O1
> used to be perfectly fine, but now far too many arguments
> and variables disappear.
> 

With gcc 3.4, I can debug binutils at -O1 and -O2 in some cases.
But with gcc 4, I have to use -O0 if I want to do any serious
debugging on binutils.



H.J.


* Re: Designs for better debug info in GCC
  2007-11-08 11:22         ` Michael Matz
@ 2007-11-08 15:13           ` Robert Dewar
  2007-11-08 16:11             ` H.J. Lu
                               ` (2 more replies)
  2007-11-08 16:37           ` Alexandre Oliva
  1 sibling, 3 replies; 189+ messages in thread
From: Robert Dewar @ 2007-11-08 15:13 UTC (permalink / raw)
  To: Michael Matz; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

My general feelings on this subject:

1. I don't think we should care much about the ability to
*SET* values of variables in optimized code. You can
definitely do without that. So if a variable exists in
two places, no problem, just register one of them.

2. It is much more important to have reasonable debugging
for most users than the last mile of optimization. For me
we should ensure that -O1 is still reasonably debuggable.
The switch to GCC 4, at least in the Ada context, has
significantly degraded -O1 debugging. I have found for
instance that debugging the GNAT compiler itself, -O1
used to be perfectly fine, but now far too many arguments
and variables disappear.

3. The quality of code at -O0 is really terrible compared
to the competition (at least in the case of Ada), and
large scale programs are just too big at -O0 to be
practical (there is a big difference between a 50
megabyte image and a 100 megabyte image). So we really
cannot rely on using -O0 for debugging. At -O1 we are
more than competitive for performance with competing
compilers.

4. In any case, most users really prefer to test and
debug at the same optimization level that they will
use for delivery. As noted above, -O0 is seldom practical
for delivery (furthermore the voluminous extra code makes
certification at the object level more work). -O1 is a
fine compromise from a performance point of view, but
needs to be debuggable.

5. Among our users we have relatively few who care about
even a factor of 2 in performance, and VERY few who care
about 10%. On the other hand we have lots of customers
who definitely have severe problems with the lack of
debuggability of -O1 code.

6. We have talked sometimes about a -Od level or some such
that would be fully debuggable. That's an interesting
idea, but I think in practice it is more reasonable to
try to ensure good debugging at -O1. Optimizations that
significantly interfere with debugging should be moved
to -O2. I think it is fine for -O2 to mean "optimize
the heck out of the program, I really care about the
last ounce of optimization, and I know debuggability
will suffer."


* Re: Designs for better debug info in GCC
  2007-11-07 19:03       ` Designs for better debug info in GCC Alexandre Oliva
@ 2007-11-08 11:22         ` Michael Matz
  2007-11-08 15:13           ` Robert Dewar
  2007-11-08 16:37           ` Alexandre Oliva
  0 siblings, 2 replies; 189+ messages in thread
From: Michael Matz @ 2007-11-08 11:22 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Hi,

On Wed, 7 Nov 2007, Alexandre Oliva wrote:

> > x and y at the appropriate part.  Whatever holds 'x' at a point (SSA 
> > name, pseudo or mem) will also mention that it holds 'c'.  At a later 
> > point whichever holds 'y' will also mention in holds 'c' .
> 
> I.e., there will be two parallel locations throughout the entire 
> function that hold the value of 'c'.

No.  For some PC locations the location of 'c' will happen to be the same 
as the one holding 'x', and for a different set of PC locations it will be 
the one also holding 'y'.  The request "what's in 'c'" from a debugger 
only makes sense when done from a certain program counter.  Depending on 
that the location of 'c' will be different.  In the case from above both 
locations might exist in parallel throughout the entire function, but they 
don't hold 'c' in parallel.

> Something like:
> 
> f(int x /* but also c */, int y /* but also c */) { /* other vars */

"int x /* but also c */, int y /* but also c */" implies that x == y 
already, at which point the compiler will most probably have allocated 
just one place for x and y (and c) anyway ...

>  do_something_with(x, ...); // doesn't touch x or y
>  do_something_else_with(y, ...); // doesn't touch x or y
> 
> Now, what will you get if you 'print c' in the debugger (or if any
> other debug info evaluator needs to tell what the value of user
> variable c is) at a point within do_something_with(c,...) or
> do_something_else_with(c)?

... so the answer would be "whatever is in that common place for x, y and 
c".  If the compiler did not allocate one place for x and y the answer 
still would be "whatever is in the place of 'y'", because that value is 
live, unlike 'x'.

> Now consider that f is inlined into the following code:
> 
> int g(point2d p) {
>   /* lots of code */
>   f(p.x, p.y);
>   /* more code */
>   f(p.y, p.x);
>   /* even more code */
> }
> 
> g gets fully scalarized, so, before inlining, we have:
> 
> int g(point2d p) {
>   int p$x = p.x, int p$y = p.y;
>   /* lots of code */
>   f(p$x, p$y);
>   /* more code */
>   f(p$y, p$x);
>   /* even more code */
> }
> 
> after inlining of f, we end up with:
> 
> int g(point2d p) {
>   int p$x = p.x, int p$y = p.y;
>   /* lots of code */
>   { int f()::x.1 /* but also f()::c.1 */ = p$x, f()::y.1 /* but also f()::c.1 */ = p$y;

Here you punt.  How come f::c is actually set to p$x?  I don't see 
any assignment and in fact no declaration for c in f.  If you had one, 
_that_ would be the place where the connection between p$x and 'c' would 
have been made and everything would fall into place.

>     { /* other vars */
>       do_something_with(f()::x.1, ...); // doesn't touch x or y
>       do_something_else_with(f()::y.1, ...); // doesn't touch x or y
>   } }
>   /* more code */
>   { int f()::x.2 /* but also f()::c.2 */ = p$x, f()::y.2 /* but also f()::c.2 */ = p$y;
>     { /* other vars */
>       do_something_with(f()::x.2, ...); // doesn't touch x or y
>       do_something_else_with(f()::y.2, ...); // doesn't touch x or y
>   } }
>   /* even more code */
> }
> 
> then, we further optimize g and get:
> 
> int g(point2d p) {
>   int p$x /* but also f()::x.1, f()::c.1, f()::y.2, f()::c.2 */ = p.x;
>   int p$y /* but also f()::y.1, f()::c.1, f()::x.2, f()::c.2 */ = p.y;
>   /* lots of code */
>   { { /* other vars */
>       do_something_with(p$x, ...); // doesn't touch x or y
>       do_something_else_with(p$y, ...); // doesn't touch x or y
>   } }
>   /* more code */
>   { { /* other vars */
>       do_something_with(p$y, ...); // doesn't touch x or y
>       do_something_else_with(p$x, ...); // doesn't touch x or y
>   } }
>   /* even more code */
> }
> 
> and now, if you try to resolve the variable name 'c' to a location or
> a value within any of the occurrences of do_something_*with(), what do
> you get?  What ranges do you generate for each of the variables
> involved?

It's not possible that p$x _and_ p$y are f()::c.1 at the same time, so the 
above examples are all somehow invalid.  Except if p$x and p$y are somehow 
the same value, and if that's the case it's enough and exactly correct if 
the range of f()::c.1 covers the whole body of your function 'g' referring 
to exactly the one location of f()::c.1, f()::c.2, p$x and p$y.

> Unfortunately, this mapping is not biunivocal.  The chosen 
> representation is fundamentally lossy.

What's fundamentally lossy are transformations done by the compiler.  E.g. 
in this simple case:

int f(int y) {
  int x = 2 * y;
  return x + 2;
}

If the compiler forward-props 2*y into the single use and simplifies:

  return (y+1)*2;

then the value 2*y is never actually calculated anymore, not in any 
register, not in any local variable, nowhere.  There's no way debug 
information could generally rectify this loss of information.  As DWARF is 
capable of encoding complete expressions it would be possible in this case 
to express it, because the inverse of the above function is easily 
determined.  In case of more complicated expressions that's not possible 
anymore and you lose.
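
To make the forward-propagation example concrete, here is a minimal C
sketch.  The names f_source, f_optimized, recover_x_from_y,
recover_x_from_result and count_mismatches are made up for illustration;
the two recover_* helpers only stand in for the kind of arithmetic a
DWARF expression could encode, they are not DWARF themselves.  It
checks that the rewritten function returns the same value and that 'x'
can be recomputed from values that are still around, even though 'x'
itself is never stored anywhere.

```c
/* The function as written in the source: 'x' exists by name. */
static int f_source(int y)
{
    int x = 2 * y;
    return x + 2;
}

/* What the optimizer may emit after forward propagation and
   simplification: the value 2*y is never materialized. */
static int f_optimized(int y)
{
    return (y + 1) * 2;
}

/* Hypothetical stand-ins for an expression in the debug info:
   recompute 'x' from 'y' (if 'y' is still live) ... */
static int recover_x_from_y(int y)
{
    return 2 * y;
}

/* ... or by inverting the final computation on the return value. */
static int recover_x_from_result(int result)
{
    return result - 2;
}

/* Count the inputs in [lo, hi] for which the optimized code or either
   recovery expression disagrees with the source semantics. */
static int count_mismatches(int lo, int hi)
{
    int bad = 0;
    for (int y = lo; y <= hi; y++) {
        int x = 2 * y;
        if (f_source(y) != f_optimized(y))
            bad++;
        if (recover_x_from_y(y) != x)
            bad++;
        if (recover_x_from_result(f_optimized(y)) != x)
            bad++;
    }
    return bad;
}
```

This only works because the computation is trivially invertible; for
anything more complicated no such expression is available, which is
the point made above.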

So, if the value is never ever computed anymore debug information won't 
help you.  You either have to force the value you're interested in to be 
live, or live with the imprecision.

Forcing some values live is possible, but is independent of generating 
debug information as exact as possible.  It must be independent because 
forcing values live is going to change the code, something which mere 
generation of debug information is not allowed to do.

So, our mapping is as accurate as yours.  If a value is computed in some 
place which can be traced back to some user-declared variable then this 
will be expressed.  If the value is not available then of course it also 
can't be reflected in the debug information (only as "optimized out").  It 
seems in your branch you also force some values live IIUC.  That's okay 
but doesn't have to do with generating precise debug information as shown 
above.

Even for forcing values live there are easier mechanisms.  We for instance 
experimented with volatile asms, which simply refer to the values in 
question (and unsurprisingly we also were interested in formal arguments 
of inlined functions):

  int f (int x) {
    force_use (x);
    ... old body ...
  }

You have to switch off any propagation into force_use(x), so that the 
original value of 'x' and the connection to the DECL of 'x' lives until 
the end of the compilation pipeline.  That's a rather simple hack doing 
exactly what's necessary: it forces GCC to actually have a place for the 
value of 'x' at the function entry point, which also survives inlining.
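
For illustration, one way such a force_use could be spelled with GCC's
extended asm is sketched below.  This is an assumption, not the
definition used in the experiment described above (which is not shown
in this message); scale and caller are made-up names.  The empty
volatile asm takes the value as a register input, so the compiler has
to have it available at that point.  It does not by itself switch off
propagation into the operand, which, as said above, is also needed to
keep the connection to the DECL of 'x'.

```c
/* Assumed spelling of force_use: an empty volatile asm that merely
   consumes the value as a register input operand (GCC extension). */
#define force_use(v) __asm__ __volatile__ ("" : : "r" (v))

/* A small function that is likely to be inlined when optimizing. */
static inline int scale(int x)
{
    force_use(x);        /* keep the incoming value of 'x' around here */
    return x * 2 + 2;    /* ... old body ... */
}

/* Two call sites, each of which gets its own force_use after inlining. */
int caller(int a)
{
    return scale(a) + scale(a + 1);
}
```

The asm emits no instructions, but it does change register allocation
and scheduling constraints, so it is a code-changing mechanism and has
to stay separate from the generation of debug information.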

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-08  0:01           ` Mark Mitchell
  2007-11-08  0:28             ` David Edelsohn
  2007-11-08  6:39             ` Alexandre Oliva
@ 2007-11-08 10:11             ` Richard Guenther
  2 siblings, 0 replies; 189+ messages in thread
From: Richard Guenther @ 2007-11-08 10:11 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Alexandre Oliva, gcc-patches, gcc

On 11/8/07, Mark Mitchell <mark@codesourcery.com> wrote:
> Ian Lance Taylor wrote:
>
> > At one time, gcc actually provided better debugging of optimized code
> > than any other compiler, though I don't know if that is still true.
> > Optimized gcc code is still debuggable today.  I do it all the time.
> > (For me poor support for debugging C++ is a much bigger issue, though
> > I think that is an issue more with gdb than with gcc.)
>
> I think we all agree that providing better debugging of optimized code
> is a priori a good thing.  So, as I see it, this thread is focused on
> what internal representation we might use for that.
>
> I don't know that there's an abstract right answer to whether something
> NOTE-like or something on the side is better.  There are problems with
> both approaches.  We know the NOTE/DEBUG_INSN thing is going to break,
> from experience; we also know the on-the-side thing is going to be hard
> to maintain.

I think we're going to find out once both approaches are implemented to the
point that they reasonably do what they want to do.  So I'm fine with deferring
this decision until that point (or the point where we start fighting over which
approach will get merged).

> Alexandre has clearly thought about this a lot.  I'd like to start by
> capturing the functional changes that we want to make to GCC's debug
> output -- not the changes that we want in the debug experience, or
> changes that we need in GDB, but the changes in the generated DWARF.
>
> For example, I'm thinking of a series of function test cases.  Ignore
> the substance of this example -- I'm making it up! -- I'm just trying to
> capture the form.
>
> ===
> int main () { int i; i = 3; return i; }
>
> When optimizing, "i" is optimized away.  The debug info for "i" right
> before the return statement says "i has been optimized away", but not
> what its value is.  I think it should say that the value is "3".  To do
> that, we need to emit a DW_Now_My_Value_is_3 tag for "i".
> ===
>
> Now, how is whatever representation we pick going to get us that?  Is
> the Oliva representation sufficient?  What about the Guenther/Matz
> representation?  Independently of the representation, what algorithms
> are we going to use to track whatever we need to track as the optimizers
> remove, insert, duplicate, and reorder code?

For the example above, the representation we use on the tree level cannot
attach a name to '3' (since obviously '3' is not a SSA_NAME).  But this is
fixable if we think it is worthwhile.

> Until we all know what we're trying to do, I don't see how we can make a
> good decision about the representation.  Clearly, in the abstract, we
> can represent data either on-the-side or in the instruction stream, but
> until we know what output we want, I'm not sure how we can pick.

That's true.  I was also thinking about how to properly do testcases for both
kinds of infrastructure.  At the moment I scan tree/rtl dumps for the names I want
to preserve, but ultimately it would be nice to be able to run gdb testcases in
the gcc tree to also verify 'correctness' of the information we produce (and
not just existence of some information).

Richard.


* Re: Designs for better debug info in GCC
  2007-11-08  0:01           ` Mark Mitchell
  2007-11-08  0:28             ` David Edelsohn
@ 2007-11-08  6:39             ` Alexandre Oliva
  2007-11-08 19:13               ` Alexandre Oliva
  2007-11-08 20:07               ` Mark Mitchell
  2007-11-08 10:11             ` Richard Guenther
  2 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08  6:39 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Until we all know what we're trying to do

Here's what I am trying to do:

1. Ensure that, for every user variable for which we emit debug
information, the information is correct, i.e., if it says the value of
a variable at a certain instruction is at certain locations, or is a
known constant, then the variable must not be at any other location at
that point, and the locations or values must match reasonable
expectations based on source code inspection.

2. Defining "reasonable expectations" is tricky, for code reordering
typical of optimization can make room for numerous surprises.  I don't
have a precise definition for this yet, but to me, saying that a
variable holds a value that it couldn't possibly hold (e.g., because
it is only assigned that value in a code path that is known not to be
taken) is a very clear indication that something is amiss.  The
general guiding rule is, if we aren't sure the information is correct
(or we're sure it isn't), we shouldn't pretend that it is.

3. Try to ensure that, if the value of a variable is a known constant
at a certain point in the program, this information is present in
debug information.

4. Try to ensure that, if the value of a variable is available at any
location at a certain point in the program, this information is
present in debug information.

5. Stop missing optimizations for the sake of improving debug
information.

6. Avoid using additional memory and CPU cycles that would be needed
only for debug information when compiling without generating debug
information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08  0:28             ` David Edelsohn
  2007-11-08  5:01               ` Mark Mitchell
@ 2007-11-08  5:44               ` Alexandre Oliva
  2007-11-08 18:37                 ` Alexandre Oliva
  2007-11-08 20:51                 ` Andrew Pinski
  1 sibling, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08  5:44 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, David Edelsohn <dje@watson.ibm.com> wrote:

> 	Who is "we"?  What better debugging are GCC users demanding?  What
> debugging difficulties are they experiencing?

I, for one, miss the arguments of inlined functions, a lot.

The reason for that is that arguments are currently optimized away to
boot.  Even if they weren't, since they're initialized with a trivial
copy, at least their initial value (quite often preserved throughout
compilation) would be gone to boot.

On top of that, we currently regard arguments and variables of
non-inlined functions as special, and we prevent a number of
optimizations with them, in order to be able to generate slightly
better debug information for them.  (As for arguments and variables of
inlined functions, we happily drop them on the floor right away.)
This is not only inconsistent, it's also harmful, because we're
trading performance and compile-time memory for slightly better but
still incorrect, incomplete and unreliable debug information.

> Who is that set of users?

I'm personally getting numerous requests for debug information
correctness and better completeness from debug info consumers such as
gdb, frysk and systemtap.  GCC's eagerness to inline functions, even
ones never declared as inline, and its eagerness to corrupt the
meta-information associated with them, causes these tools to
malfunction in very many situations.  And it's all GCC's fault, for
generating code that is not standards-compliant in the
meta-information sections of its output.

> What functional changes would improve those cases?  What is the cost of
> those improvements in complexity, maintainability, compile time, object
> file size, GDB start-up time, etc.?

Before I spend hours describing the little I can foresee about this,
how much of this really matters, given that it's mostly a matter of
correctness, rather than mere trade offs?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08  5:01               ` Mark Mitchell
@ 2007-11-08  5:15                 ` Alexandre Oliva
  2007-11-08 18:44                   ` Alexandre Oliva
  2007-11-23  2:20                 ` Frank Ch. Eigler
  1 sibling, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08  5:15 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: David Edelsohn, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> First and foremost, I want to know what, concretely, Alexandre is
> trying to achieve, beyond "better debugging info for optimized
> code".

I'm not really going for "better".  I'm going for "correct" first,
while making room for "better", and hopefully already getting better,
in the process.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-07 23:05         ` Ian Lance Taylor
  2007-11-07 23:28           ` Daniel Jacobowitz
  2007-11-08  0:01           ` Mark Mitchell
@ 2007-11-08  5:14           ` Alexandre Oliva
  2007-11-08 18:28             ` Alexandre Oliva
  2007-11-08 19:23             ` Ian Lance Taylor
  2 siblings, 2 replies; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-08  5:14 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Ian Lance Taylor <iant@google.com> wrote:

>> Does it really matter?  Do we compromise standards compliance (and so
>> violently, while at that) in any aspect of the compiler?

> What standards are you talking about?

Debug information standards such as DWARF-3.

> I'm not aware of any standard for debuggability of optimized code.

I'm talking about standards that specify how a compiler should encode
meta-information about how source code concepts map to the code it
generated.  See, for example, section 2.6 in the Dwarf-3
specification.  It talks very little about optimization, but it does
discuss what a DW_AT_location, if present, means.  It doesn't say
anything like: "if a variable is available at a certain location most
of the time, you can emit a DW_AT_location that refers to that
location".  It says:

  Debugging information must provide consumers a way to find the
  location of program variables, determine the bounds of dynamic
  arrays and strings, and possibly to find the base address of a
  subroutine’s stack frame or the return address of a subroutine

See, it's not about debuggers, it's about consumers.  It's an
obligation, not really an option (that said, DW_AT_location *is*
optional).

  1. Location expressions, which are a language independent
     representation of addressing rules of arbitrary complexity built
     from DWARF expressions. They are sufficient for describing the
     location of any object as long as its lifetime is either static
     or the same as the lexical block that owns it, and it does not
     move throughout its lifetime.

  2. Location lists, which are used to describe objects that have a
     limited lifetime or change their location throughout their
     lifetime.

Nowhere does it state that, "if the compiler can't quite keep track of
the location of a variable, it can be sloppy and emit just whatever is
simpler or appears to make sense".

  Address ranges may overlap. When they do, they describe a situation
  in which an object exists simultaneously in more than one place. If
  all of the address ranges in a given location list do not
  collectively cover the entire range over which the object in
  question is defined, it is assumed that the object is not available
  for the portion of the range that is not covered.

So, it does make room for *some* sloppiness, after all.  That's what I
refer to as "incompleteness of debug information".  If we fail to keep
track of where an object is, it's sort-of ok (although undesirable) to
emit debug information that omits the location of the object in
certain program regions where it might be live.

However, it is not standard-compliant to emit information stating that
the object is available at certain locations if it is NOT really
there, or if it is available elsewhere, in addition to or instead of
the locations we've emitted.  That's what I refer to as "incorrectness
of debug information".
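
To make the distinction concrete, here is a small C model of how a
consumer might read a location list.  It is only a sketch: struct
loc_entry, locations_at and example_list are made-up names, the
addresses are arbitrary, and real DWARF location lists encode entries
as offsets from a base address plus a location expression.  What it
shows is the semantics quoted above: overlapping ranges mean the
object is in more than one place at once, and a PC covered by no
range means the object is not available there.

```c
#include <stddef.h>

/* Simplified, hypothetical location kinds. */
enum loc_kind { LOC_REGISTER, LOC_STACK_SLOT };

/* One entry of a (modelled) location list. */
struct loc_entry {
    unsigned long begin;   /* first PC covered (inclusive) */
    unsigned long end;     /* first PC past the range (exclusive) */
    enum loc_kind kind;
    int where;             /* register number or frame offset */
};

/* Collect up to 'max' entries whose range covers 'pc'.
   Returns the number found; 0 means "not available at this pc". */
static size_t locations_at(const struct loc_entry *list, size_t n,
                           unsigned long pc,
                           const struct loc_entry **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++)
        if (pc >= list[i].begin && pc < list[i].end && found < max)
            out[found++] = &list[i];
    return found;
}

/* Example list for one variable. */
static const struct loc_entry example_list[] = {
    { 0x100, 0x120, LOC_REGISTER,    3 },
    { 0x118, 0x140, LOC_STACK_SLOT, -8 },  /* overlaps 0x118..0x11f */
    /* 0x140..0x14f is covered by nothing: incomplete, but correct */
    { 0x150, 0x160, LOC_REGISTER,    5 },
};
```

In this model, leaving the gap at 0x140 is what I call incompleteness.
Incorrectness would be an entry claiming register 3 at a PC where that
register actually holds something else.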

Incorrectness in the compiler output is always a bug.  No matter how
hard it is to implement, or how resource-intensive the solution is,
we cannot get away with arguing that we've made a trade-off and that
deciding to generate wrong output for this case is a clever decision.

Incompleteness is a completely different issue.  This is where we
*can* afford to make trade-offs.  Just like we can decide to omit
certain optimizations, or to not carry them out to the greatest
possible extent, or to experiment with various different heuristics,
we could afford to emit incomplete debug information, it's "just" a
quality of implementation issue.  But not incorrect debug information,
that's just a bug.

> gcc's users are definitely calling for a faster compiler.  Are they
> calling for better debuggability of optimized code?

This is not just about debuggability, as I've tried to explain all the
way from the beginning of the discussion, maybe a couple of months
ago.  Debug information is not just about debuggers any more.  There
are good reasons why the Dwarf-3 standard says "consumers" rather than
"debuggers".  It's no longer just a matter of convenience, as in
"recompile with -O0 if you want to debug it".  It's a matter of
correctness, for
various monitoring tools now rely on this meta-information, and
rightfully so.

>> > We've fixed many many bugs and misoptimizations over the years due to
>> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
>> > we've made in the past.
>> 
>> That's a valid concern.  However, per this reasoning, we might as well
>> push every operand in our IL to separate representations, because
>> there have been so many bugs and misoptimizations over the years,
>> especially when the representation didn't make transformations
>> trivially correct.

> Please don't use strawman arguments.

It's not, really.  A reference to an object within a debug stmt or
insn is very much like any other operand, in that most optimizer
passes must keep them up to date.  If you argue for pushing them
outside the IL, why would any other operands be different?

> As I understand your proposal, it materializes variables which were
> otherwise omitted from the generated program.  It doesn't address the
> other issues with debugging optimized code, like bouncing around
> between program lines.  Is that correct?  What else does your proposal
> do?

All it does is to try to carry information about what value the user
is entitled to expect a variable to hold at each point in the program
throughout compilation.  Such that, even if the compiler doesn't
retain something that represents only that variable through to the end
of the compilation, we still have information about where, or at least
what, its value is, if it is available anywhere, such that we can
include this piece of data in the debug information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08  0:28             ` David Edelsohn
@ 2007-11-08  5:01               ` Mark Mitchell
  2007-11-08  5:15                 ` Alexandre Oliva
  2007-11-23  2:20                 ` Frank Ch. Eigler
  2007-11-08  5:44               ` Alexandre Oliva
  1 sibling, 2 replies; 189+ messages in thread
From: Mark Mitchell @ 2007-11-08  5:01 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Ian Lance Taylor, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

David Edelsohn wrote:
>>>>>> Mark Mitchell writes:
> 
> Mark> I think we all agree that providing better debugging of optimized code
> Mark> is a priori a good thing.  So, as I see it, this thread is focused on
> Mark> what internal representation we might use for that.
> 
> 	Yes, it is a good thing, but not at any price.  Regardless of the
> representation and implementation, there is a cost.  This discussion
> should not start with the premise that better debugging of optimized code
> is better at any cost.

I agree.  You're right to state this explicitly, but I'd implicitly
expected that we'd do cost/benefit analysis on this feature, as we would
any other.

> Mark> I'd like to start by
> Mark> capturing the functional changes that we want to make to GCC's debug
> Mark> output -- not the changes that we want in the debug experience, or
> Mark> changes that we need in GDB, but the changes in the generated DWARF.
> 
> 	Who is "we"?  What better debugging are GCC users demanding?  What
> debugging difficulties are they experiencing?  Who is that set of users?
> What functional changes would improve those cases?  What is the cost of
> those improvements in complexity, maintainability, compile time, object
> file size, GDB start-up time, etc.?

That's what I'm asking.  First and foremost, I want to know what,
concretely, Alexandre is trying to achieve, beyond "better debugging
info for optimized code".  Until we understand that, I don't see how we
can sensibly debate any methods of implementation, possible costs, etc.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: Designs for better debug info in GCC
  2007-11-08  0:01           ` Mark Mitchell
@ 2007-11-08  0:28             ` David Edelsohn
  2007-11-08  5:01               ` Mark Mitchell
  2007-11-08  5:44               ` Alexandre Oliva
  2007-11-08  6:39             ` Alexandre Oliva
  2007-11-08 10:11             ` Richard Guenther
  2 siblings, 2 replies; 189+ messages in thread
From: David Edelsohn @ 2007-11-08  0:28 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Ian Lance Taylor, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

>>>>> Mark Mitchell writes:

Mark> I think we all agree that providing better debugging of optimized code
Mark> is a priori a good thing.  So, as I see it, this thread is focused on
Mark> what internal representation we might use for that.

	Yes, it is a good thing, but not at any price.  Regardless of the
representation and implementation, there is a cost.  This discussion
should not start with the premise that better debugging of optimized code
is better at any cost.

Mark> I'd like to start by
Mark> capturing the functional changes that we want to make to GCC's debug
Mark> output -- not the changes that we want in the debug experience, or
Mark> changes that we need in GDB, but the changes in the generated DWARF.

	Who is "we"?  What better debugging are GCC users demanding?  What
debugging difficulties are they experiencing?  Who is that set of users?
What functional changes would improve those cases?  What is the cost of
those improvements in complexity, maintainability, compile time, object
file size, GDB start-up time, etc.?

David


* Re: Designs for better debug info in GCC
  2007-11-07 23:05         ` Ian Lance Taylor
  2007-11-07 23:28           ` Daniel Jacobowitz
@ 2007-11-08  0:01           ` Mark Mitchell
  2007-11-08  0:28             ` David Edelsohn
                               ` (2 more replies)
  2007-11-08  5:14           ` Alexandre Oliva
  2 siblings, 3 replies; 189+ messages in thread
From: Mark Mitchell @ 2007-11-08  0:01 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Ian Lance Taylor wrote:

> At one time, gcc actually provided better debugging of optimized code
> than any other compiler, though I don't know if that is still true.
> Optimized gcc code is still debuggable today.  I do it all the time.
> (For me poor support for debugging C++ is a much bigger issue, though
> I think that is an issue more with gdb than with gcc.)

I think we all agree that providing better debugging of optimized code
is a priori a good thing.  So, as I see it, this thread is focused on
what internal representation we might use for that.

I don't know that there's an abstract right answer to whether something
NOTE-like or something on the side is better.  There are problems with
both approaches.  We know the NOTE/DEBUG_INSN thing is going to break,
from experience; we also know the on-the-side thing is going to be hard
to maintain.

Alexandre has clearly thought about this a lot.  I'd like to start by
capturing the functional changes that we want to make to GCC's debug
output -- not the changes that we want in the debug experience, or
changes that we need in GDB, but the changes in the generated DWARF.

For example, I'm thinking of a series of functional test cases.  Ignore
the substance of this example -- I'm making it up! -- I'm just trying to
capture the form.

===
int main () { int i; i = 3; return i; }

When optimizing, "i" is optimized away.  The debug info for "i" right
before the return statement says "i has been optimized away", but not
what its value is.  I think it should say that the value is "3".  To do
that, we need to emit a DW_Now_My_Value_is_3 tag for "i".
===
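
A compilable variant of this made-up test case might look like the
sketch below.  The function name constant_local is hypothetical, and
whether any given GCC version records the constant is exactly the open
question.  DWARF does define a DW_AT_const_value attribute for a
variable whose value is a known constant, so that is one plausible
target for the expected output; the build and inspection commands in
the comment are only a suggestion.

```c
/* Sketch of the test-case form above; constant_local is a hypothetical
   name.  Build with optimization and debug info, for example
       gcc -O2 -g -c test.c
   and inspect the DWARF with
       readelf --debug-dump=info test.o
   to see what, if anything, is recorded for "i".  */
int constant_local(void)
{
    int i;      /* expected to be optimized away at -O2 */
    i = 3;
    return i;   /* a debugger stopped here would ideally print i = 3 */
}
```

A test in this form would state the expected DWARF for "i" rather than
the expected debugger session, which keeps it independent of GDB.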

Now, how is whatever representation we pick going to get us that?  Is
the Oliva representation sufficient?  What about the Guenther/Matz
representation?  Independently of the representation, what algorithms
are we going to use to track whatever we need to track as the optimizers
remove, insert, duplicate, and reorder code?

Until we all know what we're trying to do, I don't see how we can make a
good decision about the representation.  Clearly, in the abstract, we
can represent data either on-the-side or in the instruction stream, but
until we know what output we want, I'm not sure how we can pick.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: Designs for better debug info in GCC
  2007-11-07 23:05         ` Ian Lance Taylor
@ 2007-11-07 23:28           ` Daniel Jacobowitz
  2007-11-08  0:01           ` Mark Mitchell
  2007-11-08  5:14           ` Alexandre Oliva
  2 siblings, 0 replies; 189+ messages in thread
From: Daniel Jacobowitz @ 2007-11-07 23:28 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

On Wed, Nov 07, 2007 at 02:56:24PM -0800, Ian Lance Taylor wrote:
> At one time, gcc actually provided better debugging of optimized code
> than any other compiler, though I don't know if that is still true.
> Optimized gcc code is still debuggable today.  I do it all the time.
> (For me poor support for debugging C++ is a much bigger issue, though
> I think that is an issue more with gdb than with gcc.)

We're working on both of these on the GDB side.

> gcc's users are definitely calling for a faster compiler.  Are they
> calling for better debuggability of optimized code?

In my experience, yes.  CodeSourcery has work currently being
contributed to GDB that makes this quite a lot better; we also
occasionally have customers ask us about further improvements.  And I
file bugs about this from time to time, most of which are still open.

> As I understand your proposal, it materializes variables which were
> otherwise omitted from the generated program.  It doesn't address the
> other issues with debugging optimized code, like bouncing around
> between program lines.  Is that correct?  What else does your proposal
> do?

I've been thinking about the bouncing problem quite a bit lately.
I have some rough ideas, but I won't draw out this thread by sharing
:-)

-- 
Daniel Jacobowitz
CodeSourcery


* Re: Designs for better debug info in GCC
  2007-11-07 21:43       ` Designs for better debug info in GCC Alexandre Oliva
@ 2007-11-07 23:05         ` Ian Lance Taylor
  2007-11-07 23:28           ` Daniel Jacobowitz
                             ` (2 more replies)
  0 siblings, 3 replies; 189+ messages in thread
From: Ian Lance Taylor @ 2007-11-07 23:05 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> > Your approach gives you a point solution--did anything change
> > today--but it doesn't give us a maintenance solution--did anything
> > change over time?
> 
> Actually, no, your assessment is incorrect.

Ah, you're right.  I was wrong.

> > While I understand that you were given certain requirements, for the
> > purposes of mainline gcc we need to weigh costs and benefits.  How
> > many of our users are looking for precise debugging of optimized code,
> > and how much are they willing to pay for that?  Will our users overall
> > be better served by the 90% solution?
> 
> Does it really matter?  Do we compromise standards compliance (and so
> violently, while at that) in any aspect of the compiler?

What standards are you talking about?  I'm not aware of any standard
for debuggability of optimized code.

At one time, gcc actually provided better debugging of optimized code
than any other compiler, though I don't know if that is still true.
Optimized gcc code is still debuggable today.  I do it all the time.
(For me poor support for debugging C++ is a much bigger issue, though
I think that is an issue more with gdb than with gcc.)

gcc's users are definitely calling for a faster compiler.  Are they
calling for better debuggability of optimized code?

> >> 1. every single gimple assignment grows by one word,
> 
> I take this back, I'd been misled by richi's description.  It's really
> a side hashtable (which gets me worried about the re-emitted rather
> than modified gimple assignments in some locations), so it doesn't
> waste memory for gimple assignments that don't refer to user
> variables.
> 
> Unfortunately, this is not the case for rtx SETs, in this alternate
> approach.

Obviously the memory requirements of both approaches will need to be
measured.

> > We've fixed many many bugs and misoptimizations over the years due to
> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> > we've made in the past.
> 
> That's a valid concern.  However, per this reasoning, we might as well
> push every operand in our IL to separate representations, because
> there have been so many bugs and misoptimizations over the years,
> especially when the representation didn't make transformations
> trivially correct.

Please don't use strawman arguments.

As I understand your proposal, it materializes variables which were
otherwise omitted from the generated program.  It doesn't address the
other issues with debugging optimized code, like bouncing around
between program lines.  Is that correct?  What else does your proposal
do?

Ian


* Re: Designs for better debug info in GCC
  2007-11-07 16:37     ` Ian Lance Taylor
@ 2007-11-07 21:43       ` Alexandre Oliva
  2007-11-07 23:05         ` Ian Lance Taylor
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-07 21:43 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:
>> I've pondered both alternatives, and decided that the latter was the
>> only testable path.  If we had a reliable debug information tester, we
>> could proceed incrementally with the first alternative; it might be
>> viable, but I don't really see that it would make things any simpler.

> It seems to me that this is a reason to write a reliable debug
> information tester.

Yep.  This is in the roadmap.  But it's not something that can be done
with GCC alone.  It's more of a "system" test, that will involve
debuggers or monitoring tools.  gdb, frysk, systemtap or some such
come to mind.

> Your approach gives you a point solution--did anything change
> today--but it doesn't give us a maintenance solution--did anything
> change over time?

Actually, no, your assessment is incorrect.  What I'm providing gives
us means to test, at any point in time, that enabling debug
information won't cause changes to the generated code.  So far, code
in the trunk only performs these comparisons within the GCC directory.
And, nevertheless, patches that correct obvious divergences have been
lingering for months.

I have recently-posted patches that introduce means to test other host
and target libraries.  I still haven't written testsuite code to
enable us to verify that debug information doesn't affect the
generated code for existing tests, or for additional tests introduced
for this very purpose, but this is in the roadmap.

Of course, none of this guarantees that debug information is accurate
or complete, it just helps ensure that -g won't change code
generation.

Testing more than this requires a tool that can interpret not only
debug information but also the generated code, and verify that they
match.  The plan is to use the actual processors (or simulators) to
understand the generated code, and existing debug info consumers that
are debugging or monitoring tools to verify that debug info reflects
the behavior observed by the processor.

> While I understand that you were given certain requirements, for the
> purposes of mainline gcc we need to weigh costs and benefits.  How
> many of our users are looking for precise debugging of optimized code,
> and how much are they willing to pay for that?  Will our users overall
> be better served by the 90% solution?

Does it really matter?  Do we compromise standards compliance (and so
violently, while at that) in any aspect of the compiler?

What do we tell the growing number of users who don't regard debug
information as something useless except for occasional debugging?
That GCC cares about standards compliance except for debug information,
and they should write their own Free Software compiler if they want a
correct, standards-compliant compiler?

Do we accept taking shortcuts for optimizations or other code
generation issues when they cause incorrect code to be produced?  Why
should the mantra "must not sacrifice correctness" not be applicable to
debug information standards in GCC?

At this point, debug information is so bad that it's a shame that most
builds are done with -O2 -g: we're just wasting CPU cycles and disk
space, contributing to accelerate the thermodynamic end of the
universe (never mind the Kyoto protocol ;-), for information that is
severely incomplete at best, and terribly broken at worst.

Yes, generating correct code may take some more memory and some more
CPU cycles.  Have we ever made a decision to use less memory or CPU
cycles when the result is incorrect code?  Why should standardized
meta-information about the generated code be any different?

>> 1. every single gimple assignment grows by one word,

I take this back, I'd been misled by richi's description.  It's really
a side hashtable (which gets me worried about the re-emitted rather
than modified gimple assignments in some locations), so it doesn't
waste memory for gimple assignments that don't refer to user
variables.

Unfortunately, this is not the case for rtx SETs, in this alternate
approach.

> I don't know what the best approach is for improving debug
> information.

Your phrasing seems to indicate you're not concerned about fixing
debug information, but rather only about making it less broken.  With
different goals, we can come to very different solutions.

> But I think we've learned over time that explicit NOTEs
> in the RTL was not, in general, a good idea.  They complicate
> optimizations and they tend to get left behind when moving code.

Being left behind is actually a feature.  It's one of the reasons why
I chose this representation.  The debug annotation is not supposed to
move along with the SET: if it did, it would no longer model the
source code, but would instead be mangled, often beyond recognition,
by implementation details.

As for complicating optimizations, I can have some sympathy for that.
Sure, generating code without preserving the information needed to map
source-level concepts to implementation-level concepts is easier.  But
generating broken code is not an option, it's a bug, so why should it
be an acceptable option just because the code we're talking about is
meta-information about the executable code?

> We've fixed many many bugs and misoptimizations over the years due to
> NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> we've made in the past.

That's a valid concern.  However, per this reasoning, we might as well
push every operand in our IL to separate representations, because
there have been so many bugs and misoptimizations over the years,
especially when the representation didn't make transformations
trivially correct.

However, the beauty of the representation I've chosen, which models
the annotations as a weak USE of an expression that evaluates to the
value of the variable at the point of assignment, is that most
compiler passes *will* keep them accurate, where any other
representation would have to be dealt with explicitly.  Sure, some
passes need to compensate to make sure these weak USEs don't affect
codegen or optimizations, and a few need special tweaks to keep notes
accurate, so that the safeguards in place don't have to discard
information that would otherwise have become inaccurate.  But these
are few.  I believe strongly that this is the correct trade-off.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-07 17:25     ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Michael Matz
@ 2007-11-07 19:03       ` Alexandre Oliva
  2007-11-08 11:22         ` Michael Matz
  0 siblings, 1 reply; 189+ messages in thread
From: Alexandre Oliva @ 2007-11-07 19:03 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Michael Matz <matz@suse.de> wrote:

> On Wed, 7 Nov 2007, Alexandre Oliva wrote:

>> This will fail on a very fundamental level.  Consider code such as:
>> 
>> f(int x, int y) { int c; /* other vars */
>>  c = x; do_something_with(c, ...); // doesn't touch x or y
>>  c = y; do_something_else_with(c, ...); // doesn't touch x or y

>> This can (and should) be trivially optimized to:
>> 
>> f(int x, int y) { /* other vars */
>>  do_something_with(x, ...); // doesn't touch x or y
>>  do_something_else_with(y, ...); // doesn't touch x or y
>> 
>> But now, if I 'print c' in a debugger in the middle of one of the
>> do_something_*with expansions, what do I get?
>> 
>> With the approach I'm implementing, you should get x and y at the
>> appropriate points, even though variable c doesn't really exist any
>> more.
>> 
>> With your approach, what will you get?

> x and y at the appropriate part.  Whatever holds 'x' at a point (SSA name,
> pseudo or mem) will also mention that it holds 'c'.  At a later point
> whichever holds 'y' will also mention it holds 'c'.

I.e., there will be two parallel locations throughout the entire
function that hold the value of 'c'.  Something like:

f(int x /* but also c */, int y /* but also c */) { /* other vars */
 do_something_with(x, ...); // doesn't touch x or y
 do_something_else_with(y, ...); // doesn't touch x or y

Now, what will you get if you 'print c' in the debugger (or if any
other debug info evaluator needs to tell what the value of user
variable c is) at a point within do_something_with(c,...) or
do_something_else_with(c)?


Now consider that f is inlined into the following code:

int g(point2d p) {
  /* lots of code */
  f(p.x, p.y);
  /* more code */
  f(p.y, p.x);
  /* even more code */
}

g gets fully scalarized, so, before inlining, we have:

int g(point2d p) {
  int p$x = p.x, p$y = p.y;
  /* lots of code */
  f(p$x, p$y);
  /* more code */
  f(p$y, p$x);
  /* even more code */
}

after inlining of f, we end up with:

int g(point2d p) {
  int p$x = p.x, p$y = p.y;
  /* lots of code */
  { int f()::x.1 /* but also f()::c.1 */ = p$x, f()::y.1 /* but also f()::c.1 */ = p$y;
    { /* other vars */
      do_something_with(f()::x.1, ...); // doesn't touch x or y
      do_something_else_with(f()::y.1, ...); // doesn't touch x or y
  } }
  /* more code */
  { int f()::x.2 /* but also f()::c.2 */ = p$x, f()::y.2 /* but also f()::c.2 */ = p$y;
    { /* other vars */
      do_something_with(f()::x.2, ...); // doesn't touch x or y
      do_something_else_with(f()::y.2, ...); // doesn't touch x or y
  } }
  /* even more code */
}

then, we further optimize g and get:

int g(point2d p) {
  int p$x /* but also f()::x.1, f()::c.1, f()::y.2, f()::c.2 */ = p.x;
  int p$y /* but also f()::y.1, f()::c.1, f()::x.2, f()::c.2 */ = p.y;
  /* lots of code */
  { { /* other vars */
      do_something_with(p$x, ...); // doesn't touch x or y
      do_something_else_with(p$y, ...); // doesn't touch x or y
  } }
  /* more code */
  { { /* other vars */
      do_something_with(p$y, ...); // doesn't touch x or y
      do_something_else_with(p$x, ...); // doesn't touch x or y
  } }
  /* even more code */
}

and now, if you try to resolve the variable name 'c' to a location or
a value within any of the occurrences of do_something_*with(), what do
you get?  What ranges do you generate for each of the variables
involved?
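
The question about ranges can be made concrete with a toy model, shown
here only as a sketch.  Everything in it is an assumption made for
illustration: the struct, the function name unique_holder, and the
addresses are hypothetical, and the second table models the
side-annotation scheme as it is characterized in this message, not
necessarily as it would be implemented.  It uses the simple,
non-inlined f from the start of the example.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a simplified, hypothetical location list: within the
   half-open code range [start, end), user variable `var` can be read
   from the place named `holder`.  */
struct loc_range {
    const char *var;
    unsigned start, end;
    const char *holder;
};

/* Return the holder of `var` at address `pc` if exactly one entry
   matches; return NULL if there is no match or more than one.  */
const char *unique_holder(const struct loc_range *list, size_t n,
                          const char *var, unsigned pc)
{
    const char *found = NULL;
    size_t matches = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].var, var) == 0
            && list[i].start <= pc && pc < list[i].end) {
            found = list[i].holder;
            matches++;
        }
    }
    return matches == 1 ? found : NULL;
}

/* Made-up addresses: do_something_with() occupies [0, 10) and
   do_something_else_with() occupies [10, 20).  */

/* Bindings recorded at the points of assignment ("c = x", "c = y"):
   each range names a single holder for c.  */
const struct loc_range at_assignment[] = {
    { "c",  0, 10, "x" },
    { "c", 10, 20, "y" },
};

/* Holders annotated with every variable they stand for: both x and y
   are "also c" for the whole function, so the ranges overlap.  */
const struct loc_range on_holder[] = {
    { "c", 0, 20, "x" },
    { "c", 0, 20, "y" },
};
```

With the first table, looking up c at an address inside
do_something_with() yields x, and inside do_something_else_with() it
yields y.  With the second table the same lookups find two candidates
and cannot choose between them unless further information says where
each association begins and ends.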

>> There isn't any assignment to x or y you could hook your notes to.

> But there are _places_ for x and y.  Those places can and are also 
> associated with c.

This just goes to show that there's a fundamental mistake in the
mapping.  Instead of mapping user-level concepts to implementation
concepts, which is what debug information is meant to do, you're
mapping implementation details to user-level concepts.

Unfortunately, this mapping is not biunivocal.  The chosen
representation is fundamentally lossy.  It can't possibly get you
accurate debug information.  And the above is just an initial example
of the loss of information that will lead to *incorrect* debug
information, which is far worse than *incomplete* information.

>> Even if you were to set up side representations to model the additional 
>> variables that end up mapped to the incoming arguments, you'd have 'c' 
>> in both, and at the entry point.  How would you tell?

> I don't understand the question.

See the discussion about resolving 'c' above.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


end of thread, other threads:[~2008-01-01 17:31 UTC | newest]

Thread overview: 189+ messages
     [not found] <or4pg114h5.fsf@oliva.athome.lsd.ic.unicamp.br.suse.lists.egcs>
     [not found] ` <orsl1xq4p3.fsf@oliva.athome.lsd.ic.unicamp.br.suse.lists.egcs>
     [not found]   ` <m3ir2t2u57.fsf@localhost.localdomain.suse.lists.egcs>
     [not found]     ` <m3tzmd1209.fsf@localhost.localdomain.suse.lists.egcs>
     [not found]       ` <m34pec1x4k.fsf@localhost.localdomain.suse.lists.egcs>
     [not found]         ` <orwsr8eyz5.fsf@oliva.athome.lsd.ic.unicamp.br.suse.lists.egcs>
     [not found]           ` <m3wsr76hov.fsf@localhost.localdomain.suse.lists.egcs>
     [not found]             ` <or8x3ng5ie.fsf@oliva.athome.lsd.ic.unicamp.br.suse.lists.egcs>
     [not found]               ` <m3r6hf4mw1.fsf@localhost.localdomain.suse.lists.egcs>
     [not found]                 ` <de8d50360712211607h77a0add5h794f6b5781b6491b@mail.gmail.com.suse.lists.egcs>
     [not found]                   ` <de8d50360712211609y643b8affpeb91048dedecbe60@mail.gmail.com.suse.lists.egcs>
     [not found]                     ` <7C283DB3-9716-4B2C-9721-D1F503B91CC4@apple.com.suse.lists.egcs>
     [not found]                       ` <m37ij64mwt.fsf@localhost.localdomain.suse.lists.egcs>
2007-12-23  0:52                         ` Designs for better debug info in GCC Andi Kleen
2007-12-23  1:32                           ` Daniel Jacobowitz
2007-12-23  1:36                             ` Andi Kleen
2007-12-23  5:55                               ` Daniel Jacobowitz
2007-11-26 18:36 J.C. Pizarro
2007-11-26 18:55 ` J.C. Pizarro
  -- strict thread matches above, loose matches on Subject: below --
2007-11-13  7:52 Steven Bosscher
2007-11-23 23:40 ` Alexandre Oliva
2007-11-24 10:27   ` Steven Bosscher
2007-11-24 15:08     ` Alexandre Oliva
2007-11-24 15:18       ` Richard Kenner
2007-11-24 20:21         ` Alexandre Oliva
2007-11-24 20:48           ` Bernd Schmidt
2007-11-24 22:01             ` Alexandre Oliva
2007-11-24 22:34               ` Richard Guenther
2007-11-25  1:21                 ` Alexandre Oliva
2007-11-25  2:36                   ` Richard Guenther
2007-11-26 11:37                     ` Alexandre Oliva
2007-11-26 12:38                       ` Richard Guenther
2007-11-26 18:10                         ` Alexandre Oliva
2007-11-25  0:20               ` Alexandre Oliva
2007-11-24 21:24           ` Richard Kenner
2007-11-24 21:55             ` Alexandre Oliva
2007-11-25  0:39           ` Robert Dewar
2007-12-15 20:32             ` Alexandre Oliva
2007-12-15 21:41               ` Robert Dewar
2007-11-24 16:07       ` Steven Bosscher
2007-11-24 20:11         ` Alexandre Oliva
2007-11-24 20:46           ` Richard Guenther
     [not found] <or4pg114h5.fsf@oliva.athome.lsd.ic.unicamp.br>
     [not found] ` <84fc9c000711050327x74845c78ya18a3329fcf9e4d2@mail.gmail.com>
2007-11-07  8:03   ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Alexandre Oliva
2007-11-07 16:37     ` Ian Lance Taylor
2007-11-07 21:43       ` Designs for better debug info in GCC Alexandre Oliva
2007-11-07 23:05         ` Ian Lance Taylor
2007-11-07 23:28           ` Daniel Jacobowitz
2007-11-08  0:01           ` Mark Mitchell
2007-11-08  0:28             ` David Edelsohn
2007-11-08  5:01               ` Mark Mitchell
2007-11-08  5:15                 ` Alexandre Oliva
2007-11-08 18:44                   ` Alexandre Oliva
2007-11-23  2:20                 ` Frank Ch. Eigler
2007-11-23  2:30                   ` Richard Guenther
2007-11-23 23:40                     ` Frank Ch. Eigler
2007-11-23 23:56                       ` Alexandre Oliva
2007-11-24 13:52                     ` Robert Dewar
2007-11-08  5:44               ` Alexandre Oliva
2007-11-08 18:37                 ` Alexandre Oliva
2007-11-08 20:51                 ` Andrew Pinski
2007-11-09  1:11                   ` Alexandre Oliva
2007-11-09 11:28                   ` Robert Dewar
2007-11-08  6:39             ` Alexandre Oliva
2007-11-08 19:13               ` Alexandre Oliva
2007-11-08 20:07               ` Mark Mitchell
2007-11-08 20:14                 ` David Daney
2007-11-08 20:41                   ` Mark Mitchell
2007-11-09  9:10                 ` Alexandre Oliva
2007-11-12 10:55                   ` Mark Mitchell
2007-11-12 18:22                     ` Alexandre Oliva
2007-11-12 20:08                       ` Joe Buck
2007-11-24 22:12                         ` Alexandre Oliva
2007-11-24 22:42                           ` Richard Kenner
2007-11-12 22:43                       ` Ian Lance Taylor
2007-11-24  1:44                         ` Alexandre Oliva
2007-11-13 10:50                       ` Mark Mitchell
2007-11-24  4:05                         ` Alexandre Oliva
2007-11-13 15:46                       ` Michael Matz
2007-11-23 23:56                         ` Alexandre Oliva
2007-11-26 18:19                           ` Michael Matz
2007-11-27  7:31                             ` Alexandre Oliva
2007-11-27 18:33                               ` Michael Matz
2007-11-27 20:37                                 ` Alexandre Oliva
2007-11-08 10:11             ` Richard Guenther
2007-11-08  5:14           ` Alexandre Oliva
2007-11-08 18:28             ` Alexandre Oliva
2007-11-08 19:23             ` Ian Lance Taylor
2007-11-09  0:08               ` Alexandre Oliva
2007-11-09  1:26                 ` Ian Lance Taylor
2007-11-09 12:22                   ` Robert Dewar
2007-11-12 12:59                     ` Mark Mitchell
2007-11-12 18:05                       ` Alexandre Oliva
2007-11-12 18:09                         ` Mark Mitchell
2007-11-24  4:31                           ` Alexandre Oliva
2007-11-26  6:10                             ` Mark Mitchell
2007-12-05 14:21                               ` Diego Novillo
2007-12-05 22:10                                 ` Joe Buck
2007-12-15 22:51                                 ` Alexandre Oliva
2007-12-16  6:27                                   ` Daniel Berlin
2007-12-16 12:47                                     ` Alexandre Oliva
2007-12-17  1:27                                       ` Daniel Berlin
2007-12-17  5:38                                         ` Joe Buck
2007-12-17  8:20                                           ` Geert Bosch
2007-12-18  1:24                                             ` Alexandre Oliva
2007-12-18  2:02                                               ` Joe Buck
2007-12-18  4:40                                                 ` Alexandre Oliva
2007-12-18  7:45                                                   ` Robert Dewar
2007-12-18  7:56                                                     ` Alexandre Oliva
2007-12-18 13:29                                                       ` Robert Dewar
2007-12-18 22:15                                                         ` Alexandre Oliva
2007-12-18  6:16                                               ` Robert Dewar
2007-12-18  8:09                                                 ` Alexandre Oliva
2007-12-17 18:33                                           ` Alexandre Oliva
2007-12-17 17:59                                         ` Alexandre Oliva
2007-12-17 18:02                                           ` Diego Novillo
2007-12-17 20:43                                             ` Alexandre Oliva
2007-12-17 21:20                                               ` Diego Novillo
2007-12-18  1:01                                                 ` Alexandre Oliva
2007-12-18  1:14                                                   ` Diego Novillo
2007-12-18  5:17                                                     ` Alexandre Oliva
2007-12-18  8:06                                                       ` Kai Henningsen
2007-12-18  8:39                                                       ` Alexandre Oliva
2007-12-18 13:15                                                         ` Diego Novillo
2007-12-18 15:06                                                           ` Alexandre Oliva
2007-12-18 16:22                                                         ` Ian Lance Taylor
2007-12-18 16:28                                                           ` Robert Dewar
2007-12-18 16:31                                                             ` Andrew Haley
2007-12-18 16:42                                                               ` Robert Dewar
2007-12-18 17:04                                                                 ` Andrew Haley
2007-12-18 17:12                                                               ` Richard Kenner
2007-12-18 16:32                                                             ` Daniel Jacobowitz
2007-12-18 16:44                                                               ` Robert Dewar
2007-12-19  4:30                                                           ` Alexandre Oliva
2007-12-19 18:41                                                             ` Ian Lance Taylor
2007-12-19 19:00                                                               ` Daniel Jacobowitz
2007-12-19 19:53                                                               ` Janis Johnson
2007-12-19 21:17                                                                 ` Ian Lance Taylor
2007-12-20  6:10                                                                   ` Alexandre Oliva
2007-12-20 16:52                                                                     ` Ian Lance Taylor
2007-12-20 21:38                                                                       ` Alexandre Oliva
2007-12-21  1:54                                                                         ` Ian Lance Taylor
     [not found]                                                                           ` <orprx0izhp.fsf@oliva.athome.lsd.ic.unicamp.br>
2007-12-21  2:11                                                                           ` Alexandre Oliva
2007-12-21  3:16                                                                             ` Robert Dewar
2007-12-21  5:10                                                                             ` Ian Lance Taylor
2007-12-21 18:12                                                                               ` Alexandre Oliva
2007-12-21 19:32                                                                                 ` Ian Lance Taylor
2007-12-21 22:46                                                                                   ` Alexandre Oliva
2007-12-22  0:07                                                                                     ` Ian Lance Taylor
2007-12-22  0:09                                                                                       ` Andrew Pinski
2007-12-22  3:16                                                                                         ` Andrew Pinski
2007-12-22 11:44                                                                                           ` Chris Lattner
2007-12-22 21:27                                                                                             ` Ian Lance Taylor
2007-12-23 17:40                                                                                       ` Frank Ch. Eigler
2007-12-22  7:38                                                                                     ` Robert Dewar
2007-12-22 13:33                                                                                     ` Andrew Haley
2007-12-22 17:11                                                                                       ` Robert Dewar
2007-12-31 19:39                                                                           ` Alexandre Oliva
2007-12-20  8:00                                                               ` Alexandre Oliva
2007-12-20  8:01                                                                 ` Alexandre Oliva
2007-12-20 17:02                                                                 ` Ian Lance Taylor
2007-12-31 16:55                                                           ` Richard Guenther
     [not found]                                                             ` <y0my7baigdf.fsf@ton.toronto.redhat.com>
2008-01-01 17:31                                                               ` Richard Guenther
2007-12-18 23:19                                                         ` Daniel Berlin
2007-12-19  6:07                                                           ` Alexandre Oliva
2007-12-19  6:18                                                             ` Daniel Berlin
2007-12-19 16:01                                                               ` Daniel Berlin
2007-12-19 16:29                                                                 ` Andrew MacLeod
2007-12-19 19:25                                                                   ` Daniel Berlin
2007-12-19 20:00                                                                 ` Andrew MacLeod
2007-12-19 20:40                                                                   ` Daniel Berlin
2007-12-19 20:00                                                                 ` Alexandre Oliva
2007-12-19 21:11                                                                   ` Daniel Berlin
2007-12-20  5:16                                                                     ` Alexandre Oliva
2007-12-20 16:44                                                                       ` Ian Lance Taylor
2007-12-20 20:42                                                                         ` Alexandre Oliva
2007-12-19 20:03                                                               ` Alexandre Oliva
2007-12-18 23:31                                                         ` Daniel Berlin
2007-12-19  4:35                                                           ` Alexandre Oliva
2007-12-19 16:12                                                             ` Daniel Berlin
2007-12-19 19:13                                                               ` Alexandre Oliva
2007-12-19 20:11                                                                 ` Daniel Jacobowitz
2007-12-31 14:45                                               ` Richard Guenther
2007-12-16 22:20                                   ` Mark Mitchell
2007-11-09 12:31                   ` Seongbae Park (박성배, 朴成培)
2007-11-09 12:42                     ` Robert Dewar
2007-11-07 17:25     ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Michael Matz
2007-11-07 19:03       ` Designs for better debug info in GCC Alexandre Oliva
2007-11-08 11:22         ` Michael Matz
2007-11-08 15:13           ` Robert Dewar
2007-11-08 16:11             ` H.J. Lu
2007-11-08 16:32             ` Michael Matz
2007-11-08 18:18               ` Alexandre Oliva
2007-11-09 14:23                 ` Michael Matz
2007-11-12 18:17                   ` Alexandre Oliva
2007-11-13 14:22                     ` Michael Matz
2007-11-24  4:58                       ` Alexandre Oliva
2007-11-26 18:10                         ` Michael Matz
2007-11-27  3:48                           ` Alexandre Oliva
2007-11-08 17:48             ` Alexandre Oliva
2007-11-09  2:09               ` Robert Dewar
2007-11-12 17:52                 ` Alexandre Oliva
2007-11-09  2:13               ` Joe Buck
2007-11-09 18:40                 ` Daniel Jacobowitz
2007-11-09 19:02                   ` Robert Dewar
2007-11-08 16:37           ` Alexandre Oliva
