[vta] don't let debug insns get in the way of simple vect reduction


* [vta] don't let debug insns get in the way of simple vect reduction
@ 2007-11-05  8:28 Alexandre Oliva
  2007-11-05 11:27 ` Richard Guenther
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-05  8:28 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 184 bytes --]

libgfortran had some vectorization cases that wouldn't be applied in
the presence of debug stmts referencing the same variables.  Fixed
with the patch below, to be installed shortly.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: vta-vectorizer-reduction.patch --]
[-- Type: text/x-patch, Size: 1097 bytes --]

for  gcc/ChangeLog.vta
from  Alexandre Oliva  <aoliva@redhat.com>

	* tree-vectorizer.c (vect_is_simple_reduction): Disregard uses
	in debug insns.

Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c.orig	2007-09-17 15:31:48.000000000 -0300
+++ gcc/tree-vectorizer.c	2007-11-03 01:44:55.000000000 -0200
@@ -2199,6 +2199,8 @@ vect_is_simple_reduction (loop_vec_info 
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
     {
       tree use_stmt = USE_STMT (use_p);
+      if (IS_DEBUG_STMT (use_stmt))
+	continue;
       if (flow_bb_inside_loop_p (loop, bb_for_stmt (use_stmt))
 	  && vinfo_for_stmt (use_stmt)
 	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
@@ -2241,6 +2243,8 @@ vect_is_simple_reduction (loop_vec_info 
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
     {
       tree use_stmt = USE_STMT (use_p);
+      if (IS_DEBUG_STMT (use_stmt))
+	continue;
       if (flow_bb_inside_loop_p (loop, bb_for_stmt (use_stmt))
 	  && vinfo_for_stmt (use_stmt)
 	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))

[-- Attachment #3: Type: text/plain, Size: 249 bytes --]


-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: [vta] don't let debug insns get in the way of simple vect reduction
  2007-11-05  8:28 [vta] don't let debug insns get in the way of simple vect reduction Alexandre Oliva
@ 2007-11-05 11:27 ` Richard Guenther
  2007-11-07  7:52   ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Richard Guenther @ 2007-11-05 11:27 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: gcc-patches

On 11/5/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> libgfortran had some vectorization cases that wouldn't be applied in
> the presence of debug stmts referencing the same variables.  Fixed
> with the patch below, to be installed shortly.

(I'm just picking a random patch of this kind for this mail)

I see you have to touch lots of places to teach them about debug insns.  This
hints at a design error (as I believe), namely that you encode the debug
information in the IL.  I believe in the long long thread earlier this year
people suggested to use an on-the-side representation for the extra
information.  Unfortunately nobody has (apparently) looked at the vta code yet
and nobody made comments so far.

With the different approach I and Matz started (and to which we didn't yet spend
enough time to get debug information actually output - but I hope we'll get
there soon), on the tree level the extra information is stored in a bitmap per
SSA_NAME (where necessary).  On the RTL level we chose to extend the SET insn
by a bitmap argument (referring to those bitmaps).  With that approach we only
need to touch places where debug information is lost (at those places we just
propagate this information to the bitmaps).

I realize that the GCC development model does not really support development
in the open or steering of technical approaches.  Probably due to lack of time
and interest.  Still I'd ask people to actually look at both approaches and give
advice to us implementors.  (And IMHO, debug insns in the IL are the wrong way
to go and I would be very unhappy seeing this code get in GCC - no personal
offense intended)

Thanks,
Richard.


* Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction)
  2007-11-05 11:27 ` Richard Guenther
@ 2007-11-07  7:52   ` Alexandre Oliva
  2007-11-07 16:16     ` Ian Lance Taylor
  2007-11-07 17:20     ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Michael Matz
  0 siblings, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-07  7:52 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, gcc

On Nov  5, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:

> On 11/5/07, Alexandre Oliva <aoliva@redhat.com> wrote:
>> libgfortran had some vectorization cases that wouldn't be applied in
>> the presence of debug stmts referencing the same variables.  Fixed
>> with the patch below, to be installed shortly.

> (I'm just picking a random patch of this kind for this mail)

> I see you have to touch lots of places to teach them about debug
> insns.

Yes.  There's no escaping that.  There are two options:

- keep them separate, and modify the code that manipulates the IL so
  as to update them as needed, or

- keep them in the IL, and modify the code to disregard them as
  needed.

I've pondered both alternatives, and decided that the latter was the
only testable path.  If we had a reliable debug information tester, we
could proceed incrementally with the first alternative; it might be
viable, but I don't really see that it would make things any simpler.
If anything, you'd need to introduce a lot of new code to manipulate
the separate representation, unless this separate representation was
very similar in structure to the existing representation, and in any
case you'd have to add code all over the place to keep it up to date.

With the approach I've taken, there's something that's testable: as
long as there are codegen changes, something needs to be fixed.
Besides, the information is encoded in a form that is automatically
handled by most compilation passes, so updates for pretty much all
transformations are already in place, without any additional code.

The only additional code is what's needed to detect missing updates
and to ensure the debug notes don't interfere with code generation.
I've managed to implement these such that they don't take any
additional memory unless you actually request the additional debug
information, and such that they almost never bring any compile-time
performance hit.  That's one of the reasons that guided the placement
of DEBUG_INSN just next to the other INSNs: such that INSN_P is
optimized to a range test (as it was before, but now with a different
boundary), and INSN_P && !DEBUG_INSN_P is optimized to the original
range test.  In most other places, it's just yet another entry in a
switch table, so again it's zero-cost in terms of performance.  And at
points where it would be more costly, there's a test guarding the
complex processing to tell whether the feature is enabled that
requires that additional processing.  Hard to beat that.
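
The range-test point above can be illustrated with a small sketch.  The
enum below is hypothetical: its names only mirror GCC's RTX codes, and
the values and ordering are assumptions made for illustration.  It shows
how placing the debug code directly next to the real insn codes lets
both predicates reduce to a comparison against a contiguous range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of RTX codes; the names mirror GCC's, but the
   values and ordering here are assumptions for illustration only.  */
enum toy_code {
  TOY_NOTE,
  TOY_DEBUG_INSN,   /* placed directly before the real insns */
  TOY_INSN,
  TOY_JUMP_INSN,
  TOY_CALL_INSN,
  TOY_BARRIER
};

/* Any insn, debug or not: a single range test.  */
static bool toy_insn_p (enum toy_code c)
{
  return c >= TOY_DEBUG_INSN && c <= TOY_CALL_INSN;
}

static bool toy_debug_insn_p (enum toy_code c)
{
  return c == TOY_DEBUG_INSN;
}

/* An insn that is not a debug insn.  Written as a conjunction, but
   equivalent to the narrower range [TOY_INSN, TOY_CALL_INSN].  */
static bool toy_nondebug_insn_p (enum toy_code c)
{
  bool r = toy_insn_p (c) && !toy_debug_insn_p (c);
  assert (r == (c >= TOY_INSN && c <= TOY_CALL_INSN));
  return r;
}
```

Under these assumptions, toy_nondebug_insn_p (TOY_DEBUG_INSN) is false
while toy_insn_p (TOY_DEBUG_INSN) is true, and a compiler is free to
fold the conjunction into the narrower range test.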

> I believe in the long long thread earlier this year people suggested
> to use an on-the-side representation for the extra information.

Yes.  And I thought I'd already made it clear why this on-the-side
representation won't get you as far as I needed to go.  Basically, it
leads to a situation in which you can't possibly represent correct
debug information, or you end up adding annotations to the instruction
flow anyway, which means you have to deal with them or give up correct
debug information.

Since one of the requirements I was given was that debug information
be correct (as in, if I don't know where a variable is, debug
information must say so, rather than say the variable is somewhere it
really isn't), going without additional annotations just wouldn't
work.  Therefore, I figured I'd have to bite the bullet and take the
longer path, even though I don't dispute that it is possible to
achieve many improvements with the simpler approach.

However, eventually the simpler approach runs into a wall, and I
couldn't afford to get to that point and then backtrack to the
complete approach, because the wall couldn't be surpassed.

> With the different approach I and Matz started (and to which we
> didn't yet spend enough time to get debug information actually
> output - but I hope we'll get there soon), on the tree level the
> extra information is stored in a bitmap per SSA_NAME (where
> necessary).

This will fail on a very fundamental level.  Consider code such as:

f(int x, int y) {
  int c;
  /* other vars */

  c = x;
  do_something_with(c, ...); // doesn't touch x or y

  c = y;
  do_something_else_with(c, ...); // doesn't touch x or y
}

where do_something_*with are actually complex computations, be that
explicit code, be it macros or inlined functions.

This can (and should) be trivially optimized to:

f(int x, int y) {
  /* other vars */

  do_something_with(x, ...); // doesn't touch x or y

  do_something_else_with(y, ...); // doesn't touch x or y
}

But now, if I 'print c' in a debugger in the middle of one of the
do_something_*with expansions, what do I get?

With the approach I'm implementing, you should get x and y at the
appropriate points, even though variable c doesn't really exist any
more.
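
As a rough sketch of that behaviour (a toy model, not the VTA data
structures; every name below is hypothetical), the optimized f() can be
pictured as a stream in which an annotation stays behind at each
original assignment point and binds 'c' to the expression that holds its
value from there on.  A debugger-style lookup then takes the latest
binding that precedes the stop position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: debug annotations sit between real instructions
   and bind a user variable to an expression holding its value.  */
enum toy_kind { TOY_REAL_INSN, TOY_DEBUG_BIND };

struct toy_entry {
  enum toy_kind kind;
  const char *var;    /* TOY_DEBUG_BIND: user variable being bound */
  const char *value;  /* TOY_DEBUG_BIND: where its value lives */
  const char *text;   /* TOY_REAL_INSN: description only */
};

/* Return the location of VAR as seen when stopped at stream[pos]:
   the latest binding that precedes POS, or NULL if none is known.  */
static const char *
lookup_var (const struct toy_entry *stream, int pos, const char *var)
{
  const char *loc = NULL;
  assert (pos >= 0);
  for (int i = 0; i < pos; i++)
    if (stream[i].kind == TOY_DEBUG_BIND && strcmp (stream[i].var, var) == 0)
      loc = stream[i].value;
  return loc;
}

/* The optimized f() from the message: both assignments to c are gone,
   but an annotation remains at each original assignment point.  */
static const struct toy_entry optimized_f[] = {
  { TOY_DEBUG_BIND, "c", "x", NULL },
  { TOY_REAL_INSN, NULL, NULL, "do_something_with (x, ...)" },
  { TOY_DEBUG_BIND, "c", "y", NULL },
  { TOY_REAL_INSN, NULL, NULL, "do_something_else_with (y, ...)" },
};
```

In this model lookup_var (optimized_f, 1, "c") yields "x" and
lookup_var (optimized_f, 3, "c") yields "y", which is the answer
'print c' is expected to give at those two points.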

With your approach, what will you get?

There isn't any assignment to x or y you could hook your notes to.

Even if you were to set up side representations to model the
additional variables that end up mapped to the incoming arguments,
you'd have 'c' in both, and at the entry point.  How would you tell?

Sure, you could hand-wave that both assignments were effectively moved
to the entry point of the function, and that only the last one
prevails.  I guess this wouldn't be wrong per se.  But would it be the
best we could do for the users?

Say, if do_something_with is a loop, and you're monitoring some
condition that depends on c and other variables at a point in the
middle of an iteration, would you be happy if that didn't work because
the compiler told you 'c' evaluated to 'y' rather than 'x'?

Do you realize that the only way you could possibly make the above
work as expected by the user is by adding notes at the point of the
assignment?  And that, once you add such notes, they'd have to map
back to the value the variable was supposed to gain at that point, and
that thus you must keep them accurate as further optimizations mess
with that value.  E.g., when f is inlined into another function, its x
and y are certainly going to disappear, like they do now, because of
copy propagation, and then you'd have to update the notes for the
original assignments to c that you've already discarded.

And, well, if you're going to have to add and update notes anyway, why
not just bite the bullet and use them all over?

Maybe to save on memory?  I guess this could be a good reason for
that.  We could indeed add a bitmap to gimple assignments indicating
which user variables, if any, they modify.  And then, if we move them
out of place, we can drop the bitmap in the new location, and replace
the original location with a note.

But this has a number of problems:

1. every single gimple assignment grows by one word, to hold the
pointer to the bitmap.  But most gimple assignments are to temporary
variables, and these don't need annotations.  Creating different kinds
of gimple assignments would make things quite complex, so I'd rather
not go down that path.  So, you'd use a lot more memory, even when the
feature is not in use at all, and you might likely use more memory
than adding separate notes for user assignments like I do.  And this
doesn't even count the actual bitmaps.

2. this can't possibly handle assignments to parts of large variables.
My current implementation only tracks gimple regs, but there's no
reason why it can't be easily extended to handle component refs of
largish variables that end up in registers rather than memory, and for
other SRA-able variables, even when they aren't fully scalarized.
How'd you handle this with a look-aside bitmap?  I guess you could
generate uids or even decls for other expressions, but it seems to me
that this would waste even more space, and get things even more
complicated, no?

3. the marked assignment doesn't (can't possibly) denote the correct
point at which all variables in its bitmap were assigned to.  It only
marks the earliest such point for all the variables.  Overlapping
ranges and incorrect debug info are a consequence of this.

> On the RTL level we chose to extend the SET insn by a
> bitmap argument (referring to those bitmaps).

Same problems here.  The memory problem becomes even more critical:
even for compilation without debug info, you grow by 33% the most
common RTX element that, again, most often assigns to temporaries, and
this is without counting the space for the actual bitmaps.
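
One way to arrive at a figure like 33% is sketched below.  This is a toy
model and not GCC's actual rtx layout: it assumes a SET occupies one
header word plus one word per operand, so that going from two operands
to three grows it from three words to four.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not GCC's real rtx layout: one header word followed by
   one word per operand (an assumption for illustration).  */
static size_t toy_rtx_words (size_t n_operands)
{
  return 1 + n_operands;
}

/* Percentage growth, rounded down, when an rtx goes from OLD_OPS to
   NEW_OPS operands under the model above.  */
static unsigned growth_percent (size_t old_ops, size_t new_ops)
{
  assert (new_ops >= old_ops);
  size_t before = toy_rtx_words (old_ops);
  size_t after = toy_rtx_words (new_ops);
  return (unsigned) ((after - before) * 100 / before);
}
```

With these assumptions growth_percent (2, 3) evaluates to 33.  The real
cost depends on the host's word size and on the actual header layout.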

> I realize that the GCC development model does not really support
> development in the open or steering of technical approaches.
> Probably due to lack of time and interest.  Still I'd ask people to
> actually look at both approaches and give advice to us implementors.

+1

> (And IMHO, debug insns in the IL are the wrong way to go and I would
> be very unhappy seeing this code get in GCC - no personal offense
> intended)

No offense taken.  I hope I've shown why it can't be helped to add
debug annotations to the IL if we're to have any hope of getting
correct debug information throughout compiler transformations, and
that the decision I've made to accomplish this is one that stands a
chance of being reasonably validated automatically.

I'm somewhat sick of seeing debug information being treated as a
second-class citizen, just because it doesn't cause the main program
to crash or to produce incorrect results.

The more people use monitoring tools that are based on
standards-defined debug information, the more important it is that
such information be reliable and accurate.  Otherwise, even if the
program is compiled into code that runs perfectly, the systems built
using this program, its debug information and the monitoring tools
that use this information will fail just as severely as if we had
generated incorrect code for the program in the first place.

I believe that adding debug annotations to the instruction stream, as
if they were references to the appropriate values, but subject to the
condition that debug information must not alter the generated code, is
the most viable approach to get to reliable and accurate debug
information.

That said, I do realize it's a long path, and certainly much longer
than other approaches that could get you say 80% there, but no further
than that and, worse, without any way for the compiler to tell when it
hasn't got there so as to inform the user about it.

I just hope having something that gets us 80% there won't become an
impediment for a solution that gets us 95% there, while setting
foundations that will enable us to get to 100% over time.

Debug information correctness ought to be treated no different from
code generation correctness.

Which is not to say that debug information needs to be absolutely
perfect and complete.  Having an optimization pass that improves code
while keeping its behavior correct is a good thing, even if the
optimization could be further improved, just like having debug
information that reports the location of a variable as unknown at some
points, and accurately at all others, even if the debug info generator
could be further improved so as to find out where the variable is at
some of the points where it lost track of them.

But having the debug info generator report an incorrect location for a
variable is as bad as having the optimization pass change the meaning
of the program.

I believe that keeping debug annotations as part of the IL is the
simplest and most effective way to ensure that optimization passes do
deal with them, rather than disregarding them as something
unimportant.  Besides, the way I've designed them, most of the passes
deal with them in the very same fashion as they deal with all other
expressions; it's just that some of them need minor tweaks to avoid
codegen changes when debug info is being generated.

We could avoid the risk of codegen changes with -g by always
generating the annotations, even when not generating debug info, and
discarding them at the end, when they could no longer affect the
generated code.  But then, if there are codegen changes (most often
missed optimizations), they will be present both with -g and -g0, so
this is undesirable.  Nevertheless, it's an option I've considered
adding, and one that would be quite easy and helpful to introduce for
field tests of this new debug info infrastructure, such that users
aren't left out in the cold if their code works with -O2 -g but not
with -O2, or vice-versa.  I haven't added it yet, for I've had no need
for it, but it's a matter of minutes.  It shouldn't be default,
though, for it would use more memory than needed for -g0.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction)
  2007-11-07  7:52   ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Alexandre Oliva
@ 2007-11-07 16:16     ` Ian Lance Taylor
  2007-11-07 19:11       ` Designs for better debug info in GCC Alexandre Oliva
  2007-11-07 17:20     ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Michael Matz
  1 sibling, 1 reply; 150+ messages in thread
From: Ian Lance Taylor @ 2007-11-07 16:16 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> I've pondered both alternatives, and decided that the latter was the
> only testable path.  If we had a reliable debug information tester, we
> could proceed incrementally with the first alternative; it might be
> viable, but I don't really see that it would make things any simpler.

It seems to me that this is a reason to write a reliable debug
information tester.  Your approach gives you a point solution--did
anything change today--but it doesn't give us a maintenance
solution--did anything change over time?

> Since one of the requirements I was given was that debug information
> be correct (as in, if I don't know where a variable is, debug
> information must say so, rather than say the variable is somewhere it
> really isn't), going without additional annotations just wouldn't
> work.  Therefore, I figured I'd have to bite the bullet and take the
> longer path, even though I don't dispute that it is possible to
> achieve many improvements with the simpler approach.

While I understand that you were given certain requirements, for the
purposes of mainline gcc we need to weigh costs and benefits.  How
many of our users are looking for precise debugging of optimized code,
and how much are they willing to pay for that?  Will our users overall
be better served by the 90% solution?

> 1. every single gimple assignment grows by one word, to hold the
> pointer to the bitmap.  But most gimple assignments are to temporary
> variables, and these don't need annotations.  Creating different kinds
> of gimple assignments would make things quite complex, so I'd rather
> not go down that path.  So, you'd use a lot more memory, even when the
> feature is not in use at all, and you might likely use more memory
> than adding separate notes for user assignments like I do.  And this
> doesn't even count the actual bitmaps.

I expect that most compilations are with -g, so I think we need to
compare memory usage between the two approaches with -g.

I don't know what the best approach is for improving debug
information.  But I think we've learned over time that explicit NOTEs
in the RTL was not, in general, a good idea.  They complicate
optimizations and they tend to get left behind when moving code.
We've fixed many many bugs and misoptimizations over the years due to
NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
we've made in the past.

Ian


* Re: Designs for better debug info in GCC
  2007-11-07 16:16     ` Ian Lance Taylor
@ 2007-11-07 19:11       ` Alexandre Oliva
  2007-11-07 22:57         ` Ian Lance Taylor
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-07 19:11 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:
>> I've pondered both alternatives, and decided that the latter was the
>> only testable path.  If we had a reliable debug information tester, we
>> could proceed incrementally with the first alternative; it might be
>> viable, but I don't really see that it would make things any simpler.

> It seems to me that this is a reason to write a reliable debug
> information tester.

Yep.  This is in the roadmap.  But it's not something that can be done
with GCC alone.  It's more of a "system" test, that will involve
debuggers or monitoring tools.  gdb, frysk, systemtap or some such
come to mind.

> Your approach gives you a point solution--did anything change
> today--but it doesn't give us a maintenance solution--did anything
> change over time?

Actually, no, your assessment is incorrect.  What I'm providing gives
us means to test, at any point in time, that enabling debug
information won't cause changes to the generated code.  So far, code
in the trunk only performs these comparisons within the GCC directory.
And, nevertheless, patches that correct obvious divergences have been
lingering for months.

I have recently-posted patches that introduce means to test other host
and target libraries.  I still haven't written testsuite code to
enable us to verify that debug information doesn't affect the
generated code for existing tests, or for additional tests introduced
for this very purpose, but this is in the roadmap.

Of course, none of this guarantees that debug information is accurate
or complete, it just helps ensure that -g won't change code
generation.

Testing more than this requires a tool that can not only interpret
debug information, but also the generated code, and verify that they
match.  The plan is to use the actual processors (or simulators) to
understand the generated code, and existing debug info consumers that
are debugging or monitoring tools to verify that debug info reflects
the behavior observed by the processor.

> While I understand that you were given certain requirements, for the
> purposes of mainline gcc we need to weigh costs and benefits.  How
> many of our users are looking for precise debugging of optimized code,
> and how much are they willing to pay for that?  Will our users overall
> be better served by the 90% solution?

Does it really matter?  Do we compromise standards compliance (and so
violently, while at that) in any aspect of the compiler?

What do we tell the growing number of users who don't regard debug
information as something useless except for occasional debugging?
That GCC cares about standards compliance except for debug information,
and they should write their own Free Software compiler if they want a
correct, standards-compliant compiler?

Do we accept taking shortcuts for optimizations or other code
generation issues when they cause incorrect code to be produced?  Why
should the mantra "must not sacrifice correctness" not be applicable to
debug information standards in GCC?

At this point, debug information is so bad that it's a shame that most
builds are done with -O2 -g: we're just wasting CPU cycles and disk
space, contributing to accelerate the thermodynamic end of the
universe (never mind the Kyoto protocol ;-), for information that is
severely incomplete at best, and terribly broken at worst.

Yes, generating correct code may take some more memory and some more
CPU cycles.  Have we ever made a decision to use less memory or CPU
cycles when the result is incorrect code?  Why should standardized
meta-information about the generated code be any different?

>> 1. every single gimple assignment grows by one word,

I take this back, I'd been misled by richi's description.  It's really
a side hashtable (which gets me worried about the re-emitted rather
than modified gimple assignments in some locations), so it doesn't
waste memory for gimple assignments that don't refer to user
variables.

Unfortunately, this is not the case for rtx SETs, in this alternate
approach.

> I don't know what the best approach is for improving debug
> information.

Your phrasing seems to indicate you're not concerned about fixing
debug information, but rather only about making it less broken.  With
different goals, we can come to very different solutions.

> But I think we've learned over time that explicit NOTEs
> in the RTL was not, in general, a good idea.  They complicate
> optimizations and they tend to get left behind when moving code.

Being left behind is actually a feature.  It's one of the reasons why
I chose this representation.  The debug annotation is not supposed to
move along with the SET, because it would then no longer model the
source code, it would rather be mangled, often beyond recognition,
because of implementation details.

As for complicating optimizations, I can have some sympathy for that.
Sure, generating code without preserving the information needed to map
source-level concepts to implementation-level concepts is easier.  But
generating broken code is not an option, it's a bug, so why should it
be an acceptable option just because the code we're talking about is
meta-information about the executable code?

> We've fixed many many bugs and misoptimizations over the years due to
> NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> we've made in the past.

That's a valid concern.  However, per this reasoning, we might as well
push every operand in our IL to separate representations, because
there have been so many bugs and misoptimizations over the years,
especially when the representation didn't make transformations
trivially correct.

However, the beauty of the representation I've chosen, which models the
annotations as a weak USE of an expression that evaluates to the value
of the variable at the point of assignment, is that most compiler
passes *will* keep them accurate, where any other representation would have
to be dealt with explicitly.  Sure, some passes need to compensate to
make sure these weak USEs don't affect codegen or optimizations, and a
few need special tweaks to keep notes accurate, to stop the safeguards
in place that would discard the information that went inaccurate.  But
these are few.  I believe strongly that this is the correct trade-off.
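
The "weak USE" idea can be sketched as follows.  This is a hypothetical
model rather than GCC code: the struct and function names are invented
for illustration.  Uses that appear only in debug annotations are
skipped when a pass decides whether a definition is still needed, in
the same spirit as the IS_DEBUG_STMT checks in the patch at the top of
the thread, and once the definition goes away those annotations are
reset to "unknown" rather than left pointing at a stale value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical use record.  IS_DEBUG marks a use that appears only
   in a debug annotation.  */
struct toy_use {
  bool is_debug;
  bool known;   /* debug uses only: false once the value is lost */
};

/* Count the uses that may influence code generation.  Debug uses
   are weak: they are skipped here.  */
static size_t count_real_uses (const struct toy_use *uses, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (!uses[i].is_debug)
      count++;
  return count;
}

/* Try to delete a definition.  It is dead when no real use remains;
   any debug uses are then reset to "unknown" instead of keeping the
   definition alive.  Returns true if the definition was deleted.  */
static bool try_delete_def (struct toy_use *uses, size_t n)
{
  assert (uses != NULL || n == 0);
  if (count_real_uses (uses, n) != 0)
    return false;
  for (size_t i = 0; i < n; i++)
    uses[i].known = false;
  return true;
}
```

In this model a definition with only debug uses is deleted exactly as
it would be without them, so the annotations cannot change the
generated code; the price is that the variable's location becomes
unknown at those points unless a pass takes extra care to preserve it.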

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-07 19:11       ` Designs for better debug info in GCC Alexandre Oliva
@ 2007-11-07 22:57         ` Ian Lance Taylor
  2007-11-07 23:05           ` Daniel Jacobowitz
                             ` (3 more replies)
  0 siblings, 4 replies; 150+ messages in thread
From: Ian Lance Taylor @ 2007-11-07 22:57 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> > Your approach gives you a point solution--did anything change
> > today--but it doesn't give us a maintenance solution--did anything
> > change over time?
> 
> Actually, no, your assessment is incorrect.

Ah, you're right.  I was wrong.

> > While I understand that you were given certain requirements, for the
> > purposes of mainline gcc we need to weigh costs and benefits.  How
> > many of our users are looking for precise debugging of optimized code,
> > and how much are they willing to pay for that?  Will our users overall
> > be better served by the 90% solution?
> 
> Does it really matter?  Do we compromise standards compliance (and so
> violently, while at that) in any aspect of the compiler?

What standards are you talking about?  I'm not aware of any standard
for debuggability of optimized code.

At one time, gcc actually provided better debugging of optimized code
than any other compiler, though I don't know if that is still true.
Optimized gcc code is still debuggable today.  I do it all the time.
(For me poor support for debugging C++ is a much bigger issue, though
I think that is an issue more with gdb than with gcc.)

gcc's users are definitely calling for a faster compiler.  Are they
calling for better debuggability of optimized code?

> >> 1. every single gimple assignment grows by one word,
> 
> I take this back, I'd been misled by richi's description.  It's really
> a side hashtable (which gets me worried about the re-emitted rather
> than modified gimple assignments in some locations), so it doesn't
> waste memory for gimple assignments that don't refer to user
> variables.
> 
> Unfortunately, this is not the case for rtx SETs, in this alternate
> approach.

Obviously the memory requirements of both approaches will need to be
measured.

> > We've fixed many many bugs and misoptimizations over the years due to
> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> > we've made in the past.
> 
> That's a valid concern.  However, per this reasoning, we might as well
> push every operand in our IL to separate representations, because
> there have been so many bugs and misoptimizations over the years,
> especially when the representation didn't make transformations
> trivially correct.

Please don't use strawman arguments.

As I understand your proposal, it materializes variables which were
otherwise omitted from the generated program.  It doesn't address the
other issues with debugging optimized code, like bouncing around
between program lines.  Is that correct?  What else does your proposal
do?

Ian


* Re: Designs for better debug info in GCC
  2007-11-07 22:57         ` Ian Lance Taylor
@ 2007-11-07 23:05           ` Daniel Jacobowitz
  2007-11-08  0:00           ` Mark Mitchell
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 150+ messages in thread
From: Daniel Jacobowitz @ 2007-11-07 23:05 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

On Wed, Nov 07, 2007 at 02:56:24PM -0800, Ian Lance Taylor wrote:
> At one time, gcc actually provided better debugging of optimized code
> than any other compiler, though I don't know if that is still true.
> Optimized gcc code is still debuggable today.  I do it all the time.
> (For me poor support for debugging C++ is a much bigger issue, though
> I think that is an issue more with gdb than with gcc.)

We're working on both of these on the GDB side.

> gcc's users are definitely calling for a faster compiler.  Are they
> calling for better debuggability of optimized code?

In my experience, yes.  CodeSourcery has work currently being
contributed to GDB that makes this quite a lot better; we also
occasionally have customers ask us about further improvements.  And I
file bugs about this from time to time, most of which are still open.

> As I understand your proposal, it materializes variables which were
> otherwise omitted from the generated program.  It doesn't address the
> other issues with debugging optimized code, like bouncing around
> between program lines.  Is that correct?  What else does your proposal
> do?

I've been thinking about the bouncing problem quite a bit lately.
I have some rough ideas, but I won't draw out this thread by sharing
:-)

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-07 22:57         ` Ian Lance Taylor
  2007-11-07 23:05           ` Daniel Jacobowitz
@ 2007-11-08  0:00           ` Mark Mitchell
  2007-11-08  0:15             ` David Edelsohn
                               ` (2 more replies)
  2007-11-08  5:01           ` Alexandre Oliva
  2007-11-08  8:58           ` Paolo Bonzini
  3 siblings, 3 replies; 150+ messages in thread
From: Mark Mitchell @ 2007-11-08  0:00 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Ian Lance Taylor wrote:

> At one time, gcc actually provided better debugging of optimized code
> than any other compiler, though I don't know if that is still true.
> Optimized gcc code is still debuggable today.  I do it all the time.
> (For me poor support for debugging C++ is a much bigger issue, though
> I think that is an issue more with gdb than with gcc.)

I think we all agree that providing better debugging of optimized code
is a priori a good thing.  So, as I see it, this thread is focused on
what internal representation we might use for that.

I don't know that there's an abstract right answer to whether something
NOTE-like or something on the side is better.  There are problems with
both approaches.  We know the NOTE/DEBUG_INSN thing is going to break,
from experience; we also know the on-the-side thing is going to be hard
to maintain.

Alexandre has clearly thought about this a lot.  I'd like to start by
capturing the functional changes that we want to make to GCC's debug
output -- not the changes that we want in the debug experience, or
changes that we need in GDB, but the changes in the generated DWARF.

For example, I'm thinking of a series of function test cases.  Ignore
the substance of this example -- I'm making it up! -- I'm just trying to
capture the form.

===
int main () { int i; i = 3; return i; }

When optimizing, "i" is optimized away.  The debug info for "i" right
before the return statement says "i has been optimized away", but not
what its value is.  I think it should say that the value is "3".  To do
that, we need to emit a DW_Now_My_Value_is_3 tag for "i".
===
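A slightly fuller test case in the same form might look like the sketch below. The function names and the second case are hypothetical additions, not part of the example above; the comments state what one would like a debugger to report, not what any particular GCC release emits. Building it with something like "gcc -O2 -g" and inspecting the result with "readelf --debug-dump=info" or in GDB (which typically prints "<optimized out>" for a variable it has no location for) is one way to compare actual and desired output.

```c
/* Hypothetical test case in the form described above.  */

/* 'i' holds a compile-time constant, so an optimizing compiler is free
   to fold it away and simply return 3.  */
int
constant_local (void)
{
  int i;
  i = 3;
  return i;      /* Desired: debug info here says the value of 'i' is 3.  */
}

/* 'j' is derived from a parameter.  After optimization it may live only
   in a register, or nowhere at all once it has been folded into the
   return expression.  */
int
derived_local (int n)
{
  int j = n * 2;
  return j + 1;  /* Desired: debug info can still describe 'j' here.  */
}
```

Each function pairs a tiny piece of source with a statement of the desired debug output, which keeps the discussion about the generated DWARF rather than about the internal representation.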

Now, how is whatever representation we pick going to get us that?  Is
the Oliva representation sufficient?  What about the Guenther/Matz
representation?  Independently of the representation, what algorithms
are we going to use to track whatever we need to track as the optimizers
remove, insert, duplicate, and reorder code?

Until we all know what we're trying to do, I don't see how we can make a
good decision about the representation.  Clearly, in the abstract, we
can represent data either on-the-side or in the instruction stream, but
until we know what output we want, I'm not sure how we can pick.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  0:00           ` Mark Mitchell
@ 2007-11-08  0:15             ` David Edelsohn
  2007-11-08  0:35               ` Mark Mitchell
  2007-11-08  5:15               ` Alexandre Oliva
  2007-11-08  5:44             ` Alexandre Oliva
  2007-11-08  9:54             ` Richard Guenther
  2 siblings, 2 replies; 150+ messages in thread
From: David Edelsohn @ 2007-11-08  0:15 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Ian Lance Taylor, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

>>>>> Mark Mitchell writes:

Mark> I think we all agree that providing better debugging of optimized code
Mark> is a priori a good thing.  So, as I see it, this thread is focused on
Mark> what internal representation we might use for that.

	Yes, it is a good thing, but not at any price.  Regardless of the
representation and implementation, there is a cost.  This discussion
should not start with the premise that better debugging of optimized code
is better at any cost.

Mark> I'd like to start by
Mark> capturing the functional changes that we want to make to GCC's debug
Mark> output -- not the changes that we want in the debug experience, or
Mark> changes that we need in GDB, but the changes in the generated DWARF.

	Who is "we"?  What better debugging are GCC users demanding?  What
debugging difficulties are they experiencing?  Who is that set of users?
What functional changes would improve those cases?  What is the cost of
those improvements in complexity, maintainability, compile time, object
file size, GDB start-up time, etc.?

David

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  0:15             ` David Edelsohn
@ 2007-11-08  0:35               ` Mark Mitchell
  2007-11-08  5:14                 ` Alexandre Oliva
  2007-11-22 23:07                 ` Frank Ch. Eigler
  2007-11-08  5:15               ` Alexandre Oliva
  1 sibling, 2 replies; 150+ messages in thread
From: Mark Mitchell @ 2007-11-08  0:35 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Ian Lance Taylor, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

David Edelsohn wrote:
>>>>>> Mark Mitchell writes:
> 
> Mark> I think we all agree that providing better debugging of optimized code
> Mark> is a priori a good thing.  So, as I see it, this thread is focused on
> Mark> what internal representation we might use for that.
> 
> 	Yes, it is a good thing, but not at any price.  Regardless of the
> representation and implementation, there is a cost.  This discussion
> should not start with the premise that better debugging of optimized code
> is better at any cost.

I agree.  You're right to state this explicitly, but I'd implicitly
expected that we'd do cost/benefit analysis on this feature, as we would
any other.

> Mark> I'd like to start by
> Mark> capturing the functional changes that we want to make to GCC's debug
> Mark> output -- not the changes that we want in the debug experience, or
> Mark> changes that we need in GDB, but the changes in the generated DWARF.
> 
> 	Who is "we"?  What better debugging are GCC users demanding?  What
> debugging difficulties are they experiencing?  Who is that set of users?
> What functional changes would improve those cases?  What is the cost of
> those improvements in complexity, maintainability, compile time, object
> file size, GDB start-up time, etc.?

That's what I'm asking.  First and foremost, I want to know what,
concretely, Alexandre is trying to achieve, beyond "better debugging
info for optimized code".  Until we understand that, I don't see how we
can sensibly debate any methods of implementation, possible costs, etc.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  0:35               ` Mark Mitchell
@ 2007-11-08  5:14                 ` Alexandre Oliva
  2007-11-08 18:28                   ` Alexandre Oliva
  2007-11-22 23:07                 ` Frank Ch. Eigler
  1 sibling, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08  5:14 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: David Edelsohn, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> First and foremost, I want to know what, concretely, Alexandre is
> trying to achieve, beyond "better debugging info for optimized
> code".

I'm not really going for "better".  I'm going for "correct" first,
while making room for "better", and hopefully already getting better,
in the process.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  0:35               ` Mark Mitchell
  2007-11-08  5:14                 ` Alexandre Oliva
@ 2007-11-22 23:07                 ` Frank Ch. Eigler
  2007-11-22 23:13                   ` Richard Guenther
  1 sibling, 1 reply; 150+ messages in thread
From: Frank Ch. Eigler @ 2007-11-22 23:07 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: David Edelsohn, Ian Lance Taylor, Alexandre Oliva,
	Richard Guenther, gcc-patches, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> [...]
>> 	Who is "we"?  What better debugging are GCC users demanding?  What
>> debugging difficulties are they experiencing?  Who is that set of users?
>> What functional changes would improve those cases?  What is the cost of
>> those improvements in complexity, maintainability, compile time, object
>> file size, GDB start-up time, etc.?
>
> That's what I'm asking.  First and foremost, I want to know what,
> concretely, Alexandre is trying to achieve, beyond "better debugging
> info for optimized code".  Until we understand that, I don't see how we
> can sensibly debate any methods of implementation, possible costs, etc.

It may be asking to belabour the obvious.  GCC users do not want to
have to compile with "-O0 -g" just to debug during development (or
during crash analysis *after deployment*!).  Developers would like to
be able to place breakpoints anywhere by reference to the source code,
and would like to access any variables logically present there.
Developers will accept that optimized code will by its nature make
some of these fuzzy, but incorrect data must be and incomplete data
should be minimized.

That they put up with the status quo at all is a historical artifact
of being told so long not to expect any better.

- FChE

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-22 23:07                 ` Frank Ch. Eigler
@ 2007-11-22 23:13                   ` Richard Guenther
  2007-11-23 20:53                     ` Frank Ch. Eigler
  2007-11-24 15:02                     ` Robert Dewar
  0 siblings, 2 replies; 150+ messages in thread
From: Richard Guenther @ 2007-11-22 23:13 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Mark Mitchell, David Edelsohn, Ian Lance Taylor, Alexandre Oliva,
	gcc-patches, gcc

On Nov 22, 2007 8:22 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
>
> Mark Mitchell <mark@codesourcery.com> writes:
>
> > [...]
> >>      Who is "we"?  What better debugging are GCC users demanding?  What
> >> debugging difficulties are they experiencing?  Who is that set of users?
> >> What functional changes would improve those cases?  What is the cost of
> >> those improvements in complexity, maintainability, compile time, object
> >> file size, GDB start-up time, etc.?
> >
> > That's what I'm asking.  First and foremost, I want to know what,
> > concretely, Alexandre is trying to achieve, beyond "better debugging
> > info for optimized code".  Until we understand that, I don't see how we
> > can sensibly debate any methods of implementation, possible costs, etc.
>
> It may be asking to belabour the obvious.  GCC users do not want to
> have to compile with "-O0 -g" just to debug during development (or
> during crash analysis *after deployment*!).  Developers would like to
> be able to place breakpoints anywhere by reference to the source code,
> and would like to access any variables logically present there.
> Developers will accept that optimized code will by its nature make
> some of these fuzzy, but incorrect data must be and incomplete data
> should be minimized.
>
> That they put up with the status quo at all is a historical artifact
> of being told so long not to expect any better.

As it is (without serious overhead) impossible to do both, you either have
to live with possibly incorrect but elaborate or incomplete but correct
debug information for optimized code.  Choose one ;)

What we (Matz and myself) are trying to do is provide elaborate debug
information with the chance of wrong (I'd call it superfluous, or extra)
debug information.  Alexandre seems to aim at the world-domination
solution (with the serious overhead in terms of implementation and
verboseness).

Richard.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-22 23:13                   ` Richard Guenther
@ 2007-11-23 20:53                     ` Frank Ch. Eigler
  2007-11-24  1:53                       ` Alexandre Oliva
  2007-11-24 15:02                     ` Robert Dewar
  1 sibling, 1 reply; 150+ messages in thread
From: Frank Ch. Eigler @ 2007-11-23 20:53 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mark Mitchell, David Edelsohn, Ian Lance Taylor, Alexandre Oliva,
	gcc-patches, gcc

Hi -

(BTW, sorry for reopening this old thread if people are sick & tired of it.)

> > Mark Mitchell <mark@codesourcery.com> writes:
> > > [...]
> > > That's what I'm asking.  First and foremost, I want to know what,
> > > concretely, Alexandre is trying to achieve, beyond "better debugging
> > > info for optimized code".  [...]
> >
> > It may be asking to belabour the obvious.  GCC users do not want to
> > have to compile with "-O0 -g" just to debug during development [...]
> > Developers will accept that optimized code will by its nature make
> > some of these fuzzy, but incorrect data must be and incomplete data
> > should be minimized. [...]
> 
> As it is (without serious overhead) impossible to do both, you either have
> to live with possibly incorrect but elaborate or incomplete but correct
> debug information for optimized code.  Choose one ;)

I did say "minimized", not "eliminated".  It needs to be good enough
that a semi-knowledgeable person or a dumb but heuristic-laden program
that processes debugging info can nevertheless extract reliable
information.

> What we (Matz and myself) are trying to do is provide elaborate
> debug information with the chance of wrong (I'd call it superfluous,
> or extra) debug information.

(I will need to reread the thread to see what this extra information
can do in terms of misleading users or tools, such as giving incorrect
variable values/locations.  I'd appreciate a link if you have one
handy.)

> Alexandre seems to aim at the world-domination solution (with the
> serious overhead in terms of implementation and verboseness).

That ("world-domination") seems an overly unkind characterization - we
could simply say he's trying an exhaustive, straining-to-be-correct
solution.

It seems to me that we will shortly see the actual impacts of both of
these approaches in terms of compiler complexity as well as any
improvements in data quality.  It does not seem to me like there is
substantial disagreement over the ideal of correct and to a lesser
extent complete information, so let's see the implementations and then
compare.

- FChE

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-23 20:53                     ` Frank Ch. Eigler
@ 2007-11-24  1:53                       ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24  1:53 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Richard Guenther, Mark Mitchell, David Edelsohn,
	Ian Lance Taylor, gcc-patches, gcc

On Nov 23, 2007, "Frank Ch. Eigler" <fche@redhat.com> wrote:

>> > It may be asking to belabour the obvious.  GCC users do not want to
>> > have to compile with "-O0 -g" just to debug during development [...]
>> > Developers will accept that optimized code will by its nature make
>> > some of these fuzzy, but incorrect data must be and incomplete data
                                                    ^avoided?
>> > should be minimized. [...]

Richard Guenther replied:

>> As it is (without serious overhead) impossible to do both,

Is it?  

>> you either have to live with possibly incorrect but elaborate or
>> incomplete but correct debug information for optimized code.

You have proof of that?

>> Choose one ;)

As in, command line options?  Or are we going to make a choice and
impose that on all our users, as if it fit all?

Frank followed up:

>> What we (Matz and myself) are trying to do is provide elaborate
>> debug information with the chance of wrong (I'd call it superfluous,
>> or extra) debug information.

It's not just superfluous or extra.  Your approach actively regresses
debug information for some cases, while it's arguable whether it
actually improves others.

> That ("world-domination") seems an overly unkind characterization

+1

It would be like myself pointing out that, for every problem, there's
a solution that's simple, elegant and wrong ;-)

Given the problems with sequential live ranges being made parallel and
conflicting, values subject to conditions being made unconditional,
and overwritten values remaining noted as live, I wouldn't think the
characterization above would be unfair, but I'd managed to resist it
so far.
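
The three problems named above can be made concrete with a small sketch. The functions below are hypothetical illustrations, not taken from any test suite, and the comments describe what a debugger should ideally be able to report rather than the behavior of a specific implementation.

```c
/* Sequential live ranges: 'x' takes two values, one after the other.
   If code motion interleaves the two computations, debug info must not
   claim both bindings over the same range of instructions.  */
int
sequential_ranges (int a, int b)
{
  int x = a + 1;       /* x == a + 1 from here...  */
  int first = x * 2;
  x = b - 1;           /* ...and x == b - 1 from here on.  */
  return first + x;
}

/* Conditional value: 'y' changes only when 'flag' is nonzero.  If the
   assignment is speculated or if-converted, the debug binding for it
   must not become unconditional.  */
int
conditional_value (int flag, int a)
{
  int y = 0;
  if (flag)
    y = a;
  return y;
}

/* Overwritten value: the storage that held the first value of 'z' may
   be reused.  Debug info still pointing at it would report a stale
   value as if it were live.  */
int
overwritten_value (int a)
{
  int z = a;
  int t = z + 10;      /* last use of the first value of z */
  z = t * 2;
  return z;
}
```

In each case the generated code can be perfectly correct while the debug information describing it is not, which is the distinction being drawn here.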

I don't think pulling the blanket such that it covers your face while
it uncovers your feet is the way to go.  It's even worse, because
then, with your face covered, you won't even see that your feet are
uncovered ;-)

Regressions are bad, and this proposed approach guarantees
regressions, while it might fix a few trivial cases.  This is not
enough for me.  I'm not just hacking up a quick fix for a
poorly-worded problem.  I'm doing actual software engineering here,
trying to get GCC to comply with existing debug info standards.

> It does not seem to me like there is
> substantial disagreement over the ideal of correct

Unfortunately, that is indeed up for debate.  There are even those who
dispute that there's any correctness issue involved.  Most other
approaches are actually overreaching in completeness, trading
correctness for more information, as if more unreliable information
was any better than no information at all.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-22 23:13                   ` Richard Guenther
  2007-11-23 20:53                     ` Frank Ch. Eigler
@ 2007-11-24 15:02                     ` Robert Dewar
  1 sibling, 0 replies; 150+ messages in thread
From: Robert Dewar @ 2007-11-24 15:02 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Frank Ch. Eigler, Mark Mitchell, David Edelsohn,
	Ian Lance Taylor, Alexandre Oliva, gcc-patches, gcc

Richard Guenther wrote:
> On Nov 22, 2007 8:22 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
>> Mark Mitchell <mark@codesourcery.com> writes:
>>
>>> [...]
>>>>      Who is "we"?  What better debugging are GCC users demanding?  What
>>>> debugging difficulties are they experiencing?  Who is that set of users?
>>>> What functional changes would improve those cases?  What is the cost of
>>>> those improvements in complexity, maintainability, compile time, object
>>>> file size, GDB start-up time, etc.?
>>> That's what I'm asking.  First and foremost, I want to know what,
>>> concretely, Alexandre is trying to achieve, beyond "better debugging
>>> info for optimized code".  Until we understand that, I don't see how we
>>> can sensibly debate any methods of implementation, possible costs, etc.
>> It may be asking to belabour the obvious.  GCC users do not want to
>> have to compile with "-O0 -g" just to debug during development (or
>> during crash analysis *after deployment*!).  Developers would like to
>> be able to place breakpoints anywhere by reference to the source code,
>> and would like to access any variables logically present there.
>> Developers will accept that optimized code will by its nature make
>> some of these fuzzy, but incorrect data must be and incomplete data
>> should be minimized.
>>
>> That they put up with the status quo at all is a historical artifact
>> of being told so long not to expect any better.
> 
> As it is (without serious overhead) impossible to do both, you either have
> to live with possibly incorrect but elaborate or incomplete but correct
> debug information for optimized code.  Choose one ;)

I don't think you can use the phrase "serious overhead" without rather
extensive statistics. To me, -O1 should be reasonably debuggable, as it
always was back in earlier gcc days. It is nice that -O1 is somewhat
more efficient than it was in those earlier days, but not nice enough
to warrant a severe regression in debug capabilities. To me anyone who
is so concerned about performance as to really appreciate this
difference will likely be using -O2 anyway.

The trouble is that we have set as the criterion for -O1 all the
optimizations that are reasonably cheap in compile time. I think
it is essential that there be an optimization level that means:

  All the optimizations that are reasonably cheap to implement
  and that do not impact debugging information significantly
  (except I would say it is OK to impact the ability to change
  variables).

For me it would be fine for -O1 to mean that, but if there is a
consensus that an extra level (-Od or whatever) is worthwhile,
that's fine by me.

Working on the Ada front end, I used to be able to use -O1 all the
time: it was OK for debugging and OK for performance.  Now I have to
use -O0 for debugging and then switch to -O2 for performance (for me,
the debuggability of -O1 and -O2 is equivalent in this context, both
hopeless, so I might as well use -O2).  So I no longer use -O1 at all
(the extra compile time for -O2 is negligible on my fast notebook).

> 
> What we (Matz and myself) are trying to do is provide elaborate debug
> information with the chance of wrong (I'd call it superfluous, or extra)
> debug information.  Alexandre seems to aim at the world-domination
> solution (with the serious overhead in terms of implementation and
> verboseness).
> 
> Richard.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  0:15             ` David Edelsohn
  2007-11-08  0:35               ` Mark Mitchell
@ 2007-11-08  5:15               ` Alexandre Oliva
  2007-11-08 18:18                 ` Alexandre Oliva
  2007-11-08 19:46                 ` Andrew Pinski
  1 sibling, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08  5:15 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, David Edelsohn <dje@watson.ibm.com> wrote:

> 	Who is "we"?  What better debugging are GCC users demanding?  What
> debugging difficulties are they experiencing?

I, for one, miss the arguments of inlined functions, a lot.

The reason for that is that arguments are currently optimized away
right from the start.  Even if they weren't, since they're initialized
with a trivial copy, at least their initial value (quite often
preserved throughout compilation) would be lost just as early.

On top of that, we currently regard arguments and variables of
non-inlined functions as special, and we prevent a number of
optimizations with them, in order to be able to generate slightly
better debug information for them.  (As for arguments and variables of
inlined functions, we happily drop them on the floor right away.)
This is not only inconsistent, it's also harmful, because we're
trading performance and compile-time memory for slightly better but
still incorrect, incomplete and unreliable debug information.
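
A minimal sketch of the inlined-argument case, with hypothetical function names chosen only for illustration: 'scale' is small enough that an optimizing compiler will likely inline it whether or not it is declared inline, and its parameters then have no storage of their own.

```c
/* Likely to be inlined at -O2 even without the 'inline' keyword.  */
static int
scale (int value, int factor)
{
  /* When stopped inside an inlined copy of this function, one would
     like to be able to print 'value' and 'factor'.  */
  return value * factor;
}

int
caller (int n)
{
  /* After inlining, 'factor' is just the constant 4 (or 2) and 'value'
     is a trivial copy of 'n' (or of n + 1).  Neither needs a location
     of its own in the generated code, so describing them relies
     entirely on the compiler having tracked those bindings.  */
  return scale (n, 4) + scale (n + 1, 2);
}
```

The values of both parameters are fully known or trivially computable at each inlined call site, so this is a case where the information exists and the only question is whether it reaches the debug output.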

> Who is that set of users?

I'm personally getting numerous requests for debug information
correctness and better completeness from debug info consumers such as
gdb, frysk and systemtap.  GCC's eagerness to inline functions, even
ones never declared as inline, and its eagerness to corrupt the
meta-information associated with them, causes these tools to
malfunction in very many situations.  And it's all GCC's fault, for
generating code that is not standards-compliant in the
meta-information sections of its output.

> What functional changes would improve those cases?  What is the cost of
> those improvements in complexity, maintainability, compile time, object
> file size, GDB start-up time, etc.?

Before I spend hours describing the little I can foresee about this,
how much of this really matters, given that it's mostly a matter of
correctness, rather than mere trade offs?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  5:15               ` Alexandre Oliva
  2007-11-08 18:18                 ` Alexandre Oliva
@ 2007-11-08 19:46                 ` Andrew Pinski
  2007-11-08 20:39                   ` Alexandre Oliva
  2007-11-09  8:39                   ` Robert Dewar
  1 sibling, 2 replies; 150+ messages in thread
From: Andrew Pinski @ 2007-11-08 19:46 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: David Edelsohn, Mark Mitchell, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

First off, I would like to say that I did not want to reply, but I
guess I am going to, because some false information is spreading
around about what GCC as a compiler is.

On 11/7/07, Alexandre Oliva <aoliva@redhat.com> wrote:

> I'm personally getting numerous requests for debug information
> correctness and better completeness from debug info consumers such as
> gdb, frysk and systemtap.  GCC's eagerness to inline functions, even
> ones never declared as inline, and its eagerness to corrupt the
> meta-information associated with them, causes these tools to
> malfunction in very many situations.  And it's all GCC's fault, for
> generating code that is not standards-compliant in the
> meta-information sections of its output.

I have to ask, do you want an optimizing compiler or one which
generates full debugging information????  Because there really are
trade-offs here.  The reason behind the extra inlining is that it
improves code generation.  I don't know about you, but in some areas
of coding people need the extra speed/size reductions that inlining
of functions not marked inline by the user provides.  I have plenty
of code which needs the speed-up that the extra inlining gives
(remember, some developers don't want to change the code that much
just to have the optimizing compiler do its work).

Remember, DWARF3 is not really a standard about meta-information; it
just specifies how it is represented if it exists.

-- Pinski

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 19:46                 ` Andrew Pinski
@ 2007-11-08 20:39                   ` Alexandre Oliva
  2007-11-09  8:39                   ` Robert Dewar
  1 sibling, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08 20:39 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: David Edelsohn, Mark Mitchell, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, "Andrew Pinski" <pinskia@gmail.com> wrote:

> On 11/7/07, Alexandre Oliva <aoliva@redhat.com> wrote:

>> I'm personally getting numerous requests for debug information
>> correctness and better completeness from debug info consumers such as
>> gdb, frysk and systemtap.  GCC's eagerness to inline functions, even
>> ones never declared as inline, and its eagerness to corrupt the
>> meta-information associated with them, causes these tools to
>> malfunction in very many situations.  And it's all GCC's fault, for
>> generating code that is not standards-compliant in the
>> meta-information sections of its output.

> I have to ask, do you want an optimizing compiler or one which
> generates full debugging information????

I want both.  That's the whole point of this project I'm in.

> Because there are trade off here really.

For a superficial look at the problem, they might look like
trade-offs.  But the assumption that it's impossible to get both is
incorrect.  It takes work, but it's not impossible.

> The reason behind the extra inlining is because it
> improves code generation.

I don't see how you got the impression that I might be arguing against
the inlining, as it looks like you did.  I'm not.  I'm arguing against
the corruption of meta-information associated with them.  That's just
laziness on our part.

> Remember DWARF3 is not really a standard about meta-information; it
> just specifies how it is represented if it exists.

That's what meta-information is.  One of the problems is that we often
fail to represent information that does exist.  A more serious problem
is that we often represent such information incorrectly, making it
seem like things that don't exist do, and that things are at different
locations from those in which they actually are.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 19:46                 ` Andrew Pinski
  2007-11-08 20:39                   ` Alexandre Oliva
@ 2007-11-09  8:39                   ` Robert Dewar
  1 sibling, 0 replies; 150+ messages in thread
From: Robert Dewar @ 2007-11-09  8:39 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Alexandre Oliva, David Edelsohn, Mark Mitchell, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

Andrew Pinski wrote:

> I have to ask, do you want an optimizing compiler or one which
> generates full debugging information???? 

Both!

I would like modes which do the following:

a) reasonable amount of optimization that does not interfere too much
with debugging. The old GCC 3 -O1 was a close approximation to this
(certainly a closer approximation than the current -O1).

b) all possible optimizations even if debuggability is compromised

That's a perfectly reasonable request, and we used to be pretty
close to having it, but now -O1 has really degraded as a solution
to a). Yes, it's somewhat more efficient, but I suspect that the
small minority of those interested in the last bit of performance
are using -O2 anyway, so I doubt many people get much benefit from
the improved performance of -O1 code. On the other hand lots of
people are negatively affected by the degrading of debugging in
-O1 mode.

> Because there are trade-offs
> here really.  The reason behind the extra inlining is that it
> improves code generation.  I don't know about you, but in some areas of
> coding, people need the extra speed/size reductions that inlining of
> non-user-marked functions provides.  I have plenty of code which needs
> the speed help that the extra inlining gives (remember some developers
> don't want to change the code that much to have the optimizing compiler
> do its work).

Obviously you don't want a lot of inlining unless the debugger can
handle inlining properly if your interest is in being able to debug!
> 
> Remember DWARF3 is not really a standard about meta-information; it
> just specifies how it is represented if it exists.

But consumers want a debugger that works, without having to take the
hit of huge volumes of code at -O0
> 
> -- Pinski


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  0:00           ` Mark Mitchell
  2007-11-08  0:15             ` David Edelsohn
@ 2007-11-08  5:44             ` Alexandre Oliva
  2007-11-08 18:37               ` Alexandre Oliva
  2007-11-08 19:13               ` Mark Mitchell
  2007-11-08  9:54             ` Richard Guenther
  2 siblings, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08  5:44 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Until we all know what we're trying to do

Here's what I am trying to do:

1. Ensure that, for every user variable for which we emit debug
information, the information is correct, i.e., if it says the value of
a variable at a certain instruction is at certain locations, or is a
known constant, then the variable must not be at any other location at
that point, and the locations or values must match reasonable
expectations based on source code inspection.

2. Defining "reasonable expectations" is tricky, for code reordering
typical of optimization can make room for numerous surprises.  I don't
have a precise definition for this yet, but very clearly to me saying
that a variable holds a value that it couldn't possibly hold (e.g.,
because it is only assigned that value in a code path that is
knowingly not taken) is a very clear indication that something is
amiss.  The general guiding rule is, if we aren't sure the information
is correct (or we're sure it isn't), we shouldn't pretend that it is.

3. Try to ensure that, if the value of a variable is a known constant
at a certain point in the program, this information is present in
debug information.

4. Try to ensure that, if the value of a variable is available at any
location at a certain point in the program, this information is
present in debug information.

5. Stop missing optimizations for the sake of improving debug
information.

6. Avoid using additional memory and CPU cycles that would be needed
only for debug information when compiling without generating debug
information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  5:44             ` Alexandre Oliva
@ 2007-11-08 18:37               ` Alexandre Oliva
  2007-11-08 19:13               ` Mark Mitchell
  1 sibling, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08 18:37 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  5:44             ` Alexandre Oliva
  2007-11-08 18:37               ` Alexandre Oliva
@ 2007-11-08 19:13               ` Mark Mitchell
  2007-11-08 19:13                 ` David Daney
  2007-11-09  2:09                 ` Alexandre Oliva
  1 sibling, 2 replies; 150+ messages in thread
From: Mark Mitchell @ 2007-11-08 19:13 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:
> On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
> 
>> Until we all know what we're trying to do
> 
> Here's what I am trying to do:

I think these are laudable goals, but you didn't really provide the
information I wanted.  In particular, I'd like to drill down from
goals (like "ensure that, for every user variable for which we emit
debug information, the information is correct") to concrete problems.

I think that most of the goals boil down to making sure that, at any
point in the program, the debug information for a variable meets the
following criteria:

(a) if the variable has not been optimized away, gives the location
where that variable's current value can be found, or
(b) if the variable has been optimized away, and the value is not a
constant, says that the value is not available, or
(c) if the variable has been optimized away, but is a constant, says
what the constant value is

Is that right?  (Note "at any point" above; it might be that the
variable is present in r0 for a while, and then optimized away, and then
present at *0xdeadbeef for a while, and then has the constant value 7.)

If so, how are you proposing to accomplish that?  It's easy enough to
design a representation (whether in the instruction stream, or on the
side) that says "from instruction A to instruction B, the value is in
this location".  So, I don't think we need to worry about that just yet.
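
A minimal sketch of such a range-based representation, for illustration
only.  The type and function names (loc_kind, loc_range,
lookup_location) and the instruction addresses are hypothetical and do
not correspond to anything in GCC or in the DWARF specification; the
table simply encodes the r0 / optimized away / *0xdeadbeef / constant 7
timeline used as an example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of location a variable can have over a code range.  */
enum loc_kind { LOC_REGISTER, LOC_MEMORY, LOC_CONSTANT, LOC_UNAVAILABLE };

/* One entry: for instruction addresses in [start, end), the variable is
   described by (kind, value).  */
struct loc_range
{
  unsigned long start;
  unsigned long end;
  enum loc_kind kind;
  unsigned long value;  /* register number, memory address, or constant */
};

/* Return the entry covering PC, or NULL if no entry covers it.  */
static const struct loc_range *
lookup_location (const struct loc_range *ranges, size_t n, unsigned long pc)
{
  for (size_t i = 0; i < n; i++)
    if (pc >= ranges[i].start && pc < ranges[i].end)
      return &ranges[i];
  return NULL;
}

/* In r0, then optimized away, then at *0xdeadbeef, then the constant 7.
   The addresses are made up for the example.  */
static const struct loc_range example_ranges[] = {
  { 0x100, 0x120, LOC_REGISTER, 0 },
  { 0x120, 0x140, LOC_UNAVAILABLE, 0 },
  { 0x140, 0x180, LOC_MEMORY, 0xdeadbeefUL },
  { 0x180, 0x1a0, LOC_CONSTANT, 7 },
};
```

A consumer would call lookup_location with the current program counter
and treat both a NULL result and LOC_UNAVAILABLE as "value not
available".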

But, how are we going to track this information?  Algorithmically, what
needs to change in the compiler to maintain this state?

For example, we need some way for an optimization pass to tell the rest
of the compiler that a variable was completely eliminated.  (Perhaps,
for example, because all uses of the variable were eliminated.)  So,
maybe we need a debug_var_eliminated API.  Then, every pass that blows
away variables can call this function, which can make whatever notations
on the VAR_DECL are required.  I'm not claiming that's the right
approach, but I'd like to understand the plan at that kind of level.

What changes will need to be made throughout the compiler to keep track
of the state?

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 19:13               ` Mark Mitchell
@ 2007-11-08 19:13                 ` David Daney
  2007-11-08 19:17                   ` Mark Mitchell
  2007-11-09  2:09                 ` Alexandre Oliva
  1 sibling, 1 reply; 150+ messages in thread
From: David Daney @ 2007-11-08 19:13 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Alexandre Oliva, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Mark Mitchell wrote:
> Alexandre Oliva wrote:
>> On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
>>
>>> Until we all know what we're trying to do
>> Here's what I am trying to do:
> 
> I think these are laudable goals, but you didn't really provide the
> information I wanted.  In particular, I'd like to drill down from
> goals (like "ensure that, for every user variable for which we emit
> debug information, the information is correct") to concrete problems.
> 
> I think that most of the goals boil down to making sure that, at any
> point in the program, the debug information for a variable meets the
> following criteria:
> 
> (a) if the variable has not been optimized away, gives the location
> where that variable's current value can be found, or
> (b) if the variable has been optimized away, and the value is not a
> constant, says that the value is not available, or

Perhaps if the variable has been optimized away *but* it is possible to 
calculate its value by examining the state of the program, then we can 
emit the expression needed to calculate its value in the debugging 
information as well.

I may be missing something, but it seems that may be part of Alexandre's 
plan as well.


> (c) if the variable has been optimized away, but is a constant, says
> what the constant value is

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 19:13                 ` David Daney
@ 2007-11-08 19:17                   ` Mark Mitchell
  0 siblings, 0 replies; 150+ messages in thread
From: Mark Mitchell @ 2007-11-08 19:17 UTC (permalink / raw)
  To: David Daney
  Cc: Alexandre Oliva, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

David Daney wrote:

>> (a) if the variable has not been optimized away, gives the location
>> where that variable's current value can be found, or
>> (b) if the variable has been optimized away, and the value is not a
>> constant, says that the value is not available, or
> 
> Perhaps if the variable has been optimized away *but* it is possible to
> calculate its value by examining the state of the program, then we can
> emit the expression needed to calculate its value in the debugging
> information as well.

Yes, that's a good addition.  To be clear, I'm not trying to set the
goals here; I'm just trying to make sure we have a clear set of
objectives and a plan to get there.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 19:13               ` Mark Mitchell
  2007-11-08 19:13                 ` David Daney
@ 2007-11-09  2:09                 ` Alexandre Oliva
  2007-11-12  4:49                   ` Mark Mitchell
  1 sibling, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-09  2:09 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Alexandre Oliva wrote:
>> On Nov  7, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
>> 
>>> Until we all know what we're trying to do
>> 
>> Here's what I am trying to do:

> I think these are laudable goals, but you didn't really provide the
> information I wanted.

Oh, you didn't want goals.  Design and implementation plans more
detailed than
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00160.html, I suppose.
Ok, let's see...

1. introduce, early in compilation (when entering SSA), annotations
that map user-level variables whose location may vary throughout their
lifetime to implementation-level variables or expressions at every
point of assignment and PHI joins.

2. keep those annotations accurate throughout compilation, without
letting them interfere with optimizations, but making sure they are
kept up-to-date or marked untrackable.

3. in var-tracking, starting from the expressions in the annotations
and their equivalent expressions computed with a dataflow-globalized
cse analysis, emit traditional var-tracking var_location notes for all
variables.  For variables that didn't start out as gimple regs, the
current debug info behavior should be preserved.

> I think that most of the goals boil down to making sure that, at any
> point in the program, the debug information for a variable meets the
> following criteria:

> (a) if the variable has not been optimized away, gives the location
> where that variable's current value can be found, or
> (b) if the variable has been optimized away, and the value is not a
> constant, says that the value is not available, or
> (c) if the variable has been optimized away, but is a constant, says
> what the constant value is

yes, except that instead of constant and constant value, I'd put it as
'computable expression from other live values'.

And I'd say "locations" rather than just "location".

> But, how are we going to track this information?  Algorithmically, what
> needs to change in the compiler to maintain this state?

Most optimization passes must already update uses of gimple or pseudo
regs they modify, so these will be taken care of automatically (which
is why I chose this representation).  Optimization passes that move
assignments to an earlier point in the program don't need any
modification.  Those that move them to a later point will often move
them past their debug notes.  This means the debug notes need
updating, but it also means that, in the absence of fixes, the debug
notes most likely will stand in the way of the transformation, so
testing that the debug notes don't change optimization behavior ought
to catch these.

Transformations that copy or move blocks will retain the annotations,
so this should "just work".  Transformations that delete blocks might
be a bit of a problem, if they delete important debug annotations.  So
far, the only case I've noticed of such behavior is in ifcvt, in which
an if-then-assign-else-assign set of blocks is turned into a single
if-then-else assignment.  This particular case is covered by the PHI
statement that is placed in the entry point of the block that joins
the then and the else.

On architectures that support longer blocks with conditional-execution
of arbitrary instructions (arm, ia64), I'm not sure how to handle the
debug notes.  It seems to me that, with the current design, the
variable may be regarded as untrackable after the first conditional
assignment within the combined blocks, but at the join point there
will be the debug annotation corresponding to the PHI join that will
take care of getting a correct location for the variable again.

I don't have plans in place for any other kind of situation, but it
appears to me that the notion of using assignments and joins as fixed
points is solid, and I'm pretty sure any surprises can be overcome.

Of course software pipelining and other kinds of loop transformations
will yield debug information that's not exactly easy to grasp, but
this would be true of any representation.  When the compiler messes
too much with the code, there's very little one can do to make
execution resemble that of sequential execution.

I'm also thinking debug info consumers would probably enjoy some means
to tell a point at which all side effects present in a certain source
line have been completed.  But these are mostly orthogonal issues, so
I won't delve into them right now.

> For example, we need some way for an optimization pass to tell the
> rest of the compiler that a variable was completely eliminated.

In the design I'm proposing, there's no need for anything explicit
like this.  This would require global information, which is
undesirable, especially for optimizers that operate locally.  What
they'd have to do when they throw away a value that a debug annotation
relies on is to replace that value with something equivalent, if they
can, or to mark that particular annotation as untrackable.  Then, if
all annotations associated with a variable are untrackable, we know it
was completely optimized away.  But if any assignments remained
trackable, we can (and should, even though we don't have to) still
issue debug information for that.
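
The rule in the paragraph above can be sketched as follows.  This is an
illustration only, under the assumption that each annotation carries a
"trackable" flag; struct debug_annotation and var_optimized_away are
hypothetical names, not part of the vta branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one debug annotation for a user variable.  */
struct debug_annotation
{
  const char *var;   /* name of the user variable */
  bool trackable;    /* false once an optimizer gave up on the value */
};

/* A variable counts as completely optimized away when none of its
   annotations is still trackable.  No global bookkeeping is needed:
   each pass only flips the flag on the annotation it could not keep.  */
static bool
var_optimized_away (const struct debug_annotation *notes, size_t n,
                    const char *var)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (notes[i].var, var) == 0 && notes[i].trackable)
      return false;
  return true;
}

/* "i" keeps one trackable assignment; every annotation of "j" was lost.  */
static const struct debug_annotation example_notes[] = {
  { "i", false }, { "i", true }, { "j", false }, { "j", false },
};
```

With this table, "i" would still get debug information for the range
covered by its surviving annotation, while "j" would be reported as
optimized away.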

Besides, optimization passes don't deal with user variables.  They
deal with implementation-level variables, which initially resemble user
variables, but quickly diverge.  Optimization passes shouldn't
have to care about user variables.  In my proposal, all they have to
do is to adjust expressions (that happen to be known to evaluate to
what user variables are expected to hold) such that they retain the
same value in spite of transformations they perform, or are marked as
untrackable if that's impossible or too difficult.  For the
optimizers, all that matters is the expressions, and they already have
to deal with these all over anyway.  It's the debug info generator
that deals with user-level variables, taking into account whatever the
optimizers tell it about how to determine the location of user
variables throughout the program.

> What changes will need to be made throughout the compiler to keep
> track of the state?

Very few, so far.  Pretty much all of the changes that I had to make
were to prevent the notes from disabling optimizations; very few of
them required updating of debug notes beyond whatever the optimization
pass would have done by default.  That said, I have no means to test
automatically that updates to debug annotations are being performed
correctly, but since optimizers as a rule have to update all uses of
whatever they mess with, I have reasons to believe that they do it
correctly, precisely because the debug notes look so much like regular
uses to them.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09  2:09                 ` Alexandre Oliva
@ 2007-11-12  4:49                   ` Mark Mitchell
  2007-11-12 18:45                     ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Mark Mitchell @ 2007-11-12  4:49 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:

> 1. introduce, early in compilation (when entering SSA), annotations
> that map user-level variables whose location may vary throughout their
> lifetime to implementation-level variables or expressions at every
> point of assignment and PHI joins.
> 
> 2. keep those annotations accurate throughout compilation, without
> letting them interfere with optimizations, but making sure they are
> kept up-to-date or marked untrackable.
> 
> 3. in var-tracking, starting from the expressions in the annotations
> and their equivalent expressions computed with a dataflow-globalized
> cse analysis, emit traditional var-tracking var_location notes for all
> variables.  For variables that didn't start out as gimple regs, the
> current debug info behavior should be preserved.
> 
>> I think that most of the goals boil down to making sure that, at any
>> point in the program, the debug information for a variable meets the
>> following criteria:
> 
>> (a) if the variable has not been optimized away, gives the location
>> where that variable's current value can be found, or
>> (b) if the variable has been optimized away, and the value is not a
>> constant, says that the value is not available, or
>> (c) if the variable has been optimized away, but is a constant, says
>> what the constant value is
> 
> yes, except that instead of constant and constant value, I'd put it as
> 'computable expression from other live values'.
> 
> And I'd say "locations" rather than just "location".

I agree; those are generalizations, of which my bullets are a needlessly
constrained special case.  (Of course, we can gradually approach
"computable" by starting with "constant", and then adding more and more
refinement, if we like.)

>> But, how are we going to track this information?  Algorithmically, what
>> needs to change in the compiler to maintain this state?
> 
> Most optimization passes must already update uses of gimple or pseudo
> regs they modify, so these will be taken care of automatically (which
> is why I chose this representation).

For the purposes of this discussion, let's assume that upon exit from
SSA we still have the information we need.  In particular, we know which
SSA names correspond to which user variables.  That tells us how to get
the values of user variables at the points where their values are
available, and also tells us when those variables do not have their
values available.

(We may already have lost some information, though.  For example, given:

  i = 3;
  f(i);
  i = 7;
  i = 2;
  g(i);

we may well have lost the "i = 7" assignment, so "i" might appear to
have the value "3" right before we assign "2" to it, if we were to
generate debug information right then.)

The reason I want to make that assumption is that the part of this where
the representation is in question is once we reach RTL, right?

I guess I still don't really understand what you're doing at the RTL
level.  I understand the objectives.  I understand some of the things
you're claiming as virtues of DEBUG_INSN.  What I don't understand is
how it's actually going to work.  What are the notes you're inserting?
Do they just say "here is an RTL expression for computing the value of
user-variable V at this point in the program"?  Why does it make sense
to have that, rather than notes on instructions that say what effect the
instruction has on user variables?  (For example, "this SET makes the
value of V unavailable".  Or "this SET makes the value of the V
available in the destination register"?)

As a meta-question, have you or anyone else on the list looked at the
literature (IEEE/ACM, etc.) or how other compilers handle these problems?

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12  4:49                   ` Mark Mitchell
@ 2007-11-12 18:45                     ` Alexandre Oliva
  2007-11-12 18:49                       ` Joe Buck
                                         ` (3 more replies)
  0 siblings, 4 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-12 18:45 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> (We may already have lost some information, though.  For example, given:

>   i = 3;
>   f(i);
>   i = 7;
>   i = 2;
>   g(i);

> we may well have lost the "i = 7" assignment, so "i" might appear to
> have the value "3" right before we assign "2" to it, if we were to
> generate debug information right then.)

Yup.  And even if we could somehow preserve that information, there
wouldn't be any code to attach that information to.  There might be
uses for empty-range locations in debug information, but I can't think
of any.  Can anyone?  It's something we could try to preserve, and
with my design it would be quite easy to do so, but unless it's useful
for some purpose, I think we could just do away with it.

> The reason I want to make that assumption is that the part of this where
> the representation is in question is once we reach RTL, right?

I'm not sure what is in question at all.  I've proposed a design to
preserve debug information throughout compilation.  Other designs on
the table differ both in tree and rtl levels, and in the potential
quality and correctness of the debug information they can produce.

> I guess I still don't really understand what you're doing at the RTL
> level.

It's no different, except that instead of a DEBUG_STMT it's a
DEBUG_INSN, with the TREE expression converted to an RTL expression.

/me mumbles something about the silliness of keeping two completely
different yet nearly-isomorphic internal representations for
statements/instructions.

> What I don't understand is how it's actually going to work.  What
> are the notes you're inserting?

They're always of the form

  DEBUG user-variable = expression

where DEBUG stands for a DEBUG_STMT or a DEBUG_INSN, user-variable is
a tree that represents the user variable, and expression is a TREE or
RTL (depending on which representation we're in) that evaluates to the
value the user-variable is expected to hold at that point in the
program.

> Do they just say "here is an RTL expression for computing the value of
> user-variable V at this point in the program"?

In RTL, yes.

> Why does it make sense to have that, rather than notes on
> instructions that say what effect the instruction has on user
> variables?

Few instructions need such notes, so the proposal of growing SET by
33% doesn't quite appeal to me.  And then, optimizations move
instructions around, but I don't think they should move the assignment
notes around, for they should reflect the structure of the source
program, rather than the mangled representation that the optimizers
turn it into.

That said, growing SET to add to it a list of variables (or components
thereof) that the value is assigned to could be made to work, to
some extent.  But when you optimize away such a set, you'd still have
to keep the note around, so it's not clear to me that adding code all
over to maintain the notes in place when the SETs go away or are
juggled around would bring us any advantage.  It would be just a
redundant notation for what the note would already convey, so it just
brings complexity for no actual advantage.

To make it concrete, consider that your example above, represented as:

(set (reg i) (const_int 3)) ;; assigns to i
(set (reg P1) (reg i))
(call (mem f))
(set (reg i) (const_int 7)) ;; assigns to i
(set (reg i) (const_int 2)) ;; assigns to i
(set (reg P1) (reg i))
(call (mem g))

could have been optimized to:

(set (reg P1) (const_int 3))
(call (mem f))
(set (reg P1) (const_int 2))
(call (mem g))

and then you wouldn't have any debug information left for variable i.

whereas with the notes I propose, you'd be left with:

(debug i (const_int 3))
(set (reg P1) (const_int 3))
(call (mem f))
(debug i (const_int 7)) ;; may be dropped, as discussed above
(debug i (const_int 2))
(set (reg P1) (const_int 2))
(call (mem g))

even if no register at all ends up allocated for i.  And if there were
uses of i that followed the assignment to 7, to which the constant
could be propagated, you'd still be left with the annotation to
indicate that i has a new value at the correct point.
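
As a rough illustration of how a consumer could use such stand-alone
notes, here is a small sketch that models the optimized stream above
(with the "i = 7" note dropped) and looks up the value bound to a user
variable at a given instruction.  The struct insn type and the
value_at function are hypothetical, the model only handles
straight-line code with constant values, and it is not how
var-tracking is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum insn_kind { INSN_SET, INSN_CALL, INSN_DEBUG };

/* Toy model of one instruction or debug note.  */
struct insn
{
  enum insn_kind kind;
  const char *name;  /* register, callee, or user variable */
  long value;        /* constant for SET and DEBUG; unused for CALL */
};

/* The optimized stream: no register holds i, only the notes remain.  */
static const struct insn optimized[] = {
  { INSN_DEBUG, "i", 3 },
  { INSN_SET, "P1", 3 },
  { INSN_CALL, "f", 0 },
  { INSN_DEBUG, "i", 2 },
  { INSN_SET, "P1", 2 },
  { INSN_CALL, "g", 0 },
};

/* Find the value bound to VAR just before insns[at] executes, by
   scanning backwards for the nearest debug note.  Returns 1 and stores
   the value in *OUT if a note is found, 0 otherwise.  */
static int
value_at (const struct insn *insns, size_t at, const char *var, long *out)
{
  for (size_t i = at; i > 0; i--)
    {
      const struct insn *p = &insns[i - 1];
      if (p->kind == INSN_DEBUG && strcmp (p->name, var) == 0)
        {
          *out = p->value;
          return 1;
        }
    }
  return 0;
}
```

Asking for "i" at the call to f finds the first note, and at the call
to g the second one, even though no SET to i survived optimization.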

> As a meta-question, have you or anyone else on the list looked at the
> literature (IEEE/ACM, etc.) or how other compilers handle these problems?

I couldn't find much information about other compilers, but I've seen a
number of (mostly dated) articles and US patents.  In fact, I'm
particularly concerned that US Patent 6091896 covers the design
proposed by Richi, which involves annotating the instructions
themselves.  I believe the independent, stand-alone annotations I
propose escape the patent claims.

That said, if anyone knows of articles that could be of use, I'd love
to hear about them.  It's not like my research was exhaustive.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12 18:45                     ` Alexandre Oliva
@ 2007-11-12 18:49                       ` Joe Buck
  2007-11-25  6:57                         ` Alexandre Oliva
  2007-11-12 18:53                       ` Ian Lance Taylor
                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 150+ messages in thread
From: Joe Buck @ 2007-11-12 18:49 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Mon, Nov 12, 2007 at 03:52:01PM -0200, Alexandre Oliva wrote:
> On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
> 
> > (We may already have lost some information, though.  For example, given:
> 
> >   i = 3;
> >   f(i);
> >   i = 7;
> >   i = 2;
> >   g(i);
> 
> > we may well have lost the "i = 7" assignment, so "i" might appear to
> > have the value "3" right before we assign "2" to it, if we were to
> > generate debug information right then.)
> 
> Yup.  And even if we could somehow preserve that information, there
> wouldn't be any code to attach that information to.  There might be
> uses for empty-range locations in debug information, but I can't think
> of any.  Can anyone?  It's something we could try to preserve, and
> with my design it would be quite easy to do so, but unless it's useful
> for some purpose, I think we could just do away with it.

If we drop the "i = 7" assignment, then a debugger could have a consistent
view of what is going on if, given

   i = 3;  // line 10
   f(i);   // line 11
   i = 7;  // line 12
   i = 2;  // line 13
   g(i);   // line 14

"next" would step from line 10, to 11, to 12, to 14.  We would not be able
to stop after the execution of a no-longer-existing statement; if we could
stop at the beginning of line 13, it would imply that line 12 has run and
line 13 has not, which does not reflect what the optimized code is doing.

We don't do it this way at the moment; we would be able to set a
breakpoint at line 13.  But perhaps the right way to think about your
project, Alexandre, is to make things match up at the point where the gdb
user can observe the state, and consider dropping observable points where
the states will not match.


* Re: Designs for better debug info in GCC
  2007-11-12 18:49                       ` Joe Buck
@ 2007-11-25  6:57                         ` Alexandre Oliva
  2007-11-25 12:09                           ` Richard Kenner
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-25  6:57 UTC (permalink / raw)
  To: Joe Buck
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Joe Buck <Joe.Buck@synopsys.COM> wrote:

> consider dropping observable points where the states will not match.

We can't really do that.  The line number mapping is from PC to line
number, regardless of how far into the execution of earlier lines the
code is.  Omitting certain mappings from PC to line numbers would be
wrong.

The piece of the puzzle we're still missing is how to get debuggers
clever enough to decide where to set a breakpoint.  Nowadays,
debuggers (at least those I'm familiar with) tend to set breakpoints
at the lowest-numbered PC corresponding to a given source line number.
While this is useful at times, at other times what you want is the
lowest PC after all instructions corresponding to the previous line,
because at that point you know all the state of the previous line
should be stable and hopefully still observable.  Or something along
these lines.  I don't have a complete solution for this problem.  It's
very far from trivial, and I don't see that debug information can
carry enough information for the compiler to aid the debugger in
selecting where to place breakpoints in this regard.
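
To illustrate the two choices, here is a minimal sketch in C over a
toy line table.  The struct line_row layout and both function names
are assumptions made for this example; they are not taken from any
debugger or from DWARF, and "previous line" is passed in explicitly
rather than computed:

```c
#include <assert.h>
#include <stddef.h>

/* One row of a toy PC-to-line table; hypothetical layout.  */
struct line_row {
  unsigned long pc;
  int line;
};

/* Lowest PC attributed to line.  Returns 1 and stores it in *pc,
   or returns 0 if the line has no code.  */
int lowest_pc_for_line(const struct line_row *t, size_t n, int line,
                       unsigned long *pc)
{
  int found = 0;
  assert(t != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (t[i].line == line && (!found || t[i].pc < *pc)) {
      *pc = t[i].pc;
      found = 1;
    }
  return found;
}

/* Lowest PC attributed to line that lies above every PC attributed
   to prev_line.  Returns 1 and stores it in *pc, or 0 if none.  */
int lowest_pc_after_line(const struct line_row *t, size_t n, int line,
                         int prev_line, unsigned long *pc)
{
  int have_prev = 0, found = 0;
  unsigned long last_prev = 0;
  assert(t != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (t[i].line == prev_line && (!have_prev || t[i].pc > last_prev)) {
      last_prev = t[i].pc;
      have_prev = 1;
    }
  for (size_t i = 0; i < n; i++) {
    if (t[i].line != line)
      continue;
    if (have_prev && t[i].pc <= last_prev)
      continue;  /* some code of prev_line still follows this PC */
    if (!found || t[i].pc < *pc) {
      *pc = t[i].pc;
      found = 1;
    }
  }
  return found;
}
```

When the scheduler interleaves two lines, the two functions give
different answers, which is exactly the choice the debugger would
have to make.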

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-25  6:57                         ` Alexandre Oliva
@ 2007-11-25 12:09                           ` Richard Kenner
  0 siblings, 0 replies; 150+ messages in thread
From: Richard Kenner @ 2007-11-25 12:09 UTC (permalink / raw)
  To: aoliva; +Cc: Joe.Buck, gcc-patches, gcc, iant, mark, richard.guenther

> The piece of the puzzle we're still missing is how to get debuggers
> clever enough to decide where to set a breakpoint.  Nowadays, debuggers
> (at least those I'm familiar with) tend to set breakpoints at the
> lowest-numbered PC corresponding to a given source line number.  While
> this is useful at times, at other times what you want is the lowest PC
> after all instructions corresponding to the previous line, because at
> that point you know all the state of the previous line should be stable
> and hopefully still observable.  Or something along these lines.  I don't
> have a complete solution for this problem.  It's very far from trivial,
> and I don't see that debug information can carry enough information for
> the compiler to aid the debugger in selecting where to place breakpoints
> in this regard.

Or you want the first instruction of that line that shows the actual flow
of control.  Or sometimes other things, as you say.

A few of us were discussing this issue in person last week and we strongly
agree with your characterization that it's very far from trivial.  The
consensus we came to is that the compiler should continue associating the
original line number with each instruction that came from it, but perhaps
should also provide additional, not-yet-defined annotations to allow the
debugger to be able to provide various different types of breakpoints,
corresponding to various purposes the programmer is using the breakpoints
for.


* Re: Designs for better debug info in GCC
  2007-11-12 18:45                     ` Alexandre Oliva
  2007-11-12 18:49                       ` Joe Buck
@ 2007-11-12 18:53                       ` Ian Lance Taylor
  2007-11-24  2:12                         ` Alexandre Oliva
  2007-11-13 10:30                       ` Mark Mitchell
  2007-11-13 15:30                       ` Michael Matz
  3 siblings, 1 reply; 150+ messages in thread
From: Ian Lance Taylor @ 2007-11-12 18:53 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Mark Mitchell, Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> > Why does it make sense to have that, rather than notes on
> > instructions that say what effect the instruction has on user
> > variables?
> 
> Few instructions need such notes, so the proposal of growing SET by
> 33% doesn't quite appeal to me.

We could add a note to the relevant instructions.  We don't need to
change the SET representation.  That approach would only increase
memory usage for relevant instructions.

> And then, optimizations move
> instructions around, but I don't think they should move the assignment
> notes around, for they should reflect the structure of the source
> program, rather than the mangled representation that the optimizers
> turn it into.

I'm not sure I follow this.  If the equivalent of some source code
line is hoisted out of a loop, shouldn't the user variable assignments
follow it?  After the scheduler has run over a large basic block, the
structure of the source program is gone.  Are we going to somehow try
to retain it in the debugging information?  Does that make sense?

Side note: I think it would be unwise to discuss specific patents on
this public mailing list.  I think that where we have specific patent
concerns, the steering committee should raise them on a telephone call
with the FSF and/or the SFLC.  If you have concerns about a specific
patent, I recommend that you telephone some member of the SC, or send
e-mail directly to that person.

Ian


* Re: Designs for better debug info in GCC
  2007-11-12 18:53                       ` Ian Lance Taylor
@ 2007-11-24  2:12                         ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24  2:12 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Mark Mitchell, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:

>> And then, optimizations move instructions around, but I don't think
>> they should move the assignment notes around, for they should
>> reflect the structure of the source program, rather than the
>> mangled representation that the optimizers turn it into.

> I'm not sure I follow this.  If the equivalent of some source code
> line is hoisted out of a loop, shouldn't the user variable assignments
> follow it?

Why should it?  The user is entitled to expect the variable to be set
to that value at the right point in the program, no earlier than that.
Before the assignment point in the program, we ought to note that the
variable holds its previous value, or that its previous value is no
longer available.  But noting it holds a value it should only hold at
a later point doesn't seem right to me.

Consider, again, the example:

f(int x, int y) {
  int c;

  c = x;
  do_something_with_c();

  c = y;
  do_something_with_c();
}

If we optimize away the assignments c=x and c=y, and just use x and y
instead (assume c is not otherwise modified), what should we note in
debug info?  Should we pretend that c is dead all over, just because
it was optimized away?  Should we note that it's live in both x and y
registers/stack slots?  Or should we vary its location between x and
y, at the assignment points, as expected by the user?

Now, what if f() is inlined into a loop, such that c could be
versioned and the assignments to it could be hoisted, because x and y
don't vary?  Should this then change the debug information generated
for variable c from the IMHO correct points to the loop entry points?

> After the scheduler has run over a large basic block, the
> structure of the source program is gone.

The mapping becomes more difficult, yes.  But the structure of the
source program remains untouched, in the source program.  And debug
information is about mapping source concepts to implementation
concepts.  So we should try to map source concepts that remain in the
implementation to the remaining implementation concepts.

> Side note: I think it would be unwise to discuss specific patents on
> this public mailing list.  I think that where we have specific patent
> concerns, the steering committee should raise them on a telephone call
> with the FSF and/or the SFLC.  If you have concerns about a specific
> patent, I recommend that you telephone some member of the SC, or send
> e-mail directly to that person.

That makes sense.  I hadn't actually seen that patent before the day I
mentioned it, and I still haven't got 'round to reading it.  I just
thought it would be wise to inform people about the danger of going
down that path, but now I realize it may not have been wise at all.
Sorry for not thinking about it.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-12 18:45                     ` Alexandre Oliva
  2007-11-12 18:49                       ` Joe Buck
  2007-11-12 18:53                       ` Ian Lance Taylor
@ 2007-11-13 10:30                       ` Mark Mitchell
  2007-11-24  1:54                         ` Alexandre Oliva
  2007-11-13 15:30                       ` Michael Matz
  3 siblings, 1 reply; 150+ messages in thread
From: Mark Mitchell @ 2007-11-13 10:30 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:

>> What I don't understand is how it's actually going to work.  What
>> are the notes you're inserting?
> 
> They're always of the form
> 
>   DEBUG user-variable = expression

Good, I understand that now.

Why is this better than associating user variables with assignments?  In
other words, if we have:

  X = E;

where X is the location in which a user variable V is presently being
stored, we could just put a note on the assignment that says "assigns to
user variable V".  If X is, for example, a hard register, and we're now
clobbering the value of a user variable V (so that the value of the
variable is no longer available there), we can add a note that says
"clobbers user variable V".  (The value might still be available
somewhere else; we can figure that out by seeing if any instruction that
is annotated as setting V dominates this instruction, without an
intervening clobbering of that location.)

> That said, growing SET to add to it a list of variables (or components
> thereof) that the variable is assigned to could be made to work, to
> some extent.  But when you optimize away such a set, you'd still have
> to keep the note around

Why?  It seems to me that if we're no longer doing the assignment, then
the location where the value of the user variable can be found (if any)
is not changing at this point.

> (set (reg i) (const_int 3)) ;; assigns to i
> (set (reg P1) (reg i))
> (call (mem f))
> (set (reg i) (const_int 7)) ;; assigns to i
> (set (reg i) (const_int 2)) ;; assigns to i
> (set (reg P1) (reg i))
> (call (mem g))
> 
> could have been optimized to:
> 
> (set (reg P1) (const_int 3))
> (call (mem f))
> (set (reg P1) (const_int 2))
> (call (mem g))
> 
> and then you wouldn't have any debug information left for variable i.

Actually, you would, in the method I'm making up.  In particular, both
of the first two lines in the top example (setting "i" and setting "P1")
would be marked as providing the value of the user variable "i".  The
first line obviously has the value of "i", so we would have a "value of
i" note.  The second would also have a "value of i" note because it's
copying a value with such a note.

What I'm suggesting is that this is something akin to a dataflow
problem.  We start by marking user variables, in the original TREE
representation.  Then, any time we copy the value of a user variable, we
know that what we're doing is providing another place where we can find
the value of that user variable.  Then, when generating debug
information, for every program region, we can find the location(s) where
the value of the user variable is available, and we can output any one
of those locations for the debugger.  Now, of course, we can generate
more compact information by trying to use the same location as often as
possible, but that's just an optimization problem.
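
A rough sketch of that bookkeeping in C, under the assumption that
there are at most 32 locations and a handful of user variables.  The
struct avail type and the avail_* functions are invented for this
illustration and do not correspond to anything in GCC:

```c
#include <assert.h>
#include <stdint.h>

#define NVARS 4  /* number of tracked user variables; arbitrary */

/* locs[v] has bit r set when location r holds the value of
   user variable v.  */
struct avail {
  uint32_t locs[NVARS];
};

/* Location loc is overwritten with something unrelated.  */
void avail_clobber(struct avail *a, unsigned loc)
{
  assert(loc < 32);
  for (unsigned v = 0; v < NVARS; v++)
    a->locs[v] &= ~(UINT32_C(1) << loc);
}

/* User variable var is assigned a new value, stored in loc.  */
void avail_assign(struct avail *a, unsigned var, unsigned loc)
{
  assert(var < NVARS);
  avail_clobber(a, loc);
  a->locs[var] = UINT32_C(1) << loc;
}

/* dst := src.  Every variable available in src is now also
   available in dst; any other variable loses dst.  */
void avail_copy(struct avail *a, unsigned dst, unsigned src)
{
  assert(dst < 32 && src < 32);
  if (dst == src)
    return;
  for (unsigned v = 0; v < NVARS; v++) {
    if (a->locs[v] & (UINT32_C(1) << src))
      a->locs[v] |= UINT32_C(1) << dst;
    else
      a->locs[v] &= ~(UINT32_C(1) << dst);
  }
}

/* Any one location holding var, or -1 if the value is unavailable.  */
int avail_pick(const struct avail *a, unsigned var)
{
  assert(var < NVARS);
  for (unsigned r = 0; r < 32; r++)
    if (a->locs[var] & (UINT32_C(1) << r))
      return (int) r;
  return -1;
}
```

Here avail_pick simply returns the lowest-numbered location; choosing
the same location across regions to keep the output compact would be
a separate step.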

This method gives us accurate debug information, in the sense that if we
say that the value of V is at location X, then it is in fact there, and
the value there is a value assigned to V.  It does not necessarily give
us complete information, though, in that there may be times when the
value is somewhere and we don't realize it.  Like, if:

  x = y + 3;
  f(x);

is optimized to:

  f(y + 3)

Then, right before the call to "f", we might not know that the value of
"x" is available, or we might say that "x" has a previous value.

As a special case of incompleteness, this fails utterly with respect to
variables whose values are constants if those variables are then
optimized away.  If there's no location holding the constant, then the
method I've proposed will say that the value is unavailable -- rather
than cleverly telling the debugger that the value is a constant.  I
don't see that as an unreasonable limitation when debugging optimized
code, but that's open for debate.

I'm not claiming this is better than what you're suggesting.  I'm just
throwing it out there.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: Designs for better debug info in GCC
  2007-11-13 10:30                       ` Mark Mitchell
@ 2007-11-24  1:54                         ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24  1:54 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 13, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Alexandre Oliva wrote:
>>> What I don't understand is how it's actually going to work.  What
>>> are the notes you're inserting?
>> 
>> They're always of the form
>> 
>> DEBUG user-variable = expression

> Good, I understand that now.

> Why is this better than associating user variables with assignments?

I've already explained that, but let me try to sum it up again.

If we annotate assignments, then not only do the annotations move
around along with assignments (I don't think that's desirable), but
when we optimize such assignments away, the annotations are either
dropped or have to stand on their own.

Since dropping annotations and moving them around are precisely
opposed to the goal of making debug information accurate, then keeping
the annotations in place and enabling them to stand on their own is
the right thing to do.

Now, since we have to enable them to stand on their own, then we're
faced with the following decision: either we make that the canonical
annotation representation all the way from the beginning, or we
piggyback the annotations on assignments until they're moved or
removed, at which point they become stand-alone annotations.  The
former seems much more maintainable and simpler to deal with, and I
don't see that there's a significant memory or performance penalty to
this.

>> That said, growing SET to add to it a list of variables (or components
>> thereof) that the variable is assigned to could be made to work, to
>> some extent.  But when you optimize away such a set, you'd still have
>> to keep the note around

> Why?  It seems to me that if we're no longer doing the assignment, then
> the location where the value of the user variable can be found (if any)
> is not changing at this point.

The thing is that the *location* of the user variable is changing at
that point.  Either because its previous value was unavailable, or
because it had remained only at a different location.  Only at the
point of the assignment should we associate the variable with the
location that holds its current value.

>> (set (reg i) (const_int 3)) ;; assigns to i
>> (set (reg P1) (reg i))
>> (call (mem f))
>> (set (reg i) (const_int 7)) ;; assigns to i
>> (set (reg i) (const_int 2)) ;; assigns to i
>> (set (reg P1) (reg i))
>> (call (mem g))
>> 
>> could have been optimized to:
>> 
>> (set (reg P1) (const_int 3))
>> (call (mem f))
>> (set (reg P1) (const_int 2))
>> (call (mem g))
>> 
>> and then you wouldn't have any debug information left for variable i.

> Actually, you would, in the method I'm making up.  In particular, both
> of the first two lines in the top example (setting "i" and setting "P1")
> would be marked as providing the value of the user variable "i".

Yes, this works in this very simple case.  But it doesn't when i is
assigned, at different points, to the values of two separate
variables, that are live and initialized much earlier in the program.
Using the method you seem to be envisioning would extend the life of
the binding of variable 'i' to the life of the two other variables,
ending up with two overlapping and conflicting live ranges for i, or
it would have to drop one in favor of the other.  You can't possibly
retain correct (non-overlapping) live ranges for both unless you keep
notes at the points of assignment.

To make the example clear, consider:

(set (reg x [x]) ???1)
(set (reg y [y]) ???2)
(set (reg i [i]) (reg x [x]))
(set (reg P1) (reg i))
(call (mem f))
(set (reg i [i]) (reg y [y]))
(call (mem g))
(set (reg P1) (reg i))
(call (mem f))

if it gets optimized to:

(set (reg P1 [x, i]) ???1)
(set (reg y [y, i]) ???2)
(call (mem f))
(call (mem g))
(set (reg P1) (reg y))
(call (mem f))

then we lose.  There's no way you can emit debug information for i
based on these annotations such that, at the call to g, the value of i
is correct.  Even if you annotate the copy from y to P1, you still
won't have it right, and, worse, you won't even be able to tell that,
before the call to g, i should have held a different value.  So you'll
necessarily emit incorrect debug information for this case: you'll
state i still holds a value at a point in which it shouldn't hold that
value any more.  This is worse than stating you don't know what the
value of i is.
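
The conflict can be sketched in C, assuming each location simply
carries the list of user variables it is labelled with, as in the
optimized sequence above.  The labelled_loc struct, the
candidate_values function, and whatever concrete values stand in for
???1 and ???2 are all made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 2

/* A location, the user variables it is labelled with, and the value
   it currently holds.  Hypothetical layout.  */
struct labelled_loc {
  const char *reg;
  const char *vars[MAX_LABELS];
  int value;
};

static int has_label(const struct labelled_loc *l, const char *var)
{
  for (size_t k = 0; k < MAX_LABELS; k++)
    if (l->vars[k] != NULL && strcmp(l->vars[k], var) == 0)
      return 1;
  return 0;
}

/* Number of distinct values found among the locations labelled with
   var.  More than one means the labels alone cannot say which value
   var holds at this point.  */
size_t candidate_values(const struct labelled_loc *locs, size_t n,
                        const char *var)
{
  size_t distinct = 0;
  assert(locs != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    int seen = 0;
    if (!has_label(&locs[i], var))
      continue;
    for (size_t j = 0; j < i; j++)
      if (has_label(&locs[j], var) && locs[j].value == locs[i].value)
        seen = 1;
    if (!seen)
      distinct++;
  }
  return distinct;
}
```

With P1 labelled [x, i] and y labelled [y, i] holding different
values, candidate_values reports two candidates for i and one each
for x and y.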

> What I'm suggesting is that this is something akin to a dataflow
> problem.  We start by marking user variables, in the original TREE
> representation.  Then, any time we copy the value of a user variable, we
> know that what we're doing is providing another place where we can find
> the value of that user variable.  Then, when generating debug
> information, for every program region, we can find the location(s) where
> the value of the user variable is available, and we can output any one
> of those locations for the debugger.

That's exactly what I have in mind.

> This method gives us accurate debug information, in the sense that if we
> say that the value of V is at location X, then it is in fact there, and
> the value there is a value assigned to V.  It does not necessarily give
> us complete information, though, in that there may be times when the
> value is somewhere and we don't realize it.  Like, if:

>   x = y + 3;
>   f(x);

> is optimized to:

>   f(y + 3)

> Then, right before the call to "f", we might not know that the value of
> "x" is available, or we might say that "x" has a previous value.

It's not just previous value.  It can be arbitrarily wrong value too.
Consider again the conditional case:

foo (int x, int y, int z)
{
  int c = z;
  whatever0(c);
  c = x;
  whatever1();
  if (some_condition)
    {
      whatever2();
      c = y;
      whatever3();
    }
  whatever4(c);
}

In the tree representation, the assignments to c just go away, in
favor of a PHI node that takes x from the !some_condition block and y
from the some_condition block.

So, you could recover the correct value for c at the PHI node, but
since the other assignments are all dropped, you can at best figure
out that you don't know the value held by c between whatever1() and
the PHI node, and at worst claim that it's z or x or y, or even both x
and y, depending on how you update the notes.

> method I've proposed will say that the value is unavailable [when
> it's a constant and the assignment is optimized away]

I don't see how, unless you keep a note saying at least that the
variable was modified to an unknown value at that point.

> I don't see that as an unreasonable limitation when debugging
> optimized code, but that's open for debate.

If it did that reliably, then it would be a reasonable limitation,
indeed, for it would be accurate, even if incomplete.  It would no
longer be a correctness issue, just a quality of implementation issue.
But then, I'm yet to understand how you'd generate debug info to note
that the value is unavailable if you don't keep notes around to
indicate the point of the assignment that was optimized away.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-12 18:45                     ` Alexandre Oliva
                                         ` (2 preceding siblings ...)
  2007-11-13 10:30                       ` Mark Mitchell
@ 2007-11-13 15:30                       ` Michael Matz
  2007-11-24  2:00                         ` Alexandre Oliva
  3 siblings, 1 reply; 150+ messages in thread
From: Michael Matz @ 2007-11-13 15:30 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Hi,

On Mon, 12 Nov 2007, Alexandre Oliva wrote:

> > Why does it make sense to have that, rather than notes on instructions 
> > that say what effect the instruction has on user variables?
> 
> Few instructions need such notes, so the proposal of growing SET by 33% 
> doesn't quite appeal to me.

Though I haven't produced hard numbers yet, that every SET now contains 
an additional pointer is less of an issue than one might think.  There 
only ever exists one RTL body at each point in time, hence the memory use 
for RTL is vastly dominated by the memory use of GIMPLE, which exists for 
all functions at the same time.

Having this annotation in the SET is just the esthetically most pleasing 
place.  If you do it with notes on insns you have issues with multi-set 
insns, and you have to move them around in case you change the insns.  
Putting them in the SET itself keeps them up-to-date nearly automatically 
(of course you still have to touch them once in a while).

> That said, growing SET to add to it a list of variables (or components
> thereof) that the variable is assigned to could be made to work, to
> some extent.  But when you optimize away such a set, you'd still have
> to keep the note around, so it's not clear to me that adding code all
> over to maintain the notes in place when the SETs go away or are
> juggled around would bring us any advantage.

The nice thing is, that there are only few places which really get rid of 
SETs: remove_insn.  You have to tweak that to keep the information around, 
not much else (though that claim remains to be proven :) ).

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-13 15:30                       ` Michael Matz
@ 2007-11-24  2:00                         ` Alexandre Oliva
  2007-11-26 21:01                           ` Michael Matz
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24  2:00 UTC (permalink / raw)
  To: Michael Matz
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 13, 2007, Michael Matz <matz@suse.de> wrote:

> The nice thing is, that there are only few places which really get rid of 
> SETs: remove_insn.  You have to tweak that to keep the information around, 
> not much else (though that claim remains to be proven :) ).

And then, you have to tweak everything else to keep the note that
replaced the set up to date as you further optimize the code.  So what
was the point of adding the note to the SET, again?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-24  2:00                         ` Alexandre Oliva
@ 2007-11-26 21:01                           ` Michael Matz
  2007-11-27  5:31                             ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Michael Matz @ 2007-11-26 21:01 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Hi,

On Fri, 23 Nov 2007, Alexandre Oliva wrote:

> On Nov 13, 2007, Michael Matz <matz@suse.de> wrote:
> 
> > The nice thing is, that there are only few places which really get rid of 
> > SETs: remove_insn.  You have to tweak that to keep the information around, 
> > not much else (though that claim remains to be proven :) ).
> 
> And then, you have to tweak everything else to keep the note that
> replaced the set up to date as you further optimize the code.

No.  remove_insn() would replace the SET with a note.  It would look at 
other SETs where the information could be put in which is lost.  After 
all, there must have been a reason for the SET to be deleted: the 
destination is dead, hence whatever user-variables were associated with it 
also are dead.  (if they also lie in other places, those are not 
affected).  So it's okay to completely get rid of the SET and decl 
associations.

One special case of the above is, when a SET is deleted which is a copy, 
where the LHS was associated with some variables, but the RHS was not.  
From that point on we can (under certain circumstances) associate the RHS 
with the decls (by changing its initial SET).

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-26 21:01                           ` Michael Matz
@ 2007-11-27  5:31                             ` Alexandre Oliva
  2007-11-27 20:31                               ` Michael Matz
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-27  5:31 UTC (permalink / raw)
  To: Michael Matz
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 26, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Fri, 23 Nov 2007, Alexandre Oliva wrote:

>> On Nov 13, 2007, Michael Matz <matz@suse.de> wrote:
>> 
>> > The nice thing is, that there are only few places which really get rid of 
>> > SETs: remove_insn.  You have to tweak that to keep the information around, 
>> > not much else (though that claim remains to be proven :) ).
>> 
>> And then, you have to tweak everything else to keep the note that
>> replaced the set up to date as you further optimize the code.

> No.  remove_insn() would replace the SET with a note.

What information would this note convey?

> After all, there must have been a reason for the SET to be deleted:
> the destination is dead, hence whatever user-variables were
> associated with it also are dead.

Not quite.  The destination could be merely redundant.  And the
difference is crucial.

If you delete a copy (or some other redundant computation, you don't
seem to handle this case) that would install a value in a variable
that is available elsewhere, and then adjust the uses of the variable
such that they use the value elsewhere, you ought to note that the
variable holds that value, and at that point.

If you delete a computation because the result is completely unused,
then you ought to note that you no longer know the value of the
variable (or, ideally, that the variable would hold the result of that
computation if there was code to compute it).

In both cases, you ought to note that earlier values of the variable
are no longer current at that point.

In both cases, the notion of "at that point" is crucial, especially
when you deal with conditional assignments.  You don't want to make it
seem like a conditional assignment applies when the condition doesn't
hold.  Consider:

int foo(bool p, int x, int y) {
  int i = x;

  p1();

  if (p)
    i = y;

  p2();

  i++;

  p3(i);
}

int main() {
  foo (false, 3, 5);
}

At p1()'s caller's frame, you want i to hold the value 3.  At p2()'s,
you want i to still hold the value 3.  At p3(int)'s, it should be 4.

Now, if you change the program such that p is true, then at p1 i is
still 3, but at p2 it ought to be 5, and at p3(int)'s it should be 6.

How do you get that if you drop the assignments on the floor, or even
if you replace the assignments with notes that don't keep the correct
values associated not only with the names, but also with the points in
the program?
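
As a minimal sketch of what "at that point" buys, assume the
assignments in foo are kept as an ordered list of notes, with the
conditional one marked as such.  The note kinds, the foo_notes table
and observe_i are hypothetical names for this illustration only:

```c
#include <assert.h>
#include <stddef.h>

enum note_kind { BIND_X, BIND_Y, BIND_INCR, OBSERVE };

struct note {
  enum note_kind kind;
  int only_if_p;  /* nonzero: the note sits inside "if (p)" */
};

/* The notes foo would carry, in source order.  */
static const struct note foo_notes[] = {
  { BIND_X,    0 },  /* int i = x; */
  { OBSERVE,   0 },  /* p1();      */
  { BIND_Y,    1 },  /* if (p) i = y; */
  { OBSERVE,   0 },  /* p2();      */
  { BIND_INCR, 0 },  /* i++;       */
  { OBSERVE,   0 },  /* p3(i);     */
};

/* Record the value of i at each call, following only the notes on
   the path actually taken.  Returns the number of observations.  */
size_t observe_i(int p, int x, int y, int *seen, size_t cap)
{
  int i = 0;
  size_t count = 0;
  for (size_t k = 0; k < sizeof foo_notes / sizeof foo_notes[0]; k++) {
    const struct note *n = &foo_notes[k];
    if (n->only_if_p && !p)
      continue;
    switch (n->kind) {
    case BIND_X:    i = x;     break;
    case BIND_Y:    i = y;     break;
    case BIND_INCR: i = i + 1; break;
    case OBSERVE:
      assert(count < cap);
      seen[count++] = i;
      break;
    }
  }
  return count;
}
```

For foo (false, 3, 5) this records 3, 3, 4 at p1, p2 and p3; with p
true it records 3, 5, 6, matching the values listed above.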

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-27  5:31                             ` Alexandre Oliva
@ 2007-11-27 20:31                               ` Michael Matz
  2007-11-27 21:44                                 ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Michael Matz @ 2007-11-27 20:31 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Hi,

On Mon, 26 Nov 2007, Alexandre Oliva wrote:

> >> And then, you have to tweak everything else to keep the note that
> >> replaced the set up to date as you further optimize the code.
> 
> > No.  remove_insn() would replace the SET with a note.
> 
> What information would this note convey?

Oh my, sorry for adding confusion to the topic: I meant to write "would 
_not_ replace the SET with a note".


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-27 20:31                               ` Michael Matz
@ 2007-11-27 21:44                                 ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-27 21:44 UTC (permalink / raw)
  To: Michael Matz
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 27, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Mon, 26 Nov 2007, Alexandre Oliva wrote:

>> >> And then, you have to tweak everything else to keep the note that
>> >> replaced the set up to date as you further optimize the code.
>> 
>> > No.  remove_insn() would replace the SET with a note.
>> 
>> What information would this note convey?

> Oh my, sorry for adding confusion to the topic: I meant to write "would 
> _not_ replace the SET with a note".

Aah, ok.  So, you do indeed completely lose track of the crucial
differences between the two cases for the removal of a SET.  And not
only about their implications, but also about where they ought to take
effect.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  0:00           ` Mark Mitchell
  2007-11-08  0:15             ` David Edelsohn
  2007-11-08  5:44             ` Alexandre Oliva
@ 2007-11-08  9:54             ` Richard Guenther
  2 siblings, 0 replies; 150+ messages in thread
From: Richard Guenther @ 2007-11-08  9:54 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ian Lance Taylor, Alexandre Oliva, gcc-patches, gcc

On 11/8/07, Mark Mitchell <mark@codesourcery.com> wrote:
> Ian Lance Taylor wrote:
>
> > At one time, gcc actually provided better debugging of optimized code
> > than any other compiler, though I don't know if that is still true.
> > Optimized gcc code is still debuggable today.  I do it all the time.
> > (For me poor support for debugging C++ is a much bigger issue, though
> > I think that is an issue more with gdb than with gcc.)
>
> I think we all agree that providing better debugging of optimized code
> is a priori a good thing.  So, as I see it, this thread is focused on
> what internal representation we might use for that.
>
> I don't know that there's an abstract right answer to whether something
> NOTE-like or something on the side is better.  There are problems with
> both approaches.  We know the NOTE/DEBUG_INSN thing is going to break,
> from experience; we also know the on-the-side thing is going to be hard
> to maintain.

I think we're going to find out once both approaches are implemented to the
point that they reasonably do what they want to do.  So I'm fine to defer this
decision up to that point (or the point where we start the fighting on which
approach will get merged).

> Alexandre has clearly thought about this a lot.  I'd like to start by
> capturing the functional changes that we want to make to GCC's debug
> output -- not the changes that we want in the debug experience, or
> changes that we need in GDB, but the changes in the generated DWARF.
>
> For example, I'm thinking of a series of function test cases.  Ignore
> the substance of this example -- I'm making it up! -- I'm just trying to
> capture the form.
>
> ===
> int main () { int i; i = 3; return i; }
>
> When optimizing, "i" is optimized away.  The debug info for "i" right
> before the return statement says "i has been optimized away", but not
> what its value is.  I think it should say that the value is "3".  To do
> that, we need to emit a DW_Now_My_Value_is_3 tag for "i".
> ===
>
> Now, how is whatever representation we pick going to get us that?  Is
> the Oliva representation sufficient?  What about the Guenther/Matz
> representation?  Independently of the representation, what algorithms
> are we going to use to track whatever we need to track as the optimizers
> remove, insert, duplicate, and reorder code?

For the example above, the representation we use on the tree level cannot
attach a name to '3' (since obviously '3' is not a SSA_NAME).  But this is
fixable if we think it is worthwhile.

> Until we all know what we're trying to do, I don't see how we can make a
> good decision about the representation.  Clearly, in the abstract, we
> can represent data either on-the-side or in the instruction stream, but
> until we know what output we want, I'm not sure how we can pick.

That's true.  I was also thinking about how to properly do testcases for both kinds
of infrastructure.  At the moment I scan tree/rtl dumps for the names I want
to preserve, but ultimately it would be nice to be able to run gdb testcases in
the gcc tree to also verify 'correctness' of the information we produce (and
not just existence of some information).

Richard.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-07 22:57         ` Ian Lance Taylor
  2007-11-07 23:05           ` Daniel Jacobowitz
  2007-11-08  0:00           ` Mark Mitchell
@ 2007-11-08  5:01           ` Alexandre Oliva
  2007-11-08 18:15             ` Alexandre Oliva
  2007-11-08 19:13             ` Ian Lance Taylor
  2007-11-08  8:58           ` Paolo Bonzini
  3 siblings, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08  5:01 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Ian Lance Taylor <iant@google.com> wrote:

>> Does it really matter?  Do we compromise standards compliance (and so
>> violently, while at that) in any aspect of the compiler?

> What standards are you talking about?

Debug information standards such as DWARF-3.

> I'm not aware of any standard for debuggability of optimized code.

I'm talking about standards that specify how a compiler should encode
meta-information about how source code concepts map to the code it
generated.  See, for example, section 2.6 in the Dwarf-3
specification.  It talks very little about optimization, but it does
discuss what a DW_AT_location, if present, means.  It doesn't say
anything like: "if a variable is available at a certain location most
of the time, you can emit a DW_AT_location that refers to that
location".  It says:

  Debugging information must provide consumers a way to find the
  location of program variables, determine the bounds of dynamic
  arrays and strings, and possibly to find the base address of a
  subroutine’s stack frame or the return address of a subroutine

See, it's not about debuggers, it's about consumers.  It's an
obligation, not really an option (that said, DW_AT_location *is*
optional).

  1. Location expressions, which are a language independent
     representation of addressing rules of arbitrary complexity built
     from DWARF expressions. They are sufficient for describing the
     location of any object as long as its lifetime is either static
     or the same as the lexical block that owns it, and it does not
     move throughout its lifetime.

  2. Location lists, which are used to describe objects that have a
     limited lifetime or change their location throughout their
     lifetime.

Nowhere does it state that, "if the compiler can't quite keep track of
the location of a variable, it can be sloppy and emit just whatever is
simpler or appears to make sense".

  Address ranges may overlap. When they do, they describe a situation
  in which an object exists simultaneously in more than one place. If
  all of the address ranges in a given location list do not
  collectively cover the entire range over which the object in
  question is defined, it is assumed that the object is not available
  for the portion of the range that is not covered.

So, it does make room for *some* sloppiness, after all.  That's what I
refer to as "incompleteness of debug information".  If we fail to keep
track of where an object is, it's sort-of ok (although undesirable) to
emit debug information that omits the location of the object in
certain program regions where it might be live.

However, it is not standard-compliant to emit information stating that
the object is available at certain locations if it is NOT really
there, or if it is available elsewhere, in addition to or instead of
the locations we've emitted.  That's what I refer to as "incorrectness
of debug information".
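
As a rough illustration of the location list rule quoted above, here
is a small C sketch of how a consumer might look up an object at a
given program counter.  The struct and function names are
hypothetical, and the entry layout is heavily simplified compared to
real DWARF: each entry just says the object is in `where` for
addresses in [lo, hi).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified location list entry: the object lives in
   `where` for program counters in the half-open range [lo, hi). */
struct loc_entry {
  unsigned long lo, hi;
  const char *where;
};

/* Return the first location whose range covers pc, or NULL when no
   range covers it, in which case the object is assumed unavailable. */
const char *lookup_location(const struct loc_entry *list, size_t n,
                            unsigned long pc)
{
  for (size_t k = 0; k < n; k++) {
    assert(list[k].lo <= list[k].hi);
    if (list[k].lo <= pc && pc < list[k].hi)
      return list[k].where;
  }
  return NULL;
}
```

In this model, a gap in the list is the incompleteness case: the
consumer gets NULL and knows it has no answer.  An entry whose range
covers a pc at which the object is not really in `where` is the
incorrectness case: the consumer gets an answer and has no way to
tell that it is wrong.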

Incorrectness in the compiler output is always a bug.  No matter how
hard it is to implement, or how resource-intensive the solution is,
arguing that we've made a trade-off and decided to generate wrong
output for this case is not a clever decision.

Incompleteness is a completely different issue.  This is where we
*can* afford to make trade-offs.  Just like we can decide to omit
certain optimizations, or to not carry them out to the greatest
possible extent, or to experiment with various different heuristics,
we could afford to emit incomplete debug information, it's "just" a
quality of implementation issue.  But not incorrect debug information,
that's just a bug.

> gcc's users are definitely calling for a faster compiler.  Are they
> calling for better debuggability of optimized code?

This is not just about debuggability, as I've tried to explain all the
way from the beginning of the discussion, maybe a couple of months
ago.  Debug information is not just about debuggers any more.  There
are good reasons why the Dwarf-3 standard says "consumers" rather than
"debuggers".  It's no longer just a matter of convenience, recompile
with -O0 if you want to debug it.  It's a matter of correctness, for
various monitoring tools now rely on this meta-information, and
rightfully so.

>> > We've fixed many many bugs and misoptimizations over the years due to
>> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
>> > we've made in the past.
>> 
>> That's a valid concern.  However, per this reasoning, we might as well
>> push every operand in our IL to separate representations, because
>> there have been so many bugs and misoptimizations over the years,
>> especially when the representation didn't make transformations
>> trivially correct.

> Please don't use strawman arguments.

It's not, really.  A reference to an object within a debug stmt or
insn is very much like any other operand, in that most optimizer
passes must keep them up to date.  If you argue for pushing them
outside the IL, why would any other operands be different?

> As I understand your proposal, it materializes variables which were
> otherwise omitted from the generated program.  It doesn't address the
> other issues with debugging optimized code, like bouncing around
> between program lines.  Is that correct?  What else does your proposal
> do?

All it does is to try to carry information about what value the user
is entitled to expect a variable to hold at each point in the program
throughout compilation.  Such that, even if the compiler doesn't
retain something that represents only that variable through to the end
of the compilation, we still have information about where, or at least
what, its value is, if it is available anywhere, such that we can
include this piece of data in the debug information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  5:01           ` Alexandre Oliva
@ 2007-11-08 18:15             ` Alexandre Oliva
  2007-11-08 19:13             ` Ian Lance Taylor
  1 sibling, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08 18:15 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Ian Lance Taylor <iant@google.com> wrote:

>> Does it really matter?  Do we compromise standards compliance (and so
>> violently, while at that) in any aspect of the compiler?

> What standards are you talking about?

Debug information standards such as DWARF-3.

> I'm not aware of any standard for debuggability of optimized code.

I'm talking about standards that specify how a compiler should encode
meta-information about how source code concepts map to the code it
generated.  See, for example, section 2.6 in the Dwarf-3
specification.  It talks very little about optimization, but it does
discuss what a DW_AT_location, if present, means.  It doesn't say
anything like: "if a variable is available at a certain location most
of the time, you can emit a DW_AT_location that refers to that
location".  It says:

  Debugging information must provide consumers a way to find the
  location of program variables, determine the bounds of dynamic
  arrays and strings, and possibly to find the base address of a
  subroutine’s stack frame or the return address of a subroutine

See, it's not about debuggers, it's about consumers.  It's an
obligation, not really an option (that said, DW_AT_location *is*
optional).

  1. Location expressions, which are a language independent
     representation of addressing rules of arbitrary complexity built
     from DWARF expressions. They are sufficient for describing the
     location of any object as long as its lifetime is either static
     or the same as the lexical block that owns it, and it does not
     move throughout its lifetime.

  2. Location lists, which are used to describe objects that have a
     limited lifetime or change their location throughout their
     lifetime.

Nowhere does it state that, "if the compiler can't quite keep track of
the location of a variable, it can be sloppy and emit just whatever is
simpler or appears to make sense".

  Address ranges may overlap. When they do, they describe a situation
  in which an object exists simultaneously in more than one place. If
  all of the address ranges in a given location list do not
  collectively cover the entire range over which the object in
  question is defined, it is assumed that the object is not available
  for the portion of the range that is not covered.

So, it does make room for *some* sloppiness, after all.  That's what I
refer to as "incompleteness of debug information".  If we fail to keep
track of where an object is, it's sort-of ok (although undesirable) to
emit debug information that omits the location of the object in
certain program regions where it might be live.

However, it is not standard-compliant to emit information stating that
the object is available at certain locations if it is NOT really
there, or if it is available elsewhere, in addition to or instead of
the locations we've emitted.  That's what I refer to as "incorrectness
of debug information".

Incorrectness in the compiler output is always a bug.  No matter how
hard it is to implement, or how resource-intensive the solution is,
arguing that we've made a trade-off and decided to generate wrong
output for this case is not a clever decision.

Incompleteness is a completely different issue.  This is where we
*can* afford to make trade-offs.  Just like we can decide to omit
certain optimizations, or to not carry them out to the greatest
possible extent, or to experiment with various different heuristics,
we could afford to emit incomplete debug information, it's "just" a
quality of implementation issue.  But not incorrect debug information,
that's just a bug.

> gcc's users are definitely calling for a faster compiler.  Are they
> calling for better debuggability of optimized code?

This is not just about debuggability, as I've tried to explain all the
way from the beginning of the discussion, maybe a couple of months
ago.  Debug information is not just about debuggers any more.  There
are good reasons why the Dwarf-3 standard says "consumers" rather than
"debuggers".  It's no longer just a matter of convenience, recompile
with -O0 if you want to debug it.  It's a matter of correctness, for
various monitoring tools now rely on this meta-information, and
rightfully so.

>> > We've fixed many many bugs and misoptimizations over the years due to
>> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
>> > we've made in the past.
>> 
>> That's a valid concern.  However, per this reasoning, we might as well
>> push every operand in our IL to separate representations, because
>> there have been so many bugs and misoptimizations over the years,
>> especially when the representation didn't make transformations
>> trivially correct.

> Please don't use strawman arguments.

It's not, really.  A reference to an object within a debug stmt or
insn is very much like any other operand, in that most optimizer
passes must keep them up to date.  If you argue for pushing them
outside the IL, why would any other operands be different?

> As I understand your proposal, it materializes variables which were
> otherwise omitted from the generated program.  It doesn't address the
> other issues with debugging optimized code, like bouncing around
> between program lines.  Is that correct?  What else does your proposal
> do?

All it does is to try to carry information about what value the user
is entitled to expect a variable to hold at each point in the program
throughout compilation.  Such that, even if the compiler doesn't
retain something that represents only that variable through to the end
of the compilation, we still have information about where, or at least
what, its value is, if it is available anywhere, such that we can
include this piece of data in the debug information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08  5:01           ` Alexandre Oliva
  2007-11-08 18:15             ` Alexandre Oliva
@ 2007-11-08 19:13             ` Ian Lance Taylor
  2007-11-08 20:27               ` Alexandre Oliva
  1 sibling, 1 reply; 150+ messages in thread
From: Ian Lance Taylor @ 2007-11-08 19:13 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> On Nov  7, 2007, Ian Lance Taylor <iant@google.com> wrote:
> 
> >> Does it really matter?  Do we compromise standards compliance (and so
> >> violently, while at that) in any aspect of the compiler?
> 
> > What standards are you talking about?
> 
> Debug information standards such as DWARF-3.

...

> Incorrectness in the compiler output is always a bug.  No matter how
> hard it is to implement, or how resource-intensive the solution is,
> arguing that we've made a trade-off and decided to generate wrong
> output for this case is not a clever decision.

I'm sorry, I've thought about it, but I don't buy this argument.  I'm
certainly willing to talk about improving debug information for
optimized code, and clearly it is more important to more people than I
initially thought.  However, I don't think your arguments that this is
an issue comparable to code correctness are valid.  Incorrect
generated code is a fatal problem in a compiler.  Incorrect debugging
information is a quality of implementation issue.


> >> > We've fixed many many bugs and misoptimizations over the years due to
> >> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> >> > we've made in the past.
> >> 
> >> That's a valid concern.  However, per this reasoning, we might as well
> >> push every operand in our IL to separate representations, because
> >> there have been so many bugs and misoptimizations over the years,
> >> especially when the representation didn't make transformations
> >> trivially correct.
> 
> > Please don't use strawman arguments.
> 
> It's not, really.  A reference to an object within a debug stmt or
> insn is very much like any other operand, in that most optimizer
> passes must keep them up to date.  If you argue for pushing them
> outside the IL, why would any other operands be different?

I think you misread me.  I didn't argue for pushing debugging
information outside the IL.  I argued against a specific
implementation--DEBUG_INSN--based on our experience with similar
implementations.

Ian

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 19:13             ` Ian Lance Taylor
@ 2007-11-08 20:27               ` Alexandre Oliva
  2007-11-08 21:26                 ` Ian Lance Taylor
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08 20:27 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Ian Lance Taylor <iant@google.com> wrote:

> However, I don't think your arguments that this is
> an issue comparable to code correctness are valid.

It *is* code correctness.  Say, if the linker emitted incorrect
addresses in an executable, but the kernel and dynamic loader didn't
rely on those addresses, would it not still be a bug in the linker?
And then, if those tools started relying on those addresses and
exposed the problem, would it be right to tell them they must not rely
on them because they were broken in the past and we don't feel like
correcting the linker?

So...  The compiler is outputting code that tells other tools where to
look for certain variables at run time, but it's putting incorrect
information there.  How can you possibly argue that this is not a code
correctness issue?

> Incorrect generated code is a fatal problem in a compiler.
> Incorrect debugging information is a quality of implementation
> issue.

Incomplete debugging information is a quality of implementation issue,
just like missed optimizations.

Incorrect compiler output is a bug.  Claiming it's not just because
tools you happen to rely on don't care about that piece of information
won't make it any less of a bug.  It may make it a less important bug
for some time, but it's still a bug.

>> >> > We've fixed many many bugs and misoptimizations over the years due to
>> >> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
>> >> > we've made in the past.
>> >> 
>> >> That's a valid concern.  However, per this reasoning, we might as well
>> >> push every operand in our IL to separate representations, because
>> >> there have been so many bugs and misoptimizations over the years,
>> >> especially when the representation didn't make transformations
>> >> trivially correct.
>> 
>> > Please don't use strawman arguments.
>> 
>> It's not, really.  A reference to an object within a debug stmt or
>> insn is very much like any other operand, in that most optimizer
>> passes must keep them up to date.  If you argue for pushing them
>> outside the IL, why would any other operands be different?

> I think you misread me.  I didn't argue for pushing debugging
> information outside the IL.  I argued against a specific
> implementation--DEBUG_INSN--based on our experience with similar
> implementations.

Do you remember any other notes that contained actual rtx expressions
and expected optimization passes to keep them accurate?

All notes (as in matching NOTE_P) I remember didn't really contain rtx
expressions.  The first exception I remember is VAR_LOCATION, and this
one explicitly does *not* want to be updated, for it's generated so
late in the process.

Conversely, REG_NOTES do contain rtx, and they often have to be
updated, so that's the right representation for them.  Do you think
we'd gain anything by moving them to a separate, out-of-line
representation?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 20:27               ` Alexandre Oliva
@ 2007-11-08 21:26                 ` Ian Lance Taylor
  2007-11-09  9:53                   ` Robert Dewar
  2007-11-09  9:55                   ` Seongbae Park (박성배, 朴成培)
  0 siblings, 2 replies; 150+ messages in thread
From: Ian Lance Taylor @ 2007-11-08 21:26 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> So...  The compiler is outputting code that tells other tools where to
> look for certain variables at run time, but it's putting incorrect
> information there.  How can you possibly argue that this is not a code
> correctness issue?

I don't see any point to going around this point again, so I'll just
note that I disagree.


> >> >> > We've fixed many many bugs and misoptimizations over the years due to
> >> >> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> >> >> > we've made in the past.
> >> >> 
> >> >> That's a valid concern.  However, per this reasoning, we might as well
> >> >> push every operand in our IL to separate representations, because
> >> >> there have been so many bugs and misoptimizations over the years,
> >> >> especially when the representation didn't make transformations
> >> >> trivially correct.
> >> 
> >> > Please don't use strawman arguments.
> >> 
> >> It's not, really.  A reference to an object within a debug stmt or
> >> insn is very much like any other operand, in that most optimizer
> >> passes must keep them up to date.  If you argue for pushing them
> >> outside the IL, why would any other operands be different?
> 
> > I think you misread me.  I didn't argue for pushing debugging
> > information outside the IL.  I argued against a specific
> > implementation--DEBUG_INSN--based on our experience with similar
> > implementations.
> 
> Do you remember any other notes that contained actual rtx expressions
> and expected optimization passes to keep them accurate?

No.

> Do you think
> we'd gain anything by moving them to a separate, out-of-line
> representation?

I don't know.  I don't see such a proposal on the table, and I don't
have one myself, so I don't know how to evaluate it.

Ian

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-08 21:26                 ` Ian Lance Taylor
@ 2007-11-09  9:53                   ` Robert Dewar
  2007-11-12  5:36                     ` Mark Mitchell
  2007-11-09  9:55                   ` Seongbae Park (박성배, 朴成培)
  1 sibling, 1 reply; 150+ messages in thread
From: Robert Dewar @ 2007-11-09  9:53 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Ian Lance Taylor wrote:
> Alexandre Oliva <aoliva@redhat.com> writes:
> 
>> So...  The compiler is outputting code that tells other tools where to
>> look for certain variables at run time, but it's putting incorrect
>> information there.  How can you possibly argue that this is not a code
>> correctness issue?
> 
> I don't see any point to going around this point again, so I'll just
> note that I disagree.

Well, I very much agree. If you are writing certified code, then a number
of evidence-producing tools rely on the debugging information, and it is
a problem if this information is incorrect.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-09  9:53                   ` Robert Dewar
@ 2007-11-12  5:36                     ` Mark Mitchell
  2007-11-12 17:34                       ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Mark Mitchell @ 2007-11-12  5:36 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Ian Lance Taylor, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Robert Dewar wrote:
> Ian Lance Taylor wrote:
>> Alexandre Oliva <aoliva@redhat.com> writes:
>>
>>> So...  The compiler is outputting code that tells other tools where to
>>> look for certain variables at run time, but it's putting incorrect
>>> information there.  How can you possibly argue that this is not a code
>>> correctness issue?
>>
>> I don't see any point to going around this point again, so I'll just
>> note that I disagree.
> 
> Well I very much agree.

The trick is that we're being asked to give a binary answer ("is it a
correctness issue?") when it's not really a binary issue.

Clearly, for some users, incorrect debugging information on optimized
code is not a terribly big deal.  It's certainly less important to many
users than that the program get the right answer.  On the other hand,
there are no doubt users where, whether for debugging, certification, or
whatever, it's vitally important that the debugging information meet
some standard of accuracy.

Part of my concern with this whole discussion is that we seem to be
saying we want the debugging information to be better, but not saying
very clearly what the requirements on better are.  Are we going to
consider it a bug if the value of a variable is unavailable, but the
debugging information says it is available?  (Yes, this seems like a bug
to me.)  What if an old value is available, but a simple-minded reading
of the program would have now assigned a new value?  (No, I wouldn't
consider this a bug.)  What if the value is available in two places, and
we only describe one of them?  (No, I wouldn't consider this a bug.)
What if the value is available, but we say that it isn't because we lost
track of it at some point?  (I would say "it depends".)

We could certainly track user variables through SSA and RTL, at least
insofar as knowing that some REGs refer to SSA names that refer to user
VAR_DECLs.  We can use dataflow analysis to compute where those values
(might) die.  Thus, we can probably do a reasonable job of guaranteeing
that when we say a variable is somewhere, it is in fact in that place.

I don't yet understand what else we're trying to do.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-12  5:36                     ` Mark Mitchell
@ 2007-11-12 17:34                       ` Alexandre Oliva
  2007-11-12 17:54                         ` Mark Mitchell
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-12 17:34 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Clearly, for some users, incorrect debugging information on optimized
> code is not a terribly big deal.  It's certainly less important to many
> users than that the program get the right answer.  On the other hand,
> there are no doubt users where, whether for debugging, certification, or
> whatever, it's vitally important that the debugging information meet
> some standard of accuracy.

How is this different from a port of the compiler for a CPU that few
people care about?  That many users couldn't care less whether the
compiler output on that port works at all doesn't make it any less of
a correctness issue.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-11-12 17:34                       ` Alexandre Oliva
@ 2007-11-12 17:54                         ` Mark Mitchell
  2007-11-24  1:55                           ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Mark Mitchell @ 2007-11-12 17:54 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:
> On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
> 
>> Clearly, for some users, incorrect debugging information on optimized
>> code is not a terribly big deal.  It's certainly less important to many
>> users than that the program get the right answer.  On the other hand,
>> there are no doubt users where, whether for debugging, certification, or
>> whatever, it's vitally important that the debugging information meet
>> some standard of accuracy.
> 
> How is this different from a port of the compiler for a CPU that few
> people care about?  That many users couldn't care less whether the
> compiler output on that port works at all doesn't make it any less of
> a correctness issue.

You're again trying to make this a binary-value question.  Why?

Lots of things are "a correctness issue".  But, some categories tend to
be worse than others.  There is certainly a qualitative difference in
the severity of a defect that results in the compiler generating code
that computes the wrong answer and a defect that results in the compiler
generating wrong debugging information for optimized code.

The impact on a user affected by the first problem is likely very
severe: the application does not run correctly.  The impact on a user
affected by the second problem is likely less severe: the debugger
doesn't work as well, or some other external tool doesn't work as well.

Let's put it this way: if a user has to choose whether the compiler will
(a) generate code that runs correctly for their application, or (b)
generate debugging information that's accurate, which one will they choose?

But what's the point of this argument?  It sounds like you're trying to
argue that debug info for optimized code is a correctness issue, and
therefore we should work as hard on it as we would on code-generation
bugs.  I don't find that argument persuasive.  I'd like better debugging
for optimized code, but I'm certainly more concerned that (a) we
generate correct, fast code when optimizing, and (b) we generate good
debugging information when not optimizing.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

* Re: Designs for better debug info in GCC
  2007-11-12 17:54                         ` Mark Mitchell
@ 2007-11-24  1:55                           ` Alexandre Oliva
  2007-11-26  1:08                             ` Mark Mitchell
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24  1:55 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> Alexandre Oliva wrote:
>> On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:
>> 
>>> Clearly, for some users, incorrect debugging information on optimized
>>> code is not a terribly big deal.  It's certainly less important to many
>>> users than that the program get the right answer.  On the other hand,
>>> there are no doubt users where, whether for debugging, certification, or
>>> whatever, it's vitally important that the debugging information meet
>>> some standard of accuracy.
>> 
>> How is this different from a port of the compiler for a CPU that few
>> people care about?  That many users couldn't care less whether the
>> compiler output on that port works at all doesn't make it any less of
>> a correctness issue.

> You're again trying to make this a binary-value question.  Why?

Because in my mind, when we agree there is a bug, then a fix for it
is easier to swallow even if it makes the compiler spend more
resources, whereas a mere quality-of-implementation issue is subject
to quite different standards.

> Lots of things are "a correctness issue".  But, some categories tend to
> be worse than others.  There is certainly a qualitative difference in
> the severity of a defect that results in the compiler generating code
> that computes the wrong answer and a defect that results in the compiler
> generating wrong debugging information for optimized code.

That depends a lot on whether your application uses the
incorrect compiler output or not.

If the compiler produces incorrect code, but your application doesn't
ever exercise that error, would you argue for leaving the bug unfixed?

These days, applications are built that depend on the correctness of
the compiler output in certain sections that historically weren't all
that functionally essential, namely, the meta-information sections
that we got used to calling debug information.

I.e., these days, applications exercise the "code paths" that formerly
weren't exercised.  This exposes bugs in the compiler.  Worse: bugs
that we have no infrastructure to test, and that we don't even agree
are actual bugs, because the standards that specify the "ISA and ABI"
in which such code ought to be output are apparently regarded as
irrelevant by some.

That is just because their perception is distorted by a single use of such
information, which involves a high amount of human interaction, and
humans are able to tolerate and adapt to error conditions.

But as more and more uses of such information are actual production
systems rather than humans behind debuggers, such errors can no longer
be tolerated, because when the debug output is wrong, the system
breaks.  It's that simple.  It's really no different from any other
compiler bug.

> Let's put it this way: if a user has to choose whether the compiler will
> (a) generate code that runs correctly for their application, or (b)
> generate debugging information that's accurate, which one will they choose?

(a), for sure.  But bear in mind that, when the application's correct
execution depends on the correctness of debugging information, then
(a) implies (b).

> But what's the point of this argument?  It sounds like you're trying to
> argue that debug info for optimized code is a correctness issue, and
> therefore we should work as hard on it as we would on code-generation
> bugs.

I'm working hard on it.  I'm not asking others to join me.  I'm just
asking people to understand how serious a problem it is, and that,
even though fixing these bugs may have a cost, it's bugs we're talking
about, it's incorrect compiler output that causes applications to
break, not mere inconvenience for debuggers.

> I'd like better debugging for optimized code, but I'm certainly more
> concerned that (a) we generate correct, fast code when optimizing,
> and (b) we generate good debugging information when not optimizing.

This just goes to show that you're not concerned with the kind of
application that *depends* on correct debug information for
functioning.  And it's not debuggers I'm talking about here.

That's a reasonable point of view.  Maybe the GCC community can decide
that the debug information it produces is just for (poor) consumption
by debug programs, and that we have no interest in *complying* with
the debug information standards that document the debug information
that other applications depend on.  And I mean *complying* with the
standards, rather than merely outputting whatever seems to be easy and
approximately close to what the standard mandates.

I just hope the GCC community doesn't make this decision, and that it
accepts fixes to these bugs even when they impose some overhead,
especially when such overhead can be easily avoided with command-line
options, or even is disabled by default (because debug info is not
emitted by default, after all).

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-11-24  1:55                           ` Alexandre Oliva
@ 2007-11-26  1:08                             ` Mark Mitchell
  2007-12-05 14:22                               ` Diego Novillo
  0 siblings, 1 reply; 150+ messages in thread
From: Mark Mitchell @ 2007-11-26  1:08 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:

>> You're again trying to make this a binary-value question.  Why?
> 
> Because in my mind, when we agree there is a bug, then a fix for it
> is easier to swallow even if it makes the compiler spend more
> resources, whereas a mere quality-of-implementation issue is subject
> to quite different standards.

Unfortunately, not all questions are black-and-white.  I don't think
you're going to get consensus that this issue is as important to fix as
wrong-code (in the traditional sense) problems.  So, arguing about
whether this is a "correctness issue" isn't very productive.

Neither is arguing that there is now some urgent need for machine-usable
debugging information in a way that there wasn't before.  Machines have
been using debugging information for various purposes other than
interactive debugging for ages.  But, they've always had to deal with
the kinds of problems that you're encountering, especially with
optimized code.

I think that at this point you're doing research.  I don't think we have
a well-defined notion of what exactly debugging information should be
for optimized code.  Robert Dewar's definition of -O1 as doing
optimizations that don't interfere with debugging is coherent (though
informal, of course), but you're asking for something more: full
optimization, and, somehow, accurate debugging information in the
presence of that.  I'm all for research, and the thinking that you're
doing is unquestionably valuable.  But, you're pushing hard for a
particular solution and that may be premature at this point.

Debugging information just isn't rich enough to describe the full
complexity of the optimization transformations.  There's no great way to
assign a line number to an instruction that was created by the compiler
when it inserted code on some flow-graph edge.  You can't get exact
information about variable lifetimes because the scope doesn't start at
a particular point in the generated code in the same way that it does in
the source code.

My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
GCC developer with an interest in improving the compiler in the same way
that you're trying to do) is that you stop writing code and start
writing a paper about what you're trying to do.

Ignore the implementation.  Describe the problem in detail.  Narrow its
scope if necessary.  Describe the success criteria in detail.  Ideally,
the success criteria are mechanically checkable properties: i.e., given
a C program as input, and optimized code + debug information as output,
it should be possible to algorithmically prove whether the output is
correct.

For example, how do you define the correctness of debug information for
a variable's location at a given PC?  Perhaps we want to say that giving
the answer "no information available" is always correct, but that saying
"the value is here" when it's not is incorrect; that gives us a
conservative fallback.  How do you define the point in the source
program given a PC?  If the value of "x" changes on line 100, and we're
at an instruction which corresponds to line 101, are we guaranteed to see
the changed value?  Or is seeing the previous value OK?  What about some
intermediate value if "x" is being changed byte-by-byte?  What about a
garbage value if the compiler happens to optimize by throwing away the
old value of "x" before assigning a new one?
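
To make the "mechanically checkable" idea concrete, here is a minimal
sketch of the per-query rule suggested above.  It assumes some oracle
can report where the value really is at a given PC; the enum and
function names are illustrative only and do not come from any existing
tool.

```c
/* Verdicts for a single (variable, PC) query.  */
enum verdict { VERDICT_CORRECT, VERDICT_INCOMPLETE, VERDICT_WRONG };

#define LOC_NONE (-1)

/* `actual` is where the value really lives (LOC_NONE if it no longer
   exists anywhere); `claimed` is what the debug information says
   (LOC_NONE meaning "no information available").  */
enum verdict check_claim(int actual, int claimed)
{
    /* Saying nothing is never wrong; at worst it is incomplete.  */
    if (claimed == LOC_NONE)
        return actual == LOC_NONE ? VERDICT_CORRECT : VERDICT_INCOMPLETE;
    /* A positive claim must match reality exactly.  */
    return claimed == actual ? VERDICT_CORRECT : VERDICT_WRONG;
}
```

In this sketch VERDICT_INCOMPLETE is the conservative fallback: it counts
against quality but not against correctness, and only VERDICT_WRONG
would be reported as a bug.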

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

* Re: Designs for better debug info in GCC
  2007-11-26  1:08                             ` Mark Mitchell
@ 2007-12-05 14:22                               ` Diego Novillo
  2007-12-05 22:10                                 ` Joe Buck
  2007-12-15 21:41                                 ` Alexandre Oliva
  0 siblings, 2 replies; 150+ messages in thread
From: Diego Novillo @ 2007-12-05 14:22 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Alexandre Oliva, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 11/25/07 3:43 PM, Mark Mitchell wrote:

> My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
> GCC developer with an interest in improving the compiler in the same way
> that you're trying to do) is that you stop writing code and start
> writing a paper about what you're trying to do.
> 
> Ignore the implementation.  Describe the problem in detail.  Narrow its
> scope if necessary.  Describe the success criteria in detail.  Ideally,
> the success criteria are mechanically checkable properties: i.e., given
> a C program as input, and optimized code + debug information as output,
> it should be possible to algorithmically prove whether the output is
> correct.

Yes, please.  I would very much like to see an abstract design document 
on what you are trying to accomplish.  I have been trying to follow this 
thread but I've gotten lost.  It's full of implementation details, 
rhetoric and high-level discussion.

I would like to see exactly what Mark is asking for.  Perhaps a 
presentation in next year's Summit?  I don't think I understand the goal 
of the project.  "Correct debugging info" means little, particularly if 
you say that it's not debuggers that you are thinking about.

It's certainly worrisome that your implementation seems to be intrusive 
to the point of brittleness.  Will every new optimization need to think 
about debug information from scratch and refrain from doing certain 
transformations?

In my simplistic view of this problem, I've always had the idea that -O0 
-g means "full debugging bliss", -O1 -g means "tolerable debugging" 
(symbols shouldn't disappear, for instance, though they do now) and -O2 
-g means "you can probably know what line+function you're executing".

But you seem to be addressing other problems.  And it even seems to me 
that you want debugging information that is capable of deconstructing 
arbitrary transformations done by the optimizers.  But I think I'm just 
lost in this thread, so a high-level design document would be perfect to
expose your ideas.

Diego.

* Re: Designs for better debug info in GCC
  2007-12-05 14:22                               ` Diego Novillo
@ 2007-12-05 22:10                                 ` Joe Buck
  2007-12-15 21:41                                 ` Alexandre Oliva
  1 sibling, 0 replies; 150+ messages in thread
From: Joe Buck @ 2007-12-05 22:10 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Mark Mitchell, Alexandre Oliva, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Wed, Dec 05, 2007 at 09:05:33AM -0500, Diego Novillo wrote:
> In my simplistic view of this problem, I've always had the idea that -O0 
> -g means "full debugging bliss", -O1 -g means "tolerable debugging" 
> (symbols shouldn't disappear, for instance, though they do now) and -O2 
> -g means "you can probably know what line+function you're executing".

I'd be happy enough if the state of -O1 -g debugging were improved,
perhaps using some of Alexandre's ideas so that it could be "full
debugging bliss" with some optimization as well.  Speeding up the
compile/test/debug/modify cycle would result.  We could then have fast
but fully debuggable code at -O1, and even faster code at -O2 not
constrained by the requirement of, as Diego says, "deconstructing
arbitrary transformations done by the optimizers". 

* Re: Designs for better debug info in GCC
  2007-12-05 14:22                               ` Diego Novillo
  2007-12-05 22:10                                 ` Joe Buck
@ 2007-12-15 21:41                                 ` Alexandre Oliva
  2007-12-16  3:15                                   ` Daniel Berlin
  2007-12-16 21:42                                   ` Mark Mitchell
  1 sibling, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-15 21:41 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Mark Mitchell, Robert Dewar, Ian Lance Taylor, Richard Guenther,
	gcc-patches, gcc

On Dec  5, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 11/25/07 3:43 PM, Mark Mitchell wrote:

>> My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
>> GCC developer with an interest in improving the compiler in the same way
>> that you're trying to do) is that you stop writing code and start
>> writing a paper about what you're trying to do.
>> 
>> Ignore the implementation.  Describe the problem in detail.  Narrow its
>> scope if necessary.  Describe the success criteria in detail.  Ideally,
>> the success criteria are mechanically checkable properties: i.e., given
>> a C program as input, and optimized code + debug information as output,
>> it should be possible to algorithmically prove whether the output is
>> correct.

> Yes, please.  I would very much like to see an abstract design
> document on what you are trying to accomplish.

Other than the ones I've already posted, here's one:

http://dwarfstd.org/Dwarf3Std.php

Seriously.  There is a standard for this stuff.  My ultimate goal in
this project is that we comply with it, at least as far as emitting
debug information for location of variables is concerned.

Here are some relevant postings on design strategies, rationales and
goals:

http://gcc.gnu.org/ml/gcc/2007-11/msg00229.html (goals)
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00160.html (initial plan)
http://gcc.gnu.org/ml/gcc/2007-11/msg00261.html (detailed plan)
http://gcc.gnu.org/ml/gcc/2007-11/msg00317.html (example)
http://gcc.gnu.org/ml/gcc/2007-11/msg00590.html (more example)
http://gcc.gnu.org/ml/gcc/2007-11/msg00176.html (design rationale)
http://gcc.gnu.org/ml/gcc/2007-11/msg00177.html (clarification)

> I would like to see exactly what Mark is asking for.  Perhaps a
> presentation in next year's Summit?

Sure, if there's interest, I could sure plan on doing that.  I could
use sponsors, BTW; I haven't discussed this with my employer, and
writing articles and presenting speeches are not part of this
assignment I was given.  Anyhow, by the time of the next year's
Summit, I hope this is mostly old news.

> I don't think I understand the goal of the project.

Follow the standard, as in (1) emit debug information that is correct
(standard-compliant), as in, if we emit some piece of debug
information, it reflects reality, rather than being a sometimes
distant approximation of some past reality long destroyed by some
optimization pass, and (2) emit debug information that is more
complete, as in, we currently fail to emit a lot of debug information
that we could, because we lose track of the location of variables as
optimization passes fail to maintain the needed information to do so.

> "Correct debugging info" means little, particularly if you say that
> it's not debuggers that you are thinking about.

Thinking of the debuggers is a mistake.  We don't think of specific
compilers when reading a programming language standard.  We don't
think of specific processors when reading an ISA or ABI specification.
Even when we read documentation specific to a processor, we still
don't think of its internal implementation details in order to write a
compiler for it; even the scheduling properties are abstracted out in
the design specification and optimization guidelines.

When someone finds that the compiler deviates from one of these
standards, we just cite chapter and verse of the relevant standard,
and people see there's a bug.

Why should debug information standards be treated any differently?

> It's certainly worrisome that your implementation seems to be
> intrusive to the point of brittleness.

What part of intrusiveness are you concerned about?  The change of
INSN_P such that it covers DEBUG_INSN_P too in the supported range?
Or the few changes that revert to the original INSN_P, in the few
exceptions in which DEBUG_INSN_P is not to be handled as an INSN?

I've heard this "intrusiveness" argument be pointed out so many times,
by so many people that claim to not have been able to keep up with the
thread, and who claim to have not looked at the patches at all, that
I'm more and more convinced it's just fear of the unknown than any
actual rational evaluation of the impact of the changes.

Seriously.  Have a look at the patches and tell me what in them you
regard as intrusive.

We're talking about infrastructure here, needed to fix GCC's
carelessness about maintaining a mapping between source and
implementation concepts that went on for years and years, while
optimizations were added and debug information was degraded.

At some point you have to face reality and see that such information
isn't kept around by magic, it takes some effort, and this effort is
needed at every location where there are changes that might affect
debug information.  And that's pretty much everywhere.  Even if we had
consistent interfaces to make some changes, such as variable renaming,
substitution, etc, this would only cover a small amount of the data a
debug info generator would need: it needs higher-level information
than that, especially in rtl, where transformations, for historical
reasons, are messier than in the tree IL.

So, the approach I've taken is to use the strength of the problem
against itself: take advantage of the fact that optimizers already
know how to perform transformations they need to do in order to keep
things consistent, and represent debug information in a way that, to
them, will look just like any other use, so they will adjust it
likewise.  And then, on top of that, handle the few exceptions, in
which the optimizer needs to do something cleverer, because the
transformation it performs wouldn't work when say there's more than
one use or so.

> Will every new optimization need to think about debug information
> from scratch and refrain from doing certain transformations?

Refraining from doing certain transformations would be wrong.  We
don't want debug information to affect code generation, and we don't
want it to reduce the amount of optimization you can make.  So, you
optimize away, and if you find that you can't keep track of debug
information, you mark stuff as unavailable, or, most likely, the
safety nets in place will do that for you, rather than taking the
current approach, in which we silently corrupt debug information.
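
A toy sketch of that principle follows.  It is not GCC's data
structures: the `struct stmt` representation and both function names are
made up for illustration.  Debug annotations are skipped when counting
uses, so they cannot change an optimization decision, and when a value
is deleted the annotations that referred to it are reset to
"unavailable" rather than left pointing at stale data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy statement: either a real use of a value or a debug annotation.  */
struct stmt {
    bool is_debug;   /* true for debug-only annotations */
    int  uses;       /* id of the value this statement reads */
};

#define VALUE_UNAVAILABLE (-1)

/* Count the uses of `value` that matter for code generation.  */
size_t count_real_uses(const struct stmt *s, size_t n, int value)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].is_debug)
            continue;        /* debug uses must not influence decisions */
        if (s[i].uses == value)
            count++;
    }
    return count;
}

/* Delete `value` if no real statement uses it.  Debug annotations that
   referred to it are marked unavailable.  Returns true if deleted.  */
bool remove_if_dead(struct stmt *s, size_t n, int value)
{
    if (count_real_uses(s, n, value) != 0)
        return false;
    for (size_t i = 0; i < n; i++)
        if (s[i].is_debug && s[i].uses == value)
            s[i].uses = VALUE_UNAVAILABLE;
    return true;
}
```

The early `continue` plays the same role as the IS_DEBUG_STMT check in
the vectorizer patch at the head of this thread: the optimization
proceeds exactly as it would without debug annotations.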

Sure, this might require a little bit more thinking in some
optimizations.  But in my experience fixing up the tree and rtl passes
that needed tweaking, the additional thinking needed is a no-brainer
in most cases; in a few, you have to work a bit harder to keep
information around rather than simply noting it as unavailable.  But
it has never required optimizations to be disabled, and it must not do
so.  In fact, in a few cases, I noticed we were missing trivial
optimizations and fixed them.

> In my simplistic view of this problem, I've always had the idea that
> -O0 -g means "full debugging bliss", -O1 -g means "tolerable
> debugging" (symbols shouldn't disappear, for instance, though they do
> now) and -O2 -g means "you can probably know what line+function you're
> executing".

I've never seen this documented as such, and we've never worked toward
these stated goals.  However, I see that, underlying all of this, we
should be concerned about emitting debug information that is correct,
i.e., never emit information that says the location of FOO is BAR
while it's actually at BAZ.

I've seen many people (including myself, in a distant past) claiming
that imprecise information is better than no information.  I've
learned better.  Debug information consumers are often equipped
with heuristics to fill in common gaps in debug information.

But if the information is there, and wrong, the heuristics that might
very well have worked are disabled in favor of the incorrect
information, and then the whole system (debuggers, monitors, etc,
along with the program) misbehaves.

And then, even when heuristics don't exist and the information is
gone, it's better to tell the user "I don't know how to get you that"
than to hand it something other than it needs (e.g., an incorrect
variable location).

> But you seem to be addressing other problems.  And it even seems to me
> that you want debugging information that is capable of deconstructing
> arbitrary transformations done by the optimizers.

No.  I don't see where this notion came from, but it appears to be
quite widespread.  Omitting certain pieces of debug information is
almost always correct, since most debug info attributes are optional.
But emitting information that doesn't reflect the program is always
incorrect.

So, if you perform an arbitrary transformation that is too hard to
represent in debug information, that's fine, just throw the
information away.  The debug information might become less complete,
and therefore less useful, but at least it won't induce errors
elsewhere.

The parallel I draw is that emitting an optional piece of debug
information is like applying an optional optimization.  If it's
correct, and it's not too expensive, go for it.  But if it's going to
get you the wrong output, it's broken, so don't do it.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-12-15 21:41                                 ` Alexandre Oliva
@ 2007-12-16  3:15                                   ` Daniel Berlin
  2007-12-16 13:09                                     ` Alexandre Oliva
  2007-12-16 21:42                                   ` Mark Mitchell
  1 sibling, 1 reply; 150+ messages in thread
From: Daniel Berlin @ 2007-12-16  3:15 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/15/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec  5, 2007, Diego Novillo <dnovillo@google.com> wrote:
>
> > On 11/25/07 3:43 PM, Mark Mitchell wrote:
>
> >> My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
> >> GCC developer with an interest in improving the compiler in the same way
> >> that you're trying to do) is that you stop writing code and start
> >> writing a paper about what you're trying to do.
> >>
> >> Ignore the implementation.  Describe the problem in detail.  Narrow its
> >> scope if necessary.  Describe the success criteria in detail.  Ideally,
> >> the success criteria are mechanically checkable properties: i.e., given
> >> a C program as input, and optimized code + debug information as output,
> >> it should be possible to algorithmically prove whether the output is
> >> correct.
>
> > Yes, please.  I would very much like to see an abstract design
> > document on what you are trying to accomplish.
>
> Other than the ones I've already posted, here's one:
>
> http://dwarfstd.org/Dwarf3Std.php
>
> Seriously.  There is a standard for this stuff.  My ultimate goal in
> this project is that we comply with it
Comply with it how?

There is no portion of the DWARF3 spec which requires you output
information that is correct or useful. The same way the C standard
does not require you to write correct programs, only valid ones, the
DWARF3 spec does not require you to output correct information, only
information that is encoded properly.

It is certainly a goal of DWARF3 to allow producers to provide correct
info (as witnessed by one of the listed goals: "Debugging
information must provide consumers a way to find the location of
program variables, determine the bounds of dynamic arrays and
strings, and possibly to find the base address of a subroutine's
stack frame or the return address of a subroutine. Furthermore, to
meet the needs of recent computer architectures and optimization
techniques, debugging information must be able to describe the
location of an object whose location changes over the object's
lifetime.").

If you search the entire spec for the word "correct", you will find it
3 times.  If you search for "must", you will discover they all relate
to encoding or the goals of the standard.

It may be entirely useless to output incorrect information, and in
fact, worse than useless.
It is, however, compliant, as long as it is encoded properly.

I have to say, this is typical of the argumentation you have used thus
far in this thread, and honestly, it's not winning you any points.

That said, nobody here believes we should output useless or incorrect
info, even though we could.  A lot of people appear to disagree with
you about the best way to do it, and in fact, about what we should be
trying to provide users in what cases.

>
> What part of intrusiveness are you concerned about?  The change of
> INSN_P such that it covers DEBUG_INSN_P too in the supported range?
> Or the few changes that revert to the original INSN_P, in the few
> exceptions in which DEBUG_INSN_P is not to be handled as an INSN?
>
> I've heard this "intrusiveness" argument be pointed out so many times,
> by so many people that claim to not have been able to keep up with the
> thread, and who claim to have not looked at the patches at all, that
> I'm more and more convinced it's just fear of the unknown than any
> actual rational evaluation of the impact of the changes.

Well, no.
You yourself have shown it to be intrusive in the extreme, in the
very next paragraphs!

"
At some point you have to face reality and see that such information
isn't kept around by magic, it takes some effort, and this effort is
needed at every location where there are changes that might affect
debug information.  And that's pretty much everywhere. "

So, everywhere needs to change. That's pretty intrusive, no?

"Sure, this might require a little bit more thinking in some
optimizations.  But in my experience fixing up the tree and rtl passes
that needed tweaking, the additional thinking needed is a no-brainer
in most cases; in a few, you have to work a bit harder to keep
information around rather than simply noting it as unavailable. "

Having to stop and think at every point in an optimization about the
debug info, having to deal with debug info at every single point of
change, and then your other patches:
this is intrusiveness as well (having to stop and think about debug
info at every single point of every single optimization).

You don't need to be this intrusive to stop outputting the
incorrect info we do.

>I've never seen this documented as such, and we've never worked toward
> these stated goals.

Who is we?
I certainly have worked exactly towards these goals.
As have almost all the authors of the current debugging info
framework.  The reason it is the way it is is because these, in fact,
*were exactly the goals we were working towards*.
As for not documented, a lot of gcc is not documented.
If you look in the mailing list archives, you will even discover Diego
is not the first one to have exactly the viewpoint about what should and
should not be debuggable, and that the community has consistently
worked towards exactly the viewpoint Diego describes.

Anyway, I give up on reading this thread.  It has turned into a mess.
You really need to step back and see that you have not achieved any
sort of consensus of what levels of optimization should be how
debuggable, before you start telling everyone their approach isn't as
good as yours.

I certainly wouldn't agree that we should take such intrusive steps to
make -O2 -g as debuggable as you want,  I'd much rather see us do what
we can easily, and drop any info that ends up being incorrect.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-16  3:15                                   ` Daniel Berlin
@ 2007-12-16 13:09                                     ` Alexandre Oliva
  2007-12-17  1:27                                       ` Daniel Berlin
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-16 13:09 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 16, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> There is no portion of the DWARF3 spec which requires you output
> information that is correct or useful. The same way the C standard
> does not require you to write correct programs, only valid ones, the
> DWARF3 spec does not require you to output correct information, only
> information that is encoded properly.

But if a C compiler translated programs to garbage, that would be
wrong.  By the same reasoning, if a Dwarf producer created garbage,
that would be wrong.

It's true that most of Dwarf 3 attributes are optional.  But when it
says "if you output this attribute, its operand must be such and
such", if you output the attribute with operands that don't match the
specification, that's a bug.

> It is certainly a goal of DWARF3 to allow producers to provide correct
> info

Exactly.  And where's the permission to provide incorrect info, rather
than merely leaving it out?

>> I've heard this "intrusiveness" argument be pointed out so many times,
>> by so many people that claim to not have been able to keep up with the
>> thread, and who claim to have not looked at the patches at all, that
>> I'm more and more convinced it's just fear of the unknown than any
>> actual rational evaluation of the impact of the changes.

> Well, no.
> You yourself have shown it to be intrusive in the extreme, in the
> very next paragraphs!

> "
> At some point you have to face reality and see that such information
> isn't kept around by magic, it takes some effort, and this effort is
> needed at every location where there are changes that might affect
> debug information.  And that's pretty much everywhere. "

> So, everywhere needs to change. That's pretty intrusive, no?

No.  Looks like selective attention, because you're reasoning out the
part in which I discussed using the strength of the optimizers against
the problem, by letting them do what they are already used to on the
debug information too.

If we add a new RTL code or a new TREE code, is that intrusive because
now every optimization pass will deal with the new node types in very
much the same way they've dealt with other similar node types forever?
Of course not.

And if we have to add a few exceptions here and there to deal with the
specifics of this new node type, does that become too intrusive then?
I don't think so.

Then what's the fuss about the new node types?  Do you want to count
the number of places in which INSN_P remains there, lexically
unchanged, and compare with the number of places in which I've added a
!DEBUG_INSN_P after it?
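
To make the shape of that change concrete, here is a minimal toy sketch in
C.  It does not use GCC's real RTL definitions: the toy_* names, the
two-field insn struct and the sample stream are all assumptions made up for
illustration.  It only shows the idea that a widened INSN_P-style predicate
lets most walks stay unchanged, while a few walks add an explicit
!DEBUG_INSN_P-style test.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; hypothetical, not GCC's definitions. */
enum toy_code { TOY_INSN, TOY_JUMP_INSN, TOY_DEBUG_INSN, TOY_NOTE };

struct toy_insn {
  enum toy_code code;
  int uses_reg;               /* register number this insn reads */
};

/* Widened predicate: debug insns count as insns by default. */
#define TOY_DEBUG_INSN_P(X) ((X)->code == TOY_DEBUG_INSN)
#define TOY_INSN_P(X)       ((X)->code != TOY_NOTE)

/* A walk that needs no change: it treats debug insns like any other. */
int count_insns(const struct toy_insn *v, size_t n)
{
  int count = 0;
  assert(v != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (TOY_INSN_P(&v[i]))
      count++;
  return count;
}

/* One of the few exceptions: uses in debug insns must not be counted,
   so that decisions based on the count match a build without them. */
int count_real_uses(const struct toy_insn *v, size_t n, int reg)
{
  int count = 0;
  assert(v != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (!TOY_INSN_P(&v[i]) || TOY_DEBUG_INSN_P(&v[i]))
      continue;
    if (v[i].uses_reg == reg)
      count++;
  }
  return count;
}

/* Sample stream: one real use of register 1 and one debug-only use. */
const struct toy_insn toy_stream[] = {
  { TOY_INSN, 1 },
  { TOY_DEBUG_INSN, 1 },
  { TOY_NOTE, 0 },
  { TOY_JUMP_INSN, 2 },
};
```

In this toy, count_insns sees three insns in toy_stream, while
count_real_uses reports a single use of register 1, the same answer it
would give if the debug insn were absent.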

> Having to stop and think at every point in an optimization about the
> debug info,

Well, sorry, writing compilers is hard.  You have to think about
several things at the same time.  Shall we just go shopping instead?

I'm trying to make it as simple as possible.  The fact that nearly
100% of the code is unchanged seems to indicate to me that it's not
such a bad approach, but if you want something that just magically
works, you're in for much disappointment.

> (having to stop and think about debug info at every single point of
> every single optimization).

Information doesn't come out of thin air, and thin air doesn't
maintain information accurate just because we wish it does.  We have
to work to create and update the information throughout compilation,
at every transformation, and my reasoning is precisely that optimizers
already do this all the time, so why not use them for what we need?

> You don't need to be this intrusive to stop outputting the
> incorrect info we do.

What do you have to back your statement up?

Let me help you: sure we don't.  We can just refrain from outputting
any debug information whatsoever.  Then, it will be compliant with the
standard.  But it won't be useful.

>> I've never seen this documented as such, and we've never worked toward
>> these stated goals.

> Who is we?
> I certainly have worked exactly towards these goals.
> As have almost all the authors of the current debugging info
> framework.

Oh, wow, I guess I just wasn't welcome into the club, because I didn't
get the guidelines book.  How unfortunate, now I have to give up my
plan of doing better and abide by the unpublished and undocumented
goals of some small cabal.  Or do I?

> If you look in the mailing list archives, you will even discover Diego
> is not the first one to have exactly the viewpoint about what should and
> should not be debuggable, and that the community has consistently
> worked towards exactly the viewpoint Diego describes.

I've seen several different viewpoints from "the community".

> Anyway, I give up on reading this thread.  It has turned into a mess.
> You really need to step back

Oh, do I?  Why is that?

> and see that you have not achieved any sort of consensus of what
> levels of optimization should be how debuggable,

Why would I expect to get any consensus on that?  I haven't even
tried, and I won't.  This is not what the issue is about.  The issue
is about not emitting incorrect information.  Better debuggability for
all levels of optimization will be a side effect of achieving that,
and it will be achievable incrementally once we have an actual
framework that enables us to take steps in this direction without
introducing further regressions.

> I certainly wouldn't agree that we should take such intrusive steps to
> make -O2 -g as debuggable as you want,

It is obvious that you misunderstood what I want, and how intrusive
the approach is.

> I'd much rather see us do what we can easily, and drop any info that
> ends up being incorrect.

So what's your plan to find out what's incorrect?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-16 13:09                                     ` Alexandre Oliva
@ 2007-12-17  1:27                                       ` Daniel Berlin
  2007-12-17  4:20                                         ` Joe Buck
  2007-12-17 17:59                                         ` Alexandre Oliva
  0 siblings, 2 replies; 150+ messages in thread
From: Daniel Berlin @ 2007-12-17  1:27 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

> It is obvious that you misunderstood what I want, and how intrusive
> the approach is.
>

Yes Alexandre, everyone who disagrees with you must not understand!
That's really the problem here.
None of us understand but you.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-17  1:27                                       ` Daniel Berlin
@ 2007-12-17  4:20                                         ` Joe Buck
  2007-12-17  8:13                                           ` Geert Bosch
  2007-12-17 18:36                                           ` Alexandre Oliva
  2007-12-17 17:59                                         ` Alexandre Oliva
  1 sibling, 2 replies; 150+ messages in thread
From: Joe Buck @ 2007-12-17  4:20 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Sun, Dec 16, 2007 at 08:12:07PM -0500, Daniel Berlin wrote:
> > It is obvious that you misunderstood what I want, and how intrusive
> > the approach is.
> >
> 
> Yes Alexandre, everyone who disagrees with you must not understand!
> That's really the problem here.
> None of us understand but you.

I have some sympathy for going in Alexandre's direction, in that it
would be nice to have a mode that provided optimization as well as
accurate debugging.  However, since preserving accurate debug information
has a cost, I think it would be better to turn -O1, not -O2, into the
mode that Alexandre wants, where debug information is preserved.  Trying
to rework all optimizations to keep perfect debug information is going
to take forever and make the compiler worse.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-17  4:20                                         ` Joe Buck
@ 2007-12-17  8:13                                           ` Geert Bosch
  2007-12-18  1:24                                             ` Alexandre Oliva
  2007-12-17 18:36                                           ` Alexandre Oliva
  1 sibling, 1 reply; 150+ messages in thread
From: Geert Bosch @ 2007-12-17  8:13 UTC (permalink / raw)
  To: Joe Buck
  Cc: Daniel Berlin, Alexandre Oliva, Diego Novillo, Mark Mitchell,
	Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 16, 2007, at 20:27, Joe Buck wrote:
> I have some sympathy for going in Alexandre's direction, in that it
> would be nice to have a mode that provided optimization as well as
> accurate debugging.  However, since preserving accurate debug information
> has a cost, I think it would be better to turn -O1, not -O2, into the
> mode that Alexandre wants, where debug information is preserved.  Trying
> to rework all optimizations to keep perfect debug information is going
> to take forever and make the compiler worse.

Right, at the moment -O1 is far too much like -O2.
There is room for an optimization mode that is mostly local,
scales well for large programs and allows for high-quality debug
information. Fortunately, these goals all seem to match.

We could conceptually have inspection points between each source
statement and declaration, which would roughly correspond to a
use of all memory and all source variables, whether in memory or
in registers.
These inspection points would be considered potentially trapping.

This approach would still allow some scheduling. For example, loads
and arithmetic operations that are known not to trap could still
be done early. On the other hand, when breaking at any statement,
all variables can be printed.

Also, since no user-visible state can be modified by speculatively
executed instructions such as loads, such instructions should not
be tagged with their original source location information.
This would prevent the very annoying and unhelpful jumping around
the program during debugging.

The method I describe here, which roughly corresponds to the semantics
of Ada's "pragma Inspection_Point", seems relatively easy to implement
using an empty "asm" or similar.
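
As a rough illustration of the empty "asm" idea, here is a minimal sketch
in C using GCC-style extended asm.  The INSPECTION_POINT macro name and the
sum_squares function are hypothetical; the sketch only assumes that an
empty asm taking a variable as an input operand, with a "memory" clobber,
acts as a use of that variable and of memory at that point.

```c
#include <assert.h>

/* Hypothetical macro: an empty asm that "reads" x and is assumed to
   touch all memory, so x must hold its source-level value here. */
#define INSPECTION_POINT(x) \
  __asm__ __volatile__ ("" : : "g" (x) : "memory")

/* Sum of i*i for i in 1..n, with an inspection point after each
   source statement that updates `total`. */
int sum_squares(int n)
{
  int total = 0;
  assert(n >= 0);
  for (int i = 1; i <= n; i++) {
    total += i * i;
    INSPECTION_POINT(total);   /* total is observable at this point */
  }
  return total;
}
```

The asm emits no instructions, but it does constrain the optimizer, so
code generated with such points may differ from code generated without
them.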

   -Geert

PS. For convenience, I'm including a snippet of the Ada 2005 standard,
     the full version of which is freely available on the web.

H.3.2 Pragma Inspection_Point

1     An occurrence of a pragma Inspection_Point identifies a set of objects
each of whose values is to be available at the point(s) during program
execution corresponding to the position of the pragma in the compilation unit.
The purpose of such a pragma is to facilitate code validation.

                                    Syntax

2     The form of a pragma Inspection_Point is as follows:

3       pragma Inspection_Point[(object_name {, object_name})];

                                Legality Rules

4     A pragma Inspection_Point is allowed wherever a declarative_item or
statement is allowed. Each object_name shall statically denote the declaration
of an object.

                               Static Semantics

5/2   An inspection point is a point in the object code corresponding to the
occurrence of a pragma Inspection_Point in the compilation unit. An object is
inspectable at an inspection point if the corresponding pragma
Inspection_Point either has an argument denoting that object, or has no
arguments and the declaration of the object is visible at the inspection
point.

                               Dynamic Semantics

6     Execution of a pragma Inspection_Point has no effect.

                          Implementation Requirements

7     Reaching an inspection point is an external interaction with respect to
the values of the inspectable objects at that point (see 1.1.3).

                          Documentation Requirements

8     For each inspection point, the implementation shall identify a mapping
between each inspectable object and the machine resources (such as memory
locations or registers) from which the object's value can be obtained.

       NOTES

9/2   7  The implementation is not allowed to perform "dead store
       elimination" on the last assignment to a variable prior to a point
       where the variable is inspectable. Thus an inspection point has the
       effect of an implicit read of each of its inspectable objects.

10    8  Inspection points are useful in maintaining a correspondence between
       the state of the program in source code terms, and the machine state
       during the program's execution. Assertions about the values of program
       objects can be tested in machine terms at inspection points. Object
       code between inspection points can be processed by automated tools to
       verify programs mechanically.

11    9  The identification of the mapping from source program objects to
       machine resources is allowed to be in the form of an annotated object
       listing, in human-readable or tool-processable form.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-17  8:13                                           ` Geert Bosch
@ 2007-12-18  1:24                                             ` Alexandre Oliva
  2007-12-18  1:29                                               ` Joe Buck
  2007-12-18  7:35                                               ` Robert Dewar
  0 siblings, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18  1:24 UTC (permalink / raw)
  To: Geert Bosch
  Cc: Joe Buck, Daniel Berlin, Diego Novillo, Mark Mitchell,
	Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 17, 2007, Geert Bosch <bosch@adacore.com> wrote:

> We could conceptually have inspection points between each source
> statement and declaration, which would roughly correspond to a
> use of all memory and all source variables, whether in memory or
> in registers.
> These inspection points would be considered potentially trapping.

Yes, I've considered something along these lines, but decided against
it, for we can't afford for debug information to affect executable
code generation in any way whatsoever, and we don't want to pessimize
optimized code when compiling without -g just so that compiling with
-g would get us the same code.

> Also, since no user-visible state can be modified by speculatively
> executed instructions such as loads, such instructions should not
> be tagged with their original source location information.

Line number information has a well-defined meaning: it ought to
represent the source code line that best represents the source-code
construct that ended up implemented using that instruction.

To address what we have in mind, there's an additional annotation on
top of line number information: the is_stmt flag.  This is what we
should use to tell debuggers what the best instruction is to set a
breakpoint at a certain line number or so, and for debuggers to be
able to step line by line more seamlessly.
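
A minimal sketch of how a consumer might use that flag, in C.  The
line_row struct is a simplified assumption (a real Dwarf line table row
carries more state), and breakpoint_address and example_rows are
hypothetical names invented here.  The sketch only shows that an
instruction can keep its line number while not being offered as a stopping
point for that line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified line table row; hypothetical, for illustration only. */
struct line_row {
  unsigned long address;
  unsigned line;
  bool is_stmt;      /* recommended breakpoint location for `line`? */
};

/* Pick the lowest address among rows for `line` that are flagged
   is_stmt.  Returns 0 when the line has no such row. */
unsigned long breakpoint_address(const struct line_row *rows, size_t n,
                                 unsigned line)
{
  unsigned long best = 0;
  assert(rows != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (rows[i].line != line || !rows[i].is_stmt)
      continue;
    if (best == 0 || rows[i].address < best)
      best = rows[i].address;
  }
  return best;
}

/* The load at 0x1004 was scheduled early: it still belongs to line 12,
   but it is not a good place to stop for line 12. */
const struct line_row example_rows[] = {
  { 0x1000, 10, true  },
  { 0x1004, 12, false },
  { 0x1008, 11, true  },
  { 0x100c, 12, true  },
};
```

With these rows, a breakpoint on line 12 lands at 0x100c rather than at
the early load, and stepping from line 10 reaches line 11 without first
bouncing to line 12.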

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  1:24                                             ` Alexandre Oliva
@ 2007-12-18  1:29                                               ` Joe Buck
  2007-12-18  4:40                                                 ` Alexandre Oliva
  2007-12-18  7:35                                               ` Robert Dewar
  1 sibling, 1 reply; 150+ messages in thread
From: Joe Buck @ 2007-12-18  1:29 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Geert Bosch, Daniel Berlin, Diego Novillo, Mark Mitchell,
	Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Mon, Dec 17, 2007 at 11:11:46PM -0200, Alexandre Oliva wrote:
> Line number information has a well-defined meaning: it ought to
> represent the source code line that best represents the source-code
> construct that ended up implemented using that instruction.

You implicitly assume that such a source code line exists.
Consider something like

int func(bool cond, int a, int b, int c)
{
  int out;
  if (cond)
    out = a + b;
  else
    out = a + b + c;
  return out;
}

The optimizer might produce something that structurally resembles

  out = a + b;
  if (!cond)
    out += c;
  return out;

If you set a breakpoint on the addition of a and b, it will trigger
regardless of the value of cond.  Furthermore, there isn't a place
to put a breakpoint that will trigger only for the case where cond
is true, as you can on unoptimized code.  So you need to choose
between natural debugging and optimization.
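
For reference, the two shapes can be written out side by side as a small
self-contained C sketch.  The name func_hoisted and the check_equivalent
helper are added here for illustration; the transformed body is the
hand-written structural equivalent given above, not actual compiler
output.

```c
#include <assert.h>
#include <stdbool.h>

/* The function as written in the source. */
int func(bool cond, int a, int b, int c)
{
  int out;
  if (cond)
    out = a + b;
  else
    out = a + b + c;
  return out;
}

/* Hand-written equivalent of what the optimizer might produce:
   a + b is computed once, on both paths. */
int func_hoisted(bool cond, int a, int b, int c)
{
  int out = a + b;     /* executes regardless of cond */
  if (!cond)
    out += c;
  return out;
}

/* Both versions return the same value for either value of cond. */
void check_equivalent(int a, int b, int c)
{
  assert(func(true, a, b, c) == func_hoisted(true, a, b, c));
  assert(func(false, a, b, c) == func_hoisted(false, a, b, c));
}
```

In func_hoisted there is no instruction that runs only when cond is true,
which is exactly why a breakpoint specific to that branch has nowhere to
go.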

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  1:29                                               ` Joe Buck
@ 2007-12-18  4:40                                                 ` Alexandre Oliva
  2007-12-18  7:42                                                   ` Robert Dewar
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18  4:40 UTC (permalink / raw)
  To: Joe Buck
  Cc: Geert Bosch, Daniel Berlin, Diego Novillo, Mark Mitchell,
	Robert Dewar, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 17, 2007, Joe Buck <Joe.Buck@synopsys.COM> wrote:

> On Mon, Dec 17, 2007 at 11:11:46PM -0200, Alexandre Oliva wrote:
>> Line number information has a well-defined meaning: it ought to
>> represent the source code line that best represents the source-code
>> construct that ended up implemented using that instruction.

> You implicitly assume that such a source code line exists.

Actually, no.  I'm not sure where you got that impression, and how you
came to the conclusion that I'd assign line numbers the way you have.
To me, when you hoist something that is present in both blocks of a
conditional, it probably makes more sense to give it the line number
of the conditional, rather than that of either block.  But I won't
pretend to have thought very hard about this particular issue.  For
the time being, I'm focusing my efforts on local variable locations.

Anyhow, very clearly you don't want to mark such hoisted-out
computation as is_stmt.  This should eliminate at least the solvable
problem you're worried about.

>   out = a + b;
>   if (!cond)
>     out += c;
>   return out;

> Furthermore, there isn't a place to put a breakpoint that will
> trigger only for the case where cond is true, as you can on
> unoptimized code.

Yep.  Sometimes code just is optimized away.  Can't stop that without
harming optimizations.

If dwarf line number programs were smarter, we could perhaps encode
multiple lines for the same instruction, along with conditions to tell
when the instruction applies to such or such lines, and even more
fancy stuff like that.  But line number programs don't let us express
this in Dwarf3.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  4:40                                                 ` Alexandre Oliva
@ 2007-12-18  7:42                                                   ` Robert Dewar
  2007-12-18  8:09                                                     ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Robert Dewar @ 2007-12-18  7:42 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Joe Buck, Geert Bosch, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

Alexandre Oliva wrote:

> Yep.  Sometimes code just is optimized away.  Can't stop that without
> harming optimizations.

OK, so you are agreeing that good debuggability is impossible
with all the optimizations in place, so once again, let's have
an optimization level that optimizes as far as possible without
harming debuggability.
> 
> If dwarf line number programs were smarter, we could perhaps encode
> multiple lines for the same instruction, along with conditions to tell
> when the instruction applies to such or such lines, and even more
> fancy stuff like that.  But line number programs don't let us express
> this in Dwarf3.

So, that's not an option.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  7:42                                                   ` Robert Dewar
@ 2007-12-18  8:09                                                     ` Alexandre Oliva
  2007-12-18 14:01                                                       ` Robert Dewar
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18  8:09 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Joe Buck, Geert Bosch, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:
>> Yep.  Sometimes code just is optimized away.  Can't stop that without
>> harming optimizations.

> OK, so you are agreeing that good debuggability is impossible
> with all the optimizations in place, so once again, let's have
> an optimization level that optimizes as far as possible without
> harming debuggability.

I don't oppose such an optimization level, even though I don't know
that we agree on what "good debuggability" stands for.

It's just that changing optimizations is precisely *against* the goals
of my current project.  So, don't expect significant efforts to this
end from me at this time.

>> If dwarf line number programs were smarter, we could perhaps encode
>> multiple lines for the same instruction, along with conditions to tell
>> when the instruction applies to such or such lines, and even more
>> fancy stuff like that.  But line number programs don't let us express
>> this in Dwarf3.

> So, that's not an option.

Yup.  Best we can do right now is to emit the condition line number.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  8:09                                                     ` Alexandre Oliva
@ 2007-12-18 14:01                                                       ` Robert Dewar
  2007-12-18 21:20                                                         ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Robert Dewar @ 2007-12-18 14:01 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Joe Buck, Geert Bosch, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

Alexandre Oliva wrote:
> On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

>> OK, so you are agreeing that good debuggability is impossible
>> with all the optimizations in place, so once again, let's have
>> an optimization level that optimizes as far as possible without
>> harming debuggability.
> 
> I don't oppose such an optimization level, even though I don't know
> that we agree on what "good debuggability" stands for.

My definition is that it should be indistinguishable from -O0
except that I could live without being able to modify variables.
> 
> It's just that changing optimizations is precisely *against* the goals
> of my current project.  So, don't expect significant efforts to this
> end from me at this time.

But you can't achieve the above criterion with your approach.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 14:01                                                       ` Robert Dewar
@ 2007-12-18 21:20                                                         ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18 21:20 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Joe Buck, Geert Bosch, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:
>> On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

>>> OK, so you are agreeing that good debuggability is impossible
>>> with all the optimizations in place, so once again, let's have
>>> an optimization level that optimizes as far as possible without
>>> harming debuggability.

>> It's just that changing optimizations is precisely *against* the goals
>> of my current project.  So, don't expect significant efforts to this
>> end from me at this time.

> But you can't achieve the above criterion with your approach.

Actually, you can.  My approach is about ensuring the mapping between
the location of source and implementation variables is correct.  This
is orthogonal to how much optimization you make.

If you optimize more, more values or locations may become unavailable,
but this is not about correctness (what fraction of the annotations
point at locations that hold the correct value), and it's not even
about completeness (what fraction of the source variables are
represented at all locations they are available), it's just about
theoretical completeness (what fraction of the source variables are
represented at all locations they would be available without
optimization).
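
The three measures can be spelled out as simple ratios.  The sketch below
is one possible reading in C, with made-up counts: the struct, the function
names and the numbers are assumptions for illustration, and for simplicity
it treats "represented" as "annotated with a correct location".

```c
#include <assert.h>

/* Counts of (variable, program point) pairs; all values hypothetical. */
struct location_counts {
  unsigned annotated;             /* pairs with a location annotation */
  unsigned annotated_correct;     /* of those, pairs holding the right value */
  unsigned available_optimized;   /* pairs where the value exists in the
                                     optimized code */
  unsigned available_unoptimized; /* pairs where it would exist without
                                     optimization */
};

/* Fraction of emitted annotations that point at the correct value. */
double correctness(const struct location_counts *c)
{
  assert(c->annotated > 0);
  return (double) c->annotated_correct / c->annotated;
}

/* Fraction of actually available pairs that are represented. */
double completeness(const struct location_counts *c)
{
  assert(c->available_optimized > 0);
  return (double) c->annotated_correct / c->available_optimized;
}

/* Fraction of pairs available without optimization that are represented. */
double theoretical_completeness(const struct location_counts *c)
{
  assert(c->available_unoptimized > 0);
  return (double) c->annotated_correct / c->available_unoptimized;
}

/* Example: 6 annotations, all correct; 8 pairs available after
   optimization; 12 pairs available without optimization. */
const struct location_counts example_counts = { 6, 6, 8, 12 };
```

With these numbers, correctness is 1.0, completeness is 0.75 and
theoretical completeness is 0.5.  Optimizing harder shrinks
available_optimized, which lowers only the last ratio.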

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  1:24                                             ` Alexandre Oliva
  2007-12-18  1:29                                               ` Joe Buck
@ 2007-12-18  7:35                                               ` Robert Dewar
  2007-12-18  8:34                                                 ` Alexandre Oliva
  1 sibling, 1 reply; 150+ messages in thread
From: Robert Dewar @ 2007-12-18  7:35 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Geert Bosch, Joe Buck, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

Alexandre Oliva wrote:

> Yes, I've considered something along these lines, but decided against
> it, for we can't afford for debug information to affect executable
> code generation in any way whatsoever, and we don't want to pessimize
> optimized code when compiling without -g just so that compiling with
> -g would get us the same code.

I disagree, I think it would be fine to degrade -O1 slightly to achieve
full debuggability, and of course -g cannot affect the generated code.
If indeed

a) it is possible to get perfect debuggability without any pessimization
b) that includes unexpected jumping around
c) everyone agrees on how to achieve a) and b)
d) this is implemented

then fine, but in the absence of these conditions, if we need to
pessimize -O1 code slightly to achieve this, that's OK by me. If
it really worries people, introduce a -Og that achieves this. In
my experience people use -O1 not because they are very performance
sensitive (those folk use -O2), but because -O0 is so horrible,
that they need something better than that for production delivery.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  7:35                                               ` Robert Dewar
@ 2007-12-18  8:34                                                 ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18  8:34 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Geert Bosch, Joe Buck, Daniel Berlin, Diego Novillo,
	Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches,
	gcc

On Dec 18, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:
>> Yes, I've considered something along these lines, but decided against
>> it, for we can't afford for debug information to affect executable
>> code generation in any way whatsoever, and we don't want to pessimize
>> optimized code when compiling without -g just so that compiling with
>> -g would get us the same code.

> I disagree, I think it would be fine to degrade -O1 slightly to achieve
> full debuggability,

Sure.  But this is just not relevant to my project of getting GCC to
emit correct (and, ideally, as complete as possible) variable location
information, no matter what the optimization level.

My goal is not so much about aiming at a perfect debugging experience,
but rather at making sure that what the compiler encodes in debug
information actually reflects the code it produced.

This will surely benefit a future full debuggability project, of
course.  But, as much as I see value in perfect debuggability at some
new optimization level, my current task is to get correct and more
complete variable location information at vanilla-build optimization
levels, i.e., at -O2 -g.

It is possible to do much better than what we do now, and it appears
to me that it's even possible to do much better than my current plan.
But I need to get this task wrapped up before I can spend further time
figuring out how to make it even better.

In either case, it probably won't be like -O0, for optimizations are
performed that make it impossible, and I'm not supposed to sacrifice
them for the sake of better debug information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-12-17  4:20                                         ` Joe Buck
  2007-12-17  8:13                                           ` Geert Bosch
@ 2007-12-17 18:36                                           ` Alexandre Oliva
  1 sibling, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-17 18:36 UTC (permalink / raw)
  To: Joe Buck
  Cc: Daniel Berlin, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Dec 16, 2007, Joe Buck <Joe.Buck@synopsys.COM> wrote:

> However, since preserving accurate debug information
> has a cost, I think it would be better to turn -O1, not -O2, into the
> mode that Alexandre wants, where debug information is preserved.

In terms of memory, that's true, it does have a cost, for we have to
keep more information around.  That's one of the reasons why I'm
implementing this all under the control of a command-line option: you
can selectively enable or disable it, regardless of the level of
optimization.  If we want to make it default for -O1, but not for -O2,
sure, that works.

But this won't make much of a difference in terms of code change.
Except for the fact that we could simply leave alone the passes that
are only executed at -O2 or higher (which is not worth it, given that
I've already done the small work needed for them to keep debug info
accurate), most of the passes will still keep the information
accurate, nearly all of them without any code changes whatsoever.

So, doing this only for -O1 seems like a waste, given that -O2 is the
most common optimization level, and it's most often accompanied by -g.

> Trying to rework all optimizations to keep perfect debug information
> is going to take forever and make the compiler worse.

This statement is easy to make and to believe, but my approach is
proving it false, given a design that took this concern into account.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-12-17  1:27                                       ` Daniel Berlin
  2007-12-17  4:20                                         ` Joe Buck
@ 2007-12-17 17:59                                         ` Alexandre Oliva
  2007-12-17 18:02                                           ` Diego Novillo
  1 sibling, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-17 17:59 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 16, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

>> It is obvious that you misunderstood what I want, and how intrusive
>> the approach is.

> Yes Alexandre, everyone who disagrees with you must not understand!

My conclusion is not based on disagreement, but rather on the faulty
arguments presented during the discussion.

For example, when you took the argument that every transformation had
effects on debug information, and used that to conclude that every
transformation would need difficult changes to generate correct debug
information, you left out from your reasoning a major strength of the
design, that I had mentioned in the e-mail you responded to: that the
optimizers already perform the transformations we need to keep debug
information accurate.

So, by missing or misunderstanding an essential part of the thought
process that went into the design, you came to a false conclusion
about it.

> That's really the problem here.
> None of us understand but you.

I guess I'm to blame, for having naïvely put the code out without as
much as a design and goals document, such that people started looking
at it without actually understanding what it was about, and at the
same time drawing conclusions about it based on hunches rather than on
solid logical grounds.

At this point, we have a scenario in which people have already jumped
to their conclusions, and whatever I say requires a much higher
threshold to be listened to and accepted.  It's quite unfortunate that
psychological factors take such a large role in the making of
technical decisions, and I naïvely assumed this wouldn't raise so much
rejection, for being such a simple and well thought-out design.  Oh,
well...  Something to avoid next time...

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-12-17 17:59                                         ` Alexandre Oliva
@ 2007-12-17 18:02                                           ` Diego Novillo
  2007-12-17 20:34                                             ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Diego Novillo @ 2007-12-17 18:02 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/17/07 12:51, Alexandre Oliva wrote:

> I guess I'm to blame, for having naïvely put the code out without as
> much as a design and goals document

Yes, you are.

You need to provide such a document now.  I can't see how you'll be able 
to incorporate your implementation without a convincing design.

The barrier is probably going to be higher.  You raised too much 
controversy, so I have my doubts about your simplicity claims.


Diego.

* Re: Designs for better debug info in GCC
  2007-12-17 18:02                                           ` Diego Novillo
@ 2007-12-17 20:34                                             ` Alexandre Oliva
  2007-12-17 20:45                                               ` Diego Novillo
  2007-12-31 15:40                                               ` Richard Guenther
  0 siblings, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-17 20:34 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 12/17/07 12:51, Alexandre Oliva wrote:
>> I guess I'm to blame, for having naïvely put the code out without as
>> much as a design and goals document

> Yes, you are.

Wow, thanks.  At least we agree on something! ;-)

> You need to provide such a document now.

Can't I instead provide it when it's ready?

You know, it wasn't me who asked to have the thing developed in the
open.  I didn't push it out just so that people who didn't want to
understand it could beat on it before it was ready to defend itself.
I put it out because there was an offer for contribution.

> I can't see how you'll be able to incorporate your implementation
> without a convincing design.

Agreed, I don't see how this would be doable for any but the most
trivial patches.

> The barrier is probably going to be higher.
> You raised too much controversy, so I have my doubts about your
> simplicity claims.

Oh, nice!  *I* raised too much controversy.  So people first ask me to
put the code out such that they can peek at it and help, then most
refrain from peeking at it because it's not ready and some who do
raise some concerns that are not reflected by the code, and then
everyone doubts I've taken those concerns into account and demand a
design document that will no more than just repeat the information
that's already out there but that people fail to take into account.

And then, this is a technical discussion, so historical controversy
shouldn't play any role in it, if people were rational about it.

Now, can you please explain to me how the efforts of repeating myself
one more time, rather than completing the implementation, are going to
make it any more likely that people who have already made up their
minds based on groundless fears will be convinced?

If you really think it would be worth it, can you point out at what
you feel to be missing in the consolidated documentation I posted
upthread, in response to your request?  I'd be happy to fill in the
blanks, if you're willing to listen.  But I wouldn't be happy to waste
more time.

(This is not to say that the document won't ever be produced; it's to
say that I'm not to work on it right now.  I have other deliverables ahead
of it.)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-12-17 20:34                                             ` Alexandre Oliva
@ 2007-12-17 20:45                                               ` Diego Novillo
  2007-12-18  1:02                                                 ` Alexandre Oliva
  2007-12-31 15:40                                               ` Richard Guenther
  1 sibling, 1 reply; 150+ messages in thread
From: Diego Novillo @ 2007-12-17 20:45 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/17/07 15:28, Alexandre Oliva wrote:

>> You need to provide such a document now.
> 
> Can't I instead provide it when it's ready?

Of course.


Diego.

* Re: Designs for better debug info in GCC
  2007-12-17 20:45                                               ` Diego Novillo
@ 2007-12-18  1:02                                                 ` Alexandre Oliva
  2007-12-18  1:14                                                   ` Diego Novillo
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18  1:02 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 12/17/07 15:28, Alexandre Oliva wrote:
>>> You need to provide such a document now.
>> 
>> Can't I instead provide it when it's ready?

> Of course.

Thanks,

Now, since you're so interested in it and you've already read the
various perspectives on the issue that I listed in my yesterday's
e-mail to you, would you help me improve this document, by letting me
know what you believe to be missing from the selected postings on
design strategies, rationales and goals:

http://gcc.gnu.org/ml/gcc/2007-11/msg00229.html (goals)
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00160.html (initial plan)
http://gcc.gnu.org/ml/gcc/2007-11/msg00261.html (detailed plan)
http://gcc.gnu.org/ml/gcc/2007-11/msg00317.html (example)
http://gcc.gnu.org/ml/gcc/2007-11/msg00590.html (more example)
http://gcc.gnu.org/ml/gcc/2007-11/msg00176.html (design rationale)
http://gcc.gnu.org/ml/gcc/2007-11/msg00177.html (clarification)

I could then focus on these missing aspects too, in addition to the
ones I already have, while designing the best form to present the
ideas.

Thanks in advance,

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-12-18  1:02                                                 ` Alexandre Oliva
@ 2007-12-18  1:14                                                   ` Diego Novillo
  2007-12-18  5:21                                                     ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Diego Novillo @ 2007-12-18  1:14 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/17/07 19:50, Alexandre Oliva wrote:

> Now, since you're so interested in it and you've already read the
> various perspectives on the issue that I listed in my yesterday's
> e-mail to you, would you help me improve this document, by letting me
> know what you believe to be missing from the selected postings on
> design strategies, rationales and goals:

No.  I am not interested in organizing your thoughts for you.

I am interested in reading a single, concise and well organized design 
document that you produce for all of us to understand what you want to do.

Take your time.  It doesn't need to be now.


Diego.

* Re: Designs for better debug info in GCC
  2007-12-18  1:14                                                   ` Diego Novillo
@ 2007-12-18  5:21                                                     ` Alexandre Oliva
  2007-12-18  9:10                                                       ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18  5:21 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 12/17/07 19:50, Alexandre Oliva wrote:
>> Now, since you're so interested in it and you've already read the
>> various perspectives on the issue that I listed in my yesterday's
>> e-mail to you, would you help me improve this document, by letting me
>> know what you believe to be missing from the selected postings on
>> design strategies, rationales and goals:

> No.  I am not interested in organizing your thoughts for you.

Wow, nice shot!

So tell me, what part of what you've read in the selected bibliography
seemed not organized for you?  Maybe that's what I have to work on
first.

> I am interested in reading a single, concise and well organized design
> document that you produce for all of us to understand what you want to
> do.

You got that already, except now I'm no longer sure you've actually
read it.  Have you?

You got the goals.  You got the way I intend to get there, in two
levels of detail.  You got examples that show why the goals can't be
achieved in other simpler ways.  You got various justifications for
the representation I've chosen.

Would reformatting these and stamping a title on top make it worthy of
your interest?

I really don't see what else you might want, and if the above isn't
enough, then my rephrasing it all into a single document still
wouldn't be enough.  I'd be just wasting my time, and yours.

So, please do tell me, what is it that you're still missing?  Note
that I can't promise to deliver, but I can't possibly give you what
you want unless you help me figure out what it is.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-12-18  5:21                                                     ` Alexandre Oliva
@ 2007-12-18  9:10                                                       ` Alexandre Oliva
  2007-12-18 13:20                                                         ` Diego Novillo
                                                                           ` (2 more replies)
  0 siblings, 3 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18  9:10 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

[-- Attachment #1: Type: text/plain, Size: 753 bytes --]

On Dec 18, 2007, Alexandre Oliva <aoliva@redhat.com> wrote:

> On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:
>> On 12/17/07 19:50, Alexandre Oliva wrote:
>>> Now, since you're so interested in it and you've already read the
>>> various perspectives on the issue that I listed in my yesterday's
>>> e-mail to you, would you help me improve this document, by letting me
>>> know what you believe to be missing from the selected postings on
>>> design strategies, rationales and goals:

>> No.  I am not interested in organizing your thoughts for you.

> Wow, nice shot!

Rats, this below-the-waistline attack really got me annoyed.

So annoyed that I spent the night writing up this consolidated design
document.

So, what do you say now?

[-- Attachment #2: debug-var-loc.txt --]
[-- Type: text/plain, Size: 22558 bytes --]

	A plan to fix local variable debug information in GCC

		by Alexandre Oliva <aoliva@redhat.com>

			   2007-12-18 draft

== Introduction

The DWARF Debugging Information Format, version 3, determines the ways
a compiler can communicate the location of user variables at run time
to debug information consumers such as debuggers, program analysis
tools, run-time monitors, etc.

One possibility is that the location of a variable is fixed throughout
the execution of a function.  This is generally good enough for
unoptimized programs.

However, for optimized programs, the location of a variable can vary.
The variable may be live for some parts of a function, even in
multiple locations simultaneously.  At other parts, it may be
completely unavailable, or it may still be computable even if no
location actually holds its value.  The encoding, in these cases, can
be a location list: tuples with possibly-overlapping ranges of
instructions, and location expressions that determine a location or a
value for the variable.
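
To make the shape of a location list concrete, here is a minimal C
sketch.  It is neither DWARF's actual encoding nor GCC code: struct
loc_entry, its fields and lookup_locations are names made up for this
illustration, and a location expression is reduced to a descriptive
string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified location-list entry: for instruction
   addresses in [LO, HI), the variable is described by DESC.  */
struct loc_entry
{
  unsigned long lo;   /* first address covered (inclusive) */
  unsigned long hi;   /* end of the range (exclusive) */
  const char *desc;   /* stand-in for a location expression */
};

/* Store in OUT the descriptions of up to MAX entries of LIST that
   cover PC, and return how many entries cover PC in total.  Ranges
   may overlap, so several entries can match; zero matches means the
   variable is unavailable at PC.  */
size_t
lookup_locations (const struct loc_entry *list, size_t n,
                  unsigned long pc, const char **out, size_t max)
{
  size_t found = 0;

  assert (list != NULL || n == 0);
  assert (out != NULL || max == 0);

  for (size_t i = 0; i < n; i++)
    if (list[i].lo <= pc && pc < list[i].hi)
      {
        if (found < max)
          out[found] = list[i].desc;
        found++;
      }

  return found;
}
```

With two entries whose ranges overlap, a lookup inside the overlap
returns both locations, which is the "multiple locations
simultaneously" case described above.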

Historically, GCC started with the simpler, fixed-location model.  In
fact, back then, there weren't debug information formats that could
represent anything better than this.

More recently, GCC gained code to keep track of varying locations, and
to emit debug information accordingly.  Unfortunately, very many
optimization passes discard information that would be necessary to
emit correct and complete variable location lists.

Coalescing, scalarizing, substituting, propagating, and many other
transformations prevent the late-running variable tracker from doing
an accurate job.  By the time it runs, many variables no longer show
up in the retained annotations, although they're still conceptually
available.

The variable tracker can't tell when a user variable overlaps with
another, and it can't tell when a variable is overwritten, if the
assignment is optimized away.  These limitations are inherent to a
model based on inspecting actual code and trying to make inferences
from that.  In order to be able to represent not only what remained in
the code, but also what was optimized, combined or otherwise
apparently-removed, additional information needs to be kept around.

This paper describes an approach to maintain this information.

== Goals

* Ensure that, for every user variable for which we emit debug
information, the information is correct, i.e., if it says the value of
a variable at a certain instruction is at certain locations, or is a
known constant, then the variable must not be at any other location at
that point, and the locations or values must match reasonable
expectations based on source code inspection.

* Defining "reasonable expectations" is tricky, for code reordering
typical of optimization can make room for numerous surprises.  I don't
have a precise definition for this yet, but reporting that a variable
holds a value that it couldn't possibly hold (e.g., because it is only
assigned that value in a code path that is known not to be taken) is a
clear indication that something is amiss.  The general guiding rule
is, if we aren't sure the information is correct (or we're sure it
isn't), we shouldn't pretend that it is.

* Try to ensure that, if the value of a variable is a known constant
at a certain point in the program, this information is present in
debug information.

* Try to ensure that, if the value of a variable is available or
computable at any location at a certain point in the program, this
information is present in debug information.

* Stop missing optimizations for the sake of preserving debug
information.

* Avoid using additional memory and CPU cycles that would be needed
only for debug information when compiling without generating debug
information.

== Internal Representation

For historical reasons, GCC has two completely different, even if
nearly isomorphic, internal representations: trees and RTL.  This
decision has required a lot of code to be duplicated for low-level
manipulation and simplification of each of these representations.

Since tracking variables and their values must start early to ensure
correctness, and be carried throughout the complete optimization
process, it might seem tempting to introduce yet another
representation for debug information, decaying both isomorphic
representations into a single debug information representation.  The
drawbacks would be additional duplication of internal representation
manipulation code, and the possibility of increasing memory use out of
the need for representing information in yet another format.

Another concern is that even the simplest compiler transformations may
need to be reflected in debug information.  This might indicate a need
for modifying every point of transformation in every optimization pass
so as to propagate information into the debug information
representation.  This is undesirable, because it would be very
intrusive.

But then, keeping references to the correct values, expressions or
variables, as transformations are made, is precisely what optimization
passes have to do to perform their jobs correctly.  Finding a way to
take advantage of this is a very non-intrusive way of keeping debug
information accurate.  In fact, most transformations wouldn't need any
changes whatsoever: uses of variables in debug information can, in
most optimization passes, be handled just like any other uses.

Once this is established, a possible representation becomes almost
obvious: statements (in trees) or instructions (in rtl) that assert,
to the variable tracker, that a user variable or member is represented
by a given expression:

  # DEBUG var expr

By var, we mean a tree expression that denotes a user variable, for
now.  We envision trivially extending it to support components of
variables in the future.

By expr, we mean a tree or rtl expression that computes the value of
the variable at the point in which the statement or instruction
appears in the program.  A special value needs to be specified for
each representation that denotes a location or value that cannot be
determined or represented in debug information, for example, the
location of a variable that was completely optimized away.  It might
be useful to represent the expression as a list of expressions, and to
distinguish lvalues from rvalues, but for now let's keep this simple.
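
As a rough picture of such an annotation, the sketch below models it
in a toy intermediate representation.  None of this is GCC's tree or
RTL data structures: enum stmt_kind, struct stmt, make_debug_bind and
debug_bind_known_p are hypothetical names, and a NULL expression is
merely one possible choice for the special "cannot be represented"
value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR, not GCC's trees or RTL.  */
enum stmt_kind { STMT_ASSIGN, STMT_DEBUG_BIND };

struct stmt
{
  enum stmt_kind kind;
  const char *lhs;    /* ASSIGN: name defined; DEBUG_BIND: user variable */
  const char *value;  /* ASSIGN: right-hand side; DEBUG_BIND: expression
                         holding the variable's value, or NULL when the
                         value cannot be represented */
};

/* Build the toy equivalent of "# DEBUG var expr".  */
struct stmt
make_debug_bind (const char *var, const char *expr)
{
  struct stmt s = { STMT_DEBUG_BIND, var, expr };

  assert (var != NULL);
  return s;
}

/* Return nonzero if the debug bind S still names a usable expression.  */
int
debug_bind_known_p (const struct stmt *s)
{
  assert (s != NULL && s->kind == STMT_DEBUG_BIND);
  return s->value != NULL;
}
```

Here make_debug_bind ("x", "x_1") would stand for "# DEBUG x x_1",
while make_debug_bind ("x", NULL) would record that x was completely
optimized away at that point.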

== Generating debug information

Generating initial annotations when entering SSA is early enough in
the translation that the program will still reflect very reliably the
original source code.  Annotations are only generated for user
variables that are GIMPLE registers, i.e., variables that represent
scalar values and that never have their address taken.  Other kinds of
variables don't have varying locations, so we don't need to worry
about them.

After every assignment to such a variable, we emit a DEBUG statement
that will preserve, throughout compilation, the information that, at
that point, the assigned variable was represented by that expression.
So, after turning an assignment such as the following into SSA form,
we emit the debug statement below right after it:

  x_1 = whatever;
  # DEBUG x x_1

Likewise, at control flow merge points, for each PHI node we introduce
in the SSA representation, we emit an annotation:

  # x_4 = PHI <x_1(3), x_2(4), x_3(7)>;
  # DEBUG x x_4

Then, we let tree optimizers do their jobs.  Whenever they rename,
renumber, coalesce, combine or otherwise optimize a variable, they
will automatically update debug statements that mention them as well.

In the rare cases in which the presence of such a statement might
prevent an optimization, we need to adjust the optimizer code such
that the optimization is not prevented.  This most often amounts to
skipping or otherwise ignoring debug statements.  In a few very rare
cases, special code might be needed to adjust debug statements
manually.
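
The "skip debug statements" adjustment usually looks like the sketch
below: a pass that makes a decision based on the uses of a name
ignores uses that occur only in debug statements.  The types and the
function count_real_uses are hypothetical and stand in for GCC's
immediate-use iterators; the point is only the shape of the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical use records, standing in for immediate uses.  */
enum use_kind { USE_REAL, USE_DEBUG };

struct use
{
  enum use_kind kind;
  const char *name;   /* SSA name being used */
};

/* Count the uses of NAME in USES, ignoring debug-only uses, so that a
   decision such as "NAME has a single use" comes out the same with
   and without debug statements.  */
size_t
count_real_uses (const struct use *uses, size_t n, const char *name)
{
  size_t count = 0;

  assert (name != NULL);
  assert (uses != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      if (uses[i].kind == USE_DEBUG)
        continue;   /* debug statements must not affect code generation */
      if (strcmp (uses[i].name, name) == 0)
        count++;
    }

  return count;
}
```

A pass that tested for a single use without the early "continue" would
see two uses of x_1 whenever a "# DEBUG x x_1" annotation is present,
and would optimize differently depending on whether debug information
is being generated.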

After transformation to RTL, the representation needs translation, but
conceptually it's still the same: a mapping from variable to
expression.  Again, optimizers will most often adjust debug
instructions automatically.

The exceptions can be handled at no cost: the test for whether an
element of the instruction stream is an instruction or some kind of
note, that never needs updating, is a range test, in its optimized
form.  By placing the identifier for a debug instruction at one of the
limits of this range, testing for both ranges requires identical code,
except for the constants.

Since most code that tests for INSN_P and handles instructions can and
should match debug instructions as well, in order to keep them up to
date, we extend INSN_P so as to match debug instructions, and modify
the exceptions, that need to skip debug instructions, by using an
alternate test, with the same meaning as the original definition of
INSN_P.  These simple and non-intrusive changes are relatively common,
but still, by far, the exception rather than the rule.
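
The range-test argument can be illustrated as follows.  The numbering
below is invented for this sketch and does not reproduce GCC's real
rtx codes or macros; insn_p and nondebug_insn_p are hypothetical
stand-ins for the extended INSN_P and for the alternate test with the
original meaning.

```c
#include <assert.h>

/* Hypothetical code numbering, for illustration only.  The debug code
   sits at the lower limit of the range of instruction codes.  */
enum elt_code
{
  ELT_NOTE,
  ELT_BARRIER,
  ELT_LABEL,
  ELT_DEBUG_INSN,   /* lower limit of the extended range */
  ELT_INSN,
  ELT_JUMP_INSN,
  ELT_CALL_INSN     /* upper limit of both ranges */
};

/* Extended test: real instructions and debug instructions alike.  */
int
insn_p (int code)
{
  assert (code >= ELT_NOTE && code <= ELT_CALL_INSN);
  return code >= ELT_DEBUG_INSN && code <= ELT_CALL_INSN;
}

/* Original meaning: real instructions only.  The comparison has the
   same shape; only the lower constant differs.  */
int
nondebug_insn_p (int code)
{
  assert (code >= ELT_NOTE && code <= ELT_CALL_INSN);
  return code >= ELT_INSN && code <= ELT_CALL_INSN;
}
```

Both predicates compile to the same kind of range check, so code that
must skip debug instructions pays nothing extra for using the second
one.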

When optimizations are completed, including register allocation and
scheduling, it is time to pick up the debug instructions and emit
debug information out of them.  Conceptually, the debug instructions
represent points of assignment, at which a user variable ought to
evaluate to the annotated expression, maintained throughout
compilation.  However, when the value of a variable is live at more
than one location, it is important to note it, such that, if a
debugging session attempts to modify the variable, all copies are
modified.

The idea is to use some mechanism to determine equivalent expressions
throughout a function (say some variant of Global Value Numbering).
At debug instructions, we assert that the value of the named variable
is in the equivalence class represented by the expression.  As we scan
basic blocks forward and find that expressions in an equivalence class
are modified, we remove them from the equivalence class, and thus from
the list of available locations for the variable.  When such
expressions are further copied, we add them to equivalence classes.
At function calls and volatile asm statements, we remove
non-function-private memory slots from equivalence classes.  At
function calls, we also remove call-clobbered registers from
equivalence classes.  When no live expression remains in the
equivalence class that represents a variable, it is understood that
its value is no longer available.  At basic block confluences, we
combine information from the end states of the incoming blocks and the
debug statements added as a side effect of PHI nodes.
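
A heavily simplified sketch of the forward scan over one basic block
follows.  It tracks a single variable, numbers locations 0..7, and
starts a fresh class at each debug bind instead of consulting a real
value-numbering table; it also ignores memory slots and block
confluences.  All names (struct event, scan_block, and so on) are
assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NLOCS 8   /* hypothetical locations, numbered 0..7 */

enum ev_kind { EV_BIND, EV_COPY, EV_SET, EV_CALL };

struct event
{
  enum ev_kind kind;
  int dst;   /* BIND: location holding the variable; COPY/SET: location
                written; unused for CALL */
  int src;   /* COPY: location read; otherwise unused */
};

/* Return the bit for location LOC.  */
static unsigned
loc_bit (int loc)
{
  assert (loc >= 0 && loc < NLOCS);
  return 1u << loc;
}

/* Scan EVENTS forward and return a bit mask of the locations that
   hold the tracked variable's value at the end.  CALL_CLOBBERED is a
   mask of the locations that do not survive a call.  */
unsigned
scan_block (const struct event *events, size_t n, unsigned call_clobbered)
{
  unsigned live = 0;

  for (size_t i = 0; i < n; i++)
    {
      const struct event *e = &events[i];

      switch (e->kind)
        {
        case EV_BIND:
          /* Debug bind: the variable's value is now the one in DST.  */
          live = loc_bit (e->dst);
          break;
        case EV_COPY:
          /* DST joins the class if SRC is in it, else it leaves.  */
          if (live & loc_bit (e->src))
            live |= loc_bit (e->dst);
          else
            live &= ~loc_bit (e->dst);
          break;
        case EV_SET:
          /* DST is overwritten with an unrelated value.  */
          live &= ~loc_bit (e->dst);
          break;
        case EV_CALL:
          live &= ~call_clobbered;
          break;
        }
    }

  return live;
}
```

An empty mask at the end corresponds to the case in which no live
expression remains in the class, i.e., the variable's value is no
longer available.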

The end result is accurate debug information.  Also, except for
transformations that require special handling to update debug
annotations properly, debug information should come out as complete as
possible.

== Testability

Since debug annotations are added early, and, in most cases,
maintained up-to-date by the same code that optimizers use to maintain
executable code up-to-date, debug annotations are likely to remain
accurate throughout compilation.

The risk of this approach is that the annotations get in the way of
optimizations, thus causing executable code to vary depending on
whether or not debug information is to be generated.  The risk of
varying code could be removed at the expense of generating and
maintaining debug annotations throughout compilation and just throwing
them away at the end.  This is undesirable, for it would slow down
compilation without debug information and waste memory while at it.

Therefore, we've built testing mechanisms into the compiler to detect
cases in which the presence of debug annotations would cause code
changes.

The bootstrap-debug Makefile target, by default, compiles the second
bootstrap stage without debug information, and the third bootstrap
stage with it, and then compares all object files after stripping
them, a process that discards all debug information.

Furthermore, bootstrap4-debug, after bootstrap-debug and
prepare-bootstrap4-debug-lib-g0, rebuilds all target libraries without
debug information, and compares them with the stage3 target libraries,
built with debug information.

At the time of this writing, both tests pass on platforms
x86_64-linux-gnu and i686-linux-gnu, and ppc64-linux-gnu and
ia64-linux-gnu are getting close.

Additional testing mechanisms should be built in, to exercise a wider
range of internal GCC behaviors and extensions, for example, by
comparing the compiler output with and without debug information while
compiling all of its testsuite.

Even if testing mechanisms fail to catch an error, the generation of
debug annotations is controlled by a command-line option, such that
any code changes caused by it can be easily avoided, at the expense of
the quality of the debug information.

Testing for accuracy and completeness of debug information can be best
accomplished using a debugging environment.  For example, writing
programs of increasing complexity, adding function-call or asm probe
points to stabilize the internal execution state, and then examining
the state of the program at these probe points in a debugger, shall
let us know how accurate and how complete variable location
information is.
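
A test program of this kind might look like the sketch below.  The
function names are made up, and the probe relies on GCC extensions
(the noinline attribute and an empty volatile asm with a memory
clobber) to give the debugger a stable place to stop.

```c
#include <assert.h>

/* Probe point: a call the optimizer cannot inline or remove, where a
   debugger breakpoint can be set.  Uses GCC-specific extensions.  */
__attribute__ ((noinline)) void
probe (void)
{
  __asm__ volatile ("" ::: "memory");
}

/* Sum the squares of 0 .. N-1, stopping at a probe point in every
   iteration so that i, sq and total can be inspected.  */
int
sum_squares (int n)
{
  int total = 0;

  assert (n >= 0);

  for (int i = 0; i < n; i++)
    {
      int sq = i * i;
      probe ();   /* break here; inspect i, sq and total */
      total += sq;
    }

  return total;
}
```

Compiled with -O2 -g and stopped at the probe when i is 3, source-level
semantics give sq == 9 and total == 5.  A debugger may report those
values or report a variable as unavailable; any other answer would
count as an accuracy bug.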

Measuring accuracy is easy: if you ask for the value of a variable,
and get a value other than the expected, there's a bug in the
compiler.  If you get "unavailable", this can still be regarded as
accurate, for locations are always optional.  However, it might be
incomplete.  Telling whether the variable was indeed optimized away,
or whether the value is available or computable but the information is
missing, is a harder problem, but it's not part of the accuracy test,
but rather of the completeness test.

The completeness score for an unoptimized program might very often be
unachievable for optimized programs, not because the compiler is doing
a poor job at maintaining debug information, but rather because the
compiler is doing a good job at optimizing it, to the point that it is
no longer possible to determine the value of the inspected variable.

== Concerns

=== Memory consumption

Keeping more information around requires more memory; information
theory tells us that there's only so much information you can fit in a
bit.

In order to generate correct debug information, more information needs
to be retained throughout compilation.  The only way to arrange for
debug information to not require any additional memory is to waste
memory when not generating debug information.  But this is
undesirable.

Therefore, the better debug information we want, the more memory
overhead we're going to have to tolerate.

Of course at times we can trade memory for efficiency, using more
computationally expensive representations that are more compact.

At other times, we may trade memory for maintainability.  For example,
instead of emitting annotations as soon as we enter SSA mode, we could
emit them on demand, i.e., whenever we deleted, moved or significantly
modified an SSA assignment for which we would have emitted a debug
annotation.  Additional memory would be needed to mark assignments
that should have gained annotations but haven't, and care must be
taken to make sure that transformations aren't made without leaving a
correct debug statement in place.  It is not clear that this would
save significant memory, for a large fraction of relevant assignments
are modified or moved anyway, so it might very well be a
maintainability loss and a performance penalty for no measurable
memory gains.

Worst case, we may trade memory for debug information quality: if
memory use of this scheme is too high for some scenario, one can
disable debug information annotations through a command line option,
or disable debug information altogether.

=== Intrusiveness

Given that nearly all compiler transformations would require
reflection in debug information, any solution that doesn't take
advantage of this fact is bound to require changes all over the place.

Perhaps not so much for Tree-SSA passes, that are relatively
well-behaved and use a narrow API to make transformations, but very
clearly so for RTL passes, that very often modify instructions in
place, and at times even reuse locations assigned to user variables as
temporaries.

Even when we do use the strength of optimizers to maintain debug
information up to date, there are exceptions in which detailed
knowledge about the transformation taking place enables us to adjust
the annotations properly, if possible, or to discard location
information for the variable otherwise.

It is unrealistic to hope that information can be kept accurate
throughout compilation without any effort from optimizers, or
even through a trivial API for a debug information generator.  A
number of the exceptions that require detailed knowledge about the
ongoing transformation would be indistinguishable from other common
transformations that would have very different effects on debug
information.  At this point, any expectations of lower intrusiveness
by use of such an API vanish.

By letting optimizers do their jobs on debug annotations, and handling
exceptions only at the few locations where they are needed, trivially
in most such cases, we keep intrusiveness at a minimum.

Of course we could get even lower intrusiveness by accepting errors in
debug information, or accepting to generate different code depending
on debug information command-line options.  But these options
shouldn't be considered seriously.

=== Complexity

The annotations are conceptually trivial and they can be immediately
handled by optimizers.  It is hard to imagine a simpler design that
would still enable us to get right cases such as those in the examples
below.

Worrying about the representation of debug annotations as statements
or instructions, rather than notes, is missing the fact that, most of
the time, we do want them to be updated just like statements and
instructions.

Worrying about the representation of debug annotations in-line, rather
than an on-the-side representation, is a valid concern, but it's
addressed by the testability of the design, and the in-line
representation is highly advantageous, not only for using optimizers
to keep debug information accurate, but also for doing away with the
need for yet another internal representation and all the effort of
keeping it accurate.

=== Optimizations

Correct and more complete debugging information isn't supposed to
disable optimizations.  Keep in mind that enabling debug information
isn't supposed to modify the executable code in any way whatsoever.

The goal is to ensure that whatever debug information the compiler
generates actually matches the executable code, and that it is as
complete as viable.

The goal is not to disable optimizations so as to preserve variables
or code, such that it can be represented in debug information and
provide for a debugging experience more like that of code that is not
optimized.

If debug information disables any optimization, that's a bug that
needs fixing.

Now, while testing this design, a number of opportunities for
optimization that GCC missed were detected and fixed, others were
merely detected, and at least one optimization shortcoming kept in
place in order to get better debug information could be removed, for
the new debug information infrastructure enables the optimization to
be applied in its fullest extent.

== Examples

It is desirable to be able to represent constants and other
optimized-away values, rather than stating variables have values they
can no longer have:

int
x1 (int x)
{
  int i;

  i = 2;
  f(i);
  i = x;
  h();
  i = 7;
  g(i);
}

Even if variable i is completely optimized away, a debugger can still
print the correct values for i if we keep annotations such as:

  (debug (var_location i (const_int 2)))
  (set (reg arg0) (const_int 2))
  (call (mem (symbol_ref f)))
  (debug (var_location i unknown))
  (call (mem (symbol_ref h)))
  (debug (var_location i (const_int 7)))
  (set (reg arg0) (const_int 7))
  (call (mem (symbol_ref g)))

In this case, before the call to h, not only the assignment to i was
dead, but also the value of the incoming argument x had already been
clobbered.  If i had been assigned to another constant instead, debug
information could easily represent this.
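
As a rough illustration of what a debugger could do with such
annotations, the sketch below models them as a table of bindings for
i, each one in effect from a given program point until the next entry
takes over.  The struct, the numbering of program points (0 to 7, one
per line of the listing above) and the function name are assumptions
made up for this example; they are not GCC or DWARF data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each entry says what i is bound to from program
   point START onwards, until the next entry takes over.  */
enum binding_kind { BIND_UNKNOWN, BIND_CONST };

struct binding
{
  int start;                 /* first program point covered */
  enum binding_kind kind;
  int value;                 /* meaningful only for BIND_CONST */
};

/* Program points number the eight lines of the listing 0..7.  */
static const struct binding i_bindings[] = {
  { 0, BIND_CONST, 2 },      /* (debug (var_location i (const_int 2))) */
  { 3, BIND_UNKNOWN, 0 },    /* (debug (var_location i unknown)) */
  { 5, BIND_CONST, 7 },      /* (debug (var_location i (const_int 7))) */
};

/* Return 1 and store the value in *OUT if i is known at POINT,
   0 if the debugger should report it as unavailable.  */
int
lookup_i (int point, int *out)
{
  const struct binding *found = NULL;
  size_t n = sizeof i_bindings / sizeof i_bindings[0];

  /* Entries are sorted by START; the last one at or before POINT wins.  */
  for (size_t k = 0; k < n; k++)
    if (i_bindings[k].start <= point)
      found = &i_bindings[k];

  if (found == NULL || found->kind != BIND_CONST)
    return 0;
  *out = found->value;
  return 1;
}
```

With this table, a query at the call to h reports i as unavailable
rather than returning a stale value.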

Another example that covers PHI nodes and conditionals:

int
x2 (int x, int y, int z)
{
  int c = z;
  whatever0(c);
  c = x;
  whatever1();
  if (some_condition)
    {
      whatever2();
      c = y;
      whatever3();
    }
  whatever4(c);
}

With SSA infrastructure, this program can be optimized to:

int
x2 (int x, int y, int z)
{
  int c;
  # bb 1
  whatever0(z_0(D));
  whatever1();
  if (some_condition)
    {
      # bb 2
      whatever2();
      whatever3();
    }
  # bb 3
  # c_1 = PHI <x_2(D)(1), y_3(D)(2)>;
  whatever4(c_1);
}

Note how, without debug annotations, c is only initialized just before
the call to whatever4.  At all other points, the value of c would be
unavailable to the debugger, possibly even wrong.

If we were to annotate the SSA definitions forward-propagated into c
versions as applying to c, we'd end up with all of x_2, y_3 and z_0
applied to c throughout the entire function, in the absence of
additional markers.

Now, with the annotations proposed in this paper, what is initially:

int
x2 (int x, int y, int z)
{
  int c;
  # bb 1
  c_4 = z_0(D);
  # DEBUG c c_4
  whatever0(c_4);
  c_5 = x_2(D);
  # DEBUG c c_5
  whatever1();
  if (some_condition)
    {
      # bb 2
      whatever2();
      c_6 = y_3(D);
      # DEBUG c c_6
      whatever3();
    }

  # bb 3
  # c_1 = PHI <c_5(1), c_6(2)>
  # DEBUG c c_1
  whatever4(c_1);
}

is optimized into:

int
x2 (int x, int y, int z)
{
  int c;
  # bb 1
  # DEBUG c z_0(D)
  whatever0(z_0(D));
  # DEBUG c x_2(D)
  whatever1();
  if (some_condition)
    {
      # bb 2
      whatever2();
      # DEBUG c y_3(D)
      whatever3();
    }
  # bb 3
  # c_1 = PHI <x_2(D)(1), y_3(D)(2)>;
  # DEBUG c c_1
  whatever4(c_1);
}

and then, at every one of the inspection points, we get the correct
value for variable c.
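
To make "at every one of the inspection points" a bit more concrete,
here is a small sketch that replays the optimized listing along the
path that enters the conditional, reading each DEBUG line as a bind of
c, and reports which SSA name c is bound to when each call is reached.
The trace representation and the function name are hypothetical,
invented for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical trace element: either a DEBUG bind of c to an SSA name,
   or a call at which a user might inspect c.  */
struct step
{
  const char *debug_bind;    /* SSA name bound to c, or NULL */
  const char *inspect_at;    /* name of the call, or NULL */
};

/* Path through bb 1, bb 2 and bb 3 of the optimized x2.  */
static const struct step taken_path[] = {
  { "z_0", NULL }, { NULL, "whatever0" },
  { "x_2", NULL }, { NULL, "whatever1" },
  { NULL, "whatever2" },
  { "y_3", NULL }, { NULL, "whatever3" },
  { "c_1", NULL }, { NULL, "whatever4" },
};

/* Return the SSA name c is bound to when execution reaches CALL,
   or NULL if CALL is not on the path or no bind precedes it.  */
const char *
binding_at (const char *call)
{
  const char *current = NULL;
  size_t n = sizeof taken_path / sizeof taken_path[0];

  for (size_t k = 0; k < n; k++)
    {
      if (taken_path[k].debug_bind != NULL)
        current = taken_path[k].debug_bind;
      else if (taken_path[k].inspect_at != NULL
               && strcmp (taken_path[k].inspect_at, call) == 0)
        return current;
    }
  return NULL;
}
```

On this path, c reads as z_0 at whatever0, x_2 at whatever1 and
whatever2, y_3 at whatever3 and c_1 at whatever4, matching the source
program.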

== Conclusion

This design enables a compiler to emit variable location debug
information that complies with the DWARF version 3 standard, and that
is likely to be as complete as theoretically possible, with an
implementation that is conceptually simple, relatively easy to
introduce, trivial to test and easy to maintain in the long run.  Not
wasting memory or CPU cycles during compilation without debug
information are welcome bonuses.

[-- Attachment #3: Type: text/plain, Size: 250 bytes --]

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  9:10                                                       ` Alexandre Oliva
@ 2007-12-18 13:20                                                         ` Diego Novillo
  2007-12-18 15:42                                                           ` Alexandre Oliva
  2007-12-18 22:43                                                         ` Daniel Berlin
  2007-12-18 23:35                                                         ` Daniel Berlin
  2 siblings, 1 reply; 150+ messages in thread
From: Diego Novillo @ 2007-12-18 13:20 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/18/07 03:07, Alexandre Oliva wrote:

> Rats, this below-the-waistline attack really got me annoyed.

I'm sorry you feel that way, it was not meant as a personal attack, 
though it was rather brusque.  I was getting tired of asking for the 
same thing over and over again.

> So, what do you say now?

Thank you.  Now I have something concrete to read and comment on.


Diego.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 13:20                                                         ` Diego Novillo
@ 2007-12-18 15:42                                                           ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-18 15:42 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Daniel Berlin, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 18, 2007, Diego Novillo <dnovillo@google.com> wrote:

> On 12/18/07 03:07, Alexandre Oliva wrote:
>> Rats, this below-the-waistline attack really got me annoyed.

> I'm sorry you feel that way, it was not meant as a personal attack,
> though it was rather brusque.  I was getting tired of asking for the
> same thing over and over again.

>> So, what do you say now?

> Thank you.  Now I have something concrete to read and comment on.

You already had it.  Really.  You just didn't feel like reading and
commenting on it, for whatever reason I can't understand, which is why
you kept asking for what you already had over and over again.

Anyhow...  I expect your feedback, err...  "now" ;-P :-D

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  9:10                                                       ` Alexandre Oliva
  2007-12-18 13:20                                                         ` Diego Novillo
@ 2007-12-18 22:43                                                         ` Daniel Berlin
  2007-12-19  6:07                                                           ` Alexandre Oliva
  2007-12-18 23:35                                                         ` Daniel Berlin
  2 siblings, 1 reply; 150+ messages in thread
From: Daniel Berlin @ 2007-12-18 22:43 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/18/07, Alexandre Oliva <aoliva@redhat.com> wrote:

> Then, we let tree optimizers do their jobs.  Whenever they rename,
> renumber, coalesce, combine or otherwise optimize a variable, they
> will automatically update debug statements that mention them as well.
>
I am speaking only about the tree level in this entire email;
I make no representations about the RTL level ;)

This is much harder than you give it credit for, unless you plan on
throwing out all the info at elimination points.

Consider PRE alone, which makes new statements that are combinations
of old ones, and eliminates tons of variables in favor of them.

If your debug statement strategy is "move debug statements when we
insert code that is equivalent", it won't work, because our
equivalence is based on value equivalence, not location equivalence.
We only guarantee it has the same value as whatever it is a copy
of at that point, not that it has the same location.

So you will lose info every time PRE makes an insertion, unless you
make serious modifications to PRE.

This is not to mention the data you lose if you just throw it away at
elimination points.

Let's take another problem.

How do i say debug info for some variable is now dead, we have no idea
what it is right now?
How do I figure out which debug statements need to be modified when
you introduce new memory operations?

When you pass something by address, you get vops.
The vops are not variables, and have no relation to the original
variable (they can be partitions containing more variables).

If i have

DEBUG(x, x_3)
x_3 = x; // Read from global

y = x_3;
....

If i insert a new call
DEBUG(x, x_3): 1
x_3 = x

foo() // May modify x and *&x)

y = x_3

Now you have two problems.

It is no longer true, at the point of y = x_3, that DEBUG (x, x_3) holds.
In fact, x_3 may no longer have any relation to x.
You have three choices:
1. Either destroy the DEBUG(x, x_3) losing valuable and correct info
2. Add a new DEBUG (x, unknown)
3. Figure out which debug statements are reached by your call

#3 is a dataflow problem, and not something you want to do every time
you insert a call.

If your answer is #1 or #2, then what you are really doing is
computing roughly the same dataflow problem var-location does, except
on trees and with a different meet-operation.

var-location generates incorrect info not because it represents
something fundamentally different than you are (it doesn't), it falls
down because it uses union as the meet operation.

It says "oh, i don't know which of these locations is right, it must
be both of them".

If you changed the meet operation to "oh, i don't know which of these
locations is right, it must be none of them", and did a little more
work you would infer the same info as yours *at the tree level*.
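
The difference between the two meet operations can be sketched with
location sets represented as bit masks.  This only illustrates union
versus intersection at a control-flow join; the names and the encoding
are invented for the example and are not taken from the var-tracking
sources.

```c
#include <assert.h>

/* Hypothetical encoding: one bit per candidate location of a variable.  */
enum { LOC_REG_A = 1, LOC_REG_B = 2, LOC_STACK = 4 };

typedef unsigned int loc_set;

/* "It must be both of them": also keeps locations that are valid on
   only one of the incoming edges.  */
loc_set
meet_union (loc_set in1, loc_set in2)
{
  return in1 | in2;
}

/* "It must be none of them": keeps only locations that are valid on
   every incoming edge, possibly leaving the set empty.  */
loc_set
meet_intersect (loc_set in1, loc_set in2)
{
  return in1 & in2;
}
```

If one edge says {reg A, stack} and the other {reg B, stack}, the
union keeps all three locations even though two of them are wrong on
some path, while the intersection keeps only the stack slot.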

Nothing you have proposed is fundamentally going to give you better info.
All you have done is annotated the IR in some places to make explicit
some bits in the dataflow problem that you could infer anyway.  It
is provable you can infer them with a simple lattice and
associated value, *unless you are going to start guessing* (which you
have said you don't want to do because it can generate incorrect
info).

There is absolutely no reason what you are trying to do needs to
modify the tree IR at all to achieve exactly the same accuracy of
debug info as your design proposes at the tree level.  You could
simply compute the global dataflow problem.

The RTL level is harder, of course.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 22:43                                                         ` Daniel Berlin
@ 2007-12-19  6:07                                                           ` Alexandre Oliva
  2007-12-19  8:39                                                             ` Daniel Berlin
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-19  6:07 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> Consider PRE alone,

> If your debug statement strategy is "move debug statements when we
> insert code that is equivalent"

Move?  Debug statements don't move, in general.  I'm not sure what you
have in mind, but I sense some disconnect here.

> because our equivalence is based on value equivalence, not location
> equivalence.  We only guarantee it has the same value as the
> whatever it is a copy of at that point, not that it has the same
> location.

This sounds perfect to me.  I'm concerned about values.  Locations are
an implementation detail.  The thing to keep in mind is that what was
originally a single user variable may end up mangloptimized into
multiple stack slots, registers, with multiple simultaneously-live
versions.  Trying to pretend that any of these represent the user
variable sounds like a recipe for madness to me.  So I focus on values
instead, and then on trying to recover locations based on binding and
sharing of values.

> How do i say debug info for some variable is now dead, we have no idea
> what it is right now?

For annotations, look for VAR_DEBUG_VALUE_NOVALUE in tree.h and
VAR_LOC_UNKNOWN_P in rtl.h, in the VTA branch.

For dwarf location lists, you just refrain from emitting locations for
a given range.

> How do I figure out which debug statements need to be modified when
> you introduce new memory operations?

None.  By definition, debug annotations are only about variables that
are not addressable.  Those that are are fixed at a single location,
so there's no reason to track them in a fancy way.

> If i insert a new call
> DEBUG(x, x_3): 1
> x_3 = x

> foo() // May modify x and *&x)

> y = x_3

> Now you have two problems.

You're talking about a real problem, but your example is misguided.
Let me give you a real problem scenario.

(set (reg <T>) (<whatever>))
(var_location x (reg <T>))
(set (mem <addr>) (reg <T>))
(set (reg <T>) (<somethingelse>))
(call (mem (symbol_ref foo)))

So, at the var_location debug_insn, we know that x is in reg <T>.
That's stored at *addr, so now we might be able to use it as an
additional location for x.  And then, when reg is modified, we remove
T from the equivalence class, and then the only location holding the
value of x is *addr.  Then, a function call, that might modify *addr.

So, do we decide that x is no longer available after the call, or do
we hope *addr still represents it?

The thing to remember is that the annotations are only about gimple
regs.  This means calls don't modify them, ever.  But we still have to
decide whether *addr represents x or not.

My thoughts are leaning towards looking at the memory address or other
memory attributes to tell whether it's an addressable stack slot or
not.  If it's addressable, remove it from the equivalence class at the
call, so the equivalence class becomes empty, and the variable is
regarded as dead.  If it's not addressable (a pseudo assigned to
memory), then we can keep it, even if x is actually dead past the
call.

What we'll see is that, if x is not dead after the call, the compiler
will arrange to preserve its value in one such local non-addressable
stack slot, and it will probably extend the equivalence class again
after the call, as the pseudo is restored.  Or the pseudo will be
temporarily assigned to a call-saved register, which, for being
call-saved, won't be removed from equivalence classes at call
instructions.  Whereas, if x is dead and its value was just copied to
some random memory location, then we may as well flag it as dead at
the call site, where the memory location may be modified.

So, it all works out nicely, because we know we're only dealing with
gimple regs.
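
A minimal sketch of the rule I have in mind, under the assumption that
every location in an equivalence class can be classified into one of
four kinds.  The enum, the struct and the function names are
hypothetical and exist only for this illustration; dropping
call-clobbered registers at calls is likewise an assumption, implied
but not spelled out above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of location that may hold a variable's value.  */
enum loc_kind
{
  LOC_CALL_CLOBBERED_REG,
  LOC_CALL_SAVED_REG,
  LOC_ADDRESSABLE_MEM,       /* e.g. *addr, which a callee may modify */
  LOC_PRIVATE_STACK_SLOT     /* slot of a pseudo, not addressable */
};

struct equiv_class
{
  enum loc_kind locs[8];
  size_t count;
};

/* Can a location of kind K still be trusted after a call?  */
int
survives_call (enum loc_kind k)
{
  return k == LOC_CALL_SAVED_REG || k == LOC_PRIVATE_STACK_SLOT;
}

/* Drop every location a call may have changed and return how many
   remain.  An empty class means the variable is regarded as dead.  */
size_t
prune_at_call (struct equiv_class *ec)
{
  size_t kept = 0;

  for (size_t k = 0; k < ec->count; k++)
    if (survives_call (ec->locs[k]))
      ec->locs[kept++] = ec->locs[k];
  ec->count = kept;
  return kept;
}
```

In the scenario above, the class for x holds only *addr at the call,
so pruning leaves it empty and x is flagged as dead from there on.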

volatile asms make this slightly trickier, because they're totally
unpredictable.  I'm thinking it's safe to simply remove addressable
memory locations from equivalence classes at them, just for safety,
but I don't have it completely figured out.

> #3 is a dataflow problem, and not something you want to do every time
> you insert a call.

I'm not sure what you mean by "inserting calls".  We don't do that.
Calls are present in the source code (even when implied by stuff like
TLS, OpenMP or builtins such as memcpy), and they're either kept
around, eliminated or inlined.

(digression intended to be funny: this "inserting a call" discussion
reminds me of those impossible initial conditions in electromagnetism
textbook exercises, such as uniform magnetic fields in which charged
particles suddenly appear ;-)

> If your answer is #1 or #2, then what you are really doing is
> computing roughly the same dataflow problem var-location does, except
> on trees and with a different meet-operation.

I am actually computing the same dataflow problem of var-tracking.
That's the whole point.  But I'm giving it more information, to enable
it to track more variables.  And it needs to deal with multiple
concurrent locations for the same variable, and multiple variables in
the same locations, which are "slight" complications.  But you're
right, in the end it's the same problem.

But I'm not computing that in trees.  I'm just collecting and
maintaining data points for var-tracking, all the way from the tree
level.

> var-location generates incorrect info not because it represents
> something fundamentally different than you are (it doesn't), it falls
> down because it uses union as the meet operation.

> It says "oh, i don't know which of these locations is right, it must
> be both of them".

However, it can't deal with parallel locations, so this is at odds
with your statement.  I haven't got 'round to studying the exact
dataflow algorithm var-tracking uses, I just figured I needed to do
something along these lines.  Maybe it does need tweaking, if I end up
using it.  I'm not sure yet it's going to make sense to use it for the
more detailed tracking of copying that I'm going to have to do.

> If you changed the meet operation to "oh, i don't know which of these
> locations is right, it must be none of them", and did a little more
> work you would infer the same info as yours *at the tree level*

Intersection sounds like the right approach to me.  I assumed
var-tracking did this, except for unknowns.  It's a bit trickier than
this because var-tracking has to deal with a lot of incomplete
information.  But at least for vta values, we are going to have a
complete picture, so we can be stricter when it comes to gimple reg
variables.

Now, whether the fact that we could infer the very same values at the
tree level is relevant, I don't know.  The tree level is neither
source level nor the final executable code, so unless we can establish
useful mappings from the tree level to both source level and final
executable code, this information is of little use, no matter how true
it is.

> Nothing you have proposed is fundamentally going to give you better info.

Except for what tree transformations currently discard, such as the
points of the program in which variables are bound to values.  This is
indeed one of the elements that the annotations are trying to
preserve, that the compiler has not cared about preserving.  (The
other being expressions that end up not computed at run time, but that
could still be computed by a debugger based on state available
elsewhere.)

> All you have done is annotated the IR in some places to make explicit
> some bits in the dataflow problem that you could infer anyway.

Now, this is not true.  I could infer values, yes, but I couldn't
infer the variables they relate to, nor the point of binding.  And
debug information is not just about the values, it's about mapping
variables to values and locations.  So, we can't infer all the
information we need.

> There is absolutely no reason what you are trying to do needs to
> modify the tree IR at all to achieve exactly the same accuracy of
> debug info as your design proposes at the tree level.

So far these claims have been unconvincing.  I still get the feeling
that you're missing some aspects of the problem, but I invite you to
show me how the information available in the current IR could be used
to generate accurate debug information for the two examples in the
design document.  Even if we leave the RTL aspect of it aside for a
moment.  I certainly wouldn't mind having to generate annotations only
when we move from Trees to RTL, but I can't imagine how we'd
reintroduce bindings at points that are not marked in the tree level,
for variables that are (partially or entirely) gone from the tree IR.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19  6:07                                                           ` Alexandre Oliva
@ 2007-12-19  8:39                                                             ` Daniel Berlin
  2007-12-19 16:12                                                               ` Daniel Berlin
  2007-12-19 20:27                                                               ` Alexandre Oliva
  0 siblings, 2 replies; 150+ messages in thread
From: Daniel Berlin @ 2007-12-19  8:39 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/19/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
>
> > Consider PRE alone,
>
> > If your debug statement strategy is "move debug statements when we
> > insert code that is equivalent"
>
> Move?  Debug statements don't move, in general.  I'm not sure what you
> have in mind, but I sense some disconnect here.

Okay, so if you aren't going to move them, you have to erase them when
you move statements around.

>
> > because our equivalence is based on value equivalence, not location
> > equivalence.  We only guarantee it has the same value as the
> > whatever it is a copy of at that point, not that it has the same
> > location.

This is just a problem with an initial state and some propagation at
each statement.
How were you going to generate the initial set of debug annotations?
This is how you get your initial state for your dataflow problem.
How were you going to update it if you saw a statement was updated to
say x_5 = x_4 instead of x_5 = x_3 + x_2?
The same operation you perform to update your annotations when you see
 x_5 = x_4 works whether you started with x_5 = x_3 + x_2 or not (it
better, or else your updating will give different results for the same
IR depending on how you got there, which is *incredibly* bad).

So then how will using your debug annotations and updating them come
out any different than say performing a value numbering pass where you
also associate user variables with the ssa names (IE alongside our
value numbers), and propagate them around as well?

If you want to associate multiple user variables with a single SSA
definition point, you can do that as well (use union instead of copy).
You can do whatever you think is best at phi nodes (empty set if user
var sets are not equal, or union them or intersect them).

At the end, you could emit DEBUG(user var, ssa name) right after each
SSA_NAME_DEF_STMT for all user vars in the user var set for ssa name.

The right DEBUG statements would then appear at the points you can
guarantee the user variable has the same *value* as the gimple
register you've said it does.
From there, it is up to you to do what you like with the result.

(it's late, so i may have described/ calculated the dataflow problem
backwards, but you get the idea)
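
One possible reading of that propagation, as a sketch only: attach a
set of user variables to each SSA name, take the union at copies, and
use the "intersect" choice at phi nodes.  The bit-mask encoding and
the function names are assumptions made for the illustration; nothing
here is existing GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding: one bit per user variable.  */
typedef unsigned int var_set;
enum { VAR_X = 1, VAR_Y = 2 };

/* name = src, where the source statement assigned to the user
   variables in ASSIGNED (0 for a compiler-generated copy): the new
   SSA name stands for everything SRC stood for, plus ASSIGNED.  */
var_set
transfer_copy (var_set src_vars, var_set assigned)
{
  return src_vars | assigned;
}

/* PHI node, using the "intersect" choice: keep only the user
   variables that every incoming SSA name stands for.  */
var_set
transfer_phi (const var_set *args, size_t nargs)
{
  if (nargs == 0)
    return 0;

  var_set result = args[0];
  for (size_t k = 1; k < nargs; k++)
    result &= args[k];
  return result;
}
```

Swapping the `&=` for `|=` gives the "union" choice at phi nodes
instead.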

This is, after all, more or less what PRE does for its value
numbering. It computes which things have the same value at what points
in the program, then uses this after computing some more dataflow
problems that say where this implies reuse.

I don't see why you believe user variables/bindings are special and
can't be propagated in this manner, given that you can't depend on the
type of statement change that has occurred, only what the IR looks
like after the statement change.  Otherwise, again, the same IR and
source may have different debug annotations depending on the set of
changes you applied to get that IR from the initial IR, which is not
good for the standard reasons [maintainability, determinism,
reproducibility, etc].
>
> > #3 is a dataflow problem, and not something you want to do every time
> > you insert a call.
>
> I'm not sure what you mean by "inserting calls".  We don't do that.

Sure we do.
We will definitely insert new calls when we PRE const/pure calls, or
calls we determine to be movable to the point we want to move them
(using call clobbered results, etc).
This will insert calls in latch blocks, above loops, in branch conditions.
This is not just movement.
It is insertion of calls that did not exist in the source code at a
given point, but are allowed to be executed at that point in the
source code anyway.

> Calls are present in the source code (even when implied by stuff like
> TLS, OpenMP or builtins such as memcpy), and they're either kept
> around, eliminated or inlined.
No, we can and will insert new calls.
Not just for PRE, but for profiling, devirtualization, struct reorg, SRA, etc
struct reorg inserts new mallocs and frees
profiling inserts profiling calls
devirt will insert branches and new calls to replace virtual function calls
SRA will insert memcpys to and from structures that were not there in
user source before.
i could go on if you like.
I'm not sure why you believe all the calls that we end up with in the
IR are actually in the source (or even implied by it).

>
> But I'm not computing that in trees.  I'm just collecting and
> maintaining data points for var-tracking, all the way from the tree
> level.
Okay, then for trees,  why bother tracking it when you can compute it
right before translation with the same accuracy you can if you update
it every time you make statement changes?
>
> > All you have done is annotated the IR in some places to make explicit
> > some bits in the dataflow problem that you could infer anyway.
>
> Now, this is not true.  I could infer values, yes, but I couldn't
> infer the variables they relate to, nor the point of binding

See above.

>  And
> debug information is not just about the values, it's about mapping
> variables to values and locations.

You have no locations at the tree level, and i've explicitly said what
i said applies to the tree level :)
> So, we can't infer all the
> information we need.

Again, i believe we can at the tree level.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19  8:39                                                             ` Daniel Berlin
@ 2007-12-19 16:12                                                               ` Daniel Berlin
  2007-12-19 16:36                                                                 ` Andrew MacLeod
                                                                                   ` (2 more replies)
  2007-12-19 20:27                                                               ` Alexandre Oliva
  1 sibling, 3 replies; 150+ messages in thread
From: Daniel Berlin @ 2007-12-19 16:12 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/19/07, Daniel Berlin <dberlin@dberlin.org> wrote:
> On 12/19/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> > On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
> >
> > > Consider PRE alone,
> >
> > > If your debug statement strategy is "move debug statements when we
> > > insert code that is equivalent"
> >
> > Move?  Debug statements don't move, in general.  I'm not sure what you
> > have in mind, but I sense some disconnect here.
>
> Okay, so if you aren't going to move them, you have to erase them when
> you move statements around.
>

Besides this, how do you plan on handling the following situations
(both of which reassoc performs *right now*)?  These are the
relatively easy ones.

Here is the easy one:

z_5 = a_3 + b_3
x_4 = z_5 + c_3

DEBUG(x, x_4)


Reassoc may transform this into:


z_5 = c_3 + b_3
x_4 = z_5 + a_3

DEBUG(x, x_4)

Now x has the wrong value.

At least in this case, you can tell which DEBUG statement to eliminate
easily (it is an immediate use of x_4)

It gets worse, however

c_3 = a_1 + b_2
z_5 = c_3 + d_9
x_4 = z_5 + e_10
DEBUG(x, x_4)
y_7 = x_4 + f_11
z_8 =  y_7 + g_12
->

c_3 = a_1 + b_2
z_5 = c_3 + g_12
x_4 = z_5 + e_10
DEBUG(x, x_4)
y_7 = x_4 + f_11
z_8 = y_7 + d_9


x_4 now no longer represents the value of x, but we haven't directly
changed x_4, its immediate users, or the statements that immediately
make up its defining values.

How do you propose we figure out which DEBUG statements we may have
affected without doing all kinds of walks?

(This is, of course, a more general problem: how do i find which
debug statements are reached by my transformation without doing linear
walks?)
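
A minimal sketch (plain Python, not GCC code) that makes the second example concrete by evaluating both statement sequences. The input values are arbitrary assumptions chosen only for illustration; the point is that x_4 changes while the end of the chain, z_8, does not.

```python
# Illustrative only: evaluate the statement sequences from the second
# example before and after reassociation. Input values are arbitrary.

def before(a_1, b_2, d_9, e_10, f_11, g_12):
    c_3 = a_1 + b_2
    z_5 = c_3 + d_9
    x_4 = z_5 + e_10   # DEBUG(x, x_4) binds x to this value
    y_7 = x_4 + f_11
    z_8 = y_7 + g_12
    return x_4, z_8

def after(a_1, b_2, d_9, e_10, f_11, g_12):
    c_3 = a_1 + b_2
    z_5 = c_3 + g_12   # d_9 and g_12 have been swapped
    x_4 = z_5 + e_10   # same statement text, different value
    y_7 = x_4 + f_11
    z_8 = y_7 + d_9
    return x_4, z_8

inputs = dict(a_1=1, b_2=2, d_9=10, e_10=3, f_11=4, g_12=100)
x_before, z_before = before(**inputs)
x_after, z_after = after(**inputs)
print(x_before, x_after)   # what a debugger would report for x
print(z_before, z_after)   # final value of the chain
```

With these inputs the final value is the same in both versions, but the value bound to x by the unchanged DEBUG statement is not.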

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 16:12                                                               ` Daniel Berlin
@ 2007-12-19 16:36                                                                 ` Andrew MacLeod
  2007-12-19 19:49                                                                   ` Daniel Berlin
  2007-12-19 20:00                                                                 ` Andrew MacLeod
  2007-12-19 20:07                                                                 ` Alexandre Oliva
  2 siblings, 1 reply; 150+ messages in thread
From: Andrew MacLeod @ 2007-12-19 16:36 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

Daniel Berlin wrote:
>
> Here is the easy one:
>
> z_5 = a_3 + b_3
> x_4 = z_5 + c_3
>
> DEBUG(x, x_4)
>
>
> Reassoc may transform this into:
>
>
> z_5 = c_3 + b_3
> x_4 = z_5 + a_3
>
> DEBUG(x, x_4)
>
> Now x has the wrong value.
>   
??

x_4 looks like it has the value 'a_3 + b_3 + c_3' in both examples to 
me, although computed in different orders...

so isn't that still the right value?

Andrew
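
A small sketch of this check, in plain Python rather than GCC code, with arbitrary input values assumed for illustration. It evaluates the "easy" example both ways and compares the intermediate and final values.

```python
# Illustrative only: the first ("easy") example, before and after reassoc.

def original(a_3, b_3, c_3):
    z_5 = a_3 + b_3
    x_4 = z_5 + c_3
    return z_5, x_4

def reassociated(a_3, b_3, c_3):
    z_5 = c_3 + b_3
    x_4 = z_5 + a_3
    return z_5, x_4

z_orig, x_orig = original(1, 2, 4)
z_new, x_new = reassociated(1, 2, 4)
print(x_orig, x_new)   # x_4 is a_3 + b_3 + c_3 either way
print(z_orig, z_new)   # only the intermediate z_5 differs
```

Only a debug binding that referred to z_5 would be affected in this example; a binding to x_4 still sees the same sum.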

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 16:36                                                                 ` Andrew MacLeod
@ 2007-12-19 19:49                                                                   ` Daniel Berlin
  0 siblings, 0 replies; 150+ messages in thread
From: Daniel Berlin @ 2007-12-19 19:49 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On 12/19/07, Andrew MacLeod <amacleod@redhat.com> wrote:
> Daniel Berlin wrote:
> >
> > Here is the easy one:
> >
> > z_5 = a_3 + b_3
> > x_4 = z_5 + c_3
> >
> > DEBUG(x, x_4)
> >
> >
> > Reassoc may transform this into:
> >
> >
> > z_5 = c_3 + b_3
> > x_4 = z_5 + a_3
> >
> > DEBUG(x, x_4)
> >
> > Now x has the wrong value.
> >
> ??
>
> x_4 looks like it has the value 'a_3 + b_3 + c_3' in both examples to
> me, although computed in different orders...
>
> so isn't that still the right value?

Yes, sorry, you have to add one more set of adds below and move one so
you can make it have a different value.

You get the general idea though :)
Reassoc knows they are all only used in each other, and that it is
okay to change their intermediate values as long as the last thing in
the chain retains its value (which it does since they are all
associative and commutative operations).
>
> Andrew
>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 16:12                                                               ` Daniel Berlin
  2007-12-19 16:36                                                                 ` Andrew MacLeod
@ 2007-12-19 20:00                                                                 ` Andrew MacLeod
  2007-12-19 20:57                                                                   ` Daniel Berlin
  2007-12-19 20:07                                                                 ` Alexandre Oliva
  2 siblings, 1 reply; 150+ messages in thread
From: Andrew MacLeod @ 2007-12-19 20:00 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc


> It gets worse, however
>
> c_3 = a_1 + b_2
> z_5 = c_3 + d_9
> x_4 = z_5 + e_10
> DEBUG(x, x_4)
> y_7 = x_4 + f_11
> z_8 =  y_7 + g_12
> ->
>
> c_3 = a_1 + b_2
> z_5 = c_3 + g_12
> x_4 = z_5 + e_10
> DEBUG(x, x_4)
> y_7 = x_4 + f_11
> z_8 = y_7 + d_9
>
>
> x_4 now no longer represents the value of x, but we haven't directly
> changed x_4, its immediate users, or the statements that immediately
> make up its defining values.
>
>   

This does seem more troublesome. Reassociation shuffles things around 
without changing the LHS presumably because it has looked at the uses 
and knows there are no uses outside the expression, so it can manipulate 
them however it wants. It elects not to create new temps since it knows 
the old ones aren't being used elsewhere, so why waste new entries.

So if it was aware of the debug stmt, there would be a use of x_4 
outside the expression, and it would no longer do the same reassociation.

Is that the gist of it?

Andrew

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 20:00                                                                 ` Andrew MacLeod
@ 2007-12-19 20:57                                                                   ` Daniel Berlin
  0 siblings, 0 replies; 150+ messages in thread
From: Daniel Berlin @ 2007-12-19 20:57 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Alexandre Oliva, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On 12/19/07, Andrew MacLeod <amacleod@redhat.com> wrote:
>
> > It gets worse, however
> >
> > c_3 = a_1 + b_2
> > z_5 = c_3 + d_9
> > x_4 = z_5 + e_10
> > DEBUG(x, x_4)
> > y_7 = x_4 + f_11
> > z_8 =  y_7 + g_12
> > ->
> >
> > c_3 = a_1 + b_2
> > z_5 = c_3 + g_12
> > x_4 = z_5 + e_10
> > DEBUG(x, x_4)
> > y_7 = x_4 + f_11
> > z_8 = y_7 + d_9
> >
> >
> > x_4 now no longer represents the value of x, but we haven't directly
> > changed x_4, its immediate users, or the statements that immediately
> > make up its defining values.
> >
> >
>
> This does seem more troublesome. Reassociation shuffles things around
> without changing the LHS presumably because it has looked at the uses
> and knows there are no uses outside the expression, so it can manipulate
> them however it wants. It elects not to create new temps since it knows
> the old ones aren't being used elsewhere, so why waste new entries.

Yes.

>
> So if it was aware of the debug stmt, there would be a use of x_4
> outside the expression, and it would no longer do the same reassociation.

Either that, or you would have to hunt all the uses of every single
thing in the chain to see if any were debug expressions, and if the
value is going to change.

>
> Is that the gist of it?
Yes

>
> Andrew
>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 16:12                                                               ` Daniel Berlin
  2007-12-19 16:36                                                                 ` Andrew MacLeod
  2007-12-19 20:00                                                                 ` Andrew MacLeod
@ 2007-12-19 20:07                                                                 ` Alexandre Oliva
  2007-12-19 22:00                                                                   ` Daniel Berlin
  2 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-19 20:07 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> Here is the easy one:

> z_5 = a_3 + b_3
> x_4 = z_5 + c_3

> DEBUG(x, x_4)


> Reassoc may transform this into:


> z_5 = c_3 + b_3
> x_4 = z_5 + a_3

> DEBUG(x, x_4)

> Now x has the wrong value.

As Andrew said, no, it doesn't.

Now, if z_5 were present in a debug expression, then it would need
adjusting.  No different from the adjusting need for any other
instruction in which z_5 was present, though.  That's what I mean when
I talk about letting the optimizers do their job on debug instructions
too.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 20:07                                                                 ` Alexandre Oliva
@ 2007-12-19 22:00                                                                   ` Daniel Berlin
  2007-12-20  9:26                                                                     ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Daniel Berlin @ 2007-12-19 22:00 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/19/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
>
> > Here is the easy one:
>
> > z_5 = a_3 + b_3
> > x_4 = z_5 + c_3
>
> > DEBUG(x, x_4)
>
>
> > Reassoc may transform this into:
>
>
> > z_5 = c_3 + b_3
> > x_4 = z_5 + a_3
>
> > DEBUG(x, x_4)
>
> > Now x has the wrong value.
>
> As Andrew said, no, it doesn't.
>
Yes, I corrected it later.
You didn't address the other one, which is much harder and does
require addressing by you.


> Now, if z_5 were present in a debug expression, then it would need
> adjusting.  No different from the adjusting need for any other
> instruction in which z_5 was present, though.
uh, but if you don't adjust in the fixed examples, DEBUG(x, x_4) will
give an invalid value.

You can cause this value to change without ever changing x_4, and
do so legally.
How do i know i need to change this DEBUG expression?

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19 22:00                                                                   ` Daniel Berlin
@ 2007-12-20  9:26                                                                     ` Alexandre Oliva
  2007-12-20 17:04                                                                       ` Ian Lance Taylor
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-20  9:26 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

>> Now, if z_5 were present in a debug expression, then it would need
>> adjusting.  No different from the adjusting need for any other
>> instruction in which z_5 was present, though.

> uh, but if you don't adjust in the fixed examples, DEBUG(x, x_4) will
> give an invalid value.

My point was that optimizers already had to know how to adjust things
such that it doesn't break code.

Now, in this optimization, it takes additional liberties with existing
variables because it sees they're only used within the sequence.
IMHO, it would be more appropriate to introduce alternate temporaries,
rather than reusing SSA names for different purposes, in this case.
If this approach was taken, the debug annotations referring to a
no-longer-defined SSA name would be recognized as invalid, and the
variable binding would be removed (i.e., turned into a "value unknown"
annotation).  Or, if we left the definitions in place, even though
they're dead, the same code that cleans up undefined SSA names could
recognize these SSA names as unused except in debug information and
substitute them for their values, maintaining accurate and complete
debug information.

But can we do better without introducing more SSA names and keeping
assignments around that are known to be dead?  Yes, with some
additional effort, see below.

> How do i know i need to change this DEBUG expression?

As reassoc looks for sets of variables it can freely mess with, it
should take note of variables that are used in debug annotations in
addition to the kind of single (?) non-debug uses it's interested in,
such that, when it modifies these variables, the annotations can be
compensated for.

OTOH, if the compiler performs reassoc on user variables today, it
means we do get mangled debug information for such variables already,
and they get incorrect values.  So, even if we didn't address this
problem right away, it wouldn't be much of a regression.

But, of course, not dealing with it breaks the goal of having correct
debug information, so it ought to be dealt with properly.

Do you happen to have a yummy testcase handy that I could use to
trigger this kind of transformation in ways that affect the value of
user variables?

Thanks in advance,

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20  9:26                                                                     ` Alexandre Oliva
@ 2007-12-20 17:04                                                                       ` Ian Lance Taylor
  2007-12-20 20:53                                                                         ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Ian Lance Taylor @ 2007-12-20 17:04 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Diego Novillo, Mark Mitchell, Robert Dewar,
	Richard Guenther, gcc-patches, gcc

Alexandre Oliva <aoliva@redhat.com> writes:

> > How do i know i need to change this DEBUG expression?
> 
> As reassoc looks for sets of variables it can freely mess with, it
> should take note of variables that are used in debug annotations in
> addition to the kind of single (?) non-debug uses it's interested in,
> such that, when it modifies these variables, the annotations can be
> compensated for.

The question is how it finds them efficiently, without doing a scan of
all instructions.

Ian

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-20 17:04                                                                       ` Ian Lance Taylor
@ 2007-12-20 20:53                                                                         ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-20 20:53 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Daniel Berlin, Diego Novillo, Mark Mitchell, Robert Dewar,
	Richard Guenther, gcc-patches, gcc

On Dec 20, 2007, Ian Lance Taylor <iant@google.com> wrote:

> Alexandre Oliva <aoliva@redhat.com> writes:
>> > How do i know i need to change this DEBUG expression?
>> 
>> As reassoc looks for sets of variables it can freely mess with, it
>> should take note of variables that are used in debug annotations in
>> addition to the kind of single (?) non-debug uses it's interested in,
>> such that, when it modifies these variables, the annotations can be
>> compensated for.

> The question is how it finds them efficiently, without doing a scan of
> all instructions.

It must keep track of variables it can mess with, so it might as well
take notes about those it has to be more careful about.

*Or* it can just introduce new temporaries, rename the uses and leave
the original sets behind for "garbage collection" AKA dead code
elimination, like I said.

One is more implementation work, the other is potentially more
wasteful in terms of memory use.  Neither looks particularly hard to me.
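
A rough sketch of the second option, in plain Python rather than GCC code. The temporaries t_20, t_21 and t_22 are hypothetical names introduced only for this illustration, and the input values are arbitrary. The original definitions stay in place, used only by the debug annotation, while the reassociated chain is written to fresh names.

```python
# Illustrative only: reassociation that writes to fresh temporaries and
# leaves the original definitions behind for later dead code elimination.

def reassoc_with_fresh_temps(a_1, b_2, d_9, e_10, f_11, g_12):
    # Original definitions, untouched; after the rewrite below they are
    # referenced only by the debug annotation.
    c_3 = a_1 + b_2
    z_5 = c_3 + d_9
    x_4 = z_5 + e_10          # DEBUG(x, x_4) still sees the source value

    # Reassociated chain on fresh (hypothetical) temporaries.
    t_20 = c_3 + g_12
    t_21 = t_20 + e_10
    t_22 = t_21 + f_11
    z_8 = t_22 + d_9
    return x_4, z_8

x_for_debugger, z_final = reassoc_with_fresh_temps(1, 2, 10, 3, 4, 100)
print(x_for_debugger, z_final)
```

The cost is the extra names and the dead statements that have to be cleaned up afterwards, which is the memory trade-off mentioned above.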

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-19  8:39                                                             ` Daniel Berlin
  2007-12-19 16:12                                                               ` Daniel Berlin
@ 2007-12-19 20:27                                                               ` Alexandre Oliva
  1 sibling, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-19 20:27 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> On 12/19/07, Alexandre Oliva <aoliva@redhat.com> wrote:
>> On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
>> 
>> > Consider PRE alone,
>> 
>> > If your debug statement strategy is "move debug statements when we
>> > insert code that is equivalent"
>> 
>> Move?  Debug statements don't move, in general.  I'm not sure what you
>> have in mind, but I sense some disconnect here.

> Okay, so if you aren't going to move them, you have to erase them when
> you move statements around.

Why?  They still represent the point of binding between user variable
and value.

> How were you going to generate the initial set of debug annotations?

It's in the document: after each assignment to user variable, and at
PHI nodes for user variables.  The debug statement means the variable
holds that value from that point on until conflicting information
arises (i.e., another debug statement for the same variable, or a
control flow merge with different values for the same variable)
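
As a rough sketch of that generation rule (plain Python, not the actual implementation): statements are modelled as (lhs, rhs) pairs, an SSA name is assumed to be written as `var_N`, and PHI nodes are left out for brevity. All of these modelling choices are assumptions made for illustration.

```python
# Illustrative only: emit a debug annotation after each assignment whose
# left-hand side is an SSA version of a user variable.

def annotate(stmts, user_vars):
    out = []
    for lhs, rhs in stmts:
        out.append(f"{lhs} = {rhs};")
        base = lhs.rsplit("_", 1)[0]      # "x_3" -> "x"
        if base in user_vars:
            out.append(f"# DEBUG {base} => {lhs}")
    return out

annotated = annotate(
    [("x_3", "a_1 + b_2"), ("T_7", "x_3 * 2"), ("y_4", "T_7")],
    user_vars={"x", "y"},
)
print("\n".join(annotated))
```

The compiler temporary T_7 gets no annotation, since it does not correspond to any source-level variable.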

> How were you going to update it if you saw a statement was updated to
> say x_5 = x_4 instead of x_5 = x_3 + x_2.

No update needed, if x_5 is the value of interest.  I'm not sure
that's what you're asking, though.

> So then how will using your debug annotations and updating them come
> out any different than say performing a value numbering pass where you
> also associate user variables with the ssa names (IE alongside our
> value numbers), and propagate them around as well?

First, debug annotations may be at different points than the
corresponding SSA definitions, because the same SSA definition may be
bound to different variables at different ranges.

Second, debug annotations may contain more complex expressions than a
single SSA name, and there may not be any SSA name that represents the
value of these expressions left.  For example, given:

  x_3 = a_1 + b_2;
  # DEBUG x => x_3
  foo();

if we find that x_3 is unused elsewhere, we can drop it without
discarding debug information about the value of x at that point

  # DEBUG x => a_1 + b_2
  foo();

such that, if we stop at the call and print x, we get the expected
value, even though the actual variable was optimized away.
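
A minimal sketch of that substitution, in plain Python rather than GCC code. Statements are modelled as tuples, which is an assumption of this illustration, and it also assumes the operands of the dropped definition are still available and unchanged at the debug statement.

```python
# Illustrative only: drop a dead SSA definition and substitute its
# right-hand side into debug bindings that referred to it.

def drop_dead_def(stmts, name):
    rhs = next(s[2] for s in stmts if s[0] == "assign" and s[1] == name)
    out = []
    for s in stmts:
        if s[0] == "assign" and s[1] == name:
            continue                       # the dead definition goes away
        if s[0] == "debug" and s[2] == name:
            s = ("debug", s[1], rhs)       # keep the binding, new expression
        out.append(s)
    return out

stmts = [
    ("assign", "x_3", "a_1 + b_2"),
    ("debug", "x", "x_3"),
    ("call", "foo"),
]
result = drop_dead_def(stmts, "x_3")
print(result)
```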

> At the end, you could emit DEBUG(user var, ssa name) right after each
> SSA_NAME_DEF_STMT for all user vars in the user var set for ssa name.

This doesn't work.  Consider:

  a_2 = whatever1;
  b_4 = whatever2;

  x_1 = a_2;
  probe();

  if (condition) {
    probe();
    x_3 = b_4;
    probe();
  }

  x_5 = PHI <x_1(!condition), x_3(condition)>;
  probe();

Now, if you optimize it and apply the debug stmt generation
technique you suggested, this is what you get:

  T_2 = whatever1;
  # DEBUG a => T_2
  # DEBUG x => T_2
  T_4 = whatever2;
  # DEBUG b => T_4
  # DEBUG x => T_4

  probe();

  if (condition) {
    probe();
    probe();
  }

  T_5 = PHI <T_2(!condition), T_4(condition)>
  # DEBUG x => T_5
  probe();

What do you get if you print x at each of the probe points?
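
One way to work through the question, as a sketch in plain Python rather than GCC code. It follows the path on which `condition` is true and assumes a debugger reports the most recent binding of x it has seen; the event encoding is made up for this illustration.

```python
# Illustrative only: what a debugger would print for x at each probe(),
# given a straight-line trace of bindings and probes.

def values_at_probes(trace):
    current = None
    seen = []
    for kind, value in trace:
        if kind == "bind_x":
            current = value
        elif kind == "probe":
            seen.append(current)
    return seen

P = ("probe", None)

# Source semantics, condition true: x = a (whatever1), later x = b (whatever2).
expected = values_at_probes([
    ("bind_x", "whatever1"), P,            # after x_1 = a_2
    P,                                     # first probe inside the if
    ("bind_x", "whatever2"), P,            # after x_3 = b_4
    ("bind_x", "whatever2"), P,            # after the PHI
])

# Annotations emitted right after each SSA definition, as suggested.
naive = values_at_probes([
    ("bind_x", "whatever1"),               # DEBUG x => T_2
    ("bind_x", "whatever2"),               # DEBUG x => T_4
    P, P, P,
    ("bind_x", "whatever2"), P,            # DEBUG x => T_5
])

print(expected)
print(naive)
```

On this path the first two probes report the value of b rather than a, because both bindings were emitted before any probe was reached.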

> I don't see why you believe user variables/bindings are special and
> can't be propagated in this manner,

It's not that I don't believe it, it's just that just being able to
propagate them is not enough.  We must also take the binding point
into account.

Now, as I wrote to Ian last night, if we just add a binding point
annotation to this mix, then we have sufficient information:

  T_2 = whatever1;
  # DEBUG a => T_2 here
  # DEBUG x => T_2 at P1
  T_4 = whatever2;
  # DEBUG b => T_4 here
  # DEBUG x => T_4 at P2

  probe();
  # DEBUG point P1

  if (condition) {
    probe();
    # DEBUG point P2
    probe();
  }

  T_5 = PHI <T_2(!condition), T_4(condition)>
  # DEBUG x => T_5
  probe();

I still don't see how, in this notation, we'd represent something like
"at this point, the value of this user variable is unknown".  Any
ideas?

Also, this strategy works for the nice and well-behaved Tree SSA
optimization passes.  For RTL, which is far less abstract, especially
after register allocation, I don't see that we can rely on such a
simple strategy.  But, in a way, I hope I'm wrong ;-)

>> > #3 is a dataflow problem, and not something you want to do every time
>> > you insert a call.

>> I'm not sure what you mean by "inserting calls".  We don't do that.

> Sure we do.
> We will definitely insert new calls when we PRE const/pure calls, or
> calls we determine to be movable to the point we want to move them

I think of that as moving, rather than inserting.  That said, I still
don't quite see what you're getting at.  Calls don't mess with gimple
registers of their callers, ever, so it appears to me that inserting a
call in the tree level is a NOP in terms of debug information
annotations.

> I'm not sure why you believe all the calls that we end up with in the
> IR are actually in the source (or even implied by it).

Conceptually, they are, kind-a sort of :-)  Except perhaps for
profiling calls, that are meant to be fully transparent anyway.
Others are more akin to inlining, or using a call for convenience
rather than expanding a copy or something to that effect.

>> But I'm not computing that in trees.  I'm just collecting and
>> maintaining data points for var-tracking, all the way from the tree
>> level.

> Okay, then for trees,  why bother tracking it when you can compute it
> right before translation with the same accuracy you can if you update
> it every time you make statement changes?

Just because we still haven't found a reliable way to do so that
doesn't drop essential information for correct debug info.  If we do,
I'll be delighted to immediately drop the proposed debug annotations
in the tree level.  And in the RTL level as well.

>> And debug information is not just about the values, it's about
>> mapping variables to values and locations.

> You have no locations at the tree level,

?!?  Locations as in point of execution, rather than DWARF locations,
is what I mean.

> and i've explicitly said what
> i said applies to the tree level :)

Indeed ;-)

>> So, we can't infer all the
>> information we need.

> Again, i believe we can at the tree level.

Good, let's keep on it.  How about you use something like the example
above to explain how to accomplish it?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18  9:10                                                       ` Alexandre Oliva
  2007-12-18 13:20                                                         ` Diego Novillo
  2007-12-18 22:43                                                         ` Daniel Berlin
@ 2007-12-18 23:35                                                         ` Daniel Berlin
  2007-12-19  5:50                                                           ` Alexandre Oliva
  2 siblings, 1 reply; 150+ messages in thread
From: Daniel Berlin @ 2007-12-18 23:35 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

>
> It is desirable to be able to represent constants and other
> optimized-away values, rather than stating variables have values they
> can no longer have:
>
> int
> x1 (int x)
> {
>   int i;
>
>   i = 2;
>   f(i);
>   i = x;
>   h();
>   i = 7;
>   g(i);
> }
>
> Even if variable i is completely optimized away, a debugger can still
> print the correct values for i if we keep annotations such as:

>
>   (debug (var_location i (const_int 2)))
>   (set (reg arg0) (const_int 2))
>   (call (mem (symbol_ref f)))
>   (debug (var_location i unknown))
>   (call (mem (symbol_ref h)))
>   (debug (var_location i (const_int 7)))
>   (set (reg arg0) (const_int 7))
>   (call (mem (symbol_ref g)))
>
> In this case, before the call to h, not only the assignment to i was
> dead, but also the value of the incoming argument x had already been
> clobbered.  If i had been assigned to another constant instead, debug
> information could easily represent this.
>
> Another example that covers PHI nodes and conditionals:
>
> int
> x2 (int x, int y, int z)
> {
>   int c = z;
>   whatever0(c);
>   c = x;
>   whatever1();
>   if (some_condition)
>     {
>       whatever2();
>       c = y;
>       whatever3();
>     }
>   whatever4(c);
> }
>
> With SSA infrastructure, this program can be optimized to:
>
> int
> x2 (int x, int y, int z)
> {
>   int c;
>   # bb 1
>   whatever0(z_0(D));
>   whatever1();
>   if (some_condition)
>     {
>       # bb 2
>       whatever2();
>       whatever3();
>     }
>   # bb 3
>   # c_1 = PHI <x_2(D)(1), y_3(D)(2)>;
>   whatever4(c_1);
> }
>
> Note how, without debug annotations, c is only initialized just before
> the call to whatever4.  At all other points, the value of c would be
> unavailable to the debugger, possibly even wrong.
>
> If we were to annotate the SSA definitions forward-propagated into c
> versions as applying to c, we'd end up with all of x_2, y_3 and z_0

If you forward propagate any annotations, ever,
> applied to c throughout the entire function, in the absence of
> additional markers.
>
> Now, with the annotations proposed in this paper, what is initially:
>
> int
> x2 (int x, int y, int z)
> {
>   int c;
>   # bb 1
>   c_4 = z_0(D);
>   # DEBUG c z_0(D)
>   whatever0(z_0(D));
>   # DEBUG c x_2(D)
>   whatever1();

> and then, at every one of the inspection points, we get the correct
> value for variable c.
Because you have added information you have no way of knowing.
How exactly did you compute that the call *definitely sets c to the
value of z_0*, and definitely sets the value of c to x_2?

This must be "may-information", because we don't know what the call does.

Ignoring this (the solution is to not assume anything at calls,
because you run the risk of getting the wrong answer at meet points
later on!) your scheme is sufficient to get correct values, but not
correct locations.

However, value equivalence does not imply location equivalence, and all
of our debug formats deal with locations of variables, except for
constants.

IE If you translate this directly into DWARF3, as written, you will
claim that c and x_4 have the same location (since DWARF does not let
you say "it has the same value as x, but not the same location"), and
thus incorrectly represent that p *x_4=5 modifies c if i were to do it
in the debugger.  Because of the may-problem, you will also claim the
same value/location for c and x_2, which you can't prove is right,
because you don't know what whatever1/2 actually does.

if all you want is the values you compute above, on SSA, you can
easily use a lattice to compute the same values you are going to
compute as you update the annotations on the fly.

(This is because it is a flow sensitive problem, and you want the flow
answers at each unique definition point, which SSA neatly provides,
except for calls, where you could hang it off the vops).

Tracking which values *definitely represent user values* is actually
quite easy at the tree level, and doesn't require any IR modification.

It may be worth doing at the RTL level, however, where the solution
requires making up program points at each definition site and
computing the dataflow problem in terms of them.
--Dan

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-18 23:35                                                         ` Daniel Berlin
@ 2007-12-19  5:50                                                           ` Alexandre Oliva
  2007-12-19 16:35                                                             ` Daniel Berlin
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-19  5:50 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

>> int c = z;
>> whatever0(c);
>> c = x;

> Because you have added information you have no way of knowing.
> How exactly did you compute that the call *definitely sets c to the
> value of z_0*, and definitely sets the value of c to x_2?

Err...  I guess you're thinking memory, global variables, alias
analysis and that sort of stuff.

None of this applies to gimple registers, which is all the annotations
are about.

Yes, aliasing, memory references and must- and may-alias do play a
role at the time of turning the annotations into equivalence classes,
when memory locations that are not stack slots allocated to gimple
regs that couldn't get hardware registers show up in the equivalence
classes.  These don't seem too hard to handle conservatively (removing
even may-alias assignment destinations from equivalence classes, as
well as non-local memory references at function calls and volatile
asms), at the expense of incompleteness in debug information, or in a
more lax way, at the potential expense of correctness.  I still don't
know exactly where to draw the line here, this note-propagation
algorithm is one that I haven't completely figured out yet.

> However, value equivalence does not imply location equivalence, and all
> of our debug formats deal with locations of variables, except for
> constants.

Dwarf enables arbitrary value expressions too.  There's some
discussion about lvalue vs rvalue in the document, and this is also
something that will take some experimenting.  I'm not entirely sure
where to draw the line, and I'm not entirely sure there is a perfect
answer.

For example, consider that a variable's home is a stack slot, but for
a loop in which it's not modified, it's held in a register.  Clearly
in this case the correct representation is for the variable to be in
both locations, both as lvalues.

But if the variable is further copied to other variables or locations,
these additional locations probably shouldn't be regarded as the same
variable any more; at most, as rvalues, but maybe not even that.

And then, if for some particular instruction, the variable in the
register needs to be copied to a different register class, then it is
correct to state that, between the copy and the use, the variable is
held in all three locations.

I'm still trying to figure out how to deal with overlaps between
variables, deciding whether locations are to be handled as lvalues or
rvalues, this sort of stuff.  It is indeed a difficult problem.

> IE If you translate this directly into DWARF3, as written, you will
> claim that c and x_4 have the same location (since dwarf does not let
> you say "it has the same value as x, but not the same location"),

Yeah.  The $1M question is, when two variables are coalesced into one,
does this mean we now have two variables sharing the same location, or
do we just use the rvalue of one (which?) for the other?  Isn't this
like talking about body and spirit of variables?  After optimization,
I'm not even sure that talking about location (body) of variables makes
much sense.

An important part of the design process was to distinguish between
source-level variables and implementation-level variables.  Our naming
of stack slots or pseudos as variables is just a mnemonic artifact for
us compiler engineers, to simplify debugging.  Which variables they
actually represent depends a lot on optimization decisions, perhaps
even more than on the original code.

So I talk about binding a source-level variable to a value, rather
than to a location.  Then, we figure out the locations that hold the
value, what other variables do, how they overlap, maybe how they're
used, and then figure out which locations should be assigned to each
source variable.  Tricky.

The only certainty I have right now is that the annotations I've
proposed enable us to keep track of values.  Distributing locations in
equivalence classes to different user variables is an open problem,
and there are various possible solutions that could make sense, and
that would be arguably correct.

> if all you want is the values you compute above, on SSA, you can
> easily use a lattice to compute the same values you are going to
> compute as you update the annotations on the fly.

This sounds interesting, but I don't quite follow what you mean.  Can
you elaborate, maybe give some examples?

> Tracking which values *definitely represent user values* is actually
> quite easy at the tree level, and doesn't require any IR modification.

But is the binding of user variables to user values for specified
ranges part of this representation too?  I don't see that it is, and
this is the gap I'm trying to fill with the debug annotations.

> It may be worth doing at the RTL level, however, where the solution
> requires making up program points at each definition site and
> computing the dataflow problem in terms of them.

/me mumbles something about RTL-SSA, that Jeff Law started working on
before we took this turn into Tree-SSA.  I'm sort of having to
introduce some limited form of SSA in RTL to infer global equivalence
classes out of the annotations, in the RTL var-tracking pass.  Fun...
If only we had stuck to a single IR...  (No personal preference, I
like both, but I'd rather not have to duplicate work so as to deal
with both)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-19  5:50                                                           ` Alexandre Oliva
@ 2007-12-19 16:35                                                             ` Daniel Berlin
  2007-12-19 19:46                                                               ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Daniel Berlin @ 2007-12-19 16:35 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On 12/18/07, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:
>
> >> int c = z;
> >> whatever0(c);
> >> c = x;
>
> > Because you have added information you have no way of knowing.
> > How exactly did you compute that the call *definitely sets c to the
> > value of z_0*, and definitely sets the value of c to x_2.
>
> Err...  I guess you're thinking memory, global variables, alias
> analysis and that sort of stuff.
>

Yes, i mixed your examples up, i apologize.

> None of this applies to gimple registers, which is all the annotations
> are about.
>
>
> > However, value equivalence does not imply location equivalence, and all
> > of our debug formats deal with locations of variables, except for
> > constants.
>
> Dwarf enables arbitrary value expressions too.
Well, uh, no.

The only way to directly specify the value of a variable is for
constants. DW_AT_const_value does not allow location descriptions.

"An entry describing a variable or formal parameter whose value is
constant and not represented by an object in the address space of the
program, or an entry describing a named constant, does not have a
location attribute. Such entries have a DW_AT_const_value attribute,
whose value may be a string or any of the constant data or data block
forms, as appropriate for the representation of the variable's value.
The value of this attribute is the actual constant value of the
variable, represented as it would be on the target architecture."

There are no other provisions in DWARF for describing the value of a
variable; it is expected you describe their locations using
DW_AT_location (which gives you the full power of location
descriptions, but requires you to return a location, not a value).
> There's some
> discussion about lvalue vs rvalue in the document, and this is also
> something that will take some experimenting.  I'm not entirely sure
> where to draw the line, and I'm not entirely sure there is a perfect
> answer.
I'm still curious where you think it describes value expressions for
variables other than constants (which again, can't use the location
description language)

Again, i'd support such an extension, but it does not currently exist.
Rest answers in other message.


* Re: Designs for better debug info in GCC
  2007-12-19 16:35                                                             ` Daniel Berlin
@ 2007-12-19 19:46                                                               ` Alexandre Oliva
  2007-12-19 20:39                                                                 ` Daniel Jacobowitz
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-19 19:46 UTC (permalink / raw)
  To: Daniel Berlin
  Cc: Diego Novillo, Mark Mitchell, Robert Dewar, Ian Lance Taylor,
	Richard Guenther, gcc-patches, gcc

On Dec 19, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> On 12/18/07, Alexandre Oliva <aoliva@redhat.com> wrote:

>> Dwarf enables arbitrary value expressions too.
> Well, uh, no.

> The only way to directly specify the value of a variable is for
> constants. DW_AT_const_value does not allow location descriptions.

DW_AT_const_value is irrelevant for location lists.  It's DW_OP_* that
I'm talking about.

That said...  I can't find any more the equivalent of
DW_CFA_val_expression in DW_OP_*s that could be used in location
expressions.  I just *knew* it was there, but I guess I just imagined
it.  This is embarrassing.

At this point, there are three options available:

- go back to the drawing board

- discard altogether expressions that don't represent lvalues (maybe
  don't even keep track of them)

- introduce a DWARF extension that enables value expressions to be
  used in location lists (say DW_OP_value, DW_OP_temp_location, or
  even DW_OP_self_location (*))

(*) maps value to a virtual location that, if dereferenced, evaluates
to the value.  Could be "easily" implemented through a virtual
out-of-range base address, plus the offset that represents the value
on dereference, but there are many other ways to implement this in
debug information consumers.
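
One possible consumer-side reading of (*), as a minimal sketch only: it
assumes 32-bit target addresses held in a 64-bit consumer word, so that
anything at or above a made-up VIRTUAL_BASE cannot be a real address.
The helper names and the tiny target_memory array are hypothetical and
belong to no existing debugger.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the target has 32-bit addresses, so any 64-bit address
   at or above VIRTUAL_BASE is out of range and can encode a value.  */
#define VIRTUAL_BASE UINT64_C(0x100000000)

/* Map a value to a virtual location (base address plus value).  */
static uint64_t self_location (uint32_t value)
{
  return VIRTUAL_BASE + value;
}

static int is_virtual (uint64_t addr)
{
  return addr >= VIRTUAL_BASE;
}

/* Stand-in for target memory: four 32-bit words at addresses 0..15.  */
static uint32_t target_memory[4] = { 10, 20, 30, 40 };

/* Dereference as the debug info consumer would: a virtual location
   evaluates to the value it encodes, a real one reads memory.  */
static uint32_t deref (uint64_t addr)
{
  if (is_virtual (addr))
    return (uint32_t) (addr - VIRTUAL_BASE);
  assert (addr % 4 == 0 && addr / 4 < 4);
  return target_memory[addr / 4];
}
```

With this encoding a location expression still yields an address, so
the rest of the consumer needs no new concept; only the dereference
step has to recognize the out-of-range base.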

> I'm still curious where you think it describes value expressions for
> variables other than constants

Me too :-)  :-(

Thanks for drawing my attention to this incorrect assumption I made
about DWARF location lists.

> i'd support such an extension

Cool.  Do you happen to know the procedure to propose DWARF standard
extensions?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-12-19 19:46                                                               ` Alexandre Oliva
@ 2007-12-19 20:39                                                                 ` Daniel Jacobowitz
  0 siblings, 0 replies; 150+ messages in thread
From: Daniel Jacobowitz @ 2007-12-19 20:39 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Daniel Berlin, Diego Novillo, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Wed, Dec 19, 2007 at 05:02:52PM -0200, Alexandre Oliva wrote:
> That said...  I can't find any more the equivalent of
> DW_CFA_val_expression in DW_OP_*s that could be used in location
> expressions.  I just *knew* it was there, but I guess I just imagined
> it.  This is embarrassing.

I am pretty sure such an extension has already been proposed.  Might
want to check with the committee (see dwarf.org).

-- 
Daniel Jacobowitz
CodeSourcery


* Re: Designs for better debug info in GCC
  2007-12-17 20:34                                             ` Alexandre Oliva
  2007-12-17 20:45                                               ` Diego Novillo
@ 2007-12-31 15:40                                               ` Richard Guenther
  1 sibling, 0 replies; 150+ messages in thread
From: Richard Guenther @ 2007-12-31 15:40 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Daniel Berlin, Mark Mitchell, Robert Dewar,
	Ian Lance Taylor, gcc-patches, gcc, Michael Matz

On Dec 17, 2007 9:28 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 17, 2007, Diego Novillo <dnovillo@google.com> wrote:
>
> > On 12/17/07 12:51, Alexandre Oliva wrote:
> >> I guess I'm to blame, for having naïvely put the code out without as
> >> much as a design and goals document
>
> > Yes, you are.
>
> Wow, thanks.  At least we agree on something! ;-)
>
> > You need to provide such a document now.
>
> Can't I instead provide it when it's ready?
>
> You know, it wasn't me who asked to have the thing developed in the
> open.  I didn't push it out just so that people who didn't want to
> understand it could beat on it before it was ready to defend itself.
> I put it out because there was an offer for contribution.

Yeah - that was me...

Fact is we had a discussion about debug information earlier this year
from which I took the conclusion that most people would appreciate an
on-the-side representation to address the most limiting design issue of
GCC's tree representation (only one variable per SSA_NAME to track).

So I had the impression you worked in that direction and offered help.
Now, you seem to have come to the conclusion that this approach would
not help your goal and started on a different route.  Now the "mistake"
maybe was not to revive the former discussion, based on your findings,
and elaborate on your goals before starting this.  (I realize this is
the way development for GCC works most of the time, but this is not
what I consider good practice for open source development)

Now - I think your goal is valid, and the choice of implementation might even be
the best one for it.  But we (the GCC community) have not yet decided if the
combination of "your goal" and "this best implementation" is what we want.
(I haven't decided myself either ;))

So my suggestion for you is to continue with your implementation and produce a
white paper about your design (which you ideally would present during the next
GCC summit, where we should do a discussion on this topic in some form).

We (myself and Matz) will continue to implement what is "our goal"
(because we internally committed to it, and to see limitations or
problems with the approach) and possibly will also present its outcome
at the summit.

Thanks,
Richard.


* Re: Designs for better debug info in GCC
  2007-12-15 21:41                                 ` Alexandre Oliva
  2007-12-16  3:15                                   ` Daniel Berlin
@ 2007-12-16 21:42                                   ` Mark Mitchell
  1 sibling, 0 replies; 150+ messages in thread
From: Mark Mitchell @ 2007-12-16 21:42 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Diego Novillo, Robert Dewar, Ian Lance Taylor, Richard Guenther,
	gcc-patches, gcc

Alexandre Oliva wrote:

>> Yes, please.  I would very much like to see an abstract design
>> document on what you are trying to accomplish.
> 
> Other than the ones I've already posted, here's one:
> 
> http://dwarfstd.org/Dwarf3Std.php
> 
> Seriously.  There is a standard for this stuff. 

That's the specification for the encoding format.  I agree with you that
emitting incorrect debugging information, in the sense of declaring that
the location of a variable is in one place, even though its value is not
available in that place, is bad.  In -O0 code, I consider it a serious bug.

In -O2 code, I think it's still a bug, but with our current
infrastructure, we may have little choice: we either deny all knowledge
of the variable's location, or give one that's sometimes incorrect.
Which alternative is better depends on what you're trying to do with the
information; for interactive debugging, mostly-right is probably better
than nothing, whereas for some programmatic activities, the opposite may
be true.

If your goal is to avoid the information ever being wrong -- without
worrying about whether it is complete -- there is of course a trivial
solution: do not emit the information.  That is not a serious
suggestion, but it does provide a path to a serious suggestion, which I
gave earlier: conservatively emit location information you provide based
on what you can prove at the time you generate debugging information.
For example, if the value of "x" is in a register, and you cross a call
which might clobber that register value, then emit debugging information
that says that at that point the value is unavailable.  You could
probably do this kind of thing with relatively few changes to the GCC
internal representation; you would run a pass before debug-information
generation that attempted to prove dataflow properties about variables
and told you where values could reliably be found.
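
A minimal sketch of such a pass, under loud assumptions: the
instruction model, the rule that registers 0-3 are call-clobbered, and
all names below are hypothetical and not part of GCC.  It tracks a
single variable and records, after each instruction, the register that
provably holds its value, or "unavailable".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction model for one tracked variable.  */
enum insn_kind { INSN_SET_REG, INSN_CALL, INSN_OTHER };

struct insn {
  enum insn_kind kind;
  int reg;  /* INSN_SET_REG only: register receiving the variable's value */
};

#define REG_UNAVAILABLE (-1)

/* Assumed ABI for this sketch: registers 0..3 are call-clobbered.  */
static bool call_clobbered (int reg)
{
  return reg >= 0 && reg <= 3;
}

/* WHERE[i] receives the register that provably holds the variable
   after insn i, or REG_UNAVAILABLE when nothing can be proven.  */
static void track_variable (const struct insn *insns, size_t n, int *where)
{
  int cur = REG_UNAVAILABLE;

  assert (insns != NULL && where != NULL);
  for (size_t i = 0; i < n; i++)
    {
      switch (insns[i].kind)
        {
        case INSN_SET_REG:
          cur = insns[i].reg;
          break;
        case INSN_CALL:
          /* The call might clobber the register: deny all knowledge.  */
          if (cur != REG_UNAVAILABLE && call_clobbered (cur))
            cur = REG_UNAVAILABLE;
          break;
        case INSN_OTHER:
          break;
        }
      where[i] = cur;
    }
}
```

The result is never wrong by construction, but it is incomplete: after
a call the value may in fact still be in the register, and this pass
cannot tell.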

Your earlier messages, however, suggest that you are trying to do
something harder: emit information that is essentially both complete (in
the sense of providing as much information as possible about the
locations and values of variables) and correct (in the sense of never
giving incorrect information).  If you want to do that, you're going to
have to answer the harder questions, like "what line number corresponds
to this address?" and "what should the debugging information say that
the value of a variable is when it has been optimized away?"

If that's still your goal, then pointing at the DWARF3 specification
doesn't help.  Diego and I are asking you to confront these fundamental
questions about what information you want to provide and what the
correctness criteria are.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: Designs for better debug info in GCC
  2007-11-08 21:26                 ` Ian Lance Taylor
  2007-11-09  9:53                   ` Robert Dewar
@ 2007-11-09  9:55                   ` Seongbae Park (박성배, 朴成培)
  2007-11-09 11:08                     ` Robert Dewar
  1 sibling, 1 reply; 150+ messages in thread
From: Seongbae Park (박성배, 朴成培) @ 2007-11-09  9:55 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

I think both sides are talking past each other, partially because two
different goals are in mind.
IMHO, there are two extremes when it comes to so-called debugging of
optimized code.

One camp wants full debuggability (let's call them the debuggability
crowd), which means they want to know the value of any valid program
state anywhere, and want to set breakpoints anywhere and even be able
to change the program state anywhere, as if there were an assignment at
the point where the debugger stopped the program. This camp still wants
better performance (like everyone else), but they don't want to
sacrifice debuggability for performance, because they rely on it.

The other camp is the performance crowd: they want the absolute best
performance, but they still want as much debug information as possible.
Most people fall in this camp, and this is what gcc has implemented.
This camp doesn't want to change the code so that they can get better
debugging information.

Of course, the real world is somewhere in between, but in practice,
most people fall in the latter group (aka the performance crowd).
Alexandre's proposal would make it possible to make the debuggability
crowd happy, at some unknown compile-time/runtime cost and maintenance
cost.

Richard's proposal (from what I can understand) would make the
performance crowd happy, since it would be less costly to implement
than Alexandre's and would provide incrementally better debugging
information than we have now, but it doesn't seem that it would make
the debuggability crowd happy (or at least the extremists among the
debuggability crowd).

So I think the difference in opinion isn't so much about whether
Alexandre's proposal is good or bad, but rather about whether we aim to
make the debuggability crowd happy, or the performance crowd happy, or
both.
Ideally we should serve both groups of users, but there's a non-trivial
ongoing maintenance cost for having two different approaches.

So I'd like to ask both Alexandre and Richard whether they each can
satisfy the other camp: that is, Alexandre to come up with a way to
tweak his proposal so that it is possible to keep the compile-time cost
comparable to what it is right now, with similar or better debug
information and with reasonable maintenance cost, and Richard whether
his proposal can satisfy the debuggability crowd.
Of course, another possible opinion would be to ignore the
debuggability crowd on the grounds that they are not important or big.
I personally think it's a mistake to do so, but you may disagree on
that point.

Seongbae

On 08 Nov 2007 12:50:17 -0800, Ian Lance Taylor <iant@google.com> wrote:
> Alexandre Oliva <aoliva@redhat.com> writes:
>
> > So...  The compiler is outputting code that tells other tools where to
> > look for certain variables at run time, but it's putting incorrect
> > information there.  How can you possibly argue that this is not a code
> > correctness issue?
>
> I don't see any point to going around this point again, so I'll just
> note that I disagree.
>
>
> > >> >> > We've fixed many many bugs and misoptimizations over the years due to
> > >> >> > NOTEs.  I'm concerned that adding DEBUG_INSN in RTL repeats a mistake
> > >> >> > we've made in the past.
> > >> >>
> > >> >> That's a valid concern.  However, per this reasoning, we might as well
> > >> >> push every operand in our IL to separate representations, because
> > >> >> there have been so many bugs and misoptimizations over the years,
> > >> >> especially when the representation didn't make transformations
> > >> >> trivially correct.
> > >>
> > >> > Please don't use strawman arguments.
> > >>
> > >> It's not, really.  A reference to an object within a debug stmt or
> > >> insn is very much like any other operand, in that most optimizer
> > >> passes must keep them up to date.  If you argue for pushing them
> > >> outside the IL, why would any other operands be different?
> >
> > > I think you misread me.  I didn't argue for pushing debugging
> > > information outside the IL.  I argued against a specific
> > > implementation--DEBUG_INSN--based on our experience with similar
> > > implementations.
> >
> > Do you remember any other notes that contained actual rtx expressions
> > and expected optimization passes to keep them accurate?
>
> No.
>
> > Do you think
> > we'd gain anything by moving them to a separate, out-of-line
> > representation?
>
> I don't know.  I don't see such a proposal on the table, and I don't
> have one myself, so I don't know how to evaluate it.
>
> Ian
>

-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com"


* Re: Designs for better debug info in GCC
  2007-11-09  9:55                   ` Seongbae Park (박성배, 朴成培)
@ 2007-11-09 11:08                     ` Robert Dewar
  0 siblings, 0 replies; 150+ messages in thread
From: Robert Dewar @ 2007-11-09 11:08 UTC (permalink / raw)
  To: Seongbae Park (박성배, 朴成培)
  Cc: Ian Lance Taylor, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Seongbae Park (박성배, 朴成培) wrote:
> Most people fall in this camp, and this is what gcc has implemented.
> This camp doesn't want to change the code so that they can get better
> debugging information.

This is definitely not the case. At least among our users, very few fall
into this camp. But in any case I think we all agree that there should 
be a mode in which this is the emphasis.
> 
> Of course, the real world is somewhere in between, but in practice,
> most people fall in the latter group
> (aka performance crowd).

You must live in a strange world. After all, think about it: lots of
people find Java quite fine, even though it throws away a lot of
performance.

> Of course, another possible opinion would be to ignore the debuggability crowd
> on the ground that they are not important or big.

Actually I think big serious users with programs in the millions of
lines category are much more likely to be in the "debuggability" crowd.


* Re: Designs for better debug info in GCC
  2007-11-07 22:57         ` Ian Lance Taylor
                             ` (2 preceding siblings ...)
  2007-11-08  5:01           ` Alexandre Oliva
@ 2007-11-08  8:58           ` Paolo Bonzini
  3 siblings, 0 replies; 150+ messages in thread
From: Paolo Bonzini @ 2007-11-08  8:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc

> What standards are you talking about?  I'm not aware of any standard
> for debuggability of optimized code.

As a developer of gcc, it would be *invaluable* in debugging for example 
bootstrap comparison failures.  There I have to debug side-by-side the 
stage1 and the stage2 compiler, and no way I can compile the latter 
unoptimized...

As a user more than a developer of gcc these days, definitely yes.  I
often have programs that run for say 1 minute, and I *know* the bug 
comes up after 50 seconds.  It's already unnerving enough to debug 
programs like this (I often start ten gdbs at the same time, launch them 
to the magic point while I'm taking a coffee, and go back working!); and 
it's only worse if you're doing it on -O0 binaries that take 5 minutes 
to reach the point you're trying to debug.

Backward debugging would also be a possibility for me, much more 
productive than debuggability of optimized code, but since backward 
debugging is pie-in-the-sky, debuggability of optimized code is also good.

Paolo


* Re: Designs for better debug info in GCC (was: Re: [vta] don't let  debug insns get in the way of simple vect reduction)
  2007-11-07  7:52   ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Alexandre Oliva
  2007-11-07 16:16     ` Ian Lance Taylor
@ 2007-11-07 17:20     ` Michael Matz
  2007-11-07 18:45       ` Designs for better debug info in GCC Alexandre Oliva
  1 sibling, 1 reply; 150+ messages in thread
From: Michael Matz @ 2007-11-07 17:20 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Hi,

On Wed, 7 Nov 2007, Alexandre Oliva wrote:

> > With the different approach I and Matz started (and to which we didn't 
> > yet spend enough time to get debug information actually output - but I 
> > hope we'll get there soon), on the tree level the extra information is 
> > stored in a bitmap per SSA_NAME (where necessary).
> 
> This will fail on a very fundamental level.  Consider code such as:
> 
> f(int x, int y) {
>   int c;
>   /* other vars */
> 
>   c = x;
>   do_something_with(c, ...); // doesn't touch x or y
> 
>   c = y;
>   do_something_else_with(c, ...); // doesn't touch x or y
> }
> 
> where do_something_*with are actually complex computations, be that
> explicit code, be it macros or inlined functions.
> 
> This can (and should) be trivially optimized to:
> 
> f(int x, int y) {
>   /* other vars */
> 
>   do_something_with(x, ...); // doesn't touch x or y
> 
>   do_something_else_with(y, ...); // doesn't touch x or y
> }
> 
> But now, if I 'print c' in a debugger in the middle of one of the
> do_something_*with expansions, what do I get?
> 
> With the approach I'm implementing, you should get x and y at the
> appropriate points, even though variable c doesn't really exist any
> more.
> 
> With your approach, what will you get?

x and y at the appropriate part.  Whatever holds 'x' at a point (SSA name, 
pseudo or mem) will also mention that it holds 'c'.  At a later point 
whichever holds 'y' will also mention it holds 'c'.

> There isn't any assignment to x or y you could hook your notes to.

But there are _places_ for x and y.  Those places can and are also 
associated with c.

> Even if you were to set up side representations to model the additional 
> variables that end up mapped to the incoming arguments, you'd have 'c' 
> in both, and at the entry point.  How would you tell?

I don't understand the question.


Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-07 17:20     ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Michael Matz
@ 2007-11-07 18:45       ` Alexandre Oliva
  2007-11-08 10:23         ` Michael Matz
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-07 18:45 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  7, 2007, Michael Matz <matz@suse.de> wrote:

> On Wed, 7 Nov 2007, Alexandre Oliva wrote:

>> This will fail on a very fundamental level.  Consider code such as:
>> 
>> f(int x, int y) { int c; /* other vars */
>>  c = x; do_something_with(c, ...); // doesn't touch x or y
>>  c = y; do_something_else_with(c, ...); // doesn't touch x or y

>> This can (and should) be trivially optimized to:
>> 
>> f(int x, int y) { /* other vars */
>>  do_something_with(x, ...); // doesn't touch x or y
>>  do_something_else_with(y, ...); // doesn't touch x or y
>> 
>> But now, if I 'print c' in a debugger in the middle of one of the
>> do_something_*with expansions, what do I get?
>> 
>> With the approach I'm implementing, you should get x and y at the
>> appropriate points, even though variable c doesn't really exist any
>> more.
>> 
>> With your approach, what will you get?

> x and y at the appropriate part.  Whatever holds 'x' at a point (SSA name, 
> pseudo or mem) will also mention that it holds 'c'.  At a later point 
> whichever holds 'y' will also mention it holds 'c'.

I.e., there will be two parallel locations throughout the entire
function that hold the value of 'c'.  Something like:

f(int x /* but also c */, int y /* but also c */) { /* other vars */
 do_something_with(x, ...); // doesn't touch x or y
 do_something_else_with(y, ...); // doesn't touch x or y

Now, what will you get if you 'print c' in the debugger (or if any
other debug info evaluator needs to tell what the value of user
variable c is) at a point within do_something_with(c,...) or
do_something_else_with(c)?


Now consider that f is inlined into the following code:

int g(point2d p) {
  /* lots of code */
  f(p.x, p.y);
  /* more code */
  f(p.y, p.x);
  /* even more code */
}

g gets fully scalarized, so, before inlining, we have:

int g(point2d p) {
  int p$x = p.x, p$y = p.y;
  /* lots of code */
  f(p$x, p$y);
  /* more code */
  f(p$y, p$x);
  /* even more code */
}

after inlining of f, we end up with:

int g(point2d p) {
  int p$x = p.x, p$y = p.y;
  /* lots of code */
  { int f()::x.1 /* but also f()::c.1 */ = p$x, f()::y.1 /* but also f()::c.1 */ = p$y;
    { /* other vars */
      do_something_with(f()::x.1, ...); // doesn't touch x or y
      do_something_else_with(f()::y.1, ...); // doesn't touch x or y
  } }
  /* more code */
  { int f()::x.2 /* but also f()::c.2 */ = p$x, f()::y.2 /* but also f()::c.2 */ = p$y;
    { /* other vars */
      do_something_with(f()::x.2, ...); // doesn't touch x or y
      do_something_else_with(f()::y.2, ...); // doesn't touch x or y
  } }
  /* even more code */
}

then, we further optimize g and get:

int g(point2d p) {
  int p$x /* but also f()::x.1, f()::c.1, f()::y.2, f()::c.2 */ = p.x;
  int p$y /* but also f()::y.1, f()::c.1, f()::x.2, f()::c.2 */ = p.y;
  /* lots of code */
  { { /* other vars */
      do_something_with(p$x, ...); // doesn't touch x or y
      do_something_else_with(p$y, ...); // doesn't touch x or y
  } }
  /* more code */
  { { /* other vars */
      do_something_with(p$y, ...); // doesn't touch x or y
      do_something_else_with(p$x, ...); // doesn't touch x or y
  } }
  /* even more code */
}

and now, if you try to resolve the variable name 'c' to a location or
a value within any of the occurrences of do_something_*with(), what do
you get?  What ranges do you generate for each of the variables
involved?

>> There isn't any assignment to x or y you could hook your notes to.

> But there are _places_ for x and y.  Those places can and are also 
> associated with c.

This just goes to show that there's a fundamental mistake in the
mapping.  Instead of mapping user-level concepts to implementation
concepts, which is what debug information is meant to do, you're
mapping implementation details to user-level concepts.

Unfortunately, this mapping is not biunivocal.  The chosen
representation is fundamentally lossy.  It can't possibly get you
accurate debug information.  And the above is just an initial example
of the loss of information that will lead to *incorrect* debug
information, which is far worse than *incomplete* information.

>> Even if you were to set up side representations to model the additional 
>> variables that end up mapped to the incoming arguments, you'd have 'c' 
>> in both, and at the entry point.  How would you tell?

> I don't understand the question.

See the discussion about resolving 'c' above.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-07 18:45       ` Designs for better debug info in GCC Alexandre Oliva
@ 2007-11-08 10:23         ` Michael Matz
  2007-11-08 14:02           ` Robert Dewar
  2007-11-08 16:32           ` Alexandre Oliva
  0 siblings, 2 replies; 150+ messages in thread
From: Michael Matz @ 2007-11-08 10:23 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Guenther, gcc-patches, gcc

Hi,

On Wed, 7 Nov 2007, Alexandre Oliva wrote:

> > x and y at the appropriate part.  Whatever holds 'x' at a point (SSA 
> > name, pseudo or mem) will also mention that it holds 'c'.  At a later 
> > point whichever holds 'y' will also mention it holds 'c'.
> 
> I.e., there will be two parallel locations throughout the entire 
> function that hold the value of 'c'.

No.  For some PC locations the location of 'c' will happen to be the same 
as the one holding 'x', and for a different set of PC locations it will be 
the one also holding 'y'.  The request "what's in 'c'" from a debugger 
only makes sense when done from a certain program counter.  Depending on 
that the location of 'c' will be different.  In the case from above both 
locations might exist in parallel throughout the entire function, but they 
don't hold 'c' in parallel.
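
As an illustration of that PC dependence, here is a minimal sketch in C
of a lookup over a location list for 'c'.  The type and function names
(loc_range, lookup_location), the place names and the PC values are made
up for this example; they are not GCC or DWARF identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one location-list entry: for PCs in
   [lo, hi) the variable lives in the named place.  */
struct loc_range {
  unsigned long lo, hi;
  const char *place;
};

/* Location list for 'c': it shares the place of 'x' in one PC range
   and the place of 'y' in a later, disjoint range.  */
const struct loc_range c_locs[] = {
  { 0x100, 0x140, "place of x" },
  { 0x140, 0x180, "place of y" },
};

/* Return the place holding the variable at PC, or NULL if the
   variable has no location there ("optimized out").  */
const char *
lookup_location (const struct loc_range *locs, size_t n, unsigned long pc)
{
  for (size_t i = 0; i < n; i++)
    if (pc >= locs[i].lo && pc < locs[i].hi)
      return locs[i].place;
  return NULL;
}

/* The answer to "where is 'c'?" depends on the PC asked about.  */
void
demo_lookup (void)
{
  assert (strcmp (lookup_location (c_locs, 2, 0x120), "place of x") == 0);
  assert (strcmp (lookup_location (c_locs, 2, 0x150), "place of y") == 0);
  assert (lookup_location (c_locs, 2, 0x200) == NULL);
}
```

Both places may exist for the whole function, but each PC maps to at
most one of them in this model.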

> Something like:
> 
> f(int x /* but also c */, int y /* but also c */) { /* other vars */

"int x /* but also c */, int y /* but also c */" implies that x == y 
already, at which point the compiler will most probably have allocated 
just one place for x and y (and c) anyway ...

>  do_something_with(x, ...); // doesn't touch x or y
>  do_something_else_with(y, ...); // doesn't touch x or y
> 
> Now, what will you get if you 'print c' in the debugger (or if any
> other debug info evaluator needs to tell what the value of user
> variable c is) at a point within do_something_with(c,...) or
> do_something_else_with(c)?

... so the answer would be "whatever is in that common place for x, y and 
c".  If the compiler did not allocate one place for x and y the answer 
still would be "whatever is in the place of 'y'", because that value is 
live, unlike 'x'.

> Now consider that f is inlined into the following code:
> 
> int g(point2d p) {
>   /* lots of code */
>   f(p.x, p.y);
>   /* more code */
>   f(p.y, p.x);
>   /* even more code */
> }
> 
> g gets fully scalarized, so, before inlining, we have:
> 
> int g(point2d p) {
>   int p$x = p.x, int p$y = p.y;
>   /* lots of code */
>   f(p$x, p$y);
>   /* more code */
>   f(p$y, p$x);
>   /* even more code */
> }
> 
> after inlining of f, we end up with:
> 
> int g(point2d p) {
>   int p$x = p.x, int p$y = p.y;
>   /* lots of code */
>   { int f()::x.1 /* but also f()::c.1 */ = p$x, f()::y.1 /* but also f()::c.1 */ = p$y;

Here you punt.  How come f::c is actually set to p$x?  I don't see 
any assignment and in fact no declaration for c in f.  If you had one 
_that_ would be the place where the connection between p$x and 'c' would 
have been made and everything would fall in place.

>     { /* other vars */
>       do_something_with(f()::x.1, ...); // doesn't touch x or y
>       do_something_else_with(f()::y.1, ...); // doesn't touch x or y
>   } }
>   /* more code */
>   { int f()::x.2 /* but also f()::c.2 */ = p$x, f()::y.2 /* but also f()::c.2 */ = p$y;
>     { /* other vars */
>       do_something_with(f()::x.2, ...); // doesn't touch x or y
>       do_something_else_with(f()::y.2, ...); // doesn't touch x or y
>   } }
>   /* even more code */
> }
> 
> then, we further optimize g and get:
> 
> int g(point2d p) {
>   int p$x /* but also f()::x.1, f()::c.1, f()::y.2, f()::c.2 */ = p.x;
>   int p$y /* but also f()::y.1, f()::c.1, f()::x.2, f()::c.2 */ = p.y;
>   /* lots of code */
>   { { /* other vars */
>       do_something_with(p$x, ...); // doesn't touch x or y
>       do_something_else_with(p$y, ...); // doesn't touch x or y
>   } }
>   /* more code */
>   { { /* other vars */
>       do_something_with(p$y, ...); // doesn't touch x or y
>       do_something_else_with(p$x, ...); // doesn't touch x or y
>   } }
>   /* even more code */
> }
> 
> and now, if you try to resolve the variable name 'c' to a location or
> a value within any of the occurrences of do_something_*with(), what do
> you get?  What ranges do you generate for each of the variables
> involved?

It's not possible that p$x _and_ p$y are f()::c.1 at the same time, so the 
above examples are all somehow invalid.  Except if p$x and p$y are somehow 
the same value, and if that's the case it's enough and exactly correct if 
the range of f()::c.1 covers the whole body of your function 'g' referring 
to exactly the one location of f()::c.1, f()::c.2, p$x and p$y.

> Unfortunately, this mapping is not biunivocal.  The chosen 
> representation is fundamentally lossy.

What's fundamentally lossy are transformations done by the compiler.  E.g. 
in this simple case:

int f(int y) {
  int x = 2 * y;
  return x + 2;
}

If the compiler forward-props 2*y into the single use and simplifies:

  return (y+1)*2;

then the value 2*y is never actually calculated anymore, not in any 
register, not in any local variable, nowhere.  There's no way debug 
information could generally rectify this loss of information.  As DWARF is 
capable of encoding complete expressions it would be possible in this case 
to express it, because the inverse of the above function is easily 
determined.  In case of more complicated expressions that's not possible 
anymore and you lose.
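
A small C sketch of this case, assuming nothing beyond the function
above.  f_source and f_optimized are the two forms of f; x_from_y and
x_from_result are hypothetical helpers standing in for the kind of
expression DWARF could encode to recompute 'x', either from the argument
or by inverting the return value.

```c
#include <assert.h>

/* As written by the user: x is a named local.  */
int
f_source (int y)
{
  int x = 2 * y;
  return x + 2;
}

/* After forward propagation and simplification: x has no storage.  */
int
f_optimized (int y)
{
  return (y + 1) * 2;
}

/* Hypothetical: recompute x from the argument y.  */
int
x_from_y (int y)
{
  return 2 * y;
}

/* Hypothetical: recompute x by inverting the returned value.  */
int
x_from_result (int result)
{
  return result - 2;
}

/* Both forms agree, and x is recoverable either way for small y.  */
void
demo_forward_prop (void)
{
  for (int y = -100; y <= 100; y++)
    {
      assert (f_source (y) == f_optimized (y));
      assert (x_from_y (y) == x_from_result (f_optimized (y)));
    }
}
```

The inversion works here only because x + 2 is trivially invertible.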

So, if the value is never ever computed anymore debug information won't 
help you.  You either have to force the value you're interested in to be 
live, or live with the imprecision.

Forcing some values live is possible, but is independent of generating 
debug information as exact as possible.  It must be independent because 
forcing values live is going to change the code, something which mere 
generation of debug information is not allowed to do.

So, our mapping is as accurate as yours.  If a value is computed in some 
place which can be traced back to some user-declared variable then this 
will be expressed.  If the value is not available then of course it also 
can't be reflected in the debug information (only as "optimized out").  It 
seems in your branch you also force some values live IIUC.  That's okay 
but doesn't have to do with generating precise debug information as shown 
above.

Even for forcing values live there are easier mechanisms.  We for instance 
experimented with volatile asms, which simply refer to the values in 
question (and unsurprisingly we also were interested in formal arguments 
of inlined functions):

  int f (int x) {
    force_use (x);
    ... old body ...
  }

You have to switch off any propagation into force_use(x), so that the 
original value of 'x' and the connection to the DECL of 'x' lives until 
the end of the compilation pipeline.  That's a rather simple hack doing 
exactly what's necessary: it forces GCC to actually have a place for the 
value of 'x' at the function entry point, which also survives inlining.
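
The message does not show how force_use was defined.  One plausible
sketch, assuming GCC's extended asm syntax, is an empty volatile asm
that takes the value as an input operand.  The macro body, the "r"
constraint and the functions below are assumptions for illustration.

```c
#include <assert.h>

/* Assumed definition (not from the message): an empty volatile asm
   with the value as a register input, so the compiler has to
   materialize the value at this point.  */
#define force_use(v) __asm__ __volatile__ ("" : : "r" (v))

static inline int
f (int x)
{
  force_use (x);   /* x must be computed here ...  */
  return 1;        /* ... although the body never reads it.  */
}

int
caller (int a)
{
  return f (a + 2);   /* a + 2 is still computed after inlining */
}

/* The asm does not change what the function returns.  */
void
demo_force_use (void)
{
  assert (caller (5) == 1);
}
```

This only makes the compiler compute the value at that point.  Switching
off propagation into the operand and keeping the connection to the DECL
of 'x', as described above, would need support inside the compiler.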

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-08 10:23         ` Michael Matz
@ 2007-11-08 14:02           ` Robert Dewar
  2007-11-08 15:13             ` H.J. Lu
                               ` (2 more replies)
  2007-11-08 16:32           ` Alexandre Oliva
  1 sibling, 3 replies; 150+ messages in thread
From: Robert Dewar @ 2007-11-08 14:02 UTC (permalink / raw)
  To: Michael Matz; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

My general feelings on this subject:

1. I don't think we should care much about the ability to
*SET* values of variables in optimized code. You can
definitely do without that. So if a variable exists in
two places, no problem, just register one of them.

2. It is much more important to have reasonable debugging
for most users than the last mile of optimization. For me
we should ensure that -O1 is still reasonably debuggable.
The switch to GCC 4, at least in the Ada context, has
significantly degraded -O1 debugging. I have found for
instance that debugging the GNAT compiler itself, -O1
used to be perfectly fine, but now far too many arguments
and variables disappear.

3. The quality of code at -O0 is really terrible compared
to the competition (at least in the case of Ada), and
large scale programs are just too big at -O0 to be
practical (there is a big difference between a 50
megabyte image and a 100 megabyte image). So we really
cannot rely on using -O0 for debugging. At -O1 we are
more than competitive for performance with competing
compilers.

4. In any case, most users really prefer to test and
debug at the same optimization level that they will
use for delivery. As noted above, -O0 is seldom practical
for delivery (furthermore the voluminous extra code makes
certification at the object level more work). -O1 is a
fine compromise from a performance point of view, but
needs to be debuggable.

5. Among our users we have relatively few who care about
even a factor of 2 in performance, and VERY few who care
about 10%. On the other hand we have lots of customers
who definitely have severe problems with the lack of
debuggability of -O1 code.

6. We have talked sometimes about a -Od level or some such
that would be fully debuggable. That's an interesting
idea, but I think in practice it is more reasonable to
try to ensure good debugging at -O1. Optimizations that
significantly interfere with debugging should be moved
to -O2. I think it is fine for -O2 to mean "optimize
the heck out of the program, I really care about the
last ounce of optimization, and I know debuggability
will suffer."


* Re: Designs for better debug info in GCC
  2007-11-08 14:02           ` Robert Dewar
@ 2007-11-08 15:13             ` H.J. Lu
  2007-11-08 16:11             ` Michael Matz
  2007-11-08 16:37             ` Alexandre Oliva
  2 siblings, 0 replies; 150+ messages in thread
From: H.J. Lu @ 2007-11-08 15:13 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Michael Matz, Alexandre Oliva, Richard Guenther, gcc-patches, gcc

On Thu, Nov 08, 2007 at 08:59:18AM -0500, Robert Dewar wrote:
> 2. It is much more important to have reasonable debugging
> for most users than the last mile of optimization. For me
> we should ensure that -O1 is still reasonably debuggable.
> The switch to GCC 4, at least in the Ada context, has
> significantly degraded -O1 debugging. I have found for
> instance that debugging the GNAT compiler itself, -O1
> used to be perfectly fine, but now far too many arguments
> and variables disappear.
> 

With gcc 3.4, I can debug binutils at -O1 and -O2 in some cases.
But with gcc 4, I have to use -O0 if I want to do any serious
debugging on binutils.



H.J.


* Re: Designs for better debug info in GCC
  2007-11-08 14:02           ` Robert Dewar
  2007-11-08 15:13             ` H.J. Lu
@ 2007-11-08 16:11             ` Michael Matz
  2007-11-08 17:48               ` Alexandre Oliva
  2007-11-08 16:37             ` Alexandre Oliva
  2 siblings, 1 reply; 150+ messages in thread
From: Michael Matz @ 2007-11-08 16:11 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Alexandre Oliva, Richard Guenther, gcc-patches, gcc

Hi,

On Thu, 8 Nov 2007, Robert Dewar wrote:

> significantly degraded -O1 debugging. I have found for
> instance that debugging the GNAT compiler itself, -O1
> used to be perfectly fine, but now far too many arguments
> and variables disappear.

Yes.  That problem is addressed by Alexandre's approach and by ours.  If 
you want to be really sure no arguments disappear (necessary for instance 
for meaningful use of systemtap) you also need to inhibit some 
transformations, which can be done under a certain option (which might or 
might not be on by default for -O1).

> 3. The quality of code at -O0 is really terrible compared
> to the competition (at least in the case of Ada), and
> large scale programs are just too big at -O0 to be
> practical (there is a big difference between a 50
> megabyte image and a 100 megabyte image).

This is a problem on its own.  We're planning to work on this sometime 
during the next months, i.e. improve code quality at -O0 at least to a 
point it was in the 3.x line of GCC.

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-08 16:11             ` Michael Matz
@ 2007-11-08 17:48               ` Alexandre Oliva
  2007-11-09 12:46                 ` Michael Matz
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08 17:48 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Michael Matz <matz@suse.de> wrote:

> If you want to be really sure no arguments disappear (necessary for
> instance for meaningful use of systemtap) you also need to inhibit
> some transformations,

I'm not aware of any situations in which we must force an argument not
to disappear.  All of the problems I'm aware of are those in which the
argument is there, we're just missing debug information for it.  If
you have information about needs for preserving arguments that are
actually dead, please send it my way.

> This is a problem on its own.  We're planning to work on this sometime 
> during the next months, i.e. improve code quality at -O0 at least to a 
> point it was in the 3.x line of GCC.

Aah, I guess the problem here is all the gimple-introduced temps,
right?  That our current -O0 is more like -O-1? :-)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-08 17:48               ` Alexandre Oliva
@ 2007-11-09 12:46                 ` Michael Matz
  2007-11-12 18:31                   ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Michael Matz @ 2007-11-09 12:46 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

Hi,

On Thu, 8 Nov 2007, Alexandre Oliva wrote:

> > If you want to be really sure no arguments disappear (necessary for 
> > instance for meaningful use of systemtap) you also need to inhibit 
> > some transformations,
> 
> I'm not aware of any situations in which we must force an argument not 
> to disappear.  All of the problems I'm aware of are those in which the 
> argument is there, we're just missing debug information for it.  If you 
> have information about needs for preserving arguments that are actually 
> dead, please send it my way.

------------------------------------
static inline int foo(int i)
{
  return i-1;
}

int foobar(int j)
{
  return foo(j+2);
}

int main(int argc, char **argv)
{
  return foobar(argc);
}
------------------------------------

And similar examples.  Depending on circumstances the formal argument 'i' 
of "foo" might be optimized away.  If you want to use systemtap to show 
the actual arguments for all calls to foo, even the inlined ones, then you 
somehow have to make sure that the value of 'i' itself is not optimized 
away.  Again, in this specific case, due to the simplicity of the involved 
expression, it would theoretically be possible to express this with just 
DWARF expressions (relating to the formal argument 'j' of foobar).  In 
more complicated situations that's not possible anymore, at which point 
you have to force the value of 'i' being live, if you want to be sure that 
systemtap works in all cases.

> > during the next months, i.e. improve code quality at -O0 at least to a 
> > point it was in the 3.x line of GCC.
> 
> Aah, I guess the problem here is all the gimple-introduced temps,
> right?  That our current -O0 is more like -O-1? :-)

Indeed :)  Perhaps also doing a simple DCE and local regalloc, none of 
which inhibits debugging.

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-09 12:46                 ` Michael Matz
@ 2007-11-12 18:31                   ` Alexandre Oliva
  2007-11-13 13:56                     ` Michael Matz
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-12 18:31 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

On Nov  9, 2007, Michael Matz <matz@suse.de> wrote:

> static inline int foo(int i)
> {
>   return i-1;
> }

> int foobar(int j)
> {
>   return foo(j+2);
> }

> int main(int argc, char **argv)
> {
>   return foobar(argc);
> }
> ------------------------------------

> And similar examples.  Depending on circumstances the formal argument 'i' 
> of "foo" might be optimized away.

With the design I've proposed, it is possible to compute the value of
i, for the end result is live, which ensures that the inputs used to
compute i are not completely optimized away.  This means at any point
in the execution of foo it is possible to compute i based on the
inputs (argc or j) or the outputs (the return values of foo, foobar
and main), no matter how much inlining takes place.  Now, it is
perfectly possible that foo is completely optimized away, such that no
instruction remains in the scope in which i is live.  In this case,
it's debatable whether i still remains, but we could still emit debug
information for it if we wanted to.
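
A minimal C sketch of that claim for this particular example.  The
helpers i_from_input and i_from_output are hypothetical; they stand in
for expressions that debug information could carry for the inlined 'i',
and are not part of the proposed design itself.

```c
#include <assert.h>

/* Source-level functions from the example above.  */
static inline int
foo (int i)
{
  return i - 1;
}

int
foobar (int j)
{
  return foo (j + 2);
}

/* Hypothetical: value of i computed from the input j.  */
int
i_from_input (int j)
{
  return j + 2;
}

/* Hypothetical: value of i computed from the return value.  */
int
i_from_output (int result)
{
  return result + 1;
}

/* Either route gives the same i, whatever inlining did to foo.  */
void
demo_recover_i (void)
{
  for (int j = -50; j <= 50; j++)
    assert (i_from_input (j) == i_from_output (foobar (j)));
}
```

Both routes rely on i - 1 and j + 2 being invertible, which holds for
this example.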

> If you want to use systemtap to show the actual arguments for all
> calls to foo, even the inlined ones, then you somehow have to make
> sure that the value of 'i' itself is not optimized away.

As I wrote before, I'm not aware of any systemtap bug report about a
situation in which an argument was actually optimized away.  I
wouldn't go as far as stopping the optimization just so that systemtap
can monitor the code.  I'm not working on changing optimization to
improve debugging, I'm working on fixing debug information such that
it matches optimizations that occur.

> at which point you have to force the value of 'i' being live, if you
> want to be sure that systemtap works in all cases.

I don't want to be sure of that.  At least that was not the problem I
was asked to solve.  And, indeed, it's not solvable without disabling
optimizations.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-12 18:31                   ` Alexandre Oliva
@ 2007-11-13 13:56                     ` Michael Matz
  2007-11-24  2:34                       ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Michael Matz @ 2007-11-13 13:56 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

Hi,

On Mon, 12 Nov 2007, Alexandre Oliva wrote:

> With the design I've proposed, it is possible to compute the value of i, 

No.  Only if the function is reversible.  There are many which aren't:

static inline int foo(int i)
{
  return i % 10;
}
int foobar(int j)
{
  return foo(j % 20);
}
int main(int argc, char **argv)
{
  return foobar(argc);
}

If foo is inlined and foobar simplified (to return j%10), the value for 
'i' (j % 20) can not be recovered anymore.  Hence for a 100% solution (and 
for systemtap you want that) you have no choice than to force the value to 
be live, e.g. by a volatile asm or the like.
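
To see why the result alone cannot give 'i' back, here is a small C
sketch.  i_value is a hypothetical helper naming the argument that foo
would have received; foobar_optimized is the simplified form mentioned
above.  Two different inputs produce the same result but different
values of 'i', so recovering 'i' would need 'j' itself to still be
available at that point.

```c
#include <assert.h>

/* Hypothetical: the value bound to foo's formal argument i.  */
int
i_value (int j)
{
  return j % 20;
}

/* foobar as written, with foo inlined by hand.  */
int
foobar_source (int j)
{
  return (j % 20) % 10;
}

/* foobar after simplification: j % 20 is never computed.  */
int
foobar_optimized (int j)
{
  return j % 10;
}

/* Same result, different i: the mapping from i to the result
   has no inverse.  */
void
demo_not_reversible (void)
{
  assert (foobar_source (5) == foobar_optimized (5));
  assert (foobar_source (15) == foobar_optimized (15));
  assert (foobar_optimized (5) == foobar_optimized (15));
  assert (i_value (5) != i_value (15));
}
```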

> As I wrote before, I'm not aware of any systemtap bug report about a
> situation in which an argument was actually optimized away.

I think it all started from PR23551.  For us it also happened in the 
kernel in namei.c, where real_lookup is inlined sometimes, and its 
arguments are missing.  That might or might not be reversible functions, 
so your scheme perhaps would have helped there.  But generally it won't 
solve the problem for good.

> I wouldn't go as far as stopping the optimization just so that systemtap 
> can monitor the code.

Like I said, at some point you have to, or accept that some code remains 
non-introspectable.

> > at which point you have to force the value of 'i' being live, if you 
> > want to be sure that systemtap works in all cases.
> 
> I don't want to be sure of that.  At least that was not the problem I 
> was asked to solve.

Then I'm probably still confused what problem you're actually trying to 
solve.  If you don't want to be sure you get precise location information 
100% of the time, then what percentage are you required to get?  And how 
do you measure this?  Or is the task rather "emit better debug info"?  But 
that can be done also in our scheme, so why is there a need for DEBUG_INSN 
if it can't solve the systemtap problem for good?

> And, indeed, it's not solvable without disabling optimizations.

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-13 13:56                     ` Michael Matz
@ 2007-11-24  2:34                       ` Alexandre Oliva
  2007-11-26 20:56                         ` Michael Matz
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24  2:34 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

On Nov 13, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Mon, 12 Nov 2007, Alexandre Oliva wrote:

>> With the design I've proposed, it is possible to compute the value of i, 

> No.  Only if the function is reversible.

Of course.  I meant it for that particular case.  The generalization
is obvious, but I didn't mean it would be always possible.

>> As I wrote before, I'm not aware of any systemtap bug report about a
>> situation in which an argument was actually optimized away.

> I think it all started from PR23551.

Yep.  Nowhere does that bug report request parameters to be forced
live.  What it does request is that parameters that are not completely
optimized away be present in debug information.

Now, consider these cases:

1. function is not inlined

At its entry point, we bind the argument to the register or stack slot
in which the argument is live.  Worst case, it's clobbered at the
entry point instruction itself, because it's entirely unused.  By
emitting a live range from the entry point to the death point, we're
emitting accurate and complete debug information for the argument.  We
win.

2. function is inlined, the argument is unused and thus optimized
away, but the function does some other useful computation

At the inlined entry point, we have a note that binds the argument to
its expected value.  As we transform the program and optimize away the
argument, we retain and update the note, such that we can still
represent the value of the inlined argument for as long as it's
available.

3. function is inlined and completely optimized away

No instruction remains in which the argument is in scope, so we might
as well refrain from emitting location information for it.  Even
though we can figure out where the value lives, there's no code to
attach this information to.  So there's no place to set a breakpoint
on to inspect the variable location anyway.

> For us it also happened in the kernel in namei.c, where real_lookup
> is inlined sometimes, and its arguments are missing.  That might or
> might not be reversible functions, so your scheme perhaps would have
> helped there.  But generally it won't solve the problem for good.

It looks like you're trying to solve a different problem.

I'm not trying to find a way to ensure that arguments are live.

I'm trying to get GCC to emit debug information that correctly matches
the instructions it generated.

If the value of a variable is completely optimized away at a point in
the program, the correct representation for its location at that
point is an empty set.

>> I wouldn't go as far as stopping the optimization just so that systemtap 
>> can monitor the code.

> Like I said, at some point you have to, or accept that some code remains 
> non-introspectable.

Yep.  It's easy enough to tweak the code to keep a variable live, if
you absolutely need it.  But this is not something I'm working to get
the compiler to do by itself.  Quite the opposite, in fact.  I'm going
to set the compiler free to perform some optimizations that it
currently refrains from performing for the sake of debug information,
when the conflict is only apparent because of past implementation
decisions that I'm working to fix.

> Then I'm probably still confused what problem you're actually trying to 
> solve.  If you don't want to be sure you get precise location information 
> 100% of the time, then what percentage are you required to get?

Accuracy comes first.  If we ever emit debug information saying 'this
variable is here' for a point in the program in which it's in fact
elsewhere or unavailable, that's a bug to be fixed.

Completeness comes second.  If we could have emitted debug information
saying 'the value of this variable is here' for a point in the
program, and we instead claim the variable is unavailable at that
point, that's an improvement that can be made.

> And how do you measure this?

Good question.  The implementation approach I've taken, that exposes
debug annotations as actual code, starts out with 100% accuracy
(that's the theory, anyway, otherwise generated code would change,
and, even though we still don't have a complete framework to ensure
code doesn't change, if it does, then at least debug information will
model the change accurately), and we can then grow completeness
incrementally.

> Or is the task rather "emit better debug info"?

Nope.  That's a secondary goal that will be achieved as we get
accurate and sufficiently complete debug information.  I don't have
completeness goals set, but I have reasons to expect we're going to
get much better results than we have now without too much additional
effort.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


* Re: Designs for better debug info in GCC
  2007-11-24  2:34                       ` Alexandre Oliva
@ 2007-11-26 20:56                         ` Michael Matz
  2007-11-27  5:30                           ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Michael Matz @ 2007-11-26 20:56 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

Hi,

On Fri, 23 Nov 2007, Alexandre Oliva wrote:

> Yep.  Nowhere does that bug report request parameters to be forced live.  

Not in that bug report perhaps, but we got requests for exactly this, i.e. 
to be able to introspect all parameters of all functions, be they inlined 
or not, at all times.  I think that's a reasonable request even (which in 
some situations comes at a cost).

> 2. function is inlined, the argument is unused and thus optimized
> away, but the function does some other useful computation
> 
> At the inlined entry point, we have a note that binds the argument to
> its expected value.  As we transform the program and optimize away the
> argument, we retain and update the note,

As far as possible.  If it's not possible you lose (with our 
requirements).

> > For us it also happened in the kernel in namei.c, where real_lookup is 
> > inlined sometimes, and its arguments are missing.  That might or 
> > might not be reversible functions, so your scheme perhaps would have 
> > helped there.  But generally it won't solve the problem for good.
> 
> It looks like you're trying to solve a different problem.

We work on two fronts:
1) increasing the precision of debug information
2) forcing values live

Our branch, and our ssa-name<->user-name map (and the SET<->decls 
association) is concerned with the first topic.  The second topic can be 
implemented (or hacked) already now, but will potentially be more useful 
when we also have (1).  So, as in your branch, we are not trying to limit 
optimizers to reach the goal, that's the concern of (2), and happens 
somewhere else.

> I'm trying to get GCC to emit debug information that correctly matches
> the instructions it generated.
> 
> If the value of a variable is completely optimized away at a point in 
> the program, the correct representation for its location at that point 
> is an empty set.

I think this is academic.  If a value is dead, but happens to lie in a 
place which isn't yet overwritten with something else, it is harmless to 
reveal this value.  It's the "last" value the variable had.  If OTOH the 
place _is_ already overwritten then it's important that we _don't_ say the 
dead variable lies therein.

So, for me correctness is defined a bit differently than for you:
1) if location L contains value X, then debug info should say so (as much 
   as possible, i.e. here the quality of the info comes into play)
2) if location L does not contain value X, debug info should not say that 
   it does.  This is the correctness part.

Where we differ in opinion (I think) is, when location L doesn't contain 
value X anymore.  For you it's when X becomes dead.  For me it's when X is 
dead and when location L is overwritten (with something different than X).  
I think for users there is no practical difference between our approaches, 
but there's a higher cost of implementation for your definition.

> > Then I'm probably still confused what problem you're actually trying to 
> > solve.  If you don't want to be sure you get precise location information 
> > 100% of the time, then what percentage are you required to get?
> 
> Accuracy comes first.  If we ever emit debug information saying 'this
> variable is here' for a point in the program in which it's in fact
> elsewhere

I agree here ...

> or unavailable, that's a bug to be fixed.

... and disagree here.  If a value is dead it's not necessarily 
unavailable in my world.  I think a world requiring this (and hence the 
constraints you were given) is unreasonable.

Ciao,
Michael.


* Re: Designs for better debug info in GCC
  2007-11-26 20:56                         ` Michael Matz
@ 2007-11-27  5:30                           ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-27  5:30 UTC (permalink / raw)
  To: Michael Matz; +Cc: Robert Dewar, Richard Guenther, gcc-patches, gcc

On Nov 26, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Fri, 23 Nov 2007, Alexandre Oliva wrote:

>> Yep.  Nowhere does that bug report request parameters to be forced live.  

> Not in that bug report perhaps, but we got requests for exactly this, i.e. 
> to be able to introspect all parameters of all functions, be they inlined 
> or not, at all times.  I think that's a reasonable request even (which in 
> some situations comes at a cost).

Fair enough.  And we agree this is not about debug info, it's about
limiting optimizations, so this is indeed a different problem from the
one I was asked to address.

>> 2. function is inlined, the argument is unused and thus optimized
>> away, but the function does some other useful computation
>> 
>> At the inlined entry point, we have a note that binds the argument to
>> its expected value.  As we transform the program and optimize away the
>> argument, we retain and update the note,

> As far as possible.  If it's not possible you lose (with our 
> requirements).

If the argument is completely removed, yes, you won't be able to get
to it by merely improving debug information.  You actually have to
change the generated code.

>> If the value of a variable is completely optimized away at a point in 
>> the program, the correct representation for its location at that point 
>> is an empty set.

> I think this is academic.  If a value is dead, but happens to lie in a 
> place which isn't yet overwritten with something else, it is harmless to 
> reveal this value.  It's the "last" value the variable had.  If OTOH the 
> place _is_ already overwritten then it's important that we _don't_ say the 
> dead variable lies therein.

Exactly.  Full agreement.  I wasn't talking about the *location* of
the variable, or the variable itself.  I was talking about the value.
And I wrote "completely optimized away", not "dead".  Liveness has
very little to do with this issue.

The only catch is that, once a variable should be *expected* to hold a
different value, if debug information still claims the variable still
holds the old value it shouldn't hold any more, just because the value
happens to be around and the assignment of the new value could be
optimized away, then I'd say debug information is incorrect.

> So, for me correctness is defined a bit differently than for you:
> 1) if location L contains value X, then debug info should say so (as much 
>    as possible, i.e. here the quality of the info comes into play)
> 2) if location L does not contain value X, debug info should not say that 
>    it does.  This is the correctness part.

Your definition is exactly what I've been trying to communicate.  It
looks like we're in complete agreement as to the goals and the two
different metrics (1 being completeness, 2 being correctness).  So
either there's some other underlying difference or you'll soon realize
that the simple SSA name<->variable mapping is insufficient to get you
correctness.

> Where we differ in opinion (I think) is, when location L doesn't contain 
> value X anymore.  For you it's when X becomes dead.  For me it's when X is 
> dead and when location L is overwritten (with something different than X).  

For me, it's when X is overwritten.  That's the point at which the
user is entitled to expect the variable to no longer hold its previous
value (assuming they're different).

Consider this program:

int foo(int x) {
  int i;

  i = x;
  p1();
  i++;
  p2(i);
  i++;
  p3();
}

int main() {
  foo(1);
}

If you set a breakpoint in p1(), go up one frame and print i, you
should ideally get 1 (although "unavailable" is always correct, even
if undesirable).  If you set a breakpoint in p2(int), you should get
2, but "unavailable" is quite likely in the presence of optimization,
depending on the calling conventions.  If you set a breakpoint in
p3(), you should get 3, but "unavailable" is quite likely, given that
the value is not even computed, and it's based on a value that is dead
and thus may have been overwritten.

Getting any other values at any of these points would be a bug in the
compiler.
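
The expectations above can be written down as a small checker.  This
is only a sketch: the enum, struct and function names are hypothetical
and belong to neither GCC nor any debugger.  It models the rule that
"unavailable" is always correct, while an available value must match
the source-level value of i at that breakpoint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical breakpoint sites in the example: inside p1, p2 or p3. */
enum bp { BP_P1, BP_P2, BP_P3 };

/* What a debugger reports for `i` in the caller's frame. */
struct report {
  bool available;  /* false means "unavailable" */
  int value;       /* only meaningful when available is true */
};

/* Source-level value of i in foo(x) at each breakpoint site. */
static int expected_i(enum bp where, int x)
{
  switch (where) {
  case BP_P1: return x;      /* after i = x */
  case BP_P2: return x + 1;  /* after the first i++ */
  case BP_P3: return x + 2;  /* after the second i++ */
  }
  assert(0 && "unknown breakpoint site");
  return x;
}

/* Correctness: never report a value the user should not expect. */
static bool report_is_correct(enum bp where, int x, struct report r)
{
  return !r.available || r.value == expected_i(where, x);
}

/* Completeness: a value was reported at all. */
static bool report_is_complete(struct report r)
{
  return r.available;
}
```

For foo(1), a report of 2 in p3() fails report_is_correct, whereas
"unavailable" passes it and only fails report_is_complete.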

Does this sound sound to you?

Did you somehow get the impression that the SSA<->names mapping can
get you correct results?

>> Accuracy comes first.  If we ever emit debug information saying 'this
>> variable is here' for a point in the program in which it's in fact
>> elsewhere

> I agree here ...

>> or unavailable, that's a bug to be fixed.

> ... and disagree here.  If a value is dead it's not necessarily 
> unavailable in my world.

I never said "dead", you did.  I said "unavailable", and by that I
don't mean "dead", I really mean "unavailable".  The value I'm talking
about is not "whatever was last assigned to something that resembles
the variable after numerous optimizations" but rather "a value the
user might expect the variable to hold at that point in the program",
given some user tolerance to reordering and other optimizations.

One reason I use separate functions for the breakpoint locations is
precisely because at those points users are entitled to expect the
state of the program to be stable, i.e., there isn't a lot of
reordering or other surprises that a compiler can introduce across
function calls that are by themselves in a statement.

Another reason is that I still don't have a good answer for breakpoint
locations at other points in the program that are less stable across
optimizations, and I can't quite describe what I think users are
entitled to expect at such other points.  But the infrastructure
needed to bring great improvements even in this regard is being set in
place by getting them correct at stable points such as function calls.

That said, I'm putting some thought into getting better debug
information in these less stable points, but making it completely
unsurprising in spite of optimizations isn't the task I was assigned.
Making it correct and far more complete is.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-11-08 14:02           ` Robert Dewar
  2007-11-08 15:13             ` H.J. Lu
  2007-11-08 16:11             ` Michael Matz
@ 2007-11-08 16:37             ` Alexandre Oliva
  2007-11-09  1:26               ` Joe Buck
  2007-11-09  1:26               ` Robert Dewar
  2 siblings, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08 16:37 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Michael Matz, Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Robert Dewar <dewar@adacore.com> wrote:

> My general feelings on this subject:

> 1. I don't think we should care much about the ability to
> *SET* values of variables in optimized code.

Indeed.  We should care about correctness of debug information, and
then this ability will come naturally ;-)

> 3. The quality of code at -O0 is really terrible

That's a feature, no?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-11-08 16:37             ` Alexandre Oliva
@ 2007-11-09  1:26               ` Joe Buck
  2007-11-09 14:53                 ` Daniel Jacobowitz
  2007-11-09  1:26               ` Robert Dewar
  1 sibling, 1 reply; 150+ messages in thread
From: Joe Buck @ 2007-11-09  1:26 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Robert Dewar, Michael Matz, Richard Guenther, gcc-patches, gcc

On Thu, Nov 08, 2007 at 02:36:57PM -0200, Alexandre Oliva wrote:
> > 3. The quality of code at -O0 is really terrible
> 
> That's a feature, no?

Actually it's a misfeature, in that it's worse than it needs to
be, and it's worse in ways that increase the time required to produce it
(since a larger volume of code then has to be handled by the back end,
assembler, and linker).

Debugging would be just as easy and natural if -O0 only made sure that
values of variables are written out to memory at positions where the
user can set a breakpoint; the code doesn't need to preserve every
operation exactly as written, or read variables in from memory that
are already in registers.  Kind of an -O0.5 would be more desirable
than what we have now.

* Re: Designs for better debug info in GCC
  2007-11-09  1:26               ` Joe Buck
@ 2007-11-09 14:53                 ` Daniel Jacobowitz
  2007-11-09 17:06                   ` Robert Dewar
  0 siblings, 1 reply; 150+ messages in thread
From: Daniel Jacobowitz @ 2007-11-09 14:53 UTC (permalink / raw)
  To: gcc-patches, gcc

[Can we pick just gcc@ or just gcc-patches@ please?]

On Thu, Nov 08, 2007 at 05:11:24PM -0800, Joe Buck wrote:
> Debugging would be just as easy and natural if -O0 only made sure that
> values of variables are written out to memory at positions where the
> user can set a breakpoint; the code doesn't need to preserve every
> operation exactly as written, or read variables in from memory that
> are already in registers.  Kind of an -O0.5 would be more desirable
> than what we have now.

Careful.  Eliminating reads from memory messes up debugger
modification of variables, unless you can explain to the debugger that
the variable is currently in both locations - this has been discussed
but AFAIK there is no representation for it yet.  Changing the memory
location won't change the next operation that thinks it's in the
register.  Changing the register will be lost later.
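
A toy model of the two failure modes, purely as an illustration: the
struct and helpers below are hypothetical and only mimic a variable
that has a memory home and a register copy at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* One variable that currently lives in two places. */
struct var_state {
  int mem;  /* home slot in memory */
  int reg;  /* register copy that the next operation reads */
};

/* The next operation was compiled to read the register copy. */
static int use_from_reg(const struct var_state *v)
{
  assert(v != NULL);
  return v->reg;
}

/* Later code reloads the register from the memory home. */
static void reload_from_mem(struct var_state *v)
{
  assert(v != NULL);
  v->reg = v->mem;
}

/* Debugger writes that know about only one of the two locations. */
static void debugger_set_mem(struct var_state *v, int val) { v->mem = val; }
static void debugger_set_reg(struct var_state *v, int val) { v->reg = val; }

/* A write that is told about both locations stays consistent. */
static void debugger_set_both(struct var_state *v, int val)
{
  v->mem = val;
  v->reg = val;
}
```

Writing only mem leaves use_from_reg returning the old value; writing
only reg is undone by the next reload_from_mem.  Only the write that
updates both locations survives either path.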

-- 
Daniel Jacobowitz
CodeSourcery

* Re: Designs for better debug info in GCC
  2007-11-09 14:53                 ` Daniel Jacobowitz
@ 2007-11-09 17:06                   ` Robert Dewar
  0 siblings, 0 replies; 150+ messages in thread
From: Robert Dewar @ 2007-11-09 17:06 UTC (permalink / raw)
  To: gcc-patches, gcc

Daniel Jacobowitz wrote:

> Careful.  Eliminating reads from memory messes up debugger
> modification of variables, unless you can explain to the debugger that
> the variable is currently in both locations - this has been discussed
> but AFAIK there is no representation for it yet.  Changing the memory
> location won't change the next operation that thinks it's in the
> register.  Changing the register will be lost later.

I still think that changing memory locations is a marginal capability
compared to reading them, and that it is fine if this capability is
impacted by even low level optimization.
> 


* Re: Designs for better debug info in GCC
  2007-11-08 16:37             ` Alexandre Oliva
  2007-11-09  1:26               ` Joe Buck
@ 2007-11-09  1:26               ` Robert Dewar
  2007-11-12 16:56                 ` Alexandre Oliva
  1 sibling, 1 reply; 150+ messages in thread
From: Robert Dewar @ 2007-11-09  1:26 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Michael Matz, Richard Guenther, gcc-patches, gcc

Alexandre Oliva wrote:
> On Nov  8, 2007, Robert Dewar <dewar@adacore.com> wrote:
> 
>> My general feelings on this subject:
> 
>> 1. I don't think we should care much about the ability to
>> *SET* values of variables in optimized code.
> 
> Indeed.  We should care about correctness of debug information, and
> then this ability will come naturally ;-)

Not really, there are optimizations that will still allow
reading the value of a variable, but not setting it, and
I think it is just fine to do these optimizations. For
instance if we have

    b = a;

the optimizer may not do a copy, it may simply know that
b and a values are in the same place. This does not stand
in the way of reading the value, but it does make it
impossible to write a or b.

Similarly, if the optimizer does test replacement, and
knows that the value of a can be obtained by evaluating
some expression, the debugger can read the value, but
may not be able to set it.
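
The read/write asymmetry in these two cases could be sketched as
follows.  The classification and its names are hypothetical, not
something GCC or a debug format defines.

```c
#include <stdbool.h>

/* Hypothetical ways a source variable may be described after optimization. */
enum loc_kind {
  LOC_OWN_STORAGE,  /* the variable has a location of its own */
  LOC_SHARED,       /* b = a was elided: b shares a's location */
  LOC_COMPUTED,     /* value is recoverable by evaluating an expression */
  LOC_NONE          /* value is unavailable */
};

/* Reading works whenever the value can be found or recomputed. */
static bool debugger_can_read(enum loc_kind k)
{
  return k != LOC_NONE;
}

/* Writing is only safe when the store affects this variable alone. */
static bool debugger_can_write(enum loc_kind k)
{
  return k == LOC_OWN_STORAGE;
}
```

Under this model, both the shared-copy case and the recomputed-value
case are readable but not writable.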
> 
>> 3. The quality of code at -O0 is really terrible
> 
> That's a feature, no?
> 


* Re: Designs for better debug info in GCC
  2007-11-09  1:26               ` Robert Dewar
@ 2007-11-12 16:56                 ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-12 16:56 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Michael Matz, Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:

>>> 1. I don't think we should care much about the ability to
>>> *SET* values of variables in optimized code.
>> 
>> Indeed.  We should care about correctness of debug information, and
>> then this ability will come naturally ;-)

> Not really, there are optimizations that will still allow
> reading the value of a variable, but not setting it,

Indeed.  I was thinking implementation-level variables, rather than
source-level variables.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-11-08 10:23         ` Michael Matz
  2007-11-08 14:02           ` Robert Dewar
@ 2007-11-08 16:32           ` Alexandre Oliva
  1 sibling, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-08 16:32 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, gcc-patches, gcc

On Nov  8, 2007, Michael Matz <matz@suse.de> wrote:

> Hi,
> On Wed, 7 Nov 2007, Alexandre Oliva wrote:

>> > x and y at the appropriate part.  Whatever holds 'x' at a point (SSA 
>> > name, pseudo or mem) will also mention that it holds 'c'.  At a later 
>> > point whichever holds 'y' will also mention it holds 'c'.
>> 
>> I.e., there will be two parallel locations throughout the entire 
>> function that hold the value of 'c'.

> No.  For some PC locations the location of 'c' will happen to be the same 
> as the one holding 'x', and for a different set of PC locations it will be 
> the one also holding 'y'.

So we're in agreement.  What you say is how it ought to be done, what
I did was to point out that the representation proposed by richi will
be unable to do the right thing.

>> f(int x /* but also c */, int y /* but also c */) { /* other vars */

> "int x /* but also c */, int y /* but also c */" implies that x == y 
> already

No, per the posted design (assuming I understood it correctly) it just
implies that, at some point in the program, an assignment 'c = x' was
optimized away, and that at some other point in the program, an
assignment 'c = y' was optimized away.

>> do_something_with(x, ...); // doesn't touch x or y
>> do_something_else_with(y, ...); // doesn't touch x or y
>> 
>> Now, what will you get if you 'print c' in the debugger (or if any
>> other debug info evaluator needs to tell what the value of user
>> variable c is) at a point within do_something_with(c,...) or
>> do_something_else_with(c)?

> ... so the answer would be "whatever is in that common place for x,y and 
> c".

And once we removed the incorrect assumption you made, that 'x == y',
what do you get?

> How come that f::c is actually set to p$x?

It was in the original source code, was it not?  p$x was passed to f()
as x, and then x was copied to c.

> I don't see any assignment and in fact no declaration for c in f.
> If you had one _that_ would be the place were the connection between
> p$x and 'c' would have been made and everything would fall in place.

Since there is a declaration of c in the original source-level f (the
only one that matters, as far as debug information is concerned), can
you please expand on how you'd get everything to fall in place?

> It's not possible that p$x _and_ p$y are f()::c.1 at the same time,

Exactly

> so the above examples are all somehow invalid.

It's the bitmap debug info representation that makes them nonsensical.

> int f(int y) {
>   int x = 2 * y;
>   return x + 2;
> }

> If the compiler forward-props 2*y into the single use and simplifies:

>   return (y+1)*2;

> then the value 2*y is never actually calculated anymore, not in any 
> register, not in any local variable, nowhere.  There's no way debug 
> information could generally rectify this loss of information.

Actually, while y is live, debug information could encode that x is
2*y, even if the value is not computed at run time.  So your statement
is quite an exaggeration.
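
As a rough illustration of what "x is 2*y" could mean to a debug info
consumer, here is a minimal sketch.  The struct is a made-up
representation, not an actual debug info encoding, and it assumes the
value of x can be written as scale * y + offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a variable whose value is never stored,
   but can be recomputed as scale * base + offset. */
struct value_expr {
  bool available;  /* false: no description at this point */
  int scale;
  int offset;
};

/* Recompute the value while the base variable is still live.
   Returns false, leaving *out untouched, when it is unavailable. */
static bool eval_value(const struct value_expr *e, bool base_live,
                       int base_value, int *out)
{
  assert(e != NULL && out != NULL);
  if (!e->available || !base_live)
    return false;
  *out = e->scale * base_value + e->offset;
  return true;
}
```

With scale 2 and offset 0, the evaluator yields x from y for as long
as y is live, and reports "unavailable" afterwards instead of pointing
at something else.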

> In case of more complicated expressions that's not possible anymore
> and you lose.

Yep.  If the value is unavailable, debug information should say so,
rather than pointing at something else.

> Forcing some values live is possible,

But undesirable.  I'm not trying to do that.  Actually, I'm working
hard to make sure it doesn't happen.

> So, our mapping is as accurate as yours.

Not at all, and you made that point yourself, twice, in a single
e-mail.

> It seems in your branch you also force some values live IIUC.

Nope.  Any values that are forced live by debug annotations are bugs
to be fixed.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
@ 2007-11-12 21:04 Steven Bosscher
  2007-11-24  1:37 ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Steven Bosscher @ 2007-11-12 21:04 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Alexandre Oliva, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

cf. http://gcc.gnu.org/ml/gcc/2007-11/msg00293.html
Mark Mitchell wrote:
> The reason I want to make that assumption is that the part of this where
> the representation is in question is once we reach RTL, right?

The representation in GIMPLE should also be discussed IMVHO. For
GIMPLE Alex has invented DEBUG_STMT, which has the same properties as
DEBUG_INSN in RTL (with one noteworthy difference, namely that having
note-like GIMPLE statements is a totally new concept while DEBUG_INSN
is just a wannabe-real-insn INSN_NOTE).

Gr.
Steven

* Re: Designs for better debug info in GCC
  2007-11-12 21:04 Steven Bosscher
@ 2007-11-24  1:37 ` Alexandre Oliva
  2007-11-24  2:35   ` Steven Bosscher
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24  1:37 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 12, 2007, "Steven Bosscher" <stevenb.gcc@gmail.com> wrote:

> DEBUG_INSN in RTL (with one noteworthy difference, namely that having
> note-like GIMPLE statements is a totally new concept

Not quite.  There were codeless gimple constructs before (think
labels, for one).  Or empty asm statements.  But then, I'm not sure
what you mean by note-like; maybe it's something else.  As I explained
before, debug insns and debug stmts are more like code than like
notes, because notes generally don't need adjusting as code is
modified elsewhere, whereas code does.  And debug insns and stmts
definitely need adjusting like regular insns.

> while DEBUG_INSN is just a wannabe-real-insn INSN_NOTE).

Except for this tiny detail that INSN_NOTEs are never adjusted as code
is modified, because in general they don't even contain RTL.
VAR_LOCATION is a recent exception, and it used to be introduced so
late precisely because there's no infrastructure to keep notes
up-to-date as code transformations are performed.

So, yes, debug stmts and insns are notes in the sense that they don't
output code.  Like USE insns, labels, empty asm insns and other
UNSPECs.  But wait, those are insns, not notes.  And they do generate
code, just not in the .text section, but rather in .debug sections.

So, what's this prejudice against debug insns?  Why do you regard them
as notes rather than insns?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-11-24  1:37 ` Alexandre Oliva
@ 2007-11-24  2:35   ` Steven Bosscher
  2007-11-24 15:08     ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Steven Bosscher @ 2007-11-24  2:35 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 23, 2007 9:45 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> So, yes, debug stmts and insns are notes in the sense that they don't
> output code.  Like USE insns, labels, empty asm insns and other
> UNSPECs.  But wait, those are insns, not notes.  And they do generate
> code, just not in the .text section, but rather in .debug sections.

All of them relate to code generation though.  Without them, we create
wrong code.  I'm aware of how you feel about debug info and
correctness and so on.

> So, what's this prejudice against debug insns?  Why do you regard them
> as notes rather than insns?

What worries me is that GCC will have to special-case DEBUG_INSN
everywhere where it looks at INSNs.  One can already see some of that
happening on your branch.  Apparently, you can't treat DEBUG_INSN just
like any other normal insn.

What I see happening with your DEBUG_INSN approach, is that all passes
that use NEXT_INSN/PREV_INSN will have to special-case DEBUG_INSN in
addition to the NOTE_P or INSN_P checks that they already have.  I
have seen too many bugs with passes that forgot to look through notes
to feel comfortable about adding another
not-a-note-but-also-not-an-insn like thing to the insn stream. The
fact that DEBUG_INSN also has real operands that are not really real
operands is bound to confuse the matter even more.  Life with proper
insn and operands iterators for RTL would be so much easier, but for
the moment I fear you're just going to see a lot of duplication of
ugly conditionals and bugs where such conditionals are
forgotten/overlooked/missing.

So to summarize: I'm just worried your approach is going to make GCC
even slower, buggier, more difficult to maintain and more difficult to
understand and modify.  And the benefit, well, let's just say I'm not
convinced that less elaborate efforts are not sufficient.

(And to be perfectly honest, I think GCC has bigger issues to solve
than getting perfect debug info -- such as getting compile times of a
linux kernel down ;-))

Gr.
Steven

* Re: Designs for better debug info in GCC
  2007-11-24  2:35   ` Steven Bosscher
@ 2007-11-24 15:08     ` Alexandre Oliva
  2007-11-24 15:18       ` Richard Kenner
  2007-11-24 16:45       ` Steven Bosscher
  0 siblings, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24 15:08 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 23, 2007, "Steven Bosscher" <stevenb.gcc@gmail.com> wrote:

>> So, what's this prejudice against debug insns?  Why do you regard them
>> as notes rather than insns?

> What worries me is that GCC will have to special-case DEBUG_INSN
> everywhere where it looks at INSNs.

This is just not true.  Anywhere that simply wants to update insns for
the effects of other transformations won't have to do that.  Only
places in which we need the weak-use semantics of debug_insns need to
give them special treatment.  Not because they're not insns, but
because they're weak uses, i.e., uses that shouldn't interfere with
optimizations.
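
A minimal sketch of the weak-use idea, in the same spirit as skipping
debug stmts when walking the uses of a name.  The insn model below is
hypothetical and far simpler than GCC's RTL: it only shows that a
liveness decision ignores debug uses, so the decision is the same with
and without them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy insn stream: each insn reads at most one register. */
enum insn_kind { INSN_NORMAL, INSN_DEBUG };

struct insn {
  enum insn_kind kind;
  int uses_reg;  /* register read by this insn, or -1 for none */
};

/* Count uses of REG, treating debug insns as weak uses that do not count. */
static size_t count_real_uses(const struct insn *insns, size_t n, int reg)
{
  size_t count = 0;
  assert(insns != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (insns[i].kind == INSN_DEBUG)
      continue;  /* a weak use must not keep the definition alive */
    if (insns[i].uses_reg == reg)
      count++;
  }
  return count;
}

/* A definition is removable when nothing but debug insns reads it. */
static bool def_is_dead(const struct insn *insns, size_t n, int reg)
{
  return count_real_uses(insns, n, reg) == 0;
}
```

A pass written this way needs no other knowledge of debug insns.  If
the check is forgotten, the definition merely stays alive, which is a
missed optimization rather than wrong code.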

Yes, catching all such cases hasn't been trivial.  If we miss some,
then what happens is that -O2 -g -fvar-tracking-assignments outputs
different executable code than -O2.  Everything still works just fine,
we eventually get a bug report, we fix it and move on.

This is *much* better than starting out with notes, that nearly
nothing cares about, and try to add code to update the notes as code
transformations are performed.  In this case, we get incorrect,
non-functional compiler output unless we catch absolutely all bugs
upfront.

> Apparently, you can't treat DEBUG_INSN just like any other normal
> insn.

Obviously not.  They're weaker uses than anything else.  We haven't
had any such thing in the compiler before.

> but for the moment I fear you're just going to see a lot of
> duplication of ugly conditionals

Your fear is understandable but not justified.  Go look at the
patches.  x86_64-linux-gnu now bootstraps and produces exactly the
same code with and without -fvar-tracking-assignments.  And no complex
conditionals were needed.  The most I've needed so far was to ignore
debug insns at certain spots.

It's true that in a number of situations this is an oversimplified
course of action, and some additional effort might be needed to
actually update the debug insns when they would have interfered with
optimizations.  Time will tell, I guess.  So far, it doesn't look like
it's been a problem, and I don't foresee these duplicated or ugly
conditionals you fear.

> and bugs where such conditionals are forgotten/overlooked/missing.

See above.  One of the reasons for the approach I've taken is that
such cases will, in the worst case, cause missed optimizations, not
incorrect compiler output.

> And the benefit, well, let's just say I'm not convinced that less
> elaborate efforts are not sufficient.

Sufficient for what?  Efforts towards what?  Generating more incorrect
debug information just for the sake of it?  Adding more debug
information while breaking some that's just fine now?  Is that really
progress?

> (And to be perfectly honest, I think GCC has bigger issues to solve
> than getting perfect debug info -- such as getting compile times of a
> linux kernel down ;-))

Compile speed is a quality of implementation issue.  Output
correctness and standard compliance come first in my book.

And then, I'm supposed to fix this correctness problem, not other
issues that others might find more important.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-11-24 15:08     ` Alexandre Oliva
@ 2007-11-24 15:18       ` Richard Kenner
  2007-11-24 20:11         ` Alexandre Oliva
  2007-11-24 16:45       ` Steven Bosscher
  1 sibling, 1 reply; 150+ messages in thread
From: Richard Kenner @ 2007-11-24 15:18 UTC (permalink / raw)
  To: aoliva; +Cc: gcc-patches, gcc, iant, mark, richard.guenther, stevenb.gcc

> Yes, catching all such cases hasn't been trivial.  If we miss some,
> then what happens is that -O2 -g -fvar-tracking-assignments outputs
> different executable code than -O2.

But that's a very serious type of bug because it means you have
situations where a program fails and you can't debug it because when
you turn on debugging information, it doesn't fail anymore.  We need
to make an absolute rule that this *cannot* happen and luckily this is
one of the easiest types of errors to protect against.
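
One way to picture such a check, as a sketch only: compare the two
compilations' insn streams while ignoring debug-only insns.  The types
and the opcode field are hypothetical; this is not how any existing
GCC option is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum insn_kind { INSN_NORMAL, INSN_DEBUG };

struct insn {
  enum insn_kind kind;
  int opcode;  /* stand-in for the insn's executable content */
};

/* Return the index of the first non-debug insn at or after I. */
static size_t skip_debug(const struct insn *s, size_t n, size_t i)
{
  while (i < n && s[i].kind == INSN_DEBUG)
    i++;
  return i;
}

/* True when both streams contain the same executable insns in the
   same order, regardless of any debug insns interleaved with them. */
static bool same_executable_code(const struct insn *a, size_t na,
                                 const struct insn *b, size_t nb)
{
  assert((a != NULL || na == 0) && (b != NULL || nb == 0));
  size_t i = skip_debug(a, na, 0);
  size_t j = skip_debug(b, nb, 0);
  while (i < na && j < nb) {
    if (a[i].opcode != b[j].opcode)
      return false;
    i = skip_debug(a, na, i + 1);
    j = skip_debug(b, nb, j + 1);
  }
  return i == na && j == nb;  /* neither stream has leftover code */
}
```

A mismatch reported by a comparison like this is exactly the class of
bug described above: the presence of debug information changed the
generated code.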

* Re: Designs for better debug info in GCC
  2007-11-24 15:18       ` Richard Kenner
@ 2007-11-24 20:11         ` Alexandre Oliva
  2007-11-24 20:46           ` Bernd Schmidt
                             ` (2 more replies)
  0 siblings, 3 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24 20:11 UTC (permalink / raw)
  To: Richard Kenner
  Cc: gcc-patches, gcc, iant, mark, richard.guenther, stevenb.gcc

On Nov 24, 2007, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:

>> Yes, catching all such cases hasn't been trivial.  If we miss some,
>> then what happens is that -O2 -g -fvar-tracking-assignments outputs
>> different executable code than -O2.

> But that's a very serious type of bug because it means you have
> situations where a program fails and you can't debug it because when
> you turn on debugging information, it doesn't fail anymore.  We need
> to make an absolute rule that this *cannot* happen and luckily this is
> one of the easiest types of errors to protect against.

I agree completely.  That's why I've gone to such great lengths to
ensure these errors are easily testable in my implementation, and to
put all my changes under control of a command-line option.  Then, you
can still get (poorer) debug information by disabling (or not
enabling) this option.

And then, despite the consensus that GCC must not generate different
code with and without -g, the patch that fixes one such regression has
been lingering for months, and the patch that introduced the
regression hasn't been reverted either.

Besides, the Ada RTS compiles differently with -g than without -g,
such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
but me seems to care.

I'm sure I'm going to find other differences between -g and -g0 once I
fix this and bootstrap4-debug gets past this point and builds other
target libraries.  I'm not looking forward to the discussions that
will ensue if any fixes for these problems imply any costs whatsoever,
given the experience I've had with the SSA-coalescing and the
optimize-basic-blocks issues that are all about debug information
versus optimization :-(

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

* Re: Designs for better debug info in GCC
  2007-11-24 20:11         ` Alexandre Oliva
@ 2007-11-24 20:46           ` Bernd Schmidt
  2007-11-25  0:42             ` Alexandre Oliva
  2007-11-24 20:48           ` Richard Kenner
  2007-11-25 14:23           ` Robert Dewar
  2 siblings, 1 reply; 150+ messages in thread
From: Bernd Schmidt @ 2007-11-24 20:46 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

Alexandre Oliva wrote:

> And then, despite the consensus that GCC must not generate different
> code with and without -g, the patch that fixes one such regression has
> been lingering for months, and the patch that introduced the
> regression hasn't been reverted either.

Pointers?


Bernd

-- 
This footer brought to you by insane German lawmakers.
Analog Devices GmbH      Wilhelm-Wagenfeld-Str. 6      80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif

* Re: Designs for better debug info in GCC
  2007-11-24 20:46           ` Bernd Schmidt
@ 2007-11-25  0:42             ` Alexandre Oliva
  2007-11-25  7:19               ` Richard Guenther
  2007-11-25 14:22               ` Alexandre Oliva
  0 siblings, 2 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-25  0:42 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

On Nov 24, 2007, Bernd Schmidt <bernds_cb1@t-online.de> wrote:

> Alexandre Oliva wrote:
>> And then, despite the consensus that GCC must not generate different
>> code with and without -g, the patch that fixes one such regression has
>> been lingering for months, and the patch that introduced the
>> regression hasn't been reverted either.

> Pointers?

Regression introduced here:

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01745.html

first reported here:

http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00127.html

last proposed patch here:

http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00608.html

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25  0:42             ` Alexandre Oliva
@ 2007-11-25  7:19               ` Richard Guenther
  2007-11-25 14:30                 ` Alexandre Oliva
  2007-11-25 14:22               ` Alexandre Oliva
  1 sibling, 1 reply; 150+ messages in thread
From: Richard Guenther @ 2007-11-25  7:19 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 24, 2007 9:19 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Nov 24, 2007, Bernd Schmidt <bernds_cb1@t-online.de> wrote:
>
> > Alexandre Oliva wrote:
> >> And then, despite the consensus that GCC must not generate different
> >> code with and without -g, the patch that fixes one such regression has
> >> been lingering for months, and the patch that introduced the
> >> regression hasn't been reverted either.
>
> > Pointers?
>
> Regression introduced here:
>
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01745.html
>
> first reported here:
>
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00127.html
>
> last proposed patch here:
>
> http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00608.html

Well - it's a workaround for a bug that's elsewhere.  Generated code
shouldn't change if we allocate extra DECL_UIDs, but only possibly if we
change DECL_UID ordering.  (If that is the problem, as I remember your
analysis.)

Richard.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25  7:19               ` Richard Guenther
@ 2007-11-25 14:30                 ` Alexandre Oliva
  2007-11-25 14:46                   ` Richard Guenther
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-25 14:30 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:

> Generated code shouldn't change if we allocate extra DECL_UIDs, but
> only possibly if we change DECL_UID ordering.  (If that is the
> problem, as I remember your analysis)

That is indeed the problem, but I'm not sure your requirement is
feasible.  If we permit DECL_UID divergence, it means we can't use
DECL_UID for hashing any more.  Since they already stand for hashable
proxies for the decl pointers, I don't see what we'd gain by
introducing yet another hashable uid that's stable across -g.

What do you suggest we use for hashing?  Or do you suggest we do
away with hashing and use sorted set or map data structures?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25 14:30                 ` Alexandre Oliva
@ 2007-11-25 14:46                   ` Richard Guenther
  2007-11-26 10:11                     ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Richard Guenther @ 2007-11-25 14:46 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 25, 2007 12:28 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:
>
> > Generated code shouldn't change if we allocate extra DECL_UIDs, but
> > only possibly if we change DECL_UID ordering.  (If that is the
> > problem, as I remember your analysis)
>
> That is indeed the problem, but I'm not sure your requirement is
> feasible.  If we permit DECL_UID divergence, it means we can't use
> DECL_UID for hashing any more.  Since they already stand for hashable
> proxies for the decl pointers, I don't see what we'd gain by
> introducing yet another hashable uid that's stable across -g.
>
> What do you suggest we use for hashing?  Or do you suggest we do
> away with hashing and use sorted set or map data structures?

No, hashing is fine, but doing walks over a hashtable when your algorithm
depends on ordering is not.  I have patches to fix the instance of walking
over all referenced vars.  In the case of UIDs, the fix is to use bitmaps
and walk over the bitmap (which ensures walks in UID order).
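
To illustrate the ordering issue, here is a minimal sketch in C.  It is
not code from GCC or from this thread: the toy hash set, NBUCKETS, and
every function name below are hypothetical, and a sorted walk stands in
for a bitmap walk.  Suppose three decls get UIDs 5, 6, 7 without -g and
5, 7, 9 with -g (extra UIDs allocated in between).  The table walk then
visits them in a different order, while the UID-ordered walk does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 8   /* hypothetical toy table size */
#define EMPTY (-1)

/* Toy open-addressing hash set of non-negative UIDs, hashed as
   uid % NBUCKETS.  Only a stand-in for a real compiler hash table.  */
struct uid_set { int slot[NBUCKETS]; };

void uid_set_init(struct uid_set *s)
{
  for (size_t i = 0; i < NBUCKETS; i++)
    s->slot[i] = EMPTY;
}

void uid_set_add(struct uid_set *s, int uid)
{
  assert(uid >= 0);
  size_t i = (size_t) uid % NBUCKETS;
  for (size_t probes = 0; probes < NBUCKETS; probes++)
    {
      if (s->slot[i] == EMPTY || s->slot[i] == uid)
        {
          s->slot[i] = uid;
          return;
        }
      i = (i + 1) % NBUCKETS;   /* linear probing */
    }
  abort();                      /* table full */
}

/* Walk in slot order: the visiting order depends on the UID *values*,
   so allocating extra UIDs can reorder the walk.  */
size_t walk_table_order(const struct uid_set *s, int *out)
{
  size_t n = 0;
  for (size_t i = 0; i < NBUCKETS; i++)
    if (s->slot[i] != EMPTY)
      out[n++] = s->slot[i];
  return n;
}

static int cmp_int(const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Walk in increasing UID order: depends only on relative UID order.  */
size_t walk_uid_order(const struct uid_set *s, int *out)
{
  size_t n = walk_table_order(s, out);
  qsort(out, n, sizeof out[0], cmp_int);
  return n;
}
```

With UIDs 5, 7, 9 the slot walk starts at 9 (since 9 % 8 == 1), i.e. at
the third decl, whereas with 5, 6, 7 it starts at the first one.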

Richard.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25 14:46                   ` Richard Guenther
@ 2007-11-26 10:11                     ` Alexandre Oliva
  2007-11-26 12:26                       ` Richard Guenther
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-26 10:11 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:

> No, hashing is fine, but doing walks over a hashtable when your algorithm
> depends on ordering is not.

Point.

> I have patches to fix the instance of walking over all referenced
> vars.  In the case of UIDs, the fix is to use bitmaps and walk over
> the bitmap (which ensures walks in UID order).

Why is such memory and CPU overhead better than avoiding the
divergence of UIDs in the first place?

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-26 10:11                     ` Alexandre Oliva
@ 2007-11-26 12:26                       ` Richard Guenther
  2007-11-26 18:58                         ` Alexandre Oliva
  0 siblings, 1 reply; 150+ messages in thread
From: Richard Guenther @ 2007-11-26 12:26 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 26, 2007 7:57 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:
>
> > No, hashing is fine, but doing walks over a hashtable when your algorithm
> > depends on ordering is not.
>
> Point.
>
> > I have patches to fix the instance of walking over all referenced
> > vars.  In the case of UIDs, the fix is to use bitmaps and walk over
> > the bitmap (which ensures walks in UID order).
>
> Why is such memory and CPU overhead better than avoiding the
> divergence of UIDs in the first place?

Actually my patches should be an overall memory savings.  But, as you (and
I and others) look at bugs that happen because of UID divergence, it is
easier to use UIDs in a way that guarantees that generated code does not
change in such cases.  Otherwise what's the point in using UIDs?  If you
later do hashtable walks anyway you can hash on the pointer as well.

So, IMHO an algorithm should produce the same result if, instead of an
ordered set of UIDs M { u1, u2, u3 }, an ordered set M' { u1', u2', u3' }
is used, where element correspondence is u1 : u1', u2 : u2', u3 : u3',
independent of the actual values uN or the differences between values
uN - uM.

Anything else is a bug.  And compensating for those bugs in other places
by trying to preserve the exact values of UIDs is broken (and in this case,
as it delays memory optimization, actually bad).
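
A hedged sketch of the invariant stated above, in C.  The function names
are hypothetical and nothing here is taken from GCC: it only shows what
"depends on relative order, not on the values" can look like.  Each
decl's result is its rank among the UIDs, so any order-preserving
renumbering gives the same answer.  Distinct UIDs are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* rank_of[i] = number of UIDs strictly smaller than uids[i].
   The result depends only on the relative order of the UIDs, not on
   their values or on the gaps between them.  Assumes distinct UIDs.  */
void rank_by_uid(const int *uids, size_t n, size_t *rank_of)
{
  assert(uids != NULL && rank_of != NULL);
  for (size_t i = 0; i < n; i++)
    {
      size_t r = 0;
      for (size_t j = 0; j < n; j++)
        if (uids[j] < uids[i])
          r++;
      rank_of[i] = r;
    }
}

/* Return 1 if a[] and b[] are order-isomorphic, i.e. every pair of
   elements compares the same way in both arrays; 0 otherwise.  */
int same_relative_order(const int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if ((a[i] < a[j]) != (b[i] < b[j]))
        return 0;
  return 1;
}
```

For example, UIDs { 7, 5, 6 } and { 9, 5, 7 } are order-isomorphic and
both yield ranks { 2, 0, 1 }; an algorithm keyed on such ranks cannot
observe the extra UIDs that were allocated in between.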

Just my few euro-cents,
Richard.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-26 12:26                       ` Richard Guenther
@ 2007-11-26 18:58                         ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-26 18:58 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Bernd Schmidt, Richard Kenner, gcc-patches, gcc, iant, mark, stevenb.gcc

On Nov 26, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:

> On Nov 26, 2007 7:57 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
>> On Nov 24, 2007, "Richard Guenther" <richard.guenther@gmail.com> wrote:
>> 
>> > No, hashing is fine, but doing walks over a hashtable when your algorithm
>> > depends on ordering is not.
>> 
>> Point.
>> 
>> > I have patches to fix the instance of walking over all referenced
>> > vars.  In the case of UIDs, the fix is to use bitmaps and walk over
>> > the bitmap (which ensures walks in UID order).
>> 
>> Why is such memory and CPU overhead better than avoiding the
>> divergence of UIDs in the first place?

> Actually my patches should be an overall memory savings.

Err...  I don't see how using a bitmap in addition to a hashtable can
save memory over using only a hashtable.  Or are you saying you do
away with the hashtables?  I can see that this is possible and
desirable.

> But, as you (and I and others) look at bugs that happen because of
> UID divergence, it is easier to use UIDs in a way that guarantees
> that generated code does not change in such cases.

Agreed, this property is desirable.  But I wouldn't say it is enough.
Ensuring UIDs remain constant across compilations has helped
tremendously in locating other compilation divergences, for comparing
debug dumps becomes much easier.  So, even if we use algorithms that
don't depend on UIDs remaining constant across compilations, I believe
it is highly desirable that we keep them constant across compilations.

> Otherwise what's the point in using UIDs?

There are several different reasons for having UIDs, some of which
could be having some unique identifier for an object, even in the
presence of a moving garbage collector; being able to create
fully-ordered sets of objects; being able to easily identify objects
across a single compilation; being able to easily identify objects
even across multiple compilations; and I'm sure it's possible to come
up with other reasons that would justify the idea of UIDs on their
own.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25  0:42             ` Alexandre Oliva
  2007-11-25  7:19               ` Richard Guenther
@ 2007-11-25 14:22               ` Alexandre Oliva
  1 sibling, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-25 14:22 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

On Nov 24, 2007, Alexandre Oliva <aoliva@redhat.com> wrote:

> On Nov 24, 2007, Bernd Schmidt <bernds_cb1@t-online.de> wrote:
>> Alexandre Oliva wrote:
>>> And then, despite the consensus that GCC must not generate different
>>> code with and without -g, the patch that fixes one such regression has
>>> been lingering for months, and the patch that introduced the
>>> regression hasn't been reverted either.

>> Pointers?

> Regression introduced here:

> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01745.html

> first reported here:

> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00127.html

> last proposed patch here:

> http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00608.html

I take it back that this patch wasn't approved.  Mark had approved it
on Nov 5, I didn't want to check it in before going on a trip and,
when I returned, I forgot about the approval because it was in an
unrelated thread.  http://gcc.gnu.org/ml/gcc/2007-11/msg00139.html

I'll shortly check in that one and a bunch of others that also got
approval but that I deferred until my return.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 20:11         ` Alexandre Oliva
  2007-11-24 20:46           ` Bernd Schmidt
@ 2007-11-24 20:48           ` Richard Kenner
  2007-11-25  0:02             ` Alexandre Oliva
  2007-11-25 14:23           ` Robert Dewar
  2 siblings, 1 reply; 150+ messages in thread
From: Richard Kenner @ 2007-11-24 20:48 UTC (permalink / raw)
  To: aoliva; +Cc: gcc-patches, gcc, iant, mark, richard.guenther, stevenb.gcc

> Besides, the Ada RTS compiles differently with -g than without -g,
> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
> but me seems to care.

That's weird.  Except on Windows, VxWorks, and VMS, there's almost
no code in that file.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 20:48           ` Richard Kenner
@ 2007-11-25  0:02             ` Alexandre Oliva
  0 siblings, 0 replies; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-25  0:02 UTC (permalink / raw)
  To: Richard Kenner
  Cc: gcc-patches, gcc, iant, mark, richard.guenther, stevenb.gcc

On Nov 24, 2007, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:

>> Besides, the Ada RTS compiles differently with -g than without -g,
>> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
>> but me seems to care.

> That's weird.  Except on Windows, VxWorks, and VMS, there's almost
> no code in that file.

Yep.  On GNU/Linux, the difference is precisely that, when compiling
with -g, the variables that represent the file open modes are emitted
to the output, whereas when compiling without -g they're completely
optimized away.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 20:11         ` Alexandre Oliva
  2007-11-24 20:46           ` Bernd Schmidt
  2007-11-24 20:48           ` Richard Kenner
@ 2007-11-25 14:23           ` Robert Dewar
  2007-12-15 20:32             ` Alexandre Oliva
  2 siblings, 1 reply; 150+ messages in thread
From: Robert Dewar @ 2007-11-25 14:23 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

Alexandre Oliva wrote:

> Besides, the Ada RTS compiles differently with -g than without -g,
> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
> but me seems to care.

We certainly care about this, and appreciate efforts to fix it!
Robert Dewar. We = all the GNAT folks.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-25 14:23           ` Robert Dewar
@ 2007-12-15 20:32             ` Alexandre Oliva
  2007-12-15 21:41               ` Robert Dewar
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-12-15 20:32 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

On Nov 24, 2007, Robert Dewar <dewar@adacore.com> wrote:

> Alexandre Oliva wrote:

>> Besides, the Ada RTS compiles differently with -g than without -g,
>> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
>> but me seems to care.

> We certainly care about this, and appreciate efforts to fix it!

Should be fixed now, FWIW.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-12-15 20:32             ` Alexandre Oliva
@ 2007-12-15 21:41               ` Robert Dewar
  0 siblings, 0 replies; 150+ messages in thread
From: Robert Dewar @ 2007-12-15 21:41 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Richard Kenner, gcc-patches, gcc, iant, mark, richard.guenther,
	stevenb.gcc

Alexandre Oliva wrote:
> On Nov 24, 2007, Robert Dewar <dewar@adacore.com> wrote:
> 
>> Alexandre Oliva wrote:
> 
>>> Besides, the Ada RTS compiles differently with -g than without -g,
>>> such that compare-debug doesn't pass if you compare sysdep.o.  Nobody
>>> but me seems to care.
> 
>> We certainly care about this, and appreciate efforts to fix it!
> 
> Should be fixed now, FWIW.

Good to hear, definitely worthwhile!
That's an important invariant.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 15:08     ` Alexandre Oliva
  2007-11-24 15:18       ` Richard Kenner
@ 2007-11-24 16:45       ` Steven Bosscher
  2007-11-24 18:50         ` Alexandre Oliva
  1 sibling, 1 reply; 150+ messages in thread
From: Steven Bosscher @ 2007-11-24 16:45 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 24, 2007 5:54 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> > Apparently, you can't treat DEBUG_INSN just like any other normal
> > insn.
>
> Obviously not.  They're weaker uses than anything else.  We haven't
> had any such thing in the compiler before.

So we get a "third way".  GCC has insns and notes, and now it gets a
third object to deal with in the insns stream.  And it has to handle
this new case everywhere.  To me it seems that your approach will not
help to make GCC easier to work with and understand.  Unless there are
compelling reasons to do this, I think this is a step in the wrong
direction.

> > but for the moment I fear you're just going to see a lot of
> > duplication of ugly conditionals
>
> Your fear is understandable but not justified.  Go look at the
> patches.  x86_64-linux-gnu now bootstraps and produces exactly the
> same code with and without -fvar-tracking-assignments.  And no complex
> conditionals were needed.  The most I've needed so far was to ignore
> debug insns at certain spots.

I didn't say "complex conditionals" but ugly conditionals ;-)
I mean all the "INSN_P && ! DEBUG_INSN_P" conditionals.  There seem to
be a lot of those, and it's not immediately obvious where and when
you'd need them.

> > and bugs where such conditionals are forgotten/overlooked/missing.
>
> See above.  One of the reasons for the approach I've taken is that
> such cases will, in the worst case, cause missed optimizations, not
> incorrect compiler output.

Ah! More on that later.

> > And the benefit, well, let's just say I'm not convinced that less
> > elaborate efforts are not sufficient.
>
> Sufficient for what?  Efforts towards what?  Generating more incorrect
> debug information just for the sake of it?  Adding more debug
> information while breaking some that's just fine now?  Is that really
> progress?

Ah, there you go again with this extremist pro-debug-info stance.  How
can one argue with you when you keep ridiculing other points of view
using ridiculous arguments?  Who said anything about "generating more
incorrect information just for the sake of it"?  I don't think anyone
did.  The "for the sake of it" part is just offensive. You seem to imply
that people are arguing gcc should emit wrong debug information on
purpose.  Please step out of your own world of thoughts for a second,
and try to understand that other people can have a different but
nevertheless reasonable point of view.

I think it is impossible to get perfect debug info after very complex
code transformations.  And because of that, I also think it is
reasonable to not get perfect debug info in less complex cases.  Your
colleague expressed perfectly how I define "sufficiently good debug
info":

"It needs to be good enough
that a semi-knowledgable person or a dumb but heuristic-laden program
that processes debugging info can nevertheless extract reliable
information."
(http://gcc.gnu.org/ml/gcc/2007-11/msg00581.html)

Note how this "good enough" does not imply "correctness at all cost".

Here is another "extremist" point of view:

Correctness for an optimization algorithm means that it does not miss
optimization opportunities that it is designed to catch.  Therefore if
an optimization algorithm implementation misses an optimization that
it should catch, then this is a correctness issue.
;-)

You said you now get the same code with and without
-fvar-tracking-assignments on your branch.  Can you also prove that
the branch does not introduce new missed optimizations wrt. the latest
revision that you merged from the trunk?

Gr.
Steven

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 16:45       ` Steven Bosscher
@ 2007-11-24 18:50         ` Alexandre Oliva
  2007-11-24 20:21           ` Richard Guenther
  0 siblings, 1 reply; 150+ messages in thread
From: Alexandre Oliva @ 2007-11-24 18:50 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Mark Mitchell, Ian Lance Taylor, Richard Guenther, gcc-patches, gcc

On Nov 24, 2007, "Steven Bosscher" <stevenb.gcc@gmail.com> wrote:

> On Nov 24, 2007 5:54 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
>> > Apparently, you can't treat DEBUG_INSN just like any other normal
>> > insn.
>> 
>> Obviously not.  They're weaker uses than anything else.  We haven't
>> had any such thing in the compiler before.

> So we get a "third way".  GCC has insns and notes, and now it gets a
> third object to deal with in the insns stream.

Not quite.  It's an insn.  But it is different in some ways.  It's not
unheard of.  Asm insns are also different in some ways.  USEs and
CLOBBERs too.  Delayed-branch instruction groups too.

It would be great if infrastructure for weak uses was already in
place, but if it's needed (we haven't determined that, but I'm
convinced there's no better way) and it isn't there, then it has to be
put in.

> And it has to handle this new case everywhere.

I've already explained why this isn't true.  It's not even close to
being true.  In fact, I've chosen this representation *precisely*
because I reasoned it would lead to the least global impact.  Of
course you can refuse to believe that and point at the changes I had
to make as alleged counter-proof, failing to notice how many other
locations I haven't had to change and that just work because adjusting
other instructions after transformations is precisely what all
transformation passes already do.

> I didn't say "complex conditionals" but ugly conditionals ;-)
> I mean all the "INSN_P && ! DEBUG_INSN_P" conditionals.

Oh, that's easy: NON_DEBUG_INSN_P can simplify that.  There are, what,
a few dozen such tests in the compiler right now, compared with
the hundreds of tests for INSN_P and a few tens of tests for
DEBUG_INSN_P.  I didn't think it was worth creating yet another macro,
but if you find this so unacceptable, maybe I can rework it.

Would you prefer NON_DEBUG_INSN_P, or would you prefer the original
INSN_P and all uses thereof to be spelled differently, just to keep
the few objectionable INSN_P && ! DEBUG_INSN_P tests more beautiful?
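
For readers following along, a toy sketch in C of what such a macro
could look like.  The struct, enum, and macro definitions below are
hypothetical stand-ins written for this illustration, not the
definitions on the branch.  It also shows the failure mode discussed
earlier: a test that forgets to skip debug insns counts a debug-only
use as a real one, so a dead register is kept alive (a missed
optimization, and code that differs with -g).

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn stream; hypothetical, much simpler than real RTL.  */
enum toy_code { TOY_NOTE, TOY_INSN, TOY_DEBUG_INSN };
struct toy_insn { enum toy_code code; int uses_reg; };

/* Toy predicates modelled on the names used in this thread.  In this
   model a debug insn is an insn, just one with "weaker" uses.  */
#define INSN_P(X)           ((X)->code == TOY_INSN \
                             || (X)->code == TOY_DEBUG_INSN)
#define DEBUG_INSN_P(X)     ((X)->code == TOY_DEBUG_INSN)
#define NON_DEBUG_INSN_P(X) (INSN_P (X) && !DEBUG_INSN_P (X))

/* Count uses of REG that must affect code generation.  */
size_t count_real_uses(const struct toy_insn *stream, size_t n, int reg)
{
  assert(stream != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (NON_DEBUG_INSN_P (&stream[i]) && stream[i].uses_reg == reg)
      count++;
  return count;
}

/* The "forgotten conditional" variant: debug uses are counted too.  */
size_t count_all_uses(const struct toy_insn *stream, size_t n, int reg)
{
  assert(stream != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (INSN_P (&stream[i]) && stream[i].uses_reg == reg)
      count++;
  return count;
}
```

If register 2 is referenced only by a debug insn, count_real_uses
reports zero uses while count_all_uses reports one.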

>> Sufficient for what?  Efforts towards what?  Generating more incorrect
>> debug information just for the sake of it?  Adding more debug
>> information while breaking some that's just fine now?  Is that really
>> progress?

> Ah, there you go again with this extremist pro-debug-info stance.  How
> can one argue with you when you keep ridiculing other points of view
> using ridiculous arguments?  Who said anything about "generating more
> incorrect information just for the sake of it"?

Getting even the trivial cases wrong and dismissing those without
realizing how things would fall apart in the big picture looks like
"generating more incorrect information just for the sake of it" to me.
Now, maybe it's not.  Maybe it's just human behavior, a wish that some
simpler solution will take care of a problem and that the simple
counter-examples I've pointed out are rare situations.  I don't see
that they are.  I've put a lot of thought into this problem, I've been
working on it for quite a long time, and I've fallen into many of the
traps that I pointed out, and avoided several others.

I realize I come off as arrogant when I feel cornered by a majority
that obviously hasn't spent enough time on the issue to realize the
obvious-to-me major problems with the alternatives that are on the
table.  I realize in such situations I often react in ways that are
detrimental to the points I'm trying to make.  I realize this doesn't
help.  I hope people can see through the mess of proposal-name-calling
that this is turning into.

> The "for the sake of it" part is just offensive.

I agree, and I apologize for that.  It's been a very frustrating
debate.

> You seem to imply that people are arguing gcc should emit wrong debug
> information on purpose.

That's how it feels to me when the claims come up that it's not a
matter of correctness, or that it's not important to get it right.

> Your colleague expressed perfectly how I define "sufficiently good
> debug info":

> "It needs to be good enough
> that a semi-knowledgable person or a dumb but heuristic-laden program
> that processes debugging info can nevertheless extract reliable
> information."
> (http://gcc.gnu.org/ml/gcc/2007-11/msg00581.html)

I'm very happy you agree with him.  Unfortunately, you appear to be
focusing on the sloppiness afforded by the wording "good enough", and
assuming that this can be pushed beyond the point of "extract
*reliable* information", which is the key operative qualifier here.

If it's "good enough" for other purposes, but it's not possible to
"extract reliable information from debugging info", then we don't
satisfy the predicate above.

That's why I'm aiming at correctness (it's reliable) rather than
completeness (optimizations can discard stuff).

> Here is another "extremist" point of view:

> Correctness for an optimization algorithm means that it does not miss
> optimization opportunities that it is designed to catch.  Therefore if
> an optimization algorithm implementation misses an optimization that
> it should catch, then this is a correctness issue.
> ;-)

I happen to agree, indeed, but it's a correctness issue of the
implementation, not a correctness issue of the compiler output, which
is what I'm talking about when I speak of correctness issues.

> You said you now get the same code with and without
> -fvar-tracking-assignments on your branch.  Can you also prove that
> the branch does not introduce new missed optimizations wrt. the latest
> revision that you merged from the trunk?

I could, and that's a very good idea (thanks!), but it will be easier
to do that after my next merge, when there won't be fixes for missed
optimizations, that I detected with my testing, missing from the
baseline.

After all such missed optimizations are in the trunk, I intend to
merge that into the branch and compare mergepoint and branch for
compiler output changes other than in debug information.  If there are
any changes (extremely unlikely), these are bugs that I'll have to
fix.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Designs for better debug info in GCC
  2007-11-24 18:50         ` Alexandre Oliva
@ 2007-11-24 20:21           ` Richard Guenther
  0 siblings, 0 replies; 150+ messages in thread
From: Richard Guenther @ 2007-11-24 20:21 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Steven Bosscher, Mark Mitchell, Ian Lance Taylor, gcc-patches, gcc

On Nov 24, 2007 4:00 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Nov 24, 2007, "Steven Bosscher" <stevenb.gcc@gmail.com> wrote:
>
> > And it has to handle this new case everywhere.
>
> I've already explained why this isn't true.  It's not even close to
> being true.  In fact, I've chosen this representation *precisely*
> because I reasoned it would lead to the least global impact.  Of
> course you can refuse to believe that and point at the changes I had
> to make as alleged counter-proof, failing to notice how many other
> locations I haven't had to change and that just work because adjusting
> other instructions after transformations is precisely what all
> transformation passes already do.

It also makes some things easier - for example during inlining of a function
body we re-map all DECLs in the inlined copy.  With an on-the-side
representation you have to make sure to apply the same mapping explicitly;
with DEBUG_INSNs the mapping is automatically done during the copying
of the IL.  A similar problem arises when SSA_NAME definition points are
used to store information and the renamer is run on a variable that already
has SSA_NAMEs (which is IMHO bogus, as we do not detect the erroneous
case of overlapping live ranges - but ignore that for now) - in this case you
need some magic to transfer the on-the-side debug information from the
old SSA_NAMEs to the new ones (where possible).
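
A small C sketch of the inlining point, under loud assumptions: the toy
statement type, the decl ids, and both functions are hypothetical and
bear no relation to the real inliner.  When the debug binding lives in
the statement stream, the one copy loop that remaps DECLs also remaps
it; information kept in a side table needs its own, separate remapping
step that is easy to forget.

```c
#include <assert.h>
#include <stddef.h>

/* Toy IL: each statement refers to one decl by id.  */
enum toy_kind { TOY_ASSIGN, TOY_DEBUG_BIND };
struct toy_stmt { enum toy_kind kind; int decl; };

/* Copy N statements, remapping every decl id through REMAP (indexed
   by old decl id, N_DECLS entries).  Debug bindings in the stream are
   remapped by the same loop as ordinary statements.  */
void copy_body(const struct toy_stmt *src, size_t n, struct toy_stmt *dst,
               const int *remap, size_t n_decls)
{
  for (size_t i = 0; i < n; i++)
    {
      assert(src[i].decl >= 0 && (size_t) src[i].decl < n_decls);
      dst[i].kind = src[i].kind;
      dst[i].decl = remap[src[i].decl];
    }
}

/* On-the-side representation: debug info kept as a separate array of
   decl ids.  copy_body never sees it, so it needs an explicit pass.  */
void remap_side_table(const int *side, size_t n, int *side_copy,
                      const int *remap, size_t n_decls)
{
  for (size_t i = 0; i < n; i++)
    {
      assert(side[i] >= 0 && (size_t) side[i] < n_decls);
      side_copy[i] = remap[side[i]];
    }
}
```

Skipping the remap_side_table call leaves the side table pointing at
the callee's original decls, which is the kind of bookkeeping the
in-stream representation avoids.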

Just to mention a few problems we are running into ;)

Richard.

^ permalink raw reply	[flat|nested] 150+ messages in thread

end of thread, other threads:[~2007-12-31 14:25 UTC | newest]

Thread overview: 150+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-05  8:28 [vta] don't let debug insns get in the way of simple vect reduction Alexandre Oliva
2007-11-05 11:27 ` Richard Guenther
2007-11-07  7:52   ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Alexandre Oliva
2007-11-07 16:16     ` Ian Lance Taylor
2007-11-07 19:11       ` Designs for better debug info in GCC Alexandre Oliva
2007-11-07 22:57         ` Ian Lance Taylor
2007-11-07 23:05           ` Daniel Jacobowitz
2007-11-08  0:00           ` Mark Mitchell
2007-11-08  0:15             ` David Edelsohn
2007-11-08  0:35               ` Mark Mitchell
2007-11-08  5:14                 ` Alexandre Oliva
2007-11-08 18:28                   ` Alexandre Oliva
2007-11-22 23:07                 ` Frank Ch. Eigler
2007-11-22 23:13                   ` Richard Guenther
2007-11-23 20:53                     ` Frank Ch. Eigler
2007-11-24  1:53                       ` Alexandre Oliva
2007-11-24 15:02                     ` Robert Dewar
2007-11-08  5:15               ` Alexandre Oliva
2007-11-08 18:18                 ` Alexandre Oliva
2007-11-08 19:46                 ` Andrew Pinski
2007-11-08 20:39                   ` Alexandre Oliva
2007-11-09  8:39                   ` Robert Dewar
2007-11-08  5:44             ` Alexandre Oliva
2007-11-08 18:37               ` Alexandre Oliva
2007-11-08 19:13               ` Mark Mitchell
2007-11-08 19:13                 ` David Daney
2007-11-08 19:17                   ` Mark Mitchell
2007-11-09  2:09                 ` Alexandre Oliva
2007-11-12  4:49                   ` Mark Mitchell
2007-11-12 18:45                     ` Alexandre Oliva
2007-11-12 18:49                       ` Joe Buck
2007-11-25  6:57                         ` Alexandre Oliva
2007-11-25 12:09                           ` Richard Kenner
2007-11-12 18:53                       ` Ian Lance Taylor
2007-11-24  2:12                         ` Alexandre Oliva
2007-11-13 10:30                       ` Mark Mitchell
2007-11-24  1:54                         ` Alexandre Oliva
2007-11-13 15:30                       ` Michael Matz
2007-11-24  2:00                         ` Alexandre Oliva
2007-11-26 21:01                           ` Michael Matz
2007-11-27  5:31                             ` Alexandre Oliva
2007-11-27 20:31                               ` Michael Matz
2007-11-27 21:44                                 ` Alexandre Oliva
2007-11-08  9:54             ` Richard Guenther
2007-11-08  5:01           ` Alexandre Oliva
2007-11-08 18:15             ` Alexandre Oliva
2007-11-08 19:13             ` Ian Lance Taylor
2007-11-08 20:27               ` Alexandre Oliva
2007-11-08 21:26                 ` Ian Lance Taylor
2007-11-09  9:53                   ` Robert Dewar
2007-11-12  5:36                     ` Mark Mitchell
2007-11-12 17:34                       ` Alexandre Oliva
2007-11-12 17:54                         ` Mark Mitchell
2007-11-24  1:55                           ` Alexandre Oliva
2007-11-26  1:08                             ` Mark Mitchell
2007-12-05 14:22                               ` Diego Novillo
2007-12-05 22:10                                 ` Joe Buck
2007-12-15 21:41                                 ` Alexandre Oliva
2007-12-16  3:15                                   ` Daniel Berlin
2007-12-16 13:09                                     ` Alexandre Oliva
2007-12-17  1:27                                       ` Daniel Berlin
2007-12-17  4:20                                         ` Joe Buck
2007-12-17  8:13                                           ` Geert Bosch
2007-12-18  1:24                                             ` Alexandre Oliva
2007-12-18  1:29                                               ` Joe Buck
2007-12-18  4:40                                                 ` Alexandre Oliva
2007-12-18  7:42                                                   ` Robert Dewar
2007-12-18  8:09                                                     ` Alexandre Oliva
2007-12-18 14:01                                                       ` Robert Dewar
2007-12-18 21:20                                                         ` Alexandre Oliva
2007-12-18  7:35                                               ` Robert Dewar
2007-12-18  8:34                                                 ` Alexandre Oliva
2007-12-17 18:36                                           ` Alexandre Oliva
2007-12-17 17:59                                         ` Alexandre Oliva
2007-12-17 18:02                                           ` Diego Novillo
2007-12-17 20:34                                             ` Alexandre Oliva
2007-12-17 20:45                                               ` Diego Novillo
2007-12-18  1:02                                                 ` Alexandre Oliva
2007-12-18  1:14                                                   ` Diego Novillo
2007-12-18  5:21                                                     ` Alexandre Oliva
2007-12-18  9:10                                                       ` Alexandre Oliva
2007-12-18 13:20                                                         ` Diego Novillo
2007-12-18 15:42                                                           ` Alexandre Oliva
2007-12-18 22:43                                                         ` Daniel Berlin
2007-12-19  6:07                                                           ` Alexandre Oliva
2007-12-19  8:39                                                             ` Daniel Berlin
2007-12-19 16:12                                                               ` Daniel Berlin
2007-12-19 16:36                                                                 ` Andrew MacLeod
2007-12-19 19:49                                                                   ` Daniel Berlin
2007-12-19 20:00                                                                 ` Andrew MacLeod
2007-12-19 20:57                                                                   ` Daniel Berlin
2007-12-19 20:07                                                                 ` Alexandre Oliva
2007-12-19 22:00                                                                   ` Daniel Berlin
2007-12-20  9:26                                                                     ` Alexandre Oliva
2007-12-20 17:04                                                                       ` Ian Lance Taylor
2007-12-20 20:53                                                                         ` Alexandre Oliva
2007-12-19 20:27                                                               ` Alexandre Oliva
2007-12-18 23:35                                                         ` Daniel Berlin
2007-12-19  5:50                                                           ` Alexandre Oliva
2007-12-19 16:35                                                             ` Daniel Berlin
2007-12-19 19:46                                                               ` Alexandre Oliva
2007-12-19 20:39                                                                 ` Daniel Jacobowitz
2007-12-31 15:40                                               ` Richard Guenther
2007-12-16 21:42                                   ` Mark Mitchell
2007-11-09  9:55                   ` Seongbae Park (박성배, 朴成培)
2007-11-09 11:08                     ` Robert Dewar
2007-11-08  8:58           ` Paolo Bonzini
2007-11-07 17:20     ` Designs for better debug info in GCC (was: Re: [vta] don't let debug insns get in the way of simple vect reduction) Michael Matz
2007-11-07 18:45       ` Designs for better debug info in GCC Alexandre Oliva
2007-11-08 10:23         ` Michael Matz
2007-11-08 14:02           ` Robert Dewar
2007-11-08 15:13             ` H.J. Lu
2007-11-08 16:11             ` Michael Matz
2007-11-08 17:48               ` Alexandre Oliva
2007-11-09 12:46                 ` Michael Matz
2007-11-12 18:31                   ` Alexandre Oliva
2007-11-13 13:56                     ` Michael Matz
2007-11-24  2:34                       ` Alexandre Oliva
2007-11-26 20:56                         ` Michael Matz
2007-11-27  5:30                           ` Alexandre Oliva
2007-11-08 16:37             ` Alexandre Oliva
2007-11-09  1:26               ` Joe Buck
2007-11-09 14:53                 ` Daniel Jacobowitz
2007-11-09 17:06                   ` Robert Dewar
2007-11-09  1:26               ` Robert Dewar
2007-11-12 16:56                 ` Alexandre Oliva
2007-11-08 16:32           ` Alexandre Oliva
2007-11-12 21:04 Steven Bosscher
2007-11-24  1:37 ` Alexandre Oliva
2007-11-24  2:35   ` Steven Bosscher
2007-11-24 15:08     ` Alexandre Oliva
2007-11-24 15:18       ` Richard Kenner
2007-11-24 20:11         ` Alexandre Oliva
2007-11-24 20:46           ` Bernd Schmidt
2007-11-25  0:42             ` Alexandre Oliva
2007-11-25  7:19               ` Richard Guenther
2007-11-25 14:30                 ` Alexandre Oliva
2007-11-25 14:46                   ` Richard Guenther
2007-11-26 10:11                     ` Alexandre Oliva
2007-11-26 12:26                       ` Richard Guenther
2007-11-26 18:58                         ` Alexandre Oliva
2007-11-25 14:22               ` Alexandre Oliva
2007-11-24 20:48           ` Richard Kenner
2007-11-25  0:02             ` Alexandre Oliva
2007-11-25 14:23           ` Robert Dewar
2007-12-15 20:32             ` Alexandre Oliva
2007-12-15 21:41               ` Robert Dewar
2007-11-24 16:45       ` Steven Bosscher
2007-11-24 18:50         ` Alexandre Oliva
2007-11-24 20:21           ` Richard Guenther
