Linemap and pph

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Linemap and pph
@ 2011-07-22 20:40 Gabriel Charette
  2011-07-22 21:15 ` Tom Tromey
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Charette @ 2011-07-22 20:40 UTC (permalink / raw)
  To: Diego Novillo, tromey; +Cc: Lawrence Crowl, gcc, Collin Winter

Hi,

@tromey: We have a question for you: the problem is detailed here and
our question to you is at the bottom. Thanks!

So the line problem we are having in pph is much bigger then we
originally thought.

The problem is:
As we've had issues with before: the chain of name bindings in any
cpp_bindings is inserted at the head, such that when we stream it back
in, the last one put on the chain by the parser is read first.

This is a problem in the linemap case because when we read the last
name, we also read it's input_location (line and column) and try to
replay that location by calling linemap_line_start (so far so
good....). This issue is that linemap_line_start EXPECTS to be called
with lines in increasing orders (but since the bindings are backwards
in the chain we actually call it with lines in a decreasing order),
the logic in linemap_line_start is such that if the line# is less then
the previous line# it was called with, it assumes this must be a new
file and creates a new file entry in the line_table by calling
linemap_add (which in our case is WRONG!!).

From a comment found in libcpp/line-map.c, linemaps clearly assume
this fact: "Since the set (line_table) is built chronologically, the
logical lines are monotonic increasing, and so the list is sorted and
we can use a binary search."

@crowl: Maybe this is the issue with resume_scope?? I'm just thinking
that since it seems lookups are fairly tied to the line_table, and the
fact that it should be built in a monotonically increasing order.

In the past we have solved similar issues (caused by the backwards
ordering), by replaying whatever it was in the correct order (before
doing anything else), and then simply loading references using the
pph_cache when the real ones were needed. BUT we can't do this with
source_locations as they are a simple typedef unsigned int, not actual
structs which are passed by pointer value...

Potential solution? :
I thought of a few ways to solve this:
1) Catch the calls to the linemap functions when generating the pph,
store them with the pph and replay them first before anything else
when reading the pph file.
The problem with that is that to when trying to obtain the source
location for a given binding streamed in: after calling
linemap_line_start when needed, we call linemap_position_for_column
which ASSUMES that we are currently on the last line which
linemap_line_start was called with (and thus doing the full replay
first would not really work here, is there another function we can
call that can take a line as an argument as well as a column??)

2) We could potentially output a reference to the source_location and
use that to do what we always do using the cache in this sort of
situation, however, we would need to figure out exactly which integer
we need to output a reference of (since they are passed around by
value...). This would be kinda hacky, and would also imply building
some hackery to go around the fact that the lto version of the input
location doesn't need to use references (i.e. we would need a streamer
hook, but two different functions signatures...). I don't think this
is a nice way of going about it, but could be an option...

@tromey: I hear you are the person in the know when it comes down to
linemaps, do you have any hint on how we could rebuild a valid
line_table given we are obtaining the bindings from the last one in
the file to the first one (that is, using pph (pre-parsed headers)
where we are trying to restore a cached version of the parser state
for a header)?

Thanks,
Gabriel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-22 20:40 Linemap and pph Gabriel Charette
@ 2011-07-22 21:15 ` Tom Tromey
  2011-07-22 22:12   ` Gabriel Charette
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Tromey @ 2011-07-22 21:15 UTC (permalink / raw)
  To: Gabriel Charette; +Cc: Diego Novillo, Lawrence Crowl, gcc, Collin Winter

>>>>> "Gabriel" == Gabriel Charette <gchare@google.com> writes:

Gabriel> @tromey: We have a question for you: the problem is detailed
Gabriel> here and our question to you is at the bottom. Thanks!

Sure.  I have not followed PPH progress very closely and after reading
your message I am not sure I have much to offer.  I'll give it a stab
anyhow.

Gabriel> This is a problem in the linemap case because when we read the last
Gabriel> name, we also read it's input_location (line and column) and try to
Gabriel> replay that location by calling linemap_line_start (so far so
Gabriel> good....). This issue is that linemap_line_start EXPECTS to be called
Gabriel> with lines in increasing orders (but since the bindings are backwards
Gabriel> in the chain we actually call it with lines in a decreasing order),
Gabriel> the logic in linemap_line_start is such that if the line# is less then
Gabriel> the previous line# it was called with, it assumes this must be a new
Gabriel> file and creates a new file entry in the line_table by calling
Gabriel> linemap_add (which in our case is WRONG!!).

My belief is that linemap was designed to support the typical case for a
C and C++ compiler.  However, this is not sacrosanct; you could perhaps
change the implementation to work better for your case.  I suspect,
though, that this won't be easy -- and also be aware of Dodji's patch
series, which complicates linemap.

Gabriel> In the past we have solved similar issues (caused by the backwards
Gabriel> ordering), by replaying whatever it was in the correct order (before
Gabriel> doing anything else), and then simply loading references using the
Gabriel> pph_cache when the real ones were needed. BUT we can't do this with
Gabriel> source_locations as they are a simple typedef unsigned int, not actual
Gabriel> structs which are passed by pointer value...

A source_location is a reference to a particular line_map.  It is sort
of like a compressed pointer.

Gabriel> @tromey: I hear you are the person in the know when it comes down to
Gabriel> linemaps, do you have any hint on how we could rebuild a valid
Gabriel> line_table given we are obtaining the bindings from the last one in
Gabriel> the file to the first one (that is, using pph (pre-parsed headers)
Gabriel> where we are trying to restore a cached version of the parser state
Gabriel> for a header)?

Can you not just serialize the line map when creating the PPH?

Then, when using the PPH, read the serialized line map, in order, into
the current global line map.  This will preserve the include chains and
all the rest.  Then rewrite source locations coming from the PPH to new
locations from the new line map.  This rewriting could perhaps even be
done efficiently, say just by adding an offset to all locations --
though I am not sure, this would require some research.  (It seems
doable to me, though; perhaps, but not definitely, requiring some
additional linemap API.)

I have no idea if this is practical for you.  I guess it depends on how
a PPH is read in.

Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-22 21:15 ` Tom Tromey
@ 2011-07-22 22:12   ` Gabriel Charette
  2011-07-23  5:46     ` Tom Tromey
  2011-07-23 17:54     ` Dodji Seketeli
  0 siblings, 2 replies; 15+ messages in thread
From: Gabriel Charette @ 2011-07-22 22:12 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Diego Novillo, Lawrence Crowl, gcc, Collin Winter

Thanks for the quick reply Tom,

On Fri, Jul 22, 2011 at 1:39 PM, Tom Tromey <tromey@redhat.com> wrote:
>>>>>> "Gabriel" == Gabriel Charette <gchare@google.com> writes:
>
> Gabriel> @tromey: We have a question for you: the problem is detailed
> Gabriel> here and our question to you is at the bottom. Thanks!
>
> Sure.  I have not followed PPH progress very closely and after reading
> your message I am not sure I have much to offer.  I'll give it a stab
> anyhow.
>
> Gabriel> This is a problem in the linemap case because when we read the last
> Gabriel> name, we also read it's input_location (line and column) and try to
> Gabriel> replay that location by calling linemap_line_start (so far so
> Gabriel> good....). This issue is that linemap_line_start EXPECTS to be called
> Gabriel> with lines in increasing orders (but since the bindings are backwards
> Gabriel> in the chain we actually call it with lines in a decreasing order),
> Gabriel> the logic in linemap_line_start is such that if the line# is less then
> Gabriel> the previous line# it was called with, it assumes this must be a new
> Gabriel> file and creates a new file entry in the line_table by calling
> Gabriel> linemap_add (which in our case is WRONG!!).
>
> My belief is that linemap was designed to support the typical case for a
> C and C++ compiler.  However, this is not sacrosanct; you could perhaps
> change the implementation to work better for your case.  I suspect,
> though, that this won't be easy -- and also be aware of Dodji's patch
> series, which complicates linemap.
>

We are tying to keep pph as "pluginable" as possible (Diego correct me
if I'm wrong), so changing the actual implementation of the linemap
would be our very last resort I think.

> Gabriel> In the past we have solved similar issues (caused by the backwards
> Gabriel> ordering), by replaying whatever it was in the correct order (before
> Gabriel> doing anything else), and then simply loading references using the
> Gabriel> pph_cache when the real ones were needed. BUT we can't do this with
> Gabriel> source_locations as they are a simple typedef unsigned int, not actual
> Gabriel> structs which are passed by pointer value...
>
> A source_location is a reference to a particular line_map.  It is sort
> of like a compressed pointer.
>

Right, but our cache works in a way that, say you output a refrenece
to a struct, you first remember the pointer value, then output the
struct, if you try to ouput the same reference (same pointer value)
later we simply mark that as a shared reference in the output (and
have logic to link the two on the way in).

However since source_location aren't pointers per se, this wouldn't
work (at least with our current cache implementation, and changing
that is also last resort in my opinion)

> Gabriel> @tromey: I hear you are the person in the know when it comes down to
> Gabriel> linemaps, do you have any hint on how we could rebuild a valid
> Gabriel> line_table given we are obtaining the bindings from the last one in
> Gabriel> the file to the first one (that is, using pph (pre-parsed headers)
> Gabriel> where we are trying to restore a cached version of the parser state
> Gabriel> for a header)?
>
> Can you not just serialize the line map when creating the PPH?
>

We were wondering about that, the problem we thought is that a pph can
be loaded from anywhere in many different C files (which would
generate a different linemap entry in each case I think?). If there
was a way to serialize the linemap entries from the LC_ENTER entry for
the header file to the LC_LEAVE (i.e. ignoring builtins, command line,
etc.), and then directly insert those entries in a given C file
compilation's linemaps entries, that would be great!

> Then, when using the PPH, read the serialized line map, in order, into
> the current global line map.  This will preserve the include chains and
> all the rest.  Then rewrite source locations coming from the PPH to new
> locations from the new line map.  This rewriting could perhaps even be
> done efficiently, say just by adding an offset to all locations --
> though I am not sure, this would require some research.  (It seems
> doable to me, though; perhaps, but not definitely, requiring some
> additional linemap API.)
>

I will investigate in that direction.

> I have no idea if this is practical for you.  I guess it depends on how
> a PPH is read in.
>

A pph is read in by: reading the preprocessor tokens and replaying
them (i.e. redefining them as if they had been pasted in the C file at
the very location of the include triggering the pph). We then load the
bindings attached to the global_namespace defined in the header and
all bindings are recursively loaded in that fashion, the only special
fact to be aware of regarding the linemap issue is that those bindings
(by the very nature of the parser logic happening before we cache the
parser's state into the pph), get to us in the reverse order they were
originally seen in the header file (thus creating this linemap
problem).

Thank you very much,
Gabriel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-22 22:12   ` Gabriel Charette
@ 2011-07-23  5:46     ` Tom Tromey
  2011-07-23 17:54     ` Dodji Seketeli
  1 sibling, 0 replies; 15+ messages in thread
From: Tom Tromey @ 2011-07-23  5:46 UTC (permalink / raw)
  To: Gabriel Charette; +Cc: Diego Novillo, Lawrence Crowl, gcc, Collin Winter

Gabriel> We are tying to keep pph as "pluginable" as possible (Diego correct me
Gabriel> if I'm wrong), so changing the actual implementation of the linemap
Gabriel> would be our very last resort I think.

Gabriel> However since source_location aren't pointers per se, this wouldn't
Gabriel> work (at least with our current cache implementation, and changing
Gabriel> that is also last resort in my opinion)

I think something's got to give :)

Tom> Can you not just serialize the line map when creating the PPH?

Gabriel> We were wondering about that, the problem we thought is that a pph can
Gabriel> be loaded from anywhere in many different C files (which would
Gabriel> generate a different linemap entry in each case I think?). If there
Gabriel> was a way to serialize the linemap entries from the LC_ENTER entry for
Gabriel> the header file to the LC_LEAVE (i.e. ignoring builtins, command line,
Gabriel> etc.), and then directly insert those entries in a given C file
Gabriel> compilation's linemaps entries, that would be great!

Sure, I think you could probably serialize just some subset of the
linemap.  It is hard to be positive, as nobody has ever tried it, and I
don't know enough about your needs to suggest what subsetting might make
sense.

The question then is whether you are certain that the other objects you
write out are guaranteed not to reference locations outside this set.

Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-22 22:12   ` Gabriel Charette
  2011-07-23  5:46     ` Tom Tromey
@ 2011-07-23 17:54     ` Dodji Seketeli
  2011-07-25 19:34       ` Gabriel Charette
  1 sibling, 1 reply; 15+ messages in thread
From: Dodji Seketeli @ 2011-07-23 17:54 UTC (permalink / raw)
  To: Gabriel Charette
  Cc: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter

Gabriel Charette <gchare@google.com> a écrit:

>> Gabriel> @tromey: I hear you are the person in the know when it comes down to
>> Gabriel> linemaps, do you have any hint on how we could rebuild a valid
>> Gabriel> line_table given we are obtaining the bindings from the last one in
>> Gabriel> the file to the first one (that is, using pph (pre-parsed headers)
>> Gabriel> where we are trying to restore a cached version of the parser state
>> Gabriel> for a header)?
>>
>> Can you not just serialize the line map when creating the PPH?
>>
>
> We were wondering about that, the problem we thought is that a pph can
> be loaded from anywhere in many different C files (which would
> generate a different linemap entry in each case I think?).

Just to make sure I understand, a given header included from two
different main C files with two different sets of macro context would
yield two macro maps with different contents anyway.  So I would believe
that these would yield two different pph, one for each C file.  Is that
correct?

For the cases where the same pph could be loaded from two different main
CUs, I'd say that in the context of a given main CU being parsed, loading
the PPH and its associated serialized line map would yield a new line
map to be inserted into the struct linemaps of the current CU.

Then, not only should the source_locations encoded by that new
de-serialized line map be rewritten (probably by modifying things like
struct linemap::to_line and struct linemap::included_from, if need be)
to fit into the source_location space of the line map set carried by the
struct linemaps of the current main CU being parsed, but the
source_locations carried by the tokens present inside the pph must also
be changed to look as if they were yielded by the newly rewritten line
map.  This is what could look expensive at first sight.

Wouldn't that address the issue of a given pph loaded by multiple main C
files?

> If there was a way to serialize the linemap entries from the LC_ENTER
> entry for the header file to the LC_LEAVE (i.e. ignoring builtins,
> command line, etc.)

I believe all the builtins have the same source_location, which is the
BUILTIN_LOCATION constant.  As for command line options, I believe they
get the location UNKNOWN_LOCATION.  So I would think these should not be
a problem when de-serializing the line map.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-23 17:54     ` Dodji Seketeli
@ 2011-07-25 19:34       ` Gabriel Charette
  2011-07-25 22:55         ` Gabriel Charette
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Charette @ 2011-07-25 19:34 UTC (permalink / raw)
  To: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter,
	Gabriel Charette, Dodji Seketeli

On Sat, Jul 23, 2011 at 8:51 AM, Dodji Seketeli <dodji@seketeli.org> wrote:
> Gabriel Charette <gchare@google.com> a écrit:
>
>>> Gabriel> @tromey: I hear you are the person in the know when it comes down to
>>> Gabriel> linemaps, do you have any hint on how we could rebuild a valid
>>> Gabriel> line_table given we are obtaining the bindings from the last one in
>>> Gabriel> the file to the first one (that is, using pph (pre-parsed headers)
>>> Gabriel> where we are trying to restore a cached version of the parser state
>>> Gabriel> for a header)?
>>>
>>> Can you not just serialize the line map when creating the PPH?
>>>
>>
>> We were wondering about that, the problem we thought is that a pph can
>> be loaded from anywhere in many different C files (which would
>> generate a different linemap entry in each case I think?).
>
> Just to make sure I understand, a given header included from two
> different main C files with two different sets of macro context would
> yield two macro maps with different contents anyway.  So I would believe
> that these would yield two different pph, one for each C file.  Is that
> correct?
>

Before using a pph, we make sure the current macro context is the same
as the one in which it was originally compiled (as far as what that
header is using from that context). So yes, sometimes we have
different pph for different C files, but it's very possible to use the
same pph for different includes in independent C files.

> For the cases where the same pph could be loaded from two different main
> CUs, I'd say that in the context of a given main CU being parsed, loading
> the PPH and its associated serialized line map would yield a new line
> map to be inserted into the struct linemaps of the current CU.
>
> Then, not only should the source_locations encoded by that new
> de-serialized line map be rewritten (probably by modifying things like
> struct linemap::to_line and struct linemap::included_from, if need be)
> to fit into the source_location space of the line map set carried by the
> struct linemaps of the current main CU being parsed, but the
> source_locations carried by the tokens present inside the pph must also
> be changed to look as if they were yielded by the newly rewritten line
> map.  This is what could look expensive at first sight.
>

Right this is one of the option, but as you say, potentially an
expensive change.

> Wouldn't that address the issue of a given pph loaded by multiple main C
> files?
>

If it works, yes, I think.

>> If there was a way to serialize the linemap entries from the LC_ENTER
>> entry for the header file to the LC_LEAVE (i.e. ignoring builtins,
>> command line, etc.)
>
> I believe all the builtins have the same source_location, which is the
> BUILTIN_LOCATION constant.  As for command line options, I believe they
> get the location UNKNOWN_LOCATION.  So I would think these should not be
> a problem when de-serializing the line map.
>

Right, but no point serializing them only to filter them on input.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-25 19:34       ` Gabriel Charette
@ 2011-07-25 22:55         ` Gabriel Charette
  2011-07-26 14:46           ` Dodji Seketeli
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Charette @ 2011-07-25 22:55 UTC (permalink / raw)
  To: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter,
	Gabriel Charette, Dodji Seketeli
  Cc: Cary Coutant

Alright, so after looking even more at the linemap code, here is what
I'm thinking now:

So the main problem is:
with the linemap logic is that linemap_line_start adds a new table
entry if line_delta < 0 (I don't understand why this is needed, my
assumption is that it is because the rest of the logic depends on
set->highest_location, thus we want to make sure we are higher than
the highest so far? can someone clarify why we need this?)

My solution:
I will add a boolean flag to linemap_line_start's parameters,
allowEarlierStartLineStart, which when true, will not create a new
entry in the line_table even if the line_delta < 0.

Instead of relying on highest_location to find the current line's
source location, it will use the start_location and add to it the
delta between to_line and map->to_line).

I will also add a new linemap function,
linemap_position_for_line_and_column, which will be just like
linemap_position_for_column, except that it won't assume the
highest_line in the set is the one we are currently on (this function
will thus allow us to get a source_location without depending on the
current assumptions on the order these functions are called in).

This solution would not modify the actual final result of the entries,
at the end of parsing, as far as I understand (I would also of course
only use this new flag in the call from lto_input_location (which we
use in pph to get our source_location from the streamed
expanded_location))

Let me know what you think,
Gabriel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-25 22:55         ` Gabriel Charette
@ 2011-07-26 14:46           ` Dodji Seketeli
  2011-07-26 18:49             ` Gabriel Charette
  0 siblings, 1 reply; 15+ messages in thread
From: Dodji Seketeli @ 2011-07-26 14:46 UTC (permalink / raw)
  To: Gabriel Charette
  Cc: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter,
	Cary Coutant

Gabriel Charette <gchare@google.com> a écrit:

> Alright, so after looking even more at the linemap code, here is what
> I'm thinking now:
>
> So the main problem is:
> with the linemap logic is that linemap_line_start adds a new table
> entry if line_delta < 0 (I don't understand why this is needed, my
> assumption is that it is because the rest of the logic depends on
> set->highest_location, thus we want to make sure we are higher than
> the highest so far? can someone clarify why we need this?)

Here is how I understand this.  Sorry if it's too obvious.

The linemap_line_start model follows that of a parser that parses tokens
from left to right, and from top to bottom.  Let's imagine a main CU
called M which has this layout:

<---- start, locus #0

[... tokens ...]

#include "A"  <--- locus #1

[... tokens ...]

There are going to be at least three line maps (instances of struct
line_map) created and added to the line map set (struct line_maps) of M:
one for loci from #0 to right before #1, at least one for loci coming
from the included header A (there can be several line maps here, if A
itself includes other headers) and another one from loci coming right
after #1.

A hard invariant here is that source_locations yielded by a given line
map must be less than the struct line_map::start_location of the next
line map.  This is (partly) so that line map lookup (when we want to get
the line map that corresponds to a given source_location) can be done
using a binary search, which is faster than just doing a linear search.
A side effect of this is that source_locations handed out by
linemap_line_start increase monotonically.

With these assumptions in mind, when the parser code calls
linemap_line_start to get the source_location corresponding to a new
line at column 0, the only case where the new line is less than the
preceding line (so that line_delta < 0) is if the parser is entering a
new file.  E.g, it is entering the header A, coming from M.  Or, it is
getting back to M, leaving A.  In that case, it seems required for
linemap_line_start to create the new line map for A or for M.

> My solution:
> I will add a boolean flag to linemap_line_start's parameters,
> allowEarlierStartLineStart, which when true, will not create a new
> entry in the line_table even if the line_delta < 0.

If I understand correctly, you are acting like if you were parsing from
right to left and from bottom to top, for the content of A.

In that context, you need to keep the hard invariant I talked about
above, so that line map lookup keeps working, at least.

In other words, when you generate a source_location for a token T0 that
comes topologically /before/ the token T1 you generated source_location
for at the previous step, you need to make sure that source_location of
T0 is less than source_location of T1, and that the struct
line_map::start_location of the current map is less than T0.  If T0
comes from a different header than T1 (suppose both T1 and T0 comes from
headers that are included by A) then a new line map (different from the
line map for T1) needs to be created for T0.  Now I am not sure if in
your design, there is going to be only one pph for A, or if there is
going to be one pph for each header included by A as well.  I would have
imagined the later case to be more appropriate.

It seems to me that for you approach to work, you should at least make
sure to add a new line map when T0 comes a different header than T1,
while processing the pph of A.

> Instead of relying on highest_location to find the current line's
> source location, it will use the start_location and add to it the
> delta between to_line and map->to_line).

If the map->start_location was allocated correctly, I guess this could
work assuming the constraints I raised above are respected.

Note however that there are corner cases like having so long lines in
the source code being parsed that we fill the column-space of the line
map so quickly that we need to allocate more line maps for a given file.
These are corner cases taken care of by linemap_line_start today that
you could miss if you are trying to walk in reverse order like this.
Those would have been addressed by serializing the original line map set
of A that we knew was correct, barring the value of their
map->start_location and map->to_line.

> I will also add a new linemap function,
> linemap_position_for_line_and_column, which will be just like
> linemap_position_for_column, except that it won't assume the
> highest_line in the set is the one we are currently on (this function
> will thus allow us to get a source_location without depending on the
> current assumptions on the order these functions are called in).

Heh.  I have added a function with that exact name in my patch set to
track locations across macro expansions.  Look at the source code at
http://tinyurl.com/3egxp47, search for
linemap_position_for_line_and_column (obviously).  The implicit
assumption made by that function is that the {line,column} pair which
you are requesting a source_location for belongs to the current line
map, though.

HTH.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-26 14:46           ` Dodji Seketeli
@ 2011-07-26 18:49             ` Gabriel Charette
  2011-07-26 22:54               ` Gabriel Charette
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Charette @ 2011-07-26 18:49 UTC (permalink / raw)
  To: Gabriel Charette, Tom Tromey, Diego Novillo, Lawrence Crowl, gcc,
	Collin Winter, Cary Coutant

On Tue, Jul 26, 2011 at 4:25 AM, Dodji Seketeli <dodji@seketeli.org> wrote:
> Gabriel Charette <gchare@google.com> a écrit:
>
>> Alright, so after looking even more at the linemap code, here is what
>> I'm thinking now:
>>
>> So the main problem is:
>> with the linemap logic is that linemap_line_start adds a new table
>> entry if line_delta < 0 (I don't understand why this is needed, my
>> assumption is that it is because the rest of the logic depends on
>> set->highest_location, thus we want to make sure we are higher than
>> the highest so far? can someone clarify why we need this?)
>
> Here is how I understand this.  Sorry if it's too obvious.
>
> The linemap_line_start model follows that of a parser that parses tokens
> from left to right, and from top to bottom.  Let's imagine a main CU
> called M which has this layout:
>
> <---- start, locus #0
>
> [... tokens ...]
>
> #include "A"  <--- locus #1
>
> [... tokens ...]
>
> There are going to be at least three line maps (instances of struct
> line_map) created and added to the line map set (struct line_maps) of M:
> one for loci from #0 to right before #1, at least one for loci coming
> from the included header A (there can be several line maps here, if A
> itself includes other headers) and another one from loci coming right
> after #1.
>
> A hard invariant here is that source_locations yielded by a given line
> map must be less than the struct line_map::start_location of the next
> line map.  This is (partly) so that line map lookup (when we want to get
> the line map that corresponds to a given source_location) can be done
> using a binary search, which is faster than just doing a linear search.
> A side effect of this is that source_locations handed out by
> linemap_line_start increase monotonically.
>
> With these assumptions in mind, when the parser code calls
> linemap_line_start to get the source_location corresponding to a new
> line at column 0, the only case where the new line is less than the
> preceding line (so that line_delta < 0) is if the parser is entering a
> new file.  E.g, it is entering the header A, coming from M.  Or, it is
> getting back to M, leaving A.  In that case, it seems required for
> linemap_line_start to create the new line map for A or for M.
>

Thanks for this explanation, that's pretty much how I had understood
the linemap code, but you clarified some of the things I wasn't clear
about.

>> My solution:
>> I will add a boolean flag to linemap_line_start's parameters,
>> allowEarlierStartLineStart, which when true, will not create a new
>> entry in the line_table even if the line_delta < 0.
>
> If I understand correctly, you are acting like if you were parsing from
> right to left and from bottom to top, for the content of A.
>

Right.

> In that context, you need to keep the hard invariant I talked about
> above, so that line map lookup keeps working, at least.
>
> In other words, when you generate a source_location for a token T0 that
> comes topologically /before/ the token T1 you generated source_location
> for at the previous step, you need to make sure that source_location of
> T0 is less than source_location of T1, and that the struct
> line_map::start_location of the current map is less than T0.  If T0
> comes from a different header than T1 (suppose both T1 and T0 comes from
> headers that are included by A) then a new line map (different from the
> line map for T1) needs to be created for T0.  Now I am not sure if in
> your design, there is going to be only one pph for A, or if there is
> going to be one pph for each header included by A as well.  I would have
> imagined the later case to be more appropriate.
>

Correct we are using the multiple pph approach (one for each included
header in the header).

This actually makes the problem harder than I had originally pictured
it yesterday. I'm now thinking using serialization is going to be a
better solution because of that.

>> Instead of relying on highest_location to find the current line's
>> source location, it will use the start_location and add to it the
>> delta between to_line and map->to_line).
>
> If the map->start_location was allocated correctly, I guess this could
> work assuming the constraints I raised above are respected.
>
> Note however that there are corner cases like having so long lines in
> the source code being parsed that we fill the column-space of the line
> map so quickly that we need to allocate more line maps for a given file.
> These are corner cases taken care of by linemap_line_start today that
> you could miss if you are trying to walk in reverse order like this.
> Those would have been addressed by serializing the original line map set
> of A that we knew was correct, barring the value of their
> map->start_location and map->to_line.
>

One more reason for serialization!

>> I will also add a new linemap function,
>> linemap_position_for_line_and_column, which will be just like
>> linemap_position_for_column, except that it won't assume the
>> highest_line in the set is the one we are currently on (this function
>> will thus allow us to get a source_location without depending on the
>> current assumptions on the order these functions are called in).
>
> Heh.  I have added a function with that exact name in my patch set to
> track locations across macro expansions.  Look at the source code at
> http://tinyurl.com/3egxp47, search for
> linemap_position_for_line_and_column (obviously).  The implicit
> assumption made by that function is that the {line,column} pair which
> you are requesting a source_location for belongs to the current line
> map, though.
>

Good to know, I had made the same assumption in mine that we were
looking for {line, column} in the current line map.
----
So, I think serializing the linemap and tweaking it on the way in is
going to be easier, in particular because of nested pph (header
including headers).
I'll put down my strategy for serializing the linemap and send a
follow up email soon.

Thanks,
Gabriel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-26 18:49             ` Gabriel Charette
@ 2011-07-26 22:54               ` Gabriel Charette
  2011-07-27 10:11                 ` Dodji Seketeli
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Charette @ 2011-07-26 22:54 UTC (permalink / raw)
  To: Gabriel Charette, Tom Tromey, Diego Novillo, Lawrence Crowl, gcc,
	Collin Winter, Cary Coutant

As mentionned in my previous email, I'm now thinking of serializing
the linemap from the header by pph'ed, here is how I think this could
work:

From what I understand, the source_locations allocated for everything
in a given set of headers (from the LC_ENTER for the header in the
line_table up to, and including everything in between, the
corresponding LC_LEAVE) is dependent on only one thing; the value of
line_table->highest_location when the header was inserted (i.e. if in
two different contexts we happen to have the same
line_table->highest_location when inserting header A.h, all of the
tokens for A.h in each context will have the same source_location).

If the previous statement is true, then we calculate an offset between
the line_table->highest_location as it was in the serialized version
of the header and as it is in the C file in which we are about to
de-serialize the line table.

We then need to update some things based on that offset:
  for each line_map entry in the line_table: start_location
  for each source_location field on tokens coming in from the header

Some other things need to be updated with simple logic:
  for each line_map entry in the line_table: included_from
  for the line_table itself: highest_location, highest_line, max_column_hint

Some things just stay as is when de-serialized:
  for each line_map entry in the line_table: reason, sysp, to_file,
to_line, column_bits

After doing this we should have the same line_map we would have had
when doing a regular compile (non-pph), and the tokens coming from the
header file should have the correct source_location, simply by adding
the calculated offset to the one they had previously (this also allows
us to only serialize the source_location for each token as opposed to
serializing it's expanded location, saving space in the pph).

Let me know what you think,
Gabriel

On Tue, Jul 26, 2011 at 11:10 AM, Gabriel Charette <gchare@google.com> wrote:
> On Tue, Jul 26, 2011 at 4:25 AM, Dodji Seketeli <dodji@seketeli.org> wrote:
>> Gabriel Charette <gchare@google.com> a écrit:
>>
>>> Alright, so after looking even more at the linemap code, here is what
>>> I'm thinking now:
>>>
>>> So the main problem is:
>>> with the linemap logic is that linemap_line_start adds a new table
>>> entry if line_delta < 0 (I don't understand why this is needed, my
>>> assumption is that it is because the rest of the logic depends on
>>> set->highest_location, thus we want to make sure we are higher than
>>> the highest so far? can someone clarify why we need this?)
>>
>> Here is how I understand this.  Sorry if it's too obvious.
>>
>> The linemap_line_start model follows that of a parser that parses tokens
>> from left to right, and from top to bottom.  Let's imagine a main CU
>> called M which has this layout:
>>
>> <---- start, locus #0
>>
>> [... tokens ...]
>>
>> #include "A"  <--- locus #1
>>
>> [... tokens ...]
>>
>> There are going to be at least three line maps (instances of struct
>> line_map) created and added to the line map set (struct line_maps) of M:
>> one for loci from #0 to right before #1, at least one for loci coming
>> from the included header A (there can be several line maps here, if A
>> itself includes other headers) and another one from loci coming right
>> after #1.
>>
>> A hard invariant here is that source_locations yielded by a given line
>> map must be less than the struct line_map::start_location of the next
>> line map.  This is (partly) so that line map lookup (when we want to get
>> the line map that corresponds to a given source_location) can be done
>> using a binary search, which is faster than just doing a linear search.
>> A side effect of this is that source_locations handed out by
>> linemap_line_start increase monotonically.
>>
>> With these assumptions in mind, when the parser code calls
>> linemap_line_start to get the source_location corresponding to a new
>> line at column 0, the only case where the new line is less than the
>> preceding line (so that line_delta < 0) is if the parser is entering a
>> new file.  E.g, it is entering the header A, coming from M.  Or, it is
>> getting back to M, leaving A.  In that case, it seems required for
>> linemap_line_start to create the new line map for A or for M.
>>
>
> Thanks for this explanation, that's pretty much how I had understood
> the linemap code, but you clarified some of the things I wasn't clear
> about.
>
>>> My solution:
>>> I will add a boolean flag to linemap_line_start's parameters,
>>> allowEarlierStartLineStart, which when true, will not create a new
>>> entry in the line_table even if the line_delta < 0.
>>
>> If I understand correctly, you are acting like if you were parsing from
>> right to left and from bottom to top, for the content of A.
>>
>
> Right.
>
>> In that context, you need to keep the hard invariant I talked about
>> above, so that line map lookup keeps working, at least.
>>
>> In other words, when you generate a source_location for a token T0 that
>> comes topologically /before/ the token T1 you generated source_location
>> for at the previous step, you need to make sure that source_location of
>> T0 is less than source_location of T1, and that the struct
>> line_map::start_location of the current map is less than T0.  If T0
>> comes from a different header than T1 (suppose both T1 and T0 comes from
>> headers that are included by A) then a new line map (different from the
>> line map for T1) needs to be created for T0.  Now I am not sure if in
>> your design, there is going to be only one pph for A, or if there is
>> going to be one pph for each header included by A as well.  I would have
>> imagined the later case to be more appropriate.
>>
>
> Correct we are using the multiple pph approach (one for each included
> header in the header).
>
> This actually makes the problem harder than I had originally pictured
> it yesterday. I'm now thinking using serialization is going to be a
> better solution because of that.
>
>>> Instead of relying on highest_location to find the current line's
>>> source location, it will use the start_location and add to it the
>>> delta between to_line and map->to_line).
>>
>> If the map->start_location was allocated correctly, I guess this could
>> work assuming the constraints I raised above are respected.
>>
>> Note however that there are corner cases like having so long lines in
>> the source code being parsed that we fill the column-space of the line
>> map so quickly that we need to allocate more line maps for a given file.
>> These are corner cases taken care of by linemap_line_start today that
>> you could miss if you are trying to walk in reverse order like this.
>> Those would have been addressed by serializing the original line map set
>> of A that we knew was correct, barring the value of their
>> map->start_location and map->to_line.
>>
>
> One more reason for serialization!
>
>>> I will also add a new linemap function,
>>> linemap_position_for_line_and_column, which will be just like
>>> linemap_position_for_column, except that it won't assume the
>>> highest_line in the set is the one we are currently on (this function
>>> will thus allow us to get a source_location without depending on the
>>> current assumptions on the order these functions are called in).
>>
>> Heh.  I have added a function with that exact name in my patch set to
>> track locations across macro expansions.  Look at the source code at
>> http://tinyurl.com/3egxp47, search for
>> linemap_position_for_line_and_column (obviously).  The implicit
>> assumption made by that function is that the {line,column} pair which
>> you are requesting a source_location for belongs to the current line
>> map, though.
>>
>
> Good to know, I had made the same assumption in mine that we were
> looking for {line, column} in the current line map.
> ----
> So, I think serializing the linemap and tweaking it on the way in is
> going to be easier, in particular because of nested pph (header
> including headers).
> I'll put down my strategy for serializing the linemap and send a
> follow up email soon.
>
> Thanks,
> Gabriel
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-26 22:54               ` Gabriel Charette
@ 2011-07-27 10:11                 ` Dodji Seketeli
  2011-07-27 14:17                   ` Gabriel Charette
  0 siblings, 1 reply; 15+ messages in thread
From: Dodji Seketeli @ 2011-07-27 10:11 UTC (permalink / raw)
  To: Gabriel Charette
  Cc: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter,
	Cary Coutant

Gabriel Charette <gchare@google.com> a écrit:

> From what I understand, the source_locations allocated for
> everything in a given set of headers (from the LC_ENTER for the
> header in the line_table up to, and including everything in between,
> the corresponding LC_LEAVE) is dependent on only one thing; the
> value of line_table->highest_location when the header was inserted

The source_location handed out by a given line map is dependant on
three things:

1/ The line_map::start_location.  For a given map, this one equals
line_table->highest_location + 1, because at the time the line map is
created, its line_map::start_location must be greater than the highest
source_location handed out by any line map previously created.  At any
point in time, line_table->highest_location equals the
line_map::start_location of the lastly created line_map.  This is part
of the monotonically increasing property of source_location we were
talking about earlier.

2/ The line number of the location

3/ The column number of the location

> (i.e. if in two different contexts we happen to have the same
> line_table->highest_location when inserting header A.h, all of the
> tokens for A.h in each context will have the same source_location).

Each token coming from a given A.h will have a different
source_location, as they will presumably have a different {line,column}
pair.

If in two different contexts we happen to have the same
line_table->highest_location, the line_map::start_location of the two
different line maps of the two different A.hs will be equal, though.

> If the previous statement is true, then we calculate an offset
> between the line_table->highest_location as it was in the serialized
> version of the header and as it is in the C file in which we are
> about to de-serialize the line table.
>
> We then need to update some things based on that offset:

[...]

It seems to me that you could just set the line_map::start_location of
the de-serialized map for the current portion of A.h to the current
line_table->highest_location of the main CU your are currently parsing
(i.e, just forget about the serialized line_table->highest_location
into account.  Actually I would think that it's unnecessary to
serialize the line_table->highest_location) + 1.

Then you should also set the de-serialized line_map::to_line to the
current line in your context.  Then you should add that line map to
the current line_table and set line_table->highest_location to the
line_map::start_location you have just computed.

Then, assign the source_location of each token that belongs to that
de-serialized map to a source_location that is handed out by your new
linemap_position_for_line_and_column, with de-serialized map passed in
argument.  This is where you would progress "backward" in the token
stream, for that given map.

Then somehow, when you need to suck in a new pph (so this time you are
going downward in the stream again), just start this dance of
de-serializing the map for that new pph, updating its properties,
adding it to line_table, and setting the source_location of the tokens
coming for that pph again.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-27 10:11                 ` Dodji Seketeli
@ 2011-07-27 14:17                   ` Gabriel Charette
  2011-07-27 14:26                     ` Dodji Seketeli
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Charette @ 2011-07-27 14:17 UTC (permalink / raw)
  To: Dodji Seketeli
  Cc: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter,
	Cary Coutant

I think I wasn't clear in the way I expressed my assumptions in my last email:

On Wed, Jul 27, 2011 at 1:11 AM, Dodji Seketeli <dodji@seketeli.org> wrote:
>
> Gabriel Charette <gchare@google.com> a écrit:
>
> > From what I understand, the source_locations allocated for
> > everything in a given set of headers (from the LC_ENTER for the
> > header in the line_table up to, and including everything in between,
> > the corresponding LC_LEAVE) is dependent on only one thing; the
> > value of line_table->highest_location when the header was inserted
>
> The source_location handed out by a given line map is dependant on
> three things:
>
> 1/ The line_map::start_location.  For a given map, this one equals
> line_table->highest_location + 1, because at the time the line map is
> created, its line_map::start_location must be greater than the highest
> source_location handed out by any line map previously created.  At any
> point in time, line_table->highest_location equals the
> line_map::start_location of the lastly created line_map.  This is part
> of the monotonically increasing property of source_location we were
> talking about earlier.
>

Actually from my understanding, highest_location is not equal to the
last start_location, but to the last source_location returned by
either linemap_line_start or linemap_positition_for_column (which is
>= to the start_location of the current line_map).

> 2/ The line number of the location
>
> 3/ The column number of the location
>

Right, that's what I mean, I would not actually stream
highest_location. What I meant by they "all depend on only
highest_location" is that IF highest_location is the same in file B.c
and in a different compiled file C.c when they happen to include A.h,
then all of the source_locations for the tokens in A.h (in both
compilation) would be identical (i.e. token1's loc in B == token1's
loc in C, etc.) (as 2/3 always is the same since we're talking about
the same file A.h in both compilation, hence if 1 also holds, we get
the same result).

> > (i.e. if in two different contexts we happen to have the same
> > line_table->highest_location when inserting header A.h, all of the
> > tokens for A.h in each context will have the same source_location).
>
> Each token coming from a given A.h will have a different
> source_location, as they will presumably have a different {line,column}
> pair.
>

What I meant is that all of the source locations handed out in the
first compilation will be the same as all of the source locations
handed out in the second compilation, pairwise (not that ALL token's
source locations themselves will be the same within a single
compilation of course!).

> If in two different contexts we happen to have the same
> line_table->highest_location, the line_map::start_location of the two
> different line maps of the two different A.hs will be equal, though.
>
> > If the previous statement is true, then we calculate an offset
> > between the line_table->highest_location as it was in the serialized
> > version of the header and as it is in the C file in which we are
> > about to de-serialize the line table.
> >
> > We then need to update some things based on that offset:
>
> [...]
>

Hence, given that they only depend on start_location, I just have to
calculate an offset between the serialized start_location and the
start_location as it would be (highest_location + 1) in the C file
including the header, and offset all of the source_locations on each
token coming from the pph (without even needing to recalculate them!).

> It seems to me that you could just set the line_map::start_location of
> the de-serialized map for the current portion of A.h to the current
> line_table->highest_location of the main CU your are currently parsing
> (i.e, just forget about the serialized line_table->highest_location
> into account.  Actually I would think that it's unnecessary to
> serialize the line_table->highest_location) + 1.
>
> Then you should also set the de-serialized line_map::to_line to the
> current line in your context.  Then you should add that line map to
> the current line_table and set line_table->highest_location to the
> line_map::start_location you have just computed.
>
> Then, assign the source_location of each token that belongs to that
> de-serialized map to a source_location that is handed out by your new
> linemap_position_for_line_and_column, with de-serialized map passed in
> argument.  This is where you would progress "backward" in the token
> stream, for that given map.
>
> Then somehow, when you need to suck in a new pph (so this time you are
> going downward in the stream again), just start this dance of
> de-serializing the map for that new pph, updating its properties,
> adding it to line_table, and setting the source_location of the tokens
> coming for that pph again.
>

Doing it this way (with the offset) I would read in all the tokens and
linemap entries inherited from that header and it's underlying include
tree, thus no need to be tricky about inserting line tables for the
header's included file, as they are part of the header's serialized
line_table by recursion (a pph'ed header can include other pph'ed
header), nor to recalculate source_locations (it would be hard anyway
to replay the line_table getters to recalculate source_locations as in
the serialized parsed state we don't have the notion of which line the
#includes were on, nor which tokens came before or after... its
potentially possible to know that from the serialized line_table, but
it would be tricky to replay the tokens before, in, and after the
included headers in the correct way)

Thanks,
Gabriel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-27 14:17                   ` Gabriel Charette
@ 2011-07-27 14:26                     ` Dodji Seketeli
  2011-07-27 14:51                       ` Gabriel Charette
  0 siblings, 1 reply; 15+ messages in thread
From: Dodji Seketeli @ 2011-07-27 14:26 UTC (permalink / raw)
  To: Gabriel Charette
  Cc: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter,
	Cary Coutant

Gabriel Charette <gchare@google.com> a écrit:


> Actually from my understanding, highest_location is not equal to the
> last start_location, but to the last source_location returned by
> either linemap_line_start or linemap_positition_for_column (which is
> >>= to the start_location of the current line_map).

Right, it equals the highest location yielded by any map in the system.
Sorry.

>
>> 2/ The line number of the location
>>
>> 3/ The column number of the location
>>
>
> Right, that's what I mean, I would not actually stream
> highest_location. What I meant by they "all depend on only
> highest_location" is that IF highest_location is the same in file B.c
> and in a different compiled file C.c when they happen to include A.h,
> then all of the source_locations for the tokens in A.h (in both
> compilation) would be identical (i.e. token1's loc in B == token1's
> loc in C, etc.) (as 2/3 always is the same since we're talking about
> the same file A.h in both compilation, hence if 1 also holds, we get
> the same result).
>

Oh, OK.

>> > (i.e. if in two different contexts we happen to have the same
>> > line_table->highest_location when inserting header A.h, all of the
>> > tokens for A.h in each context will have the same source_location).
>>
>> Each token coming from a given A.h will have a different
>> source_location, as they will presumably have a different {line,column}
>> pair.
>>
>
> What I meant is that all of the source locations handed out in the
> first compilation will be the same as all of the source locations
> handed out in the second compilation, pairwise (not that ALL token's
> source locations themselves will be the same within a single
> compilation of course!).
>

OK.

>> If in two different contexts we happen to have the same
>> line_table->highest_location, the line_map::start_location of the two
>> different line maps of the two different A.hs will be equal, though.
>>
>> > If the previous statement is true, then we calculate an offset
>> > between the line_table->highest_location as it was in the serialized
>> > version of the header and as it is in the C file in which we are
>> > about to de-serialize the line table.
>> >
>> > We then need to update some things based on that offset:
>>
>> [...]
>>
>
> Hence, given that they only depend on start_location, I just have to
> calculate an offset between the serialized start_location and the
> start_location as it would be (highest_location + 1) in the C file
> including the header, and offset all of the source_locations on each
> token coming from the pph (without even needing to recalculate them!).
>

That could work.  But then you'd need to do something for a map encoding
the locations of tokens coming from the pph to appear in line_table,
right?  Otherwise, at lookup time, (when you want to find the map that
matches the source_location of a token coming for that pph), you'll be
in trouble.  I am saying this b/c you are not calling linemap_line_start
anymore.  And that function was the one that was including the said map
to line_table.  And the map still must be inserted into line_table
somehow.

>> It seems to me that you could just set the line_map::start_location of
>> the de-serialized map for the current portion of A.h to the current
>> line_table->highest_location of the main CU your are currently parsing
>> (i.e, just forget about the serialized line_table->highest_location
>> into account.  Actually I would think that it's unnecessary to
>> serialize the line_table->highest_location) + 1.
>>
>> Then you should also set the de-serialized line_map::to_line to the
>> current line in your context.  Then you should add that line map to
>> the current line_table and set line_table->highest_location to the
>> line_map::start_location you have just computed.
>>
>> Then, assign the source_location of each token that belongs to that
>> de-serialized map to a source_location that is handed out by your new
>> linemap_position_for_line_and_column, with de-serialized map passed in
>> argument.  This is where you would progress "backward" in the token
>> stream, for that given map.
>>
>> Then somehow, when you need to suck in a new pph (so this time you are
>> going downward in the stream again), just start this dance of
>> de-serializing the map for that new pph, updating its properties,
>> adding it to line_table, and setting the source_location of the tokens
>> coming for that pph again.
>>
>
> Doing it this way (with the offset) I would read in all the tokens and
> linemap entries inherited from that header and it's underlying include
> tree, thus no need to be tricky about inserting line tables for the
> header's included file, as they are part of the header's serialized
> line_table by recursion (a pph'ed header can include other pph'ed
> header),

This is what I am not sure to understand.  There is only one line table
per CU.  The headers included by the CU generate instances of struct
line map that are inserted into the line table of the CU.  So I don't
understand what you mean by "header's serialized line_table", as I don't
think there is such a thing as a header's line_table.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-27 14:26                     ` Dodji Seketeli
@ 2011-07-27 14:51                       ` Gabriel Charette
  2011-07-27 15:45                         ` Dodji Seketeli
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Charette @ 2011-07-27 14:51 UTC (permalink / raw)
  To: Dodji Seketeli
  Cc: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter,
	Cary Coutant

>>
>> Hence, given that they only depend on start_location, I just have to
>> calculate an offset between the serialized start_location and the
>> start_location as it would be (highest_location + 1) in the C file
>> including the header, and offset all of the source_locations on each
>> token coming from the pph (without even needing to recalculate them!).
>>
>
> That could work.  But then you'd need to do something for a map encoding
> the locations of tokens coming from the pph to appear in line_table,
> right?  Otherwise, at lookup time, (when you want to find the map that
> matches the source_location of a token coming for that pph), you'll be
> in trouble.  I am saying this b/c you are not calling linemap_line_start
> anymore.  And that function was the one that was including the said map
> to line_table.  And the map still must be inserted into line_table
> somehow.
>

The lookup, from what I understand, only depends on the line_map
entries for the header to be present, and the same as they would be
*had* the functions been called (not that each token actually called
the linemap getters to get its location), and also of course depends
on that the tokens have the correct source_locations *as if* obtained
from the linemap getters.

>> Doing it this way (with the offset) I would read in all the tokens and
>> linemap entries inherited from that header and it's underlying include
>> tree, thus no need to be tricky about inserting line tables for the
>> header's included file, as they are part of the header's serialized
>> line_table by recursion (a pph'ed header can include other pph'ed
>> header),
>
> This is what I am not sure to understand.  There is only one line table
> per CU.  The headers included by the CU generate instances of struct
> line map that are inserted into the line table of the CU.  So I don't
> understand what you mean by "header's serialized line_table", as I don't
> think there is such a thing as a header's line_table.

What I mean by "serialized header line_table" is the serialized
version of the line_table as it was when were done parsing the header
being pph'ed.

I would then de-serialize that and insert it's line_map entries in the
C file's line_table, doing the necessary offset adjustements in the
process (and updating all other line_table variables like
highest_location that would have changed if we had actually called the
linemap functions)


Best,
Gabriel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Linemap and pph
  2011-07-27 14:51                       ` Gabriel Charette
@ 2011-07-27 15:45                         ` Dodji Seketeli
  0 siblings, 0 replies; 15+ messages in thread
From: Dodji Seketeli @ 2011-07-27 15:45 UTC (permalink / raw)
  To: Gabriel Charette
  Cc: Tom Tromey, Diego Novillo, Lawrence Crowl, gcc, Collin Winter,
	Cary Coutant

Gabriel Charette <gchare@google.com> a écrit:

>>>
>>> Hence, given that they only depend on start_location, I just have to
>>> calculate an offset between the serialized start_location and the
>>> start_location as it would be (highest_location + 1) in the C file
>>> including the header, and offset all of the source_locations on each
>>> token coming from the pph (without even needing to recalculate them!).
>>>
>>
>> That could work.  But then you'd need to do something for a map encoding
>> the locations of tokens coming from the pph to appear in line_table,
>> right?  Otherwise, at lookup time, (when you want to find the map that
>> matches the source_location of a token coming for that pph), you'll be
>> in trouble.  I am saying this b/c you are not calling linemap_line_start
>> anymore.  And that function was the one that was including the said map
>> to line_table.  And the map still must be inserted into line_table
>> somehow.
>>
>
> The lookup, from what I understand, only depends on the line_map
> entries for the header to be present,

Exactly.  The line_map need to be present inside line_table->maps, so
you need to insert it in there somehow.  It wasn't clear to me from my
reading of your previous messages.

[...]

>>> Doing it this way (with the offset) I would read in all the tokens and
>>> linemap entries inherited from that header and it's underlying include
>>> tree, thus no need to be tricky about inserting line tables for the
>>> header's included file, as they are part of the header's serialized
>>> line_table by recursion (a pph'ed header can include other pph'ed
>>> header),
>>
>> This is what I am not sure to understand.  There is only one line table
>> per CU.  The headers included by the CU generate instances of struct
>> line map that are inserted into the line table of the CU.  So I don't
>> understand what you mean by "header's serialized line_table", as I don't
>> think there is such a thing as a header's line_table.
>
> What I mean by "serialized header line_table" is the serialized
> version of the line_table as it was when were done parsing the header
> being pph'ed.

OK.  Please note that you don't necessarily need to serialize the entire
line_map at the parsing point of the pph.  I believe that just
serializing the line maps of the pph'ed header should suffice.  How to
determine them is another question. :-)

> I would then de-serialize that and insert it's line_map entries in the
> C file's line_table, doing the necessary offset adjustements in the
> process (and updating all other line_table variables like
> highest_location that would have changed if we had actually called the
> linemap functions)

OK.  We are on the same page now.  Thanks.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-07-27 14:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-22 20:40 Linemap and pph Gabriel Charette
2011-07-22 21:15 ` Tom Tromey
2011-07-22 22:12   ` Gabriel Charette
2011-07-23  5:46     ` Tom Tromey
2011-07-23 17:54     ` Dodji Seketeli
2011-07-25 19:34       ` Gabriel Charette
2011-07-25 22:55         ` Gabriel Charette
2011-07-26 14:46           ` Dodji Seketeli
2011-07-26 18:49             ` Gabriel Charette
2011-07-26 22:54               ` Gabriel Charette
2011-07-27 10:11                 ` Dodji Seketeli
2011-07-27 14:17                   ` Gabriel Charette
2011-07-27 14:26                     ` Dodji Seketeli
2011-07-27 14:51                       ` Gabriel Charette
2011-07-27 15:45                         ` Dodji Seketeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).