Re: tree decl stored during LGEN does not map to a symtab_node during WPA

From: Erick Ochoa <eochoa@gcc.gnu.org>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jan Hubicka <hubicka@ucw.cz>, GCC Development <gcc@gcc.gnu.org>
Subject: Re: tree decl stored during LGEN does not map to a symtab_node during WPA
Date: Thu, 22 Jul 2021 16:32:51 +0200
Message-ID: <CAJ_nqzhYAUgMUDu+4z0yOC_EkkYZsgGcgDTO8m5--MdJKZ4HRw@mail.gmail.com>
In-Reply-To: <CAFiYyc2Y4PkkxfcL=aNQ0+Letzcht1r0h2ApvNi2r7ywx-N2DA@mail.gmail.com>

>
> But the addresses are at LGEN time?

The following is what runs at WPA time

unsigned long pid = streamer_read_uhwi (&ib);
unsigned long id = streamer_read_uhwi (&ib);
lto_symtab_encoder_t encoder = file_data->symtab_node_encoder;
cgraph_node *cnode
  = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder, id));
logger ("%s %lu %lu %p\n", cnode->name (), pid, id, (void *) cnode);

> Note the nodes are actually
> streamed to different instances by input_symtab, then decls are merged
> (lto_symtab_merge_decls), then I think the IPA
> pass summaries are read in (to different unmerged instances!), _then_
> the symtab merging process starts (lto_symtab_merge_symbols).
> I think the last step eventually calls the cgraph/varpool removal hook
> IPA passes registered.

Ah, so what you are saying is that during the read_summary stage they
will still be different, and only during execute or
write_optimization_summary () will they finally be merged? I think
maybe the terminology of LGEN/WPA/LTRANS should be expanded to be
lgen_gen, lgen_write, lwpa_read, lwpa_exec/lwpa_write, ltrans_read,
ltrans_exec?

So, just to be a bit more concrete, when initializing the
ipa_opt_pass_d instance one has to write functions which will be
called by a parent process. Normally I see the following comments with
them:

generate_summary
write_summary
read_summary
write_optimization_summary
read_optimization_summary

and finally there's the execute function that gets called.

I am doing the following:

generate_summary /* generating pid */
write_summary /* generating id and writing pid and id */
read_summary /* reading and printing the info I described above */
write_optimization_summary /* nothing yet */
read_optimization_summary /* nothing yet */
execute /* nothing yet */

And I think these correspond to the following "LGEN/WPA/LTRANS" stages

1. lgen (multiple processes) generate_summary
2. lgen (multiple processes) write_summary
3. wpa (single process) read_summary
4. wpa (single process) execute
5. wpa? (single process?) write_optimization_summary
6. ltrans (multiple processes) read_optimization_summary

And you are telling me that cgraph_nodes and varpool_nodes will have
the same address only after the beginning of the execute stage but not
before that?

Is the above correct?

<OPEN EDIT>

I did try printing cnode->name() during execute and it segfaulted, so
perhaps those function bodies were merged to something else? Note,
that some names were successfully printed out. I'm wondering, can I
use the function lto_symtab_encoder_deref during execute? I think this
is unlikely... because in the past I've tried to use
lto_symtab_encoder_encode during generate_summary and it caused
segfaults. I'll still give it a try.

Perhaps this is still a bit of progress? But now I'm wondering, if I
can't use lto_symtab_encoder_deref and the nodes were indeed merged,
are some of the varpool_node* I saved during read_summary now pointing
to random memory? How am I able to tell which ones survived?
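
One way to tell which saved pointers survived is to have the symbol table
notify the pass before a node goes away, which is the role of the removal
hooks mentioned in the quoted text. The following is a toy model in
standalone C++, not GCC code: Node, ToySymtab, add_removal_hook and
PassState are made-up names that only imitate the shape of the idea
(register a callback, forget the saved pointer when it fires).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <iterator>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbol table node.
struct Node {
  std::string name;
};

// Hypothetical symbol table that notifies listeners before deleting a node.
class ToySymtab {
public:
  using Hook = std::function<void(Node *)>;

  Node *create(const std::string &name) {
    nodes_.push_back(std::make_unique<Node>(Node{name}));
    return nodes_.back().get();
  }

  void add_removal_hook(Hook hook) { hooks_.push_back(std::move(hook)); }

  // Remove a node, e.g. because it was merged into another one.
  void remove(Node *victim) {
    for (auto &hook : hooks_)
      hook(victim);  // listeners must forget the pointer now
    for (auto it = nodes_.begin(); it != nodes_.end(); ++it)
      if (it->get() == victim) {
        nodes_.erase(it);
        return;
      }
  }

private:
  std::vector<std::unique_ptr<Node>> nodes_;
  std::vector<Hook> hooks_;
};

// What a pass might keep after reading its summaries: (pid, id) -> node.
struct PassState {
  std::unordered_map<std::uint64_t, Node *> saved;

  static std::uint64_t key(std::uint64_t pid, std::uint64_t id) {
    return (pid << 32) | id;
  }
  // Drop every saved entry that points at the node being removed.
  void on_removed(Node *n) {
    for (auto it = saved.begin(); it != saved.end();)
      it = (it->second == n) ? saved.erase(it) : std::next(it);
  }
  bool survived(std::uint64_t pid, std::uint64_t id) const {
    return saved.count(key(pid, id)) != 0;
  }
};
```

In this model a failed lookup after merging means the node was dropped.
Which surviving node the (pid, id) pair should then map to is not modeled
here; that is the part a replace-style hook would have to supply.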

<CLOSE EDIT>

>
> It might be that you need a replace hook to do what you want, I think
> that for example IPA CP encodes references to global vars aka &global
> as IPA_REF and those are transparently re-written.
>
> As said, I think it can be made work but the details, since this is the
> first IPA pass needing this, can be incomplete infra-structure-wise.
>
> Basically you have summaries like
>
>  'global = <fn::1>_3'
>
> where the <fn::1> should eventually be implicit and the constraints
> grouped into constraints generated from the respective function body
> and constraints generated by call stmts (not sure here), and constraints
> for global variable init.  But for the above constraint the point is to
> make the 'global' references from different LGEN units the same by
> some means (but not streaming and comparing the actual assembler name).
>

I'll need some more time to read through how ipa-cp encodes references
to global variables. Thanks for the suggestion!

I don't really follow the paragraph that details what you think my
summaries look like. I'm thinking that for

global = <fn::1>_3

global is a variable? and <fn::1>_3 states that it is an SSA variable
in function 1? I think that can be a possible notation. I prefer to
just use integers.

What do you mean by implicit?

But the idea is to essentially "compile" down all
variables/functions/locals/ssa/etc into integers. And have the program
represented as integers and relation of integers. For example:

int *a;

extern void foo (int *c);

int main ()
{
  int b;
  a = &b;
  foo (a); // defined in a different partition
}

Should have the following at LGEN time (more specifically write_summary)

variable -> long integer encoding
--------------------------------------------
abstract null -> $null_id
cgraph main -> 0
cgraph foo -> 1
varpool a -> 2
tree b -> 0 x 0  // corresponds to main position 0
real arg c -> 1 x 0 // corresponds to foo position 0

Technically we can also map the other way around, we just need to know
in which "table" the information is stored. (I.e., the symbol_table,
the local_decl table or the ssa_table...)

Then, we give them a unique id

id for lgen <-> variable <-> long integer encoding
--------------------------------------------------------------
$null_id <-> abstract null -> $null_id
0 <-> cgraph main -> 0
1 <-> cgraph foo -> 1
2 <-> varpool a -> 2
3 <-> tree b -> 0 x 0
4 <-> real arg c -> 1 x 0

Then we can generate the constraints

2 = &3 // a = &b
4 = 2   // parm c = a
call foo
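
The two tables and the constraints above could be represented roughly as
follows. This is a minimal standalone sketch, not GCC code: Kind,
Encoding, LgenTable and Constraint are hypothetical names, and "0 x 0" is
modeled as an (owner, position) pair.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Which "table" an entity lives in (hypothetical names).
enum class Kind { Cgraph, Varpool, Local, RealArg, FormalArg };

// The "long integer encoding": a symbol index, or owner x position.
struct Encoding {
  Kind kind;
  long owner;     // symbol index, or index of the owning function
  long position;  // position inside the owner, -1 if unused
  bool operator<(const Encoding &o) const {
    return std::tie(kind, owner, position) <
           std::tie(o.kind, o.owner, o.position);
  }
};

// Hands out dense per-partition ids in first-seen order.
class LgenTable {
public:
  long id_of(const Encoding &e) {
    auto it = ids_.find(e);
    if (it != ids_.end())
      return it->second;
    long id = static_cast<long>(entries_.size());
    ids_.emplace(e, id);
    entries_.push_back(e);
    return id;
  }
  // Map an id back to its encoding (the "other way around").
  const Encoding &decode(long id) const {
    return entries_.at(static_cast<std::size_t>(id));
  }

private:
  std::map<Encoding, long> ids_;
  std::vector<Encoding> entries_;
};

// Constraint forms used in the example: lhs = &rhs and lhs = rhs.
enum class Op { AddressOf, Copy };
struct Constraint {
  Op op;
  long lhs;
  long rhs;
};
```

Registering main, foo, a, b and the real argument c in that order yields
ids 0 through 4, so "a = &b" becomes Constraint{Op::AddressOf, 2, 3} and
"parm c = a" becomes Constraint{Op::Copy, 4, 2}.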

The problem is that because this is happening in parallel the other
partition might generate the following constraints:

void foo(int *c)
{
  c = NULL;
}

abstract null -> $null_id
cgraph foo -> 0
formal arg c -> 0 x 0

Give them the following ids:

$null_id <-> abstract null -> $null_id
0 <-> cgraph foo -> 0
1 <-> formal arg c -> 0 x 0

And have the following constraint:

1 = $null_id

and so if we were to merge the constraints from both partitions
naively, ids 0 and 1 would each refer to different parts of the
program depending on the partition they came from.

I am trying to get the primary ID's to match at WPA time to be something like:

FROM PARTITION pid 1
0 <-> cgraph main -> 0
1 <-> cgraph foo -> 1
2 <-> varpool a -> 2
3 <-> tree b -> 0 x 0
4 <-> real arg c -> 1 x 0

2 = &3 // a = &b
4 = 2   // parm c = a
call 1

FROM PARTITION pid 2
$null_id <-> abstract null -> $null_id
0 <-> cgraph foo -> 0
1 <-> formal arg c -> 0 x 0

1 = $null_id

MERGED, with a map back to their old pid
wpa id <-> pid x lgen id <-> variable -> long integer encoding
0 <-> 1 x 0 <-> cgraph main -> 0
1 <-> 1 x 1 <-> cgraph foo -> 1
1 <-> 2 x 0 <-> cgraph foo -> 0
2 <-> 1 x 2 <-> varpool a -> 2
3 <-> 1 x 3 <-> tree b -> 0 x 0
4 <-> 1 x 4 <-> real arg c -> 1 x 0
5 <-> 2 x 1 <-> formal arg c -> 0 x 0

2 = &3 // a = &b
4 = 2   // real arg c = a
call 1  //  call foo
5 = $null_id  // formal arg c = NULL
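
The merge step above could be sketched as a remapping from (pid, lgen id)
to a WPA id. This is a standalone illustration under a strong assumption:
each streamed row carries a "merge key" that is equal exactly when two
rows denote the same merged symbol. How such a key would be obtained
inside GCC is the open question of this thread; the string used here is
only a placeholder for that identity, and Entry and WpaIds are
hypothetical names. The reserved $null_id is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One streamed row: its per-partition id plus an optional merge key.
// The key is a placeholder for "same merged symbol"; an empty key means
// the entity is private to its partition (locals, arguments).
struct Entry {
  long lgen_id;
  std::string merge_key;
};

class WpaIds {
public:
  // Register all rows of one partition.
  void add_partition(long pid, const std::vector<Entry> &rows) {
    for (const Entry &row : rows) {
      long wpa_id;
      auto shared = row.merge_key.empty() ? by_key_.end()
                                          : by_key_.find(row.merge_key);
      if (shared != by_key_.end()) {
        wpa_id = shared->second;  // symbol already seen in another partition
      } else {
        wpa_id = next_++;  // first sighting, or a partition-private entity
        if (!row.merge_key.empty())
          by_key_[row.merge_key] = wpa_id;
      }
      remap_[{pid, row.lgen_id}] = wpa_id;
    }
  }

  // Translate a (pid, lgen id) pair into its WPA id.
  long lookup(long pid, long lgen_id) const {
    return remap_.at({pid, lgen_id});
  }

private:
  long next_ = 0;
  std::map<std::string, long> by_key_;
  std::map<std::pair<long, long>, long> remap_;
};
```

Feeding it partition 1 (main, foo, a, b, real arg c) and then partition 2
(foo, formal arg c) gives the same WPA ids as the merged table: both foo
rows map to 1, and the formal argument gets the fresh id 5. Each
partition's constraints can then be rewritten through lookup before they
are combined.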

Finally, with this information we can run points-to analysis using
integers standing in for memory locations and can output the
pointer-pointee relationship also as integers.

I don't want to go through the whole derivation (and I already omitted
details and probably have made some silly mistakes here) but in the
end, for example we should at least have:

Pointer, pointee
---------------------
2, 3  // a may-points-to b
4, 3  // real arg c may-points-to b
2, $null_id // a may-points-to NULL
5, $null_id // formal arg c may-points-to NULL
5, 3 // formal arg c may-points-to b

And we can use these numbers to map back to the gimple source.
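
To make the integer-only analysis concrete, here is a small standalone
sketch of an inclusion-based fixed point over the merged constraints. It
is not the derivation omitted above and makes several assumptions: only
"lhs = &rhs" and "lhs = rhs" constraints are handled, the call is modeled
as a copy from the real argument (4) to the formal argument (5),
"5 = $null_id" is modeled as 5 pointing at an abstract null location, and
NULL_ID = -1 is a made-up value for $null_id.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical reserved id for the abstract null location.
const long NULL_ID = -1;

enum class Op { AddressOf, Copy };  // lhs = &rhs, lhs = rhs
struct Constraint {
  Op op;
  long lhs;
  long rhs;
};

using PointsTo = std::map<long, std::set<long>>;

// Iterate until no points-to set grows any further.
PointsTo solve(const std::vector<Constraint> &constraints) {
  PointsTo pts;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Constraint &c : constraints) {
      std::set<long> incoming;
      if (c.op == Op::AddressOf)
        incoming.insert(c.rhs);   // lhs may point to rhs itself
      else if (pts.count(c.rhs))
        incoming = pts[c.rhs];    // lhs inherits everything rhs points to
      for (long loc : incoming)
        changed |= pts[c.lhs].insert(loc).second;
    }
  }
  return pts;
}
```

With the constraints {2 = &3, 4 = 2, 5 = 4, 5 = null} this yields 2 -> {3},
4 -> {3} and 5 -> {null, 3}, which covers the rows of the table that
follow from those constraints alone. The row "2, $null_id" would need an
additional constraint that this sketch does not include.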

This might be inefficient and there's room for removing some
redundancy, but that's kinda what I'm thinking about.

>
> One node is dropped and all references are adjusted.  And somehow
> IPA passes are notified about this _after_ they have read their
> summaries.
>
> Richard.
