public inbox for gcc@gcc.gnu.org
From: Richard Biener <richard.guenther@gmail.com>
To: Erick Ochoa <eochoa@gcc.gnu.org>
Cc: Jan Hubicka <hubicka@ucw.cz>, GCC Development <gcc@gcc.gnu.org>
Subject: Re: tree decl stored during LGEN does not map to a symtab_node during WPA
Date: Thu, 15 Jul 2021 09:23:49 +0200
Message-ID: <CAFiYyc1koJPbCSaokgKaro=WVB0fdejMC-43cK73cgSVZ6NOow@mail.gmail.com>
In-Reply-To: <CAJ_nqziRehp6oODtvGhLU72kR06eZMvKn0ZwMHhddMX8MzC54w@mail.gmail.com>

On Wed, Jul 14, 2021 at 3:56 PM Erick Ochoa <eochoa@gcc.gnu.org> wrote:
>
> > I guess the way to encode SSA trees would be to use sth like a
> > <function-encoder>, SSA-version tuple much like PTA internally
> > uses the varinfo array index as identifier for the variables in the
> > constraints.  For local decls (as opposed to SSA names) it's a bit
> > more difficult - you'd have to devise your own encoding here.
> >
> > What you can rely on I think is that for local variables UID relations
> > are preserved, so you could sort cfun->local_decls and use the
> > position in this array as encoding (in fact I see local_decls is
> > streamed literally, so you don't even need to sort that for the
> > start - but we could likely do that without harm to make searching
> > for a UID O(log n)).
>
> At the moment I am generating a unique id for each constraint variable
> generated. I have assigned a unique LGEN number to each variable and
> during WPA I have merged duplicates. The duplication of equivalent
> gimple variables in distinct LGEN partitions happens for global
> variables (as we have discussed before). Do you know if there are
> other cases of duplication that might happen? For example, could a
> single function be analyzed in different LGEN partitions?

A single source representation of inline functions and template
instantiations can be analyzed in different LGEN partitions, yes.
Those are merged as well.

> I followed your example here and I am "encoding" the constraint
> variables that relate to SSA variables by looking at the cgraph_node
> and the SSA-version. The tree is not stored but at WPA we know the
> SSA-version and the cgraph_node and I think this is enough to relate
> back to the SSA variable in the gimple source.

Yes, I think so.

> You mention that I need to devise my own "encoder", but I am not sure
> if we are conflating two notions:
>
> 1. encoding tree variables to constraint variables (i.e., a mapping of
> some tuple (cgraph_node x symtab_node x ssa-version) to an integer
> that represents the constraint variable)
> 2. encoding as an implementation of a data structure used during LTO
> to stream in and stream out trees/symbols to and from partitions.
> (e.g., lto_symtab_encoder_t).

I meant 1) and streaming using the LTO cgraph encoder for the cgraph
part and simply using the SSA version for the second part.
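
A minimal sketch of that two-part encoding, assuming the function half
is already available as a stable integer index (in GCC that index would
come from the LTO cgraph encoder; here it is just a number). The names
ssa_key, pack_ssa_key and constraint_var_table are hypothetical and are
not GCC APIs:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical key: (index of the function in some stable encoder,
// SSA version number).  Neither field is a GCC type.
struct ssa_key {
  uint32_t fn_index;
  uint32_t ssa_version;
};

// Pack both halves into one 64-bit integer so it can be hashed or streamed.
inline uint64_t pack_ssa_key(ssa_key k) {
  return (static_cast<uint64_t>(k.fn_index) << 32) | k.ssa_version;
}

inline ssa_key unpack_ssa_key(uint64_t packed) {
  return {static_cast<uint32_t>(packed >> 32),
          static_cast<uint32_t>(packed & 0xffffffffu)};
}

// Hands out dense constraint-variable ids, one per distinct key, and
// remembers the key so an id can be mapped back later.
class constraint_var_table {
public:
  uint32_t id_for(ssa_key k) {
    uint64_t packed = pack_ssa_key(k);
    auto it = ids_.find(packed);
    if (it != ids_.end())
      return it->second;          // Seen before: reuse the id.
    uint32_t id = static_cast<uint32_t>(keys_.size());
    ids_.emplace(packed, id);
    keys_.push_back(packed);
    return id;
  }

  ssa_key key_for(uint32_t id) const { return unpack_ssa_key(keys_.at(id)); }
  std::size_t size() const { return keys_.size(); }

private:
  std::unordered_map<uint64_t, uint32_t> ids_;
  std::vector<uint64_t> keys_;
};
```

Asking for the same (function index, SSA version) pair twice yields the
same id, which is what merging duplicates at WPA would rely on; key_for
gives the reverse direction needed at LTRANS.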

> So to be clear, when you say I need to devise my own "encoder" you are
> referring to definition number 1, not definition number 2, right? And
> at LTRANS using the relation (cgraph_node x symtab_node x ssa-version)
> x constraint-variable-id one should be able to map to the interesting
> pointer/pointee from the constraint variable id.
>
> I am thinking a little bit ahead, but I will need a way to relate
> memory allocation sites (e.g., malloc's) to some constraint variable
> and perhaps generalize this to expressions (I would like to say that a
> variable is pointing to a STRING_CST for example). Do you have an idea
> on how to go and encode using the first definition of encoding tree
> expressions?

The easiest is probably to hook it up to things you already encode,
like for malloc it would be the SSA def of the resulting pointer.

We do preserve the order of stmts in basic-blocks and the basic-block
indices.  We also use stmt UIDs (but re-number them at streaming
time, so you can't directly use them), so using a "stmt number" would
be possible as well.  The LTO streaming uses this to map the
callgraph edge -> gimple stmt reference back
(lto-streamer-in.c:fixup_call_stmt_edges).  That code also calls
execute_all_ipa_stmt_fixups, which presumably is more "generic"
code - I'm not familiar with it, but you can dig into whether it would
suit your needs.  It's not that the generic LTO / IPA mechanisms cannot
be extended.
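
To illustrate the "stmt number" idea: if block indices and the order of
statements within each block are stable, a number can be derived from
(block index, position in block) with a prefix sum. This is a
standalone sketch under that assumption, not what
fixup_call_stmt_edges does; the class stmt_numbering and its members
are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class stmt_numbering {
public:
  // stmts_per_block[i] is the number of statements in the block with
  // index i.  Unused block indices simply have a count of zero.
  explicit stmt_numbering(const std::vector<uint32_t> &stmts_per_block)
      : first_(stmts_per_block.size() + 1, 0) {
    // first_[i] is the number assigned to the first statement of block i.
    for (std::size_t i = 0; i < stmts_per_block.size(); ++i)
      first_[i + 1] = first_[i] + stmts_per_block[i];
  }

  uint32_t total() const { return first_.back(); }

  // (block index, position in block) -> statement number.
  uint32_t number_of(uint32_t bb, uint32_t pos) const {
    assert(bb + 1 < first_.size() && pos < first_[bb + 1] - first_[bb]);
    return first_[bb] + pos;
  }

  // Statement number -> (block index, position in block).
  std::pair<uint32_t, uint32_t> locate(uint32_t number) const {
    assert(number < total());
    // The owning block is the last one whose first number is <= number.
    auto it = std::upper_bound(first_.begin(), first_.end(), number);
    uint32_t bb = static_cast<uint32_t>(it - first_.begin()) - 1;
    return {bb, number - first_[bb]};
  }

private:
  std::vector<uint32_t> first_;
};
```

An allocation site could then be identified by (function index, stmt
number) in the same way an SSA name is identified by (function index,
SSA version).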

> I have seen some papers that use instruction-id's
> (potentially an integer that corresponds as a unique identifier for
> the instruction) but I am unsure if there is something similar to this
> in GCC. If what you meant is the second definition, can someone
> elaborate on the precise steps for making my own encoder? While I am
> somewhat familiar with using the LTO framework I am unfamiliar with
> potentially extending it in these sorts of ways.
>
> Thanks! Any help is appreciated.

