public inbox for gcc-patches@gcc.gnu.org
From: Manolis Tsamis <manolis.tsamis@vrull.eu>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Christoph Muellner <christoph.muellner@vrull.eu>,
	gcc-patches@gcc.gnu.org,  Martin Jambor <mjambor@suse.cz>,
	Jan Hubicka <jh@suse.cz>,
	Philipp Tomsich <philipp.tomsich@vrull.eu>
Subject: Re: [RFC PATCH] ipa-cp: Speculatively call specialized functions
Date: Mon, 14 Nov 2022 12:35:41 +0200
Message-ID: <CAM3yNXr5yxjy7PKea1_qe2oeL4zAjWMYAveTzzb=_gh8GOyrug@mail.gmail.com>
In-Reply-To: <CAFiYyc2eVud6z9v8Apv5uxx2hJO8BpV=cp6ZqDWcUhKqejqatw@mail.gmail.com>

On Mon, Nov 14, 2022 at 9:37 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Sun, Nov 13, 2022 at 4:38 PM Christoph Muellner
> <christoph.muellner@vrull.eu> wrote:
> >
> > From: mtsamis <manolis.tsamis@vrull.eu>
> >
> > The IPA CP pass offers a wide range of optimizations, most of which
> > lead to specialized functions that are called from a call site.
> > This can lead to multiple specialized function clones, if more than
> > one call-site allows such an optimization.
> > If not all call-sites can be optimized, the program might end
> > up with call-sites to the original function.
> >
> > This pass assumes that non-optimized call-sites (i.e. call-sites
> > that don't call specialized functions) are likely to be called
> > with arguments that would allow calling specialized clones.
> > Since we cannot guarantee this (for obvious reasons), we can't
> > replace the existing calls. However, we can introduce dynamic
> > guards that test the arguments against the collected constants
> > and call the specialized function if there is a match.
> >
> > To demonstrate the effect, let's consider the following program part:
> >
> >   func_1()
> >     myfunc(1)
> >   func_2()
> >     myfunc(2)
> >   func_i(i)
> >     myfunc(i)
> >
> > In this case the transformation would do the following:
> >
> >   func_1()
> >     myfunc.constprop.1() // myfunc() with arg0 == 1
> >   func_2()
> >     myfunc.constprop.2() // myfunc() with arg0 == 2
> >   func_i(i)
> >     if (i == 1)
> >       myfunc.constprop.1() // myfunc() with arg0 == 1
> >     else if (i == 2)
> >       myfunc.constprop.2() // myfunc() with arg0 == 2
> >     else
> >       myfunc(i)
> >
> > The pass consists of two main parts:
> > * collecting all specialized functions and the argument/constant pair(s)
> > * insertion of the guards during materialization
> >
> > The patch integrates well into ipa-cp and related IPA functionality.
> > Given the nature of IPA, the changes are touching many IPA-related
> > files as well as call-graph data structures.
> >
> > The impact of the dynamic guard is expected to be less than the speedup
> > gained by enabled optimizations (e.g. inlining or constant propagation).
>
> I don't see any limits on the number of callee candidates or the complexity
> of the guard.  Is there any reason to not factor the guards into a wrapper
> function to avoid bloating cold call sites and to allow inlining to decide
> where the expansion is useful?
>

There is currently no limit on the number of guards or on guard
complexity. Would it be a good choice to introduce two parameters,
one for the maximum number of guards per call site and one for the
maximum number of conditions per guard, each with a sane default value?
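
As a rough illustration of how two such limits could interact (this is
not part of the patch; guard_candidate, select_guards, max_guards and
max_conds are hypothetical names), candidates could be filtered like
this, assuming they are already sorted by expected profile count:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One candidate specialization: the clone may be called when every listed
// argument equals its collected constant.  All names are illustrative.
struct guard_candidate
{
  std::vector<int> arg_indices;  // arguments the guard has to compare
  long expected_count;           // profile-based estimate of hits
};

// Keep at most MAX_GUARDS candidates and skip any candidate whose guard
// would need more than MAX_CONDS comparisons.  SORTED is assumed to be
// ordered by decreasing EXPECTED_COUNT, so the hottest clones win.
std::vector<guard_candidate>
select_guards (const std::vector<guard_candidate> &sorted,
               std::size_t max_guards, std::size_t max_conds)
{
  std::vector<guard_candidate> kept;
  for (const guard_candidate &c : sorted)
    {
      if (kept.size () == max_guards)
        break;
      if (c.arg_indices.size () > max_conds)
        continue;
      // Check the ordering assumption stated above.
      assert (kept.empty ()
              || kept.back ().expected_count >= c.expected_count);
      kept.push_back (c);
    }
  return kept;
}
```

With max_guards = 2 and max_conds = 2, a candidate that needs three
comparisons is skipped even if it is hot, and at most two guards are
emitted for the call site.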

About the wrapper functions, that is an interesting question that I haven't
explored as much. One reason is that this transformation aims to work in
a similar way to the existing speculative edges. Since the expected
number of guards is low (1-2 in most cases), I considered the two
optimizations quite similar and wanted to share as much of the design
and functionality as I could. I also tried to keep the overhead of the
non-specialized original function call as low as possible.

But I can also see a difference between speculative and specialized
edges that makes creating a wrapper meaningful for this case: the
maximum speedup of a direct call over an indirect one can be much
smaller than that of a specialized call over the generic one.
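
For reference, a minimal sketch of the wrapper variant at the source
level, reusing the myfunc example from the patch description. The
wrapper name myfunc_guarded and the function bodies are made up for
illustration; the patch does not create such a function:

```cpp
#include <cassert>

// Stand-ins for the generic function and its two ipa-cp clones.
static int myfunc (int i) { return i * 10; }
static int myfunc_constprop_1 () { return 10; }  // myfunc with arg0 == 1
static int myfunc_constprop_2 () { return 20; }  // myfunc with arg0 == 2

// Hypothetical wrapper: the guards live here once instead of being
// expanded at every non-specialized call site.  The inliner can then
// decide per call site whether expanding the guards is worthwhile.
static inline int
myfunc_guarded (int i)
{
  if (i == 1)
    return myfunc_constprop_1 ();
  if (i == 2)
    return myfunc_constprop_2 ();
  return myfunc (i);
}

// A non-specialized call site now only calls the wrapper.
int func_i (int i) { return myfunc_guarded (i); }

// The wrapper must be observably identical to the generic function.
void check_wrapper (int i) { assert (myfunc_guarded (i) == myfunc (i)); }
```

Cold call sites would then pay for one extra call instead of a copy of
every guard, at the cost of an additional function in the call graph.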

> Skimming the patch I noticed an #if 0 commented assert with a comment
> that this was to be temporary?
>

Thanks for pointing that out; this is unintentional and I will fix it.

Best,
Manolis

> Thanks,
> Richard.
>
> > PR ipa/107667
> > gcc/ChangeLog:
> >
> >         * cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for guarded specialized edges.
> >         (cgraph_edge::set_call_stmt): Likewise.
> >         (symbol_table::create_edge): Likewise.
> >         (cgraph_edge::remove): Likewise.
> >         (cgraph_edge::make_speculative): Likewise.
> >         (cgraph_edge::make_specialized): Likewise.
> >         (cgraph_edge::remove_specializations): Likewise.
> >         (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> >         (cgraph_edge::dump_edge_flags): Likewise.
> >         (verify_speculative_call): Likewise.
> >         (verify_specialized_call): Likewise.
> >         (cgraph_node::verify_node): Likewise.
> >         * cgraph.h (class cgraph_specialization_info): Add new class that contains info of specialized edges.
> >         * cgraphclones.cc (cgraph_edge::clone): Add support for guarded specialized edges.
> >         (cgraph_node::set_call_stmt_including_clones): Likewise.
> >         * ipa-cp.cc (want_remove_some_param_p): Likewise.
> >         (create_specialized_node): Likewise.
> >         (add_specialized_edges): Likewise.
> >         (ipcp_driver): Likewise.
> >         * ipa-fnsummary.cc (redirect_to_unreachable): Likewise.
> >         (ipa_fn_summary_t::duplicate): Likewise.
> >         (analyze_function_body): Likewise.
> >         (estimate_edge_size_and_time): Likewise.
> >         (remap_edge_summaries): Likewise.
> >         * ipa-inline-transform.cc (inline_transform): Likewise.
> >         * ipa-inline.cc (edge_badness): Likewise.
> >         * lto-cgraph.cc (lto_output_edge): Likewise.
> >         (input_edge): Likewise.
> >         * tree-inline.cc (copy_bb): Likewise.
> >         * value-prof.cc (gimple_sc): Add function to create guarded specializations.
> >         * value-prof.h (gimple_sc): Likewise.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >  gcc/cgraph.cc               | 316 +++++++++++++++++++++++++++++++++++-
> >  gcc/cgraph.h                | 102 ++++++++++++
> >  gcc/cgraphclones.cc         |  30 ++++
> >  gcc/common.opt              |   4 +
> >  gcc/ipa-cp.cc               | 105 ++++++++++++
> >  gcc/ipa-fnsummary.cc        |  42 +++++
> >  gcc/ipa-inline-transform.cc |  11 ++
> >  gcc/ipa-inline.cc           |   5 +
> >  gcc/lto-cgraph.cc           |  46 ++++++
> >  gcc/tree-inline.cc          |  54 ++++++
> >  gcc/value-prof.cc           | 214 ++++++++++++++++++++++++
> >  gcc/value-prof.h            |   1 +
> >  12 files changed, 923 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> > index 5851b2ffc6c..ee819c87261 100644
> > --- a/gcc/cgraph.cc
> > +++ b/gcc/cgraph.cc
> > @@ -718,18 +718,24 @@ cgraph_add_edge_to_call_site_hash (cgraph_edge *e)
> >       one indirect); always hash the direct one.  */
> >    if (e->speculative && e->indirect_unknown_callee)
> >      return;
> > +  /* There are potentially multiple specialization edges for every
> > +     specialized call; always hash the base edge.  */
> > +  if (e->guarded_specialization_edge_p ())
> > +    return;
> >    cgraph_edge **slot = e->caller->call_site_hash->find_slot_with_hash
> >        (e->call_stmt, cgraph_edge_hasher::hash (e->call_stmt), INSERT);
> >    if (*slot)
> >      {
> > -      gcc_assert (((cgraph_edge *)*slot)->speculative);
> > +      gcc_assert (((cgraph_edge *)*slot)->speculative
> > +                 || ((cgraph_edge *)*slot)->specialized);
> >        if (e->callee && (!e->prev_callee
> >                         || !e->prev_callee->speculative
> > +                       || !e->prev_callee->specialized
> >                         || e->prev_callee->call_stmt != e->call_stmt))
> >         *slot = e;
> >        return;
> >      }
> > -  gcc_assert (!*slot || e->speculative);
> > +  gcc_assert (!*slot || e->speculative || e->specialized);
> >    *slot = e;
> >  }
> >
> > @@ -800,6 +806,23 @@ cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
> >        gcc_checking_assert (new_direct_callee);
> >      }
> >
> > +  /* Update specialized first and do not return yet in case we're dealing
> > +     with an edge that is both specialized and speculative.  */
> > +  if (update_speculative && e->specialized)
> > +    {
> > +      cgraph_edge *next, *base = e->specialized_call_base_edge ();
> > +      for (cgraph_edge *d = e->first_specialized_call_target (); d; d = next)
> > +       {
> > +         next = d->next_specialized_call_target ();
> > +         cgraph_edge *d2 = set_call_stmt (d, new_stmt, false);
> > +         gcc_assert (d2 == d);
> > +       }
> > +
> > +      /* Don't update base for speculative edges.  The code below will.  */
> > +      if (!e->speculative)
> > +       set_call_stmt (base, new_stmt, false);
> > +    }
> > +
> >    /* Speculative edges has three component, update all of them
> >       when asked to.  */
> >    if (update_speculative && e->speculative
> > @@ -835,12 +858,16 @@ cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
> >        return e_indirect ? indirect : direct;
> >      }
> >
> > +  if (update_speculative && e->specialized)
> > +    return e;
> > +
> >    if (new_direct_callee)
> >      e = make_direct (e, new_direct_callee);
> >
> >    /* Only direct speculative edges go to call_site_hash.  */
> >    if (e->caller->call_site_hash
> >        && (!e->speculative || !e->indirect_unknown_callee)
> > +      && (!e->specialized || e->spec_args == NULL)
> >        /* It is possible that edge was previously speculative.  In this case
> >          we have different value in call stmt hash which needs preserving.  */
> >        && e->caller->get_edge (e->call_stmt) == e)
> > @@ -854,11 +881,12 @@ cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
> >    /* Update call stite hash.  For speculative calls we only record the first
> >       direct edge.  */
> >    if (e->caller->call_site_hash
> > -      && (!e->speculative
> > +      && ((!e->speculative && !e->specialized)
> >           || (e->callee
> >               && (!e->prev_callee || !e->prev_callee->speculative
> >                   || e->prev_callee->call_stmt != e->call_stmt))
> > -         || (e->speculative && !e->callee)))
> > +         || (e->speculative && !e->callee)
> > +         || e->base_specialization_edge_p ()))
> >      cgraph_add_edge_to_call_site_hash (e);
> >    return e;
> >  }
> > @@ -883,7 +911,8 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
> >          construction of call stmt hashtable.  */
> >        cgraph_edge *e;
> >        gcc_checking_assert (!(e = caller->get_edge (call_stmt))
> > -                          || e->speculative);
> > +                          || e->speculative
> > +                          || e->specialized);
> >
> >        gcc_assert (is_gimple_call (call_stmt));
> >      }
> > @@ -909,6 +938,8 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
> >    edge->indirect_info = NULL;
> >    edge->indirect_inlining_edge = 0;
> >    edge->speculative = false;
> > +  edge->specialized = false;
> > +  edge->spec_args = NULL;
> >    edge->indirect_unknown_callee = indir_unknown_callee;
> >    if (call_stmt && caller->call_site_hash)
> >      cgraph_add_edge_to_call_site_hash (edge);
> > @@ -1066,6 +1097,11 @@ symbol_table::free_edge (cgraph_edge *e)
> >  void
> >  cgraph_edge::remove (cgraph_edge *edge)
> >  {
> > +  /* If we remove the base edge of a group of specialized
> > +     edges then we must also remove all of its specializations.  */
> > +  if (edge->base_specialization_edge_p ())
> > +    cgraph_edge::remove_specializations (edge);
> > +
> >    /* Call all edge removal hooks.  */
> >    symtab->call_edge_removal_hooks (edge);
> >
> > @@ -1109,6 +1145,8 @@ cgraph_edge::make_speculative (cgraph_node *n2, profile_count direct_count,
> >    ipa_ref *ref = NULL;
> >    cgraph_edge *e2;
> >
> > +  gcc_checking_assert (!specialized);
> > +
> >    if (dump_file)
> >      fprintf (dump_file, "Indirect call -> speculative call %s => %s\n",
> >              n->dump_name (), n2->dump_name ());
> > @@ -1134,6 +1172,62 @@ cgraph_edge::make_speculative (cgraph_node *n2, profile_count direct_count,
> >    return e2;
> >  }
> >
> > +/* Mark this edge as specialized and add a new edge representing that N2
> > +   is a specialized version of the CALLEE of this edge, with the specialized
> > +   arguments found in SPEC_ARGS.  */
> > +cgraph_edge *
> > +cgraph_edge::make_specialized (cgraph_node *n2,
> > +                               vec<cgraph_specialization_info>* spec_args,
> > +                               profile_count spec_count)
> > +{
> > +  if (speculative)
> > +    {
> > +      /* Because both speculative and specialized edges use CALL_STMT and
> > +        LTO_STMT_UID to link edges together there is a limitation in
> > +        specializing speculative edges.  Only one group of specialized
> > +        edges can exist for a given group of speculative edges.  */
> > +      for (cgraph_edge *direct = first_speculative_call_target ();
> > +          direct; direct = direct->next_speculative_call_target ())
> > +       if (direct != this && direct->specialized)
> > +         return NULL;
> > +    }
> > +
> > +  cgraph_node *n = caller;
> > +  cgraph_edge *e2;
> > +
> > +  if (dump_file)
> > +    fprintf (dump_file, "Creating guarded specialized edge %s -> %s "
> > +                       "from%s callee %s\n",
> > +                       caller->dump_name (), n2->dump_name (),
> > +                       (speculative? " speculative" : ""),
> > +                       callee->dump_name ());
> > +  specialized = true;
> > +  e2 = n->create_edge (n2, call_stmt, spec_count);
> > +
> > +  /* We don't want to inline the specialized edges separately.  If the base
> > +     specialized edge is inlined then we will drop the specializations.  */
> > +  e2->inline_failed = CIF_UNSPECIFIED;
> > +  if (TREE_NOTHROW (n2->decl))
> > +    e2->can_throw_external = false;
> > +  else
> > +    e2->can_throw_external = can_throw_external;
> > +
> > +  e2->specialized = true;
> > +
> > +  unsigned i;
> > +  cgraph_specialization_info* spec_info;
> > +  vec_alloc (e2->spec_args, spec_args->length ());
> > +
> > +  FOR_EACH_VEC_ELT (*spec_args, i, spec_info)
> > +    e2->spec_args->quick_push (*spec_info);
> > +
> > +  e2->lto_stmt_uid = lto_stmt_uid;
> > +  e2->in_polymorphic_cdtor = in_polymorphic_cdtor;
> > +  count -= e2->count;
> > +  symtab->call_edge_duplication_hooks (this, e2);
> > +  return e2;
> > +}
> > +
> >  /* Speculative call consists of an indirect edge and one or more
> >     direct edge+ref pairs.
> >
> > @@ -1364,6 +1458,39 @@ cgraph_edge::make_direct (cgraph_edge *edge, cgraph_node *callee)
> >    return edge;
> >  }
> >
> > +/* Given the base edge of a group of specialized edges remove all its
> > +   specialized edges.  Essentially this can be used to undo the decision
> > +   to specialize EDGE.  */
> > +
> > +void
> > +cgraph_edge::remove_specializations (cgraph_edge *edge)
> > +{
> > +  if (!edge->specialized)
> > +    return;
> > +
> > +  if (edge->base_specialization_edge_p ())
> > +    {
> > +      cgraph_edge *next;
> > +      for (cgraph_edge *e2 = edge->caller->callees; e2; e2 = next)
> > +       {
> > +         next = e2->next_callee;
> > +
> > +         if (e2->guarded_specialization_edge_p ()
> > +             && edge->call_stmt == e2->call_stmt
> > +             && edge->lto_stmt_uid == e2->lto_stmt_uid)
> > +           {
> > +             edge->count += e2->count;
> > +             if (e2->inline_failed)
> > +               remove (e2);
> > +             else
> > +               e2->callee->remove_symbol_and_inline_clones ();
> > +           }
> > +       }
> > +    }
> > +  else
> > +    gcc_checking_assert (false);
> > +}
> > +
> >  /* Redirect callee of the edge to N.  The function does not update underlying
> >     call expression.  */
> >
> > @@ -1411,6 +1538,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
> >  {
> >    tree decl = gimple_call_fndecl (e->call_stmt);
> >    gcall *new_stmt;
> > +  bool remove_specializations_if_base = true;
> >
> >    if (e->speculative)
> >      {
> > @@ -1467,6 +1595,27 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
> >           /* Indirect edges are not both in the call site hash.
> >              get it updated.  */
> >           update_call_stmt_hash_for_removing_direct_edge (e, indirect);
> > +
> > +         if (e->specialized)
> > +           {
> > +             gcc_checking_assert (e->base_specialization_edge_p ());
> > +
> > +             /* If we're materializing a speculative and base specialized edge
> > +                then we want to keep the specializations alive.  This amounts
> > +                to changing the call statements of the guarded
> > +                specializations.  */
> > +             remove_specializations_if_base = false;
> > +             cgraph_edge *next;
> > +
> > +             for (cgraph_edge *d = e->first_specialized_call_target ();
> > +                  d; d = next)
> > +               {
> > +                 next = d->next_specialized_call_target ();
> > +                 cgraph_edge *d2 = set_call_stmt (d, new_stmt, false);
> > +                 gcc_assert (d2 == d);
> > +               }
> > +           }
> > +
> >           cgraph_edge::set_call_stmt (e, new_stmt, false);
> >           e->count = gimple_bb (e->call_stmt)->count;
> >
> > @@ -1482,6 +1631,53 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
> >         }
> >      }
> >
> > +  if (e->specialized)
> > +    {
> > +      if (e->spec_args != NULL)
> > +       {
> > +         /* Be sure we redirect all specialized targets before poking
> > +            about base edge.  */
> > +         cgraph_edge *base = e->specialized_call_base_edge ();
> > +         gcall *new_stmt;
> > +
> > +         /* Expand specialization into GIMPLE code.  */
> > +         if (dump_file)
> > +           fprintf (dump_file,
> > +                    "Expanding specialized call of %s -> %s\n",
> > +                    e->caller->dump_name (), e->callee->dump_name ());
> > +
> > +         push_cfun (DECL_STRUCT_FUNCTION (e->caller->decl));
> > +
> > +         profile_count all = base->count;
> > +         for (cgraph_edge *e2 = e->first_specialized_call_target ();
> > +              e2; e2 = e2->next_specialized_call_target ())
> > +           all = all + e2->count;
> > +
> > +         profile_probability prob = e->count.probability_in (all);
> > +         if (!prob.initialized_p ())
> > +           prob = profile_probability::even ();
> > +
> > +         new_stmt = gimple_sc (e, prob);
> > +         e->specialized = false;
> > +         e->spec_args = NULL;
> > +         if (!base->first_specialized_call_target ())
> > +           base->specialized = false;
> > +
> > +         cgraph_edge::set_call_stmt (e, new_stmt, false);
> > +         e->count = gimple_bb (e->call_stmt)->count;
> > +         /* Once we are done with expanding the sequence, update also base
> > +            call probability.  Until then the basic block accounts for the
> > +            sum of specialized edges and all non-expanded specializations.  */
> > +         if (!base->specialized)
> > +           base->count = gimple_bb (base->call_stmt)->count;
> > +
> > +         pop_cfun ();
> > +       }
> > +      else if (remove_specializations_if_base)
> > +       /* The specialized edges are in part connected by CALL_STMT so if
> > +          we change it for the base edge then remove all specializations.  */
> > +       cgraph_edge::remove_specializations (e);
> > +    }
> >
> >    if (e->indirect_unknown_callee
> >        || decl == e->callee->decl)
> > @@ -2069,6 +2265,10 @@ cgraph_edge::dump_edge_flags (FILE *f)
> >  {
> >    if (speculative)
> >      fprintf (f, "(speculative) ");
> > +  if (base_specialization_edge_p ())
> > +    fprintf (f, "(specialized base) ");
> > +  if (guarded_specialization_edge_p ())
> > +    fprintf (f, "(guarded specialization) ");
> >    if (!inline_failed)
> >      fprintf (f, "(inlined) ");
> >    if (call_stmt_cannot_inline_p)
> > @@ -3313,6 +3513,10 @@ verify_speculative_call (struct cgraph_node *node, gimple *stmt,
> >         direct = direct->next_callee)
> >      if (direct->call_stmt == stmt && direct->lto_stmt_uid == lto_stmt_uid)
> >        {
> > +       /* Guarded specialized edges share the same CALL_STMT and LTO_STMT_UID
> > +          but are handled separately.  */
> > +       if (direct->guarded_specialization_edge_p ())
> > +         continue;
> >         if (!first_call)
> >           first_call = direct;
> >         if (prev_call && direct != prev_call->next_callee)
> > @@ -3402,6 +3606,93 @@ verify_speculative_call (struct cgraph_node *node, gimple *stmt,
> >    return false;
> >  }
> >
> > +/* Verify consistency of specialized call in NODE corresponding to STMT
> > +   and LTO_STMT_UID.  If BASE is set, assume that it is the base
> > +   edge of call sequence.  Return true if error is found.
> > +
> > +   This function is called on every component of specialized call (base edge
> > +   and specialized edges).  To save duplicated work, do full testing only
> > +   when testing the base edge.  */
> > +static bool
> > +verify_specialized_call (struct cgraph_node *node, gimple *stmt,
> > +                        unsigned int lto_stmt_uid,
> > +                        struct cgraph_edge *base)
> > +{
> > +  if (base == NULL)
> > +    {
> > +      cgraph_edge *base;
> > +      for (base = node->callees; base;
> > +          base = base->next_callee)
> > +       if (base->call_stmt == stmt
> > +           && base->lto_stmt_uid == lto_stmt_uid
> > +           && base->spec_args == NULL)
> > +         break;
> > +      if (!base)
> > +       {
> > +         error ("missing base call in specialized call sequence");
> > +         return true;
> > +       }
> > +      if (!base->specialized)
> > +       {
> > +         error ("base call in specialized call sequence has no "
> > +                "specialized flag");
> > +         return true;
> > +       }
> > +      for (base = base->next_callee; base;
> > +          base = base->next_callee)
> > +       if (base->call_stmt == stmt
> > +           && base->lto_stmt_uid == lto_stmt_uid
> > +           && base->spec_args == NULL)
> > +         {
> > +           error ("cannot have more than one base edge in specialized "
> > +                  "call sequence");
> > +           return true;
> > +         }
> > +      return false;
> > +    }
> > +
> > +  cgraph_edge *prev_call = NULL;
> > +
> > +  cgraph_node *origin_base = base->callee;
> > +  while (origin_base->clone_of)
> > +    origin_base = origin_base->clone_of;
> > +
> > +  for (cgraph_edge *spec = node->callees; spec;
> > +       spec = spec->next_callee)
> > +    if (spec->call_stmt == stmt
> > +       && spec->lto_stmt_uid == lto_stmt_uid
> > +       && spec->spec_args != NULL)
> > +      {
> > +       cgraph_node *origin_spec = spec->callee;
> > +       while (origin_spec->clone_of)
> > +         origin_spec = origin_spec->clone_of;
> > +
> > +       if (spec->callee->clone_of && origin_base != origin_spec)
> > +         {
> > +           error ("specialized call to %s in specialized call sequence has "
> > +                  "different origin than base %s %s %s",
> > +                  origin_spec->dump_name (), origin_base->dump_name (),
> > +                  base->callee->dump_name (), spec->callee->dump_name ());
> > +           return true;
> > +         }
> > +
> > +       if (prev_call && spec != prev_call->next_callee)
> > +         {
> > +           error ("specialized edges are not adjacent");
> > +           return true;
> > +         }
> > +       prev_call = spec;
> > +       if (!spec->specialized)
> > +         {
> > +           error ("call to %s in specialized call sequence has no "
> > +                  "specialized flag", spec->callee->dump_name ());
> > +           return true;
> > +         }
> > +      }
> > +
> > +  return false;
> > +}
> > +
> >  /* Verify cgraph nodes of given cgraph node.  */
> >  DEBUG_FUNCTION void
> >  cgraph_node::verify_node (void)
> > @@ -3578,6 +3869,7 @@ cgraph_node::verify_node (void)
> >        if (gimple_has_body_p (e->caller->decl)
> >           && !e->caller->inlined_to
> >           && !e->speculative
> > +         && !e->specialized
> >           /* Optimized out calls are redirected to __builtin_unreachable.  */
> >           && (e->count.nonzero_p ()
> >               || ! e->callee->decl
> > @@ -3604,6 +3896,10 @@ cgraph_node::verify_node (void)
> >           && verify_speculative_call (e->caller, e->call_stmt, e->lto_stmt_uid,
> >                                       NULL))
> >         error_found = true;
> > +      if (e->specialized
> > +         && verify_specialized_call (e->caller, e->call_stmt, e->lto_stmt_uid,
> > +                                     e->spec_args == NULL? e : NULL))
> > +       error_found = true;
> >      }
> >    for (e = indirect_calls; e; e = e->next_callee)
> >      {
> > @@ -3612,6 +3908,7 @@ cgraph_node::verify_node (void)
> >        if (gimple_has_body_p (e->caller->decl)
> >           && !e->caller->inlined_to
> >           && !e->speculative
> > +         && !e->specialized
> >           && e->count.ipa_p ()
> >           && count
> >               == ENTRY_BLOCK_PTR_FOR_FN (DECL_STRUCT_FUNCTION (decl))->count
> > @@ -3630,6 +3927,11 @@ cgraph_node::verify_node (void)
> >           && verify_speculative_call (e->caller, e->call_stmt, e->lto_stmt_uid,
> >                                       e))
> >         error_found = true;
> > +      if (e->specialized || e->spec_args != NULL)
> > +       {
> > +         error ("Cannot have specialized edges in indirect call");
> > +         error_found = true;
> > +       }
> >      }
> >    for (i = 0; iterate_reference (i, ref); i++)
> >      {
> > @@ -3824,7 +4126,7 @@ cgraph_node::verify_node (void)
> >
> >        for (e = callees; e; e = e->next_callee)
> >         {
> > -         if (!e->aux && !e->speculative)
> > +         if (!e->aux && !e->speculative && !e->specialized)
> >             {
> >               error ("edge %s->%s has no corresponding call_stmt",
> >                      identifier_to_locale (e->caller->name ()),
> > @@ -3836,7 +4138,7 @@ cgraph_node::verify_node (void)
> >         }
> >        for (e = indirect_calls; e; e = e->next_callee)
> >         {
> > -         if (!e->aux && !e->speculative)
> > +         if (!e->aux && !e->speculative && !e->specialized)
> >             {
> >               error ("an indirect edge from %s has no corresponding call_stmt",
> >                      identifier_to_locale (e->caller->name ()));
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index 4be67e3cea9..4caed96e803 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -1683,6 +1683,19 @@ public:
> >    unsigned vptr_changed : 1;
> >  };
> >
> > +class GTY (()) cgraph_specialization_info
> > +{
> > +public:
> > +  unsigned arg_idx;
> > +  int is_unsigned; /* Whether the specialization constant is unsigned.  */
> > +  union
> > +    {
> > +      HOST_WIDE_INT GTY ((tag ("0"))) sval;
> > +      unsigned HOST_WIDE_INT GTY ((tag ("1"))) uval;
> > +    }
> > +  GTY ((desc ("%1.is_unsigned"))) cst;
> > +};
> > +
> >  class GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
> >            for_user)) cgraph_edge
> >  {
> > @@ -1723,6 +1736,12 @@ public:
> >     */
> >    cgraph_edge *make_speculative (cgraph_node *n2, profile_count direct_count,
> >                                  unsigned int speculative_id = 0);
> > +  /* Mark that this edge represents a specialized call to N2.
> > +     SPEC_ARGS represent the position and values of the CALL_STMT of this edge
> > +     that are specialized in N2.  */
> > +  cgraph_edge *make_specialized (cgraph_node *n2,
> > +                                vec<cgraph_specialization_info> *spec_args,
> > +                                profile_count spec_count);
> >
> >    /* Speculative call consists of an indirect edge and one or more
> >       direct edge+ref pairs.  Speculative will expand to the following sequence:
> > @@ -1802,6 +1821,66 @@ public:
> >      gcc_unreachable ();
> >    }
> >
> > +  /* Return the first edge that represents a specialization of the CALL_STMT
> > +     of this edge if one exists or NULL otherwise.  */
> > +  cgraph_edge *first_specialized_call_target ()
> > +  {
> > +    gcc_checking_assert (specialized && callee);
> > +    for (cgraph_edge *e2 = caller->callees;
> > +        e2; e2 = e2->next_callee)
> > +      if (e2->guarded_specialization_edge_p ()
> > +         && call_stmt == e2->call_stmt
> > +         && lto_stmt_uid == e2->lto_stmt_uid)
> > +       return e2;
> > +
> > +    return NULL;
> > +  }
> > +
> > +  /* Return the next edge that represents a specialization of the CALL_STMT
> > +     of this edge if one exists or NULL otherwise.  */
> > +  cgraph_edge *next_specialized_call_target ()
> > +  {
> > +    cgraph_edge *e = this;
> > +    gcc_checking_assert (specialized && callee);
> > +
> > +    if (e->next_callee
> > +       && e->next_callee->guarded_specialization_edge_p ()
> > +       && e->next_callee->call_stmt == e->call_stmt
> > +       && e->next_callee->lto_stmt_uid == e->lto_stmt_uid)
> > +      return e->next_callee;
> > +    return NULL;
> > +  }
> > +
> > +  /* When called on any edge in a specialized call return the (unique)
> > +     edge that points to the non specialized function.  */
> > +  cgraph_edge *specialized_call_base_edge ()
> > +  {
> > +    gcc_checking_assert (specialized && callee);
> > +    for (cgraph_edge *e2 = caller->callees;
> > +        e2; e2 = e2->next_callee)
> > +      if (e2->base_specialization_edge_p ()
> > +         && call_stmt == e2->call_stmt
> > +         && lto_stmt_uid == e2->lto_stmt_uid)
> > +       return e2;
> > +
> > +    return NULL;
> > +  }
> > +
> > +  /* Return true iff this edge is part of specialized sequence and is the
> > +     original edge for which other specialization edges potentially exist.  */
> > +  bool base_specialization_edge_p () const
> > +  {
> > +    return specialized && spec_args == NULL;
> > +  }
> > +
> > +  /* Return true iff this edge is part of specialized sequence and it
> > +     represents a potential specialization target that can be used instead
> > +     of the base edge.  */
> > +  bool guarded_specialization_edge_p () const
> > +  {
> > +    return specialized && spec_args != NULL;
> > +  }
> > +
> >    /* Speculative call edge turned out to be direct call to CALLEE_DECL.  Remove
> >       the speculative call sequence and return edge representing the call, the
> >       original EDGE can be removed and deallocated.  It is up to caller to
> > @@ -1820,6 +1899,11 @@ public:
> >    static cgraph_edge *resolve_speculation (cgraph_edge *edge,
> >                                            tree callee_decl = NULL);
> >
> > +  /* Given the base edge of a group of specialized edges remove all its
> > +     specialized edges.  Essentially this can be used to undo the decision
> > +     to specialize EDGE.  */
> > +  static void remove_specializations (cgraph_edge *edge);
> > +
> >    /* If necessary, change the function declaration in the call statement
> >       associated with edge E so that it corresponds to the edge callee.
> >       Speculations can be resolved in the process and EDGE can be removed and
> > @@ -1895,6 +1979,9 @@ public:
> >    /* Additional information about an indirect call.  Not cleared when an edge
> >       becomes direct.  */
> >    cgraph_indirect_call_info *indirect_info;
> > +  /* If this edge has a specialized function as a callee then this vector
> > +     holds the indices and values of the specialized arguments.  */
> > +  vec<cgraph_specialization_info>* GTY ((skip (""))) spec_args;
> >    void *GTY ((skip (""))) aux;
> >    /* When equal to CIF_OK, inline this call.  Otherwise, points to the
> >       explanation why function was not inlined.  */
> > @@ -1933,6 +2020,21 @@ public:
> >       Optimizers may later redirect direct call to clone, so 1) and 3)
> >       do not need to necessarily agree with destination.  */
> >    unsigned int speculative : 1;
> > +  /* Edges with the SPECIALIZED flag represent calls that have additional
> > +     specialized functions that can be used instead (as a result of ipa-cp).
> > +     The final code sequence will have form:
> > +
> > +     if (specialized_arg_0 == specialized_const_0
> > +        && ...
> > +        && specialized_arg_i == specialized_const_i)
> > +       call_target.constprop.N (non_specialized_arg_0, ...);
> > +     ...
> > +     more potential specializations
> > +     ...
> > +     else
> > +       call_target ();
> > +  */
> > +  unsigned int specialized : 1;
> >    /* Set to true when caller is a constructor or destructor of polymorphic
> >       type.  */
> >    unsigned in_polymorphic_cdtor : 1;
> > diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
> > index bb4b3c5407d..9e12fa19180 100644
> > --- a/gcc/cgraphclones.cc
> > +++ b/gcc/cgraphclones.cc
> > @@ -141,6 +141,20 @@ cgraph_edge::clone (cgraph_node *n, gcall *call_stmt, unsigned stmt_uid,
> >    new_edge->can_throw_external = can_throw_external;
> >    new_edge->call_stmt_cannot_inline_p = call_stmt_cannot_inline_p;
> >    new_edge->speculative = speculative;
> > +
> > +  new_edge->specialized = specialized;
> > +  new_edge->spec_args = NULL;
> > +
> > +  if (spec_args)
> > +    {
> > +      unsigned i;
> > +      cgraph_specialization_info* spec_info;
> > +      vec_alloc (new_edge->spec_args, spec_args->length ());
> > +
> > +      FOR_EACH_VEC_ELT (*spec_args, i, spec_info)
> > +       new_edge->spec_args->quick_push (*spec_info);
> > +    }
> > +
> >    new_edge->in_polymorphic_cdtor = in_polymorphic_cdtor;
> >
> >    /* Update IPA profile.  Local profiles need no updating in original.  */
> > @@ -791,6 +805,22 @@ cgraph_node::set_call_stmt_including_clones (gimple *old_stmt,
> >                   }
> >                 indirect->speculative = false;
> >               }
> > +
> > +           if (edge->specialized && !update_speculative)
> > +             {
> > +               cgraph_edge *base = edge->specialized_call_base_edge ();
> > +
> > +               for (cgraph_edge *next, *specialized
> > +                       = edge->first_specialized_call_target ();
> > +                    specialized;
> > +                    specialized = next)
> > +                 {
> > +                   next = specialized->next_specialized_call_target ();
> > +                   specialized->specialized = false;
> > +                 }
> > +               base->specialized = false;
> > +             }
> > +
> >           }
> >         if (node->clones)
> >           node = node->clones;
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index bce3e514f65..437f2f4295b 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -1933,6 +1933,10 @@ fipa-bit-cp
> >  Common Var(flag_ipa_bit_cp) Optimization
> >  Perform interprocedural bitwise constant propagation.
> >
> > +fipa-guarded-specialization
> > +Common Var(flag_ipa_guarded_specialization) Optimization
> > +Add speculative edges for existing specialized functions.
> > +
> >  fipa-modref
> >  Common Var(flag_ipa_modref) Optimization
> >  Perform interprocedural modref analysis.
> > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> > index d2bcd5e5e69..5a24f6987ac 100644
> > --- a/gcc/ipa-cp.cc
> > +++ b/gcc/ipa-cp.cc
> > @@ -119,6 +119,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "symbol-summary.h"
> >  #include "tree-vrp.h"
> >  #include "ipa-prop.h"
> > +#include "gimple-pretty-print.h"
> >  #include "tree-pretty-print.h"
> >  #include "tree-inline.h"
> >  #include "ipa-fnsummary.h"
> > @@ -5239,6 +5240,8 @@ want_remove_some_param_p (cgraph_node *node, vec<tree> known_csts)
> >    return false;
> >  }
> >
> > +static hash_map<cgraph_node*, vec<cgraph_node*>> *available_specializations;
> > +
> >  /* Create a specialized version of NODE with known constants in KNOWN_CSTS,
> >     known contexts in KNOWN_CONTEXTS and known aggregate values in AGGVALS and
> >     redirect all edges in CALLERS to it.  */
> > @@ -5409,6 +5412,13 @@ create_specialized_node (struct cgraph_node *node,
> >    new_info->known_csts = known_csts;
> >    new_info->known_contexts = known_contexts;
> >
> > +  if (!info->ipcp_orig_node)
> > +    {
> > +      vec<cgraph_node*> &spec_nodes
> > +       = available_specializations->get_or_insert (node);
> > +      spec_nodes.safe_push (new_node);
> > +    }
> > +
> >    ipcp_discover_new_direct_edges (new_node, known_csts, known_contexts,
> >                                   aggvals);
> >
> > @@ -6538,6 +6548,96 @@ ipcp_store_vr_results (void)
> >      }
> >  }
> >
> > +/* Add new edges to the call graph to represent the available specializations
> > +   of each specialized function.  */
> > +static void
> > +add_specialized_edges (void)
> > +{
> > +  cgraph_edge *e;
> > +  cgraph_node *n, *spec_n;
> > +  tree spec_v;
> > +  unsigned i, j;
> > +
> > +  FOR_EACH_DEFINED_FUNCTION (n)
> > +    {
> > +      if (dump_file && n->callees)
> > +       fprintf (dump_file,
> > +                "Processing function %s for specialization of edges.\n",
> > +                n->dump_name ());
> > +
> > +      if (n->ipcp_clone)
> > +       continue;
> > +
> > +      bool update = false;
> > +      for (e = n->callees; e; e = e->next_callee)
> > +       {
> > +         if (!e->callee || e->recursive_p ())
> > +           continue;
> > +
> > +         vec<cgraph_node*> *specialization_nodes
> > +           = available_specializations->get (e->callee);
> > +
> > +         if (!specialization_nodes)
> > +           continue;
> > +
> > +         FOR_EACH_VEC_ELT (*specialization_nodes, i, spec_n)
> > +           {
> > +             if (dump_file)
> > +               fprintf (dump_file,
> > +                        "Edge has available specialization %s.\n",
> > +                        spec_n->dump_name ());
> > +
> > +             ipa_node_params *spec_params = ipa_node_params_sum->get (spec_n);
> > +             vec<cgraph_specialization_info> replaced_args = vNULL;
> > +             bool failed = false;
> > +
> > +             FOR_EACH_VEC_ELT (spec_params->known_csts, j, spec_v)
> > +               {
> > +                 if (spec_v != NULL_TREE)
> > +                   {
> > +                     if (TREE_CODE (spec_v) == INTEGER_CST
> > +                         && TYPE_UNSIGNED (TREE_TYPE (spec_v))
> > +                         && tree_fits_uhwi_p (spec_v))
> > +                       {
> > +                             cgraph_specialization_info spec_info;
> > +                             spec_info.arg_idx = j;
> > +                             spec_info.is_unsigned = 1;
> > +                             spec_info.cst.uval = tree_to_uhwi (spec_v);
> > +                             replaced_args.safe_push (spec_info);
> > +                       }
> > +                     else if (TREE_CODE (spec_v) == INTEGER_CST
> > +                              && !TYPE_UNSIGNED (TREE_TYPE (spec_v))
> > +                              && tree_fits_shwi_p (spec_v))
> > +                       {
> > +                             cgraph_specialization_info spec_info;
> > +                             spec_info.arg_idx = j;
> > +                             spec_info.is_unsigned = 0;
> > +                             spec_info.cst.sval = tree_to_shwi (spec_v);
> > +                             replaced_args.safe_push (spec_info);
> > +                       }
> > +                     else
> > +                       {
> > +                         failed = true;
> > +                         break;
> > +                       }
> > +                   }
> > +               }
> > +
> > +             if (!failed && replaced_args.length () > 0)
> > +               {
> > +                 if (e->make_specialized (spec_n,
> > +                                          &replaced_args,
> > +                                          e->count.apply_scale (1, 10)))
> > +                   update = true;
> > +               }
> > +           }
> > +       }
> > +
> > +      if (update)
> > +       ipa_update_overall_fn_summary (n);
> > +    }
> > +}
> > +
> >  /* The IPCP driver.  */
> >
> >  static unsigned int
> > @@ -6551,6 +6651,7 @@ ipcp_driver (void)
> >    ipa_check_create_node_params ();
> >    ipa_check_create_edge_args ();
> >    clone_num_suffixes = new hash_map<const char *, unsigned>;
> > +  available_specializations = new hash_map<cgraph_node*, vec<cgraph_node*>>;
> >
> >    if (dump_file)
> >      {
> > @@ -6570,8 +6671,12 @@ ipcp_driver (void)
> >    ipcp_store_bits_results ();
> >    /* Store results of value range propagation.  */
> >    ipcp_store_vr_results ();
> > +  /* Add new edges for specializations.  */
> > +  if (flag_ipa_guarded_specialization)
> > +    add_specialized_edges ();
> >
> >    /* Free all IPCP structures.  */
> > +  delete available_specializations;
> >    delete clone_num_suffixes;
> >    free_toporder_info (&topo);
> >    delete edge_clone_summaries;
> > diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> > index fd3d7d6c5e8..a1f219a056e 100644
> > --- a/gcc/ipa-fnsummary.cc
> > +++ b/gcc/ipa-fnsummary.cc
> > @@ -257,6 +257,13 @@ redirect_to_unreachable (struct cgraph_edge *e)
> >      e = cgraph_edge::resolve_speculation (e, target->decl);
> >    else if (!e->callee)
> >      e = cgraph_edge::make_direct (e, target);
> > +  else if (e->base_specialization_edge_p ())
> > +    {
> > +      /* If the base edge becomes unreachable there's no reason to
> > +        keep the specializations around.  */
> > +      cgraph_edge::remove_specializations (e);
> > +      e->redirect_callee (target);
> > +    }
> >    else
> >      e->redirect_callee (target);
> >    class ipa_call_summary *es = ipa_call_summaries->get (e);
> > @@ -866,6 +873,7 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
> >           ipa_predicate new_predicate;
> >           class ipa_call_summary *es = ipa_call_summaries->get (edge);
> >           next = edge->next_callee;
> > +         bool update_next = edge->specialized;
> >
> >           if (!edge->inline_failed)
> >             inlined_to_p = true;
> > @@ -876,6 +884,9 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
> >           if (new_predicate == false && *es->predicate != false)
> >             optimized_out_size += es->call_stmt_size * ipa_fn_summary::size_scale;
> >           edge_set_predicate (edge, &new_predicate);
> > +         /* NEXT may be invalidated for specialized calls.  */
> > +         if (update_next)
> > +           next = edge->next_callee;
> >         }
> >
> >        /* Remap indirect edge predicates with the same simplification as above.
> > @@ -2825,6 +2836,29 @@ analyze_function_body (struct cgraph_node *node, bool early)
> >                                                      es, es3);
> >                     }
> >                 }
> > +             if (edge->specialized)
> > +               {
> > +                 cgraph_edge *base
> > +                       = edge->specialized_call_base_edge ();
> > +                 ipa_call_summary *es2
> > +                        = ipa_call_summaries->get_create (base);
> > +                 ipa_call_summaries->duplicate (edge, base,
> > +                                                es, es2);
> > +
> > +                 /* Edge is the first direct call.
> > +                    Create and duplicate call summaries for multiple
> > +                    specialized call targets.  */
> > +                 for (cgraph_edge *specialization
> > +                        = edge->next_specialized_call_target ();
> > +                      specialization; specialization
> > +                        = specialization->next_specialized_call_target ())
> > +                   {
> > +                     ipa_call_summary *es3
> > +                       = ipa_call_summaries->get_create (specialization);
> > +                     ipa_call_summaries->duplicate (edge, specialization,
> > +                                                    es, es3);
> > +                   }
> > +               }
> >             }
> >
> >           /* TODO: When conditional jump or switch is known to be constant, but
> > @@ -3275,6 +3309,9 @@ estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size,
> >                              sreal *time, ipa_call_arg_values *avals,
> >                              ipa_hints *hints)
> >  {
> > +  if (e->guarded_specialization_edge_p ())
> > +    return;
> > +
> >    class ipa_call_summary *es = ipa_call_summaries->get (e);
> >    int call_size = es->call_stmt_size;
> >    int call_time = es->call_stmt_time;
> > @@ -4050,6 +4087,7 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
> >      {
> >        ipa_predicate p;
> >        next = e->next_callee;
> > +      bool update_next = e->specialized;
> >
> >        if (e->inline_failed)
> >         {
> > @@ -4073,6 +4111,10 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
> >                               params_summary, callee_info,
> >                               operand_map, offset_map, possible_truths,
> >                               toplev_predicate);
> > +
> > +      /* NEXT may be invalidated for specialized calls.  */
> > +      if (update_next)
> > +       next = e->next_callee;
> >      }
> >    for (e = node->indirect_calls; e; e = next)
> >      {
> > diff --git a/gcc/ipa-inline-transform.cc b/gcc/ipa-inline-transform.cc
> > index 07288e57c73..d0b9cd9e599 100644
> > --- a/gcc/ipa-inline-transform.cc
> > +++ b/gcc/ipa-inline-transform.cc
> > @@ -775,11 +775,22 @@ inline_transform (struct cgraph_node *node)
> >      }
> >
> >    maybe_materialize_called_clones (node);
> > +  /* Perform call statement redirection in two steps.  In the first step
> > +     only consider speculative edges and then process the rest in a separate
> > +     step.  This is required due to the potential existence of edges that are
> > +     both speculative and specialized, in which case we need to process them
> > +     in this order.  */
> >    for (e = node->callees; e; e = next)
> >      {
> >        if (!e->inline_failed)
> >         has_inline = true;
> >        next = e->next_callee;
> > +      if (e->speculative)
> > +       cgraph_edge::redirect_call_stmt_to_callee (e);
> > +    }
> > +  for (e = node->callees; e; e = next)
> > +    {
> > +      next = e->next_callee;
> >        cgraph_edge::redirect_call_stmt_to_callee (e);
> >      }
> >    node->remove_all_references ();
> > diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
> > index 14969198cde..c6cd2b92f6e 100644
> > --- a/gcc/ipa-inline.cc
> > +++ b/gcc/ipa-inline.cc
> > @@ -1185,12 +1185,17 @@ edge_badness (struct cgraph_edge *edge, bool dump)
> >    edge_time = estimate_edge_time (edge, &unspec_edge_time);
> >    hints = estimate_edge_hints (edge);
> >    gcc_checking_assert (edge_time >= 0);
> > +
> > +  /* Temporarily disabled due to the way time is calculated
> > +     with specialized edges.  */
> > +#if 0
> >    /* Check that inlined time is better, but tolerate some roundoff issues.
> >       FIXME: When callee profile drops to 0 we account calls more.  This
> >       should be fixed by never doing that.  */
> >    gcc_checking_assert ((edge_time * 100
> >                         - callee_info->time * 101).to_int () <= 0
> >                         || callee->count.ipa ().initialized_p ());
> > +#endif
> >    gcc_checking_assert (growth <= ipa_size_summaries->get (callee)->size);
> >
> >    if (dump)
> > diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
> > index 350195d86db..c8250f7b73c 100644
> > --- a/gcc/lto-cgraph.cc
> > +++ b/gcc/lto-cgraph.cc
> > @@ -271,6 +271,8 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
> >    bp_pack_value (&bp, edge->speculative_id, 16);
> >    bp_pack_value (&bp, edge->indirect_inlining_edge, 1);
> >    bp_pack_value (&bp, edge->speculative, 1);
> > +  bp_pack_value (&bp, edge->specialized, 1);
> > +  bp_pack_value (&bp, edge->spec_args != NULL, 1);
> >    bp_pack_value (&bp, edge->call_stmt_cannot_inline_p, 1);
> >    gcc_assert (!edge->call_stmt_cannot_inline_p
> >               || edge->inline_failed != CIF_BODY_NOT_AVAILABLE);
> > @@ -295,7 +297,27 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
> >        bp_pack_value (&bp, edge->indirect_info->num_speculative_call_targets,
> >                      16);
> >      }
> > +
> >    streamer_write_bitpack (&bp);
> > +
> > +  if (edge->spec_args != NULL)
> > +    {
> > +      cgraph_specialization_info *spec_info;
> > +      unsigned len = edge->spec_args->length (), i;
> > +      streamer_write_uhwi_stream (ob->main_stream, len);
> > +
> > +      FOR_EACH_VEC_ELT (*edge->spec_args, i, spec_info)
> > +       {
> > +         unsigned idx = spec_info->arg_idx;
> > +         streamer_write_uhwi_stream (ob->main_stream, idx);
> > +         streamer_write_hwi_stream (ob->main_stream, spec_info->is_unsigned);
> > +
> > +         if (spec_info->is_unsigned)
> > +           streamer_write_uhwi_stream (ob->main_stream, spec_info->cst.uval);
> > +         else
> > +           streamer_write_hwi_stream (ob->main_stream, spec_info->cst.sval);
> > +       }
> > +    }
> >  }
> >
> >  /* Return if NODE contain references from other partitions.  */
> > @@ -1517,6 +1539,8 @@ input_edge (class lto_input_block *ib, vec<symtab_node *> nodes,
> >
> >    edge->indirect_inlining_edge = bp_unpack_value (&bp, 1);
> >    edge->speculative = bp_unpack_value (&bp, 1);
> > +  edge->specialized = bp_unpack_value (&bp, 1);
> > +  bool has_edge_spec_args = bp_unpack_value (&bp, 1);
> >    edge->lto_stmt_uid = stmt_id;
> >    edge->speculative_id = speculative_id;
> >    edge->inline_failed = inline_failed;
> > @@ -1542,6 +1566,28 @@ input_edge (class lto_input_block *ib, vec<symtab_node *> nodes,
> >        edge->indirect_info->num_speculative_call_targets
> >         = bp_unpack_value (&bp, 16);
> >      }
> > +
> > +  if (has_edge_spec_args)
> > +    {
> > +      unsigned len = streamer_read_uhwi (ib);
> > +      vec_alloc (edge->spec_args, len);
> > +
> > +      for (unsigned i = 0; i < len; i++)
> > +       {
> > +         cgraph_specialization_info spec_info;
> > +         spec_info.arg_idx = streamer_read_uhwi (ib);
> > +         spec_info.is_unsigned = streamer_read_hwi (ib);
> > +
> > +         if (spec_info.is_unsigned)
> > +           spec_info.cst.uval = streamer_read_uhwi (ib);
> > +         else
> > +           spec_info.cst.sval = streamer_read_hwi (ib);
> > +
> > +         edge->spec_args->quick_push (spec_info);
> > +      }
> > +    }
> > +  else
> > +    edge->spec_args = NULL;
> >  }
> >
> >
> > diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
> > index 8091ba8f13b..26657f7c017 100644
> > --- a/gcc/tree-inline.cc
> > +++ b/gcc/tree-inline.cc
> > @@ -2307,6 +2307,60 @@ copy_bb (copy_body_data *id, basic_block bb,
> >                           indirect->count
> >                              = copy_basic_block->count.apply_probability (prob);
> >                         }
> > +                     /* A specialized call consists of multiple
> > +                        edges - a base edge and one or more specialized edges.
> > +                        Duplicate and distribute frequencies in a way similar
> > +                        to the speculative edges.  */
> > +                     else if (edge->specialized)
> > +                       {
> > +                         int n = 0;
> > +                         cgraph_edge *first
> > +                                = old_edge->first_specialized_call_target ();
> > +                         profile_count spec_cnt
> > +                                = profile_count::zero ();
> > +
> > +                         /* First figure out the distribution of counts
> > +                            so we can re-scale BB profile accordingly.  */
> > +                         for (cgraph_edge *e = first; e;
> > +                              e = e->next_specialized_call_target ())
> > +                           spec_cnt = spec_cnt + e->count;
> > +
> > +                         cgraph_edge *base
> > +                                = old_edge->specialized_call_base_edge ();
> > +                         profile_count base_cnt = base->count;
> > +
> > +                         /* Next iterate all specializations, clone them
> > +                            and update the profile.  */
> > +                         for (cgraph_edge *e = first; e;
> > +                              e = e->next_specialized_call_target ())
> > +                           {
> > +                             profile_count cnt = e->count;
> > +
> > +                             edge = e->clone (id->dst_node, call_stmt,
> > +                                              gimple_uid (stmt), num, den,
> > +                                              true);
> > +                             profile_probability prob
> > +                                = cnt.probability_in (spec_cnt
> > +                                                      + base_cnt);
> > +                             edge->count
> > +                                = copy_basic_block->count.apply_probability
> > +                                        (prob);
> > +                             n++;
> > +                           }
> > +
> > +                         /* Duplicate the base edge after all specialized
> > +                            edges are cloned.  */
> > +                         base = base->clone (id->dst_node, call_stmt,
> > +                                                     gimple_uid (stmt),
> > +                                                     num, den,
> > +                                                     true);
> > +
> > +                         profile_probability prob
> > +                            = base_cnt.probability_in (spec_cnt
> > +                                                        + base_cnt);
> > +                         base->count
> > +                            = copy_basic_block->count.apply_probability (prob);
> > +                       }
> >                       else
> >                         {
> >                           edge = edge->clone (id->dst_node, call_stmt,
> > diff --git a/gcc/value-prof.cc b/gcc/value-prof.cc
> > index 9656ce5870d..3db0070bcbf 100644
> > --- a/gcc/value-prof.cc
> > +++ b/gcc/value-prof.cc
> > @@ -42,6 +42,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gimple-pretty-print.h"
> >  #include "dumpfile.h"
> >  #include "builtins.h"
> > +#include "tree-cfg.h"
> > +#include "tree-dfa.h"
> >
> >  /* In this file value profile based optimizations are placed.  Currently the
> >     following optimizations are implemented (for more detailed descriptions
> > @@ -1434,6 +1436,218 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
> >    return dcall_stmt;
> >  }
> >
> > +/* Do transformation
> > +
> > +  if (arg_i == spec_args[y] && ...)
> > +    do call to specialized target callee
> > +  else
> > +    old call
> > + */
> > +
> > +gcall *
> > +gimple_sc (struct cgraph_edge *edg, profile_probability prob)
> > +{
> > +  /* The call statement we're modifying.  */
> > +  gcall *call_stmt = edg->call_stmt;
> > +  /* The cgraph_node of the specialized function.  */
> > +  cgraph_node *callee = edg->callee;
> > +  vec<cgraph_specialization_info> *spec_args = edg->spec_args;
> > +
> > +  /* CALL_STMT should be the call_stmt of the generic function.  */
> > +  gcc_checking_assert (edg->specialized_call_base_edge ()->call_stmt
> > +                     == call_stmt);
> > +
> > +  gcall *spec_call_stmt = NULL;
> > +  tree cond_tree = NULL_TREE;
> > +  gcond *cond_stmt = NULL;
> > +  basic_block cond_bb, dcall_bb, icall_bb, join_bb = NULL;
> > +  edge e_cd, e_ci, e_di, e_dj = NULL, e_ij;
> > +  gimple_stmt_iterator gsi;
> > +  int lp_nr, dflags;
> > +  edge e_eh, e;
> > +  edge_iterator ei;
> > +
> > +  cond_bb = gimple_bb (call_stmt);
> > +  gsi = gsi_for_stmt (call_stmt);
> > +
> > +  /* To call the specialized function we need to build a guard conditional
> > +     with the specialized arguments and constants.  */
> > +  unsigned nargs = gimple_call_num_args (call_stmt);
> > +  unsigned cur_spec = 0;
> > +  bool dump_first = true;
> > +
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, "Creating specialization guard for edge %s -> %s:\n",
> > +                        edg->caller->dump_name (), edg->callee->dump_name ());
> > +      fprintf (dump_file, "if (");
> > +    }
> > +
> > +  for (unsigned arg_idx = 0; arg_idx < nargs; arg_idx++)
> > +    {
> > +      tree cur_arg = gimple_call_arg (call_stmt, arg_idx);
> > +      bool cur_arg_specialized_p = cur_spec < spec_args->length ()
> > +       && arg_idx == (*spec_args)[cur_spec].arg_idx;
> > +
> > +      if (cur_arg_specialized_p)
> > +       {
> > +         gcc_checking_assert (!cond_stmt);
> > +
> > +         cgraph_specialization_info spec_info = (*spec_args)[cur_spec];
> > +         cur_spec++;
> > +
> > +         tree spec_v;
> > +         if (spec_info.is_unsigned)
> > +           spec_v = build_int_cstu (integer_type_node, spec_info.cst.uval);
> > +         else
> > +           spec_v = build_int_cst (integer_type_node, spec_info.cst.sval);
> > +
> > +         tree cmp_const = fold_convert (TREE_TYPE (cur_arg), spec_v);
> > +
> > +         tree cur_arg_eq_spec = build2 (EQ_EXPR, boolean_type_node,
> > +                                             cur_arg, cmp_const);
> > +
> > +         if (dump_file)
> > +           {
> > +             if (!dump_first)
> > +               fprintf (dump_file, " && ");
> > +             print_generic_expr (dump_file, cur_arg_eq_spec);
> > +             dump_first = false;
> > +           }
> > +
> > +         tree tmp1 = make_temp_ssa_name (boolean_type_node, NULL, "SPEC");
> > +         gassign* load_stmt1 = gimple_build_assign (tmp1, cur_arg_eq_spec);
> > +         gsi_insert_before (&gsi, load_stmt1, GSI_SAME_STMT);
> > +
> > +         if (!cond_tree)
> > +           cond_tree = tmp1;
> > +         else
> > +           {
> > +             tree cur_and_prev_true = fold_build2 (BIT_AND_EXPR,
> > +                                        boolean_type_node,
> > +                                        cond_tree,
> > +                                        tmp1);
> > +
> > +             tree tmp2 = make_temp_ssa_name (boolean_type_node, NULL, "SPEC");
> > +             gassign* load_stmt2
> > +               = gimple_build_assign (tmp2, cur_and_prev_true);
> > +             gsi_insert_before (&gsi, load_stmt2, GSI_SAME_STMT);
> > +             cond_tree = tmp2;
> > +           }
> > +       }
> > +    }
> > +
> > +  cond_stmt = gimple_build_cond (EQ_EXPR, cond_tree, boolean_true_node,
> > +                                NULL_TREE, NULL_TREE);
> > +
> > +  gsi_insert_before (&gsi, cond_stmt, GSI_SAME_STMT);
> > +
> > +  if (gimple_vdef (call_stmt)
> > +      && TREE_CODE (gimple_vdef (call_stmt)) == SSA_NAME)
> > +    {
> > +      unlink_stmt_vdef (call_stmt);
> > +      release_ssa_name (gimple_vdef (call_stmt));
> > +    }
> > +  gimple_set_vdef (call_stmt, NULL_TREE);
> > +  gimple_set_vuse (call_stmt, NULL_TREE);
> > +  update_stmt (call_stmt);
> > +  spec_call_stmt = as_a <gcall *> (gimple_copy (call_stmt));
> > +  gimple_call_set_fndecl (spec_call_stmt, callee->decl);
> > +  dflags = flags_from_decl_or_type (callee->decl);
> > +
> > +  if ((dflags & ECF_NORETURN) != 0
> > +      && should_remove_lhs_p (gimple_call_lhs (spec_call_stmt)))
> > +    gimple_call_set_lhs (spec_call_stmt, NULL_TREE);
> > +  gsi_insert_before (&gsi, spec_call_stmt, GSI_SAME_STMT);
> > +
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, ")\n  ");
> > +      print_gimple_stmt (dump_file, spec_call_stmt, 0);
> > +    }
> > +
> > +  e_cd = split_block (cond_bb, cond_stmt);
> > +  dcall_bb = e_cd->dest;
> > +  dcall_bb->count = cond_bb->count.apply_probability (prob);
> > +
> > +  e_di = split_block (dcall_bb, spec_call_stmt);
> > +  icall_bb = e_di->dest;
> > +  icall_bb->count = cond_bb->count - dcall_bb->count;
> > +
> > +  if (!stmt_ends_bb_p (call_stmt))
> > +    e_ij = split_block (icall_bb, call_stmt);
> > +  else
> > +    {
> > +      e_ij = find_fallthru_edge (icall_bb->succs);
> > +      if (e_ij != NULL)
> > +       {
> > +         e_ij->probability = profile_probability::always ();
> > +         e_ij = single_pred_edge (split_edge (e_ij));
> > +       }
> > +    }
> > +  if (e_ij != NULL)
> > +    {
> > +      join_bb = e_ij->dest;
> > +      join_bb->count = cond_bb->count;
> > +    }
> > +
> > +  e_cd->flags = (e_cd->flags & ~EDGE_FALLTHRU) | EDGE_TRUE_VALUE;
> > +  e_cd->probability = prob;
> > +
> > +  e_ci = make_edge (cond_bb, icall_bb, EDGE_FALSE_VALUE);
> > +  e_ci->probability = prob.invert ();
> > +
> > +  remove_edge (e_di);
> > +
> > +  if (e_ij != NULL)
> > +    {
> > +      if ((dflags & ECF_NORETURN) == 0)
> > +       {
> > +         e_dj = make_edge (dcall_bb, join_bb, EDGE_FALLTHRU);
> > +         e_dj->probability = profile_probability::always ();
> > +       }
> > +      e_ij->probability = profile_probability::always ();
> > +    }
> > +
> > +  if (gimple_call_lhs (call_stmt)
> > +      && TREE_CODE (gimple_call_lhs (call_stmt)) == SSA_NAME
> > +      && (dflags & ECF_NORETURN) == 0)
> > +    {
> > +      tree result = gimple_call_lhs (call_stmt);
> > +      gphi *phi = create_phi_node (result, join_bb);
> > +      gimple_call_set_lhs (call_stmt,
> > +                          duplicate_ssa_name (result, call_stmt));
> > +      add_phi_arg (phi, gimple_call_lhs (call_stmt), e_ij, UNKNOWN_LOCATION);
> > +      gimple_call_set_lhs (spec_call_stmt,
> > +                          duplicate_ssa_name (result, spec_call_stmt));
> > +      add_phi_arg (phi, gimple_call_lhs (spec_call_stmt), e_dj,
> > +                  UNKNOWN_LOCATION);
> > +    }
> > +
> > +  lp_nr = lookup_stmt_eh_lp (call_stmt);
> > +  if (lp_nr > 0 && stmt_could_throw_p (cfun, spec_call_stmt))
> > +    {
> > +      add_stmt_to_eh_lp (spec_call_stmt, lp_nr);
> > +    }
> > +
> > +  FOR_EACH_EDGE (e_eh, ei, icall_bb->succs)
> > +    if (e_eh->flags & (EDGE_EH | EDGE_ABNORMAL))
> > +      {
> > +       e = make_edge (dcall_bb, e_eh->dest, e_eh->flags);
> > +       e->probability = e_eh->probability;
> > +       for (gphi_iterator psi = gsi_start_phis (e_eh->dest);
> > +            !gsi_end_p (psi); gsi_next (&psi))
> > +         {
> > +           gphi *phi = psi.phi ();
> > +           SET_USE (PHI_ARG_DEF_PTR_FROM_EDGE (phi, e),
> > +                    PHI_ARG_DEF_FROM_EDGE (phi, e_eh));
> > +         }
> > +       }
> > +  if (!stmt_could_throw_p (cfun, spec_call_stmt))
> > +    gimple_purge_dead_eh_edges (dcall_bb);
> > +  return spec_call_stmt;
> > +}
> > +
> >  /* Dump info about indirect call profile.  */
> >
> >  static void
> > diff --git a/gcc/value-prof.h b/gcc/value-prof.h
> > index d852c41f33f..7d8be5920b9 100644
> > --- a/gcc/value-prof.h
> > +++ b/gcc/value-prof.h
> > @@ -89,6 +89,7 @@ void verify_histograms (void);
> >  void free_histograms (function *);
> >  void stringop_block_profile (gimple *, unsigned int *, HOST_WIDE_INT *);
> >  gcall *gimple_ic (gcall *, struct cgraph_node *, profile_probability);
> > +gcall *gimple_sc (struct cgraph_edge *, profile_probability);
> >  bool get_nth_most_common_value (gimple *stmt, const char *counter_type,
> >                                 histogram_value hist, gcov_type *value,
> >                                 gcov_type *count, gcov_type *all,
> > --
> > 2.38.1
> >
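
For readers skimming the quoted patch, the guard that gimple_sc builds can be pictured at source level roughly as below. This is only a sketch under assumptions: myfunc, myfunc_constprop_1 and func_i are hypothetical stand-ins borrowed from the example in the cover letter, and the two counters exist only so the chosen path can be observed. The real transformation works on GIMPLE and cgraph edges, not on source code.

```cpp
#include <cassert>

// Hypothetical stand-ins: myfunc is the generic callee, and
// myfunc_constprop_1 plays the role of the clone specialized for arg0 == 1.
static int generic_calls = 0;
static int specialized_calls = 0;

static int
myfunc (int x)
{
  ++generic_calls;
  return x * 10 + 1;
}

static int
myfunc_constprop_1 ()
{
  ++specialized_calls;
  return 11;	/* myfunc (1) with the constant folded in.  */
}

// Shape of the call site in func_i once the guard has been added:
// test the argument against the collected constant, call the clone on a
// match, and fall back to the original call otherwise.
static int
func_i (int i)
{
  if (i == 1)
    return myfunc_constprop_1 ();
  return myfunc (i);
}
```

Calling func_i (1) takes the specialized path, while any other argument still reaches the original myfunc, so the result is unchanged in both cases.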

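The count handling in the copy_bb hunk gives each cloned edge a share of the copied block's count proportional to its old count over the sum of all specialized counts plus the base count. A minimal sketch of that arithmetic follows, assuming plain integers in place of GCC's profile_count and profile_probability (which also track profile quality); split_counts and distribute are hypothetical names, and overflow of the intermediate product is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: new counts for the specialized edges and for
// the base edge of one call site.
struct split_counts
{
  std::vector<uint64_t> specialized;
  uint64_t base;
};

// Scale each old edge count by bb_count / (sum of specialized + base),
// mirroring cnt.probability_in (spec_cnt + base_cnt) applied to the
// copied block's count.
static split_counts
distribute (uint64_t bb_count, const std::vector<uint64_t> &spec,
	    uint64_t base)
{
  uint64_t total = base;
  for (uint64_t c : spec)
    total += c;

  split_counts out;
  out.base = 0;
  if (total == 0)
    {
      /* No profile information to scale by; leave every count at zero.  */
      out.specialized.assign (spec.size (), 0);
      return out;
    }

  for (uint64_t c : spec)
    out.specialized.push_back (bb_count * c / total);
  out.base = bb_count * base / total;
  return out;
}
```

With two specializations at count 10 each and a base edge at count 80, a copied block of count 200 yields 20, 20 and 160, so the shares add up to the block count again.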
Thread overview: 5+ messages
2022-11-13 15:37 Christoph Muellner
2022-11-14  7:37 ` Richard Biener
2022-11-14 10:35   ` Manolis Tsamis [this message]
2022-11-14 10:48     ` Richard Biener
2022-12-16 16:19   ` Manolis Tsamis
