public inbox for gcc-patches@gcc.gnu.org
From: Manolis Tsamis <manolis.tsamis@vrull.eu>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Christoph Muellner <christoph.muellner@vrull.eu>,
	gcc-patches@gcc.gnu.org,  Martin Jambor <mjambor@suse.cz>,
	Jan Hubicka <jh@suse.cz>,
	Philipp Tomsich <philipp.tomsich@vrull.eu>
Subject: Re: [RFC PATCH] ipa-cp: Speculatively call specialized functions
Date: Fri, 16 Dec 2022 18:19:28 +0200
Message-ID: <CAM3yNXpOLT95T3Qbru+3=_485SVMd2ROgmwiuJ7vpnrGC56s2A@mail.gmail.com>
In-Reply-To: <CAFiYyc2eVud6z9v8Apv5uxx2hJO8BpV=cp6ZqDWcUhKqejqatw@mail.gmail.com>

On Mon, Nov 14, 2022 at 9:37 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Sun, Nov 13, 2022 at 4:38 PM Christoph Muellner
> <christoph.muellner@vrull.eu> wrote:
> >
> > From: mtsamis <manolis.tsamis@vrull.eu>
> >
> > The IPA-CP pass offers a wide range of optimizations, most of which
> > lead to specialized function clones being called from particular
> > call sites.  This can result in multiple specialized clones if more
> > than one call site allows such an optimization.  If not all call
> > sites can be optimized, the program may end up with call sites to
> > the original function.
> >
> > This pass assumes that the non-optimized call sites (i.e. call
> > sites that do not call specialized functions) are likely to be
> > reached with arguments that would allow calling the specialized
> > clones.  Since we cannot guarantee this (for obvious reasons), we
> > cannot replace the existing calls.  However, we can introduce
> > dynamic guards that test the arguments against the collected
> > constants and call the specialized function if there is a match.
> >
> > To demonstrate the effect, let's consider the following program part:
> >
> >   func_1()
> >     myfunc(1)
> >   func_2()
> >     myfunc(2)
> >   func_i(i)
> >     myfunc(i)
> >
> > In this case the transformation would do the following:
> >
> >   func_1()
> >     myfunc.constprop.1() // myfunc() with arg0 == 1
> >   func_2()
> >     myfunc.constprop.2() // myfunc() with arg0 == 2
> >   func_i(i)
> >     if (i == 1)
> >       myfunc.constprop.1() // myfunc() with arg0 == 1
> >     else if (i == 2)
> >       myfunc.constprop.2() // myfunc() with arg0 == 2
> >     else
> >       myfunc(i)
> >
> > The pass consists of two main parts:
> > * collection of all specialized functions and their argument/constant pair(s)
> > * insertion of the guards during materialization
> >
> > The patch integrates well into ipa-cp and related IPA functionality.
> > Given the nature of IPA, the changes touch many IPA-related files
> > as well as the call-graph data structures.
> >
> > The overhead of the dynamic guards is expected to be smaller than the
> > speedup gained by the optimizations they enable (e.g. inlining or
> > constant propagation).
>
> I don't see any limits on the number of callee candidates or the complexity
> of the guard.  Is there any reason to not factor the guards into a wrapper
> function to avoid bloating cold call sites and to allow inlining to decide
> where the expansion is useful?
>
> Skimming the patch I noticed an #if 0 commented assert with a comment
> that this was to be temporary?
>

I have sent a V2 of this with both issues addressed. It also contains a number
of other important fixes.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608639.html

Thanks!
Manolis

> Thanks,
> Richard.
>
> > PR ipa/107667
> > gcc/ChangeLog:
> >
> >         * cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for guarded specialized edges.
> >         (cgraph_edge::set_call_stmt): Likewise.
> >         (symbol_table::create_edge): Likewise.
> >         (cgraph_edge::remove): Likewise.
> >         (cgraph_edge::make_speculative): Likewise.
> >         (cgraph_edge::make_specialized): Likewise.
> >         (cgraph_edge::remove_specializations): Likewise.
> >         (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> >         (cgraph_edge::dump_edge_flags): Likewise.
> >         (verify_speculative_call): Likewise.
> >         (verify_specialized_call): Likewise.
> >         (cgraph_node::verify_node): Likewise.
> >         * cgraph.h (class GTY): Add new class that contains info of specialized edges.
> >         * cgraphclones.cc (cgraph_edge::clone): Add support for guarded specialized edges.
> >         (cgraph_node::set_call_stmt_including_clones): Likewise.
> >         * ipa-cp.cc (want_remove_some_param_p): Likewise.
> >         (create_specialized_node): Likewise.
> >         (add_specialized_edges): Likewise.
> >         (ipcp_driver): Likewise.
> >         * ipa-fnsummary.cc (redirect_to_unreachable): Likewise.
> >         (ipa_fn_summary_t::duplicate): Likewise.
> >         (analyze_function_body): Likewise.
> >         (estimate_edge_size_and_time): Likewise.
> >         (remap_edge_summaries): Likewise.
> >         * ipa-inline-transform.cc (inline_transform): Likewise.
> >         * ipa-inline.cc (edge_badness): Likewise.
> >         * lto-cgraph.cc (lto_output_edge): Likewise.
> >         (input_edge): Likewise.
> >         * tree-inline.cc (copy_bb): Likewise.
> >         * value-prof.cc (gimple_sc): Add function to create guarded specializations.
> >         * value-prof.h (gimple_sc): Likewise.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >  gcc/cgraph.cc               | 316 +++++++++++++++++++++++++++++++++++-
> >  gcc/cgraph.h                | 102 ++++++++++++
> >  gcc/cgraphclones.cc         |  30 ++++
> >  gcc/common.opt              |   4 +
> >  gcc/ipa-cp.cc               | 105 ++++++++++++
> >  gcc/ipa-fnsummary.cc        |  42 +++++
> >  gcc/ipa-inline-transform.cc |  11 ++
> >  gcc/ipa-inline.cc           |   5 +
> >  gcc/lto-cgraph.cc           |  46 ++++++
> >  gcc/tree-inline.cc          |  54 ++++++
> >  gcc/value-prof.cc           | 214 ++++++++++++++++++++++++
> >  gcc/value-prof.h            |   1 +
> >  12 files changed, 923 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> > index 5851b2ffc6c..ee819c87261 100644
> > --- a/gcc/cgraph.cc
> > +++ b/gcc/cgraph.cc
> > @@ -718,18 +718,24 @@ cgraph_add_edge_to_call_site_hash (cgraph_edge *e)
> >       one indirect); always hash the direct one.  */
> >    if (e->speculative && e->indirect_unknown_callee)
> >      return;
> > +  /* There are potentially multiple specialization edges for every
> > +     specialized call; always hash the base edge.  */
> > +  if (e->guarded_specialization_edge_p ())
> > +    return;
> >    cgraph_edge **slot = e->caller->call_site_hash->find_slot_with_hash
> >        (e->call_stmt, cgraph_edge_hasher::hash (e->call_stmt), INSERT);
> >    if (*slot)
> >      {
> > -      gcc_assert (((cgraph_edge *)*slot)->speculative);
> > +      gcc_assert (((cgraph_edge *)*slot)->speculative
> > +                 || ((cgraph_edge *)*slot)->specialized);
> >        if (e->callee && (!e->prev_callee
> >                         || !e->prev_callee->speculative
> > +                       || !e->prev_callee->specialized
> >                         || e->prev_callee->call_stmt != e->call_stmt))
> >         *slot = e;
> >        return;
> >      }
> > -  gcc_assert (!*slot || e->speculative);
> > +  gcc_assert (!*slot || e->speculative || e->specialized);
> >    *slot = e;
> >  }
> >
> > @@ -800,6 +806,23 @@ cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
> >        gcc_checking_assert (new_direct_callee);
> >      }
> >
> > +  /* Update specialized first and do not return yet in case we're dealing
> > +     with an edge that is both specialized and speculative.  */
> > +  if (update_speculative && e->specialized)
> > +    {
> > +      cgraph_edge *next, *base = e->specialized_call_base_edge ();
> > +      for (cgraph_edge *d = e->first_specialized_call_target (); d; d = next)
> > +       {
> > +         next = d->next_specialized_call_target ();
> > +         cgraph_edge *d2 = set_call_stmt (d, new_stmt, false);
> > +         gcc_assert (d2 == d);
> > +       }
> > +
> > +      /* Don't update base for speculative edges.  The code below will.  */
> > +      if (!e->speculative)
> > +       set_call_stmt (base, new_stmt, false);
> > +    }
> > +
> >    /* Speculative edges has three component, update all of them
> >       when asked to.  */
> >    if (update_speculative && e->speculative
> > @@ -835,12 +858,16 @@ cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
> >        return e_indirect ? indirect : direct;
> >      }
> >
> > +  if (update_speculative && e->specialized)
> > +    return e;
> > +
> >    if (new_direct_callee)
> >      e = make_direct (e, new_direct_callee);
> >
> >    /* Only direct speculative edges go to call_site_hash.  */
> >    if (e->caller->call_site_hash
> >        && (!e->speculative || !e->indirect_unknown_callee)
> > +      && (!e->specialized || e->spec_args == NULL)
> >        /* It is possible that edge was previously speculative.  In this case
> >          we have different value in call stmt hash which needs preserving.  */
> >        && e->caller->get_edge (e->call_stmt) == e)
> > @@ -854,11 +881,12 @@ cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
> >    /* Update call stite hash.  For speculative calls we only record the first
> >       direct edge.  */
> >    if (e->caller->call_site_hash
> > -      && (!e->speculative
> > +      && ((!e->speculative && !e->specialized)
> >           || (e->callee
> >               && (!e->prev_callee || !e->prev_callee->speculative
> >                   || e->prev_callee->call_stmt != e->call_stmt))
> > -         || (e->speculative && !e->callee)))
> > +         || (e->speculative && !e->callee)
> > +         || e->base_specialization_edge_p ()))
> >      cgraph_add_edge_to_call_site_hash (e);
> >    return e;
> >  }
> > @@ -883,7 +911,8 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
> >          construction of call stmt hashtable.  */
> >        cgraph_edge *e;
> >        gcc_checking_assert (!(e = caller->get_edge (call_stmt))
> > -                          || e->speculative);
> > +                          || e->speculative
> > +                          || e->specialized);
> >
> >        gcc_assert (is_gimple_call (call_stmt));
> >      }
> > @@ -909,6 +938,8 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
> >    edge->indirect_info = NULL;
> >    edge->indirect_inlining_edge = 0;
> >    edge->speculative = false;
> > +  edge->specialized = false;
> > +  edge->spec_args = NULL;
> >    edge->indirect_unknown_callee = indir_unknown_callee;
> >    if (call_stmt && caller->call_site_hash)
> >      cgraph_add_edge_to_call_site_hash (edge);
> > @@ -1066,6 +1097,11 @@ symbol_table::free_edge (cgraph_edge *e)
> >  void
> >  cgraph_edge::remove (cgraph_edge *edge)
> >  {
> > +  /* If we remove the base edge of a group of specialized
> > +     edges then we must also remove all of its specializations.  */
> > +  if (edge->base_specialization_edge_p ())
> > +    cgraph_edge::remove_specializations (edge);
> > +
> >    /* Call all edge removal hooks.  */
> >    symtab->call_edge_removal_hooks (edge);
> >
> > @@ -1109,6 +1145,8 @@ cgraph_edge::make_speculative (cgraph_node *n2, profile_count direct_count,
> >    ipa_ref *ref = NULL;
> >    cgraph_edge *e2;
> >
> > +  gcc_checking_assert (!specialized);
> > +
> >    if (dump_file)
> >      fprintf (dump_file, "Indirect call -> speculative call %s => %s\n",
> >              n->dump_name (), n2->dump_name ());
> > @@ -1134,6 +1172,62 @@ cgraph_edge::make_speculative (cgraph_node *n2, profile_count direct_count,
> >    return e2;
> >  }
> >
> > +/* Mark this edge as specialized and add a new edge representing that N2
> > +   is a specialized version of the CALLEE of this edge, with the specialized
> > +   arguments found in SPEC_ARGS.  */
> > +cgraph_edge *
> > +cgraph_edge::make_specialized (cgraph_node *n2,
> > +                               vec<cgraph_specialization_info>* spec_args,
> > +                               profile_count spec_count)
> > +{
> > +  if (speculative)
> > +    {
> > +      /* Because both speculative and specialized edges use CALL_STMT and
> > +        LTO_STMT_UID to link edges together there is a limitation in
> > +        specializing speculative edges.  Only one group of specialized
> > +        edges can exist for a given group of speculative edges.  */
> > +      for (cgraph_edge *direct = first_speculative_call_target ();
> > +          direct; direct = direct->next_speculative_call_target ())
> > +       if (direct != this && direct->specialized)
> > +         return NULL;
> > +    }
> > +
> > +  cgraph_node *n = caller;
> > +  cgraph_edge *e2;
> > +
> > +  if (dump_file)
> > +    fprintf (dump_file, "Creating guarded specialized edge %s -> %s "
> > +                       "from%s callee %s\n",
> > +                       caller->dump_name (), n2->dump_name (),
> > +                       (speculative ? " speculative" : ""),
> > +                       callee->dump_name ());
> > +  specialized = true;
> > +  e2 = n->create_edge (n2, call_stmt, spec_count);
> > +
> > +  /* We don't want to inline the specialized edges separately.  If the base
> > +     specialized edge is inlined then we will drop the specializations.  */
> > +  e2->inline_failed = CIF_UNSPECIFIED;
> > +  if (TREE_NOTHROW (n2->decl))
> > +    e2->can_throw_external = false;
> > +  else
> > +    e2->can_throw_external = can_throw_external;
> > +
> > +  e2->specialized = true;
> > +
> > +  unsigned i;
> > +  cgraph_specialization_info* spec_info;
> > +  vec_alloc (e2->spec_args, spec_args->length ());
> > +
> > +  FOR_EACH_VEC_ELT (*spec_args, i, spec_info)
> > +    e2->spec_args->quick_push (*spec_info);
> > +
> > +  e2->lto_stmt_uid = lto_stmt_uid;
> > +  e2->in_polymorphic_cdtor = in_polymorphic_cdtor;
> > +  count -= e2->count;
> > +  symtab->call_edge_duplication_hooks (this, e2);
> > +  return e2;
> > +}
> > +
> >  /* Speculative call consists of an indirect edge and one or more
> >     direct edge+ref pairs.
> >
> > @@ -1364,6 +1458,39 @@ cgraph_edge::make_direct (cgraph_edge *edge, cgraph_node *callee)
> >    return edge;
> >  }
> >
> > +/* Given the base edge of a group of specialized edges remove all its
> > +   specialized edges.  Essentially this can be used to undo the decision
> > +   to specialize EDGE.  */
> > +
> > +void
> > +cgraph_edge::remove_specializations (cgraph_edge *edge)
> > +{
> > +  if (!edge->specialized)
> > +    return;
> > +
> > +  if (edge->base_specialization_edge_p ())
> > +    {
> > +      cgraph_edge *next;
> > +      for (cgraph_edge *e2 = edge->caller->callees; e2; e2 = next)
> > +       {
> > +         next = e2->next_callee;
> > +
> > +         if (e2->guarded_specialization_edge_p ()
> > +             && edge->call_stmt == e2->call_stmt
> > +             && edge->lto_stmt_uid == e2->lto_stmt_uid)
> > +           {
> > +             edge->count += e2->count;
> > +             if (e2->inline_failed)
> > +               remove (e2);
> > +             else
> > +               e2->callee->remove_symbol_and_inline_clones ();
> > +           }
> > +       }
> > +    }
> > +  else
> > +    gcc_checking_assert (false);
> > +}
> > +
> >  /* Redirect callee of the edge to N.  The function does not update underlying
> >     call expression.  */
> >
> > @@ -1411,6 +1538,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
> >  {
> >    tree decl = gimple_call_fndecl (e->call_stmt);
> >    gcall *new_stmt;
> > +  bool remove_specializations_if_base = true;
> >
> >    if (e->speculative)
> >      {
> > @@ -1467,6 +1595,27 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
> >           /* Indirect edges are not both in the call site hash.
> >              get it updated.  */
> >           update_call_stmt_hash_for_removing_direct_edge (e, indirect);
> > +
> > +         if (e->specialized)
> > +           {
> > +             gcc_checking_assert (e->base_specialization_edge_p ());
> > +
> > +             /* If we're materializing a speculative and base specialized edge
> > +                then we want to keep the specializations alive.  This amounts
> > +                to changing the call statements of the guarded
> > +                specializations.  */
> > +             remove_specializations_if_base = false;
> > +             cgraph_edge *next;
> > +
> > +             for (cgraph_edge *d = e->first_specialized_call_target ();
> > +                  d; d = next)
> > +               {
> > +                 next = d->next_specialized_call_target ();
> > +                 cgraph_edge *d2 = set_call_stmt (d, new_stmt, false);
> > +                 gcc_assert (d2 == d);
> > +               }
> > +           }
> > +
> >           cgraph_edge::set_call_stmt (e, new_stmt, false);
> >           e->count = gimple_bb (e->call_stmt)->count;
> >
> > @@ -1482,6 +1631,53 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
> >         }
> >      }
> >
> > +  if (e->specialized)
> > +    {
> > +      if (e->spec_args != NULL)
> > +       {
> > +         /* Be sure we redirect all specialized targets before poking
> > +            about base edge.  */
> > +         cgraph_edge *base = e->specialized_call_base_edge ();
> > +         gcall *new_stmt;
> > +
> > +         /* Expand specialization into GIMPLE code.  */
> > +         if (dump_file)
> > +           fprintf (dump_file,
> > +                    "Expanding specialized call of %s -> %s\n",
> > +                    e->caller->dump_name (), e->callee->dump_name ());
> > +
> > +         push_cfun (DECL_STRUCT_FUNCTION (e->caller->decl));
> > +
> > +         profile_count all = base->count;
> > +         for (cgraph_edge *e2 = e->first_specialized_call_target ();
> > +              e2; e2 = e2->next_specialized_call_target ())
> > +           all = all + e2->count;
> > +
> > +         profile_probability prob = e->count.probability_in (all);
> > +         if (!prob.initialized_p ())
> > +           prob = profile_probability::even ();
> > +
> > +         new_stmt = gimple_sc (e, prob);
> > +         e->specialized = false;
> > +         e->spec_args = NULL;
> > +         if (!base->first_specialized_call_target ())
> > +           base->specialized = false;
> > +
> > +         cgraph_edge::set_call_stmt (e, new_stmt, false);
> > +         e->count = gimple_bb (e->call_stmt)->count;
> > +         /* Once we are done with expanding the sequence, update also base
> > +            call probability.  Until then the basic block accounts for the
> > +            sum of specialized edges and all non-expanded specializations.  */
> > +         if (!base->specialized)
> > +           base->count = gimple_bb (base->call_stmt)->count;
> > +
> > +         pop_cfun ();
> > +       }
> > +      else if (remove_specializations_if_base)
> > +       /* The specialized edges are in part connected by CALL_STMT so if
> > +          we change it for the base edge then remove all specializations.  */
> > +       cgraph_edge::remove_specializations (e);
> > +    }
> >
> >    if (e->indirect_unknown_callee
> >        || decl == e->callee->decl)
> > @@ -2069,6 +2265,10 @@ cgraph_edge::dump_edge_flags (FILE *f)
> >  {
> >    if (speculative)
> >      fprintf (f, "(speculative) ");
> > +  if (base_specialization_edge_p ())
> > +    fprintf (f, "(specialized base) ");
> > +  if (guarded_specialization_edge_p ())
> > +    fprintf (f, "(guarded specialization) ");
> >    if (!inline_failed)
> >      fprintf (f, "(inlined) ");
> >    if (call_stmt_cannot_inline_p)
> > @@ -3313,6 +3513,10 @@ verify_speculative_call (struct cgraph_node *node, gimple *stmt,
> >         direct = direct->next_callee)
> >      if (direct->call_stmt == stmt && direct->lto_stmt_uid == lto_stmt_uid)
> >        {
> > +       /* Guarded specialized edges share the same CALL_STMT and LTO_STMT_UID
> > +          but are handled separately.  */
> > +       if (direct->guarded_specialization_edge_p ())
> > +         continue;
> >         if (!first_call)
> >           first_call = direct;
> >         if (prev_call && direct != prev_call->next_callee)
> > @@ -3402,6 +3606,93 @@ verify_speculative_call (struct cgraph_node *node, gimple *stmt,
> >    return false;
> >  }
> >
> > +/* Verify consistency of specialized call in NODE corresponding to STMT
> > +   and LTO_STMT_UID.  If BASE is set, assume that it is the base
> > +   edge of call sequence.  Return true if error is found.
> > +
> > +   This function is called for every component of a specialized call (base edge
> > +   and specialized edges).  To save duplicated work, do full testing only
> > +   when testing the base edge.  */
> > +static bool
> > +verify_specialized_call (struct cgraph_node *node, gimple *stmt,
> > +                        unsigned int lto_stmt_uid,
> > +                        struct cgraph_edge *base)
> > +{
> > +  if (base == NULL)
> > +    {
> > +      cgraph_edge *base;
> > +      for (base = node->callees; base;
> > +          base = base->next_callee)
> > +       if (base->call_stmt == stmt
> > +           && base->lto_stmt_uid == lto_stmt_uid
> > +           && base->spec_args == NULL)
> > +         break;
> > +      if (!base)
> > +       {
> > +         error ("missing base call in specialized call sequence");
> > +         return true;
> > +       }
> > +      if (!base->specialized)
> > +       {
> > +         error ("base call in specialized call sequence has no "
> > +                "specialized flag");
> > +         return true;
> > +       }
> > +      for (base = base->next_callee; base;
> > +          base = base->next_callee)
> > +       if (base->call_stmt == stmt
> > +           && base->lto_stmt_uid == lto_stmt_uid
> > +           && base->spec_args == NULL)
> > +         {
> > +           error ("cannot have more than one base edge in specialized "
> > +                  "call sequence");
> > +           return true;
> > +         }
> > +      return false;
> > +    }
> > +
> > +  cgraph_edge *prev_call = NULL;
> > +
> > +  cgraph_node *origin_base = base->callee;
> > +  while (origin_base->clone_of)
> > +    origin_base = origin_base->clone_of;
> > +
> > +  for (cgraph_edge *spec = node->callees; spec;
> > +       spec = spec->next_callee)
> > +    if (spec->call_stmt == stmt
> > +       && spec->lto_stmt_uid == lto_stmt_uid
> > +       && spec->spec_args != NULL)
> > +      {
> > +       cgraph_node *origin_spec = spec->callee;
> > +       while (origin_spec->clone_of)
> > +         origin_spec = origin_spec->clone_of;
> > +
> > +       if (spec->callee->clone_of && origin_base != origin_spec)
> > +         {
> > +           error ("specialized call to %s in specialized call sequence has "
> > +                  "different origin than base %s %s %s",
> > +                  origin_spec->dump_name (), origin_base->dump_name (),
> > +                  base->callee->dump_name (), spec->callee->dump_name ());
> > +           return true;
> > +         }
> > +
> > +       if (prev_call && spec != prev_call->next_callee)
> > +         {
> > +           error ("specialized edges are not adjacent");
> > +           return true;
> > +         }
> > +       prev_call = spec;
> > +       if (!spec->specialized)
> > +         {
> > +           error ("call to %s in specialized call sequence has no "
> > +                  "specialized flag", spec->callee->dump_name ());
> > +           return true;
> > +         }
> > +      }
> > +
> > +  return false;
> > +}
> > +
> >  /* Verify cgraph nodes of given cgraph node.  */
> >  DEBUG_FUNCTION void
> >  cgraph_node::verify_node (void)
> > @@ -3578,6 +3869,7 @@ cgraph_node::verify_node (void)
> >        if (gimple_has_body_p (e->caller->decl)
> >           && !e->caller->inlined_to
> >           && !e->speculative
> > +         && !e->specialized
> >           /* Optimized out calls are redirected to __builtin_unreachable.  */
> >           && (e->count.nonzero_p ()
> >               || ! e->callee->decl
> > @@ -3604,6 +3896,10 @@ cgraph_node::verify_node (void)
> >           && verify_speculative_call (e->caller, e->call_stmt, e->lto_stmt_uid,
> >                                       NULL))
> >         error_found = true;
> > +      if (e->specialized
> > +         && verify_specialized_call (e->caller, e->call_stmt, e->lto_stmt_uid,
> > +                                     e->spec_args == NULL ? e : NULL))
> > +       error_found = true;
> >      }
> >    for (e = indirect_calls; e; e = e->next_callee)
> >      {
> > @@ -3612,6 +3908,7 @@ cgraph_node::verify_node (void)
> >        if (gimple_has_body_p (e->caller->decl)
> >           && !e->caller->inlined_to
> >           && !e->speculative
> > +         && !e->specialized
> >           && e->count.ipa_p ()
> >           && count
> >               == ENTRY_BLOCK_PTR_FOR_FN (DECL_STRUCT_FUNCTION (decl))->count
> > @@ -3630,6 +3927,11 @@ cgraph_node::verify_node (void)
> >           && verify_speculative_call (e->caller, e->call_stmt, e->lto_stmt_uid,
> >                                       e))
> >         error_found = true;
> > +      if (e->specialized || e->spec_args != NULL)
> > +       {
> > +         error ("cannot have specialized edges in indirect call");
> > +         error_found = true;
> > +       }
> >      }
> >    for (i = 0; iterate_reference (i, ref); i++)
> >      {
> > @@ -3824,7 +4126,7 @@ cgraph_node::verify_node (void)
> >
> >        for (e = callees; e; e = e->next_callee)
> >         {
> > -         if (!e->aux && !e->speculative)
> > +         if (!e->aux && !e->speculative && !e->specialized)
> >             {
> >               error ("edge %s->%s has no corresponding call_stmt",
> >                      identifier_to_locale (e->caller->name ()),
> > @@ -3836,7 +4138,7 @@ cgraph_node::verify_node (void)
> >         }
> >        for (e = indirect_calls; e; e = e->next_callee)
> >         {
> > -         if (!e->aux && !e->speculative)
> > +         if (!e->aux && !e->speculative && !e->specialized)
> >             {
> >               error ("an indirect edge from %s has no corresponding call_stmt",
> >                      identifier_to_locale (e->caller->name ()));
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index 4be67e3cea9..4caed96e803 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -1683,6 +1683,19 @@ public:
> >    unsigned vptr_changed : 1;
> >  };
> >
> > +class GTY (()) cgraph_specialization_info
> > +{
> > +public:
> > +  unsigned arg_idx;
> > +  int is_unsigned; /* Whether the specialization constant is unsigned.  */
> > +  union
> > +    {
> > +      HOST_WIDE_INT GTY ((tag ("0"))) sval;
> > +      unsigned HOST_WIDE_INT GTY ((tag ("1"))) uval;
> > +    }
> > +  GTY ((desc ("%1.is_unsigned"))) cst;
> > +};
> > +
> >  class GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
> >            for_user)) cgraph_edge
> >  {
> > @@ -1723,6 +1736,12 @@ public:
> >     */
> >    cgraph_edge *make_speculative (cgraph_node *n2, profile_count direct_count,
> >                                  unsigned int speculative_id = 0);
> > +  /* Mark that this edge represents a specialized call to N2.
> > +     SPEC_ARGS represent the position and values of the CALL_STMT of this edge
> > +     that are specialized in N2.  */
> > +  cgraph_edge *make_specialized (cgraph_node *n2,
> > +                                vec<cgraph_specialization_info> *spec_args,
> > +                                profile_count spec_count);
> >
> >    /* Speculative call consists of an indirect edge and one or more
> >       direct edge+ref pairs.  Speculative will expand to the following sequence:
> > @@ -1802,6 +1821,66 @@ public:
> >      gcc_unreachable ();
> >    }
> >
> > +  /* Return the first edge that represents a specialization of the CALL_STMT
> > +     of this edge if one exists or NULL otherwise.  */
> > +  cgraph_edge *first_specialized_call_target ()
> > +  {
> > +    gcc_checking_assert (specialized && callee);
> > +    for (cgraph_edge *e2 = caller->callees;
> > +        e2; e2 = e2->next_callee)
> > +      if (e2->guarded_specialization_edge_p ()
> > +         && call_stmt == e2->call_stmt
> > +         && lto_stmt_uid == e2->lto_stmt_uid)
> > +       return e2;
> > +
> > +    return NULL;
> > +  }
> > +
> > +  /* Return the next edge that represents a specialization of the CALL_STMT
> > +     of this edge if one exists or NULL otherwise.  */
> > +  cgraph_edge *next_specialized_call_target ()
> > +  {
> > +    cgraph_edge *e = this;
> > +    gcc_checking_assert (specialized && callee);
> > +
> > +    if (e->next_callee
> > +       && e->next_callee->guarded_specialization_edge_p ()
> > +       && e->next_callee->call_stmt == e->call_stmt
> > +       && e->next_callee->lto_stmt_uid == e->lto_stmt_uid)
> > +      return e->next_callee;
> > +    return NULL;
> > +  }
> > +
> > +  /* When called on any edge in a specialized call return the (unique)
> > +     edge that points to the non-specialized function.  */
> > +  cgraph_edge *specialized_call_base_edge ()
> > +  {
> > +    gcc_checking_assert (specialized && callee);
> > +    for (cgraph_edge *e2 = caller->callees;
> > +        e2; e2 = e2->next_callee)
> > +      if (e2->base_specialization_edge_p ()
> > +         && call_stmt == e2->call_stmt
> > +         && lto_stmt_uid == e2->lto_stmt_uid)
> > +       return e2;
> > +
> > +    return NULL;
> > +  }
> > +
> > +  /* Return true iff this edge is part of a specialized sequence and is the
> > +     original edge for which other specialization edges potentially exist.  */
> > +  bool base_specialization_edge_p () const
> > +  {
> > +    return specialized && spec_args == NULL;
> > +  }
> > +
> > +  /* Return true iff this edge is part of a specialized sequence and
> > +     represents a potential specialization target that can be used instead
> > +     of the base edge.  */
> > +  bool guarded_specialization_edge_p () const
> > +  {
> > +    return specialized && spec_args != NULL;
> > +  }
> > +
> >    /* Speculative call edge turned out to be direct call to CALLEE_DECL.  Remove
> >       the speculative call sequence and return edge representing the call, the
> >       original EDGE can be removed and deallocated.  It is up to caller to
> > @@ -1820,6 +1899,11 @@ public:
> >    static cgraph_edge *resolve_speculation (cgraph_edge *edge,
> >                                            tree callee_decl = NULL);
> >
> > +  /* Given the base edge of a group of specialized edges remove all its
> > +     specialized edges.  Essentially this can be used to undo the decision
> > +     to specialize EDGE.  */
> > +  static void remove_specializations (cgraph_edge *edge);
> > +
> >    /* If necessary, change the function declaration in the call statement
> >       associated with edge E so that it corresponds to the edge callee.
> >       Speculations can be resolved in the process and EDGE can be removed and
> > @@ -1895,6 +1979,9 @@ public:
> >    /* Additional information about an indirect call.  Not cleared when an edge
> >       becomes direct.  */
> >    cgraph_indirect_call_info *indirect_info;
> > +  /* If this edge has a specialized function as a callee then this vector
> > +     holds the indices and values of the specialized arguments.  */
> > +  vec<cgraph_specialization_info>* GTY ((skip (""))) spec_args;
> >    void *GTY ((skip (""))) aux;
> >    /* When equal to CIF_OK, inline this call.  Otherwise, points to the
> >       explanation why function was not inlined.  */
> > @@ -1933,6 +2020,21 @@ public:
> >       Optimizers may later redirect direct call to clone, so 1) and 3)
> >       do not need to necessarily agree with destination.  */
> >    unsigned int speculative : 1;
> > +  /* Edges with the SPECIALIZED flag represent calls that have additional
> > +     specialized functions that can be used instead (as a result of ipa-cp).
> > +     The final code sequence will have form:
> > +
> > +     if (specialized_arg_0 == specialized_const_0
> > +        && ...
> > +        && specialized_arg_i == specialized_const_i)
> > +       call_target.constprop.N (non_specialized_arg_0, ...);
> > +     ...
> > +     more potential specializations
> > +     ...
> > +     else
> > +       call_target ();
> > +  */
> > +  unsigned int specialized : 1;
> >    /* Set to true when caller is a constructor or destructor of polymorphic
> >       type.  */
> >    unsigned in_polymorphic_cdtor : 1;
> > diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
> > index bb4b3c5407d..9e12fa19180 100644
> > --- a/gcc/cgraphclones.cc
> > +++ b/gcc/cgraphclones.cc
> > @@ -141,6 +141,20 @@ cgraph_edge::clone (cgraph_node *n, gcall *call_stmt, unsigned stmt_uid,
> >    new_edge->can_throw_external = can_throw_external;
> >    new_edge->call_stmt_cannot_inline_p = call_stmt_cannot_inline_p;
> >    new_edge->speculative = speculative;
> > +
> > +  new_edge->specialized = specialized;
> > +  new_edge->spec_args = NULL;
> > +
> > +  if (spec_args)
> > +    {
> > +      unsigned i;
> > +      cgraph_specialization_info* spec_info;
> > +      vec_alloc (new_edge->spec_args, spec_args->length ());
> > +
> > +      FOR_EACH_VEC_ELT (*spec_args, i, spec_info)
> > +       new_edge->spec_args->quick_push (*spec_info);
> > +    }
> > +
> >    new_edge->in_polymorphic_cdtor = in_polymorphic_cdtor;
> >
> >    /* Update IPA profile.  Local profiles need no updating in original.  */
> > @@ -791,6 +805,22 @@ cgraph_node::set_call_stmt_including_clones (gimple *old_stmt,
> >                   }
> >                 indirect->speculative = false;
> >               }
> > +
> > +           if (edge->specialized && !update_speculative)
> > +             {
> > +               cgraph_edge *base = edge->specialized_call_base_edge ();
> > +
> > +               for (cgraph_edge *next, *specialized
> > +                       = edge->first_specialized_call_target ();
> > +                    specialized;
> > +                    specialized = next)
> > +                 {
> > +                   next = specialized->next_specialized_call_target ();
> > +                   specialized->specialized = false;
> > +                 }
> > +               base->specialized = false;
> > +             }
> > +
> >           }
> >         if (node->clones)
> >           node = node->clones;
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index bce3e514f65..437f2f4295b 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -1933,6 +1933,10 @@ fipa-bit-cp
> >  Common Var(flag_ipa_bit_cp) Optimization
> >  Perform interprocedural bitwise constant propagation.
> >
> > +fipa-guarded-specialization
> > +Common Var(flag_ipa_guarded_specialization) Optimization
> > +Add speculative edges for existing specialized functions.
> > +
> >  fipa-modref
> >  Common Var(flag_ipa_modref) Optimization
> >  Perform interprocedural modref analysis.
> > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> > index d2bcd5e5e69..5a24f6987ac 100644
> > --- a/gcc/ipa-cp.cc
> > +++ b/gcc/ipa-cp.cc
> > @@ -119,6 +119,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "symbol-summary.h"
> >  #include "tree-vrp.h"
> >  #include "ipa-prop.h"
> > +#include "gimple-pretty-print.h"
> >  #include "tree-pretty-print.h"
> >  #include "tree-inline.h"
> >  #include "ipa-fnsummary.h"
> > @@ -5239,6 +5240,8 @@ want_remove_some_param_p (cgraph_node *node, vec<tree> known_csts)
> >    return false;
> >  }
> >
> > +static hash_map<cgraph_node*, vec<cgraph_node*>> *available_specializations;
> > +
> >  /* Create a specialized version of NODE with known constants in KNOWN_CSTS,
> >     known contexts in KNOWN_CONTEXTS and known aggregate values in AGGVALS and
> >     redirect all edges in CALLERS to it.  */
> > @@ -5409,6 +5412,13 @@ create_specialized_node (struct cgraph_node *node,
> >    new_info->known_csts = known_csts;
> >    new_info->known_contexts = known_contexts;
> >
> > +  if (!info->ipcp_orig_node)
> > +    {
> > +      vec<cgraph_node*> &spec_nodes
> > +       = available_specializations->get_or_insert (node);
> > +      spec_nodes.safe_push (new_node);
> > +    }
> > +
> >    ipcp_discover_new_direct_edges (new_node, known_csts, known_contexts,
> >                                   aggvals);
> >
> > @@ -6538,6 +6548,96 @@ ipcp_store_vr_results (void)
> >      }
> >  }
> >
> > +/* Add new edges to the call graph to represent the available specializations
> > +   of each specialized function.  */
> > +static void
> > +add_specialized_edges (void)
> > +{
> > +  cgraph_edge *e;
> > +  cgraph_node *n, *spec_n;
> > +  tree spec_v;
> > +  unsigned i, j;
> > +
> > +  FOR_EACH_DEFINED_FUNCTION (n)
> > +    {
> > +      if (dump_file && n->callees)
> > +       fprintf (dump_file,
> > +                "Processing function %s for specialization of edges.\n",
> > +                n->dump_name ());
> > +
> > +      if (n->ipcp_clone)
> > +       continue;
> > +
> > +      bool update = false;
> > +      for (e = n->callees; e; e = e->next_callee)
> > +       {
> > +         if (!e->callee || e->recursive_p ())
> > +           continue;
> > +
> > +         vec<cgraph_node*> *specialization_nodes
> > +           = available_specializations->get (e->callee);
> > +
> > +         if (!specialization_nodes)
> > +           continue;
> > +
> > +         FOR_EACH_VEC_ELT (*specialization_nodes, i, spec_n)
> > +           {
> > +             if (dump_file)
> > +               fprintf (dump_file,
> > +                        "Edge has available specialization %s.\n",
> > +                        spec_n->dump_name ());
> > +
> > +             ipa_node_params *spec_params = ipa_node_params_sum->get (spec_n);
> > +             vec<cgraph_specialization_info> replaced_args = vNULL;
> > +             bool failed = false;
> > +
> > +             FOR_EACH_VEC_ELT (spec_params->known_csts, j, spec_v)
> > +               {
> > +                 if (spec_v != NULL_TREE)
> > +                   {
> > +                     if (TREE_CODE (spec_v) == INTEGER_CST
> > +                         && TYPE_UNSIGNED (TREE_TYPE (spec_v))
> > +                         && tree_fits_uhwi_p (spec_v))
> > +                       {
> > +                             cgraph_specialization_info spec_info;
> > +                             spec_info.arg_idx = j;
> > +                             spec_info.is_unsigned = 1;
> > +                             spec_info.cst.uval = tree_to_uhwi (spec_v);
> > +                             replaced_args.safe_push (spec_info);
> > +                       }
> > +                     else if (TREE_CODE (spec_v) == INTEGER_CST
> > +                              && !TYPE_UNSIGNED (TREE_TYPE (spec_v))
> > +                              && tree_fits_shwi_p (spec_v))
> > +                       {
> > +                             cgraph_specialization_info spec_info;
> > +                             spec_info.arg_idx = j;
> > +                             spec_info.is_unsigned = 0;
> > +                             spec_info.cst.uval = tree_to_shwi (spec_v);
> > +                             replaced_args.safe_push (spec_info);
> > +                       }
> > +                     else
> > +                       {
> > +                         failed = true;
> > +                         break;
> > +                       }
> > +                   }
> > +               }
> > +
> > +             if (!failed && replaced_args.length () > 0)
> > +               {
> > +                 if (e->make_specialized (spec_n,
> > +                                          &replaced_args,
> > +                                          e->count.apply_scale (1, 10)))
> > +                   update = true;
> > +               }
> > +           }
> > +       }
> > +
> > +      if (update)
> > +       ipa_update_overall_fn_summary (n);
> > +    }
> > +}
> > +
> >  /* The IPCP driver.  */
> >
> >  static unsigned int
> > @@ -6551,6 +6651,7 @@ ipcp_driver (void)
> >    ipa_check_create_node_params ();
> >    ipa_check_create_edge_args ();
> >    clone_num_suffixes = new hash_map<const char *, unsigned>;
> > +  available_specializations = new hash_map<cgraph_node*, vec<cgraph_node*>>;
> >
> >    if (dump_file)
> >      {
> > @@ -6570,8 +6671,12 @@ ipcp_driver (void)
> >    ipcp_store_bits_results ();
> >    /* Store results of value range propagation.  */
> >    ipcp_store_vr_results ();
> > +  /* Add new edges for specializations.  */
> > +  if (flag_ipa_guarded_specialization)
> > +    add_specialized_edges ();
> >
> >    /* Free all IPCP structures.  */
> > +  delete available_specializations;
> >    delete clone_num_suffixes;
> >    free_toporder_info (&topo);
> >    delete edge_clone_summaries;
> > diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> > index fd3d7d6c5e8..a1f219a056e 100644
> > --- a/gcc/ipa-fnsummary.cc
> > +++ b/gcc/ipa-fnsummary.cc
> > @@ -257,6 +257,13 @@ redirect_to_unreachable (struct cgraph_edge *e)
> >      e = cgraph_edge::resolve_speculation (e, target->decl);
> >    else if (!e->callee)
> >      e = cgraph_edge::make_direct (e, target);
> > +  else if (e->base_specialization_edge_p ())
> > +    {
> > +      /* If the base edge becomes unreachable there's no reason to
> > +        keep the specializations around.  */
> > +      cgraph_edge::remove_specializations (e);
> > +      e->redirect_callee (target);
> > +    }
> >    else
> >      e->redirect_callee (target);
> >    class ipa_call_summary *es = ipa_call_summaries->get (e);
> > @@ -866,6 +873,7 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
> >           ipa_predicate new_predicate;
> >           class ipa_call_summary *es = ipa_call_summaries->get (edge);
> >           next = edge->next_callee;
> > +         bool update_next = edge->specialized;
> >
> >           if (!edge->inline_failed)
> >             inlined_to_p = true;
> > @@ -876,6 +884,9 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
> >           if (new_predicate == false && *es->predicate != false)
> >             optimized_out_size += es->call_stmt_size * ipa_fn_summary::size_scale;
> >           edge_set_predicate (edge, &new_predicate);
> > +         /* NEXT may be invalidated for specialized calls.  */
> > +         if (update_next)
> > +           next = edge->next_callee;
> >         }
> >
> >        /* Remap indirect edge predicates with the same simplification as above.
> > @@ -2825,6 +2836,29 @@ analyze_function_body (struct cgraph_node *node, bool early)
> >                                                      es, es3);
> >                     }
> >                 }
> > +             if (edge->specialized)
> > +               {
> > +                 cgraph_edge *base
> > +                       = edge->specialized_call_base_edge ();
> > +                 ipa_call_summary *es2
> > +                        = ipa_call_summaries->get_create (base);
> > +                 ipa_call_summaries->duplicate (edge, base,
> > +                                                es, es2);
> > +
> > +                 /* EDGE is the first direct call.  Create and duplicate
> > +                    call summaries for the remaining specialized call
> > +                    targets.  */
> > +                 for (cgraph_edge *specialization
> > +                        = edge->next_specialized_call_target ();
> > +                      specialization; specialization
> > +                        = specialization->next_specialized_call_target ())
> > +                   {
> > +                     ipa_call_summary *es3
> > +                       = ipa_call_summaries->get_create (specialization);
> > +                     ipa_call_summaries->duplicate (edge, specialization,
> > +                                                    es, es3);
> > +                   }
> > +               }
> >             }
> >
> >           /* TODO: When conditional jump or switch is known to be constant, but
> > @@ -3275,6 +3309,9 @@ estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size,
> >                              sreal *time, ipa_call_arg_values *avals,
> >                              ipa_hints *hints)
> >  {
> > +  if (e->guarded_specialization_edge_p ())
> > +    return;
> > +
> >    class ipa_call_summary *es = ipa_call_summaries->get (e);
> >    int call_size = es->call_stmt_size;
> >    int call_time = es->call_stmt_time;
> > @@ -4050,6 +4087,7 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
> >      {
> >        ipa_predicate p;
> >        next = e->next_callee;
> > +      bool update_next = e->specialized;
> >
> >        if (e->inline_failed)
> >         {
> > @@ -4073,6 +4111,10 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
> >                               params_summary, callee_info,
> >                               operand_map, offset_map, possible_truths,
> >                               toplev_predicate);
> > +
> > +      /* NEXT may be invalidated for specialized calls.  */
> > +      if (update_next)
> > +       next = e->next_callee;
> >      }
> >    for (e = node->indirect_calls; e; e = next)
> >      {
> > diff --git a/gcc/ipa-inline-transform.cc b/gcc/ipa-inline-transform.cc
> > index 07288e57c73..d0b9cd9e599 100644
> > --- a/gcc/ipa-inline-transform.cc
> > +++ b/gcc/ipa-inline-transform.cc
> > @@ -775,11 +775,22 @@ inline_transform (struct cgraph_node *node)
> >      }
> >
> >    maybe_materialize_called_clones (node);
> > +  /* Perform call statement redirection in two steps.  In the first step
> > +     only consider speculative edges and then process the rest in a separate
> > +     step.  This is required due to the potential existence of edges that are
> > +     both speculative and specialized, in which case we need to process them
> > +     in this order.  */
> >    for (e = node->callees; e; e = next)
> >      {
> >        if (!e->inline_failed)
> >         has_inline = true;
> >        next = e->next_callee;
> > +      if (e->speculative)
> > +       cgraph_edge::redirect_call_stmt_to_callee (e);
> > +    }
> > +  for (e = node->callees; e; e = next)
> > +    {
> > +      next = e->next_callee;
> >        cgraph_edge::redirect_call_stmt_to_callee (e);
> >      }
> >    node->remove_all_references ();
> > diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
> > index 14969198cde..c6cd2b92f6e 100644
> > --- a/gcc/ipa-inline.cc
> > +++ b/gcc/ipa-inline.cc
> > @@ -1185,12 +1185,17 @@ edge_badness (struct cgraph_edge *edge, bool dump)
> >    edge_time = estimate_edge_time (edge, &unspec_edge_time);
> >    hints = estimate_edge_hints (edge);
> >    gcc_checking_assert (edge_time >= 0);
> > +
> > +  /* Temporarily disabled due to the way time is calculated
> > +     with specialized edges.  */
> > +#if 0
> >    /* Check that inlined time is better, but tolerate some roundoff issues.
> >       FIXME: When callee profile drops to 0 we account calls more.  This
> >       should be fixed by never doing that.  */
> >    gcc_checking_assert ((edge_time * 100
> >                         - callee_info->time * 101).to_int () <= 0
> >                         || callee->count.ipa ().initialized_p ());
> > +#endif
> >    gcc_checking_assert (growth <= ipa_size_summaries->get (callee)->size);
> >
> >    if (dump)
> > diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
> > index 350195d86db..c8250f7b73c 100644
> > --- a/gcc/lto-cgraph.cc
> > +++ b/gcc/lto-cgraph.cc
> > @@ -271,6 +271,8 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
> >    bp_pack_value (&bp, edge->speculative_id, 16);
> >    bp_pack_value (&bp, edge->indirect_inlining_edge, 1);
> >    bp_pack_value (&bp, edge->speculative, 1);
> > +  bp_pack_value (&bp, edge->specialized, 1);
> > +  bp_pack_value (&bp, edge->spec_args != NULL, 1);
> >    bp_pack_value (&bp, edge->call_stmt_cannot_inline_p, 1);
> >    gcc_assert (!edge->call_stmt_cannot_inline_p
> >               || edge->inline_failed != CIF_BODY_NOT_AVAILABLE);
> > @@ -295,7 +297,27 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
> >        bp_pack_value (&bp, edge->indirect_info->num_speculative_call_targets,
> >                      16);
> >      }
> > +
> >    streamer_write_bitpack (&bp);
> > +
> > +  if (edge->spec_args != NULL)
> > +    {
> > +      cgraph_specialization_info *spec_info;
> > +      unsigned len = edge->spec_args->length (), i;
> > +      streamer_write_uhwi_stream (ob->main_stream, len);
> > +
> > +      FOR_EACH_VEC_ELT (*edge->spec_args, i, spec_info)
> > +       {
> > +         unsigned idx = spec_info->arg_idx;
> > +         streamer_write_uhwi_stream (ob->main_stream, idx);
> > +         streamer_write_hwi_stream (ob->main_stream, spec_info->is_unsigned);
> > +
> > +         if (spec_info->is_unsigned)
> > +           streamer_write_uhwi_stream (ob->main_stream, spec_info->cst.uval);
> > +         else
> > +           streamer_write_hwi_stream (ob->main_stream, spec_info->cst.sval);
> > +       }
> > +    }
> >  }
> >
> >  /* Return if NODE contain references from other partitions.  */
> > @@ -1517,6 +1539,8 @@ input_edge (class lto_input_block *ib, vec<symtab_node *> nodes,
> >
> >    edge->indirect_inlining_edge = bp_unpack_value (&bp, 1);
> >    edge->speculative = bp_unpack_value (&bp, 1);
> > +  edge->specialized = bp_unpack_value (&bp, 1);
> > +  bool has_edge_spec_args = bp_unpack_value (&bp, 1);
> >    edge->lto_stmt_uid = stmt_id;
> >    edge->speculative_id = speculative_id;
> >    edge->inline_failed = inline_failed;
> > @@ -1542,6 +1566,28 @@ input_edge (class lto_input_block *ib, vec<symtab_node *> nodes,
> >        edge->indirect_info->num_speculative_call_targets
> >         = bp_unpack_value (&bp, 16);
> >      }
> > +
> > +  if (has_edge_spec_args)
> > +    {
> > +      unsigned len = streamer_read_uhwi (ib);
> > +      vec_alloc (edge->spec_args, len);
> > +
> > +      for (unsigned i = 0; i < len; i++)
> > +       {
> > +         cgraph_specialization_info spec_info;
> > +         spec_info.arg_idx = streamer_read_uhwi (ib);
> > +         spec_info.is_unsigned = streamer_read_hwi (ib);
> > +
> > +         if (spec_info.is_unsigned)
> > +           spec_info.cst.uval = streamer_read_uhwi (ib);
> > +         else
> > +           spec_info.cst.sval = streamer_read_hwi (ib);
> > +
> > +         edge->spec_args->quick_push (spec_info);
> > +      }
> > +    }
> > +  else
> > +    edge->spec_args = NULL;
> >  }
> >
> >
> > diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
> > index 8091ba8f13b..26657f7c017 100644
> > --- a/gcc/tree-inline.cc
> > +++ b/gcc/tree-inline.cc
> > @@ -2307,6 +2307,60 @@ copy_bb (copy_body_data *id, basic_block bb,
> >                           indirect->count
> >                              = copy_basic_block->count.apply_probability (prob);
> >                         }
> > +                     /* A specialized call consists of multiple
> > +                        edges - a base edge and one or more specialized edges.
> > +                        Duplicate and distribute frequencies in a way similar
> > +                        to the speculative edges.  */
> > +                     else if (edge->specialized)
> > +                       {
> > +                         int n = 0;
> > +                         cgraph_edge *first
> > +                                = old_edge->first_specialized_call_target ();
> > +                         profile_count spec_cnt
> > +                                = profile_count::zero ();
> > +
> > +                         /* First figure out the distribution of counts
> > +                            so we can re-scale BB profile accordingly.  */
> > +                         for (cgraph_edge *e = first; e;
> > +                              e = e->next_specialized_call_target ())
> > +                           spec_cnt = spec_cnt + e->count;
> > +
> > +                         cgraph_edge *base
> > +                                = old_edge->specialized_call_base_edge ();
> > +                         profile_count base_cnt = base->count;
> > +
> > +                         /* Next iterate all specializations, clone them
> > +                            and update the profile.  */
> > +                         for (cgraph_edge *e = first; e;
> > +                              e = e->next_specialized_call_target ())
> > +                           {
> > +                             profile_count cnt = e->count;
> > +
> > +                             edge = e->clone (id->dst_node, call_stmt,
> > +                                              gimple_uid (stmt), num, den,
> > +                                              true);
> > +                             profile_probability prob
> > +                                = cnt.probability_in (spec_cnt
> > +                                                      + base_cnt);
> > +                             edge->count
> > +                                = copy_basic_block->count.apply_probability
> > +                                        (prob);
> > +                             n++;
> > +                           }
> > +
> > +                         /* Duplicate the base edge after all specialized
> > +                            edges have been cloned.  */
> > +                         base = base->clone (id->dst_node, call_stmt,
> > +                                                     gimple_uid (stmt),
> > +                                                     num, den,
> > +                                                     true);
> > +
> > +                         profile_probability prob
> > +                            = base_cnt.probability_in (spec_cnt
> > +                                                        + base_cnt);
> > +                         base->count
> > +                            = copy_basic_block->count.apply_probability (prob);
> > +                       }
> >                       else
> >                         {
> >                           edge = edge->clone (id->dst_node, call_stmt,
> > diff --git a/gcc/value-prof.cc b/gcc/value-prof.cc
> > index 9656ce5870d..3db0070bcbf 100644
> > --- a/gcc/value-prof.cc
> > +++ b/gcc/value-prof.cc
> > @@ -42,6 +42,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gimple-pretty-print.h"
> >  #include "dumpfile.h"
> >  #include "builtins.h"
> > +#include "tree-cfg.h"
> > +#include "tree-dfa.h"
> >
> >  /* In this file value profile based optimizations are placed.  Currently the
> >     following optimizations are implemented (for more detailed descriptions
> > @@ -1434,6 +1436,218 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
> >    return dcall_stmt;
> >  }
> >
> > +/* Do transformation
> > +
> > +  if (arg_i == spec_args[y] && ...)
> > +    do call to specialized target callee
> > +  else
> > +    old call
> > + */
> > +
> > +gcall *
> > +gimple_sc (struct cgraph_edge *edg, profile_probability prob)
> > +{
> > +  /* The call statement we're modifying.  */
> > +  gcall *call_stmt = edg->call_stmt;
> > +  /* The cgraph_node of the specialized function.  */
> > +  cgraph_node *callee = edg->callee;
> > +  vec<cgraph_specialization_info> *spec_args = edg->spec_args;
> > +
> > +  /* CALL_STMT should be the call_stmt of the generic function.  */
> > +  gcc_checking_assert (edg->specialized_call_base_edge ()->call_stmt
> > +                     == call_stmt);
> > +
> > +  gcall *spec_call_stmt = NULL;
> > +  tree cond_tree = NULL_TREE;
> > +  gcond *cond_stmt = NULL;
> > +  basic_block cond_bb, dcall_bb, icall_bb, join_bb = NULL;
> > +  edge e_cd, e_ci, e_di, e_dj = NULL, e_ij;
> > +  gimple_stmt_iterator gsi;
> > +  int lp_nr, dflags;
> > +  edge e_eh, e;
> > +  edge_iterator ei;
> > +
> > +  cond_bb = gimple_bb (call_stmt);
> > +  gsi = gsi_for_stmt (call_stmt);
> > +
> > +  /* To call the specialized function we need to build a guard conditional
> > +     with the specialized arguments and constants.  */
> > +  unsigned nargs = gimple_call_num_args (call_stmt);
> > +  unsigned cur_spec = 0;
> > +  bool dump_first = true;
> > +
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, "Creating specialization guard for edge %s -> %s:\n",
> > +                        edg->caller->dump_name (), edg->callee->dump_name ());
> > +      fprintf (dump_file, "if (");
> > +    }
> > +
> > +  for (unsigned arg_idx = 0; arg_idx < nargs; arg_idx++)
> > +    {
> > +      tree cur_arg = gimple_call_arg (call_stmt, arg_idx);
> > +      bool cur_arg_specialized_p = cur_spec < spec_args->length ()
> > +       && arg_idx == (*spec_args)[cur_spec].arg_idx;
> > +
> > +      if (cur_arg_specialized_p)
> > +       {
> > +         gcc_checking_assert (!cond_stmt);
> > +
> > +         cgraph_specialization_info spec_info = (*spec_args)[cur_spec];
> > +         cur_spec++;
> > +
> > +         tree spec_v;
> > +         if (spec_info.is_unsigned)
> > +           spec_v = build_int_cstu (integer_type_node, spec_info.cst.uval);
> > +         else
> > +           spec_v = build_int_cst (integer_type_node, spec_info.cst.sval);
> > +
> > +         tree cmp_const = fold_convert (TREE_TYPE (cur_arg), spec_v);
> > +
> > +         tree cur_arg_eq_spec = build2 (EQ_EXPR, boolean_type_node,
> > +                                             cur_arg, cmp_const);
> > +
> > +         if (dump_file)
> > +           {
> > +             if (!dump_first)
> > +               fprintf (dump_file, " && ");
> > +             print_generic_expr (dump_file, cur_arg_eq_spec);
> > +             dump_first = false;
> > +           }
> > +
> > +         tree tmp1 = make_temp_ssa_name (boolean_type_node, NULL, "SPEC");
> > +         gassign* load_stmt1 = gimple_build_assign (tmp1, cur_arg_eq_spec);
> > +         gsi_insert_before (&gsi, load_stmt1, GSI_SAME_STMT);
> > +
> > +         if (!cond_tree)
> > +           cond_tree = tmp1;
> > +         else
> > +           {
> > +             tree cur_and_prev_true = fold_build2 (BIT_AND_EXPR,
> > +                                        boolean_type_node,
> > +                                        cond_tree,
> > +                                        tmp1);
> > +
> > +             tree tmp2 = make_temp_ssa_name (boolean_type_node, NULL, "SPEC");
> > +             gassign* load_stmt2
> > +               = gimple_build_assign (tmp2, cur_and_prev_true);
> > +             gsi_insert_before (&gsi, load_stmt2, GSI_SAME_STMT);
> > +             cond_tree = tmp2;
> > +           }
> > +       }
> > +    }
> > +
> > +  cond_stmt = gimple_build_cond (EQ_EXPR, cond_tree, boolean_true_node,
> > +                                NULL_TREE, NULL_TREE);
> > +
> > +  gsi_insert_before (&gsi, cond_stmt, GSI_SAME_STMT);
> > +
> > +  if (gimple_vdef (call_stmt)
> > +      && TREE_CODE (gimple_vdef (call_stmt)) == SSA_NAME)
> > +    {
> > +      unlink_stmt_vdef (call_stmt);
> > +      release_ssa_name (gimple_vdef (call_stmt));
> > +    }
> > +  gimple_set_vdef (call_stmt, NULL_TREE);
> > +  gimple_set_vuse (call_stmt, NULL_TREE);
> > +  update_stmt (call_stmt);
> > +  spec_call_stmt = as_a <gcall *> (gimple_copy (call_stmt));
> > +  gimple_call_set_fndecl (spec_call_stmt, callee->decl);
> > +  dflags = flags_from_decl_or_type (callee->decl);
> > +
> > +  if ((dflags & ECF_NORETURN) != 0
> > +      && should_remove_lhs_p (gimple_call_lhs (spec_call_stmt)))
> > +    gimple_call_set_lhs (spec_call_stmt, NULL_TREE);
> > +  gsi_insert_before (&gsi, spec_call_stmt, GSI_SAME_STMT);
> > +
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, ")\n  ");
> > +      print_gimple_stmt (dump_file, spec_call_stmt, 0);
> > +    }
> > +
> > +  e_cd = split_block (cond_bb, cond_stmt);
> > +  dcall_bb = e_cd->dest;
> > +  dcall_bb->count = cond_bb->count.apply_probability (prob);
> > +
> > +  e_di = split_block (dcall_bb, spec_call_stmt);
> > +  icall_bb = e_di->dest;
> > +  icall_bb->count = cond_bb->count - dcall_bb->count;
> > +
> > +  if (!stmt_ends_bb_p (call_stmt))
> > +    e_ij = split_block (icall_bb, call_stmt);
> > +  else
> > +    {
> > +      e_ij = find_fallthru_edge (icall_bb->succs);
> > +      if (e_ij != NULL)
> > +       {
> > +         e_ij->probability = profile_probability::always ();
> > +         e_ij = single_pred_edge (split_edge (e_ij));
> > +       }
> > +    }
> > +  if (e_ij != NULL)
> > +    {
> > +      join_bb = e_ij->dest;
> > +      join_bb->count = cond_bb->count;
> > +    }
> > +
> > +  e_cd->flags = (e_cd->flags & ~EDGE_FALLTHRU) | EDGE_TRUE_VALUE;
> > +  e_cd->probability = prob;
> > +
> > +  e_ci = make_edge (cond_bb, icall_bb, EDGE_FALSE_VALUE);
> > +  e_ci->probability = prob.invert ();
> > +
> > +  remove_edge (e_di);
> > +
> > +  if (e_ij != NULL)
> > +    {
> > +      if ((dflags & ECF_NORETURN) == 0)
> > +       {
> > +         e_dj = make_edge (dcall_bb, join_bb, EDGE_FALLTHRU);
> > +         e_dj->probability = profile_probability::always ();
> > +       }
> > +      e_ij->probability = profile_probability::always ();
> > +    }
> > +
> > +  if (gimple_call_lhs (call_stmt)
> > +      && TREE_CODE (gimple_call_lhs (call_stmt)) == SSA_NAME
> > +      && (dflags & ECF_NORETURN) == 0)
> > +    {
> > +      tree result = gimple_call_lhs (call_stmt);
> > +      gphi *phi = create_phi_node (result, join_bb);
> > +      gimple_call_set_lhs (call_stmt,
> > +                          duplicate_ssa_name (result, call_stmt));
> > +      add_phi_arg (phi, gimple_call_lhs (call_stmt), e_ij, UNKNOWN_LOCATION);
> > +      gimple_call_set_lhs (spec_call_stmt,
> > +                          duplicate_ssa_name (result, spec_call_stmt));
> > +      add_phi_arg (phi, gimple_call_lhs (spec_call_stmt), e_dj,
> > +                  UNKNOWN_LOCATION);
> > +    }
> > +
> > +  lp_nr = lookup_stmt_eh_lp (call_stmt);
> > +  if (lp_nr > 0 && stmt_could_throw_p (cfun, spec_call_stmt))
> > +    {
> > +      add_stmt_to_eh_lp (spec_call_stmt, lp_nr);
> > +    }
> > +
> > +  FOR_EACH_EDGE (e_eh, ei, icall_bb->succs)
> > +    if (e_eh->flags & (EDGE_EH | EDGE_ABNORMAL))
> > +      {
> > +       e = make_edge (dcall_bb, e_eh->dest, e_eh->flags);
> > +       e->probability = e_eh->probability;
> > +       for (gphi_iterator psi = gsi_start_phis (e_eh->dest);
> > +            !gsi_end_p (psi); gsi_next (&psi))
> > +         {
> > +           gphi *phi = psi.phi ();
> > +           SET_USE (PHI_ARG_DEF_PTR_FROM_EDGE (phi, e),
> > +                    PHI_ARG_DEF_FROM_EDGE (phi, e_eh));
> > +         }
> > +       }
> > +  if (!stmt_could_throw_p (cfun, spec_call_stmt))
> > +    gimple_purge_dead_eh_edges (dcall_bb);
> > +  return spec_call_stmt;
> > +}
> > +
> >  /* Dump info about indirect call profile.  */
> >
> >  static void
> > diff --git a/gcc/value-prof.h b/gcc/value-prof.h
> > index d852c41f33f..7d8be5920b9 100644
> > --- a/gcc/value-prof.h
> > +++ b/gcc/value-prof.h
> > @@ -89,6 +89,7 @@ void verify_histograms (void);
> >  void free_histograms (function *);
> >  void stringop_block_profile (gimple *, unsigned int *, HOST_WIDE_INT *);
> >  gcall *gimple_ic (gcall *, struct cgraph_node *, profile_probability);
> > +gcall *gimple_sc (struct cgraph_edge *, profile_probability);
> >  bool get_nth_most_common_value (gimple *stmt, const char *counter_type,
> >                                 histogram_value hist, gcov_type *value,
> >                                 gcov_type *count, gcov_type *all,
> > --
> > 2.38.1
> >


Thread overview: 5+ messages
2022-11-13 15:37 Christoph Muellner
2022-11-14  7:37 ` Richard Biener
2022-11-14 10:35   ` Manolis Tsamis
2022-11-14 10:48     ` Richard Biener
2022-12-16 16:19   ` Manolis Tsamis [this message]
