public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
       [not found] <20220802130109.AD89F385381B@sourceware.org>
@ 2022-08-05 16:35 ` Jeff Law
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Law @ 2022-08-05 16:35 UTC (permalink / raw)
  To: gcc-patches



On 8/2/2022 7:00 AM, Richard Biener via Gcc-patches wrote:
> I am trying to make sense of back_threader_profitability::profitable_path_p
> and the first thing I notice is that we do
>
>    /* Threading is profitable if the path duplicated is hot but also
>       in a case we separate cold path from hot path and permit optimization
>       of the hot path later.  Be on the agressive side here. In some testcases,
>       as in PR 78407 this leads to noticeable improvements.  */
>    if (m_speed_p
>        && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
>            || contains_hot_bb))
>      {
>        if (n_insns >= param_max_fsm_thread_path_insns)
>          {
>            if (dump_file && (dump_flags & TDF_DETAILS))
>              fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
>                       "the number of instructions on the path "
>                       "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
>            return false;
>          }
> ...
>      }
>    else if (!m_speed_p && n_insns > 1)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
>                   "duplication of %i insns is needed and optimizing for size.\n",
>                   n_insns);
>        return false;
>      }
> ...
>    return true;
>
> thus we apply the n_insns >= param_max_fsm_thread_path_insns only
> to "hot paths".  The comment above this isn't entirely clear whether
> this is by design ("Be on the aggressive side here ...") but I think
> this is a mistake.  In fact the "hot path" check seems entirely
> useless since if the path is not hot we simply continue threading it.
I think the intent here was to allow more insns to be copied if the path 
was hot than if it was not not hot.     But the logic seems a bit 
convoluted here.

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
  2022-08-02 16:31         ` Jan Hubicka
@ 2022-08-03  9:58           ` Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2022-08-03  9:58 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Aldy Hernandez, gcc-patches, MacLeod, Andrew, Jeff Law

On Tue, 2 Aug 2022, Jan Hubicka wrote:

> > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > 
> > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > > >
> > > > > Unfortunately, this was before my time, so I don't know.
> > > > >
> > > > > That being said, thanks for tackling these issues that my work
> > > > > triggered last release.  Much appreciated.
> > > >
> > > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > > >
> > > > -  else if (n_insns > 1)
> > > > +  else if (!m_speed_p && n_insns > 1)
> > > >
> > > > causing the breakage on the 12 branch.  That leads to a simpler
> > > > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> > > 
> > > Huh.  It's been a while, but that looks like a typo.  That patch was
> > > supposed to be non-behavior changing.
> > 
> > Exactly my thinking so reverting it shouldn't be a reason for
> > detailed questions.  Now, the contains_hot_bb computation is,
> > that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa
> > together with the comment and a testcase.
> > 
> > So - Honza, what was the reasoning to look at raw BB counts here
> > rather than for example the path entry edge count?
> I think the explanation is in the final comment:
>   /* Threading is profitable if the path duplicated is hot but also
>      in a case we separate cold path from hot path and permit ptimization
>      of the hot path later.  Be on the agressive side here. In some estcases,
>      as in PR 78407 this leads to noticeable improvements.  */
> 
> If you have non-threadable hot path threading out cold paths will make
> it easier to be optimized since you have fewer meets in the dataflow.

I see.  It does seem that it would be better handled by tracing the
hot path but of course threading the cold path might be cheaper.

That said, the cost modeling checks hotness of the threaded path
by looking at its exit edge (the one we can optimize statically)
but shouldn't hotness of the threaded path be determined by
looking at the path entry edge count instead?  With inconsistent
guessed profile (inconsistent now that we can statically compute
the outgoing branch on one of the paths) it's of course all GIGO, but ...

I'll leave this alone for now.  Still we now have the exceptions of
predicted never executed entry/exit to avoid diagnostic fallout.

Richard.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
  2022-08-02 11:59       ` Richard Biener
  2022-08-02 12:03         ` Aldy Hernandez
@ 2022-08-02 16:31         ` Jan Hubicka
  2022-08-03  9:58           ` Richard Biener
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Hubicka @ 2022-08-02 16:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: Aldy Hernandez, gcc-patches, MacLeod, Andrew, Jeff Law

> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> 
> > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > Unfortunately, this was before my time, so I don't know.
> > > >
> > > > That being said, thanks for tackling these issues that my work
> > > > triggered last release.  Much appreciated.
> > >
> > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > >
> > > -  else if (n_insns > 1)
> > > +  else if (!m_speed_p && n_insns > 1)
> > >
> > > causing the breakage on the 12 branch.  That leads to a simpler
> > > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> > 
> > Huh.  It's been a while, but that looks like a typo.  That patch was
> > supposed to be non-behavior changing.
> 
> Exactly my thinking so reverting it shouldn't be a reason for
> detailed questions.  Now, the contains_hot_bb computation is,
> that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa
> together with the comment and a testcase.
> 
> So - Honza, what was the reasoning to look at raw BB counts here
> rather than for example the path entry edge count?
I think the explanation is in the final comment:
  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit ptimization
     of the hot path later.  Be on the agressive side here. In some estcases,
     as in PR 78407 this leads to noticeable improvements.  */

If you have non-threadable hot path threading out cold paths will make
it easier to be optimized since you have fewer meets in the dataflow.

Honza
> 
> Richard.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
  2022-08-02 13:29           ` Richard Biener
@ 2022-08-02 14:25             ` Aldy Hernandez
  0 siblings, 0 replies; 16+ messages in thread
From: Aldy Hernandez @ 2022-08-02 14:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, MacLeod, Andrew, Jeff Law, Jan Hubicka

Feel free to blame me for everything except the profitability code and the
generic block copier. That stuff was all there before and I mostly avoided
it.

:-)

Thanks for the work in this space.
Aldy

On Tue, Aug 2, 2022, 15:29 Richard Biener <rguenther@suse.de> wrote:

> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
>
> > On Tue, Aug 2, 2022 at 1:59 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener <rguenther@suse.de>
> wrote:
> > > > >
> > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > > > >
> > > > > > Unfortunately, this was before my time, so I don't know.
> > > > > >
> > > > > > That being said, thanks for tackling these issues that my work
> > > > > > triggered last release.  Much appreciated.
> > > > >
> > > > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > > > >
> > > > > -  else if (n_insns > 1)
> > > > > +  else if (!m_speed_p && n_insns > 1)
> > > > >
> > > > > causing the breakage on the 12 branch.  That leads to a simpler
> > > > > fix I guess.  Will re-test and also backport to GCC 12 if
> successful.
> > > >
> > > > Huh.  It's been a while, but that looks like a typo.  That patch was
> > > > supposed to be non-behavior changing.
> > >
> > > Exactly my thinking so reverting it shouldn't be a reason for
> > > detailed questions.  Now, the contains_hot_bb computation is,
> >
> > Sorry for the pain.
>
> So - actually the change was probably done on purpose (even if
> reverting - which I've now already one - caused no testsuite regressions).
> That's because the whole function is invoked N + 1 times for a path
> of length N and we definitely want to avoid using the size optimization
> heuristics when the path is not complete yet.  I think the proper
> way is to do
>
> diff --git a/gcc/tree-ssa-threadbackward.cc
> b/gcc/tree-ssa-threadbackward.cc
> index ba114e98a41..6979398ef76 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -767,7 +767,11 @@ back_threader_profitability::profitable_path_p (const
> vec<basic_block> &m_path,
>       as in PR 78407 this leads to noticeable improvements.  */
>    if (m_speed_p
>        && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
> -         || contains_hot_bb))
> +         || contains_hot_bb
> +         /* Avoid using the size heuristics when not doing the final
> +            thread evaluation, we get called for each added BB
> +            to the path.  */
> +         || !taken_edge))
>      {
>        if (n_insns >= param_max_fsm_thread_path_insns)
>         {
>
> thus assume there'll be a hot BB in the future.
>
> That said, the very best fix would be to not call this function
> N + 1 times (I have a patch to call it only N times - yay), but
> instead factor out parts to be called per BB plus keeping enough
> state so we can incrementally collect info.
>
> There's more "odd" things in the backward threader, of course :/
>
> I'm looking for things applicable to the GCC 12 branch right now
> so will try the above.
>
> Richard.
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
  2022-08-02 12:03         ` Aldy Hernandez
@ 2022-08-02 13:29           ` Richard Biener
  2022-08-02 14:25             ` Aldy Hernandez
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2022-08-02 13:29 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches, MacLeod, Andrew, Jeff Law, Jan Hubicka

On Tue, 2 Aug 2022, Aldy Hernandez wrote:

> On Tue, Aug 2, 2022 at 1:59 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> >
> > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > > >
> > > > > Unfortunately, this was before my time, so I don't know.
> > > > >
> > > > > That being said, thanks for tackling these issues that my work
> > > > > triggered last release.  Much appreciated.
> > > >
> > > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > > >
> > > > -  else if (n_insns > 1)
> > > > +  else if (!m_speed_p && n_insns > 1)
> > > >
> > > > causing the breakage on the 12 branch.  That leads to a simpler
> > > > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> > >
> > > Huh.  It's been a while, but that looks like a typo.  That patch was
> > > supposed to be non-behavior changing.
> >
> > Exactly my thinking so reverting it shouldn't be a reason for
> > detailed questions.  Now, the contains_hot_bb computation is,
> 
> Sorry for the pain.

So - actually the change was probably done on purpose (even if
reverting - which I've now already one - caused no testsuite regressions).
That's because the whole function is invoked N + 1 times for a path
of length N and we definitely want to avoid using the size optimization
heuristics when the path is not complete yet.  I think the proper
way is to do

diff --git a/gcc/tree-ssa-threadbackward.cc 
b/gcc/tree-ssa-threadbackward.cc
index ba114e98a41..6979398ef76 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -767,7 +767,11 @@ back_threader_profitability::profitable_path_p (const 
vec<basic_block> &m_path,
      as in PR 78407 this leads to noticeable improvements.  */
   if (m_speed_p
       && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
-         || contains_hot_bb))
+         || contains_hot_bb
+         /* Avoid using the size heuristics when not doing the final
+            thread evaluation, we get called for each added BB
+            to the path.  */
+         || !taken_edge))
     {
       if (n_insns >= param_max_fsm_thread_path_insns)
        {

thus assume there'll be a hot BB in the future.

That said, the very best fix would be to not call this function
N + 1 times (I have a patch to call it only N times - yay), but
instead factor out parts to be called per BB plus keeping enough
state so we can incrementally collect info.

There's more "odd" things in the backward threader, of course :/

I'm looking for things applicable to the GCC 12 branch right now
so will try the above.

Richard.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
@ 2022-08-02 13:00 Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2022-08-02 13:00 UTC (permalink / raw)
  To: gcc-patches

I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
          || contains_hot_bb))
    {
      if (n_insns >= param_max_fsm_thread_path_insns)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                     "the number of instructions on the path "
                     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
          return false;
        }
...
    }
  else if (!m_speed_p && n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      return false;
    }
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.

This was caused by r12-324-g69e5544210e3c0 and the following simply
reverts the offending change.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	* tree-ssa-threadbackwards.cc
	(back_threader_profitability::profitable_path_p): Apply
	size constraints to all paths again.
---
 gcc/tree-ssa-threadbackward.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..ba114e98a41 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -794,7 +794,7 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	  return false;
 	}
     }
-  else if (!m_speed_p && n_insns > 1)
+  else if (n_insns > 1)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-- 
2.35.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
@ 2022-08-02 13:00 Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2022-08-02 13:00 UTC (permalink / raw)
  To: gcc-patches

I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
          || contains_hot_bb))
    {
      if (n_insns >= param_max_fsm_thread_path_insns)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                     "the number of instructions on the path "
                     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
          return false;
        }
...
    }
  else if (!m_speed_p && n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      return false;
    }
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.

This was caused by r12-324-g69e5544210e3c0 and the following simply
reverts the offending change.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	* tree-ssa-threadbackwards.cc
	(back_threader_profitability::profitable_path_p): Apply
	size constraints to all paths again.
---
 gcc/tree-ssa-threadbackward.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..ba114e98a41 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -794,7 +794,7 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	  return false;
 	}
     }
-  else if (!m_speed_p && n_insns > 1)
+  else if (n_insns > 1)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-- 
2.35.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
@ 2022-08-02 13:00 Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2022-08-02 13:00 UTC (permalink / raw)
  To: gcc-patches

I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
          || contains_hot_bb))
    {
      if (n_insns >= param_max_fsm_thread_path_insns)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                     "the number of instructions on the path "
                     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
          return false;
        }
...
    }
  else if (!m_speed_p && n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      return false;
    }
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.

This was caused by r12-324-g69e5544210e3c0 and the following simply
reverts the offending change.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	* tree-ssa-threadbackwards.cc
	(back_threader_profitability::profitable_path_p): Apply
	size constraints to all paths again.
---
 gcc/tree-ssa-threadbackward.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..ba114e98a41 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -794,7 +794,7 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	  return false;
 	}
     }
-  else if (!m_speed_p && n_insns > 1)
+  else if (n_insns > 1)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-- 
2.35.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
  2022-08-02 11:59       ` Richard Biener
@ 2022-08-02 12:03         ` Aldy Hernandez
  2022-08-02 13:29           ` Richard Biener
  2022-08-02 16:31         ` Jan Hubicka
  1 sibling, 1 reply; 16+ messages in thread
From: Aldy Hernandez @ 2022-08-02 12:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, MacLeod, Andrew, Jeff Law, Jan Hubicka

On Tue, Aug 2, 2022 at 1:59 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
>
> > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > Unfortunately, this was before my time, so I don't know.
> > > >
> > > > That being said, thanks for tackling these issues that my work
> > > > triggered last release.  Much appreciated.
> > >
> > > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> > >
> > > -  else if (n_insns > 1)
> > > +  else if (!m_speed_p && n_insns > 1)
> > >
> > > causing the breakage on the 12 branch.  That leads to a simpler
> > > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> >
> > Huh.  It's been a while, but that looks like a typo.  That patch was
> > supposed to be non-behavior changing.
>
> Exactly my thinking so reverting it shouldn't be a reason for
> detailed questions.  Now, the contains_hot_bb computation is,

Sorry for the pain.

Thanks for taking care of this.
Aldy

> that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa
> together with the comment and a testcase.
>
> So - Honza, what was the reasoning to look at raw BB counts here
> rather than for example the path entry edge count?
>
> Richard.
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
  2022-08-02 11:54     ` Aldy Hernandez
@ 2022-08-02 11:59       ` Richard Biener
  2022-08-02 12:03         ` Aldy Hernandez
  2022-08-02 16:31         ` Jan Hubicka
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Biener @ 2022-08-02 11:59 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches, MacLeod, Andrew, Jeff Law, Jan Hubicka

On Tue, 2 Aug 2022, Aldy Hernandez wrote:

> On Tue, Aug 2, 2022 at 1:45 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> >
> > > Unfortunately, this was before my time, so I don't know.
> > >
> > > That being said, thanks for tackling these issues that my work
> > > triggered last release.  Much appreciated.
> >
> > Ah.  But it was your r12-324-g69e5544210e3c0 that did
> >
> > -  else if (n_insns > 1)
> > +  else if (!m_speed_p && n_insns > 1)
> >
> > causing the breakage on the 12 branch.  That leads to a simpler
> > fix I guess.  Will re-test and also backport to GCC 12 if successful.
> 
> Huh.  It's been a while, but that looks like a typo.  That patch was
> supposed to be non-behavior changing.

Exactly my thinking so reverting it shouldn't be a reason for
detailed questions.  Now, the contains_hot_bb computation is,
that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa
together with the comment and a testcase.

So - Honza, what was the reasoning to look at raw BB counts here
rather than for example the path entry edge count?

Richard.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
  2022-08-02 11:45   ` Richard Biener
@ 2022-08-02 11:54     ` Aldy Hernandez
  2022-08-02 11:59       ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Aldy Hernandez @ 2022-08-02 11:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, MacLeod, Andrew, Jeff Law

On Tue, Aug 2, 2022 at 1:45 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
>
> > Unfortunately, this was before my time, so I don't know.
> >
> > That being said, thanks for tackling these issues that my work
> > triggered last release.  Much appreciated.
>
> Ah.  But it was your r12-324-g69e5544210e3c0 that did
>
> -  else if (n_insns > 1)
> +  else if (!m_speed_p && n_insns > 1)
>
> causing the breakage on the 12 branch.  That leads to a simpler
> fix I guess.  Will re-test and also backport to GCC 12 if successful.

Huh.  It's been a while, but that looks like a typo.  That patch was
supposed to be non-behavior changing.

Aldy


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
  2022-08-02 10:21 ` Aldy Hernandez
@ 2022-08-02 11:45   ` Richard Biener
  2022-08-02 11:54     ` Aldy Hernandez
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2022-08-02 11:45 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches, MacLeod, Andrew, Jeff Law

On Tue, 2 Aug 2022, Aldy Hernandez wrote:

> Unfortunately, this was before my time, so I don't know.
> 
> That being said, thanks for tackling these issues that my work
> triggered last release.  Much appreciated.

Ah.  But it was your r12-324-g69e5544210e3c0 that did

-  else if (n_insns > 1)
+  else if (!m_speed_p && n_insns > 1)

causing the breakage on the 12 branch.  That leads to a simpler
fix I guess.  Will re-test and also backport to GCC 12 if successful.

Richard.

> Aldy
> 
> On Tue, Aug 2, 2022 at 10:41 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > I am trying to make sense of back_threader_profitability::profitable_path_p
> > and the first thing I notice is that we do
> >
> >   /* Threading is profitable if the path duplicated is hot but also
> >      in a case we separate cold path from hot path and permit optimization
> >      of the hot path later.  Be on the agressive side here. In some testcases,
> >      as in PR 78407 this leads to noticeable improvements.  */
> >   if (m_speed_p
> >       && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
> >           || contains_hot_bb))
> >     {
> >       if (n_insns >= param_max_fsm_thread_path_insns)
> >         {
> >           if (dump_file && (dump_flags & TDF_DETAILS))
> >             fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> >                      "the number of instructions on the path "
> >                      "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> >           return false;
> >         }
> > ...
> >     }
> >   else if (!m_speed_p && n_insns > 1)
> >     {
> >       if (dump_file && (dump_flags & TDF_DETAILS))
> >         fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> >                  "duplication of %i insns is needed and optimizing for size.\n",
> >                  n_insns);
> >       return false;
> >     }
> > ...
> >   return true;
> >
> > thus we apply the n_insns >= param_max_fsm_thread_path_insns only
> > to "hot paths".  The comment above this isn't entirely clear whether
> > this is by design ("Be on the aggressive side here ...") but I think
> > this is a mistake.  In fact the "hot path" check seems entirely
> > useless since if the path is not hot we simply continue threading it.
> >
> > I have my reservations about how we compute hot (contains_hot_bb
> > in particular), but the following first refactors the above to apply
> > the size constraints always and then _not_ threading if the path
> > is not considered hot (but allow threading if n_insns <= 1 as with
> > the !m_speed_p case).
> >
> > As for contains_hot_bb - it might be that this consciously captures
> > the case where we separate a cold from a hot path even though the
> > threaded path itself is cold.  Consider
> >
> >    A
> >   / \ (unlikely)
> >  B   C
> >   \ /
> >    D
> >   / \
> >  ..  abort()
> >
> > when we want to thread A -> B -> D -> abort () and A (or D)
> > has a hot BB count then we have contains_hot_bb even though
> > the counts on the path itself are small.  In fact when we
> > thread the only relevant count for the resulting threaded
> > path is the count of A with the A->C probability applied
> > (that should also the count to subtract from the blocks
> > we copied - sth missing for the backwards threader as well).
> >
> > So I'm wondering how the logic computing contains_hot_bb
> > relates to the above comment before the costing block.
> > Anyone remembers?
> >
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >
> >         * tree-ssa-threadbackwards.cc
> >         (back_threader_profitability::profitable_path_p): Apply
> >         size constraints to all paths.  Do not thread cold paths.
> > ---
> >  gcc/tree-ssa-threadbackward.cc | 53 +++++++++++++++++++++-------------
> >  1 file changed, 33 insertions(+), 20 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > index 0519f2a8c4b..a887568032b 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
> >         *creates_irreducible_loop = true;
> >      }
> >
> > -  /* Threading is profitable if the path duplicated is hot but also
> > +  const int max_cold_insns = 1;
> > +  if (!m_speed_p && n_insns > max_cold_insns)
> > +    {
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > +                "duplication of %i insns is needed and optimizing for size.\n",
> > +                n_insns);
> > +      return false;
> > +    }
> > +  else if (n_insns >= param_max_fsm_thread_path_insns)
> > +    {
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > +                "the number of instructions on the path "
> > +                "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> > +      return false;
> > +    }
> > +
> > +  /* Threading is profitable if the path duplicated is small or hot but also
> >       in a case we separate cold path from hot path and permit optimization
> >       of the hot path later.  Be on the agressive side here. In some testcases,
> >       as in PR 78407 this leads to noticeable improvements.  */
> > -  if (m_speed_p
> > -      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
> > -         || contains_hot_bb))
> > +  if (!(n_insns <= max_cold_insns
> > +       || contains_hot_bb
> > +       || (taken_edge && optimize_edge_for_speed_p (taken_edge))))
> > +    {
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > +                "path is not profitable to thread.\n");
> > +      return false;
> > +    }
> > +
> > +  /* If the path is not small to duplicate and either the entry or
> > +     the final destination is probably never executed avoid separating
> > +     the cold path since that can lead to spurious diagnostics.  */
> > +  if (n_insns > max_cold_insns)
> >      {
> > -      if (n_insns >= param_max_fsm_thread_path_insns)
> > -       {
> > -         if (dump_file && (dump_flags & TDF_DETAILS))
> > -           fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > -                    "the number of instructions on the path "
> > -                    "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> > -         return false;
> > -       }
> >        if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge))
> >         {
> >           if (dump_file && (dump_flags & TDF_DETAILS))
> > @@ -794,14 +815,6 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
> >           return false;
> >         }
> >      }
> > -  else if (!m_speed_p && n_insns > 1)
> > -    {
> > -      if (dump_file && (dump_flags & TDF_DETAILS))
> > -       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > -                "duplication of %i insns is needed and optimizing for size.\n",
> > -                n_insns);
> > -      return false;
> > -    }
> >
> >    /* We avoid creating irreducible inner loops unless we thread through
> >       a multiway branch, in which case we have deemed it worth losing
> > --
> > 2.35.3
> >
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
       [not found] <04261.122080204410800126@us-mta-529.us.mimecast.lan>
@ 2022-08-02 10:21 ` Aldy Hernandez
  2022-08-02 11:45   ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Aldy Hernandez @ 2022-08-02 10:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, MacLeod, Andrew, Jeff Law

Unfortunately, this was before my time, so I don't know.

That being said, thanks for tackling these issues that my work
triggered last release.  Much appreciated.

Aldy

On Tue, Aug 2, 2022 at 10:41 AM Richard Biener <rguenther@suse.de> wrote:
>
> I am trying to make sense of back_threader_profitability::profitable_path_p
> and the first thing I notice is that we do
>
>   /* Threading is profitable if the path duplicated is hot but also
>      in a case we separate cold path from hot path and permit optimization
>      of the hot path later.  Be on the agressive side here. In some testcases,
>      as in PR 78407 this leads to noticeable improvements.  */
>   if (m_speed_p
>       && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
>           || contains_hot_bb))
>     {
>       if (n_insns >= param_max_fsm_thread_path_insns)
>         {
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
>                      "the number of instructions on the path "
>                      "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
>           return false;
>         }
> ...
>     }
>   else if (!m_speed_p && n_insns > 1)
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
>                  "duplication of %i insns is needed and optimizing for size.\n",
>                  n_insns);
>       return false;
>     }
> ...
>   return true;
>
> thus we apply the n_insns >= param_max_fsm_thread_path_insns only
> to "hot paths".  The comment above this isn't entirely clear whether
> this is by design ("Be on the aggressive side here ...") but I think
> this is a mistake.  In fact the "hot path" check seems entirely
> useless since if the path is not hot we simply continue threading it.
>
> I have my reservations about how we compute hot (contains_hot_bb
> in particular), but the following first refactors the above to apply
> the size constraints always and then _not_ threading if the path
> is not considered hot (but allow threading if n_insns <= 1 as with
> the !m_speed_p case).
>
> As for contains_hot_bb - it might be that this consciously captures
> the case where we separate a cold from a hot path even though the
> threaded path itself is cold.  Consider
>
>    A
>   / \ (unlikely)
>  B   C
>   \ /
>    D
>   / \
>  ..  abort()
>
> when we want to thread A -> B -> D -> abort () and A (or D)
> has a hot BB count then we have contains_hot_bb even though
> the counts on the path itself are small.  In fact when we
> thread the only relevant count for the resulting threaded
> path is the count of A with the A->C probability applied
> (that should also the count to subtract from the blocks
> we copied - sth missing for the backwards threader as well).
>
> So I'm wondering how the logic computing contains_hot_bb
> relates to the above comment before the costing block.
> Anyone remembers?
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
>         * tree-ssa-threadbackwards.cc
>         (back_threader_profitability::profitable_path_p): Apply
>         size constraints to all paths.  Do not thread cold paths.
> ---
>  gcc/tree-ssa-threadbackward.cc | 53 +++++++++++++++++++++-------------
>  1 file changed, 33 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index 0519f2a8c4b..a887568032b 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
>         *creates_irreducible_loop = true;
>      }
>
> -  /* Threading is profitable if the path duplicated is hot but also
> +  const int max_cold_insns = 1;
> +  if (!m_speed_p && n_insns > max_cold_insns)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +                "duplication of %i insns is needed and optimizing for size.\n",
> +                n_insns);
> +      return false;
> +    }
> +  else if (n_insns >= param_max_fsm_thread_path_insns)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +                "the number of instructions on the path "
> +                "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> +      return false;
> +    }
> +
> +  /* Threading is profitable if the path duplicated is small or hot but also
>       in a case we separate cold path from hot path and permit optimization
>       of the hot path later.  Be on the agressive side here. In some testcases,
>       as in PR 78407 this leads to noticeable improvements.  */
> -  if (m_speed_p
> -      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
> -         || contains_hot_bb))
> +  if (!(n_insns <= max_cold_insns
> +       || contains_hot_bb
> +       || (taken_edge && optimize_edge_for_speed_p (taken_edge))))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +                "path is not profitable to thread.\n");
> +      return false;
> +    }
> +
> +  /* If the path is not small to duplicate and either the entry or
> +     the final destination is probably never executed avoid separating
> +     the cold path since that can lead to spurious diagnostics.  */
> +  if (n_insns > max_cold_insns)
>      {
> -      if (n_insns >= param_max_fsm_thread_path_insns)
> -       {
> -         if (dump_file && (dump_flags & TDF_DETAILS))
> -           fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> -                    "the number of instructions on the path "
> -                    "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> -         return false;
> -       }
>        if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge))
>         {
>           if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -794,14 +815,6 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
>           return false;
>         }
>      }
> -  else if (!m_speed_p && n_insns > 1)
> -    {
> -      if (dump_file && (dump_flags & TDF_DETAILS))
> -       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> -                "duplication of %i insns is needed and optimizing for size.\n",
> -                n_insns);
> -      return false;
> -    }
>
>    /* We avoid creating irreducible inner loops unless we thread through
>       a multiway branch, in which case we have deemed it worth losing
> --
> 2.35.3
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
@ 2022-08-02  8:41 Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2022-08-02  8:41 UTC (permalink / raw)
  To: gcc-patches

I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
          || contains_hot_bb))
    {
      if (n_insns >= param_max_fsm_thread_path_insns)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                     "the number of instructions on the path "
                     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
          return false;
        }
...
    }
  else if (!m_speed_p && n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      return false;
    }
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.

I have my reservations about how we compute hot (contains_hot_bb
in particular), but the following first refactors the above to apply
the size constraints always and then _not_ threading if the path
is not considered hot (but allow threading if n_insns <= 1 as with
the !m_speed_p case).

As for contains_hot_bb - it might be that this consciously captures
the case where we separate a cold from a hot path even though the
threaded path itself is cold.  Consider

   A
  / \ (unlikely)
 B   C
  \ /
   D
  / \
 ..  abort()

when we want to thread A -> B -> D -> abort () and A (or D)
has a hot BB count then we have contains_hot_bb even though
the counts on the path itself are small.  In fact when we
thread the only relevant count for the resulting threaded
path is the count of A with the A->C probability applied
(that should also the count to subtract from the blocks
we copied - sth missing for the backwards threader as well).

So I'm wondering how the logic computing contains_hot_bb
relates to the above comment before the costing block.
Anyone remembers?

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

	* tree-ssa-threadbackwards.cc
	(back_threader_profitability::profitable_path_p): Apply
	size constraints to all paths.  Do not thread cold paths.
---
 gcc/tree-ssa-threadbackward.cc | 53 +++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..a887568032b 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	*creates_irreducible_loop = true;
     }
 
-  /* Threading is profitable if the path duplicated is hot but also
+  const int max_cold_insns = 1;
+  if (!m_speed_p && n_insns > max_cold_insns)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "duplication of %i insns is needed and optimizing for size.\n",
+		 n_insns);
+      return false;
+    }
+  else if (n_insns >= param_max_fsm_thread_path_insns)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "the number of instructions on the path "
+		 "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
+      return false;
+    }
+
+  /* Threading is profitable if the path duplicated is small or hot but also
      in a case we separate cold path from hot path and permit optimization
      of the hot path later.  Be on the agressive side here. In some testcases,
      as in PR 78407 this leads to noticeable improvements.  */
-  if (m_speed_p
-      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
-	  || contains_hot_bb))
+  if (!(n_insns <= max_cold_insns
+	|| contains_hot_bb
+	|| (taken_edge && optimize_edge_for_speed_p (taken_edge))))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "path is not profitable to thread.\n");
+      return false;
+    }
+
+  /* If the path is not small to duplicate and either the entry or
+     the final destination is probably never executed avoid separating
+     the cold path since that can lead to spurious diagnostics.  */
+  if (n_insns > max_cold_insns)
     {
-      if (n_insns >= param_max_fsm_thread_path_insns)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		     "the number of instructions on the path "
-		     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
-	  return false;
-	}
       if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -794,14 +815,6 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	  return false;
 	}
     }
-  else if (!m_speed_p && n_insns > 1)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		 "duplication of %i insns is needed and optimizing for size.\n",
-		 n_insns);
-      return false;
-    }
 
   /* We avoid creating irreducible inner loops unless we thread through
      a multiway branch, in which case we have deemed it worth losing
-- 
2.35.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
@ 2022-08-02  8:41 Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2022-08-02  8:41 UTC (permalink / raw)
  To: gcc-patches

I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
          || contains_hot_bb))
    {
      if (n_insns >= param_max_fsm_thread_path_insns)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                     "the number of instructions on the path "
                     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
          return false;
        }
...
    }
  else if (!m_speed_p && n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      return false;
    }
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.

I have my reservations about how we compute hot (contains_hot_bb
in particular), but the following first refactors the above to apply
the size constraints always and then _not_ threading if the path
is not considered hot (but allow threading if n_insns <= 1 as with
the !m_speed_p case).

As for contains_hot_bb - it might be that this consciously captures
the case where we separate a cold from a hot path even though the
threaded path itself is cold.  Consider

   A
  / \ (unlikely)
 B   C
  \ /
   D
  / \
 ..  abort()

when we want to thread A -> B -> D -> abort () and A (or D)
has a hot BB count then we have contains_hot_bb even though
the counts on the path itself are small.  In fact when we
thread the only relevant count for the resulting threaded
path is the count of A with the A->C probability applied
(that should also the count to subtract from the blocks
we copied - sth missing for the backwards threader as well).

So I'm wondering how the logic computing contains_hot_bb
relates to the above comment before the costing block.
Anyone remembers?

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

	* tree-ssa-threadbackwards.cc
	(back_threader_profitability::profitable_path_p): Apply
	size constraints to all paths.  Do not thread cold paths.
---
 gcc/tree-ssa-threadbackward.cc | 53 +++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..a887568032b 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	*creates_irreducible_loop = true;
     }
 
-  /* Threading is profitable if the path duplicated is hot but also
+  const int max_cold_insns = 1;
+  if (!m_speed_p && n_insns > max_cold_insns)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "duplication of %i insns is needed and optimizing for size.\n",
+		 n_insns);
+      return false;
+    }
+  else if (n_insns >= param_max_fsm_thread_path_insns)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "the number of instructions on the path "
+		 "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
+      return false;
+    }
+
+  /* Threading is profitable if the path duplicated is small or hot but also
      in a case we separate cold path from hot path and permit optimization
      of the hot path later.  Be on the agressive side here. In some testcases,
      as in PR 78407 this leads to noticeable improvements.  */
-  if (m_speed_p
-      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
-	  || contains_hot_bb))
+  if (!(n_insns <= max_cold_insns
+	|| contains_hot_bb
+	|| (taken_edge && optimize_edge_for_speed_p (taken_edge))))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "path is not profitable to thread.\n");
+      return false;
+    }
+
+  /* If the path is not small to duplicate and either the entry or
+     the final destination is probably never executed avoid separating
+     the cold path since that can lead to spurious diagnostics.  */
+  if (n_insns > max_cold_insns)
     {
-      if (n_insns >= param_max_fsm_thread_path_insns)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		     "the number of instructions on the path "
-		     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
-	  return false;
-	}
       if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -794,14 +815,6 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	  return false;
 	}
     }
-  else if (!m_speed_p && n_insns > 1)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		 "duplication of %i insns is needed and optimizing for size.\n",
-		 n_insns);
-      return false;
-    }
 
   /* We avoid creating irreducible inner loops unless we thread through
      a multiway branch, in which case we have deemed it worth losing
-- 
2.35.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
@ 2022-08-02  8:41 Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2022-08-02  8:41 UTC (permalink / raw)
  To: gcc-patches

I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do

  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */
  if (m_speed_p
      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
          || contains_hot_bb))
    {
      if (n_insns >= param_max_fsm_thread_path_insns)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                     "the number of instructions on the path "
                     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
          return false;
        }
...
    }
  else if (!m_speed_p && n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      return false;
    }
...
  return true;

thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths".  The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake.  In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.

I have my reservations about how we compute hot (contains_hot_bb
in particular), but the following first refactors the above to apply
the size constraints always and then _not_ threading if the path
is not considered hot (but allow threading if n_insns <= 1 as with
the !m_speed_p case).

As for contains_hot_bb - it might be that this consciously captures
the case where we separate a cold from a hot path even though the
threaded path itself is cold.  Consider

   A
  / \ (unlikely)
 B   C
  \ /
   D
  / \
 ..  abort()

when we want to thread A -> B -> D -> abort () and A (or D)
has a hot BB count then we have contains_hot_bb even though
the counts on the path itself are small.  In fact when we
thread the only relevant count for the resulting threaded
path is the count of A with the A->C probability applied
(that should also the count to subtract from the blocks
we copied - sth missing for the backwards threader as well).

So I'm wondering how the logic computing contains_hot_bb
relates to the above comment before the costing block.
Anyone remembers?

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

	* tree-ssa-threadbackwards.cc
	(back_threader_profitability::profitable_path_p): Apply
	size constraints to all paths.  Do not thread cold paths.
---
 gcc/tree-ssa-threadbackward.cc | 53 +++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0519f2a8c4b..a887568032b 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	*creates_irreducible_loop = true;
     }
 
-  /* Threading is profitable if the path duplicated is hot but also
+  const int max_cold_insns = 1;
+  if (!m_speed_p && n_insns > max_cold_insns)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "duplication of %i insns is needed and optimizing for size.\n",
+		 n_insns);
+      return false;
+    }
+  else if (n_insns >= param_max_fsm_thread_path_insns)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "the number of instructions on the path "
+		 "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
+      return false;
+    }
+
+  /* Threading is profitable if the path duplicated is small or hot but also
      in a case we separate cold path from hot path and permit optimization
      of the hot path later.  Be on the agressive side here. In some testcases,
      as in PR 78407 this leads to noticeable improvements.  */
-  if (m_speed_p
-      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
-	  || contains_hot_bb))
+  if (!(n_insns <= max_cold_insns
+	|| contains_hot_bb
+	|| (taken_edge && optimize_edge_for_speed_p (taken_edge))))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "path is not profitable to thread.\n");
+      return false;
+    }
+
+  /* If the path is not small to duplicate and either the entry or
+     the final destination is probably never executed avoid separating
+     the cold path since that can lead to spurious diagnostics.  */
+  if (n_insns > max_cold_insns)
     {
-      if (n_insns >= param_max_fsm_thread_path_insns)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		     "the number of instructions on the path "
-		     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
-	  return false;
-	}
       if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -794,14 +815,6 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	  return false;
 	}
     }
-  else if (!m_speed_p && n_insns > 1)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		 "duplication of %i insns is needed and optimizing for size.\n",
-		 n_insns);
-      return false;
-    }
 
   /* We avoid creating irreducible inner loops unless we thread through
      a multiway branch, in which case we have deemed it worth losing
-- 
2.35.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-08-05 16:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220802130109.AD89F385381B@sourceware.org>
2022-08-05 16:35 ` [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader Jeff Law
2022-08-02 13:00 Richard Biener
  -- strict thread matches above, loose matches on Subject: below --
2022-08-02 13:00 Richard Biener
2022-08-02 13:00 Richard Biener
     [not found] <04261.122080204410800126@us-mta-529.us.mimecast.lan>
2022-08-02 10:21 ` Aldy Hernandez
2022-08-02 11:45   ` Richard Biener
2022-08-02 11:54     ` Aldy Hernandez
2022-08-02 11:59       ` Richard Biener
2022-08-02 12:03         ` Aldy Hernandez
2022-08-02 13:29           ` Richard Biener
2022-08-02 14:25             ` Aldy Hernandez
2022-08-02 16:31         ` Jan Hubicka
2022-08-03  9:58           ` Richard Biener
2022-08-02  8:41 Richard Biener
2022-08-02  8:41 Richard Biener
2022-08-02  8:41 Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).