public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Enable jump threading on maths meeting hot paths
@ 2017-01-25 18:19 Jan Hubicka
  2017-01-26  9:15 ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Hubicka @ 2017-01-25 18:19 UTC (permalink / raw)
  To: gcc-patches, law

Hi,
this patch modifies profitable_jump_thread_path heuristics by enabling
code expansion when the threaded path contains at least one hot path.
The basic idea is that while we do not decrease instruction count on the
non-duplicated path, we reduce number of entry edges and by this path
separation we possibly enable later optimization.

We may try to get more careful about when optimization is enabled but it is
hard to do and I don't think the number of cold paths that meet hot paths
is large enough to make this matter too much.

Bootstrapped/regtested x86_64-linux, OK?
	* tree-ssa-threadbackward.c (profitable_jump_thread_path): Adjust
	hot/cold heuristics.

	* gcc.dg/tree-ssa/pr77445-2.c: Update testcase.
Index: tree-ssa-threadbackward.c
===================================================================
--- tree-ssa-threadbackward.c	(revision 244732)
+++ tree-ssa-threadbackward.c	(working copy)
@@ -159,6 +159,7 @@ profitable_jump_thread_path (vec<basic_b
   bool threaded_through_latch = false;
   bool multiway_branch_in_path = false;
   bool threaded_multiway_branch = false;
+  bool contains_hot_bb = false;
 
   /* Count the number of instructions on the path: as these instructions
      will have to be duplicated, we will not record the path if there
@@ -168,6 +169,9 @@ profitable_jump_thread_path (vec<basic_b
     {
       basic_block bb = (*path)[j];
 
+      if (!contains_hot_bb && speed_p)
+	contains_hot_bb |= optimize_bb_for_speed_p (bb);
+
       /* Remember, blocks in the path are stored in opposite order
 	 in the PATH array.  The last entry in the array represents
 	 the block with an outgoing edge that we will redirect to the
@@ -311,7 +315,11 @@ profitable_jump_thread_path (vec<basic_b
       return NULL;
     }
 
-  if (speed_p && optimize_edge_for_speed_p (taken_edge))
+  /* Threading is profitable if the path duplicated is hot but also
+     in a case we separate cold path from hot path and permit optimization
+     of the hot path later.  Be on the agressive side here. In some testcases,
+     as in PR 78407 this leads to noticeable improvements.  */
+  if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb))
     {
       if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
 	{
Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr77445-2.c	(revision 244746)
+++ testsuite/gcc.dg/tree-ssa/pr77445-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats -fdump-tree-thread2-details-blocks-stats -fdump-tree-thread3-details-blocks-stats -fdump-tree-thread4-details-blocks-stats" } */
 typedef enum STATES {
 	START=0,
 	INVALID,
@@ -121,3 +121,7 @@ enum STATES FMS( u8 **in , u32 *transiti
    increase much.  */
 /* { dg-final { scan-tree-dump "Jumps threaded: 1[1-9]" "thread1" } } */
 /* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread4" } } */
Index: testsuite/gcc.dg/tree-ssa/threadbackward-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/threadbackward-1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/threadbackward-1.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ethread" } */
+char *c;
+int t()
+{
+  for (int i=0;i<5000;i++)
+    c[i]=i;
+}
+/* { dg-final { scan-tree-dump-times "Registering FSM jump thread" 1 "ethread"} } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on maths meeting hot paths
  2017-01-25 18:19 Enable jump threading on maths meeting hot paths Jan Hubicka
@ 2017-01-26  9:15 ` Richard Biener
  2017-01-26 10:16   ` Jan Hubicka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2017-01-26  9:15 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches, Jeff Law

On Wed, Jan 25, 2017 at 7:11 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch modifies profitable_jump_thread_path heuristics by enabling
> code expansion when the threaded path contains at least one hot path.
> The basic idea is that while we do not decrease instruction count on the
> non-duplicated path, we reduce number of entry edges and by this path
> separation we possibly enable later optimization.
>
> We may try to get more careful about when optimization is enabled but it is
> hard to do and I don't think the number of cold paths that meet hot paths
> is large enough to make this matter too much.
>
> Bootstrapped/regtested x86_64-linux, OK?
>         * tree-ssa-threadbackward.c (profitable_jump_thread_path): Adjust
>         hot/cold heuristics.
>
>         * gcc.dg/tree-ssa/pr77445-2.c: Update testcase.
> Index: tree-ssa-threadbackward.c
> ===================================================================
> --- tree-ssa-threadbackward.c   (revision 244732)
> +++ tree-ssa-threadbackward.c   (working copy)
> @@ -159,6 +159,7 @@ profitable_jump_thread_path (vec<basic_b
>    bool threaded_through_latch = false;
>    bool multiway_branch_in_path = false;
>    bool threaded_multiway_branch = false;
> +  bool contains_hot_bb = false;
>
>    /* Count the number of instructions on the path: as these instructions
>       will have to be duplicated, we will not record the path if there
> @@ -168,6 +169,9 @@ profitable_jump_thread_path (vec<basic_b
>      {
>        basic_block bb = (*path)[j];
>
> +      if (!contains_hot_bb && speed_p)
> +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
> +

Hmm, but you are also counting the destination of the threading here
which we will
not duplicate.  Shouldn't this be under the if (j < path_length - 1)
conditional so we
look for hot BBs we are actually duplicating only (similar restrictions apply to
path[0]?).

RIchard.

>        /* Remember, blocks in the path are stored in opposite order
>          in the PATH array.  The last entry in the array represents
>          the block with an outgoing edge that we will redirect to the
> @@ -311,7 +315,11 @@ profitable_jump_thread_path (vec<basic_b
>        return NULL;
>      }
>
> -  if (speed_p && optimize_edge_for_speed_p (taken_edge))
> +  /* Threading is profitable if the path duplicated is hot but also
> +     in a case we separate cold path from hot path and permit optimization
> +     of the hot path later.  Be on the agressive side here. In some testcases,
> +     as in PR 78407 this leads to noticeable improvements.  */
> +  if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb))
>      {
>        if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
>         {
> Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/pr77445-2.c       (revision 244746)
> +++ testsuite/gcc.dg/tree-ssa/pr77445-2.c       (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats -fdump-tree-thread2-details-blocks-stats -fdump-tree-thread3-details-blocks-stats -fdump-tree-thread4-details-blocks-stats" } */
>  typedef enum STATES {
>         START=0,
>         INVALID,
> @@ -121,3 +121,7 @@ enum STATES FMS( u8 **in , u32 *transiti
>     increase much.  */
>  /* { dg-final { scan-tree-dump "Jumps threaded: 1[1-9]" "thread1" } } */
>  /* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread4" } } */
> Index: testsuite/gcc.dg/tree-ssa/threadbackward-1.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/threadbackward-1.c        (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/threadbackward-1.c        (working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ethread" } */
> +char *c;
> +int t()
> +{
> +  for (int i=0;i<5000;i++)
> +    c[i]=i;
> +}
> +/* { dg-final { scan-tree-dump-times "Registering FSM jump thread" 1 "ethread"} } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on maths meeting hot paths
  2017-01-26  9:15 ` Richard Biener
@ 2017-01-26 10:16   ` Jan Hubicka
  2017-01-26 10:39     ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Hubicka @ 2017-01-26 10:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches, Jeff Law

> > +      if (!contains_hot_bb && speed_p)
> > +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
> > +
> 
> Hmm, but you are also counting the destination of the threading here
> which we will
> not duplicate.  Shouldn't this be under the if (j < path_length - 1)
> conditional so we
> look for hot BBs we are actually duplicating only (similar restrictions apply to
> path[0]?).

Aha, you are right.  I am re-testing updated patch (it also solves the PR)

Honza

Index: tree-ssa-threadbackward.c
===================================================================
--- tree-ssa-threadbackward.c	(revision 244732)
+++ tree-ssa-threadbackward.c	(working copy)
@@ -159,6 +159,7 @@ profitable_jump_thread_path (vec<basic_b
   bool threaded_through_latch = false;
   bool multiway_branch_in_path = false;
   bool threaded_multiway_branch = false;
+  bool contains_hot_bb = false;
 
   /* Count the number of instructions on the path: as these instructions
      will have to be duplicated, we will not record the path if there
@@ -168,6 +169,9 @@ profitable_jump_thread_path (vec<basic_b
     {
       basic_block bb = (*path)[j];
 
+      if (!contains_hot_bb && speed_p && j < path_length - 1)
+	contains_hot_bb |= optimize_bb_for_speed_p (bb);
+
       /* Remember, blocks in the path are stored in opposite order
 	 in the PATH array.  The last entry in the array represents
 	 the block with an outgoing edge that we will redirect to the
@@ -311,7 +315,11 @@ profitable_jump_thread_path (vec<basic_b
       return NULL;
     }
 
-  if (speed_p && optimize_edge_for_speed_p (taken_edge))
+  /* Threading is profitable if the path duplicated is hot but also
+     in a case we separate cold path from hot path and permit optimization
+     of the hot path later.  Be on the agressive side here. In some testcases,
+     as in PR 78407 this leads to noticeable improvements.  */
+  if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb))
     {
       if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
 	{

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on maths meeting hot paths
  2017-01-26 10:16   ` Jan Hubicka
@ 2017-01-26 10:39     ` Richard Biener
  2017-01-26 10:45       ` Jan Hubicka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2017-01-26 10:39 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches, Jeff Law

On Thu, Jan 26, 2017 at 11:04 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > +      if (!contains_hot_bb && speed_p)
>> > +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
>> > +
>>
>> Hmm, but you are also counting the destination of the threading here
>> which we will
>> not duplicate.  Shouldn't this be under the if (j < path_length - 1)
>> conditional so we
>> look for hot BBs we are actually duplicating only (similar restrictions apply to
>> path[0]?).
>
> Aha, you are right.  I am re-testing updated patch (it also solves the PR)

What about j == 0?  We don't duplicate that block either.

> Honza
>
> Index: tree-ssa-threadbackward.c
> ===================================================================
> --- tree-ssa-threadbackward.c   (revision 244732)
> +++ tree-ssa-threadbackward.c   (working copy)
> @@ -159,6 +159,7 @@ profitable_jump_thread_path (vec<basic_b
>    bool threaded_through_latch = false;
>    bool multiway_branch_in_path = false;
>    bool threaded_multiway_branch = false;
> +  bool contains_hot_bb = false;
>
>    /* Count the number of instructions on the path: as these instructions
>       will have to be duplicated, we will not record the path if there
> @@ -168,6 +169,9 @@ profitable_jump_thread_path (vec<basic_b
>      {
>        basic_block bb = (*path)[j];
>
> +      if (!contains_hot_bb && speed_p && j < path_length - 1)
> +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
> +
>        /* Remember, blocks in the path are stored in opposite order
>          in the PATH array.  The last entry in the array represents
>          the block with an outgoing edge that we will redirect to the
> @@ -311,7 +315,11 @@ profitable_jump_thread_path (vec<basic_b
>        return NULL;
>      }
>
> -  if (speed_p && optimize_edge_for_speed_p (taken_edge))
> +  /* Threading is profitable if the path duplicated is hot but also
> +     in a case we separate cold path from hot path and permit optimization
> +     of the hot path later.  Be on the agressive side here. In some testcases,
> +     as in PR 78407 this leads to noticeable improvements.  */
> +  if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb))
>      {
>        if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
>         {

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on maths meeting hot paths
  2017-01-26 10:39     ` Richard Biener
@ 2017-01-26 10:45       ` Jan Hubicka
  2017-01-27 20:51         ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Hubicka @ 2017-01-26 10:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, GCC Patches, Jeff Law

> On Thu, Jan 26, 2017 at 11:04 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> > +      if (!contains_hot_bb && speed_p)
> >> > +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
> >> > +
> >>
> >> Hmm, but you are also counting the destination of the threading here
> >> which we will
> >> not duplicate.  Shouldn't this be under the if (j < path_length - 1)
> >> conditional so we
> >> look for hot BBs we are actually duplicating only (similar restrictions apply to
> >> path[0]?).
> >
> > Aha, you are right.  I am re-testing updated patch (it also solves the PR)
> 
> What about j == 0?  We don't duplicate that block either.

We do duplicate it, just eliminate the control flow at the end of bb
(j==0 is the last basic block of path containing the control flow being threaded)

I am updating patch actually print into details dump file what paths are considered
and what insn counts are counted.

Honza

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on maths meeting hot paths
  2017-01-26 10:45       ` Jan Hubicka
@ 2017-01-27 20:51         ` Jeff Law
  2017-02-02 13:49           ` Enable jump threading on paths " Jan Hubicka
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2017-01-27 20:51 UTC (permalink / raw)
  To: Jan Hubicka, Richard Biener; +Cc: GCC Patches

On 01/26/2017 03:39 AM, Jan Hubicka wrote:
>> On Thu, Jan 26, 2017 at 11:04 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> +      if (!contains_hot_bb && speed_p)
>>>>> +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
>>>>> +
>>>>
>>>> Hmm, but you are also counting the destination of the threading here
>>>> which we will
>>>> not duplicate.  Shouldn't this be under the if (j < path_length - 1)
>>>> conditional so we
>>>> look for hot BBs we are actually duplicating only (similar restrictions apply to
>>>> path[0]?).
>>>
>>> Aha, you are right.  I am re-testing updated patch (it also solves the PR)
>>
>> What about j == 0?  We don't duplicate that block either.
>
> We do duplicate it, just eliminate the control flow at the end of bb
> (j==0 is the last basic block of path containing the control flow being threaded)
Right.  And that can (of course) allow other parts of the block to be 
eliminated as dead code -- we've never tried to model any of that.

>
> I am updating patch actually print into details dump file what paths are considered
> and what insn counts are counted.
Probably wise.

jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on paths meeting hot paths
  2017-01-27 20:51         ` Jeff Law
@ 2017-02-02 13:49           ` Jan Hubicka
  2017-02-02 14:28             ` Richard Biener
  2017-02-02 17:37             ` Jeff Law
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Hubicka @ 2017-02-02 13:49 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jan Hubicka, Richard Biener, GCC Patches

Hi,
it seems I forgot to send the updated patch. Here it is.
We now dump info like:
Checking profitability of path:  5 (16 insns) 3 (2 insns) 34 (2 insns) 33 (4 insns) 32 (1 insns) 10 (3 insns) 6
  Control statement insns: 16
  Overall: 12 insns
  Registering FSM jump thread: (6, 10) incoming edge;  (10, 32)  (32, 33)  (33, 34)  (34, 3)  (3, 5)  (5, 16) nocopy; (5, 16) 

path is printed backwards. It is how the loop process it.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	PR middle-end/77445
	* gcc.dg/tree-ssa/pr77445-2.c: Update testcase to check that all
	threading is done.
	* tree-ssa-threadbackward.c (profitable_jump_thread_path): Dump
	statistics of the analyzed path; allow threading for speed when
	any of BBs along the path are optimized for speed.

Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr77445-2.c	(revision 245124)
+++ testsuite/gcc.dg/tree-ssa/pr77445-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats -fdump-tree-thread2-details-blocks-stats -fdump-tree-thread3-details-blocks-stats -fdump-tree-thread4-details-blocks-stats" } */
 typedef enum STATES {
 	START=0,
 	INVALID,
@@ -121,3 +121,7 @@ enum STATES FMS( u8 **in , u32 *transiti
    increase much.  */
 /* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
 /* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread4" } } */
Index: tree-ssa-threadbackward.c
===================================================================
--- tree-ssa-threadbackward.c	(revision 245124)
+++ tree-ssa-threadbackward.c	(working copy)
@@ -159,6 +159,10 @@ profitable_jump_thread_path (vec<basic_b
   bool threaded_through_latch = false;
   bool multiway_branch_in_path = false;
   bool threaded_multiway_branch = false;
+  bool contains_hot_bb = false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Checking profitability of path: ");
 
   /* Count the number of instructions on the path: as these instructions
      will have to be duplicated, we will not record the path if there
@@ -168,6 +172,8 @@ profitable_jump_thread_path (vec<basic_b
     {
       basic_block bb = (*path)[j];
 
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, " %i", bb->index);
       /* Remember, blocks in the path are stored in opposite order
 	 in the PATH array.  The last entry in the array represents
 	 the block with an outgoing edge that we will redirect to the
@@ -177,6 +183,7 @@ profitable_jump_thread_path (vec<basic_b
 	 branch.  */
       if (j < path_length - 1)
 	{
+	  int orig_n_insns = n_insns;
 	  if (bb->loop_father != loop)
 	    {
 	      path_crosses_loops = true;
@@ -219,6 +226,9 @@ profitable_jump_thread_path (vec<basic_b
 		}
 	    }
 
+
+	  if (!contains_hot_bb && speed_p && j < path_length - 1)
+	    contains_hot_bb |= optimize_bb_for_speed_p (bb);
 	  for (gsi = gsi_after_labels (bb);
 	       !gsi_end_p (gsi);
 	       gsi_next_nondebug (&gsi))
@@ -229,8 +239,10 @@ profitable_jump_thread_path (vec<basic_b
 		  && !(gimple_code (stmt) == GIMPLE_ASSIGN
 		       && gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
 		  && !is_gimple_debug (stmt))
-	        n_insns += estimate_num_insns (stmt, &eni_size_weights);
+		n_insns += estimate_num_insns (stmt, &eni_size_weights);
 	    }
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, " (%i insns)", n_insns-orig_n_insns);
 
 	  /* We do not look at the block with the threaded branch
 	     in this loop.  So if any block with a last statement that
@@ -264,7 +276,13 @@ profitable_jump_thread_path (vec<basic_b
      last block in the threading path.  So don't count it against our
      statement count.  */
 
-  n_insns-= estimate_num_insns (stmt, &eni_size_weights);
+  int stmt_insns = estimate_num_insns (stmt, &eni_size_weights);
+  n_insns-= stmt_insns;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\n  Control statement insns: %i\n"
+	     "  Overall: %i insns\n",
+	     stmt_insns, n_insns);
 
   /* We have found a constant value for ARG.  For GIMPLE_SWITCH
      and GIMPLE_GOTO, we use it as-is.  However, for a GIMPLE_COND
@@ -311,7 +329,11 @@ profitable_jump_thread_path (vec<basic_b
       return NULL;
     }
 
-  if (speed_p && optimize_edge_for_speed_p (taken_edge))
+  /* Threading is profitable if the path duplicated is hot but also
+     in a case we separate cold path from hot path and permit optimization
+     of the hot path later.  Be on the agressive side here. In some testcases,
+     as in PR 78407 this leads to noticeable improvements.  */
+  if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb))
     {
       if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
 	{

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on paths meeting hot paths
  2017-02-02 13:49           ` Enable jump threading on paths " Jan Hubicka
@ 2017-02-02 14:28             ` Richard Biener
  2017-02-02 14:48               ` Jan Hubicka
  2017-02-02 17:37             ` Jeff Law
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2017-02-02 14:28 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, GCC Patches

On Thu, Feb 2, 2017 at 2:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> it seems I forgot to send the updated patch. Here it is.
> We now dump info like:
> Checking profitability of path:  5 (16 insns) 3 (2 insns) 34 (2 insns) 33 (4 insns) 32 (1 insns) 10 (3 insns) 6
>   Control statement insns: 16
>   Overall: 12 insns
>   Registering FSM jump thread: (6, 10) incoming edge;  (10, 32)  (32, 33)  (33, 34)  (34, 3)  (3, 5)  (5, 16) nocopy; (5, 16)
>
> path is printed backwards. It is how the loop process it.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
>         PR middle-end/77445
>         * gcc.dg/tree-ssa/pr77445-2.c: Update testcase to check that all
>         threading is done.
>         * tree-ssa-threadbackward.c (profitable_jump_thread_path): Dump
>         statistics of the analyzed path; allow threading for speed when
>         any of BBs along the path are optimized for speed.
>
> Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/pr77445-2.c       (revision 245124)
> +++ testsuite/gcc.dg/tree-ssa/pr77445-2.c       (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats -fdump-tree-thread2-details-blocks-stats -fdump-tree-thread3-details-blocks-stats -fdump-tree-thread4-details-blocks-stats" } */

You can use -fdump-tree-thread-details-blocks-stats to get all thread dumps.

>  typedef enum STATES {
>         START=0,
>         INVALID,
> @@ -121,3 +121,7 @@ enum STATES FMS( u8 **in , u32 *transiti
>     increase much.  */
>  /* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
>  /* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread4" } } */
> Index: tree-ssa-threadbackward.c
> ===================================================================
> --- tree-ssa-threadbackward.c   (revision 245124)
> +++ tree-ssa-threadbackward.c   (working copy)
> @@ -159,6 +159,10 @@ profitable_jump_thread_path (vec<basic_b
>    bool threaded_through_latch = false;
>    bool multiway_branch_in_path = false;
>    bool threaded_multiway_branch = false;
> +  bool contains_hot_bb = false;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "Checking profitability of path: ");
>
>    /* Count the number of instructions on the path: as these instructions
>       will have to be duplicated, we will not record the path if there
> @@ -168,6 +172,8 @@ profitable_jump_thread_path (vec<basic_b
>      {
>        basic_block bb = (*path)[j];
>
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, " %i", bb->index);
>        /* Remember, blocks in the path are stored in opposite order
>          in the PATH array.  The last entry in the array represents
>          the block with an outgoing edge that we will redirect to the
> @@ -177,6 +183,7 @@ profitable_jump_thread_path (vec<basic_b
>          branch.  */
>        if (j < path_length - 1)
>         {
> +         int orig_n_insns = n_insns;
>           if (bb->loop_father != loop)
>             {
>               path_crosses_loops = true;
> @@ -219,6 +226,9 @@ profitable_jump_thread_path (vec<basic_b
>                 }
>             }
>
> +

Please do not add excess vertical space.

> +         if (!contains_hot_bb && speed_p && j < path_length - 1)

j < path_length - 1 is already checked above?

Otherwise looks ok.  If it does fix the regression - does it?

Richard.

> +           contains_hot_bb |= optimize_bb_for_speed_p (bb);
>           for (gsi = gsi_after_labels (bb);
>                !gsi_end_p (gsi);
>                gsi_next_nondebug (&gsi))
> @@ -229,8 +239,10 @@ profitable_jump_thread_path (vec<basic_b
>                   && !(gimple_code (stmt) == GIMPLE_ASSIGN
>                        && gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
>                   && !is_gimple_debug (stmt))
> -               n_insns += estimate_num_insns (stmt, &eni_size_weights);
> +               n_insns += estimate_num_insns (stmt, &eni_size_weights);
>             }
> +         if (dump_file && (dump_flags & TDF_DETAILS))
> +           fprintf (dump_file, " (%i insns)", n_insns-orig_n_insns);
>
>           /* We do not look at the block with the threaded branch
>              in this loop.  So if any block with a last statement that
> @@ -264,7 +276,13 @@ profitable_jump_thread_path (vec<basic_b
>       last block in the threading path.  So don't count it against our
>       statement count.  */
>
> -  n_insns-= estimate_num_insns (stmt, &eni_size_weights);
> +  int stmt_insns = estimate_num_insns (stmt, &eni_size_weights);
> +  n_insns-= stmt_insns;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "\n  Control statement insns: %i\n"
> +            "  Overall: %i insns\n",
> +            stmt_insns, n_insns);
>
>    /* We have found a constant value for ARG.  For GIMPLE_SWITCH
>       and GIMPLE_GOTO, we use it as-is.  However, for a GIMPLE_COND
> @@ -311,7 +329,11 @@ profitable_jump_thread_path (vec<basic_b
>        return NULL;
>      }
>
> -  if (speed_p && optimize_edge_for_speed_p (taken_edge))
> +  /* Threading is profitable if the path duplicated is hot but also
> +     in a case we separate cold path from hot path and permit optimization
> +     of the hot path later.  Be on the agressive side here. In some testcases,
> +     as in PR 78407 this leads to noticeable improvements.  */
> +  if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb))
>      {
>        if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
>         {

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on paths meeting hot paths
  2017-02-02 14:28             ` Richard Biener
@ 2017-02-02 14:48               ` Jan Hubicka
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Hubicka @ 2017-02-02 14:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, Jeff Law, GCC Patches

> > +         if (!contains_hot_bb && speed_p && j < path_length - 1)
> 
> j < path_length - 1 is already checked above?
> 
> Otherwise looks ok.  If it does fix the regression - does it?

Thanks, yes it fixes the regression.

Honza

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Enable jump threading on paths meeting hot paths
  2017-02-02 13:49           ` Enable jump threading on paths " Jan Hubicka
  2017-02-02 14:28             ` Richard Biener
@ 2017-02-02 17:37             ` Jeff Law
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff Law @ 2017-02-02 17:37 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Biener, GCC Patches

On 02/02/2017 06:49 AM, Jan Hubicka wrote:
> Hi,
> it seems I forgot to send the updated patch. Here it is.
> We now dump info like:
> Checking profitability of path:  5 (16 insns) 3 (2 insns) 34 (2 insns) 33 (4 insns) 32 (1 insns) 10 (3 insns) 6
>   Control statement insns: 16
>   Overall: 12 insns
>   Registering FSM jump thread: (6, 10) incoming edge;  (10, 32)  (32, 33)  (33, 34)  (34, 3)  (3, 5)  (5, 16) nocopy; (5, 16)
>
> path is printed backwards. It is how the loop process it.
Yea.  I've always found that confusing, particularly since the other 
threaders go the other direction.

One thing on the todo list is to stop converting between the two 
recording formats and remove the forward/backwards threading 
entanglement in tree-ssa-threadupdate.c.

Anyway, the patch looks fine to me as well (please fix the nits Richi 
pointed out).

jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-02-02 17:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-25 18:19 Enable jump threading on maths meeting hot paths Jan Hubicka
2017-01-26  9:15 ` Richard Biener
2017-01-26 10:16   ` Jan Hubicka
2017-01-26 10:39     ` Richard Biener
2017-01-26 10:45       ` Jan Hubicka
2017-01-27 20:51         ` Jeff Law
2017-02-02 13:49           ` Enable jump threading on paths " Jan Hubicka
2017-02-02 14:28             ` Richard Biener
2017-02-02 14:48               ` Jan Hubicka
2017-02-02 17:37             ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).