public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Improve jump threading #5 of N
@ 2011-06-16  5:49 Jeff Law
  2011-06-16  7:57 ` Richard Guenther
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jeff Law @ 2011-06-16  5:49 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4116 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1




So as I've mentioned previously, I've been working on a relatively small
change to the jump threading code which would allow it to duplicate a
join block when doing so allows us to thread through a successor of the
join block.  This is expected to be the last extension to the existing
jump threading code.

This was mainly done to improve our ability to eliminate unexecutable
paths through the CFG which helps avoid false positives with certain
warnings.  It also has the nice property that it eliminates conditionals
and often results in further optimization of nearby code.

To help evaluate the code generation improvements of this change I built
gcc-4.6 (checking enabled) using a compiler with and without this
improvement.  I then used the 4.6 cc1s to compile a bunch of .i files
under the watchful eye of valgrind.

                        without patch    with patch
Total cbranches          231072754220     229626578262
Total ibranches:	   7687404775       7686994201


cbranches shows the number of dynamically executed conditional branches.
 As you can see, with the patch we eliminated about .625% of the runtime
conditional branches.  Not bad at all.  We eliminated a trivial number
of indirect branches.  In all we eliminated 1446595532 runtime branches.

                        without patch    with patch
Total instructions:     1254106133886    1247718004946


I was expecting a reduction in the total number of instructions
executed, but was quite surprised at the actual data.  We end up
eliminating 6388128940 dynamic instructions --- which means that for
every dynamic branch eliminated, on average we were able to eliminate an
additional 3.4 dynamic instructions -- that's a huge secondary effect.
Clearly improving jump threading in this way is allowing the rest of the
optimizers to do a better job.

Anyway, attached is the patch.  Again, the concept is pretty simple,
when we have a join block which can not be threaded, we peek at the
successors of the join block and see if one or more of them can be threaded.

If so, we make a duplicate of the join block, wire the incoming edge we
were originally trying to thread to reach the duplicate rather than the
original join block.  We then wire the outgoing edge from the duplicate
to the final jump thread target.

So if given a CFG like this (from  a routine in cfgexpand):

           A
          / \
         B   C
         |  / \
         | D   E
         | |  / \
         | | F   G
          \| |
            \|
             H
            / \
           I   J
          / \
         L   M
         |  / \
         | N   O
         | |  / \
         | | P   Q
          \| |
            \|
             R


As it turns out some blocks have the same condition (A,I), (C,M), (E,O).
But because of the merge block H, no threading is possible.  What we
want to do is make 3 copies of H, each reachable from one predecessor of
the original H.  That exposes the jump threading opportunities B->L,
D->N and F->P.  The final CFG looks something like this:

           A
          / \
        BH'L C
         |  / \
         |DH'N E
         | |  / \
         | |FH'P G
          \| |
            \|
             R



Where each H' also has an edge to J from the original CFG, but which is
hard to show here... Note that I, M, O & Q all disappear and each
dynamic path through the cfg is shortened, even though we had to
duplicate H multiple times.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.

OK for mainline?






-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJN+YXtAAoJEBRtltQi2kC7ncwH/2RqgygBPIdholt7jxRH6X1X
7xeBarsQX7SyhO6X1kT7KpWy1tdFElv2UlmqYVKq1Z6U8OZtCwAU3skePk7WcZ/c
gmsUJYLrrDEz93poPgaOnVP62iqa2svFI20xjUDyxN9xf/82Tc6/emV+fmrStxk3
AsgrmfGR31mKtot0HxDFAT14+sqLrrcJ49WFpgfAj1FDLXAajX+q8hAf6cXABHJS
YdFZXeo8NohvYDezLgOhD+YY4/afKzZ3L41ka5gb2fKWrsRwFqCECk7VpbfdDsKc
9EqK+X8Xte/Cy0SmSUQU9/vBoN3Wj0O9kA5Bp3UknbjK9WtrLVKAjjz0b7AaxHg=
=DMtP
-----END PGP SIGNATURE-----

[-- Attachment #2: patch3 --]
[-- Type: text/plain, Size: 23007 bytes --]

	* tree-ssa-threadupdate.c (struct redirection_data): New field
	intermediate_edge.
	(THREAD_TARGET2): Define.
	(redirection_data_eq): Also check that the intermediate edge is
	equal.
	(lookup_redirection_data): Drop useless argument.  Extract the
	outgoing_edge and intermediate edge from E.  Callers updated.
	(copy_phi_args, update_destination_phis): New functions.
	(fix_duplicate_block_edges): Likewise.
	(create_edge_and_update_destination_phis): Duplicate all the edges
	hung off e->aux.  Use copy_phi_args.
	(create_duplicates): Use fix_duplicate_block_edges.
	(fixup_template_block): Likewise.
	(redirect_edges): If necessary, redirect the joiner block's incoming
	edge to the duplicate of the joiner block.
	(thread_block): Don't muck up loops when threading through a joiner
	block.
	(thread_through_loop_header): Handle threading through a joiner
	block.
	(mark_threaded_blocks, register_jump_thread): Likewise.
	* tree-flow.h (register_jump_thread): Add new argument.  Callers
	updated.
	* tree-ssa-threadedge.c (phi_args_equal_on_edges): New function.
	(thread_across_edge): Handle threading through a joiner block.
	

	
	* gcc.dg/builtin-object-size-1.c: Update to handle chances from
	improved jump threading.
	* gcc.dg/builtin-object-size-2.c: Likewise.
	* gcc.dg/tree-ssa/20030728-1.c: Likewise.

Index: tree-ssa-threadupdate.c
===================================================================
*** tree-ssa-threadupdate.c	(revision 174927)
--- tree-ssa-threadupdate.c	(working copy)
*************** struct redirection_data
*** 121,126 ****
--- 121,128 ----
       its single successor.  */
    edge outgoing_edge;
  
+   edge intermediate_edge;
+ 
    /* A list of incoming edges which we want to thread to
       OUTGOING_EDGE->dest.  */
    struct el *incoming_edges;
*************** static VEC(edge,heap) *threaded_edges;
*** 153,158 ****
--- 155,161 ----
     threading is attached to the AUX field for the incoming edge.  Use these
     macros to access the underlying structure attached to the AUX field.  */
  #define THREAD_TARGET(E) ((edge *)(E)->aux)[0]
+ #define THREAD_TARGET2(E) ((edge *)(E)->aux)[1]
  
  /* Jump threading statistics.  */
  
*************** redirection_data_eq (const void *p1, con
*** 231,238 ****
  {
    edge e1 = ((const struct redirection_data *)p1)->outgoing_edge;
    edge e2 = ((const struct redirection_data *)p2)->outgoing_edge;
  
!   return e1 == e2;
  }
  
  /* Given an outgoing edge E lookup and return its entry in our hash table.
--- 234,243 ----
  {
    edge e1 = ((const struct redirection_data *)p1)->outgoing_edge;
    edge e2 = ((const struct redirection_data *)p2)->outgoing_edge;
+   edge e3 = ((const struct redirection_data *)p1)->intermediate_edge;
+   edge e4 = ((const struct redirection_data *)p2)->intermediate_edge;
  
!   return e1 == e2 && e3 == e4;
  }
  
  /* Given an outgoing edge E lookup and return its entry in our hash table.
*************** redirection_data_eq (const void *p1, con
*** 242,248 ****
     edges associated with E in the hash table.  */
  
  static struct redirection_data *
! lookup_redirection_data (edge e, edge incoming_edge, enum insert_option insert)
  {
    void **slot;
    struct redirection_data *elt;
--- 247,253 ----
     edges associated with E in the hash table.  */
  
  static struct redirection_data *
! lookup_redirection_data (edge e, enum insert_option insert)
  {
    void **slot;
    struct redirection_data *elt;
*************** lookup_redirection_data (edge e, edge in
*** 250,256 ****
   /* Build a hash table element so we can see if E is already
       in the table.  */
    elt = XNEW (struct redirection_data);
!   elt->outgoing_edge = e;
    elt->dup_block = NULL;
    elt->incoming_edges = NULL;
  
--- 255,263 ----
   /* Build a hash table element so we can see if E is already
       in the table.  */
    elt = XNEW (struct redirection_data);
!   elt->intermediate_edge = THREAD_TARGET2 (e) ? THREAD_TARGET (e) : NULL;
!   elt->outgoing_edge = THREAD_TARGET2 (e) ? THREAD_TARGET2 (e) 
! 					  : THREAD_TARGET (e);
    elt->dup_block = NULL;
    elt->incoming_edges = NULL;
  
*************** lookup_redirection_data (edge e, edge in
*** 270,276 ****
      {
        *slot = (void *)elt;
        elt->incoming_edges = XNEW (struct el);
!       elt->incoming_edges->e = incoming_edge;
        elt->incoming_edges->next = NULL;
        return elt;
      }
--- 277,283 ----
      {
        *slot = (void *)elt;
        elt->incoming_edges = XNEW (struct el);
!       elt->incoming_edges->e = e;
        elt->incoming_edges->next = NULL;
        return elt;
      }
*************** lookup_redirection_data (edge e, edge in
*** 290,296 ****
  	{
            struct el *el = XNEW (struct el);
  	  el->next = elt->incoming_edges;
! 	  el->e = incoming_edge;
  	  elt->incoming_edges = el;
  	}
  
--- 297,303 ----
  	{
            struct el *el = XNEW (struct el);
  	  el->next = elt->incoming_edges;
! 	  el->e = e;
  	  elt->incoming_edges = el;
  	}
  
*************** lookup_redirection_data (edge e, edge in
*** 298,303 ****
--- 305,345 ----
      }
  }
  
+ /* For each PHI in BB, copy the argument associated with SRC_E to TGT_E.  */
+ 
+ static void
+ copy_phi_args (basic_block bb, edge src_e, edge tgt_e)
+ {
+   gimple_stmt_iterator gsi;
+   int src_indx = src_e->dest_idx;
+ 
+   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+     {
+       gimple phi = gsi_stmt (gsi);
+       source_location locus = gimple_phi_arg_location (phi, src_indx);
+       add_phi_arg (phi, gimple_phi_arg_def (phi, src_indx), tgt_e, locus);
+     }
+ }
+ 
+ /* We have recently made a copy of ORIG_BB, including its outgoing
+    edges.  The copy is NEW_BB.  Every PHI node in every direct successor of
+    ORIG_BB has a new argument associated with edge from NEW_BB to the
+    successor.  Initialize the PHI argument so that it is equal to the PHI
+    argument associated with the edge from ORIG_BB to the successor.  */
+ 
+ static void
+ update_destination_phis (basic_block orig_bb, basic_block new_bb)
+ {
+   edge_iterator ei;
+   edge e;
+ 
+   FOR_EACH_EDGE (e, ei, orig_bb->succs)
+     {
+       edge e2 = find_edge (new_bb, e->dest);
+       copy_phi_args (e->dest, e, e2);
+     }
+ }
+ 
  /* Given a duplicate block and its single destination (both stored
     in RD).  Create an edge between the duplicate and its single
     destination.
*************** create_edge_and_update_destination_phis 
*** 310,316 ****
  					 basic_block bb)
  {
    edge e = make_edge (bb, rd->outgoing_edge->dest, EDGE_FALLTHRU);
-   gimple_stmt_iterator gsi;
  
    rescan_loop_exit (e, true, false);
    e->probability = REG_BR_PROB_BASE;
--- 352,357 ----
*************** create_edge_and_update_destination_phis 
*** 318,325 ****
  
    if (rd->outgoing_edge->aux)
      {
!       e->aux = (edge *) XNEWVEC (edge, 1);
        THREAD_TARGET(e) = THREAD_TARGET (rd->outgoing_edge);
      }
    else
      {
--- 359,367 ----
  
    if (rd->outgoing_edge->aux)
      {
!       e->aux = (edge *) XNEWVEC (edge, 2);
        THREAD_TARGET(e) = THREAD_TARGET (rd->outgoing_edge);
+       THREAD_TARGET2(e) = THREAD_TARGET2 (rd->outgoing_edge);
      }
    else
      {
*************** create_edge_and_update_destination_phis 
*** 330,346 ****
       from the duplicate block, then we will need to add a new argument
       to them.  The argument should have the same value as the argument
       associated with the outgoing edge stored in RD.  */
!   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
!     {
!       gimple phi = gsi_stmt (gsi);
!       source_location locus;
!       int indx = rd->outgoing_edge->dest_idx;
  
!       locus = gimple_phi_arg_location (phi, indx);
!       add_phi_arg (phi, gimple_phi_arg_def (phi, indx), e, locus);
      }
  }
- 
  /* Hash table traversal callback routine to create duplicate blocks.  */
  
  static int
--- 372,414 ----
       from the duplicate block, then we will need to add a new argument
       to them.  The argument should have the same value as the argument
       associated with the outgoing edge stored in RD.  */
!   copy_phi_args (e->dest, rd->outgoing_edge, e);
! }
! 
! /* Wire up the outgoing edges from the duplicate block and
!    update any PHIs as needed.  */
! static void
! fix_duplicate_block_edges (struct redirection_data *rd,
! 			   struct local_info *local_info)
! {
!   /* If we were threading through an joiner block, then we want
!      to keep its control statement and redirect an outgoing edge.
!      Else we want to remove the control statement & edges, then create
!      a new outgoing edge.  In both cases we may need to update PHIs.  */
!   if (THREAD_TARGET2 (rd->incoming_edges->e) == rd->outgoing_edge)
!     {
!       edge victim;
!       edge e2;
!       edge e = rd->incoming_edges->e;
! 
!       /* This updates the PHIs at the destination of the duplicate
! 	 block.  */
!       update_destination_phis (local_info->bb, rd->dup_block);
! 
!       /* Find the edge from the duplicate block to the block we're
! 	 threading through.  That's the edge we want to redirect.  */
!       victim = find_edge (rd->dup_block, THREAD_TARGET (e)->dest);
!       e2 = redirect_edge_and_branch (victim, THREAD_TARGET2 (e)->dest);
  
!       /* This updates the PHI at the target of the threaded edge.  */
!       copy_phi_args (e2->dest, THREAD_TARGET2 (e), e2);
!     }
!   else
!     {
!       remove_ctrl_stmt_and_useless_edges (rd->dup_block, NULL);
!       create_edge_and_update_destination_phis (rd, rd->dup_block);
      }
  }
  /* Hash table traversal callback routine to create duplicate blocks.  */
  
  static int
*************** create_duplicates (void **slot, void *da
*** 365,373 ****
        create_block_for_threading (local_info->template_block, rd);
  
        /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
!          block.  */
!       remove_ctrl_stmt_and_useless_edges (rd->dup_block, NULL);
!       create_edge_and_update_destination_phis (rd, rd->dup_block);
      }
  
    /* Keep walking the hash table.  */
--- 433,440 ----
        create_block_for_threading (local_info->template_block, rd);
  
        /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
! 	 block.   */
!       fix_duplicate_block_edges (rd, local_info);
      }
  
    /* Keep walking the hash table.  */
*************** fixup_template_block (void **slot, void 
*** 384,395 ****
    struct redirection_data *rd = (struct redirection_data *) *slot;
    struct local_info *local_info = (struct local_info *)data;
  
!   /* If this is the template block, then create its outgoing edges
!      and halt the hash table traversal.  */
    if (rd->dup_block && rd->dup_block == local_info->template_block)
      {
!       remove_ctrl_stmt_and_useless_edges (rd->dup_block, NULL);
!       create_edge_and_update_destination_phis (rd, rd->dup_block);
        return 0;
      }
  
--- 451,466 ----
    struct redirection_data *rd = (struct redirection_data *) *slot;
    struct local_info *local_info = (struct local_info *)data;
  
!   /* If this is the template block halt the traversal after updating
!      it appropriately.
! 
!      If we were threading through an joiner block, then we want
!      to keep its control statement and redirect an outgoing edge.
!      Else we want to remove the control statement & edges, then create
!      a new outgoing edge.  In both cases we may need to update PHIs.  */
    if (rd->dup_block && rd->dup_block == local_info->template_block)
      {
!       fix_duplicate_block_edges (rd, local_info);
        return 0;
      }
  
*************** redirect_edges (void **slot, void *data)
*** 419,426 ****
        free (el);
  
        thread_stats.num_threaded_edges++;
  
!       if (rd->dup_block)
  	{
  	  edge e2;
  
--- 490,507 ----
        free (el);
  
        thread_stats.num_threaded_edges++;
+       /* If we are threading through a joiner block, then we have to
+ 	 find the edge we want to redirect and update some PHI nodes.  */
+       if (THREAD_TARGET2 (e))
+ 	{
+ 	  edge e2;
  
! 	  /* We want to redirect the incoming edge to the joiner block (E)
! 	     to instead reach the duplicate of the joiner block.  */
! 	  e2 = redirect_edge_and_branch (e, rd->dup_block);
! 	  flush_pending_stmts (e2);
! 	}
!       else if (rd->dup_block)
  	{
  	  edge e2;
  
*************** thread_block (basic_block bb, bool noloo
*** 546,559 ****
        if (e->aux == NULL)
  	continue;
  
!       e2 = THREAD_TARGET (e);
  
        if (!e2
  	  /* If NOLOOP_ONLY is true, we only allow threading through the
  	     header of a loop to exit edges.  */
  	  || (noloop_only
  	      && bb == bb->loop_father->header
! 	      && (!loop_exit_edge_p (bb->loop_father, e2))))
  	continue;
  
        if (e->dest == e2->src)
--- 627,644 ----
        if (e->aux == NULL)
  	continue;
  
!       if (THREAD_TARGET2 (e))
! 	e2 = THREAD_TARGET2 (e);
!       else
! 	e2 = THREAD_TARGET (e);
  
        if (!e2
  	  /* If NOLOOP_ONLY is true, we only allow threading through the
  	     header of a loop to exit edges.  */
  	  || (noloop_only
  	      && bb == bb->loop_father->header
! 	      && (!loop_exit_edge_p (bb->loop_father, e2)
! 		  || THREAD_TARGET2 (e))))
  	continue;
  
        if (e->dest == e2->src)
*************** thread_block (basic_block bb, bool noloo
*** 562,568 ****
  
        /* Insert the outgoing edge into the hash table if it is not
  	 already in the hash table.  */
!       lookup_redirection_data (e2, e, INSERT);
      }
  
    /* We do not update dominance info.  */
--- 647,653 ----
  
        /* Insert the outgoing edge into the hash table if it is not
  	 already in the hash table.  */
!       lookup_redirection_data (e, INSERT);
      }
  
    /* We do not update dominance info.  */
*************** thread_through_loop_header (struct loop 
*** 817,822 ****
--- 902,909 ----
  
    if (latch->aux)
      {
+       if (THREAD_TARGET2 (latch))
+ 	goto fail;
        tgt_edge = THREAD_TARGET (latch);
        tgt_bb = tgt_edge->dest;
      }
*************** thread_through_loop_header (struct loop 
*** 840,845 ****
--- 927,934 ----
  	      goto fail;
  	    }
  
+ 	  if (THREAD_TARGET2 (e))
+ 	    goto fail;
  	  tgt_edge = THREAD_TARGET (e);
  	  atgt_bb = tgt_edge->dest;
  	  if (!tgt_bb)
*************** mark_threaded_blocks (bitmap threaded_bl
*** 967,979 ****
    edge e;
    edge_iterator ei;
  
!   for (i = 0; i < VEC_length (edge, threaded_edges); i += 2)
      {
        edge e = VEC_index (edge, threaded_edges, i);
!       edge *x = (edge *) XNEWVEC (edge, 1);
  
        e->aux = x;
        THREAD_TARGET (e) = VEC_index (edge, threaded_edges, i + 1);
        bitmap_set_bit (tmp, e->dest->index);
      }
  
--- 1056,1069 ----
    edge e;
    edge_iterator ei;
  
!   for (i = 0; i < VEC_length (edge, threaded_edges); i += 3)
      {
        edge e = VEC_index (edge, threaded_edges, i);
!       edge *x = (edge *) XNEWVEC (edge, 2);
  
        e->aux = x;
        THREAD_TARGET (e) = VEC_index (edge, threaded_edges, i + 1);
+       THREAD_TARGET2 (e) = VEC_index (edge, threaded_edges, i + 2);
        bitmap_set_bit (tmp, e->dest->index);
      }
  
*************** thread_through_all_blocks (bool may_peel
*** 1085,1091 ****
     after fixing the SSA graph.  */
  
  void
! register_jump_thread (edge e, edge e2)
  {
    /* This can occur if we're jumping to a constant address or
       or something similar.  Just get out now.  */
--- 1175,1181 ----
     after fixing the SSA graph.  */
  
  void
! register_jump_thread (edge e, edge e2, edge e3)
  {
    /* This can occur if we're jumping to a constant address or
       or something similar.  Just get out now.  */
*************** register_jump_thread (edge e, edge e2)
*** 1102,1105 ****
--- 1192,1196 ----
  
    VEC_safe_push (edge, heap, threaded_edges, e);
    VEC_safe_push (edge, heap, threaded_edges, e2);
+   VEC_safe_push (edge, heap, threaded_edges, e3);
  }
Index: testsuite/gcc.dg/builtin-object-size-1.c
===================================================================
*** testsuite/gcc.dg/builtin-object-size-1.c	(revision 174927)
--- testsuite/gcc.dg/builtin-object-size-1.c	(working copy)
*************** test1 (void *q, int x)
*** 64,70 ****
      r = malloc (30);
    else
      r = calloc (2, 16);
!   if (__builtin_object_size (r, 0) != 2 * 16)
      abort ();
    if (x < 20)
      r = malloc (30);
--- 64,74 ----
      r = malloc (30);
    else
      r = calloc (2, 16);
!   /* We may duplicate this test onto the two exit paths.  On one path
!      the size will be 32, the other it will be 30.  If we don't duplicate
!      this test, then the size will be 32.  */
!   if (__builtin_object_size (r, 0) != 2 * 16
!       && __builtin_object_size (r, 0) != 30)
      abort ();
    if (x < 20)
      r = malloc (30);
Index: testsuite/gcc.dg/tree-ssa/20030728-1.c
===================================================================
*** testsuite/gcc.dg/tree-ssa/20030728-1.c	(revision 174927)
--- testsuite/gcc.dg/tree-ssa/20030728-1.c	(working copy)
*************** objects_must_conflict_p (t1, t2)
*** 41,47 ****
    return foo (t2 ? get_alias_set (t2) : 0);
  }
  
! /* There should be two assignments of variables to the value zero.  */
! /* { dg-final { scan-rtl-dump-times "PART.. = 0" 2 "expand"} } */
   
  /* { dg-final { cleanup-rtl-dump "expand" } } */
--- 41,49 ----
    return foo (t2 ? get_alias_set (t2) : 0);
  }
  
! /* There should be one assignment of variables to the value zero.  There
!    used to be two assignments, but improvements in threading allowed the
!    second to be propagated into all its uses and eliminated.   */
! /* { dg-final { scan-rtl-dump-times "PART.. = 0" 1 "expand"} } */
   
  /* { dg-final { cleanup-rtl-dump "expand" } } */
Index: testsuite/gcc.dg/builtin-object-size-2.c
===================================================================
*** testsuite/gcc.dg/builtin-object-size-2.c	(revision 174927)
--- testsuite/gcc.dg/builtin-object-size-2.c	(working copy)
*************** test1 (void *q, int x)
*** 60,66 ****
      r = malloc (30);
    else
      r = calloc (2, 16);
!   if (__builtin_object_size (r, 1) != 2 * 16)
      abort ();
    if (x < 20)
      r = malloc (30);
--- 60,70 ----
      r = malloc (30);
    else
      r = calloc (2, 16);
!   /* We may duplicate this test onto the two exit paths.  On one path
!      the size will be 32, the other it will be 30.  If we don't duplicate
!      this test, then the size will be 32.  */
!   if (__builtin_object_size (r, 1) != 2 * 16
!       && __builtin_object_size (r, 1) != 30)
      abort ();
    if (x < 20)
      r = malloc (30);
Index: tree-flow.h
===================================================================
*** tree-flow.h	(revision 174927)
--- tree-flow.h	(working copy)
*************** bool may_be_nonaddressable_p (tree expr)
*** 808,814 ****
  
  /* In tree-ssa-threadupdate.c.  */
  extern bool thread_through_all_blocks (bool);
! extern void register_jump_thread (edge, edge);
  
  /* In gimplify.c  */
  tree force_gimple_operand_1 (tree, gimple_seq *, gimple_predicate, tree);
--- 808,814 ----
  
  /* In tree-ssa-threadupdate.c.  */
  extern bool thread_through_all_blocks (bool);
! extern void register_jump_thread (edge, edge, edge);
  
  /* In gimplify.c  */
  tree force_gimple_operand_1 (tree, gimple_seq *, gimple_predicate, tree);
Index: tree-ssa-threadedge.c
===================================================================
*** tree-ssa-threadedge.c	(revision 174927)
--- tree-ssa-threadedge.c	(working copy)
*************** thread_around_empty_block (edge taken_ed
*** 652,657 ****
--- 652,678 ----
    return NULL;
  }
        
+ /* E1 and E2 are edges into the same basic block.  Return TRUE if the
+    PHI arguments associated with those edges are equal or there are no
+    PHI arguments, otherwise return FALSE.  */
+ 
+ static bool
+ phi_args_equal_on_edges (edge e1, edge e2)
+ {
+   gimple_stmt_iterator gsi;
+   int indx1 = e1->dest_idx;
+   int indx2 = e2->dest_idx;
+ 
+   for (gsi = gsi_start_phis (e1->dest); !gsi_end_p (gsi); gsi_next (&gsi))
+     {
+       gimple phi = gsi_stmt (gsi);
+ 
+       if (!operand_equal_p (gimple_phi_arg_def (phi, indx1),
+ 			    gimple_phi_arg_def (phi, indx2), 0))
+ 	return false;
+     }
+   return true;
+ }
  
  /* We are exiting E->src, see if E->dest ends with a conditional
     jump which has a known value when reached via E.
*************** thread_across_edge (gimple dummy_cond,
*** 770,780 ****
  	    }
  
  	  remove_temporary_equivalences (stack);
! 	  register_jump_thread (e, taken_edge);
  	  return;
  	}
      }
  
   fail:
    remove_temporary_equivalences (stack);
  }
--- 791,863 ----
  	    }
  
  	  remove_temporary_equivalences (stack);
! 	  register_jump_thread (e, taken_edge, NULL);
  	  return;
  	}
      }
  
+  /* We were unable to determine what out edge from E->dest is taken.  However,
+     we might still be able to thread through successors of E->dest.  This
+     often occurs when E->dest is a joiner block which then fans back out
+     based on redundant tests.
+ 
+     If so, we'll copy E->dest and redirect the appropriate predecessor to
+     the copy.  Within the copy of E->dest, we'll thread one or more edges
+     to points deeper in the CFG.
+ 
+     This is a stopgap until we have a more structured approach to path
+     isolation.  */
+   {
+     edge e2, e3, taken_edge;
+     edge_iterator ei;
+     bool found = false;
+     bitmap visited = BITMAP_ALLOC (NULL);
+ 
+     /* Look at each successor of E->dest to see if we can thread through it.  */
+     FOR_EACH_EDGE (taken_edge, ei, e->dest->succs)
+       {
+ 	/* Avoid threading to any block we have already visited.  */
+ 	bitmap_clear (visited);
+ 	bitmap_set_bit (visited, taken_edge->dest->index);
+ 	bitmap_set_bit (visited, e->dest->index);
+ 
+ 	/* Record whether or not we were able to thread through a successor
+ 	   of E->dest.  */
+ 	found = false;
+ 	e3 = taken_edge;
+ 	do
+ 	  {
+ 	    e2 = thread_around_empty_block (e3,
+ 				            dummy_cond,
+ 				            handle_dominating_asserts,
+ 				            simplify,
+ 				            visited);
+ 	    if (e2)
+ 	      {
+ 	        e3 = e2;
+ 		found = true;
+ 	      }
+ 	  }
+         while (e2);
+ 
+ 	/* If we were able to thread through a successor of E->dest, then
+ 	   record the jump threading opportunity.  */
+ 	if (found)
+ 	  {
+ 	    edge tmp;
+ 	    /* If there is already an edge from the block to be duplicated
+ 	       (E2->src) to the final target (E3->dest), then make sure that
+ 	       the PHI args associated with the edges E2 and E3 are the
+ 	       same.  */
+ 	    tmp = find_edge (taken_edge->src, e3->dest);
+ 	    if (!tmp || phi_args_equal_on_edges (tmp, e3))
+ 	      register_jump_thread (e, taken_edge, e3);
+ 	  }
+ 
+       }
+     BITMAP_FREE (visited);
+   }
+ 
   fail:
    remove_temporary_equivalences (stack);
  }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Improve jump threading #5 of N
  2011-06-16  5:49 Improve jump threading #5 of N Jeff Law
@ 2011-06-16  7:57 ` Richard Guenther
  2011-06-19  7:30 ` H.J. Lu
  2011-06-22  4:19 ` Hans-Peter Nilsson
  2 siblings, 0 replies; 5+ messages in thread
From: Richard Guenther @ 2011-06-16  7:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Thu, Jun 16, 2011 at 6:26 AM, Jeff Law <law@redhat.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>
>
> So as I've mentioned previously, I've been working on a relatively small
> change to the jump threading code which would allow it to duplicate a
> join block when doing so allows us to thread through a successor of the
> join block.  This is expected to be the last extension to the existing
> jump threading code.
>
> This was mainly done to improve our ability to eliminate unexecutable
> paths through the CFG which helps avoid false positives with certain
> warnings.  It also has the nice property that it eliminates conditionals
> and often results in further optimization of nearby code.
>
> To help evaluate the code generation improvements of this change I built
> gcc-4.6 (checking enabled) using a compiler with and without this
> improvement.  I then used the 4.6 cc1s to compile a bunch of .i files
> under the watchful eye of valgrind.
>
>                        without patch    with patch
> Total cbranches          231072754220     229626578262
> Total ibranches:           7687404775       7686994201
>
>
> cbranches shows the number of dynamically executed conditional branches.
>  As you can see, with the patch we eliminated about .625% of the runtime
> conditional branches.  Not bad at all.  We eliminated a trivial number
> of indirect branches.  In all we eliminated 1446595532 runtime branches.
>
>                        without patch    with patch
> Total instructions:     1254106133886    1247718004946
>
>
> I was expecting a reduction in the total number of instructions
> executed, but was quite surprised at the actual data.  We end up
> eliminating 6388128940 dynamic instructions --- which means that for
> every dynamic branch eliminated, on average we were able to eliminate an
> additional 3.4 dynamic instructions -- that's a huge secondary effect.
> Clearly improving jump threading in this way is allowing the rest of the
> optimizers to do a better job.
>
> Anyway, attached is the patch.  Again, the concept is pretty simple,
> when we have a join block which can not be threaded, we peek at the
> successors of the join block and see if one or more of them can be threaded.
>
> If so, we make a duplicate of the join block, wire the incoming edge we
> were originally trying to thread to reach the duplicate rather than the
> original join block.  We then wire the outgoing edge from the duplicate
> to the final jump thread target.
>
> So if given a CFG like this (from  a routine in cfgexpand):
>
>           A
>          / \
>         B   C
>         |  / \
>         | D   E
>         | |  / \
>         | | F   G
>          \| |
>            \|
>             H
>            / \
>           I   J
>          / \
>         L   M
>         |  / \
>         | N   O
>         | |  / \
>         | | P   Q
>          \| |
>            \|
>             R
>
>
> As it turns out some blocks have the same condition (A,I), (C,M), (E,O).
> But because of the merge block H, no threading is possible.  What we
> want to do is make 3 copies of H, each reachable from one predecessor of
> the original H.  That exposes the jump threading opportunities B->L,
> D->N and F->P.  The final CFG looks something like this:
>
>           A
>          / \
>        BH'L C
>         |  / \
>         |DH'N E
>         | |  / \
>         | |FH'P G
>          \| |
>            \|
>             R
>
>
>
> Where each H' also has an edge to J from the original CFG, but which is
> hard to show here... Note that I, M, O & Q all disappear and each
> dynamic path through the cfg is shortened, even though we had to
> duplicate H multiple times.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
>
> OK for mainline?

Ok.

Thanks,
Richard.

>
>
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJN+YXtAAoJEBRtltQi2kC7ncwH/2RqgygBPIdholt7jxRH6X1X
> 7xeBarsQX7SyhO6X1kT7KpWy1tdFElv2UlmqYVKq1Z6U8OZtCwAU3skePk7WcZ/c
> gmsUJYLrrDEz93poPgaOnVP62iqa2svFI20xjUDyxN9xf/82Tc6/emV+fmrStxk3
> AsgrmfGR31mKtot0HxDFAT14+sqLrrcJ49WFpgfAj1FDLXAajX+q8hAf6cXABHJS
> YdFZXeo8NohvYDezLgOhD+YY4/afKzZ3L41ka5gb2fKWrsRwFqCECk7VpbfdDsKc
> 9EqK+X8Xte/Cy0SmSUQU9/vBoN3Wj0O9kA5Bp3UknbjK9WtrLVKAjjz0b7AaxHg=
> =DMtP
> -----END PGP SIGNATURE-----
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Improve jump threading #5 of N
  2011-06-16  5:49 Improve jump threading #5 of N Jeff Law
  2011-06-16  7:57 ` Richard Guenther
@ 2011-06-19  7:30 ` H.J. Lu
  2011-06-22  4:19 ` Hans-Peter Nilsson
  2 siblings, 0 replies; 5+ messages in thread
From: H.J. Lu @ 2011-06-19  7:30 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Wed, Jun 15, 2011 at 9:26 PM, Jeff Law <law@redhat.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>
>
> So as I've mentioned previously, I've been working on a relatively small
> change to the jump threading code which would allow it to duplicate a
> join block when doing so allows us to thread through a successor of the
> join block.  This is expected to be the last extension to the existing
> jump threading code.
>
> This was mainly done to improve our ability to eliminate unexecutable
> paths through the CFG which helps avoid false positives with certain
> warnings.  It also has the nice property that it eliminates conditionals
> and often results in further optimization of nearby code.
>
> To help evaluate the code generation improvements of this change I built
> gcc-4.6 (checking enabled) using a compiler with and without this
> improvement.  I then used the 4.6 cc1s to compile a bunch of .i files
> under the watchful eye of valgrind.
>
>                        without patch    with patch
> Total cbranches          231072754220     229626578262
> Total ibranches:           7687404775       7686994201
>
>
> cbranches shows the number of dynamically executed conditional branches.
>  As you can see, with the patch we eliminated about .625% of the runtime
> conditional branches.  Not bad at all.  We eliminated a trivial number
> of indirect branches.  In all we eliminated 1446595532 runtime branches.
>
>                        without patch    with patch
> Total instructions:     1254106133886    1247718004946
>
>
> I was expecting a reduction in the total number of instructions
> executed, but was quite surprised at the actual data.  We end up
> eliminating 6388128940 dynamic instructions --- which means that for
> every dynamic branch eliminated, on average we were able to eliminate an
> additional 3.4 dynamic instructions -- that's a huge secondary effect.
> Clearly improving jump threading in this way is allowing the rest of the
> optimizers to do a better job.
>
> Anyway, attached is the patch.  Again, the concept is pretty simple,
> when we have a join block which can not be threaded, we peek at the
> successors of the join block and see if one or more of them can be threaded.
>
> If so, we make a duplicate of the join block, wire the incoming edge we
> were originally trying to thread to reach the duplicate rather than the
> original join block.  We then wire the outgoing edge from the duplicate
> to the final jump thread target.
>
> So if given a CFG like this (from  a routine in cfgexpand):
>
>           A
>          / \
>         B   C
>         |  / \
>         | D   E
>         | |  / \
>         | | F   G
>          \| |
>            \|
>             H
>            / \
>           I   J
>          / \
>         L   M
>         |  / \
>         | N   O
>         | |  / \
>         | | P   Q
>          \| |
>            \|
>             R
>
>
> As it turns out some blocks have the same condition (A,I), (C,M), (E,O).
> But because of the merge block H, no threading is possible.  What we
> want to do is make 3 copies of H, each reachable from one predecessor of
> the original H.  That exposes the jump threading opportunities B->L,
> D->N and F->P.  The final CFG looks something like this:
>
>           A
>          / \
>        BH'L C
>         |  / \
>         |DH'N E
>         | |  / \
>         | |FH'P G
>          \| |
>            \|
>             R
>
>
>
> Where each H' also has an edge to J from the original CFG, but which is
> hard to show here... Note that I, M, O & Q all disappear and each
> dynamic path through the cfg is shortened, even though we had to
> duplicate H multiple times.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
>
> OK for mainline?
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49465

-- 
H.J.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Improve jump threading #5 of N
  2011-06-16  5:49 Improve jump threading #5 of N Jeff Law
  2011-06-16  7:57 ` Richard Guenther
  2011-06-19  7:30 ` H.J. Lu
@ 2011-06-22  4:19 ` Hans-Peter Nilsson
  2011-06-22 14:13   ` Jeff Law
  2 siblings, 1 reply; 5+ messages in thread
From: Hans-Peter Nilsson @ 2011-06-22  4:19 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Wed, 15 Jun 2011, Jeff Law wrote:
> So as I've mentioned previously, I've been working on a relatively small
> change to the jump threading code which would allow it to duplicate a
> join block when doing so allows us to thread through a successor of the
> join block.  This is expected to be the last extension to the existing
> jump threading code.

> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.

It seems this caused a warning regression, PR49498, for some
but not all targets, see the PR.

brgds, H-P

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Improve jump threading #5 of N
  2011-06-22  4:19 ` Hans-Peter Nilsson
@ 2011-06-22 14:13   ` Jeff Law
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Law @ 2011-06-22 14:13 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc-patches

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/21/11 21:28, Hans-Peter Nilsson wrote:
> On Wed, 15 Jun 2011, Jeff Law wrote:
>> So as I've mentioned previously, I've been working on a relatively small
>> change to the jump threading code which would allow it to duplicate a
>> join block when doing so allows us to thread through a successor of the
>> join block.  This is expected to be the last extension to the existing
>> jump threading code.
> 
>> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
> 
> It seems this caused a warning regression, PR49498, for some
> but not all targets, see the PR.
I never would have expected that kind of regression; obviously I'll have
a looksie.  Thanks for the heads-up.

jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOAfQzAAoJEBRtltQi2kC7MNIH/0W5gcnheIkcWo5z+D1rYjSc
mEHZB9vyzBGLgZHQTXfPSM0MuEIgafz5J17gvkEfTQ768iGCrpWleMkchrhhaip1
oTgCRPLwuYouoPrNuYOMi17Zm8QfoPUFbyKbSxIvvk7v/eRolWyL6bqGWlWeFMID
6MKDv6GGbJRYNR0pyQiUQZ6cz2C6irNI4cLwlu7pbfKZ9pc5wte6dDkZ2jEaFs3b
bISpo4kE/bX5vDkyr2sWmkBLq0SGP981uNkjorO2tRplrEuA2BA85ZyUaCl3VMIz
voRCL+L7cW9zhx2FBL+x2haEwRyxsmLruR3T4equZcCxTpTBHgawWdqWA1g7F9s=
=8Yla
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-06-22 13:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-16  5:49 Improve jump threading #5 of N Jeff Law
2011-06-16  7:57 ` Richard Guenther
2011-06-19  7:30 ` H.J. Lu
2011-06-22  4:19 ` Hans-Peter Nilsson
2011-06-22 14:13   ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).