public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Add code-hoisting to GIMPLE
@ 2016-07-04 11:27 Richard Biener
  2016-07-04 20:31 ` Steven Bosscher
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2016-07-04 11:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: Steven Bosscher


The following patch is Stevens code-hoisting based on PRE forward-ported
and fixed for bootstrap plus the case of hoisting code across loops
which we generally do not want (expressions in the loop exit target block
are antic-in throughout the whole loop unless they are killed and thus
get inserted into the exit block and then PREd before the loop).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'm going to try making the bitmap_set ops in do_hoist_insert a bit
faster - Steven, do you remember any issues with the approach from the
time you worked on it?

Thanks,
Richard.

2016-07-04  Steven Bosscher  <steven@gcc.gnu.org
	Richard Biener  <rguenther@suse.de>

	PR tree-optimization/23286
	PR tree-optimization/70159
	* doc/invoke.texi: Document -ftree-hoist.
	* common.opt (ftree-hoist): New flag.
	* opts.c (default_options_table): Enable -ftree-hoist at -O2+.
	* tree-ssa-pre.c (pre_stats): Add hoist_insert.
	(do_regular_insertion): Rename to ...
	(do_pre_regular_insertion): ... this and amend general comments
	on insertion strathegy.
	(do_partial_partial_insertion): Rename to ...
	(do_pre_partial_partial_insertion): ... this.
	(do_hoist_insertion): New function.
	(insert_aux): Take flags on whether to do PRE and/or hoist insertion
	and call do_hoist_insertion properly.
	(insert): Adjust.
	(pass_pre::gate): Enable also if -ftree-hoist is enabled.
	(pass_pre::execute): Register hoist_insert stats.

	* gcc.dg/tree-ssa/ssa-pre-11.c: Disable code hosting.
	* gcc.dg/tree-ssa/ssa-pre-27.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-28.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-2.c: Likewise.
	* gcc.dg/tree-ssa/pr35286.c: Likewise.
	* gcc.dg/tree-ssa/pr35287.c: Likewise.
	* gcc.dg/tree-ssa/loadpre3.c: Adjust so hosting doesn't apply.
	* gcc.dg/tree-ssa/pr43491.c: Scan optimized dump for desired result.
	* gcc.dg/tree-ssa/ssa-hoist-1.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-2.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-3.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-4.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-5.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-6.c: New testcase.

Index: gcc/doc/invoke.texi
===================================================================
*** gcc/doc/invoke.texi.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/doc/invoke.texi	2016-07-04 11:31:33.391904573 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 404,410 ****
  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
! -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
  -ftree-loop-if-convert-stores -ftree-loop-im @gol
  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
--- 404,410 ----
  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
! -ftree-dse -ftree-forwprop -ftree-fre -ftree-hoist -ftree-loop-if-convert @gol
  -ftree-loop-if-convert-stores -ftree-loop-im @gol
  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
*************** also turns on the following optimization
*** 6372,6377 ****
--- 6372,6378 ----
  -fstrict-aliasing -fstrict-overflow @gol
  -ftree-builtin-call-dce @gol
  -ftree-switch-conversion -ftree-tail-merge @gol
+ -ftree-hoist @gol
  -ftree-pre @gol
  -ftree-vrp @gol
  -fipa-ra}
*************** and the @option{large-stack-frame-growth
*** 7265,7270 ****
--- 7266,7279 ----
  Perform reassociation on trees.  This flag is enabled by default
  at @option{-O} and higher.
  
+ @item -ftree-hoist
+ @opindex ftree-hoist
+ Perform code hoisting on trees.  Code hoisting tries to move the
+ evaluation of expressions executed on all paths to the function exit
+ as early as possible.  This is especially useful as a code size
+ optimization, but it often helps for code speed as well.
+ This flag is enabled by defailt at @option{-O2} and higher.
+ 
  @item -ftree-pre
  @opindex ftree-pre
  Perform partial redundancy elimination (PRE) on trees.  This flag is
*************** Dump each function after STORE-CCP@.  Th
*** 12222,12229 ****
  
  @item pre
  @opindex fdump-tree-pre
! Dump trees after partial redundancy elimination.  The file name is made
! by appending @file{.pre} to the source file name.
  
  @item fre
  @opindex fdump-tree-fre
--- 12231,12238 ----
  
  @item pre
  @opindex fdump-tree-pre
! Dump trees after partial redundancy elimination and/or code hoisting.
! The file name is made by appending @file{.pre} to the source file name.
  
  @item fre
  @opindex fdump-tree-fre
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  double cos (double);
  double f(double a)
  {
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-tree-hoist -fdump-tree-pre-stats" } */
  double cos (double);
  double f(double a)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int motion_test1(int data, int data_0, int data_3, int v)
  {
  	int i;
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-tree-hoist -fdump-tree-pre-stats" } */
  int motion_test1(int data, int data_0, int data_3, int v)
  {
  	int i;
Index: gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr35286.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr35286.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int g2;
  struct A {
      int a; int b;
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-tree-hoist -fdump-tree-pre-stats" } */
  int g2;
  struct A {
      int a; int b;
Index: gcc/testsuite/gcc.dg/tree-ssa/pr35287.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr35287.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr35287.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int *gp;
  int foo(int p)
  {
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-tree-hoist -fdump-tree-pre-stats" } */
  int *gp;
  int foo(int p)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
--- 1,8 ----
  /* { dg-do compile } */
  /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ extern void spoil (void);
+ 
  int foo(int **a,int argc)
  {
    int b;
*************** int foo(int **a,int argc)
*** 11,17 ****
      }
    else
      {
! 
      }
    /* Should be able to eliminate one of the *(*a)'s along the if path
       by pushing it into the else path. We will also eliminate
--- 14,21 ----
      }
    else
      {
!       /* Spoil *a and *(*a) to avoid hoisting it before the "if (...)".  */
!       spoil ();
      }
    /* Should be able to eliminate one of the *(*a)'s along the if path
       by pushing it into the else path. We will also eliminate
Index: gcc/opts.c
===================================================================
*** gcc/opts.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/opts.c	2016-07-04 11:31:33.391904573 +0200
*************** static const struct default_options defa
*** 500,505 ****
--- 500,506 ----
        REORDER_BLOCKS_ALGORITHM_STC },
      { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
+     { OPT_LEVELS_2_PLUS, OPT_ftree_hoist, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/tree-ssa-pre.c	2016-07-04 12:07:32.957077755 +0200
***************
*** 1,4 ****
! /* SSA-PRE for trees.
     Copyright (C) 2001-2016 Free Software Foundation, Inc.
     Contributed by Daniel Berlin <dan@dberlin.org> and Steven Bosscher
     <stevenb@suse.de>
--- 1,4 ----
! /* Full and partial redundancy elimination and code hoisting on SSA GIMPLE.
     Copyright (C) 2001-2016 Free Software Foundation, Inc.
     Contributed by Daniel Berlin <dan@dberlin.org> and Steven Bosscher
     <stevenb@suse.de>
*************** along with GCC; see the file COPYING3.
*** 54,64 ****
--- 54,85 ----
  #include "tree-cfgcleanup.h"
  #include "langhooks.h"
  
+ /* Even though this file is called tree-ssa-pre.c, we actually
+    implement a bit more than just PRE here.  All of them piggy-back
+    on GVN which is implemented in tree-ssa-sccvn.c.
+ 
+      1. Full Redundancy Elimination (FRE)
+ 	This is the elimination phase of GVN.
+ 
+      2. Partial Redundancy Elimination (PRE)
+ 	This is adds computation of AVAIL_OUT and ANTIC_IN and
+ 	doing expression insertion to form GVN-PRE.
+ 
+      3. Code hoisting
+ 	This optimization uses the ANTIC_IN sets computed for PRE
+ 	to move expressions further up than PRE would do, to make
+ 	multiple computations of the same value fully redundant.
+ 	This pass is explained below (after the explanation of the
+ 	basic algorithm for PRE).
+ */
+ 
  /* TODO:
  
     1. Avail sets can be shared by making an avail_find_leader that
        walks up the dominator tree and looks in those avail sets.
        This might affect code optimality, it's unclear right now.
+       Currently the AVAIL_OUT sets are the remaining quadraticness in
+       memory of GVN-PRE.
     2. Strength reduction can be performed by anticipating expressions
        we can repair later on.
     3. We can do back-substitution or smarter value numbering to catch
*************** along with GCC; see the file COPYING3.
*** 70,76 ****
     represent the actual statement containing the expressions we care about,
     and we cache the value number by putting it in the expression.  */
  
! /* Basic algorithm
  
     First we walk the statements to generate the AVAIL sets, the
     EXP_GEN sets, and the tmp_gen sets.  EXP_GEN sets represent the
--- 91,97 ----
     represent the actual statement containing the expressions we care about,
     and we cache the value number by putting it in the expression.  */
  
! /* Basic algorithm for Partial Redundancy Elimination:
  
     First we walk the statements to generate the AVAIL sets, the
     EXP_GEN sets, and the tmp_gen sets.  EXP_GEN sets represent the
*************** along with GCC; see the file COPYING3.
*** 110,126 ****
     In order to make it fully redundant, we insert the expression into
     the predecessors where it is not available, but is ANTIC.
  
     For the partial anticipation case, we only perform insertion if it
     is partially anticipated in some block, and fully available in all
     of the predecessors.
  
!    insert/insert_aux/do_regular_insertion/do_partial_partial_insertion
!    performs these steps.
  
     Fourth, we eliminate fully redundant expressions.
     This is a simple statement walk that replaces redundant
     calculations with the now available values.  */
  
  /* Representations of value numbers:
  
     Value numbers are represented by a representative SSA_NAME.  We
--- 131,205 ----
     In order to make it fully redundant, we insert the expression into
     the predecessors where it is not available, but is ANTIC.
  
+    When optimizing for size, we only eliminate the partial redundancy
+    if we need to insert in only one predecessor.  This avoids almost
+    completely the code size increase that PRE usually causes.
+ 
     For the partial anticipation case, we only perform insertion if it
     is partially anticipated in some block, and fully available in all
     of the predecessors.
  
!    do_pre_regular_insertion/do_pre_partial_partial_insertion
!    performs these steps, driven by insert/insert_aux.
  
     Fourth, we eliminate fully redundant expressions.
     This is a simple statement walk that replaces redundant
     calculations with the now available values.  */
  
+ /* Basic algorithm for Code Hoisting:
+ 
+    Code hoisting is: Moving value computations up in the control flow
+    graph to make multiple copies redundant.  Typically this is a size
+    optimization, but there are cases where it also is helpful for speed.
+ 
+    A simple code hoisting algorithm is implemented that piggy-backs on
+    the PRE infrastructure.  For code hoisting, we have to know ANTIC_OUT
+    which is effectively ANTIC_IN - AVAIL_OUT.  The latter two have to be
+    computed for PRE, and we can use them to perform a limited version of
+    code hoisting, too.
+ 
+    For the purpose of this implementation, a value is hoistable to a basic
+    block B if the following properties are met:
+ 
+    1. The value is in ANTIC_IN(B) -- the value will be computed on all
+       paths from B to function exit and it can be computed in B);
+ 
+    2. The value is not in AVAIL_OUT(B) -- there would be no need to
+       compute the value again and make it available twice;
+ 
+    3. All successors of B are dominated by B -- makes sure that inserting
+       a computation of the value in B will make the remaining
+       computations fully redundant;
+ 
+    4. At least one successor has the value in AVAIL_OUT -- to avoid
+       hoisting values up too far;
+ 
+    5. There are at least two successors of B -- hoisting in straight
+       line code is pointless.
+ 
+    The third condition is not strictly necessary, but it would complicate
+    the hoisting pass a lot.  In fact, I don't know of any code hoisting
+    algorithm that does not have this requirement.  Fortunately, experiments
+    have show that most candidate hoistable values are in regions that meet
+    this condition (e.g. diamond-shape regions).
+ 
+    The forth condition is necessary to avoid hoisting things up too far
+    away from the uses of the value.  Nothing else limits the algorithm
+    from hoisting everything up as far as ANTIC_IN allows.  Experiments
+    with SPEC and CSiBE have shown that hoisting up too far results in more
+    spilling, less benefits for code size, and worse benchmark scores.
+    Fortunately, in practice most of the interesting hoisting opportunities
+    are caught despite this limitation.
+ 
+    For hoistable values that meet all conditions, expressions are inserted
+    to make the calculation of the hoistable value fully redundant.  We
+    perform code hoisting insertions after each round of PRE insertions,
+    because code hoisting never exposes new PRE opportunities, but PRE can
+    create new code hoisting opportunities.
+ 
+    The code hoisting algorithm is implemented in do_hoist_insert, driven
+    by insert/insert_aux.  */
+ 
  /* Representations of value numbers:
  
     Value numbers are represented by a representative SSA_NAME.  We
*************** static struct
*** 445,450 ****
--- 524,532 ----
    /* The number of inserts found due to partial anticipation  */
    int pa_insert;
  
+   /* The number of inserts made for code hoisting.  */
+   int hoist_insert;
+ 
    /* The number of new PHI nodes added by PRE.  */
    int phis;
  } pre_stats;
*************** static pre_expr bitmap_find_leader (bitm
*** 454,459 ****
--- 536,542 ----
  static void bitmap_value_insert_into_set (bitmap_set_t, pre_expr);
  static void bitmap_value_replace_in_set (bitmap_set_t, pre_expr);
  static void bitmap_set_copy (bitmap_set_t, bitmap_set_t);
+ static void bitmap_set_and (bitmap_set_t, bitmap_set_t);
  static bool bitmap_set_contains_value (bitmap_set_t, unsigned int);
  static void bitmap_insert_into_set (bitmap_set_t, pre_expr);
  static void bitmap_insert_into_set_1 (bitmap_set_t, pre_expr,
*************** insert_into_preds_of_block (basic_block
*** 3103,3109 ****
  
  
  
! /* Perform insertion of partially redundant values.
     For BLOCK, do the following:
     1.  Propagate the NEW_SETS of the dominator into the current block.
     If the block has multiple predecessors,
--- 3186,3192 ----
  
  
  
! /* Perform insertion of partially redundant or hoistable values.
     For BLOCK, do the following:
     1.  Propagate the NEW_SETS of the dominator into the current block.
     If the block has multiple predecessors,
*************** insert_into_preds_of_block (basic_block
*** 3114,3128 ****
         2c. Insert a new PHI merging the values of the predecessors.
         2d. Insert the new PHI, and the new expressions, into the
  	   NEW_SETS set.
!    3. Recursively call ourselves on the dominator children of BLOCK.
! 
!    Steps 1, 2a, and 3 are done by insert_aux. 2b, 2c and 2d are done by
!    do_regular_insertion and do_partial_insertion.
! 
  */
  
  static bool
! do_regular_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
--- 3197,3216 ----
         2c. Insert a new PHI merging the values of the predecessors.
         2d. Insert the new PHI, and the new expressions, into the
  	   NEW_SETS set.
!    If the block has multiple successors,
!        3a. Iterate over the ANTIC values for the block to see if
! 	   any of them are good candidates for hoisting.
!        3b. If so, insert expressions computing the values in BLOCK,
! 	   and add the new expressions into the NEW_SETS set.
!    4. Recursively call ourselves on the dominator children of BLOCK.
! 
!    Steps 1, 2a, and 4 are done by insert_aux. 2b, 2c and 2d are done by
!    do_pre_regular_insertion and do_partial_insertion.  3a and 3b are
!    done in do_hoist_insertion.
  */
  
  static bool
! do_pre_regular_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
*************** do_regular_insertion (basic_block block,
*** 3291,3299 ****
     In this case, we know that putting it earlier will enable us to
     remove the later computation.  */
  
- 
  static bool
! do_partial_partial_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
--- 3379,3386 ----
     In this case, we know that putting it earlier will enable us to
     remove the later computation.  */
  
  static bool
! do_pre_partial_partial_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
*************** do_partial_partial_insertion (basic_bloc
*** 3422,3429 ****
    return new_stuff;
  }
  
  static bool
! insert_aux (basic_block block)
  {
    basic_block son;
    bool new_stuff = false;
--- 3509,3626 ----
    return new_stuff;
  }
  
+ /* Insert expressions in BLOCK to compute hoistable values up.
+    Return TRUE if something was inserted, otherwise return FALSE.
+    The caller has to make sure that BLOCK has at least two successors.  */
+ 
  static bool
! do_hoist_insertion (basic_block block)
! {
!   edge e;
!   edge_iterator ei;
!   bool new_stuff = false;
!   unsigned i;
!   bitmap_iterator bi;
!   bitmap_set_t hoistable_set;
!   bitmap availout_in_some;
!   gimple_stmt_iterator last;
! 
!   /* At least two successors, or else...  */
!   gcc_assert (EDGE_COUNT (block->succs) >= 2);
! 
!   /* Check that all successors of BLOCK are dominated by block.
!      We could use dominated_by_p() for this, but actually there is a much
!      quicker check: any successor that is dominated by BLOCK can't have
!      more than one predecessor edge.  */
!   FOR_EACH_EDGE (e, ei, block->succs)
!     if (! single_pred_p (e->dest))
!       return false;
! 
!   /* Determine the insertion point.  If we cannot safely insert before
!      the last stmt if we'd have to, bail out.  */
!   last = gsi_last_bb (block);
!   if (!gsi_end_p (last)
!       && !is_ctrl_stmt (gsi_stmt (last))
!       && stmt_ends_bb_p (gsi_stmt (last)))
!     return false;
! 
!   hoistable_set = bitmap_set_new ();
!   availout_in_some = BITMAP_ALLOC (&grand_bitmap_obstack);
! 
!   /* A hoistable value must be in ANTIC_IN(block)
!      but not in AVAIL_OUT(BLOCK).  */
!   bitmap_set_copy (hoistable_set, ANTIC_IN (block));
!   bitmap_set_subtract_values (hoistable_set, AVAIL_OUT (block));
! 
!   /* Short-cut for a common case: hoistable_set is empty.  */
!   if (bitmap_empty_p (&hoistable_set->values))
!     goto done;
! 
!   /* Compute which of the hoistable values is in AVAIL_OUT of
!      at least one of the successors of BLOCK.  */
!   FOR_EACH_EDGE (e, ei, block->succs)
!     /* Do not consider expressions solely because their availability
!        on loop exits.  They'd be ANTIC-IN throughout the whole loop
!        and thus effectively hoisted across loops by combination of
!        PRE and hoisting.  */
!     if (! loop_exit_edge_p (block->loop_father, e))
!       FOR_EACH_VALUE_ID_IN_SET (hoistable_set, i, bi)
! 	if (bitmap_set_contains_value (AVAIL_OUT (e->dest), i))
! 	  bitmap_set_bit (availout_in_some, i);
! 
!   /* Short-cut for a common case: (hoistable_set & availout_in_some) empty.  */
!   if (! bitmap_intersect_p (&hoistable_set->values, availout_in_some))
!     goto done;
! 
!   /* If there are candidate values for hoisting, insert expressions
!      strategically to make the hoistable expressions fully redundant.  */
!   EXECUTE_IF_SET_IN_BITMAP (&hoistable_set->expressions, 0, i, bi)
!     {
!       pre_expr expr = expression_for_id (i);
!       unsigned int value_id = get_expr_value_id (expr);
!       gimple_seq stmts = NULL;
! 
!       /* If the value of this expression is not available in at least one
! 	 successor, do not hoist the value.  */
!       if (! bitmap_bit_p (availout_in_some, value_id))
! 	continue;
! 
!       /* OK, we should hoist this value.  Perform the transformation.  */
!       tree res = create_expression_by_pieces (block, expr, &stmts,
! 					      get_expr_type (expr));
!       if (! res)
! 	continue;
! 
!       pre_stats.hoist_insert++;
!       if (dump_file && (dump_flags & TDF_DETAILS))
! 	{
! 	  fprintf (dump_file,
! 		   "Inserting expression in block %d for code hoisting: ",
! 		   block->index);
! 	  print_pre_expr (dump_file, expr);
! 	  fprintf (dump_file, "\n");
! 	}
! 
!       if (gsi_end_p (last) || is_ctrl_stmt (gsi_stmt (last)))
! 	gsi_insert_seq_before (&last, stmts, GSI_SAME_STMT);
!       else
! 	gsi_insert_seq_after (&last, stmts, GSI_SAME_STMT);
! 
!       new_stuff = true;
!     }
! 
! done:
!   BITMAP_FREE (availout_in_some);
!   bitmap_set_free (hoistable_set);
! 
!   return new_stuff;
! }
! 
! /* Do a dominator walk on the control flow graph, and insert computations
!    of values as necessary for PRE and hoisting.  */
! 
! static bool
! insert_aux (basic_block block, bool do_pre, bool do_hoist)
  {
    basic_block son;
    bool new_stuff = false;
*************** insert_aux (basic_block block)
*** 3436,3442 ****
  	{
  	  unsigned i;
  	  bitmap_iterator bi;
! 	  bitmap_set_t newset = NEW_SETS (dom);
  	  if (newset)
  	    {
  	      /* Note that we need to value_replace both NEW_SETS, and
--- 3633,3643 ----
  	{
  	  unsigned i;
  	  bitmap_iterator bi;
! 	  bitmap_set_t newset;
! 
! 	  /* First, update the AVAIL_OUT set with anything we may have
! 	     inserted higher up in the dominator tree.  */
! 	  newset = NEW_SETS (dom);
  	  if (newset)
  	    {
  	      /* Note that we need to value_replace both NEW_SETS, and
*************** insert_aux (basic_block block)
*** 3450,3474 ****
  		  bitmap_value_replace_in_set (AVAIL_OUT (block), expr);
  		}
  	    }
! 	  if (!single_pred_p (block))
  	    {
! 	      new_stuff |= do_regular_insertion (block, dom);
  	      if (do_partial_partial)
! 		new_stuff |= do_partial_partial_insertion (block, dom);
  	    }
  	}
      }
    for (son = first_dom_son (CDI_DOMINATORS, block);
         son;
         son = next_dom_son (CDI_DOMINATORS, son))
      {
!       new_stuff |= insert_aux (son);
      }
  
    return new_stuff;
  }
  
! /* Perform insertion of partially redundant values.  */
  
  static void
  insert (void)
--- 3651,3681 ----
  		  bitmap_value_replace_in_set (AVAIL_OUT (block), expr);
  		}
  	    }
! 
! 	  /* Insert expressions for partial redundancies.  */
! 	  if (do_pre && !single_pred_p (block))
  	    {
! 	      new_stuff |= do_pre_regular_insertion (block, dom);
  	      if (do_partial_partial)
! 		new_stuff |= do_pre_partial_partial_insertion (block, dom);
  	    }
+ 
+ 	  /* Insert expressions for hoisting.  */
+ 	  if (do_hoist && EDGE_COUNT (block->succs) >= 2)
+ 	    new_stuff |= do_hoist_insertion (block);
  	}
      }
    for (son = first_dom_son (CDI_DOMINATORS, block);
         son;
         son = next_dom_son (CDI_DOMINATORS, son))
      {
!       new_stuff |= insert_aux (son, do_pre, do_hoist);
      }
  
    return new_stuff;
  }
  
! /* Perform insertion of partially redundant and hoistable values.  */
  
  static void
  insert (void)
*************** insert (void)
*** 3485,3491 ****
        num_iterations++;
        if (dump_file && dump_flags & TDF_DETAILS)
  	fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
!       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun));
  
        /* Clear the NEW sets before the next iteration.  We have already
           fully propagated its contents.  */
--- 3692,3699 ----
        num_iterations++;
        if (dump_file && dump_flags & TDF_DETAILS)
  	fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
!       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun), flag_tree_pre,
! 			      flag_tree_hoist);
  
        /* Clear the NEW sets before the next iteration.  We have already
           fully propagated its contents.  */
*************** public:
*** 4764,4770 ****
    {}
  
    /* opt_pass methods: */
!   virtual bool gate (function *) { return flag_tree_pre != 0; }
    virtual unsigned int execute (function *);
  
  }; // class pass_pre
--- 4972,4979 ----
    {}
  
    /* opt_pass methods: */
!   virtual bool gate (function *)
!     { return flag_tree_pre != 0 || flag_tree_hoist != 0; }
    virtual unsigned int execute (function *);
  
  }; // class pass_pre
*************** pass_pre::execute (function *fun)
*** 4819,4824 ****
--- 5028,5034 ----
  
    statistics_counter_event (fun, "Insertions", pre_stats.insertions);
    statistics_counter_event (fun, "PA inserted", pre_stats.pa_insert);
+   statistics_counter_event (fun, "HOIST inserted", pre_stats.hoist_insert);
    statistics_counter_event (fun, "New PHIs", pre_stats.phis);
    statistics_counter_event (fun, "Eliminated", pre_stats.eliminations);
  
Index: gcc/common.opt
===================================================================
*** gcc/common.opt.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/common.opt	2016-07-04 11:31:33.395904620 +0200
*************** ftree-fre
*** 2382,2387 ****
--- 2382,2391 ----
  Common Report Var(flag_tree_fre) Optimization
  Enable Full Redundancy Elimination (FRE) on trees.
  
+ ftree-hoist
+ Common Report Var(flag_tree_hoist) Optimization
+ Enable code hoisting on trees
+ 
  foptimize-strlen
  Common Report Var(flag_optimize_strlen) Optimization
  Enable string length optimizations on trees.
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre" } */
+ 
+ unsigned short f(unsigned short a)
+ {
+   if (a & 0x8000)
+     a <<= 1, a = a ^ 0x1021;
+   else
+     a <<= 1;
+ 
+   return a;
+ }
+ 
+ /* We should hoist and CSE the shift.  */
+ 
+ /* { dg-final { scan-tree-dump-times " << 1;" 1 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 0 ****
--- 1,15 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre" } */
+ 
+ int f(int i)
+ {
+   if (i < 0)
+     return i/10+ i;
+   return i/10 + i;
+ }
+ 
+ /* Hoisting of i/10 + i should make the code straight-line
+    with one division.  */
+ 
+ /* { dg-final { scan-tree-dump-times "goto" 0 "pre" } } */
+ /* { dg-final { scan-tree-dump-times " / 10;" 1 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr43491.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr43491.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr43491.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  
  #define REGISTER register
  
--- 1,5 ----
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-optimized" } */
  
  #define REGISTER register
  
*************** long foo(long data, long v)
*** 35,41 ****
  	u = i;
  	return v * t * u;
  }
  /* We should not eliminate global register variable when it is the RHS of
!    a single assignment.  */
! /* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
! /* { dg-final { scan-tree-dump-times "Eliminated: 3" 1 "pre" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
--- 35,45 ----
  	u = i;
  	return v * t * u;
  }
+ 
  /* We should not eliminate global register variable when it is the RHS of
!    a single assignment.  So the number of loads from data_0 has to match
!    that of the number of adds (we hoist data_0 + data_3 above the
!    if (data) and eliminate the useless copy).  */
! 
! /* { dg-final { scan-tree-dump-times "= data_0;" 1 "optimized" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
! /* { dg-final { scan-tree-dump-times " \\+ " 1 "optimized" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ int test (int a, int b, int c, int g)
+ {
+   int d, e;
+   if (a)
+     d = b * c;
+   else
+     d = b - c;
+   e = b * c + g;
+   return d + e;
+ }
+ 
+ /* We should hoist and CSE only the multiplication.  */
+ 
+ /* { dg-final { scan-tree-dump-times " \\* " 1 "pre" } } */
+ /* { dg-final { scan-tree-dump "Insertions: 1" "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 0 ****
--- 1,24 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-optimized" } */
+ 
+ /* From PR21485.  */
+ 
+ long
+ NumSift (long *array, int b, unsigned long k)
+ {
+   if (b)
+     if (array[k] < array[k + 1L])
+       ++k;
+   return array[k];
+ }
+ 
+ /* There should be only two loads left.  And the final value in the
+    if (b) arm should be if-converted:
+      tem1 = array[k];
+      if (b)
+        tem1 = MAX (array[k+1], tem1)
+      return tem1;  */
+ 
+ /* { dg-final { scan-tree-dump-times "= \\*" 2 "optimized" } } */
+ /* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "optimized" } } */
+ /* { dg-final { scan-tree-dump-times "= PHI" 1 "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre" } */
  
  int foo (int i, int j, int b)
  {
--- 1,5 ----
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre -fno-tree-hoist" } */
  
  int foo (int i, int j, int b)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 1,6 ****
  /* PR37997 */
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-details" } */
  
  int foo (int i, int b, int result)
  {
--- 1,6 ----
  /* PR37997 */
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-details -fno-tree-hoist" } */
  
  int foo (int i, int b, int result)
  {
*************** int foo (int i, int b, int result)
*** 16,20 ****
--- 16,22 ----
  
  /* We should insert i + 1 into the if (b) path as well as the simplified
     i + 1 & -2 expression.  And do replacement with two PHI temps.  */
+ /* With hoisting enabled we'd hoist i + 1 to before the if, retaining
+    only one PHI node.  */
  
  /* { dg-final { scan-tree-dump-times "with prephitmp" 2 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c	2016-07-04 11:30:57.423485226 +0200
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ int a[1024];
+ int b[1024], c[1024];
+ void foo ()
+ {
+   for (int j = 0; j < 1024; ++j)
+     {
+       for (int i = 0; i < 1024; ++i)
+ 	a[i] = j;
+       b[j] = c[j];
+     }
+ }
+ 
+ /* We should not hoist/PRE the outer loop IV increment or the load
+    from c across the inner loop.  */
+ 
+ /* { dg-final { scan-tree-dump-not "HOIST inserted" "pre" } } */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add code-hoisting to GIMPLE
  2016-07-04 11:27 [PATCH] Add code-hoisting to GIMPLE Richard Biener
@ 2016-07-04 20:31 ` Steven Bosscher
  2016-07-08  7:30   ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Bosscher @ 2016-07-04 20:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On Mon, Jul 4, 2016 at 1:26 PM, Richard Biener wrote:
>
> The following patch is Stevens code-hoisting based on PRE forward-ported
> and fixed for bootstrap plus the case of hoisting code across loops
> which we generally do not want (expressions in the loop exit target block
> are antic-in throughout the whole loop unless they are killed and thus
> get inserted into the exit block and then PREd before the loop).
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> I'm going to try making the bitmap_set ops in do_hoist_insert a bit
> faster - Steven, do you remember any issues with the approach from the
> time you worked on it?

Hi Richi,

It's been almost 8 years since I worked on this, so I really don't
recall much about this at all. Sorry :-)

But thank you for picking this old work up!

Ciao!
Steven

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add code-hoisting to GIMPLE
  2016-07-04 20:31 ` Steven Bosscher
@ 2016-07-08  7:30   ` Richard Biener
  2016-07-08 18:12     ` Jeff Law
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2016-07-08  7:30 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Patches

On Mon, 4 Jul 2016, Steven Bosscher wrote:

> On Mon, Jul 4, 2016 at 1:26 PM, Richard Biener wrote:
> >
> > The following patch is Stevens code-hoisting based on PRE forward-ported
> > and fixed for bootstrap plus the case of hoisting code across loops
> > which we generally do not want (expressions in the loop exit target block
> > are antic-in throughout the whole loop unless they are killed and thus
> > get inserted into the exit block and then PREd before the loop).
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > I'm going to try making the bitmap_set ops in do_hoist_insert a bit
> > faster - Steven, do you remember any issues with the approach from the
> > time you worked on it?
> 
> Hi Richi,
> 
> It's been almost 8 years since I worked on this, so I really don't
> recall much about this at all. Sorry :-)

Fair enough ;)  Apart from the loop case I noticed that code-hoisting
will cause

  if (x1_6 > 6)                                                                 
    goto <bb 3>;                                                                
  else                                                                          
    goto <bb 4>;                                                                
                                                                                
  <bb 3>:                                                                       
  i_7 = i_2(D) + 2;                                                             
                                                                                
  <bb 4>:                                                                       
  # i_1 = PHI <i_2(D)(2), i_7(3)>                                               
  i_8 = i_1 + 2;

to be re-written to

  _18 = i_2(D) + 2;
  if (x1_6 > 6)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  _19 = _18 + 2;

  <bb 4>:
  # i_8 = PHI <_18(2), _19(3)>

which is because critical edge splitting splits 2->4 and thus makes
i_2(D)+2 antic-in in the else block (IIRC it wouldn't be antic-in
in bb 4 but antic-out in bb 2).  Not sure if it is worth trying to
devise a "fix" for this, it's not really a pessimization.

But it generally shows that hoisting is quite aggressive.

Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add code-hoisting to GIMPLE
  2016-07-08  7:30   ` Richard Biener
@ 2016-07-08 18:12     ` Jeff Law
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Law @ 2016-07-08 18:12 UTC (permalink / raw)
  To: Richard Biener, Steven Bosscher; +Cc: GCC Patches

On 07/08/2016 01:30 AM, Richard Biener wrote:
> On Mon, 4 Jul 2016, Steven Bosscher wrote:
>
>> On Mon, Jul 4, 2016 at 1:26 PM, Richard Biener wrote:
>>>
>>> The following patch is Stevens code-hoisting based on PRE forward-ported
>>> and fixed for bootstrap plus the case of hoisting code across loops
>>> which we generally do not want (expressions in the loop exit target block
>>> are antic-in throughout the whole loop unless they are killed and thus
>>> get inserted into the exit block and then PREd before the loop).
>>>
>>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>>
>>> I'm going to try making the bitmap_set ops in do_hoist_insert a bit
>>> faster - Steven, do you remember any issues with the approach from the
>>> time you worked on it?
>>
>> Hi Richi,
>>
>> It's been almost 8 years since I worked on this, so I really don't
>> recall much about this at all. Sorry :-)
>
> Fair enough ;)  Apart from the loop case I noticed that code-hoisting
> will cause
>
>   if (x1_6 > 6)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
>
>   <bb 3>:
>   i_7 = i_2(D) + 2;
>
>   <bb 4>:
>   # i_1 = PHI <i_2(D)(2), i_7(3)>
>   i_8 = i_1 + 2;
>
> to be re-written to
>
>   _18 = i_2(D) + 2;
>   if (x1_6 > 6)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
>
>   <bb 3>:
>   _19 = _18 + 2;
>
>   <bb 4>:
>   # i_8 = PHI <_18(2), _19(3)>
>
> which is because critical edge splitting splits 2->4 and thus makes
> i_2(D)+2 antic-in in the else block (IIRC it wouldn't be antic-in
> in bb 4 but antic-out in bb 2).  Not sure if it is worth trying to
> devise a "fix" for this, it's not really a pessimization.
>
> But it generally shows that hoisting is quite aggressive.
I don't see it as a pessimization either, though the gratutious code 
motion can make it awful hard to evaluate the real effects (I saw this 
when evaluating Click's GCM/GVN algorithm for GCC.

I thought we had a BZ where we wanted to do this kind of hoisting up 
through PHIs.  Oh, 64700, but it's in the other direction -- sink a 
common expression through a PHI.

Jeff

>
> Richard.
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add code-hoisting to GIMPLE
  2016-07-08  7:16 Richard Biener
  2016-07-08 12:23 ` Rainer Orth
@ 2016-07-10 17:34 ` Andrew Pinski
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Pinski @ 2016-07-10 17:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On Fri, Jul 8, 2016 at 12:16 AM, Richard Biener <rguenther@suse.de> wrote:
>
> This is a final candidate patch to add code-hoisting to GIMPLE.
>
> I've already committed several patches fixing fallout and the following
> one adds -fno-code-hoisting (I renamed the option) to a few testcases.
> I filed PRs for the cases code-hoisting exposes missed optimization
> opportunities in passes that I couldn't quickly fix (I fixed path
> splitting and loop distribution but failed to grok SLSR).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> I put the patch on the czerny tester for the weekend runs (x86_64 as
> well).
>
> Testing on other archs and comments are of course appreciated, if nothing
> unusual happens I plan to commit this on Monday.

I tested this on top of GCC 6 on aarch64-linux-gnu and there was no
degradation for SPEC CPU INT 2006 on ThunderX.

Thanks,
Andrew Pinski


>
> Richard.
>
> 2016-07-08  Steven Bosscher  <steven@gcc.gnu.org>
>         Richard Biener  <rguenther@suse.de>
>
>         PR tree-optimization/23286
>         PR tree-optimization/70159
>         * doc/invoke.texi: Document -fcode-hoisting.
>         * common.opt (fcode-hoisting): New flag.
>         * opts.c (default_options_table): Enable -fcode-hoisting at -O2+.
>         * tree-ssa-pre.c (pre_stats): Add hoist_insert.
>         (do_regular_insertion): Rename to ...
>         (do_pre_regular_insertion): ... this and amend general comments
>         on insertion strathegy.
>         (do_partial_partial_insertion): Rename to ...
>         (do_pre_partial_partial_insertion): ... this.
>         (do_hoist_insertion): New function.
>         (insert_aux): Take flags on whether to do PRE and/or hoist insertion
>         and call do_hoist_insertion properly.
>         (insert): Adjust.
>         (pass_pre::gate): Enable also if -fcode-hoisting is enabled.
>         (pass_pre::execute): Register hoist_insert stats.
>
>         * gcc.dg/tree-ssa/ssa-pre-11.c: Disable code hosting.
>         * gcc.dg/tree-ssa/ssa-pre-27.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-pre-28.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-pre-2.c: Likewise.
>         * gcc.dg/tree-ssa/pr35286.c: Likewise.
>         * gcc.dg/tree-ssa/pr35287.c: Likewise.
>         * gcc.dg/hoist-register-pressure-1.c: Likewise.
>         * gcc.dg/hoist-register-pressure-2.c: Likewise.
>         * gcc.dg/hoist-register-pressure-3.c: Likewise.
>         * gcc.dg/pr51879-12.c: Likewise.
>         * gcc.dg/strlenopt-9.c: Likewise.
>         * gcc.dg/tree-ssa/pr47392.c: Likewise.
>         * gcc.dg/tree-ssa/pr68619-4.c: Likewise.
>         * gcc.dg/tree-ssa/split-path-5.c: Likewise.
>         * gcc.dg/tree-ssa/slsr-35.c: Likewise.
>         * gcc.dg/tree-ssa/slsr-36.c: Likewise.
>         * gcc.dg/tree-ssa/loadpre3.c: Adjust so hosting doesn't apply.
>         * gcc.dg/tree-ssa/pr43491.c: Scan optimized dump for desired result.
>         * gcc.dg/tree-ssa/ssa-pre-31.c: Adjust expected outcome for hoisting.
>         * gcc.dg/tree-ssa/ssa-hoist-1.c: New testcase.
>         * gcc.dg/tree-ssa/ssa-hoist-2.c: New testcase.
>         * gcc.dg/tree-ssa/ssa-hoist-3.c: New testcase.
>         * gcc.dg/tree-ssa/ssa-hoist-4.c: New testcase.
>         * gcc.dg/tree-ssa/ssa-hoist-5.c: New testcase.
>         * gcc.dg/tree-ssa/ssa-hoist-6.c: New testcase.
>         * gfortran.dg/pr43984.f90: Adjust expected outcome.
>
> Index: gcc/doc/invoke.texi
> ===================================================================
> *** gcc/doc/invoke.texi.orig    2016-07-07 14:45:27.156657281 +0200
> --- gcc/doc/invoke.texi 2016-07-07 14:45:45.460875808 +0200
> *************** Objective-C and Objective-C++ Dialects}.
> *** 404,410 ****
>   -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
>   -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
>   -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
> ! -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
>   -ftree-loop-if-convert-stores -ftree-loop-im @gol
>   -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
>   -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
> --- 404,410 ----
>   -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
>   -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
>   -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
> ! -ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting -ftree-loop-if-convert @gol
>   -ftree-loop-if-convert-stores -ftree-loop-im @gol
>   -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
>   -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
> *************** also turns on the following optimization
> *** 6372,6377 ****
> --- 6372,6378 ----
>   -fstrict-aliasing -fstrict-overflow @gol
>   -ftree-builtin-call-dce @gol
>   -ftree-switch-conversion -ftree-tail-merge @gol
> + -fcode-hoisting @gol
>   -ftree-pre @gol
>   -ftree-vrp @gol
>   -fipa-ra}
> *************** and the @option{large-stack-frame-growth
> *** 7265,7270 ****
> --- 7266,7279 ----
>   Perform reassociation on trees.  This flag is enabled by default
>   at @option{-O} and higher.
>
> + @item -fcode-hoisting
> + @opindex fcode-hoisting
> + Perform code hoisting.  Code hoisting tries to move the
> + evaluation of expressions executed on all paths to the function exit
> + as early as possible.  This is especially useful as a code size
> + optimization, but it often helps for code speed as well.
> + This flag is enabled by defailt at @option{-O2} and higher.
> +
>   @item -ftree-pre
>   @opindex ftree-pre
>   Perform partial redundancy elimination (PRE) on trees.  This flag is
> *************** Dump each function after STORE-CCP@.  Th
> *** 12222,12229 ****
>
>   @item pre
>   @opindex fdump-tree-pre
> ! Dump trees after partial redundancy elimination.  The file name is made
> ! by appending @file{.pre} to the source file name.
>
>   @item fre
>   @opindex fdump-tree-fre
> --- 12231,12238 ----
>
>   @item pre
>   @opindex fdump-tree-pre
> ! Dump trees after partial redundancy elimination and/or code hoisting.
> ! The file name is made by appending @file{.pre} to the source file name.
>
>   @item fre
>   @opindex fdump-tree-fre
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c.orig     2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c  2016-07-07 14:45:45.520876524 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
>   double cos (double);
>   double f(double a)
>   {
> --- 1,5 ----
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
>   double cos (double);
>   double f(double a)
>   {
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c.orig      2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c   2016-07-07 14:45:45.524876572 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
>   int motion_test1(int data, int data_0, int data_3, int v)
>   {
>         int i;
> --- 1,5 ----
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
>   int motion_test1(int data, int data_0, int data_3, int v)
>   {
>         int i;
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/pr35286.c.orig        2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/pr35286.c     2016-07-07 14:45:45.532876667 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
>   int g2;
>   struct A {
>       int a; int b;
> --- 1,5 ----
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
>   int g2;
>   struct A {
>       int a; int b;
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr35287.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/pr35287.c.orig        2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/pr35287.c     2016-07-07 14:45:45.532876667 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
>   int *gp;
>   int foo(int p)
>   {
> --- 1,5 ----
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
>   int *gp;
>   int foo(int p)
>   {
> Index: gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c.orig       2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c    2016-07-07 14:45:45.532876667 +0200
> ***************
> *** 1,5 ****
> --- 1,8 ----
>   /* { dg-do compile } */
>   /* { dg-options "-O2 -fdump-tree-pre-stats" } */
> +
> + extern void spoil (void);
> +
>   int foo(int **a,int argc)
>   {
>     int b;
> *************** int foo(int **a,int argc)
> *** 11,17 ****
>       }
>     else
>       {
> !
>       }
>     /* Should be able to eliminate one of the *(*a)'s along the if path
>        by pushing it into the else path. We will also eliminate
> --- 14,21 ----
>       }
>     else
>       {
> !       /* Spoil *a and *(*a) to avoid hoisting it before the "if (...)".  */
> !       spoil ();
>       }
>     /* Should be able to eliminate one of the *(*a)'s along the if path
>        by pushing it into the else path. We will also eliminate
> Index: gcc/opts.c
> ===================================================================
> *** gcc/opts.c.orig     2016-07-07 09:46:02.655811772 +0200
> --- gcc/opts.c  2016-07-07 14:45:45.544876811 +0200
> *************** static const struct default_options defa
> *** 500,505 ****
> --- 500,506 ----
>         REORDER_BLOCKS_ALGORITHM_STC },
>       { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 },
>       { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
> +     { OPT_LEVELS_2_PLUS, OPT_fcode_hoisting, NULL, 1 },
>       { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
>       { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
>       { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> *** gcc/tree-ssa-pre.c.orig     2016-07-07 09:46:02.655811772 +0200
> --- gcc/tree-ssa-pre.c  2016-07-07 14:45:45.552876906 +0200
> ***************
> *** 1,4 ****
> ! /* SSA-PRE for trees.
>      Copyright (C) 2001-2016 Free Software Foundation, Inc.
>      Contributed by Daniel Berlin <dan@dberlin.org> and Steven Bosscher
>      <stevenb@suse.de>
> --- 1,4 ----
> ! /* Full and partial redundancy elimination and code hoisting on SSA GIMPLE.
>      Copyright (C) 2001-2016 Free Software Foundation, Inc.
>      Contributed by Daniel Berlin <dan@dberlin.org> and Steven Bosscher
>      <stevenb@suse.de>
> *************** along with GCC; see the file COPYING3.
> *** 55,65 ****
> --- 55,86 ----
>   #include "langhooks.h"
>   #include "alias.h"
>
> + /* Even though this file is called tree-ssa-pre.c, we actually
> +    implement a bit more than just PRE here.  All of them piggy-back
> +    on GVN which is implemented in tree-ssa-sccvn.c.
> +
> +      1. Full Redundancy Elimination (FRE)
> +       This is the elimination phase of GVN.
> +
> +      2. Partial Redundancy Elimination (PRE)
> +       This is adds computation of AVAIL_OUT and ANTIC_IN and
> +       doing expression insertion to form GVN-PRE.
> +
> +      3. Code hoisting
> +       This optimization uses the ANTIC_IN sets computed for PRE
> +       to move expressions further up than PRE would do, to make
> +       multiple computations of the same value fully redundant.
> +       This pass is explained below (after the explanation of the
> +       basic algorithm for PRE).
> + */
> +
>   /* TODO:
>
>      1. Avail sets can be shared by making an avail_find_leader that
>         walks up the dominator tree and looks in those avail sets.
>         This might affect code optimality, it's unclear right now.
> +       Currently the AVAIL_OUT sets are the remaining quadraticness in
> +       memory of GVN-PRE.
>      2. Strength reduction can be performed by anticipating expressions
>         we can repair later on.
>      3. We can do back-substitution or smarter value numbering to catch
> *************** along with GCC; see the file COPYING3.
> *** 71,77 ****
>      represent the actual statement containing the expressions we care about,
>      and we cache the value number by putting it in the expression.  */
>
> ! /* Basic algorithm
>
>      First we walk the statements to generate the AVAIL sets, the
>      EXP_GEN sets, and the tmp_gen sets.  EXP_GEN sets represent the
> --- 92,98 ----
>      represent the actual statement containing the expressions we care about,
>      and we cache the value number by putting it in the expression.  */
>
> ! /* Basic algorithm for Partial Redundancy Elimination:
>
>      First we walk the statements to generate the AVAIL sets, the
>      EXP_GEN sets, and the tmp_gen sets.  EXP_GEN sets represent the
> *************** along with GCC; see the file COPYING3.
> *** 111,127 ****
>      In order to make it fully redundant, we insert the expression into
>      the predecessors where it is not available, but is ANTIC.
>
>      For the partial anticipation case, we only perform insertion if it
>      is partially anticipated in some block, and fully available in all
>      of the predecessors.
>
> !    insert/insert_aux/do_regular_insertion/do_partial_partial_insertion
> !    performs these steps.
>
>      Fourth, we eliminate fully redundant expressions.
>      This is a simple statement walk that replaces redundant
>      calculations with the now available values.  */
>
>   /* Representations of value numbers:
>
>      Value numbers are represented by a representative SSA_NAME.  We
> --- 132,206 ----
>      In order to make it fully redundant, we insert the expression into
>      the predecessors where it is not available, but is ANTIC.
>
> +    When optimizing for size, we only eliminate the partial redundancy
> +    if we need to insert in only one predecessor.  This avoids almost
> +    completely the code size increase that PRE usually causes.
> +
>      For the partial anticipation case, we only perform insertion if it
>      is partially anticipated in some block, and fully available in all
>      of the predecessors.
>
> !    do_pre_regular_insertion/do_pre_partial_partial_insertion
> !    performs these steps, driven by insert/insert_aux.
>
>      Fourth, we eliminate fully redundant expressions.
>      This is a simple statement walk that replaces redundant
>      calculations with the now available values.  */
>
> + /* Basic algorithm for Code Hoisting:
> +
> +    Code hoisting is: Moving value computations up in the control flow
> +    graph to make multiple copies redundant.  Typically this is a size
> +    optimization, but there are cases where it also is helpful for speed.
> +
> +    A simple code hoisting algorithm is implemented that piggy-backs on
> +    the PRE infrastructure.  For code hoisting, we have to know ANTIC_OUT
> +    which is effectively ANTIC_IN - AVAIL_OUT.  The latter two have to be
> +    computed for PRE, and we can use them to perform a limited version of
> +    code hoisting, too.
> +
> +    For the purpose of this implementation, a value is hoistable to a basic
> +    block B if the following properties are met:
> +
> +    1. The value is in ANTIC_IN(B) -- the value will be computed on all
> +       paths from B to function exit and it can be computed in B);
> +
> +    2. The value is not in AVAIL_OUT(B) -- there would be no need to
> +       compute the value again and make it available twice;
> +
> +    3. All successors of B are dominated by B -- makes sure that inserting
> +       a computation of the value in B will make the remaining
> +       computations fully redundant;
> +
> +    4. At least one successor has the value in AVAIL_OUT -- to avoid
> +       hoisting values up too far;
> +
> +    5. There are at least two successors of B -- hoisting in straight
> +       line code is pointless.
> +
> +    The third condition is not strictly necessary, but it would complicate
> +    the hoisting pass a lot.  In fact, I don't know of any code hoisting
> +    algorithm that does not have this requirement.  Fortunately, experiments
> +    have show that most candidate hoistable values are in regions that meet
> +    this condition (e.g. diamond-shape regions).
> +
> +    The forth condition is necessary to avoid hoisting things up too far
> +    away from the uses of the value.  Nothing else limits the algorithm
> +    from hoisting everything up as far as ANTIC_IN allows.  Experiments
> +    with SPEC and CSiBE have shown that hoisting up too far results in more
> +    spilling, less benefits for code size, and worse benchmark scores.
> +    Fortunately, in practice most of the interesting hoisting opportunities
> +    are caught despite this limitation.
> +
> +    For hoistable values that meet all conditions, expressions are inserted
> +    to make the calculation of the hoistable value fully redundant.  We
> +    perform code hoisting insertions after each round of PRE insertions,
> +    because code hoisting never exposes new PRE opportunities, but PRE can
> +    create new code hoisting opportunities.
> +
> +    The code hoisting algorithm is implemented in do_hoist_insert, driven
> +    by insert/insert_aux.  */
> +
>   /* Representations of value numbers:
>
>      Value numbers are represented by a representative SSA_NAME.  We
> *************** static struct
> *** 446,451 ****
> --- 525,533 ----
>     /* The number of inserts found due to partial anticipation  */
>     int pa_insert;
>
> +   /* The number of inserts made for code hoisting.  */
> +   int hoist_insert;
> +
>     /* The number of new PHI nodes added by PRE.  */
>     int phis;
>   } pre_stats;
> *************** static pre_expr bitmap_find_leader (bitm
> *** 455,460 ****
> --- 537,543 ----
>   static void bitmap_value_insert_into_set (bitmap_set_t, pre_expr);
>   static void bitmap_value_replace_in_set (bitmap_set_t, pre_expr);
>   static void bitmap_set_copy (bitmap_set_t, bitmap_set_t);
> + static void bitmap_set_and (bitmap_set_t, bitmap_set_t);
>   static bool bitmap_set_contains_value (bitmap_set_t, unsigned int);
>   static void bitmap_insert_into_set (bitmap_set_t, pre_expr);
>   static void bitmap_insert_into_set_1 (bitmap_set_t, pre_expr,
> *************** insert_into_preds_of_block (basic_block
> *** 3104,3110 ****
>
>
>
> ! /* Perform insertion of partially redundant values.
>      For BLOCK, do the following:
>      1.  Propagate the NEW_SETS of the dominator into the current block.
>      If the block has multiple predecessors,
> --- 3187,3193 ----
>
>
>
> ! /* Perform insertion of partially redundant or hoistable values.
>      For BLOCK, do the following:
>      1.  Propagate the NEW_SETS of the dominator into the current block.
>      If the block has multiple predecessors,
> *************** insert_into_preds_of_block (basic_block
> *** 3115,3129 ****
>          2c. Insert a new PHI merging the values of the predecessors.
>          2d. Insert the new PHI, and the new expressions, into the
>            NEW_SETS set.
> !    3. Recursively call ourselves on the dominator children of BLOCK.
> !
> !    Steps 1, 2a, and 3 are done by insert_aux. 2b, 2c and 2d are done by
> !    do_regular_insertion and do_partial_insertion.
> !
>   */
>
>   static bool
> ! do_regular_insertion (basic_block block, basic_block dom)
>   {
>     bool new_stuff = false;
>     vec<pre_expr> exprs;
> --- 3198,3217 ----
>          2c. Insert a new PHI merging the values of the predecessors.
>          2d. Insert the new PHI, and the new expressions, into the
>            NEW_SETS set.
> !    If the block has multiple successors,
> !        3a. Iterate over the ANTIC values for the block to see if
> !          any of them are good candidates for hoisting.
> !        3b. If so, insert expressions computing the values in BLOCK,
> !          and add the new expressions into the NEW_SETS set.
> !    4. Recursively call ourselves on the dominator children of BLOCK.
> !
> !    Steps 1, 2a, and 4 are done by insert_aux. 2b, 2c and 2d are done by
> !    do_pre_regular_insertion and do_partial_insertion.  3a and 3b are
> !    done in do_hoist_insertion.
>   */
>
>   static bool
> ! do_pre_regular_insertion (basic_block block, basic_block dom)
>   {
>     bool new_stuff = false;
>     vec<pre_expr> exprs;
> *************** do_regular_insertion (basic_block block,
> *** 3292,3300 ****
>      In this case, we know that putting it earlier will enable us to
>      remove the later computation.  */
>
> -
>   static bool
> ! do_partial_partial_insertion (basic_block block, basic_block dom)
>   {
>     bool new_stuff = false;
>     vec<pre_expr> exprs;
> --- 3380,3387 ----
>      In this case, we know that putting it earlier will enable us to
>      remove the later computation.  */
>
>   static bool
> ! do_pre_partial_partial_insertion (basic_block block, basic_block dom)
>   {
>     bool new_stuff = false;
>     vec<pre_expr> exprs;
> *************** do_partial_partial_insertion (basic_bloc
> *** 3423,3430 ****
>     return new_stuff;
>   }
>
>   static bool
> ! insert_aux (basic_block block)
>   {
>     basic_block son;
>     bool new_stuff = false;
> --- 3510,3647 ----
>     return new_stuff;
>   }
>
> + /* Insert expressions in BLOCK to compute hoistable values up.
> +    Return TRUE if something was inserted, otherwise return FALSE.
> +    The caller has to make sure that BLOCK has at least two successors.  */
> +
>   static bool
> ! do_hoist_insertion (basic_block block)
> ! {
> !   edge e;
> !   edge_iterator ei;
> !   bool new_stuff = false;
> !   unsigned i;
> !   gimple_stmt_iterator last;
> !
> !   /* At least two successors, or else...  */
> !   gcc_assert (EDGE_COUNT (block->succs) >= 2);
> !
> !   /* Check that all successors of BLOCK are dominated by block.
> !      We could use dominated_by_p() for this, but actually there is a much
> !      quicker check: any successor that is dominated by BLOCK can't have
> !      more than one predecessor edge.  */
> !   FOR_EACH_EDGE (e, ei, block->succs)
> !     if (! single_pred_p (e->dest))
> !       return false;
> !
> !   /* Determine the insertion point.  If we cannot safely insert before
> !      the last stmt if we'd have to, bail out.  */
> !   last = gsi_last_bb (block);
> !   if (!gsi_end_p (last)
> !       && !is_ctrl_stmt (gsi_stmt (last))
> !       && stmt_ends_bb_p (gsi_stmt (last)))
> !     return false;
> !
> !   /* Compute the set of hoistable expressions from ANTIC_IN.  First compute
> !      hoistable values.  */
> !   bitmap_set hoistable_set;
> !
> !   /* A hoistable value must be in ANTIC_IN(block)
> !      but not in AVAIL_OUT(BLOCK).  */
> !   bitmap_initialize (&hoistable_set.values, &grand_bitmap_obstack);
> !   bitmap_and_compl (&hoistable_set.values,
> !                   &ANTIC_IN (block)->values, &AVAIL_OUT (block)->values);
> !
> !   /* Short-cut for a common case: hoistable_set is empty.  */
> !   if (bitmap_empty_p (&hoistable_set.values))
> !     return false;
> !
> !   /* Compute which of the hoistable values is in AVAIL_OUT of
> !      at least one of the successors of BLOCK.  */
> !   bitmap_head availout_in_some;
> !   bitmap_initialize (&availout_in_some, &grand_bitmap_obstack);
> !   FOR_EACH_EDGE (e, ei, block->succs)
> !     /* Do not consider expressions solely because their availability
> !        on loop exits.  They'd be ANTIC-IN throughout the whole loop
> !        and thus effectively hoisted across loops by combination of
> !        PRE and hoisting.  */
> !     if (! loop_exit_edge_p (block->loop_father, e))
> !       bitmap_ior_and_into (&availout_in_some, &hoistable_set.values,
> !                          &AVAIL_OUT (e->dest)->values);
> !   bitmap_clear (&hoistable_set.values);
> !
> !   /* Short-cut for a common case: availout_in_some is empty.  */
> !   if (bitmap_empty_p (&availout_in_some))
> !     return false;
> !
> !   /* Hack hoitable_set in-place so we can use sorted_array_from_bitmap_set.  */
> !   hoistable_set.values = availout_in_some;
> !   hoistable_set.expressions = ANTIC_IN (block)->expressions;
> !
> !   /* Now finally construct the topological-ordered expression set.  */
> !   vec<pre_expr> exprs = sorted_array_from_bitmap_set (&hoistable_set);
> !
> !   bitmap_clear (&hoistable_set.values);
> !
> !   /* If there are candidate values for hoisting, insert expressions
> !      strategically to make the hoistable expressions fully redundant.  */
> !   pre_expr expr;
> !   FOR_EACH_VEC_ELT (exprs, i, expr)
> !     {
> !       /* While we try to sort expressions topologically above the
> !          sorting doesn't work out perfectly.  Catch expressions we
> !        already inserted.  */
> !       unsigned int value_id = get_expr_value_id (expr);
> !       if (bitmap_set_contains_value (AVAIL_OUT (block), value_id))
> !       {
> !         if (dump_file && (dump_flags & TDF_DETAILS))
> !           {
> !             fprintf (dump_file,
> !                      "Already inserted expression for ");
> !             print_pre_expr (dump_file, expr);
> !             fprintf (dump_file, " (%04d)\n", value_id);
> !           }
> !         continue;
> !       }
> !
> !       /* OK, we should hoist this value.  Perform the transformation.  */
> !       pre_stats.hoist_insert++;
> !       if (dump_file && (dump_flags & TDF_DETAILS))
> !       {
> !         fprintf (dump_file,
> !                  "Inserting expression in block %d for code hoisting: ",
> !                  block->index);
> !         print_pre_expr (dump_file, expr);
> !         fprintf (dump_file, " (%04d)\n", value_id);
> !       }
> !
> !       gimple_seq stmts = NULL;
> !       tree res = create_expression_by_pieces (block, expr, &stmts,
> !                                             get_expr_type (expr));
> !       if (gsi_end_p (last) || is_ctrl_stmt (gsi_stmt (last)))
> !       gsi_insert_seq_before (&last, stmts, GSI_SAME_STMT);
> !       else
> !       gsi_insert_seq_after (&last, stmts, GSI_NEW_STMT);
> !
> !       /* Make sure to not return true if expression creation ultimately
> !          failed but also make sure to insert any stmts produced as they
> !        are tracked in inserted_exprs.  */
> !       if (! res)
> !       continue;
> !
> !       new_stuff = true;
> !     }
> !
> !   exprs.release ();
> !
> !   return new_stuff;
> ! }
> !
> ! /* Do a dominator walk on the control flow graph, and insert computations
> !    of values as necessary for PRE and hoisting.  */
> !
> ! static bool
> ! insert_aux (basic_block block, bool do_pre, bool do_hoist)
>   {
>     basic_block son;
>     bool new_stuff = false;
> *************** insert_aux (basic_block block)
> *** 3437,3443 ****
>         {
>           unsigned i;
>           bitmap_iterator bi;
> !         bitmap_set_t newset = NEW_SETS (dom);
>           if (newset)
>             {
>               /* Note that we need to value_replace both NEW_SETS, and
> --- 3654,3664 ----
>         {
>           unsigned i;
>           bitmap_iterator bi;
> !         bitmap_set_t newset;
> !
> !         /* First, update the AVAIL_OUT set with anything we may have
> !            inserted higher up in the dominator tree.  */
> !         newset = NEW_SETS (dom);
>           if (newset)
>             {
>               /* Note that we need to value_replace both NEW_SETS, and
> *************** insert_aux (basic_block block)
> *** 3451,3475 ****
>                   bitmap_value_replace_in_set (AVAIL_OUT (block), expr);
>                 }
>             }
> !         if (!single_pred_p (block))
>             {
> !             new_stuff |= do_regular_insertion (block, dom);
>               if (do_partial_partial)
> !               new_stuff |= do_partial_partial_insertion (block, dom);
>             }
>         }
>       }
>     for (son = first_dom_son (CDI_DOMINATORS, block);
>          son;
>          son = next_dom_son (CDI_DOMINATORS, son))
>       {
> !       new_stuff |= insert_aux (son);
>       }
>
>     return new_stuff;
>   }
>
> ! /* Perform insertion of partially redundant values.  */
>
>   static void
>   insert (void)
> --- 3672,3702 ----
>                   bitmap_value_replace_in_set (AVAIL_OUT (block), expr);
>                 }
>             }
> !
> !         /* Insert expressions for partial redundancies.  */
> !         if (do_pre && !single_pred_p (block))
>             {
> !             new_stuff |= do_pre_regular_insertion (block, dom);
>               if (do_partial_partial)
> !               new_stuff |= do_pre_partial_partial_insertion (block, dom);
>             }
> +
> +         /* Insert expressions for hoisting.  */
> +         if (do_hoist && EDGE_COUNT (block->succs) >= 2)
> +           new_stuff |= do_hoist_insertion (block);
>         }
>       }
>     for (son = first_dom_son (CDI_DOMINATORS, block);
>          son;
>          son = next_dom_son (CDI_DOMINATORS, son))
>       {
> !       new_stuff |= insert_aux (son, do_pre, do_hoist);
>       }
>
>     return new_stuff;
>   }
>
> ! /* Perform insertion of partially redundant and hoistable values.  */
>
>   static void
>   insert (void)
> *************** insert (void)
> *** 3486,3492 ****
>         num_iterations++;
>         if (dump_file && dump_flags & TDF_DETAILS)
>         fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
> !       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>
>         /* Clear the NEW sets before the next iteration.  We have already
>            fully propagated its contents.  */
> --- 3713,3720 ----
>         num_iterations++;
>         if (dump_file && dump_flags & TDF_DETAILS)
>         fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
> !       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun), flag_tree_pre,
> !                             flag_code_hoisting);
>
>         /* Clear the NEW sets before the next iteration.  We have already
>            fully propagated its contents.  */
> *************** public:
> *** 4833,4839 ****
>     {}
>
>     /* opt_pass methods: */
> !   virtual bool gate (function *) { return flag_tree_pre != 0; }
>     virtual unsigned int execute (function *);
>
>   }; // class pass_pre
> --- 5061,5068 ----
>     {}
>
>     /* opt_pass methods: */
> !   virtual bool gate (function *)
> !     { return flag_tree_pre != 0 || flag_code_hoisting != 0; }
>     virtual unsigned int execute (function *);
>
>   }; // class pass_pre
> *************** pass_pre::execute (function *fun)
> *** 4888,4893 ****
> --- 5117,5123 ----
>
>     statistics_counter_event (fun, "Insertions", pre_stats.insertions);
>     statistics_counter_event (fun, "PA inserted", pre_stats.pa_insert);
> +   statistics_counter_event (fun, "HOIST inserted", pre_stats.hoist_insert);
>     statistics_counter_event (fun, "New PHIs", pre_stats.phis);
>     statistics_counter_event (fun, "Eliminated", pre_stats.eliminations);
>
> Index: gcc/common.opt
> ===================================================================
> *** gcc/common.opt.orig 2016-07-07 09:46:02.655811772 +0200
> --- gcc/common.opt      2016-07-07 14:45:45.572877145 +0200
> *************** fchecking=
> *** 1038,1043 ****
> --- 1038,1047 ----
>   Common Joined RejectNegative UInteger Var(flag_checking)
>   Perform internal consistency checkings.
>
> + fcode-hoisting
> + Common Report Var(flag_code_hoisting) Optimization
> + Enable code hoisting.
> +
>   fcombine-stack-adjustments
>   Common Report Var(flag_combine_stack_adjustments) Optimization
>   Looks for opportunities to reduce stack adjustments and stack references.
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c
> ===================================================================
> *** /dev/null   1970-01-01 00:00:00.000000000 +0000
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c 2016-07-07 14:45:45.572877145 +0200
> ***************
> *** 0 ****
> --- 1,16 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-pre" } */
> +
> + unsigned short f(unsigned short a)
> + {
> +   if (a & 0x8000)
> +     a <<= 1, a = a ^ 0x1021;
> +   else
> +     a <<= 1;
> +
> +   return a;
> + }
> +
> + /* We should hoist and CSE the shift.  */
> +
> + /* { dg-final { scan-tree-dump-times " << 1;" 1 "pre" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c
> ===================================================================
> *** /dev/null   1970-01-01 00:00:00.000000000 +0000
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c 2016-07-07 14:45:45.572877145 +0200
> ***************
> *** 0 ****
> --- 1,15 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-pre" } */
> +
> + int f(int i)
> + {
> +   if (i < 0)
> +     return i/10+ i;
> +   return i/10 + i;
> + }
> +
> + /* Hoisting of i/10 + i should make the code straight-line
> +    with one division.  */
> +
> + /* { dg-final { scan-tree-dump-times "goto" 0 "pre" } } */
> + /* { dg-final { scan-tree-dump-times " / 10;" 1 "pre" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr43491.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/pr43491.c.orig        2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/pr43491.c     2016-07-07 14:45:45.576877193 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
>
>   #define REGISTER register
>
> --- 1,5 ----
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-optimized" } */
>
>   #define REGISTER register
>
> *************** long foo(long data, long v)
> *** 35,41 ****
>         u = i;
>         return v * t * u;
>   }
>   /* We should not eliminate global register variable when it is the RHS of
> !    a single assignment.  */
> ! /* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
> ! /* { dg-final { scan-tree-dump-times "Eliminated: 3" 1 "pre" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
> --- 35,45 ----
>         u = i;
>         return v * t * u;
>   }
> +
>   /* We should not eliminate global register variable when it is the RHS of
> !    a single assignment.  So the number of loads from data_0 has to match
> !    that of the number of adds (we hoist data_0 + data_3 above the
> !    if (data) and eliminate the useless copy).  */
> !
> ! /* { dg-final { scan-tree-dump-times "= data_0;" 1 "optimized" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
> ! /* { dg-final { scan-tree-dump-times " \\+ " 1 "optimized" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c
> ===================================================================
> *** /dev/null   1970-01-01 00:00:00.000000000 +0000
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c 2016-07-07 14:45:45.576877193 +0200
> ***************
> *** 0 ****
> --- 1,18 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-pre-stats" } */
> +
> + int test (int a, int b, int c, int g)
> + {
> +   int d, e;
> +   if (a)
> +     d = b * c;
> +   else
> +     d = b - c;
> +   e = b * c + g;
> +   return d + e;
> + }
> +
> + /* We should hoist and CSE only the multiplication.  */
> +
> + /* { dg-final { scan-tree-dump-times " \\* " 1 "pre" } } */
> + /* { dg-final { scan-tree-dump "Insertions: 1" "pre" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c
> ===================================================================
> *** /dev/null   1970-01-01 00:00:00.000000000 +0000
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c 2016-07-07 14:45:45.576877193 +0200
> ***************
> *** 0 ****
> --- 1,24 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> + /* From PR21485.  */
> +
> + long
> + NumSift (long *array, int b, unsigned long k)
> + {
> +   if (b)
> +     if (array[k] < array[k + 1L])
> +       ++k;
> +   return array[k];
> + }
> +
> + /* There should be only two loads left.  And the final value in the
> +    if (b) arm should be if-converted:
> +      tem1 = array[k];
> +      if (b)
> +        tem1 = MAX (array[k+1], tem1)
> +      return tem1;  */
> +
> + /* { dg-final { scan-tree-dump-times "= \\*" 2 "optimized" } } */
> + /* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "optimized" } } */
> + /* { dg-final { scan-tree-dump-times "= PHI" 1 "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c.orig     2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c  2016-07-07 14:45:45.576877193 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre" } */
>
>   int foo (int i, int j, int b)
>   {
> --- 1,5 ----
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre -fno-code-hoisting" } */
>
>   int foo (int i, int j, int b)
>   {
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c.orig     2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c  2016-07-07 14:45:45.576877193 +0200
> ***************
> *** 1,6 ****
>   /* PR37997 */
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre-details" } */
>
>   int foo (int i, int b, int result)
>   {
> --- 1,6 ----
>   /* PR37997 */
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-pre-details -fno-code-hoisting" } */
>
>   int foo (int i, int b, int result)
>   {
> *************** int foo (int i, int b, int result)
> *** 16,20 ****
> --- 16,22 ----
>
>   /* We should insert i + 1 into the if (b) path as well as the simplified
>      i + 1 & -2 expression.  And do replacement with two PHI temps.  */
> + /* With hoisting enabled we'd hoist i + 1 to before the if, retaining
> +    only one PHI node.  */
>
>   /* { dg-final { scan-tree-dump-times "with prephitmp" 2 "pre" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c
> ===================================================================
> *** /dev/null   1970-01-01 00:00:00.000000000 +0000
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c 2016-07-07 14:45:45.576877193 +0200
> ***************
> *** 0 ****
> --- 1,19 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-pre-stats" } */
> +
> + int a[1024];
> + int b[1024], c[1024];
> + void foo ()
> + {
> +   for (int j = 0; j < 1024; ++j)
> +     {
> +       for (int i = 0; i < 1024; ++i)
> +       a[i] = j;
> +       b[j] = c[j];
> +     }
> + }
> +
> + /* We should not hoist/PRE the outer loop IV increment or the load
> +    from c across the inner loop.  */
> +
> + /* { dg-final { scan-tree-dump-not "HOIST inserted" "pre" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c.orig     2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c  2016-07-07 14:45:45.576877193 +0200
> *************** int foo (S1 *root, int N)
> *** 43,46 ****
>     return 0;
>   }
>
> ! /* { dg-final { scan-tree-dump-times "key" 4 "pre" } } */
> --- 43,46 ----
>     return 0;
>   }
>
> ! /* { dg-final { scan-tree-dump-times "key" 3 "pre" } } */
> Index: gcc/testsuite/gcc.dg/hoist-register-pressure-1.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/hoist-register-pressure-1.c.orig       2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/hoist-register-pressure-1.c    2016-07-07 14:45:45.608877575 +0200
> ***************
> *** 1,4 ****
> ! /* { dg-options "-Os -fdump-rtl-hoist" }  */
>   /* The rtl hoist pass requires that the expression to be hoisted can
>      be assigned without clobbering cc.  For a PLUS rtx on S/390 this
>      requires a load address instruction which is fine on 64 bit but
> --- 1,4 ----
> ! /* { dg-options "-Os -fdump-rtl-hoist -fno-code-hoisting" }  */
>   /* The rtl hoist pass requires that the expression to be hoisted can
>      be assigned without clobbering cc.  For a PLUS rtx on S/390 this
>      requires a load address instruction which is fine on 64 bit but
> Index: gcc/testsuite/gcc.dg/hoist-register-pressure-2.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/hoist-register-pressure-2.c.orig       2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/hoist-register-pressure-2.c    2016-07-07 14:45:45.612877623 +0200
> ***************
> *** 1,4 ****
> ! /* { dg-options "-Os -fdump-rtl-hoist" }  */
>   /* The rtl hoist pass requires that the expression to be hoisted can
>      be assigned without clobbering cc.  For a PLUS rtx on S/390 this
>      requires a load address instruction which is fine on 64 bit but
> --- 1,4 ----
> ! /* { dg-options "-Os -fdump-rtl-hoist -fno-code-hoisting" }  */
>   /* The rtl hoist pass requires that the expression to be hoisted can
>      be assigned without clobbering cc.  For a PLUS rtx on S/390 this
>      requires a load address instruction which is fine on 64 bit but
> Index: gcc/testsuite/gcc.dg/hoist-register-pressure-3.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/hoist-register-pressure-3.c.orig       2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/hoist-register-pressure-3.c    2016-07-07 14:45:45.624877766 +0200
> ***************
> *** 1,4 ****
> ! /* { dg-options "-Os -fdump-rtl-hoist" }  */
>   /* The rtl hoist pass requires that the expression to be hoisted can
>      be assigned without clobbering cc.  For a PLUS rtx on S/390 this
>      requires a load address instruction which is fine on 64 bit but
> --- 1,4 ----
> ! /* { dg-options "-Os -fdump-rtl-hoist -fno-code-hoisting" }  */
>   /* The rtl hoist pass requires that the expression to be hoisted can
>      be assigned without clobbering cc.  For a PLUS rtx on S/390 this
>      requires a load address instruction which is fine on 64 bit but
> Index: gcc/testsuite/gcc.dg/pr51879-12.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/pr51879-12.c.orig      2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/pr51879-12.c   2016-07-07 14:45:45.624877766 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -ftree-tail-merge -fdump-tree-pre" } */
>
>   __attribute__((pure)) int bar (int);
>   __attribute__((pure)) int bar2 (int);
> --- 1,5 ----
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -ftree-tail-merge -fdump-tree-pre -fno-code-hoisting" } */
>
>   __attribute__((pure)) int bar (int);
>   __attribute__((pure)) int bar2 (int);
> Index: gcc/testsuite/gcc.dg/strlenopt-9.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/strlenopt-9.c.orig     2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/strlenopt-9.c  2016-07-07 14:45:45.640877957 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do run } */
> ! /* { dg-options "-O2 -fdump-tree-strlen -fdump-tree-optimized" } */
>
>   #include "strlenopt.h"
>
> --- 1,5 ----
>   /* { dg-do run } */
> ! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-strlen -fdump-tree-optimized" } */
>
>   #include "strlenopt.h"
>
> Index: gcc/testsuite/gfortran.dg/pr43984.f90
> ===================================================================
> *** gcc/testsuite/gfortran.dg/pr43984.f90.orig  2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gfortran.dg/pr43984.f90       2016-07-07 14:45:45.648878052 +0200
> *************** end subroutine
> *** 50,55 ****
>
>   end
>
> ! ! There should be three loads from iyz.data, not four.
>
> ! ! { dg-final { scan-tree-dump-times "= iyz.data" 3 "pre" } }
> --- 50,55 ----
>
>   end
>
> ! ! There should be two loads from iyz.data, not four.
>
> ! ! { dg-final { scan-tree-dump-times "= iyz.data" 2 "pre" } }
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr47392.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/pr47392.c.orig        2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/pr47392.c     2016-07-07 14:45:45.648878052 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do run } */
> ! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
>
>   struct A
>   {
> --- 1,5 ----
>   /* { dg-do run } */
> ! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
>
>   struct A
>   {
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c.orig      2016-07-07 09:46:02.655811772 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c   2016-07-07 14:45:45.648878052 +0200
> ***************
> *** 1,5 ****
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-optimized -w" } */
>
>   typedef struct rtx_def *rtx;
>   enum rtx_code
> --- 1,5 ----
>   /* { dg-do compile } */
> ! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-optimized -w" } */
>
>   typedef struct rtx_def *rtx;
>   enum rtx_code
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-35.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/slsr-35.c.orig        2016-07-05 15:24:03.059612220 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-35.c     2016-07-08 06:54:20.620252838 +0200
> ***************
> *** 3,9 ****
>      phi has an argument which is a parameter.  */
>
>   /* { dg-do compile } */
> ! /* { dg-options "-O3 -fdump-tree-optimized" } */
>
>   int
>   f (int c, int i)
> --- 3,9 ----
>      phi has an argument which is a parameter.  */
>
>   /* { dg-do compile } */
> ! /* { dg-options "-O3 -fno-code-hoisting -fdump-tree-optimized" } */
>
>   int
>   f (int c, int i)
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-36.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/slsr-36.c.orig        2016-07-05 15:24:03.059612220 +0200
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-36.c     2016-07-08 06:54:32.300387549 +0200
> ***************
> *** 3,9 ****
>      phi has an argument which is a parameter.  */
>
>   /* { dg-do compile } */
> ! /* { dg-options "-O3 -fdump-tree-optimized" } */
>
>   int
>   f (int s, int c, int i)
> --- 3,9 ----
>      phi has an argument which is a parameter.  */
>
>   /* { dg-do compile } */
> ! /* { dg-options "-O3 -fno-code-hoisting -fdump-tree-optimized" } */
>
>   int
>   f (int s, int c, int i)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add code-hoisting to GIMPLE
  2016-07-08 12:27   ` Richard Biener
@ 2016-07-08 12:47     ` Rainer Orth
  0 siblings, 0 replies; 9+ messages in thread
From: Rainer Orth @ 2016-07-08 12:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi Richard,

>> I've just bootstrapped the patch on sparc-sun-solaris2.12, which
>> uncovered a couple of testsuite failures:
>> 
>> +FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths
>> "Duplicat
>> ing join block" 2
>> +FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths
>> "Duplicat
>> ing join block" 2
>
> Saw this on x86_64 as well and should have been fixed with
>
> 2016-07-05  Richard Biener  <rguenther@suse.de>                                 
>                                                                                 
>         * gimple-ssa-split-paths.c (find_block_to_duplicate_for_splitting_pa):  
>         Handle empty else block.                                                
>         (is_feasible_trace): Likewise.                                          
>         (split_paths): Likewise.            
>
>> Message doesn't occur at all.
>> 
>> +FAIL: gfortran.dg/ldist-1.f90 -O scan-tree-dump-not ldist "distributed:
>> spl
>> it to"
>> 
>> Likewise.
>
> You mean it does occur (it's a scan-tree-dump-not).  I saw this on x86_64
> as well and fixed it with
>
> 2016-07-05  Richard Biener  <rguenther@suse.de>
>
>         * tree-loop-distribution.c (distribute_loop): Fix issue with
>         the cost model loop.
>
> maybe the fixes were not complete.  I'll have a second look with a
> sparc-solaris cross on Monday.

I guess there's no need: I applied the patch to a not fully up to date
tree which I'd bootstrapped before, to save me the full regtest.  At
r238001, it just didn't include either fix.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add code-hoisting to GIMPLE
  2016-07-08 12:23 ` Rainer Orth
@ 2016-07-08 12:27   ` Richard Biener
  2016-07-08 12:47     ` Rainer Orth
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2016-07-08 12:27 UTC (permalink / raw)
  To: Rainer Orth; +Cc: gcc-patches

On Fri, 8 Jul 2016, Rainer Orth wrote:

> Hi Richard,
> 
> > This is a final candidate patch to add code-hoisting to GIMPLE.
> >
> > I've already committed several patches fixing fallout and the following
> > one adds -fno-code-hoisting (I renamed the option) to a few testcases.
> > I filed PRs for the cases code-hoisting exposes missed optimization
> > opportunities in passes that I couldn't quickly fix (I fixed path
> > splitting and loop distribution but failed to grok SLSR).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > I put the patch on the czerny tester for the weekend runs (x86_64 as 
> > well).
> >
> > Testing on other archs and comments are of course appreciated, if nothing
> > unusual happens I plan to commit this on Monday.
> 
> I've just bootstrapped the patch on sparc-sun-solaris2.12, which
> uncovered a couple of testsuite failures:
> 
> +FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicat
> ing join block" 2
> +FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicat
> ing join block" 2

Saw this on x86_64 as well and should have been fixed with

2016-07-05  Richard Biener  <rguenther@suse.de>                                 
                                                                                
        * gimple-ssa-split-paths.c (find_block_to_duplicate_for_splitting_pa):  
        Handle empty else block.                                                
        (is_feasible_trace): Likewise.                                          
        (split_paths): Likewise.            

> Message doesn't occur at all.
> 
> +FAIL: gfortran.dg/ldist-1.f90   -O   scan-tree-dump-not ldist "distributed: spl
> it to"
> 
> Likewise.

You mean it does occur (it's a scan-tree-dump-not).  I saw this on x86_64
as well and fixed it with

2016-07-05  Richard Biener  <rguenther@suse.de>

        * tree-loop-distribution.c (distribute_loop): Fix issue with
        the cost model loop.

maybe the fixes were not complete.  I'll have a second look with a
sparc-solaris cross on Monday.

Thanks for testing,
Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Add code-hoisting to GIMPLE
  2016-07-08  7:16 Richard Biener
@ 2016-07-08 12:23 ` Rainer Orth
  2016-07-08 12:27   ` Richard Biener
  2016-07-10 17:34 ` Andrew Pinski
  1 sibling, 1 reply; 9+ messages in thread
From: Rainer Orth @ 2016-07-08 12:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi Richard,

> This is a final candidate patch to add code-hoisting to GIMPLE.
>
> I've already committed several patches fixing fallout and the following
> one adds -fno-code-hoisting (I renamed the option) to a few testcases.
> I filed PRs for the cases code-hoisting exposes missed optimization
> opportunities in passes that I couldn't quickly fix (I fixed path
> splitting and loop distribution but failed to grok SLSR).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> I put the patch on the czerny tester for the weekend runs (x86_64 as 
> well).
>
> Testing on other archs and comments are of course appreciated, if nothing
> unusual happens I plan to commit this on Monday.

I've just bootstrapped the patch on sparc-sun-solaris2.12, which
uncovered a couple of testsuite failures:

+FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicat
ing join block" 2
+FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicat
ing join block" 2

Message doesn't occur at all.

+FAIL: gfortran.dg/ldist-1.f90   -O   scan-tree-dump-not ldist "distributed: spl
it to"

Likewise.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] Add code-hoisting to GIMPLE
@ 2016-07-08  7:16 Richard Biener
  2016-07-08 12:23 ` Rainer Orth
  2016-07-10 17:34 ` Andrew Pinski
  0 siblings, 2 replies; 9+ messages in thread
From: Richard Biener @ 2016-07-08  7:16 UTC (permalink / raw)
  To: gcc-patches


This is a final candidate patch to add code-hoisting to GIMPLE.

I've already committed several patches fixing fallout and the following
one adds -fno-code-hoisting (I renamed the option) to a few testcases.
I filed PRs for the cases code-hoisting exposes missed optimization
opportunities in passes that I couldn't quickly fix (I fixed path
splitting and loop distribution but failed to grok SLSR).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I put the patch on the czerny tester for the weekend runs (x86_64 as 
well).

Testing on other archs and comments are of course appreciated, if nothing
unusual happens I plan to commit this on Monday.

Richard.

2016-07-08  Steven Bosscher  <steven@gcc.gnu.org>
	Richard Biener  <rguenther@suse.de>

	PR tree-optimization/23286
	PR tree-optimization/70159
	* doc/invoke.texi: Document -fcode-hoisting.
	* common.opt (fcode-hoisting): New flag.
	* opts.c (default_options_table): Enable -fcode-hoisting at -O2+.
	* tree-ssa-pre.c (pre_stats): Add hoist_insert.
	(do_regular_insertion): Rename to ...
	(do_pre_regular_insertion): ... this and amend general comments
	on insertion strathegy.
	(do_partial_partial_insertion): Rename to ...
	(do_pre_partial_partial_insertion): ... this.
	(do_hoist_insertion): New function.
	(insert_aux): Take flags on whether to do PRE and/or hoist insertion
	and call do_hoist_insertion properly.
	(insert): Adjust.
	(pass_pre::gate): Enable also if -fcode-hoisting is enabled.
	(pass_pre::execute): Register hoist_insert stats.

	* gcc.dg/tree-ssa/ssa-pre-11.c: Disable code hosting.
	* gcc.dg/tree-ssa/ssa-pre-27.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-28.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-2.c: Likewise.
	* gcc.dg/tree-ssa/pr35286.c: Likewise.
	* gcc.dg/tree-ssa/pr35287.c: Likewise.
	* gcc.dg/hoist-register-pressure-1.c: Likewise.
	* gcc.dg/hoist-register-pressure-2.c: Likewise.
	* gcc.dg/hoist-register-pressure-3.c: Likewise.
	* gcc.dg/pr51879-12.c: Likewise.
	* gcc.dg/strlenopt-9.c: Likewise.
	* gcc.dg/tree-ssa/pr47392.c: Likewise.
	* gcc.dg/tree-ssa/pr68619-4.c: Likewise.
	* gcc.dg/tree-ssa/split-path-5.c: Likewise.
	* gcc.dg/tree-ssa/slsr-35.c: Likewise.
	* gcc.dg/tree-ssa/slsr-36.c: Likewise.
	* gcc.dg/tree-ssa/loadpre3.c: Adjust so hosting doesn't apply.
	* gcc.dg/tree-ssa/pr43491.c: Scan optimized dump for desired result.
	* gcc.dg/tree-ssa/ssa-pre-31.c: Adjust expected outcome for hoisting.
	* gcc.dg/tree-ssa/ssa-hoist-1.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-2.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-3.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-4.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-5.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-6.c: New testcase.
	* gfortran.dg/pr43984.f90: Adjust expected outcome.

Index: gcc/doc/invoke.texi
===================================================================
*** gcc/doc/invoke.texi.orig	2016-07-07 14:45:27.156657281 +0200
--- gcc/doc/invoke.texi	2016-07-07 14:45:45.460875808 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 404,410 ****
  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
! -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
  -ftree-loop-if-convert-stores -ftree-loop-im @gol
  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
--- 404,410 ----
  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
! -ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting -ftree-loop-if-convert @gol
  -ftree-loop-if-convert-stores -ftree-loop-im @gol
  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
*************** also turns on the following optimization
*** 6372,6377 ****
--- 6372,6378 ----
  -fstrict-aliasing -fstrict-overflow @gol
  -ftree-builtin-call-dce @gol
  -ftree-switch-conversion -ftree-tail-merge @gol
+ -fcode-hoisting @gol
  -ftree-pre @gol
  -ftree-vrp @gol
  -fipa-ra}
*************** and the @option{large-stack-frame-growth
*** 7265,7270 ****
--- 7266,7279 ----
  Perform reassociation on trees.  This flag is enabled by default
  at @option{-O} and higher.
  
+ @item -fcode-hoisting
+ @opindex fcode-hoisting
+ Perform code hoisting.  Code hoisting tries to move the
+ evaluation of expressions executed on all paths to the function exit
+ as early as possible.  This is especially useful as a code size
+ optimization, but it often helps for code speed as well.
+ This flag is enabled by defailt at @option{-O2} and higher.
+ 
  @item -ftree-pre
  @opindex ftree-pre
  Perform partial redundancy elimination (PRE) on trees.  This flag is
*************** Dump each function after STORE-CCP@.  Th
*** 12222,12229 ****
  
  @item pre
  @opindex fdump-tree-pre
! Dump trees after partial redundancy elimination.  The file name is made
! by appending @file{.pre} to the source file name.
  
  @item fre
  @opindex fdump-tree-fre
--- 12231,12238 ----
  
  @item pre
  @opindex fdump-tree-pre
! Dump trees after partial redundancy elimination and/or code hoisting.
! The file name is made by appending @file{.pre} to the source file name.
  
  @item fre
  @opindex fdump-tree-fre
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c	2016-07-07 14:45:45.520876524 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  double cos (double);
  double f(double a)
  {
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
  double cos (double);
  double f(double a)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c	2016-07-07 14:45:45.524876572 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int motion_test1(int data, int data_0, int data_3, int v)
  {
  	int i;
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
  int motion_test1(int data, int data_0, int data_3, int v)
  {
  	int i;
Index: gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr35286.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr35286.c	2016-07-07 14:45:45.532876667 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int g2;
  struct A {
      int a; int b;
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
  int g2;
  struct A {
      int a; int b;
Index: gcc/testsuite/gcc.dg/tree-ssa/pr35287.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr35287.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr35287.c	2016-07-07 14:45:45.532876667 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int *gp;
  int foo(int p)
  {
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
  int *gp;
  int foo(int p)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c	2016-07-07 14:45:45.532876667 +0200
***************
*** 1,5 ****
--- 1,8 ----
  /* { dg-do compile } */
  /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ extern void spoil (void);
+ 
  int foo(int **a,int argc)
  {
    int b;
*************** int foo(int **a,int argc)
*** 11,17 ****
      }
    else
      {
! 
      }
    /* Should be able to eliminate one of the *(*a)'s along the if path
       by pushing it into the else path. We will also eliminate
--- 14,21 ----
      }
    else
      {
!       /* Spoil *a and *(*a) to avoid hoisting it before the "if (...)".  */
!       spoil ();
      }
    /* Should be able to eliminate one of the *(*a)'s along the if path
       by pushing it into the else path. We will also eliminate
Index: gcc/opts.c
===================================================================
*** gcc/opts.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/opts.c	2016-07-07 14:45:45.544876811 +0200
*************** static const struct default_options defa
*** 500,505 ****
--- 500,506 ----
        REORDER_BLOCKS_ALGORITHM_STC },
      { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
+     { OPT_LEVELS_2_PLUS, OPT_fcode_hoisting, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/tree-ssa-pre.c	2016-07-07 14:45:45.552876906 +0200
***************
*** 1,4 ****
! /* SSA-PRE for trees.
     Copyright (C) 2001-2016 Free Software Foundation, Inc.
     Contributed by Daniel Berlin <dan@dberlin.org> and Steven Bosscher
     <stevenb@suse.de>
--- 1,4 ----
! /* Full and partial redundancy elimination and code hoisting on SSA GIMPLE.
     Copyright (C) 2001-2016 Free Software Foundation, Inc.
     Contributed by Daniel Berlin <dan@dberlin.org> and Steven Bosscher
     <stevenb@suse.de>
*************** along with GCC; see the file COPYING3.
*** 55,65 ****
--- 55,86 ----
  #include "langhooks.h"
  #include "alias.h"
  
+ /* Even though this file is called tree-ssa-pre.c, we actually
+    implement a bit more than just PRE here.  All of them piggy-back
+    on GVN which is implemented in tree-ssa-sccvn.c.
+ 
+      1. Full Redundancy Elimination (FRE)
+ 	This is the elimination phase of GVN.
+ 
+      2. Partial Redundancy Elimination (PRE)
+ 	This is adds computation of AVAIL_OUT and ANTIC_IN and
+ 	doing expression insertion to form GVN-PRE.
+ 
+      3. Code hoisting
+ 	This optimization uses the ANTIC_IN sets computed for PRE
+ 	to move expressions further up than PRE would do, to make
+ 	multiple computations of the same value fully redundant.
+ 	This pass is explained below (after the explanation of the
+ 	basic algorithm for PRE).
+ */
+ 
  /* TODO:
  
     1. Avail sets can be shared by making an avail_find_leader that
        walks up the dominator tree and looks in those avail sets.
        This might affect code optimality, it's unclear right now.
+       Currently the AVAIL_OUT sets are the remaining quadraticness in
+       memory of GVN-PRE.
     2. Strength reduction can be performed by anticipating expressions
        we can repair later on.
     3. We can do back-substitution or smarter value numbering to catch
*************** along with GCC; see the file COPYING3.
*** 71,77 ****
     represent the actual statement containing the expressions we care about,
     and we cache the value number by putting it in the expression.  */
  
! /* Basic algorithm
  
     First we walk the statements to generate the AVAIL sets, the
     EXP_GEN sets, and the tmp_gen sets.  EXP_GEN sets represent the
--- 92,98 ----
     represent the actual statement containing the expressions we care about,
     and we cache the value number by putting it in the expression.  */
  
! /* Basic algorithm for Partial Redundancy Elimination:
  
     First we walk the statements to generate the AVAIL sets, the
     EXP_GEN sets, and the tmp_gen sets.  EXP_GEN sets represent the
*************** along with GCC; see the file COPYING3.
*** 111,127 ****
     In order to make it fully redundant, we insert the expression into
     the predecessors where it is not available, but is ANTIC.
  
     For the partial anticipation case, we only perform insertion if it
     is partially anticipated in some block, and fully available in all
     of the predecessors.
  
!    insert/insert_aux/do_regular_insertion/do_partial_partial_insertion
!    performs these steps.
  
     Fourth, we eliminate fully redundant expressions.
     This is a simple statement walk that replaces redundant
     calculations with the now available values.  */
  
  /* Representations of value numbers:
  
     Value numbers are represented by a representative SSA_NAME.  We
--- 132,206 ----
     In order to make it fully redundant, we insert the expression into
     the predecessors where it is not available, but is ANTIC.
  
+    When optimizing for size, we only eliminate the partial redundancy
+    if we need to insert in only one predecessor.  This avoids almost
+    completely the code size increase that PRE usually causes.
+ 
     For the partial anticipation case, we only perform insertion if it
     is partially anticipated in some block, and fully available in all
     of the predecessors.
  
!    do_pre_regular_insertion/do_pre_partial_partial_insertion
!    performs these steps, driven by insert/insert_aux.
  
     Fourth, we eliminate fully redundant expressions.
     This is a simple statement walk that replaces redundant
     calculations with the now available values.  */
  
+ /* Basic algorithm for Code Hoisting:
+ 
+    Code hoisting is: Moving value computations up in the control flow
+    graph to make multiple copies redundant.  Typically this is a size
+    optimization, but there are cases where it also is helpful for speed.
+ 
+    A simple code hoisting algorithm is implemented that piggy-backs on
+    the PRE infrastructure.  For code hoisting, we have to know ANTIC_OUT
+    which is effectively ANTIC_IN - AVAIL_OUT.  The latter two have to be
+    computed for PRE, and we can use them to perform a limited version of
+    code hoisting, too.
+ 
+    For the purpose of this implementation, a value is hoistable to a basic
+    block B if the following properties are met:
+ 
+    1. The value is in ANTIC_IN(B) -- the value will be computed on all
+       paths from B to function exit and it can be computed in B);
+ 
+    2. The value is not in AVAIL_OUT(B) -- there would be no need to
+       compute the value again and make it available twice;
+ 
+    3. All successors of B are dominated by B -- makes sure that inserting
+       a computation of the value in B will make the remaining
+       computations fully redundant;
+ 
+    4. At least one successor has the value in AVAIL_OUT -- to avoid
+       hoisting values up too far;
+ 
+    5. There are at least two successors of B -- hoisting in straight
+       line code is pointless.
+ 
+    The third condition is not strictly necessary, but it would complicate
+    the hoisting pass a lot.  In fact, I don't know of any code hoisting
+    algorithm that does not have this requirement.  Fortunately, experiments
+    have show that most candidate hoistable values are in regions that meet
+    this condition (e.g. diamond-shape regions).
+ 
+    The forth condition is necessary to avoid hoisting things up too far
+    away from the uses of the value.  Nothing else limits the algorithm
+    from hoisting everything up as far as ANTIC_IN allows.  Experiments
+    with SPEC and CSiBE have shown that hoisting up too far results in more
+    spilling, less benefits for code size, and worse benchmark scores.
+    Fortunately, in practice most of the interesting hoisting opportunities
+    are caught despite this limitation.
+ 
+    For hoistable values that meet all conditions, expressions are inserted
+    to make the calculation of the hoistable value fully redundant.  We
+    perform code hoisting insertions after each round of PRE insertions,
+    because code hoisting never exposes new PRE opportunities, but PRE can
+    create new code hoisting opportunities.
+ 
+    The code hoisting algorithm is implemented in do_hoist_insert, driven
+    by insert/insert_aux.  */
+ 
  /* Representations of value numbers:
  
     Value numbers are represented by a representative SSA_NAME.  We
*************** static struct
*** 446,451 ****
--- 525,533 ----
    /* The number of inserts found due to partial anticipation  */
    int pa_insert;
  
+   /* The number of inserts made for code hoisting.  */
+   int hoist_insert;
+ 
    /* The number of new PHI nodes added by PRE.  */
    int phis;
  } pre_stats;
*************** static pre_expr bitmap_find_leader (bitm
*** 455,460 ****
--- 537,543 ----
  static void bitmap_value_insert_into_set (bitmap_set_t, pre_expr);
  static void bitmap_value_replace_in_set (bitmap_set_t, pre_expr);
  static void bitmap_set_copy (bitmap_set_t, bitmap_set_t);
+ static void bitmap_set_and (bitmap_set_t, bitmap_set_t);
  static bool bitmap_set_contains_value (bitmap_set_t, unsigned int);
  static void bitmap_insert_into_set (bitmap_set_t, pre_expr);
  static void bitmap_insert_into_set_1 (bitmap_set_t, pre_expr,
*************** insert_into_preds_of_block (basic_block
*** 3104,3110 ****
  
  
  
! /* Perform insertion of partially redundant values.
     For BLOCK, do the following:
     1.  Propagate the NEW_SETS of the dominator into the current block.
     If the block has multiple predecessors,
--- 3187,3193 ----
  
  
  
! /* Perform insertion of partially redundant or hoistable values.
     For BLOCK, do the following:
     1.  Propagate the NEW_SETS of the dominator into the current block.
     If the block has multiple predecessors,
*************** insert_into_preds_of_block (basic_block
*** 3115,3129 ****
         2c. Insert a new PHI merging the values of the predecessors.
         2d. Insert the new PHI, and the new expressions, into the
  	   NEW_SETS set.
!    3. Recursively call ourselves on the dominator children of BLOCK.
! 
!    Steps 1, 2a, and 3 are done by insert_aux. 2b, 2c and 2d are done by
!    do_regular_insertion and do_partial_insertion.
! 
  */
  
  static bool
! do_regular_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
--- 3198,3217 ----
         2c. Insert a new PHI merging the values of the predecessors.
         2d. Insert the new PHI, and the new expressions, into the
  	   NEW_SETS set.
!    If the block has multiple successors,
!        3a. Iterate over the ANTIC values for the block to see if
! 	   any of them are good candidates for hoisting.
!        3b. If so, insert expressions computing the values in BLOCK,
! 	   and add the new expressions into the NEW_SETS set.
!    4. Recursively call ourselves on the dominator children of BLOCK.
! 
!    Steps 1, 2a, and 4 are done by insert_aux. 2b, 2c and 2d are done by
!    do_pre_regular_insertion and do_partial_insertion.  3a and 3b are
!    done in do_hoist_insertion.
  */
  
  static bool
! do_pre_regular_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
*************** do_regular_insertion (basic_block block,
*** 3292,3300 ****
     In this case, we know that putting it earlier will enable us to
     remove the later computation.  */
  
- 
  static bool
! do_partial_partial_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
--- 3380,3387 ----
     In this case, we know that putting it earlier will enable us to
     remove the later computation.  */
  
  static bool
! do_pre_partial_partial_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
*************** do_partial_partial_insertion (basic_bloc
*** 3423,3430 ****
    return new_stuff;
  }
  
  static bool
! insert_aux (basic_block block)
  {
    basic_block son;
    bool new_stuff = false;
--- 3510,3647 ----
    return new_stuff;
  }
  
+ /* Insert expressions in BLOCK to compute hoistable values up.
+    Return TRUE if something was inserted, otherwise return FALSE.
+    The caller has to make sure that BLOCK has at least two successors.  */
+ 
  static bool
! do_hoist_insertion (basic_block block)
! {
!   edge e;
!   edge_iterator ei;
!   bool new_stuff = false;
!   unsigned i;
!   gimple_stmt_iterator last;
! 
!   /* At least two successors, or else...  */
!   gcc_assert (EDGE_COUNT (block->succs) >= 2);
! 
!   /* Check that all successors of BLOCK are dominated by block.
!      We could use dominated_by_p() for this, but actually there is a much
!      quicker check: any successor that is dominated by BLOCK can't have
!      more than one predecessor edge.  */
!   FOR_EACH_EDGE (e, ei, block->succs)
!     if (! single_pred_p (e->dest))
!       return false;
! 
!   /* Determine the insertion point.  If we cannot safely insert before
!      the last stmt if we'd have to, bail out.  */
!   last = gsi_last_bb (block);
!   if (!gsi_end_p (last)
!       && !is_ctrl_stmt (gsi_stmt (last))
!       && stmt_ends_bb_p (gsi_stmt (last)))
!     return false;
! 
!   /* Compute the set of hoistable expressions from ANTIC_IN.  First compute
!      hoistable values.  */
!   bitmap_set hoistable_set;
! 
!   /* A hoistable value must be in ANTIC_IN(block)
!      but not in AVAIL_OUT(BLOCK).  */
!   bitmap_initialize (&hoistable_set.values, &grand_bitmap_obstack);
!   bitmap_and_compl (&hoistable_set.values,
! 		    &ANTIC_IN (block)->values, &AVAIL_OUT (block)->values);
! 
!   /* Short-cut for a common case: hoistable_set is empty.  */
!   if (bitmap_empty_p (&hoistable_set.values))
!     return false;
! 
!   /* Compute which of the hoistable values is in AVAIL_OUT of
!      at least one of the successors of BLOCK.  */
!   bitmap_head availout_in_some;
!   bitmap_initialize (&availout_in_some, &grand_bitmap_obstack);
!   FOR_EACH_EDGE (e, ei, block->succs)
!     /* Do not consider expressions solely because their availability
!        on loop exits.  They'd be ANTIC-IN throughout the whole loop
!        and thus effectively hoisted across loops by combination of
!        PRE and hoisting.  */
!     if (! loop_exit_edge_p (block->loop_father, e))
!       bitmap_ior_and_into (&availout_in_some, &hoistable_set.values,
! 			   &AVAIL_OUT (e->dest)->values);
!   bitmap_clear (&hoistable_set.values);
! 
!   /* Short-cut for a common case: availout_in_some is empty.  */
!   if (bitmap_empty_p (&availout_in_some))
!     return false;
! 
!   /* Hack hoitable_set in-place so we can use sorted_array_from_bitmap_set.  */
!   hoistable_set.values = availout_in_some;
!   hoistable_set.expressions = ANTIC_IN (block)->expressions;
! 
!   /* Now finally construct the topological-ordered expression set.  */
!   vec<pre_expr> exprs = sorted_array_from_bitmap_set (&hoistable_set);
! 
!   bitmap_clear (&hoistable_set.values);
! 
!   /* If there are candidate values for hoisting, insert expressions
!      strategically to make the hoistable expressions fully redundant.  */
!   pre_expr expr;
!   FOR_EACH_VEC_ELT (exprs, i, expr)
!     {
!       /* While we try to sort expressions topologically above the
!          sorting doesn't work out perfectly.  Catch expressions we
! 	 already inserted.  */
!       unsigned int value_id = get_expr_value_id (expr);
!       if (bitmap_set_contains_value (AVAIL_OUT (block), value_id))
! 	{
! 	  if (dump_file && (dump_flags & TDF_DETAILS))
! 	    {
! 	      fprintf (dump_file,
! 		       "Already inserted expression for ");
! 	      print_pre_expr (dump_file, expr);
! 	      fprintf (dump_file, " (%04d)\n", value_id);
! 	    }
! 	  continue;
! 	}
! 
!       /* OK, we should hoist this value.  Perform the transformation.  */
!       pre_stats.hoist_insert++;
!       if (dump_file && (dump_flags & TDF_DETAILS))
! 	{
! 	  fprintf (dump_file,
! 		   "Inserting expression in block %d for code hoisting: ",
! 		   block->index);
! 	  print_pre_expr (dump_file, expr);
! 	  fprintf (dump_file, " (%04d)\n", value_id);
! 	}
! 
!       gimple_seq stmts = NULL;
!       tree res = create_expression_by_pieces (block, expr, &stmts,
! 					      get_expr_type (expr));
!       if (gsi_end_p (last) || is_ctrl_stmt (gsi_stmt (last)))
! 	gsi_insert_seq_before (&last, stmts, GSI_SAME_STMT);
!       else
! 	gsi_insert_seq_after (&last, stmts, GSI_NEW_STMT);
! 
!       /* Make sure to not return true if expression creation ultimately
!          failed but also make sure to insert any stmts produced as they
! 	 are tracked in inserted_exprs.  */
!       if (! res)
! 	continue;
! 
!       new_stuff = true;
!     }
! 
!   exprs.release ();
! 
!   return new_stuff;
! }
! 
! /* Do a dominator walk on the control flow graph, and insert computations
!    of values as necessary for PRE and hoisting.  */
! 
! static bool
! insert_aux (basic_block block, bool do_pre, bool do_hoist)
  {
    basic_block son;
    bool new_stuff = false;
*************** insert_aux (basic_block block)
*** 3437,3443 ****
  	{
  	  unsigned i;
  	  bitmap_iterator bi;
! 	  bitmap_set_t newset = NEW_SETS (dom);
  	  if (newset)
  	    {
  	      /* Note that we need to value_replace both NEW_SETS, and
--- 3654,3664 ----
  	{
  	  unsigned i;
  	  bitmap_iterator bi;
! 	  bitmap_set_t newset;
! 
! 	  /* First, update the AVAIL_OUT set with anything we may have
! 	     inserted higher up in the dominator tree.  */
! 	  newset = NEW_SETS (dom);
  	  if (newset)
  	    {
  	      /* Note that we need to value_replace both NEW_SETS, and
*************** insert_aux (basic_block block)
*** 3451,3475 ****
  		  bitmap_value_replace_in_set (AVAIL_OUT (block), expr);
  		}
  	    }
! 	  if (!single_pred_p (block))
  	    {
! 	      new_stuff |= do_regular_insertion (block, dom);
  	      if (do_partial_partial)
! 		new_stuff |= do_partial_partial_insertion (block, dom);
  	    }
  	}
      }
    for (son = first_dom_son (CDI_DOMINATORS, block);
         son;
         son = next_dom_son (CDI_DOMINATORS, son))
      {
!       new_stuff |= insert_aux (son);
      }
  
    return new_stuff;
  }
  
! /* Perform insertion of partially redundant values.  */
  
  static void
  insert (void)
--- 3672,3702 ----
  		  bitmap_value_replace_in_set (AVAIL_OUT (block), expr);
  		}
  	    }
! 
! 	  /* Insert expressions for partial redundancies.  */
! 	  if (do_pre && !single_pred_p (block))
  	    {
! 	      new_stuff |= do_pre_regular_insertion (block, dom);
  	      if (do_partial_partial)
! 		new_stuff |= do_pre_partial_partial_insertion (block, dom);
  	    }
+ 
+ 	  /* Insert expressions for hoisting.  */
+ 	  if (do_hoist && EDGE_COUNT (block->succs) >= 2)
+ 	    new_stuff |= do_hoist_insertion (block);
  	}
      }
    for (son = first_dom_son (CDI_DOMINATORS, block);
         son;
         son = next_dom_son (CDI_DOMINATORS, son))
      {
!       new_stuff |= insert_aux (son, do_pre, do_hoist);
      }
  
    return new_stuff;
  }
  
! /* Perform insertion of partially redundant and hoistable values.  */
  
  static void
  insert (void)
*************** insert (void)
*** 3486,3492 ****
        num_iterations++;
        if (dump_file && dump_flags & TDF_DETAILS)
  	fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
!       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun));
  
        /* Clear the NEW sets before the next iteration.  We have already
           fully propagated its contents.  */
--- 3713,3720 ----
        num_iterations++;
        if (dump_file && dump_flags & TDF_DETAILS)
  	fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
!       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun), flag_tree_pre,
! 			      flag_code_hoisting);
  
        /* Clear the NEW sets before the next iteration.  We have already
           fully propagated its contents.  */
*************** public:
*** 4833,4839 ****
    {}
  
    /* opt_pass methods: */
!   virtual bool gate (function *) { return flag_tree_pre != 0; }
    virtual unsigned int execute (function *);
  
  }; // class pass_pre
--- 5061,5068 ----
    {}
  
    /* opt_pass methods: */
!   virtual bool gate (function *)
!     { return flag_tree_pre != 0 || flag_code_hoisting != 0; }
    virtual unsigned int execute (function *);
  
  }; // class pass_pre
*************** pass_pre::execute (function *fun)
*** 4888,4893 ****
--- 5117,5123 ----
  
    statistics_counter_event (fun, "Insertions", pre_stats.insertions);
    statistics_counter_event (fun, "PA inserted", pre_stats.pa_insert);
+   statistics_counter_event (fun, "HOIST inserted", pre_stats.hoist_insert);
    statistics_counter_event (fun, "New PHIs", pre_stats.phis);
    statistics_counter_event (fun, "Eliminated", pre_stats.eliminations);
  
Index: gcc/common.opt
===================================================================
*** gcc/common.opt.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/common.opt	2016-07-07 14:45:45.572877145 +0200
*************** fchecking=
*** 1038,1043 ****
--- 1038,1047 ----
  Common Joined RejectNegative UInteger Var(flag_checking)
  Perform internal consistency checkings.
  
+ fcode-hoisting
+ Common Report Var(flag_code_hoisting) Optimization
+ Enable code hoisting.
+ 
  fcombine-stack-adjustments
  Common Report Var(flag_combine_stack_adjustments) Optimization
  Looks for opportunities to reduce stack adjustments and stack references.
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c	2016-07-07 14:45:45.572877145 +0200
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre" } */
+ 
+ unsigned short f(unsigned short a)
+ {
+   if (a & 0x8000)
+     a <<= 1, a = a ^ 0x1021;
+   else
+     a <<= 1;
+ 
+   return a;
+ }
+ 
+ /* We should hoist and CSE the shift.  */
+ 
+ /* { dg-final { scan-tree-dump-times " << 1;" 1 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c	2016-07-07 14:45:45.572877145 +0200
***************
*** 0 ****
--- 1,15 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre" } */
+ 
+ int f(int i)
+ {
+   if (i < 0)
+     return i/10+ i;
+   return i/10 + i;
+ }
+ 
+ /* Hoisting of i/10 + i should make the code straight-line
+    with one division.  */
+ 
+ /* { dg-final { scan-tree-dump-times "goto" 0 "pre" } } */
+ /* { dg-final { scan-tree-dump-times " / 10;" 1 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr43491.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr43491.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr43491.c	2016-07-07 14:45:45.576877193 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  
  #define REGISTER register
  
--- 1,5 ----
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-optimized" } */
  
  #define REGISTER register
  
*************** long foo(long data, long v)
*** 35,41 ****
  	u = i;
  	return v * t * u;
  }
  /* We should not eliminate global register variable when it is the RHS of
!    a single assignment.  */
! /* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
! /* { dg-final { scan-tree-dump-times "Eliminated: 3" 1 "pre" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
--- 35,45 ----
  	u = i;
  	return v * t * u;
  }
+ 
  /* We should not eliminate global register variable when it is the RHS of
!    a single assignment.  So the number of loads from data_0 has to match
!    that of the number of adds (we hoist data_0 + data_3 above the
!    if (data) and eliminate the useless copy).  */
! 
! /* { dg-final { scan-tree-dump-times "= data_0;" 1 "optimized" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
! /* { dg-final { scan-tree-dump-times " \\+ " 1 "optimized" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c	2016-07-07 14:45:45.576877193 +0200
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ int test (int a, int b, int c, int g)
+ {
+   int d, e;
+   if (a)
+     d = b * c;
+   else
+     d = b - c;
+   e = b * c + g;
+   return d + e;
+ }
+ 
+ /* We should hoist and CSE only the multiplication.  */
+ 
+ /* { dg-final { scan-tree-dump-times " \\* " 1 "pre" } } */
+ /* { dg-final { scan-tree-dump "Insertions: 1" "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c	2016-07-07 14:45:45.576877193 +0200
***************
*** 0 ****
--- 1,24 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-optimized" } */
+ 
+ /* From PR21485.  */
+ 
+ long
+ NumSift (long *array, int b, unsigned long k)
+ {
+   if (b)
+     if (array[k] < array[k + 1L])
+       ++k;
+   return array[k];
+ }
+ 
+ /* There should be only two loads left.  And the final value in the
+    if (b) arm should be if-converted:
+      tem1 = array[k];
+      if (b)
+        tem1 = MAX (array[k+1], tem1)
+      return tem1;  */
+ 
+ /* { dg-final { scan-tree-dump-times "= \\*" 2 "optimized" } } */
+ /* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "optimized" } } */
+ /* { dg-final { scan-tree-dump-times "= PHI" 1 "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c	2016-07-07 14:45:45.576877193 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre" } */
  
  int foo (int i, int j, int b)
  {
--- 1,5 ----
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre -fno-code-hoisting" } */
  
  int foo (int i, int j, int b)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c	2016-07-07 14:45:45.576877193 +0200
***************
*** 1,6 ****
  /* PR37997 */
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-details" } */
  
  int foo (int i, int b, int result)
  {
--- 1,6 ----
  /* PR37997 */
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-details -fno-code-hoisting" } */
  
  int foo (int i, int b, int result)
  {
*************** int foo (int i, int b, int result)
*** 16,20 ****
--- 16,22 ----
  
  /* We should insert i + 1 into the if (b) path as well as the simplified
     i + 1 & -2 expression.  And do replacement with two PHI temps.  */
+ /* With hoisting enabled we'd hoist i + 1 to before the if, retaining
+    only one PHI node.  */
  
  /* { dg-final { scan-tree-dump-times "with prephitmp" 2 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c	2016-07-07 14:45:45.576877193 +0200
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ int a[1024];
+ int b[1024], c[1024];
+ void foo ()
+ {
+   for (int j = 0; j < 1024; ++j)
+     {
+       for (int i = 0; i < 1024; ++i)
+ 	a[i] = j;
+       b[j] = c[j];
+     }
+ }
+ 
+ /* We should not hoist/PRE the outer loop IV increment or the load
+    from c across the inner loop.  */
+ 
+ /* { dg-final { scan-tree-dump-not "HOIST inserted" "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c	2016-07-07 14:45:45.576877193 +0200
*************** int foo (S1 *root, int N)
*** 43,46 ****
    return 0;
  } 
  
! /* { dg-final { scan-tree-dump-times "key" 4 "pre" } } */
--- 43,46 ----
    return 0;
  } 
  
! /* { dg-final { scan-tree-dump-times "key" 3 "pre" } } */
Index: gcc/testsuite/gcc.dg/hoist-register-pressure-1.c
===================================================================
*** gcc/testsuite/gcc.dg/hoist-register-pressure-1.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/hoist-register-pressure-1.c	2016-07-07 14:45:45.608877575 +0200
***************
*** 1,4 ****
! /* { dg-options "-Os -fdump-rtl-hoist" }  */
  /* The rtl hoist pass requires that the expression to be hoisted can
     be assigned without clobbering cc.  For a PLUS rtx on S/390 this
     requires a load address instruction which is fine on 64 bit but
--- 1,4 ----
! /* { dg-options "-Os -fdump-rtl-hoist -fno-code-hoisting" }  */
  /* The rtl hoist pass requires that the expression to be hoisted can
     be assigned without clobbering cc.  For a PLUS rtx on S/390 this
     requires a load address instruction which is fine on 64 bit but
Index: gcc/testsuite/gcc.dg/hoist-register-pressure-2.c
===================================================================
*** gcc/testsuite/gcc.dg/hoist-register-pressure-2.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/hoist-register-pressure-2.c	2016-07-07 14:45:45.612877623 +0200
***************
*** 1,4 ****
! /* { dg-options "-Os -fdump-rtl-hoist" }  */
  /* The rtl hoist pass requires that the expression to be hoisted can
     be assigned without clobbering cc.  For a PLUS rtx on S/390 this
     requires a load address instruction which is fine on 64 bit but
--- 1,4 ----
! /* { dg-options "-Os -fdump-rtl-hoist -fno-code-hoisting" }  */
  /* The rtl hoist pass requires that the expression to be hoisted can
     be assigned without clobbering cc.  For a PLUS rtx on S/390 this
     requires a load address instruction which is fine on 64 bit but
Index: gcc/testsuite/gcc.dg/hoist-register-pressure-3.c
===================================================================
*** gcc/testsuite/gcc.dg/hoist-register-pressure-3.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/hoist-register-pressure-3.c	2016-07-07 14:45:45.624877766 +0200
***************
*** 1,4 ****
! /* { dg-options "-Os -fdump-rtl-hoist" }  */
  /* The rtl hoist pass requires that the expression to be hoisted can
     be assigned without clobbering cc.  For a PLUS rtx on S/390 this
     requires a load address instruction which is fine on 64 bit but
--- 1,4 ----
! /* { dg-options "-Os -fdump-rtl-hoist -fno-code-hoisting" }  */
  /* The rtl hoist pass requires that the expression to be hoisted can
     be assigned without clobbering cc.  For a PLUS rtx on S/390 this
     requires a load address instruction which is fine on 64 bit but
Index: gcc/testsuite/gcc.dg/pr51879-12.c
===================================================================
*** gcc/testsuite/gcc.dg/pr51879-12.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/pr51879-12.c	2016-07-07 14:45:45.624877766 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */
! /* { dg-options "-O2 -ftree-tail-merge -fdump-tree-pre" } */
  
  __attribute__((pure)) int bar (int);
  __attribute__((pure)) int bar2 (int);
--- 1,5 ----
  /* { dg-do compile } */
! /* { dg-options "-O2 -ftree-tail-merge -fdump-tree-pre -fno-code-hoisting" } */
  
  __attribute__((pure)) int bar (int);
  __attribute__((pure)) int bar2 (int);
Index: gcc/testsuite/gcc.dg/strlenopt-9.c
===================================================================
*** gcc/testsuite/gcc.dg/strlenopt-9.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/strlenopt-9.c	2016-07-07 14:45:45.640877957 +0200
***************
*** 1,5 ****
  /* { dg-do run } */
! /* { dg-options "-O2 -fdump-tree-strlen -fdump-tree-optimized" } */
  
  #include "strlenopt.h"
  
--- 1,5 ----
  /* { dg-do run } */
! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-strlen -fdump-tree-optimized" } */
  
  #include "strlenopt.h"
  
Index: gcc/testsuite/gfortran.dg/pr43984.f90
===================================================================
*** gcc/testsuite/gfortran.dg/pr43984.f90.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gfortran.dg/pr43984.f90	2016-07-07 14:45:45.648878052 +0200
*************** end subroutine
*** 50,55 ****
  
  end
  
! ! There should be three loads from iyz.data, not four.
  
! ! { dg-final { scan-tree-dump-times "= iyz.data" 3 "pre" } }
--- 50,55 ----
  
  end
  
! ! There should be two loads from iyz.data, not four.
  
! ! { dg-final { scan-tree-dump-times "= iyz.data" 2 "pre" } }
Index: gcc/testsuite/gcc.dg/tree-ssa/pr47392.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr47392.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr47392.c	2016-07-07 14:45:45.648878052 +0200
***************
*** 1,5 ****
  /* { dg-do run } */
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  
  struct A
  {
--- 1,5 ----
  /* { dg-do run } */
! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
  
  struct A
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c.orig	2016-07-07 09:46:02.655811772 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c	2016-07-07 14:45:45.648878052 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-optimized -w" } */
  
  typedef struct rtx_def *rtx;
  enum rtx_code
--- 1,5 ----
  /* { dg-do compile } */
! /* { dg-options "-O2 -fno-code-hoisting -fdump-tree-optimized -w" } */
  
  typedef struct rtx_def *rtx;
  enum rtx_code
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-35.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/slsr-35.c.orig	2016-07-05 15:24:03.059612220 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-35.c	2016-07-08 06:54:20.620252838 +0200
***************
*** 3,9 ****
     phi has an argument which is a parameter.  */
  
  /* { dg-do compile } */
! /* { dg-options "-O3 -fdump-tree-optimized" } */
  
  int
  f (int c, int i)
--- 3,9 ----
     phi has an argument which is a parameter.  */
  
  /* { dg-do compile } */
! /* { dg-options "-O3 -fno-code-hoisting -fdump-tree-optimized" } */
  
  int
  f (int c, int i)
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-36.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/slsr-36.c.orig	2016-07-05 15:24:03.059612220 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-36.c	2016-07-08 06:54:32.300387549 +0200
***************
*** 3,9 ****
     phi has an argument which is a parameter.  */
  
  /* { dg-do compile } */
! /* { dg-options "-O3 -fdump-tree-optimized" } */
  
  int
  f (int s, int c, int i)
--- 3,9 ----
     phi has an argument which is a parameter.  */
  
  /* { dg-do compile } */
! /* { dg-options "-O3 -fno-code-hoisting -fdump-tree-optimized" } */
  
  int
  f (int s, int c, int i)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-07-10 17:34 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-04 11:27 [PATCH] Add code-hoisting to GIMPLE Richard Biener
2016-07-04 20:31 ` Steven Bosscher
2016-07-08  7:30   ` Richard Biener
2016-07-08 18:12     ` Jeff Law
2016-07-08  7:16 Richard Biener
2016-07-08 12:23 ` Rainer Orth
2016-07-08 12:27   ` Richard Biener
2016-07-08 12:47     ` Rainer Orth
2016-07-10 17:34 ` Andrew Pinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).