public inbox for gcc-patches@gcc.gnu.org
* Improvements to code hoisting
@ 2010-06-16 15:57 Maxim Kuvyrkov
  2010-06-16 15:58 ` 0001-Add-hoist_insn-debug-counter.patch Maxim Kuvyrkov
                   ` (11 more replies)
  0 siblings, 12 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 15:57 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

The following series of patches improves code hoisting and PRE RTL-level 
optimizations.  The two threads of the patches correspond to 
target-independent changes to gcse.c and to changes to ARM backend to 
make it emit RTL that is better suited for optimizers.

Motivating examples for this work are ARM PRs
http://gcc.gnu.org/PR42495
http://gcc.gnu.org/PR42574
With the patches applied GCC produces perfect code for these examples.

The general effect of the patches is that they allow more expressions to
be hoisted or PRE'd, thus making the optimizations more aggressive.  The
negative effects are extended live ranges and increased register
pressure.  Luckily, IRA and reload do a good job of dealing with
excessive register pressure.

Still, while investigating size regressions I came upon a problem in
IRA's estimation of the cost of putting a pseudo in memory when
optimizing for size.  It appears that [ARM's] rtx_cost() model
overestimates the cost of assigning a constant to memory and thus makes
IRA allocate a register for something it shouldn't have.  This results
in a spill of another variable that didn't get a register.  I believe
this should not block inclusion of the patches, as they still provide
significant improvement on average.

The cumulative patch was tested on x86_64-linux-gnu (bootstrap, default 
languages) and arm-linux-gnu (no bootstrap, c and c++ only).

There's a new FAIL in the x86_64-linux-gnu gfortran testsuite: an ICE in
output_die, at dwarf2out.c:10875.  Although I didn't look into the
details of the problem, it seems to be a latent bug uncovered by the
patches.

I benchmarked an earlier version of these patches on a Cortex-A9 board 
and got 0.1-0.5% size decrease on SPEC2K at -Os for both thumb1 and 
thumb2 modes when compiled with and without -fpic.

I'm now working on getting SPEC2K speed numbers.  I will post detailed
benchmarking results before committing the patches that aren't obvious
improvements.  I'd appreciate it if someone could post benchmark numbers
for other architectures.

Each patch will be posted in a subthread.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724


* 0001-Add-hoist_insn-debug-counter.patch
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
@ 2010-06-16 15:58 ` Maxim Kuvyrkov
  2010-06-16 16:34   ` 0001-Add-hoist_insn-debug-counter.patch Jeff Law
  2010-06-16 15:59 ` 0002-Allow-constant-MEMs-through-calls.patch Maxim Kuvyrkov
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 15:58 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 158 bytes --]

This patch adds a new debug counter to the hoisting pass.
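
For illustration only, here is a toy model in C of how such a counter
gates a transformation.  It is not the code in dbgcnt.c; toy_counter,
toy_dbg_cnt and toy_select_exprs are names made up for this sketch.  The
point is that every candidate asks the counter for permission, so capping
the counter (in GCC, with -fdbg-cnt=hoist_insn:N) lets one bisect for the
hoisted expression that exposes a problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a debug counter; NOT the real dbgcnt.c code.  */
struct toy_counter
{
  unsigned count;   /* events allowed so far */
  unsigned limit;   /* maximum number of events to allow */
};

/* Return true, and consume one unit, while the limit is not reached.  */
static bool
toy_dbg_cnt (struct toy_counter *c)
{
  assert (c != NULL);
  if (c->count >= c->limit)
    return false;
  c->count++;
  return true;
}

/* HOISTABLE[i] is the number of dominated blocks from which expression i
   could be hoisted.  Return how many expressions get selected.  The
   counter is consulted only for real candidates, mirroring the
   "hoistable > 1 && dbg_cnt (hoist_insn)" test in the patch.  */
static unsigned
toy_select_exprs (struct toy_counter *c, const int *hoistable, size_t n)
{
  unsigned selected = 0;

  for (size_t i = 0; i < n; i++)
    if (hoistable[i] > 1 && toy_dbg_cnt (c))
      selected++;
  return selected;
}
```

Because the counter is the right-hand operand of &&, expressions that
are not hoistable anyway do not use up counter values.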

OK to apply?

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0001-Add-hoist_insn-debug-counter.ChangeLog --]
[-- Type: text/plain, Size: 133 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* dbgcnt.def (hoist_insn): New debug counter.
	* gcse.c (hoist_code): Use it.

[-- Attachment #3: 0001-Add-hoist_insn-debug-counter.patch --]
[-- Type: text/plain, Size: 1176 bytes --]

From 8078946fa2fefcb32b522b752cd72a39cf5dde3f Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:01:46 -0700
Subject: [PATCH 1/9] Add hoist_insn debug counter

---
 gcc/dbgcnt.def |    1 +
 gcc/gcse.c     |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 33afb0b..56f3461 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -158,6 +158,7 @@ DEBUG_COUNTER (gcse2_delete)
 DEBUG_COUNTER (global_alloc_at_func)
 DEBUG_COUNTER (global_alloc_at_reg)
 DEBUG_COUNTER (hoist)
+DEBUG_COUNTER (hoist_insn)
 DEBUG_COUNTER (ia64_sched2)
 DEBUG_COUNTER (if_conversion)
 DEBUG_COUNTER (if_after_combine)
diff --git a/gcc/gcse.c b/gcc/gcse.c
index b0a1868..1dbd2f0 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4325,7 +4325,7 @@ hoist_code (void)
 		 the vast majority of hoistable expressions are only movable
 		 from two successors, so raising this threshold is likely
 		 to nullify any benefit we get from code hoisting.  */
-	      if (hoistable > 1)
+	      if (hoistable > 1 && dbg_cnt (hoist_insn))
 		{
 		  SET_BIT (hoist_exprs[bb->index], i);
 		  found = 1;
-- 
1.6.2.4



* 0002-Allow-constant-MEMs-through-calls.patch
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
  2010-06-16 15:58 ` 0001-Add-hoist_insn-debug-counter.patch Maxim Kuvyrkov
@ 2010-06-16 15:59 ` Maxim Kuvyrkov
  2010-06-16 16:52   ` 0002-Allow-constant-MEMs-through-calls.patch Jeff Law
  2010-06-16 16:03 ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 15:59 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 398 bytes --]

The function gcse.c: compute_transpout() is responsible for filtering
out expressions that do not survive function calls.  Certain kinds of
memory references, e.g., references to read-only memory, can safely be
propagated through calls.  This patch improves compute_transpout() to
allow that.
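
As a rough sketch of the filter described above (a toy model, not the
gcse.c code; struct toy_mem and toy_mem_survives_call_p are hypothetical
names, and the three flags only stand in for CONSTANT_POOL_ADDRESS_P,
MEM_READONLY_P and MEM_VOLATILE_P):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of a memory reference.  */
struct toy_mem
{
  bool constant_pool;  /* address points into the constant pool */
  bool readonly;       /* stands in for MEM_READONLY_P */
  bool is_volatile;    /* stands in for MEM_VOLATILE_P */
};

/* Return true if the value loaded from M can be assumed to be
   unchanged by a call, so the expression stays transparent across it.  */
static bool
toy_mem_survives_call_p (const struct toy_mem *m)
{
  assert (m != NULL);

  /* Existing rule: constant pool references are not killed by calls.  */
  if (m->constant_pool)
    return true;

  /* Rule added by the patch: read-only, non-volatile memory.  */
  if (m->readonly && !m->is_volatile)
    return true;

  /* Everything else is conservatively assumed to be clobbered.  */
  return false;
}
```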

OK to apply?

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0002-Allow-constant-MEMs-through-calls.ChangeLog --]
[-- Type: text/plain, Size: 118 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (compute_transpout): Let constant MEMs through CALLs.

[-- Attachment #3: 0002-Allow-constant-MEMs-through-calls.patch --]
[-- Type: text/plain, Size: 933 bytes --]

From 4927716f5c288236eb8320364f3a9871ad138698 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:03:18 -0700
Subject: [PATCH 2/9] Allow constant MEMs through calls

---
 gcc/gcse.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 1dbd2f0..00f3841 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4083,6 +4083,12 @@ compute_transpout (void)
 		  && CONSTANT_POOL_ADDRESS_P (XEXP (expr->expr, 0)))
 		continue;
 
+	      /* Handle constant memory references, e.g., PIC addresses.
+		 We don't need to assume MEM_NOTRAP_P here because we only
+		 care about MEM's value surviving the call.  */
+	      if (MEM_READONLY_P (expr->expr) && !MEM_VOLATILE_P (expr->expr))
+		continue;
+
 	      /* ??? Optimally, we would use interprocedural alias
 		 analysis to determine if this mem is actually killed
 		 by this call.  */
-- 
1.6.2.4



* 0003-Improve-VBEout-computation.patch
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
  2010-06-16 15:58 ` 0001-Add-hoist_insn-debug-counter.patch Maxim Kuvyrkov
  2010-06-16 15:59 ` 0002-Allow-constant-MEMs-through-calls.patch Maxim Kuvyrkov
@ 2010-06-16 16:03 ` Maxim Kuvyrkov
  2010-06-16 17:19   ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
  2010-06-21 19:00   ` 0003-Improve-VBEout-computation.patch Jeff Law
  2010-06-16 16:20 ` 0004-Set-pseudos-only-once.patch Maxim Kuvyrkov
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 16:03 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 578 bytes --]

The code hoisting algorithm from Muchnick, which is the one implemented
in GCC, has a quirk in that it does not include expressions calculated
in a basic block in that block's VBEout set.  This seems odd to me; if
an expression is calculated in BB and is available at BB's end, then we
do want to hoist expressions from BB's successors to the end of BB.
This patch implements this.
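
To make the data flow concrete, here is a small self-contained model
using one bit per expression.  It is only a sketch under my own
assumptions (struct toy_bb and toy_compute_vbe are hypothetical, and
plain unsigned masks replace sbitmaps).  It computes

  vbeout[B] = intersection of vbein[S] over successors S of B
  vbein[B]  = antloc[B] | (vbeout[B] & transp[B])

and, when include_comp is set, additionally ORs comp[B] into vbeout[B]
as the patch does.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SUCCS 2

/* One bit per expression in each set.  */
struct toy_bb
{
  unsigned antloc;   /* locally anticipatable in the block */
  unsigned comp;     /* computed in the block, available at its end */
  unsigned transp;   /* operands not modified in the block */
  int succs[TOY_MAX_SUCCS];
  int n_succs;
  unsigned vbein, vbeout;
};

/* Iterate the very-busy-expressions equations to a fixed point.
   Blocks without successors keep an empty vbeout.  */
static void
toy_compute_vbe (struct toy_bb *bbs, int n, bool include_comp)
{
  bool changed = true;

  assert (bbs != 0 && n > 0);
  for (int i = 0; i < n; i++)
    bbs[i].vbein = bbs[i].vbeout = 0;

  while (changed)
    {
      changed = false;
      for (int i = n - 1; i >= 0; i--)
        {
          struct toy_bb *bb = &bbs[i];

          if (bb->n_succs > 0)
            {
              unsigned out = ~0u;
              for (int s = 0; s < bb->n_succs; s++)
                out &= bbs[bb->succs[s]].vbein;
              /* The change proposed here: expressions computed in the
                 block itself also count as very busy at its end.  */
              if (include_comp)
                out |= bb->comp;
              bb->vbeout = out;
            }

          unsigned in = bb->antloc | (bb->vbeout & bb->transp);
          if (in != bb->vbein)
            {
              bb->vbein = in;
              changed = true;
            }
        }
    }
}
```

With a block 0 that computes an expression and branches to block 1
(which recomputes it) and block 2 (which does not), the plain
intersection leaves vbeout[0] empty, while the extended version keeps
the expression in vbeout[0], so the copy in block 1 becomes a hoisting
candidate.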

The patch also adds an open-ended comment which describes another 
possible improvement to the algorithm.

OK to apply?

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0003-Improve-VBEout-computation.ChangeLog --]
[-- Type: text/plain, Size: 164 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (compute_code_hoist_vbeinout): Consider more expressions
	for hoisting.  Add an open-ended comment.

[-- Attachment #3: 0003-Improve-VBEout-computation.patch --]
[-- Type: text/plain, Size: 2830 bytes --]

From d5ddf8e23f925e3076e90d7c6fbb85a7095e58bb Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:18:02 -0700
Subject: [PATCH 3/9] Improve VBEout computation

---
 gcc/gcse.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 00f3841..74986a4 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4171,13 +4171,52 @@ compute_code_hoist_vbeinout (void)
       FOR_EACH_BB_REVERSE (bb)
 	{
 	  if (bb->next_bb != EXIT_BLOCK_PTR)
-	    sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
-					   hoist_vbein, bb->index);
+	    {
+	      /* ??? Code hoisting algorithm from Muchnick seems to be
+		 overly conservative in considering only those expressions
+		 that are calculated along every path from BB.
+		 E.g., it will not try to optimize the following case:
+
+		  2
+                  | \
+		  3* |
+	          | /
+		  4
+                 / \
+                5*  6
+
+		 ;; "*" marks basic blocks that calculate same expression
+		 ;; Ideally, all calculation would be moved to block 2.
+
+		 One way to improve the algorithm can be to exclude
+		 certain edges from intersection of successors' hoist_vbein's.
+
+		 E.g., if a basic block has a fallthru edge that the control
+		 takes with >80% probability, then just copy hoist_vbein
+		 of the destination block to hoist_vbeout of BB.  */
+	      sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
+					     hoist_vbein, bb->index);
+
+	      /* ??? Another quirk of Muchnick is that vbeout[BB] does not
+		 include expressions calculated in BB itself and available
+		 at its end.  Fix this.  */
+	      sbitmap_a_or_b (hoist_vbeout[bb->index],
+			      hoist_vbeout[bb->index], comp[bb->index]);
+	    }
 
 	  changed |= sbitmap_a_or_b_and_c_cg (hoist_vbein[bb->index],
 					      antloc[bb->index],
 					      hoist_vbeout[bb->index],
 					      transp[bb->index]);
+
+	  /* Enable if debugging VBE data flow problem.  */
+	  if (dump_file && 0)
+	    {
+	      fprintf (dump_file, "vbein (%d): ", bb->index);
+	      dump_sbitmap_file (dump_file, hoist_vbein[bb->index]);
+	      fprintf (dump_file, "vbeout(%d): ", bb->index);
+	      dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	    }
 	}
 
       passes++;
@@ -4298,6 +4337,11 @@ hoist_code (void)
 	  if (TEST_BIT (hoist_vbeout[bb->index], i)
 	      && TEST_BIT (transpout[bb->index], i))
 	    {
+	      /* If an expression is computed in BB and is available at end of
+		 BB, hoist all occurences dominated by BB to BB.  */
+	      if (TEST_BIT (comp[bb->index], i))
+		hoistable++;
+
 	      /* We've found a potentially hoistable expression, now
 		 we look at every block BB dominates to see if it
 		 computes the expression.  */
-- 
1.6.2.4



* 0004-Set-pseudos-only-once.patch
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (2 preceding siblings ...)
  2010-06-16 16:03 ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
@ 2010-06-16 16:20 ` Maxim Kuvyrkov
  2010-06-21 18:22   ` 0004-Set-pseudos-only-once.patch Jeff Law
  2010-06-16 16:20 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 16:20 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

IRA and reload have a special relationship with pseudos that are set
only once.  When such pseudos are initialized with constants or
instances that can be considered constant across the function, reload
can rematerialize them instead of spilling, or apply other
optimizations.

This patch makes sure that we don't unnecessarily set the same pseudo
more than once.
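
A toy illustration of the difference (not the hoist_code logic itself;
struct toy_func, toy_gen_reg and toy_insert_expr are hypothetical names):
if one expression is inserted at several points and the same pseudo is
reused, that pseudo ends up with several sets; with a fresh pseudo per
inserted set insn, every pseudo is set exactly once.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_REGS 16

/* Toy function body: just counts set insns per pseudo.  */
struct toy_func
{
  int n_regs;
  int n_sets[TOY_MAX_REGS];
};

/* Allocate a new pseudo and return its number.  */
static int
toy_gen_reg (struct toy_func *f)
{
  assert (f->n_regs < TOY_MAX_REGS);
  f->n_sets[f->n_regs] = 0;
  return f->n_regs++;
}

/* Insert one expression at N_POINTS places.  If FRESH_EACH_TIME, use a
   new pseudo for every inserted set; otherwise reuse the first one.
   Return how many of the pseudos created here are set exactly once.  */
static int
toy_insert_expr (struct toy_func *f, int n_points, bool fresh_each_time)
{
  int first = f->n_regs;
  int reaching_reg = -1;
  int once = 0;

  for (int p = 0; p < n_points; p++)
    {
      if (fresh_each_time || reaching_reg < 0)
        reaching_reg = toy_gen_reg (f);
      f->n_sets[reaching_reg]++;   /* emit "reaching_reg = expr" */
    }

  for (int r = first; r < f->n_regs; r++)
    if (f->n_sets[r] == 1)
      once++;
  return once;
}
```

The trade-off is more pseudos overall, in exchange for each of them
being eligible for the set-once treatment described above.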

OK to apply?

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0004-Set-pseudos-only-once.ChangeLog --]
[-- Type: text/plain, Size: 122 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (hoist_code): Generate new pseudo for every new set insn.

[-- Attachment #3: 0004-Set-pseudos-only-once.patch --]
[-- Type: text/plain, Size: 1022 bytes --]

From 0d63cdbf3e0625d46aae79bd83217194a81c4945 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:19:29 -0700
Subject: [PATCH 4/9] Set pseudos only once

---
 gcc/gcse.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 74986a4..45cab70 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4437,8 +4437,13 @@ hoist_code (void)
 
 		      /* Create a pseudo-reg to store the result of reaching
 			 expressions into.  Get the mode for the new pseudo
-			 from the mode of the original destination pseudo.  */
-		      if (expr->reaching_reg == NULL)
+			 from the mode of the original destination pseudo.
+
+			 It is important to use new pseudos whenever we
+			 emit a set for it in insert_insn_end_basic below.
+			 This will allow reload to use equivalence for
+			 registers that are set only once.  */
+		      if (!insn_inserted_p)
 			expr->reaching_reg
 			  = gen_reg_rtx_and_attrs (SET_DEST (set));
 
-- 
1.6.2.4



* 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (3 preceding siblings ...)
  2010-06-16 16:20 ` 0004-Set-pseudos-only-once.patch Maxim Kuvyrkov
@ 2010-06-16 16:20 ` Maxim Kuvyrkov
  2010-06-16 16:43   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  2010-06-16 16:23 ` 0006-GCSE-complex-constants.patch Maxim Kuvyrkov
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 16:20 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

Currently, code hoisting only checks immediately-dominated blocks for 
expressions to hoist.  I wonder if limiting the search for expressions 
is intentional.

This patch makes code hoisting search through all dominated blocks for 
expressions to hoist.

On one hand, hoisting expressions from all dominated blocks, not just 
the immediate dominees, provides significantly greater unification 
opportunities.  On the other hand, it can also substantially extend live 
ranges of pseudos, and increase register pressure.

Even considering the negatives, hoisting expressions from non-immediate
dominees seems like the right choice to me.  Most expressions that can
be moved several basic blocks up are constants, and IRA/reload should be
able to rematerialize those under high register pressure.  On the
flip side, if an expression is complex, then it would be less costly to
spill/restore it instead of calculating it in dominated blocks.

A compromise may be to limit the depth of search with a parameter.
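
To show what such a depth parameter could look like, here is a sketch
over a dominator tree stored as an array of immediate dominators.  This
is my own toy code, not GCC's dominance API: toy_dominated_blocks and
its max_depth argument are hypothetical, and unlike
get_all_dominated_blocks it does not return BB itself.  A max_depth of 1
gives the old behavior (immediate dominees only), a negative value gives
the behavior of this patch (all dominated blocks).

```c
#include <assert.h>

/* IDOM[i] is the immediate dominator of block i, or -1 for the root.
   Store in OUT the blocks strictly dominated by BB whose distance from
   BB in the dominator tree is at most MAX_DEPTH (no limit if
   MAX_DEPTH < 0) and return their number.  OUT must have room for N
   entries.  */
static int
toy_dominated_blocks (const int *idom, int n, int bb, int max_depth,
                      int *out)
{
  int count = 0;

  assert (bb >= 0 && bb < n);
  for (int i = 0; i < n; i++)
    {
      int depth = 0;
      int cur = i;

      /* Climb the dominator tree until we reach BB or leave the tree.  */
      while (cur != bb && cur != -1)
        {
          cur = idom[cur];
          depth++;
        }

      if (cur == bb && depth >= 1 && (max_depth < 0 || depth <= max_depth))
        out[count++] = i;
    }
  return count;
}
```

Climbing the tree once per block is fine for a toy but is itself
quadratic in the worst case; a real implementation would walk the
dominator tree downward from BB and stop at the depth limit.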

OK to apply?

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.ChangeLog --]
[-- Type: text/plain, Size: 148 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (hoist_code): Walk through all dominated blocks in search of
	expressions to hoist.

[-- Attachment #3: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch --]
[-- Type: text/plain, Size: 1821 bytes --]

From fd7341c44c4dc586b61a783a0b5efce2c6cc62d4 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:24:56 -0700
Subject: [PATCH 5/9] Search all dominated blocks for expressions to hoist

---
 gcc/gcse.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 45cab70..a7c7237 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4327,7 +4327,8 @@ hoist_code (void)
       int found = 0;
       int insn_inserted_p;
 
-      domby = get_dominated_by (CDI_DOMINATORS, bb);
+      domby = get_all_dominated_blocks (CDI_DOMINATORS, bb);
+
       /* Examine each expression that is very busy at the exit of this
 	 block.  These are the potentially hoistable expressions.  */
       for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
@@ -4418,7 +4419,11 @@ hoist_code (void)
 		     it would be safe to compute it at the start of the
 		     dominated block.  Now we have to determine if the
 		     expression would reach the dominated block if it was
-		     placed at the end of BB.  */
+		     placed at the end of BB.
+		     Note: the fact that hoist_exprs has i-th bit set means
+		     that /some/, not necessarily all, occurrences from
+		     the dominated blocks can be hoisted to BB.  Here we check
+		     if a specific occurrence can be hoisted to BB.  */
 		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL))
 		    {
 		      struct expr *expr = index_map[i];
@@ -4431,6 +4436,12 @@ hoist_code (void)
 			occr = occr->next;
 
 		      gcc_assert (occr);
+
+		      /* An occurence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			continue;
+
 		      insn = occr->insn;
 		      set = single_set (insn);
 		      gcc_assert (set);
-- 
1.6.2.4



* 0006-GCSE-complex-constants.patch
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (4 preceding siblings ...)
  2010-06-16 16:20 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-16 16:23 ` Maxim Kuvyrkov
  2010-06-16 17:18   ` 0006-GCSE-complex-constants.patch Jeff Law
  2010-06-16 16:25 ` 0007-Add-open-ended-comments.patch Maxim Kuvyrkov
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 16:23 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 984 bytes --]

Certain architectures (e.g., ARM) cannot easily operate with constants;
they need to emit sequences of several instructions to load constants
into registers.  The common procedure to do this is to emit a (parallel
[(set) (clobber (reg1)) ... (clobber (regN))]) instruction which later
splits into several instructions using pseudos (regX) to store
intermediate values.

Currently PRE and hoist do not GCSE constants, and there is a good
reason for that: to avoid increasing register pressure.  Interestingly,
symbol_refs are allowed to be GCSE'ed; is this intentional or by
accident?

In any case, it seems like a good idea to GCSE constants and symbol_refs 
that need something beyond a simple (set) to get into a register, and 
not GCSE them otherwise.

This patch adds a simple heuristic to gcse.c:want_to_gcse_p() that 
adjusts preferences for CONST_INTs and SYMBOL_REFs.
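
The heuristic boils down to the following decision, shown here as a
standalone toy (the enums and toy_want_to_gcse_p are made up for this
sketch; the real function goes on to call
can_assign_to_reg_without_clobbers_p, which the toy simply assumes
succeeds):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for RTL codes of the SET source.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_SYMBOL_REF, TOY_PLUS };

/* Toy stand-ins for the shape of the insn pattern.  */
enum toy_pattern { TOY_PAT_SET, TOY_PAT_PARALLEL };

/* Return true if a source of kind SRC, set by an insn whose pattern has
   shape PAT, should be considered for GCSE.  */
static bool
toy_want_to_gcse_p (enum toy_code src, enum toy_pattern pat)
{
  assert (pat == TOY_PAT_SET || pat == TOY_PAT_PARALLEL);

  switch (src)
    {
    case TOY_REG:
      /* Plain register copies are never interesting.  */
      return false;

    case TOY_CONST_INT:
    case TOY_SYMBOL_REF:
      /* Only constants and symbols that need more than a simple set
         (i.e. a PARALLEL with clobbers) are worth keeping in a
         register.  */
      return pat == TOY_PAT_PARALLEL;

    default:
      /* Assumption: the expression can be assigned to a register.  */
      return true;
    }
}
```

Note that under this rule a SYMBOL_REF set by a simple (set) is no
longer a GCSE candidate, which is a change from the current behavior.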

OK to apply?

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0006-GCSE-complex-constants.ChangeLog --]
[-- Type: text/plain, Size: 178 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (want_to_gcse_p): Change signature.  Consider "complex"
	constant expressions for GCSE.
	(hash_scan_set): Update.

[-- Attachment #3: 0006-GCSE-complex-constants.patch --]
[-- Type: text/plain, Size: 3787 bytes --]

From 9850ab786c95491e2a92c7adfef295adafa52add Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:26:34 -0700
Subject: [PATCH 6/9] GCSE complex constants

---
 gcc/gcse.c |   36 +++++++++++++++++++++++++++++-------
 1 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index a7c7237..c81d71c 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -431,7 +431,7 @@ static void hash_scan_insn (rtx, struct hash_table_d *);
 static void hash_scan_set (rtx, rtx, struct hash_table_d *);
 static void hash_scan_clobber (rtx, rtx, struct hash_table_d *);
 static void hash_scan_call (rtx, rtx, struct hash_table_d *);
-static int want_to_gcse_p (rtx);
+static int want_to_gcse_p (rtx, rtx);
 static bool gcse_constant_p (const_rtx);
 static int oprs_unchanged_p (const_rtx, const_rtx, int);
 static int oprs_anticipatable_p (const_rtx, const_rtx);
@@ -751,10 +751,10 @@ static basic_block current_bb;
 
 
 /* See whether X, the source of a set, is something we want to consider for
-   GCSE.  */
+   GCSE in instruction INSN.  */
 
 static int
-want_to_gcse_p (rtx x)
+want_to_gcse_p (rtx x, rtx insn)
 {
 #ifdef STACK_REGS
   /* On register stack architectures, don't GCSE constants from the
@@ -768,13 +768,35 @@ want_to_gcse_p (rtx x)
     {
     case REG:
     case SUBREG:
-    case CONST_INT:
     case CONST_DOUBLE:
     case CONST_FIXED:
     case CONST_VECTOR:
     case CALL:
       return 0;
 
+    case CONST_INT:
+    case SYMBOL_REF:
+      /* If it takes a PARALLEL to set a constant or a symbol, try to gcse it.
+	 Usually, (clobber (reg)) is the second part of the parallel.
+	 We rely on rematerialization of constants to avoid excessive
+	 register pressure.
+
+	 ??? We would also like to GCSE/hoist non-parallel-looking constants
+	 and symbol_refs on architectures which require load from constant
+	 pools to get a constant into register, e.g., ARM.
+	 We do not currently do that because IRA overestimates cost
+	 of allocating a constant to memory, thus unnecessarily increasing
+	 register pressure and causing spills.  One side of this problem
+	 is IRA using rtx_costs which are not particularly precise for
+	 constants when optimizing for size.  For a good example see
+	 300.twolf:ucxxo1.c from SPEC2000.
+
+	 ??? Should we handle CONST_DOUBLE and CONST_FIXED similarly?
+	 Will rematerialization handle them?  */
+      if (GET_CODE (PATTERN (insn)) != PARALLEL)
+	return 0;
+      /* FALLTHRU */
+
     default:
       return can_assign_to_reg_without_clobbers_p (x);
     }
@@ -1328,7 +1350,7 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 	  && !REG_P (src)
 	  && (table->set_p
 	      ? gcse_constant_p (XEXP (note, 0))
-	      : want_to_gcse_p (XEXP (note, 0))))
+	      : want_to_gcse_p (XEXP (note, 0), insn)))
 	src = XEXP (note, 0), pat = gen_rtx_SET (VOIDmode, dest, src);
 
       /* Only record sets of pseudo-regs in the hash table.  */
@@ -1343,7 +1365,7 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 	     can't do the same thing at the rtl level.  */
 	  && !can_throw_internal (insn)
 	  /* Is SET_SRC something we want to gcse?  */
-	  && want_to_gcse_p (src)
+	  && want_to_gcse_p (src, insn)
 	  /* Don't CSE a nop.  */
 	  && ! set_noop_p (pat)
 	  /* Don't GCSE if it has attached REG_EQUIV note.
@@ -1404,7 +1426,7 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 	      do that easily for EH edges so disable GCSE on these for now.  */
 	   && !can_throw_internal (insn)
 	   /* Is SET_DEST something we want to gcse?  */
-	   && want_to_gcse_p (dest)
+	   && want_to_gcse_p (dest, insn)
 	   /* Don't CSE a nop.  */
 	   && ! set_noop_p (pat)
 	   /* Don't GCSE if it has attached REG_EQUIV note.
-- 
1.6.2.4



* 0007-Add-open-ended-comments.patch
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (5 preceding siblings ...)
  2010-06-16 16:23 ` 0006-GCSE-complex-constants.patch Maxim Kuvyrkov
@ 2010-06-16 16:25 ` Maxim Kuvyrkov
  2010-06-16 17:46   ` 0007-Add-open-ended-comments.patch Jeff Law
  2010-06-16 16:54 ` Improvements to code hoisting Richard Guenther
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 16:25 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 262 bytes --]

This patch adds several open-ended comments in gcse.c.  I'll be happy if 
anyone can answer some of them, in which case I'll check in the answers, 
rather than questions :).

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0007-Add-open-ended-comments.ChangeLog --]
[-- Type: text/plain, Size: 133 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (insert_insn_end_basic_block, hoist_code): Add open-ended
	comments.

[-- Attachment #3: 0007-Add-open-ended-comments.patch --]
[-- Type: text/plain, Size: 2074 bytes --]

From 1ead1c2b6d748f02f12f04d054a5c39604a1b3ec Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:27:27 -0700
Subject: [PATCH 7/9] Add open-ended comments

---
 gcc/gcse.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index c81d71c..7215063 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -3431,7 +3431,9 @@ process_insert_insn (struct expr *expr)
 
    For PRE, we want to verify that the expr is either transparent
    or locally anticipatable in the target block.  This check makes
-   no sense for code hoisting.  */
+   no sense for code hoisting.
+   ??? We always call this function with (PRE == 0), which makes the checks
+   useless.  */
 
 static void
 insert_insn_end_basic_block (struct expr *expr, basic_block bb, int pre)
@@ -3535,6 +3537,9 @@ insert_insn_end_basic_block (struct expr *expr, basic_block bb, int pre)
   else
     new_insn = emit_insn_after_noloc (pat, insn, bb);
 
+  /* ??? It maybe useful to try set REG_EQUAL note on NEW_INSN here.
+     How can we do it?  */
+
   while (1)
     {
       if (INSN_P (pat))
@@ -4343,7 +4348,16 @@ hoist_code (void)
       index_map[expr->bitmap_index] = expr;
 
   /* Walk over each basic block looking for potentially hoistable
-     expressions, nothing gets hoisted from the entry block.  */
+     expressions, nothing gets hoisted from the entry block.
+
+     ??? It maybe worthwhile to walk CFG in DFS order over the dominator tree.
+     One can imagine a case when a dominated block B is linked before
+     its dominator A, so if expressions were hoisted from blocks C and D,
+     which B (and A) dominates, then it may occur that we miss
+     an optimization of moving these expressions all the way to A.
+     Alternatively, we may handle this case by updating expressions'
+     occurences to include instructions emitted by code hoisting, i.e.,
+     an expression emitted at the end of B will then be hoisted to A.  */
   FOR_EACH_BB (bb)
     {
       int found = 0;
-- 
1.6.2.4



* Re: 0001-Add-hoist_insn-debug-counter.patch
  2010-06-16 15:58 ` 0001-Add-hoist_insn-debug-counter.patch Maxim Kuvyrkov
@ 2010-06-16 16:34   ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-16 16:34 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

> This patch adds new debug counter to the hoisting pass.
>
> OK to apply?
OK.
jeff


* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-16 16:20 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-16 16:43   ` Steven Bosscher
  2010-06-21 19:45     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Steven Bosscher @ 2010-06-16 16:43 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, gcc-patches

On Wed, Jun 16, 2010 at 5:58 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> Currently, code hoisting only checks immediately-dominated blocks for
> expressions to hoist.  I wonder if limiting the search for expressions is
> intentional.
>
> This patch makes code hoisting search through all dominated blocks for
> expressions to hoist.

And makes the algorithm quadratic in the size of the CFG. You should
limit the depth not only to avoid excessive live range lengths but
also for corner cases of strangely-formed CFGs.

Ciao!
Steven


* Re: 0002-Allow-constant-MEMs-through-calls.patch
  2010-06-16 15:59 ` 0002-Allow-constant-MEMs-through-calls.patch Maxim Kuvyrkov
@ 2010-06-16 16:52   ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-16 16:52 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

> The function gcse.c: compute_transpout() is responsible for filtering 
> out expressions that do not survive function calls.  Certain kinds of 
> memory references, e.g., references to read-only memory can safely be 
> propagated through calls.  This patch improves compute_transpout() to 
> allow that.
>
> OK to apply?
OK.

Thanks,
Jeff


* Re: Improvements to code hoisting
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (6 preceding siblings ...)
  2010-06-16 16:25 ` 0007-Add-open-ended-comments.patch Maxim Kuvyrkov
@ 2010-06-16 16:54 ` Richard Guenther
  2010-07-01  9:00   ` Maxim Kuvyrkov
  2010-06-23 20:42 ` Update compute_transpout Maxim Kuvyrkov
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Richard Guenther @ 2010-06-16 16:54 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, gcc-patches

On Wed, Jun 16, 2010 at 5:47 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> The following series of patches improves code hoisting and PRE RTL-level
> optimizations.  The two threads of the patches correspond to
> target-independent changes to gcse.c and to changes to ARM backend to make
> it emit RTL that is better suited for optimizers.
>
> Motivating examples for this work are ARM PRs
> http://gcc.gnu.org/PR42495
> http://gcc.gnu.org/PR42574
> With the patches applied GCC produces perfect code for these examples.
>
> The general effect of the patches is that they allow more expressions to be
> hoisted or PRE'd, thus making the optimizations more aggressive.  The
> negative effect of this are extended live ranges and increased register
> pressure.  Luckily, IRA and reload do good job in dealing with excessive
> register pressure.
>
> Still, while investigating size regressions I came upon a problem in IRA's
> estimation of cost of putting a pseudo to memory when optimizing for size.
>  It appears that [ARM's] rtx_cost() model overestimates cost of assigning a
> constant to memory, and, thus, makes IRA allocate a register for something
> it shouldn't have.  This results in a spill of another variable that didn't
> get a register.  I believe, this should not block inclusion of the patches
> as they still provide significant improvement on average.
>
> The cumulative patch was tested on x86_64-linux-gnu (bootstrap, default
> languages) and arm-linux-gnu (no bootstrap, c and c++ only).
>
> There's a new FAIL on x86_64-linux-gnu gfortran testsuite: an ICE in
> output_die, at dwarf2out.c:10875.  Although I didn't look in the details of
> the problem, it seems to be a latent bug uncovered by the patches.
>
> I benchmarked an earlier version of these patches on a Cortex-A9 board and
> got 0.1-0.5% size decrease on SPEC2K at -Os for both thumb1 and thumb2 modes
> when compiled with and without -fpic.
>
> I'm now working on getting SPEC2K speed numbers.  I will post detailed
> benchmarking results before committing the patches that aren't obvious
> improvements.  I'll appreciate if someone posts benchmark numbers for other
> architectures.
>
> Each patch will be posted in a subthread.

What is the compile-time effect of the cumulative patch on GCSE time?

Richard.

> Thank you,
>
> --
> Maxim Kuvyrkov
> CodeSourcery
> maxim@codesourcery.com
> (650) 331-3385 x724
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0006-GCSE-complex-constants.patch
  2010-06-16 16:23 ` 0006-GCSE-complex-constants.patch Maxim Kuvyrkov
@ 2010-06-16 17:18   ` Jeff Law
  2010-06-23 20:39     ` 0006-GCSE-complex-constants.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-06-16 17:18 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

> Certain architectures (e.g., ARM) cannot easily operate with 
> constants, they need to emit sequences of several instructions to load 
> constants into registers.  The common procedure to do this is to emit 
> a (parallel [(set) (clobber (reg1)) ... (clobber (regN))]) instruction 
> which later splits into several instructions using pseudos (regX) to 
> store intermediate values.
>
> Currently PRE and hoist do not GCSE constants, and there is a good 
> reason for that, to avoid increasing register pressure; interestingly, 
> symbol_refs are allowed to be GCSE'ed, is this intentional or by 
> accident?
It's intentional; a SYMBOL_REF can often be rather expensive.  Some 
CONST_INTs can have that same property.  One could argue that an 
expensive CONST_INT shouldn't be appearing in RTL, but certainly some 
ports have chosen to handle splitting insns with expensive constants 
later in the pipeline.

>
> In any case, it seems like a good idea to GCSE constants and 
> symbol_refs that need something beyond a simple (set) to get into a 
> register, and not GCSE them otherwise.
Rather than triggering this on the PARALLEL it might be better to 
trigger it on the cost of the RTX.  Triggering on the PARALLEL looks 
like a hack to me -- IMHO we'd be better off fixing the costing 
mechanism and using costing as the trigger.
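
A minimal sketch of what a cost-based trigger could look like, written as 
a standalone toy rather than against GCC's rtx_cost interface.  The 
function names, the two-level cost, and the threshold are all hypothetical; 
the only outside fact used is that an ARM-mode data-processing immediate is 
an 8-bit value rotated right by an even amount.

```c
#include <stdint.h>

/* Rotate V left by R bits, R in 0..31.  */
static uint32_t
rotl32 (uint32_t v, unsigned r)
{
  return (v << r) | (v >> ((32 - r) & 31));
}

/* Nonzero if V fits an ARM-mode data-processing immediate, i.e. an
   8-bit value rotated right by an even amount.  Rotating left by the
   same amount undoes the rotation.  */
static int
arm_immediate_p (uint32_t v)
{
  unsigned r;

  for (r = 0; r < 32; r += 2)
    if (rotl32 (v, r) <= 0xff)
      return 1;
  return 0;
}

/* Hypothetical cost model, in instructions: one MOV or MVN when the
   constant or its complement is encodable, otherwise an assumed
   multi-instruction sequence.  */
static int
toy_constant_cost (uint32_t v)
{
  if (arm_immediate_p (v) || arm_immediate_p (~v))
    return 1;
  return 2;
}

/* The trigger itself: treat the constant as a GCSE candidate only
   when loading it costs more than a single instruction.  */
static int
worth_gcse_p (uint32_t v)
{
  return toy_constant_cost (v) > 1;
}
```

With this toy model 0xff and 0xff000000 are left alone, while 0x12345678 
becomes a candidate, and no PARALLEL needs to be inspected.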

Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 16:03 ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
@ 2010-06-16 17:19   ` Paolo Bonzini
  2010-06-16 17:23     ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
  2010-06-21 19:00   ` 0003-Improve-VBEout-computation.patch Jeff Law
  1 sibling, 1 reply; 94+ messages in thread
From: Paolo Bonzini @ 2010-06-16 17:19 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, gcc-patches

+	 that are calculated along every path from BB.
+	 E.g., it will not try to optimize the following case:
+
+		  2
+                  | \
+		  3* |
+	          | /
+		  4
+                 / \
+                5*  6
+
+	 ;; "*" marks basic blocks that calculate same expression
+	 ;; Ideally, all calculation would be moved to block 2.

No, this pessimizes the path 2->4->6.  It may cause a massive number of 
expressions to be hoisted above switch statements, see PR24123 for a 
similar failure.

Paolo

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 17:19   ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
@ 2010-06-16 17:23     ` Maxim Kuvyrkov
  2010-06-16 17:32       ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
  2010-06-21 18:58       ` 0003-Improve-VBEout-computation.patch Jeff Law
  0 siblings, 2 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 17:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jeff Law, gcc-patches

On 6/16/10 8:57 PM, Paolo Bonzini wrote:
> +  that are calculated along every path from BB.
> + E.g., it will not try to optimize the following case:
> +
> + 2
> + | \
> + 3* |
> + | /
> + 4
> + / \
> + 5* 6
> +
> + ;; "*" marks basic blocks that calculate same expression
> + ;; Ideally, all calculation would be moved to block 2.
>
> No, this pessimizes the path 2->4->6.

Code hoisting is used only when optimizing for size, otherwise PRE is 
used.  Maybe I'm missing something, but /speed/ regression of the path 
2->4->6 is acceptable as long as overall code size goes down.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 17:23     ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
@ 2010-06-16 17:32       ` Paolo Bonzini
  2010-06-16 17:50         ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
  2010-06-21 18:58       ` 0003-Improve-VBEout-computation.patch Jeff Law
  1 sibling, 1 reply; 94+ messages in thread
From: Paolo Bonzini @ 2010-06-16 17:32 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, gcc-patches

On 06/16/2010 07:03 PM, Maxim Kuvyrkov wrote:
> On 6/16/10 8:57 PM, Paolo Bonzini wrote:
>> + that are calculated along every path from BB.
>> + E.g., it will not try to optimize the following case:
>> +
>> + 2
>> + | \
>> + 3* |
>> + | /
>> + 4
>> + / \
>> + 5* 6
>> +
>> + ;; "*" marks basic blocks that calculate same expression
>> + ;; Ideally, all calculation would be moved to block 2.
>>
>> No, this pessimizes the path 2->4->6.
>
> Code hoisting is used only when optimizing for size, otherwise PRE is
> used. Maybe I'm missing something, but /speed/ regression of the path
> 2->4->6 is acceptable as long as overall code size goes down.

The traditional "code hoisting" optimization doesn't penalize speed 
(except possibly for increased register pressure and possibly extra 
spilling), so you're not doing the same optimization after your patch 
anymore.

And I'm talking about _massive_ speed regressions when this behavior 
happens.  Try timing

    double i, k, l;
    k = l = 0;
    for (i=1;i<1000000000;i++)
      {
        double j;
        if (i < 10)
          k += 1/i;
        if (i > 999999990)
          l += 1/i;
      }

with and without your patch.  If I understood correctly, you'll compute 
999999999 divisions instead of 18.
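
A scaled-down sketch of the same experiment that counts divisions instead 
of timing them.  The function name, its parameters, and the `hoisted' flag 
are assumptions made for illustration; the flag models the division being 
evaluated once per iteration above both tests.

```c
/* Count the divisions performed by the loop for I in [1, N).
   With HOISTED == 0 each division stays under its condition, as in
   the source.  With HOISTED != 0 the division is evaluated once per
   iteration, before either test.  */
static long
count_divisions (long n, long lo, long hi, int hoisted)
{
  long divisions = 0;
  double k = 0, l = 0;
  long i;

  for (i = 1; i < n; i++)
    {
      double t = 0;

      if (hoisted)
        {
          t = 1.0 / i;
          divisions++;
        }
      if (i < lo)
        {
          if (!hoisted)
            {
              t = 1.0 / i;
              divisions++;
            }
          k += t;
        }
      if (i > hi)
        {
          if (!hoisted)
            {
              t = 1.0 / i;
              divisions++;
            }
          l += t;
        }
    }
  /* The sums are kept only to mirror the original loop.  */
  (void) k;
  (void) l;
  return divisions;
}
```

Calling count_divisions (1000, 10, 990, 0) gives 18, the same count as in 
the full-size loop, while passing 1 as the last argument gives 999, one 
division per iteration.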

Paolo

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0007-Add-open-ended-comments.patch
  2010-06-16 16:25 ` 0007-Add-open-ended-comments.patch Maxim Kuvyrkov
@ 2010-06-16 17:46   ` Jeff Law
  2010-06-23 20:45     ` 0007-Add-open-ended-comments.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-06-16 17:46 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

> This patch adds several open-ended comments in gcse.c.  I'll be happy 
> if anyone can answer some of them, in which case I'll check in the 
> answers, rather than questions :).
>
> Thank you,
>
@@ -3431,7 +3431,9 @@ process_insert_insn (struct expr *expr)

     For PRE, we want to verify that the expr is either transparent
     or locally anticipatable in the target block.  This check makes
-   no sense for code hoisting.  */
+   no sense for code hoisting.
+   ??? We always call this function with (PRE == 0), which makes the checks
+   useless.  */
See pre_edge_insert and search for EDGE_ABNORMAL.

That code went through several iterations and may no longer be 
necessary.  So we really should extend the existing comment before the 
call to insert_insn_end_basic_block from pre_edge_insert.

@@ -3535,6 +3537,9 @@ insert_insn_end_basic_block (struct expr *expr, basic_block bb, int pre)
    else
      new_insn = emit_insn_after_noloc (pat, insn, bb);

+  /* ??? It maybe useful to try set REG_EQUAL note on NEW_INSN here.
+     How can we do it?  */
+

Why do you think this is important?  Have you run into cases where 
having the note showing an alternate form of the expression would have 
allowed further optimization?

So, when you find a hoistable expression that reaches from its new block 
to a dominated child block which also evaluates the expression, if the 
insn in the dominated child has a REG_EQUAL note, you might be able to 
copy it.  So record it into a variable in the loop over the dominated 
blocks.  At the end of that loop, copy the REG_EQUAL note to the new 
insn (which you'll need to record as well).    You may have to verify 
the note is safe to copy/move.  I haven't pondered that aspect at all.

    /* Walk over each basic block looking for potentially hoistable
-     expressions, nothing gets hoisted from the entry block.  */
+     expressions, nothing gets hoisted from the entry block.
+
+     ??? It maybe worthwhile to walk CFG in DFS order over the dominator tree.
+     One can imagine a case when a dominated block B is linked before
+     its dominator A, so if expressions were hoisted from blocks C and D,
+     which B (and A) dominates, then it may occur that we miss
+     an optimization of moving these expressions all the way to A.
+     Alternatively, we may handle this case by updating expressions'
+     occurences to include instructions emitted by code hoisting, i.e.,
+     an expression emitted at the end of B will then be hoisted to A.  */
Well, yea, I guess this is possible.  I'm not sure if it happens much in 
practice.   I think your best bet would be controlling the order of 
blocks visited.  Trying to update the tables on the fly seems like it's 
going to get ugly quick.

Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 17:32       ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
@ 2010-06-16 17:50         ` Maxim Kuvyrkov
  2010-06-16 19:10           ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 17:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jeff Law, gcc-patches

On 6/16/10 9:10 PM, Paolo Bonzini wrote:
> On 06/16/2010 07:03 PM, Maxim Kuvyrkov wrote:
>> On 6/16/10 8:57 PM, Paolo Bonzini wrote:
>>> + that are calculated along every path from BB.
>>> + E.g., it will not try to optimize the following case:
>>> +
>>> + 2
>>> + | \
>>> + 3* |
>>> + | /
>>> + 4
>>> + / \
>>> + 5* 6
>>> +
>>> + ;; "*" marks basic blocks that calculate same expression
>>> + ;; Ideally, all calculation would be moved to block 2.
>>>
>>> No, this pessimizes the path 2->4->6.
...
> with and without your patch.

To make sure we're on the same page: the patch consists of two parts:

1. the big open-ended comment that describes an unimplemented possible 
improvement; I assume this is what you're referring to; and

2. a smaller comment and 2 lines of code that implement a different 
improvement to VBEout computation that is in line with traditional 
algorithm and doesn't penalize speed.

I make no strong claim that the change described in (1) is an actual 
improvement.

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 17:50         ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
@ 2010-06-16 19:10           ` Paolo Bonzini
  2010-06-16 19:25             ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Paolo Bonzini @ 2010-06-16 19:10 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, gcc-patches

On 06/16/2010 07:19 PM, Maxim Kuvyrkov wrote:
> the patch consists of two parts:
>
> 1. the big open-ended comment that describes an unimplemented possible
> improvement; I assume this is what you're referring to; and

Yes.

> 2. a smaller comment and 2 lines of code that implement a different
> improvement to VBEout computation that is in line with traditional
> algorithm and doesn't penalize speed.

Got it now. :)

I was confused by the

+	      sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
+					     hoist_vbein, bb->index);

after the comment, which however is just the preexisting code moved into 
braces.

Regarding (2), I think it's fine.  Still wondering about one thing 
though: if an expression is available at the end of BB and computed in 
BB, it is fully redundant and it should be PRE's task to remove it, 
right?  Maybe you're hitting the problem that our RTL PRE is not cascading?

Paolo

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 19:10           ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
@ 2010-06-16 19:25             ` Maxim Kuvyrkov
  2010-06-16 19:31               ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-16 19:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jeff Law, gcc-patches

On 6/16/10 9:27 PM, Paolo Bonzini wrote:
...
> Regarding (2), I think it's fine. Still wondering about one thing
> though: if an expression is available at the end of BB and computed in
> BB, it is fully redundant and it should be PRE's task to remove it,
> right? Maybe you're hitting the problem that our RTL PRE is not cascading?

PRE often increases code size, so we run it when optimizing for speed. 
When optimizing for size hoist is run in place of PRE.  Does this answer 
your question?

Regards,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 19:25             ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
@ 2010-06-16 19:31               ` Paolo Bonzini
  2010-06-21 18:46                 ` 0003-Improve-VBEout-computation.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Paolo Bonzini @ 2010-06-16 19:31 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, gcc-patches

On 06/16/2010 07:32 PM, Maxim Kuvyrkov wrote:
> On 6/16/10 9:27 PM, Paolo Bonzini wrote:
> ...
>> Regarding (2), I think it's fine. Still wondering about one thing
>> though: if an expression is available at the end of BB and computed in
>> BB, it is fully redundant and it should be PRE's task to remove it,
>> right? Maybe you're hitting the problem that our RTL PRE is not
>> cascading?
>
> PRE often increases code size, so we run it when optimizing for speed.
> When optimizing for size hoist is run in place of PRE. Does this answer
> your question?

Yes.  It looks like a valuable addition indeed.

Can you please add a comment saying "this allows the hoisting pass to 
also perform elimination of fully redundant expressions"?

Thanks!

Paolo

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0004-Set-pseudos-only-once.patch
  2010-06-16 16:20 ` 0004-Set-pseudos-only-once.patch Maxim Kuvyrkov
@ 2010-06-21 18:22   ` Jeff Law
  2010-06-22 12:34     ` 0004-Set-pseudos-only-once.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-06-21 18:22 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

On 06/16/10 09:57, Maxim Kuvyrkov wrote:
> IRA and reload have a special relationship with pseudos that are set 
> only once.  When such pseudos are initialized with constants or 
> instances that can be considered constant across the function, reload 
> can rematerialize them instead of spilling, or apply other optimizations.
>
> This patch makes sure that we don't unnecessarily set the same pseudo 
> more than once.
>
> OK to apply?
OK.  Thanks,
Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 19:31               ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
@ 2010-06-21 18:46                 ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-21 18:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Maxim Kuvyrkov, gcc-patches

On 06/16/10 11:50, Paolo Bonzini wrote:
> On 06/16/2010 07:32 PM, Maxim Kuvyrkov wrote:
>> On 6/16/10 9:27 PM, Paolo Bonzini wrote:
>> ...
>>> Regarding (2), I think it's fine. Still wondering about one thing
>>> though: if an expression is available at the end of BB and computed in
>>> BB, it is fully redundant and it should be PRE's task to remove it,
>>> right? Maybe you're hitting the problem that our RTL PRE is not
>>> cascading?
>>
>> PRE often increases code size, so we run it when optimizing for speed.
>> When optimizing for size hoist is run in place of PRE. Does this answer
>> your question?
>
> Yes.  It looks like a valuable addition indeed.
>
> Can you please add a comment saying "this allows the hoisting pass to 
> also perform elimination of fully redundant expressions"?
What's "funny" is gcse.c used to have a classic fully-redundant 
expression elimination pass which was used when optimizing for size 
(instead of a PRE based algorithm).  It had bitrotted over time and it 
was removed.    If we can make hoisting do the job (or a sizeable 
portion of the job) without having to compute any additional dataflow, 
that's good, very good.

jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 17:23     ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
  2010-06-16 17:32       ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
@ 2010-06-21 18:58       ` Jeff Law
  1 sibling, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-21 18:58 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Paolo Bonzini, gcc-patches

On 06/16/10 11:03, Maxim Kuvyrkov wrote:
> On 6/16/10 8:57 PM, Paolo Bonzini wrote:
>> +  that are calculated along every path from BB.
>> + E.g., it will not try to optimize the following case:
>> +
>> + 2
>> + | \
>> + 3* |
>> + | /
>> + 4
>> + / \
>> + 5* 6
>> +
>> + ;; "*" marks basic blocks that calculate same expression
>> + ;; Ideally, all calculation would be moved to block 2.
>>
>> No, this pessimizes the path 2->4->6.
>
> Code hoisting is used only when optimizing for size, otherwise PRE is 
> used.  Maybe I'm missing something, but /speed/ regression of the path 
> 2->4->6 is acceptable as long as overall code size goes down.
Well, within certain limits, a speed regression would be acceptable.  
But what is more important is the correctness issue.

Hoisting the expression into block #2 would introduce an evaluation of 
the expression on a path which did not have an evaluation in the 
original code (2->4->6) -- which could potentially cause a conforming 
program to begin to fail.  This can occur for memory references or other 
instructions that might potentially trap/fault on invalid input.
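
A small C sketch of that hazard, using the block numbering from the patch 
comment.  The function names and the `take3'/`take5' flags are 
hypothetical; integer division stands in for any operation that can trap.

```c
/* Number of divisions actually executed.  */
static int divisions_done;

static int
checked_div (int a, int b)
{
  divisions_done++;
  return a / b;			/* Traps on many targets when B == 0.  */
}

/* Blocks 3 and 5 evaluate A / B; the path 2->4->6 never does.  */
static int
original (int a, int b, int take3, int take5)
{
  int r = 0;

  /* Block 2.  */
  if (take3)
    r += checked_div (a, b);	/* Block 3*.  */

  /* Block 4.  */
  if (take5)
    r += checked_div (a, b);	/* Block 5*.  */
  else
    r += 1;			/* Block 6.  */

  return r;
}
```

original (10, 0, 0, 0) follows 2->4->6 and returns 1 without dividing.  If 
the division were moved into block 2, the same call would divide by zero.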

Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-16 16:03 ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
  2010-06-16 17:19   ` 0003-Improve-VBEout-computation.patch Paolo Bonzini
@ 2010-06-21 19:00   ` Jeff Law
  2010-06-22 12:30     ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
  1 sibling, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-06-21 19:00 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

On 06/16/10 09:56, Maxim Kuvyrkov wrote:
> Code hoisting algorithm from Muchnick, which is the one implemented in 
> GCC, has a quirk in that it does not include expressions calculated in 
> a basic block in its VBEout set.  This seems odd to me; if an expression 
> is calculated in BB and is available at BB's end, then we do want to 
> hoist expressions from BB's successors to the end of BB.  This patch 
> implements this.
>
> The patch also adds an open-ended comment which describes another 
> possible improvement to the algorithm.
I think the comment that has sparked the confusion needs to be removed 
as the proposed optimization is incorrect, at least in cases where the 
expression may trap.


+      /* Enable if debugging VBE data flow problem.  */
+      if (dump_file && 0)
+        {
+          fprintf (dump_file, "vbein (%d): ", bb->index);
+          dump_sbitmap_file (dump_file, hoist_vbein[bb->index]);
+          fprintf (dump_file, "vbeout(%d): ", bb->index);
+          dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+        }

ISTM you should move this code to the end of compute_code_hoist_vbeinout 
and have it iterate over each block -- you don't want to dump the VBE 
state at each transition change, ISTM you want to dump the vbein/vbeout 
for each block when the solution is complete.
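
A standalone sketch of that arrangement: iterate the VBE problem to a fixed 
point, then dump vbein/vbeout once per block.  It uses plain bit masks 
rather than GCC's sbitmaps.  The CFG is the 2..6 diagram from the patch 
comment; expression E1 (computed in blocks 4 and 5), the `include_comp' 
switch, and the reading of the patch as "OR comp[bb] into VBEout[bb]" are 
assumptions made for illustration.

```c
#include <stdio.h>

enum { NBLOCKS = 5, MAXSUCC = 2 };
enum { BB2, BB3, BB4, BB5, BB6 };	/* Array slots for blocks 2..6.  */
enum { E0 = 1, E1 = 2 };		/* One bit per expression.  */

/* Edges 2->3, 2->4, 3->4, 4->5, 4->6; -1 marks an unused slot.  */
static const int succs[NBLOCKS][MAXSUCC] = {
  { BB3, BB4 }, { BB4, -1 }, { BB5, BB6 }, { -1, -1 }, { -1, -1 }
};
/* E0 is computed in blocks 3 and 5, E1 in blocks 4 and 5.  */
static const unsigned antloc[NBLOCKS] = { 0, E0, E1, E0 | E1, 0 };
static const unsigned comp[NBLOCKS] = { 0, E0, E1, E0 | E1, 0 };
static const unsigned transp[NBLOCKS] = { 3, 3, 3, 3, 3 };

static unsigned vbein[NBLOCKS], vbeout[NBLOCKS];

static void
compute_vbe (int include_comp, FILE *dump)
{
  int changed, bb, i;

  for (bb = 0; bb < NBLOCKS; bb++)
    vbein[bb] = vbeout[bb] = 0;

  do
    {
      changed = 0;
      for (bb = NBLOCKS - 1; bb >= 0; bb--)
	{
	  unsigned in;

	  if (succs[bb][0] >= 0)
	    {
	      /* VBEout is the intersection of the successors' VBEin.  */
	      unsigned out = ~0u;

	      for (i = 0; i < MAXSUCC && succs[bb][i] >= 0; i++)
		out &= vbein[succs[bb][i]];
	      /* Assumed form of the change under review.  */
	      if (include_comp)
		out |= comp[bb];
	      vbeout[bb] = out;
	    }

	  in = antloc[bb] | (vbeout[bb] & transp[bb]);
	  if (in != vbein[bb])
	    {
	      vbein[bb] = in;
	      changed = 1;
	    }
	}
    }
  while (changed);

  /* Dump only the final solution, one line per block.  */
  if (dump)
    for (bb = 0; bb < NBLOCKS; bb++)
      fprintf (dump, "bb%d: vbein=0x%x vbeout=0x%x\n",
	       bb + 2, vbein[bb], vbeout[bb]);
}
```

In this toy, E1 enters vbeout of block 4 only when include_comp is set, and 
E0 never reaches vbeout of block 2 in either mode.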

Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-16 16:43   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-21 19:45     ` Jeff Law
  2010-06-21 20:27       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  2010-06-29 19:22       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 2 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-21 19:45 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Maxim Kuvyrkov, gcc-patches

On 06/16/10 10:27, Steven Bosscher wrote:
> On Wed, Jun 16, 2010 at 5:58 PM, Maxim Kuvyrkov<maxim@codesourcery.com>  wrote:
>    
>> Currently, code hoisting only checks immediately-dominated blocks for
>> expressions to hoist.  I wonder if limiting the search for expressions is
>> intentional.
>>
>> This patch makes code hoisting search through all dominated blocks for
>> expressions to hoist.
>>      
> And makes the algorithm quadratic in the size of the CFG. You should
> limit the depth not only to avoid excessive live range lengths but
> also for corner cases of strangely-formed CFGs.
>    
Technically true, but we only care about the dominance tree here, not 
the entire CFG.  The change which made code hoisting only look at the 
immediately dominated blocks was a mistake.  It's unfortunate that 
get_dominated_by only returns immediately dominated blocks -- based on 
the name, one could reasonably expect to get the full set of dominated 
blocks, which I suspect is what happened back in 2002 when that change 
was made.

Maxim -- can you test f90-intrinsic-bit.f  with and without this change 
and report back on how the compilation time changes?

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-21 19:45     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-06-21 20:27       ` Steven Bosscher
  2010-06-21 21:35         ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  2010-06-29 19:22       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  1 sibling, 1 reply; 94+ messages in thread
From: Steven Bosscher @ 2010-06-21 20:27 UTC (permalink / raw)
  To: Jeff Law; +Cc: Maxim Kuvyrkov, gcc-patches

On Mon, Jun 21, 2010 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
> On 06/16/10 10:27, Steven Bosscher wrote:
>>
>> On Wed, Jun 16, 2010 at 5:58 PM, Maxim Kuvyrkov<maxim@codesourcery.com>
>>  wrote:
>>
>>>
>>> Currently, code hoisting only checks immediately-dominated blocks for
>>> expressions to hoist.  I wonder if limiting the search for expressions is
>>> intentional.
>>>
>>> This patch makes code hoisting search through all dominated blocks for
>>> expressions to hoist.
>>>
>>
>> And makes the algorithm quadratic in the size of the CFG. You should
>> limit the depth not only to avoid excessive live range lengths but
>> also for corner cases of strangely-formed CFGs.
>>
>
> Technically true, but we only care about the dominance tree here, not the
> entire CFG.  The change which made code hoisting only look at the
> immediately dominated blocks was a mistake.  It's unfortunate that
> get_dominated_by only returns immediately dominated blocks -- based on the
> name, one could reasonably expect to get the full set of dominated blocks,
> which I suspect is what happened back in 2002 when that change was made.

I experimented with a patch similar to Maxim's already 2.5 years ago
(and offered to work on it for CS, but there was no interest in this
work at the time :-/)  See these three Bugzilla comments:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c13

Note especially the pessimization in comment #13 of PR33828.
Therefore I maintain my objection to this patch.

I hope Maxim will address this and the other issues of PR33828.

But of course there is much more benefit to be obtained from finishing
the GIMPLE hoisting pass (which I also sent to CS, FWIW). It showed
~1% code size improvement (!) on CSiBE arm-elf when I last toyed with
it, even though it is not quite perfect and hoists things it
shouldn't...

Ciao!
Steven

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-21 20:27       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-21 21:35         ` Jeff Law
  2010-06-21 21:50           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  2010-06-22 12:42           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 2 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-21 21:35 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Maxim Kuvyrkov, gcc-patches

On 06/21/10 13:28, Steven Bosscher wrote:
>
> I experimented with a patch similar to Maxim's already 2.5 years ago
> (and offered to work on it for CS, but there was no interest in this
> work at the time :-/)  See these three Bugzilla comments:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
>    
Right.  This is precisely the problem with using immediate dominators.  
This doesn't argue that Maxim's approach is wrong or bad for compile 
time performance or anything like that.  It merely raises the same issue.

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c8
>    
Clearly something should be done about this.  If you have a testcase for 
Maxim that would be a help.  One could argue that adding the REG_EQUAL 
note created by hoisting to the hash table is a waste of time, fixing 
that would eliminate this problem.

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c13
>
> Note especially the pessimization in comment #13 of PR33828.
> Therefore I maintain my objection to this patch.
>    
Clearly you don't want to hoist any higher than the lowest common 
dominator.  Otherwise you unreasonably lengthen lifetimes.  Maxim will 
need to address this problem as well.

Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-21 21:35         ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-06-21 21:50           ` Steven Bosscher
  2010-06-21 22:21             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  2010-06-22 12:42           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  1 sibling, 1 reply; 94+ messages in thread
From: Steven Bosscher @ 2010-06-21 21:50 UTC (permalink / raw)
  To: Jeff Law; +Cc: Maxim Kuvyrkov, gcc-patches

On Mon, Jun 21, 2010 at 10:18 PM, Jeff Law <law@redhat.com> wrote:
> Clearly you don't want to hoist any higher than the lowest common dominator.
>  Otherwise you unreasonably lengthen lifetimes.  Maxim will need to address
> this problem as well.

Yes, and I need to fix this for my GIMPLE hoisting pass also. Do you
know of an algorithm for this?

Ciao!
Steven

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-21 21:50           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-21 22:21             ` Jeff Law
  2010-06-21 22:26               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-06-21 22:21 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Maxim Kuvyrkov, gcc-patches

On 06/21/10 14:38, Steven Bosscher wrote:
> On Mon, Jun 21, 2010 at 10:18 PM, Jeff Law<law@redhat.com>  wrote:
>    
>> Clearly you don't want to hoist any higher than the lowest common dominator.
>>   Otherwise you unreasonably lengthen lifetimes.  Maxim will need to address
>> this problem as well.
>>      
> Yes, and I need to fix this for my GIMPLE hoisting pass also. Do you
> know of an algorithm for this?
>    
I recall seeing a lowest common ancestor algorithm in literature 
somewhere, so you could run that on the dominator tree I think.
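
One simple form of that, as a sketch: walk both blocks up the dominator 
tree until they meet.  The `idom' and `depth' arrays and the function name 
are hypothetical stand-ins, not GCC's dominance API, and this is the naive 
walk rather than any of the faster published algorithms.

```c
/* IDOM[B] is the immediate dominator of block B, i.e. its parent in
   the dominator tree; the root is its own parent.  DEPTH[B] is B's
   distance from the root.  Return the lowest block that dominates
   both A and B.  */
static int
dom_tree_lca (const int *idom, const int *depth, int a, int b)
{
  /* Bring the deeper block up to the other block's level.  */
  while (depth[a] > depth[b])
    a = idom[a];
  while (depth[b] > depth[a])
    b = idom[b];

  /* Climb in lock step until the two walks meet.  */
  while (a != b)
    {
      a = idom[a];
      b = idom[b];
    }
  return a;
}
```

Each query costs time proportional to the depth of the tree, which may be 
acceptable when only a few occurrences per expression are combined.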

Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-21 22:21             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-06-21 22:26               ` Steven Bosscher
  2010-06-22 15:17                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Paolo Bonzini
  0 siblings, 1 reply; 94+ messages in thread
From: Steven Bosscher @ 2010-06-21 22:26 UTC (permalink / raw)
  To: Jeff Law; +Cc: Maxim Kuvyrkov, gcc-patches

On Mon, Jun 21, 2010 at 11:35 PM, Jeff Law <law@redhat.com> wrote:
> I recall seeing a lowest common ancestor algorithm in literature somewhere,
> so you could run that on the dominator tree I think.

Yay. Even wikipedia knows about lowest common ancestor algorithms.

So there you go: even if one knows a language fairly well, you can
still look for something seemingly trivial for two years and not
succeed, simply because you just don't know the right terminology.

I'm going to play with some of those algorithms right away, so that
hopefully GIMPLE hoisting will be ready before the end of stage 1
after all (this was the major blocker for me to continue working on
it...).

Thanks!

Ciao!
Steven

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-21 19:00   ` 0003-Improve-VBEout-computation.patch Jeff Law
@ 2010-06-22 12:30     ` Maxim Kuvyrkov
  2010-06-23 19:25       ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-22 12:30 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On 6/21/10 10:19 PM, Jeff Law wrote:
...
> + /* Enable if debugging VBE data flow problem. */
> + if (dump_file && 0)
> + {
> + fprintf (dump_file, "vbein (%d): ", bb->index);
> + dump_sbitmap_file (dump_file, hoist_vbein[bb->index]);
> + fprintf (dump_file, "vbeout(%d): ", bb->index);
> + dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
> + }
>
> ISTM you should move this code to the end of compute_code_hoist_vbeinout
> and have it iterate over each block -- you don't want to dump the VBE
> state at each transition change, ISTM you want to dump the vbein/vbeout
> for each block when the solution is complete.

The intent of this code is to see intermediate states of the data flow 
problem, but, anyway, I'll move it out of the inner loop to dump only 
the final result.

I'll post another version of the patch in a couple of days when I finish 
reworking other pieces of improvements to hoisting.

Thanks!

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0004-Set-pseudos-only-once.patch
  2010-06-21 18:22   ` 0004-Set-pseudos-only-once.patch Jeff Law
@ 2010-06-22 12:34     ` Maxim Kuvyrkov
  2010-06-23 22:01       ` 0004-Set-pseudos-only-once.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-22 12:34 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On 6/21/10 9:55 PM, Jeff Law wrote:
> On 06/16/10 09:57, Maxim Kuvyrkov wrote:
>> IRA and reload has special relationship with pseudos that are set only
>> once. When such pseudos initialized with constants or instances that
>> can be considered constant across the function, reload can
>> rematerialize them instead of spilling or apply other optimizations.
>>
>> This patch makes sure that we don't unnecessarily set same pseudo more
>> than once.
>>
>> OK to apply?
> OK. THanks,

Thank you for reviewing this and other patches.

There is similar code in gcse.c:pre_delete():

		/* Create a pseudo-reg to store the result of reaching
		   expressions into.  Get the mode for the new pseudo from
		   the mode of the original destination pseudo.  */
		if (expr->reaching_reg == NULL)
		  expr->reaching_reg = gen_reg_rtx_and_attrs (SET_DEST (set));

		gcse_emit_move_after (expr->reaching_reg, SET_DEST (set), insn);
		delete_insn (insn);
		occr->deleted_p = 1;
		changed = 1;
		gcse_subst_count++;

From a quick look at PRE, it seems that creating a new pseudo for PRE is 
also correct.  Do you know offhand whether this is indeed the case?

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-21 21:35         ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  2010-06-21 21:50           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-22 12:42           ` Maxim Kuvyrkov
  2010-06-23 19:50             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  1 sibling, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-22 12:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

On 6/22/10 12:18 AM, Jeff Law wrote:
> On 06/21/10 13:28, Steven Bosscher wrote:
>>
>> I experimented with a patch similar to Maxim's already 2.5 years ago
>> (and offered to work on it for CS, but there was no interest in this
>> work at the time :-/) See these three Bugzilla comments:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
> Right. This is precisely the problem with using immediate dominators.
> This doesn't argue that Maxim's approach is wrong or bad for compile
> time performance or anything like that. It merely raises the same issue.

I agree with Steven that the search should be constrained, possibly 
by a large enough constant.  I've added a new parameter and a 
dominance.c function to return dominated blocks up to depth N in the 
dominator tree (with N==1 being the immediately dominated blocks and 
N==0 being all dominated blocks).

Does this sound OK?

...
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c13
>>
>> Note especially the pessimization in comment #13 of PR33828.
>> Therefore I maintain my objection to this patch.
> Clearly you don't want to hoist any higher than the lowest common
> dominator. Otherwise you unreasonably lengthen lifetimes. Maxim will
> need to address this problem as well.

This can be addressed with a walk over the dominator tree after we 
compute VBEout.  Start with the root and descend the tree, keeping a 
bitset of expressions that should stay alive up the tree.  If the 
current node

1. has a single successor,
2. has i'th expression set in VBEout,
3. the successor has i'th expression set in VBEout,
4. current node doesn't generate i'th expression,
5. i'th expression is not marked in the bitset as required up the tree,

then we can hoist the i'th expression in the successor with the same 
result as in the current node, without unnecessarily extending live 
ranges.  There may be a couple more details to the above, but the 
problem should be easily fixable.

I will post a second version of the patch in a couple of days.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-21 22:26               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-22 15:17                 ` Paolo Bonzini
  0 siblings, 0 replies; 94+ messages in thread
From: Paolo Bonzini @ 2010-06-22 15:17 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Jeff Law, Maxim Kuvyrkov, gcc-patches

On 06/21/2010 11:50 PM, Steven Bosscher wrote:
> On Mon, Jun 21, 2010 at 11:35 PM, Jeff Law<law@redhat.com>  wrote:
>> I recall seeing a lowest common ancestor algorithm in literature somewhere,
>> so you could run that on the dominator tree I think.
>
> Yay. Even wikipedia knows about lowest common ancestor algorithms.

Even dominance.c knows, nearest_common_dominator.

:)
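
For anyone following along, here is a minimal sketch of the idea: the 
nearest common dominator of two blocks is their lowest common ancestor 
in the dominator tree.  The idom[] array, N_BLOCKS and the function 
names are made up for illustration; this is not how dominance.c 
implements nearest_common_dominator.

```c
#include <assert.h>

/* Hypothetical dominator tree: idom[b] is the immediate dominator of
   block b, and the root (block 0) is its own immediate dominator.
   Block 0 dominates 1 and 6, block 1 dominates 2 and 3, and block 3
   dominates 4 and 5.  */
#define N_BLOCKS 7

static const int idom[N_BLOCKS] = { 0, 0, 1, 1, 3, 3, 0 };

/* Return the depth of B in the dominator tree; the root has depth 0.  */
static int
dom_depth (int b)
{
  int d = 0;

  assert (b >= 0 && b < N_BLOCKS);
  while (idom[b] != b)
    {
      b = idom[b];
      d++;
    }
  return d;
}

/* Return the deepest block that dominates both A and B.  */
static int
nearest_common_dom (int a, int b)
{
  int da = dom_depth (a);
  int db = dom_depth (b);

  /* Bring the deeper block up to the level of the shallower one.  */
  while (da > db)
    {
      a = idom[a];
      da--;
    }
  while (db > da)
    {
      b = idom[b];
      db--;
    }

  /* Climb in lockstep until the two paths meet.  */
  while (a != b)
    {
      a = idom[a];
      b = idom[b];
    }
  return a;
}
```

With this tree, occurrences in blocks 4 and 5 meet at block 3, so 
hoisting them any higher than block 3 would only lengthen lifetimes.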

Paolo

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-22 12:30     ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
@ 2010-06-23 19:25       ` Maxim Kuvyrkov
  2010-06-29 19:08         ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 19:25 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 295 bytes --]

On 6/22/10 4:02 PM, Maxim Kuvyrkov wrote:
...
> I'll post another version of the patch in a couple of days when I finish
> reworking other pieces of improvements to hoisting.

Updated version.  OK to check in?

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0003-Improve-VBEout-computation.ChangeLog --]
[-- Type: text/plain, Size: 212 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (compute_code_hoist_vbeinout): Consider more expressions
	for hoisting.  Add debug print out.
	(hoist_code): Count occurences in current block too.

[-- Attachment #3: 0003-Improve-VBEout-computation.patch --]
[-- Type: text/plain, Size: 2250 bytes --]

From f171a1fedfda70c53359103011be52c56626b798 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:18:02 -0700
Subject: [PATCH 03/14] Improve VBEout computation

---
 gcc/gcse.c |   29 ++++++++++++++++++++++++++---
 1 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 103f0e0..22576ca 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4171,8 +4171,16 @@ compute_code_hoist_vbeinout (void)
       FOR_EACH_BB_REVERSE (bb)
 	{
 	  if (bb->next_bb != EXIT_BLOCK_PTR)
-	    sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
-					   hoist_vbein, bb->index);
+	    {
+	      sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
+					     hoist_vbein, bb->index);
+
+	      /* One of the quirks of code hoisting algorithm in Muchnick
+		 is that VBEout[BB] does not include expressions calculated
+		 in BB itself and available at its end.  Fix this.  */
+	      sbitmap_a_or_b (hoist_vbeout[bb->index],
+			      hoist_vbeout[bb->index], comp[bb->index]);
+	    }
 
 	  changed |= sbitmap_a_or_b_and_c_cg (hoist_vbein[bb->index],
 					      antloc[bb->index],
@@ -4184,7 +4192,17 @@ compute_code_hoist_vbeinout (void)
     }
 
   if (dump_file)
-    fprintf (dump_file, "hoisting vbeinout computation: %d passes\n", passes);
+    {
+      fprintf (dump_file, "hoisting vbeinout computation: %d passes\n", passes);
+
+      FOR_EACH_BB (bb)
+        {
+	  fprintf (dump_file, "vbein (%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbein[bb->index]);
+	  fprintf (dump_file, "vbeout(%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	}
+    }
 }
 
 /* Top level routine to do the dataflow analysis needed by code hoisting.  */
@@ -4298,6 +4316,11 @@ hoist_code (void)
 	  if (TEST_BIT (hoist_vbeout[bb->index], i)
 	      && TEST_BIT (transpout[bb->index], i))
 	    {
+	      /* If an expression is computed in BB and is available at end of
+		 BB, hoist all occurences dominated by BB to BB.  */
+	      if (TEST_BIT (comp[bb->index], i))
+		hoistable++;
+
 	      /* We've found a potentially hoistable expression, now
 		 we look at every block BB dominates to see if it
 		 computes the expression.  */
-- 
1.6.2.4


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-22 12:42           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-23 19:50             ` Maxim Kuvyrkov
  2010-06-23 20:06               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Paolo Bonzini
  2010-06-24 17:11               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 2 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 19:50 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3319 bytes --]

On 6/22/10 4:24 PM, Maxim Kuvyrkov wrote:
> On 6/22/10 12:18 AM, Jeff Law wrote:
>> On 06/21/10 13:28, Steven Bosscher wrote:
>>>
>>> I experimented with a patch similar to Maxim's already 2.5 years ago
>>> (and offered to work on it for CS, but there was no interest in this
>>> work at the time :-/) See these three Bugzilla comments:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
>> Right. This is precisely the problem with using immediate dominators.
>> This doesn't argue that Maxim's approach is wrong or bad for compile
>> time performance or anything like that. It merely raises the same issue.
>
> I agree with Steven that the search is better be constrained, possibly,
> with a large enough constant. I've added a new parameter and a
> dominance.c function to return dominated blocks up to depth N in the
> dominator tree (with N==1 being immediate dominators and N==0 being all
> dominators).

The attached patch adds a max-hoist-depth parameter to control the depth 
of descent in the dominator tree.  The default value of 30 should be 
enough for most practical purposes.
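
To illustrate the depth limit outside of GCC, here is a standalone 
sketch of the level-by-level walk.  The first_son[]/next_son[] arrays 
and dominated_to_depth() are hypothetical stand-ins for the dominator 
tree and for get_dominated_to_depth in the attached patch; only the 
loop structure is meant to match.

```c
#include <assert.h>

#define N_BLOCKS 6

/* Hypothetical dominator tree in first-son / next-sibling form; -1 means
   "none".  Block 0 dominates 1 and 2, block 1 dominates 3, and block 3
   dominates 4 and 5.  */
static const int first_son[N_BLOCKS] = { 1, 3, -1, 4, -1, -1 };
static const int next_son[N_BLOCKS] = { -1, 2, -1, -1, 5, -1 };

/* Store BB and the blocks it dominates, down to DEPTH levels below BB,
   into OUT in breadth-first order, and return how many were stored.
   A DEPTH of 0 means no limit.  OUT must have room for N_BLOCKS
   entries.  */
static int
dominated_to_depth (int bb, int depth, int *out)
{
  int n = 0;
  int i = 0;
  int next_level_start;

  assert (bb >= 0 && bb < N_BLOCKS && depth >= 0);

  out[n++] = bb;
  next_level_start = n;

  do
    {
      int son;

      for (son = first_son[out[i]]; son != -1; son = next_son[son])
        out[n++] = son;
      i++;

      /* A whole level has been expanded.  Open the next level unless
         the depth budget is spent.  A DEPTH of 0 decrements to -1 and
         so never reaches zero, which gives the unlimited walk.  */
      if (i == next_level_start && --depth)
        next_level_start = n;
    }
  while (i < next_level_start);

  return n;
}
```

For the tree above, a depth of 1 from block 0 yields blocks 0, 1 and 2; 
a depth of 2 adds block 3; a depth of 0 returns all six blocks.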

>
> Does this sound OK?
>
> ...
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c13
>>>
>>> Note especially the pessimization in comment #13 of PR33828.
>>> Therefore I maintain my objection to this patch.
>> Clearly you don't want to hoist any higher than the lowest common
>> dominator. Otherwise you unreasonably lengthen lifetimes. Maxim will
>> need to address this problem as well.
>
> This can be addressed with a walk over the dominator tree after we
> compute VBEout. Start with the root and descend in the tree keeping a
> bitset of expressions that should be alive up the tree. If current node
>
> 1. has a single successor,
> 2. has i'th expression set in VBEout,
> 3. the successor has i'th expression set in VBEout,
> 4. current node doesn't generate i'th expression,
> 5. i'th expression is not marked in the bitset as required up the tree,
>
> than we can hoist i'th expression in the successor with the same result
> as in the current node and not unnecessarily extend live ranges. There
> maybe a couple more details to the above, but the problem should be
> easily fixable.

This is implemented as the cleanup_code_hoist_vbeout() function.  The 
solution it produces is OK from a correctness point of view (it only 
removes bits from VBEout), but, please, *check my reasoning* to make 
sure it doesn't remove from VBEout expressions it shouldn't.
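
To make the reasoning easier to check, here is a standalone sketch of 
the same set operations on plain bitmasks.  The four-block tree, the 
mask values and the name cleanup_vbeout() are made up for 
illustration, and a local variable replaces the scratch use of 
hoist_vbein in the patch; treat it as my reading of the rules, not as 
the patch itself.

```c
#include <assert.h>

#define N_BLOCKS 4

/* Hypothetical dominator tree: block 0 dominates 1, and block 1
   dominates 2 and 3; -1 means "none".  */
static const int first_son[N_BLOCKS] = { 1, 2, -1, -1 };
static const int next_son[N_BLOCKS] = { -1, -1, 3, -1 };

/* One bit per expression; the values are made up.  */
static unsigned vbeout[N_BLOCKS] = { 0x6, 0x6, 0x3, 0x0 };
static const unsigned comp[N_BLOCKS] = { 0x2, 0x0, 0x0, 0x0 };

/* Clean up VBEout of BB and of the blocks below it in the tree.  */
static void
cleanup_vbeout (int bb)
{
  unsigned below = 0;
  int first_p = 1;
  int son;

  assert (bb >= 0 && bb < N_BLOCKS);

  for (son = first_son[bb]; son != -1; son = next_son[son])
    {
      cleanup_vbeout (son);

      if (first_p)
        {
          /* Start from the cleaned-up VBEout of the first son.  */
          below = vbeout[son];
          first_p = 0;
        }
      else
        /* Drop what the other sons can also take.  */
        below &= ~vbeout[son];
    }

  if (first_p)
    /* Rule 1: nothing is ever hoisted to a block that dominates
       nothing, so its VBEout can be wiped.  */
    vbeout[bb] = 0;
  else
    {
      /* Exception to rule 2: keep what BB computes itself.  */
      below &= ~comp[bb];

      /* Rule 2: leave to the son what the son can take.  */
      vbeout[bb] &= ~below;
    }
}
```

Calling cleanup_vbeout (0) on this data wipes VBEout of the leaf 
blocks 2 and 3, keeps {1 2} in block 1 and leaves only expression 1 in 
block 0, because block 0 computes that expression itself.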

VBEout for the testcase in PR33828 is now just as expected:

hoisting vbeinout computation: 2 passes
vbeout(2): n_bits = 5, set = {1 3 }
vbeout(3): n_bits = 5, set = {1 3 }
vbeout(4): n_bits = 5, set = {1 3 }
vbeout(5): n_bits = 5, set = {0 1 2 3 }
vbeout(6): n_bits = 5, set = {}
hoisting vbeout cleanup pass
vbeout(2): n_bits = 5, set = {}
vbeout(3): n_bits = 5, set = {}
vbeout(4): n_bits = 5, set = {1 3 }
vbeout(5): n_bits = 5, set = {}
vbeout(6): n_bits = 5, set = {}

With the cleaned-up vbeout the pass hoists occurrences from bb5 and bb6 
to bb4 instead of [unnecessarily far] to bb2:

PRE/HOIST: end of bb 4, insn 47, copying expression 1 to reg 146

Cleaning up vbeout also leaves less mechanical work to be done in 
hoist_code, speeding up the pass.

Any comments?  OK to check in?

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0005-Also-search-non-immediately-dominated-blocks-for-exp.ChangeLog --]
[-- Type: text/plain, Size: 678 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* basic-block.h (get_dominated_to_depth): Declare.
	* dominance.c (get_dominated_to_depth): New function, use
	get_all_dominated_blocks as a base.
	(get_all_dominated_blocks): Use get_dominated_to_depth.
	
	* gcse.c (cleanup_code_hoist_vbeout): New static function.
	(compute_code_hoist_vbeinout): Use it, add debug print out.
	(compute_code_hoist_data): Compute dominators earlier.
	(hoist_code): Use get_dominated_to_depth.  Update.  Add comment.

	* params.def (PARAM_MAX_HOIST_DEPTH): New parameter to avoid
	quadratic behavior.
	* params.h (MAX_HOIST_DEPTH): New macro.
	* doc/invoke.texi (max-hoist-depth): Document.

[-- Attachment #3: 0005-Also-search-non-immediately-dominated-blocks-for-exp.patch --]
[-- Type: text/plain, Size: 9282 bytes --]

From 5ae56fb9c375bb402f1c86ed46ef3ba1ed09d422 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:24:56 -0700
Subject: [PATCH 05/14] Also search non-immediately dominated blocks for expressions to hoist

---
 gcc/basic-block.h   |    2 +
 gcc/doc/invoke.texi |    6 ++++
 gcc/dominance.c     |   22 ++++++++++++--
 gcc/gcse.c          |   81 +++++++++++++++++++++++++++++++++++++++++++++++++--
 gcc/params.def      |    8 +++++
 gcc/params.h        |    2 +
 6 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index 135c0c2..1bf192d 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -854,6 +854,8 @@ extern VEC (basic_block, heap) *get_dominated_by (enum cdi_direction, basic_bloc
 extern VEC (basic_block, heap) *get_dominated_by_region (enum cdi_direction,
 							 basic_block *,
 							 unsigned);
+extern VEC (basic_block, heap) *get_dominated_to_depth (enum cdi_direction,
+							basic_block, int);
 extern VEC (basic_block, heap) *get_all_dominated_blocks (enum cdi_direction,
 							  basic_block);
 extern void add_to_dominance_info (enum cdi_direction, basic_block);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9e517e9..05ebcf0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8195,6 +8195,12 @@ when @option{-ftree-vectorize} is used.  The number of iterations after
 vectorization needs to be greater than the value specified by this option
 to allow vectorization.  The default value is 0.
 
+@item max-hoist-depth
+The depth of search in the dominator tree for expressions to hoist.
+This is used to avoid quadratic behavior in hoisting algorithm.
+The value of 0 will avoid limiting the search, but may slow down compilation
+of huge functions.  The default value is 30.
+
 @item max-unrolled-insns
 The maximum number of instructions that a loop should have if that loop
 is unrolled, and if the loop is unrolled, it determines how many times
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c2dcf0..7861439 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -783,16 +783,20 @@ get_dominated_by_region (enum cdi_direction dir, basic_block *region,
 }
 
 /* Returns the list of basic blocks including BB dominated by BB, in the
-   direction DIR.  The vector will be sorted in preorder.  */
+   direction DIR up to DEPTH in the dominator tree.  The DEPTH of zero will
+   produce a vector containing all dominated blocks.  The vector will be sorted
+   in preorder.  */
 
 VEC (basic_block, heap) *
-get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+get_dominated_to_depth (enum cdi_direction dir, basic_block bb, int depth)
 {
   VEC(basic_block, heap) *bbs = NULL;
   unsigned i;
+  unsigned next_level_start;
 
   i = 0;
   VEC_safe_push (basic_block, heap, bbs, bb);
+  next_level_start = 1; /* = VEC_length (basic_block, bbs); */
 
   do
     {
@@ -803,12 +807,24 @@ get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
 	   son;
 	   son = next_dom_son (dir, son))
 	VEC_safe_push (basic_block, heap, bbs, son);
+
+      if (i == next_level_start && --depth)
+	next_level_start = VEC_length (basic_block, bbs);
     }
-  while (i < VEC_length (basic_block, bbs));
+  while (i < next_level_start);
 
   return bbs;
 }
 
+/* Returns the list of basic blocks including BB dominated by BB, in the
+   direction DIR.  The vector will be sorted in preorder.  */
+
+VEC (basic_block, heap) *
+get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+{
+  return get_dominated_to_depth (dir, bb, 0);
+}
+
 /* Redirect all edges pointing to BB to TO.  */
 void
 redirect_immediate_dominators (enum cdi_direction dir, basic_block bb,
diff --git a/gcc/gcse.c b/gcc/gcse.c
index 3af1a01..e323db1 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4145,6 +4145,57 @@ free_code_hoist_mem (void)
   free_dominance_info (CDI_DOMINATORS);
 }
 
+/* Cleanup VBEout of BB and its successors in the dominator tree.  */
+
+static void
+cleanup_code_hoist_vbeout (basic_block bb)
+{
+  basic_block son;
+  bool first_p = true;
+
+  /* We follow two rules to clean up VBEout[BB]:
+
+     1. If BB does not have any dominated blocks, nothing will ever be hoisted
+     to BB, so we can just wipe its VBEout clean.
+
+     2. If an expression can be hoisted both to BB and to a *single* successor
+     of BB in the dominator tree, then there is no point of hoisting
+     the expression to BB over BB's successor.  Doing otherwise would
+     unnecessarily extend live ranges.  One exception to this rule is when
+     an expression is computed in BB and available at BB's end, so we need
+     to subtract comp[bb] from the set of expressions that are present in
+     only one of the dominated blocks.  */
+
+  for (son = first_dom_son (CDI_DOMINATORS, bb);
+       son != NULL;
+       son = next_dom_son (CDI_DOMINATORS, son))
+    {
+      cleanup_code_hoist_vbeout (son);
+
+      if (first_p)
+	{
+	  sbitmap_copy (hoist_vbein[bb->index], hoist_vbeout[son->index]);
+	  first_p = false;
+	}
+      else
+	sbitmap_difference (hoist_vbein[bb->index],
+			    hoist_vbein[bb->index], hoist_vbeout[son->index]);
+    }
+
+  if (!first_p)
+    {
+      sbitmap_difference (hoist_vbein[bb->index],
+			  hoist_vbein[bb->index], comp[bb->index]);
+
+      if (sbitmap_any_common_bits (hoist_vbeout[bb->index],
+				   hoist_vbein[bb->index]))
+	sbitmap_difference (hoist_vbeout[bb->index],
+			    hoist_vbeout[bb->index], hoist_vbein[bb->index]);
+    }
+  else
+    sbitmap_zero (hoist_vbeout[bb->index]);
+}
+
 /* Compute the very busy expressions at entry/exit from each block.
 
    An expression is very busy if all paths from a given point
@@ -4203,6 +4254,19 @@ compute_code_hoist_vbeinout (void)
 	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
 	}
     }
+
+  cleanup_code_hoist_vbeout (ENTRY_BLOCK_PTR->next_bb);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "hoisting vbeout cleanup pass\n");
+
+      FOR_EACH_BB (bb)
+        {
+	  fprintf (dump_file, "vbeout(%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	}
+    }
 }
 
 /* Top level routine to do the dataflow analysis needed by code hoisting.  */
@@ -4212,8 +4276,8 @@ compute_code_hoist_data (void)
 {
   compute_local_properties (transp, comp, antloc, &expr_hash_table);
   compute_transpout ();
-  compute_code_hoist_vbeinout ();
   calculate_dominance_info (CDI_DOMINATORS);
+  compute_code_hoist_vbeinout ();
   if (dump_file)
     fprintf (dump_file, "\n");
 }
@@ -4306,7 +4370,8 @@ hoist_code (void)
       int found = 0;
       int insn_inserted_p;
 
-      domby = get_dominated_by (CDI_DOMINATORS, bb);
+      domby = get_dominated_to_depth (CDI_DOMINATORS, bb, MAX_HOIST_DEPTH);
+
       /* Examine each expression that is very busy at the exit of this
 	 block.  These are the potentially hoistable expressions.  */
       for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
@@ -4397,7 +4462,11 @@ hoist_code (void)
 		     it would be safe to compute it at the start of the
 		     dominated block.  Now we have to determine if the
 		     expression would reach the dominated block if it was
-		     placed at the end of BB.  */
+		     placed at the end of BB.
+		     Note: the fact that hoist_exprs has i-th bit set means
+		     that /some/, not necesserilly all, occurences from
+		     the dominated blocks can be hoisted to BB.  Here we check
+		     if a specific occurence can be hoisted to BB.  */
 		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL))
 		    {
 		      struct expr *expr = index_map[i];
@@ -4410,6 +4479,12 @@ hoist_code (void)
 			occr = occr->next;
 
 		      gcc_assert (occr);
+
+		      /* An occurence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			continue;
+
 		      insn = occr->insn;
 		      set = single_set (insn);
 		      gcc_assert (set);
diff --git a/gcc/params.def b/gcc/params.def
index 35650ff..f08d482 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -219,6 +219,14 @@ DEFPARAM(PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION,
 	"gcse-after-reload-critical-fraction",
 	"The threshold ratio of critical edges execution count that permit performing redundancy elimination after reload",
         10, 0, 0)
+/* How deep from a given basic block the dominator tree should be searched
+   for expressions to hoist to the block.  The value of 0 will avoid limiting
+   the search.  */
+DEFPARAM(PARAM_MAX_HOIST_DEPTH,
+	 "max-hoist-depth",
+	 "Maximum depth of search in the dominator tree for expressions to hoist",
+	 30, 0, 0)
+
 /* This parameter limits the number of insns in a loop that will be unrolled,
    and by how much the loop is unrolled.
 
diff --git a/gcc/params.h b/gcc/params.h
index 833fc3b..c0404ca 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -125,6 +125,8 @@ typedef enum compiler_param
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_PARTIAL_FRACTION)
 #define GCSE_AFTER_RELOAD_CRITICAL_FRACTION \
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION)
+#define MAX_HOIST_DEPTH \
+  PARAM_VALUE (PARAM_MAX_HOIST_DEPTH)
 #define MAX_UNROLLED_INSNS \
   PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS)
 #define MAX_SMS_LOOP_NUMBER \
-- 
1.6.2.4


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-23 19:50             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-23 20:06               ` Paolo Bonzini
  2010-06-23 20:30                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  2010-06-24 17:11               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  1 sibling, 1 reply; 94+ messages in thread
From: Paolo Bonzini @ 2010-06-23 20:06 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, Steven Bosscher, gcc-patches

On 06/23/2010 09:08 PM, Maxim Kuvyrkov wrote:
> On 6/22/10 4:24 PM, Maxim Kuvyrkov wrote:
>> On 6/22/10 12:18 AM, Jeff Law wrote:
>>> On 06/21/10 13:28, Steven Bosscher wrote:
>>>>
>>>> I experimented with a patch similar to Maxim's already 2.5 years ago
>>>> (and offered to work on it for CS, but there was no interest in this
>>>> work at the time :-/) See these three Bugzilla comments:
>>>>
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
>>> Right. This is precisely the problem with using immediate dominators.
>>> This doesn't argue that Maxim's approach is wrong or bad for compile
>>> time performance or anything like that. It merely raises the same issue.
>>
>> I agree with Steven that the search is better be constrained, possibly,
>> with a large enough constant. I've added a new parameter and a
>> dominance.c function to return dominated blocks up to depth N in the
>> dominator tree (with N==1 being immediate dominators and N==0 being all
>> dominators).
>
> The attached patch adds max-hoist-depth parameter to control depth of
> descend in dominator tree. The default value of 30 should be enough for
> most practical purposes.

30 seems like "infinite" for most practical cases.  Have you measured 
the code size impact on CSiBE with say 1, 5, 10, 20, 30?

Paolo

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-23 20:06               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Paolo Bonzini
@ 2010-06-23 20:30                 ` Maxim Kuvyrkov
  2010-06-23 21:23                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 20:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jeff Law, Steven Bosscher, gcc-patches

On 6/23/10 11:20 PM, Paolo Bonzini wrote:
> On 06/23/2010 09:08 PM, Maxim Kuvyrkov wrote:
>> On 6/22/10 4:24 PM, Maxim Kuvyrkov wrote:
>>> On 6/22/10 12:18 AM, Jeff Law wrote:
>>>> On 06/21/10 13:28, Steven Bosscher wrote:
>>>>>
>>>>> I experimented with a patch similar to Maxim's already 2.5 years ago
>>>>> (and offered to work on it for CS, but there was no interest in this
>>>>> work at the time :-/) See these three Bugzilla comments:
>>>>>
>>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
>>>> Right. This is precisely the problem with using immediate dominators.
>>>> This doesn't argue that Maxim's approach is wrong or bad for compile
>>>> time performance or anything like that. It merely raises the same
>>>> issue.
>>>
>>> I agree with Steven that the search is better be constrained, possibly,
>>> with a large enough constant. I've added a new parameter and a
>>> dominance.c function to return dominated blocks up to depth N in the
>>> dominator tree (with N==1 being immediate dominators and N==0 being all
>>> dominators).
>>
>> The attached patch adds max-hoist-depth parameter to control depth of
>> descend in dominator tree. The default value of 30 should be enough for
>> most practical purposes.
>
> 30 seems like "infinite" for most practical cases.

Exactly.  The purpose of the parameter is to avoid quadratic behavior on 
weird CFGs.  For normal graphs code hoisting should traverse the whole 
structure.

The problem of excessive code hoisting that arises when looking for 
expressions in the whole dominator subtree [instead of just immediately 
dominated blocks] will be addressed in another patch I'm about to post; 
see the GCSE-complex-constants thread.

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0006-GCSE-complex-constants.patch
  2010-06-16 17:18   ` 0006-GCSE-complex-constants.patch Jeff Law
@ 2010-06-23 20:39     ` Maxim Kuvyrkov
       [not found]       ` <4C2BBEB5.4080209@codesourcery.com>
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 20:39 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1956 bytes --]

On 6/16/10 8:54 PM, Jeff Law wrote:
>> Certain architectures (e.g., ARM) cannot easily operate with
>> constants, they need to emit sequences of several instructions to load
>> constants into registers. The common procedure to do this is to emit a
>> (parallel [(set) (clobber (reg1)) ... (clobber (regN))]) instruction
>> which later splits into several instructions using pseudos (regX) to
>> store intermediate values.
>>
>> Currently PRE and hoist do not GCSE constants, and there is a good
>> reason for that, to avoid increasing register pressure; interestingly,
>> symbol_refs are allowed to be GCSE'ed, is this intentional or by
>> accident?
> It's intentional; a SYMBOL_REF if often be rather expensive. Some
> CONST_INTs can have that same property. One could argue that an
> expensive CONST_INT shouldn't be appearing in RTL, but certainly some
> ports have chosen to handle splitting insns with expensive constants
> later in the pipeline.
>
>>
>> In any case, it seems like a good idea to GCSE constants and
>> symbol_refs that need something beyond a simple (set) to get into a
>> register, and not GCSE them otherwise.
> Rather than triggering this on the PARALLEL it might be better to
> trigger it on the cost of the RTX. Triggering on the PARALLEL looks like
> a hack to me -- IMHO we'd be better off fixing the costing mechanism and
> using costing as the trigger.

Here is a reworked patch that

(a) introduces max_distance property to expressions (counted in 
instructions),

(b) uses RTX cost model to estimate how far an expression can travel 
(the greater the cost, the farther the distance), and

(c) adds two new parameters to tweak the above.
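
To show the shape of (a)-(c), here is a standalone sketch.  The formula 
is an assumption made for illustration only and is not taken from the 
patch; the constants merely reuse the defaults documented in 
invoke.texi, and max_distance_for_cost() and may_travel_p() are 
hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-ins for the gcse-cost-distance-ratio and
   gcse-unrestricted-cost parameters, set to their documented
   defaults.  */
enum { COST_DISTANCE_RATIO = 2, UNRESTRICTED_COST = 3 };

/* Return how many instructions an expression of cost COST may travel.
   A result of 0 means the distance is unrestricted.  The linear
   scaling is an assumption, not the formula used in gcse.c.  */
static int
max_distance_for_cost (int cost)
{
  assert (cost > 0);

  if (cost >= UNRESTRICTED_COST)
    return 0;

  /* The greater the cost, the farther the expression may travel.  */
  return COST_DISTANCE_RATIO * cost;
}

/* Return nonzero if an expression limited to MAX_DISTANCE instructions
   may be moved across DISTANCE instructions.  */
static int
may_travel_p (int max_distance, int distance)
{
  return max_distance == 0 || distance <= max_distance;
}
```

Under these assumptions an expression of cost 1 may move across at most 
2 instructions, while anything of cost 3 or more is not restricted.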

I have yet to do benchmarking on ARM and x86[_64] to find out what the 
optimal parameter values are.  Before starting testing I would like 
to get feedback on the concept and its implementation.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0006-GCSE-simple-expression.ChangeLog --]
[-- Type: text/plain, Size: 784 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (struct expr:max_distance): New field.
	(want_to_gcse_p): Change signature.  Allow GCSE of constants.
	Set max_distance.
	(insert_expr_in_table): Set new max_distance field.
	(hash_scan_set): Update.
	(hoist_expr_reaches_here_p): Stop search after max_distance
	instructions.
	(find_antic_occr_in_bb): New static function.  Use it in ...
	(hoist_code): Calculate sizes of basic block before any changes are
	done.  Pass max_distance to hoist_expr_reaches_here_p.

	* params.def (PARAM_GCSE_COST_DISTANCE_RATIO,)
	(PARAM_GCSE_UNRESTRICTED_COST): New parameters.
	* params.h (GCSE_COST_DISTANCE_RATIO, GCSE_UNRESTRICTED_COST): New
	macros.
	* doc/invoke.texi (gcse-cost-distance-ratio, gcse-unrestricted-cost):
	Document.

[-- Attachment #3: 0006-GCSE-simple-expression.patch --]
[-- Type: text/plain, Size: 18966 bytes --]

From 8a23b3ad5d4cfc4794a9de74482146330183ac17 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:26:34 -0700
Subject: [PATCH 06/14] GCSE simple expression

---
 gcc/doc/invoke.texi |   13 +++
 gcc/gcse.c          |  224 ++++++++++++++++++++++++++++++++++++++++++++-------
 gcc/params.def      |   15 ++++
 gcc/params.h        |    4 +
 4 files changed, 226 insertions(+), 30 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 05ebcf0..ebdd5c1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8195,6 +8195,19 @@ when @option{-ftree-vectorize} is used.  The number of iterations after
 vectorization needs to be greater than the value specified by this option
 to allow vectorization.  The default value is 0.
 
+@item gcse-cost-distance-ratio
+Scaling factor in calculation of maximum distance an expression
+can be moved by GCSE optimizations.  This is currently supported only in
+code hoisting pass.  The bigger the ratio, the more agressive code hoisting
+will be with expressions which have cost less than
+@option{gcse-unrestricted-cost}.  The default value is 2.
+
+@item gcse-unrestricted-cost
+Cost at which GCSE optimizations will not constrain the distance
+an expression can travel.  This is currently supported only in the
+code hoisting pass.  The lower the cost, the more aggressive code hoisting
+will be.  The default value is 3.
+
 @item max-hoist-depth
 The depth of search in the dominator tree for expressions to hoist.
 This is used to avoid quadratic behavior in hoisting algorithm.
diff --git a/gcc/gcse.c b/gcc/gcse.c
index e323db1..45eb7bc 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -295,6 +295,12 @@ struct expr
      The value is the newly created pseudo-reg to record a copy of the
      expression in all the places that reach the redundant copy.  */
   rtx reaching_reg;
+  /* Maximum distance in instructions this expression can travel.
+     We avoid moving simple expressions for more than a few instructions
+     to keep register pressure under control.
+     A value of "0" removes restrictions on how far the expression can
+     travel.  */
+  int max_distance;
 };
 
 /* Occurrence of an expression.
@@ -431,12 +437,12 @@ static void hash_scan_insn (rtx, struct hash_table_d *);
 static void hash_scan_set (rtx, rtx, struct hash_table_d *);
 static void hash_scan_clobber (rtx, rtx, struct hash_table_d *);
 static void hash_scan_call (rtx, rtx, struct hash_table_d *);
-static int want_to_gcse_p (rtx);
+static int want_to_gcse_p (rtx, int *);
 static bool gcse_constant_p (const_rtx);
 static int oprs_unchanged_p (const_rtx, const_rtx, int);
 static int oprs_anticipatable_p (const_rtx, const_rtx);
 static int oprs_available_p (const_rtx, const_rtx);
-static void insert_expr_in_table (rtx, enum machine_mode, rtx, int, int,
+static void insert_expr_in_table (rtx, enum machine_mode, rtx, int, int, int,
 				  struct hash_table_d *);
 static void insert_set_in_table (rtx, rtx, struct hash_table_d *);
 static unsigned int hash_expr (const_rtx, enum machine_mode, int *, int);
@@ -496,7 +502,8 @@ static void alloc_code_hoist_mem (int, int);
 static void free_code_hoist_mem (void);
 static void compute_code_hoist_vbeinout (void);
 static void compute_code_hoist_data (void);
-static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *);
+static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *,
+				      int, int *);
 static int hoist_code (void);
 static int one_code_hoisting_pass (void);
 static rtx process_insert_insn (struct expr *);
@@ -754,8 +761,11 @@ static basic_block current_bb;
    GCSE.  */
 
 static int
-want_to_gcse_p (rtx x)
+want_to_gcse_p (rtx x, int *max_distance_ptr)
 {
+  int cost;
+  int max_distance;
+
 #ifdef STACK_REGS
   /* On register stack architectures, don't GCSE constants from the
      constant pool, as the benefits are often swamped by the overhead
@@ -764,14 +774,42 @@ want_to_gcse_p (rtx x)
     x = avoid_constant_pool_reference (x);
 #endif
 
+  /* GCSE'ing constants:
+
+     We do not specifically distinguish between constant and non-constant
+     expressions in PRE and Hoist.  We use rtx_cost below to limit
+     the maximum distance simple expressions can travel.
+
+     Nevertheless, constants are much easier to GCSE, and, hence,
+     it is easy to overdo the optimizations.  Usually, excessive PRE and
+     Hoisting of constants leads to increased register pressure.
+
+     RA can deal with this by rematerializing some of the constants.
+     Therefore, it is important that the back-end generates sets of constants
+     in a way that allows reload to rematerialize them under high register
+     pressure, i.e., a pseudo register with REG_EQUAL to constant
+     is set only once.  Failing to do so will result in IRA/reload
+     spilling such constants under high register pressure instead of
+     rematerializing them.  */
+
+  cost = rtx_cost (x, SET, optimize_function_for_speed_p (cfun));
+
+  if (cost < COSTS_N_INSNS (GCSE_UNRESTRICTED_COST))
+    {
+      max_distance = GCSE_COST_DISTANCE_RATIO * cost;
+      if (max_distance == 0)
+	return 0;
+    }
+  else
+    max_distance = 0;
+
+  if (max_distance_ptr)
+    *max_distance_ptr = max_distance;
+
   switch (GET_CODE (x))
     {
     case REG:
     case SUBREG:
-    case CONST_INT:
-    case CONST_DOUBLE:
-    case CONST_FIXED:
-    case CONST_VECTOR:
     case CALL:
       return 0;
 
@@ -1089,11 +1127,15 @@ expr_equiv_p (const_rtx x, const_rtx y)
    It is only used if X is a CONST_INT.
 
    ANTIC_P is nonzero if X is an anticipatable expression.
-   AVAIL_P is nonzero if X is an available expression.  */
+   AVAIL_P is nonzero if X is an available expression.
+
+   MAX_DISTANCE is the maximum distance in instructions this expression can
+   be moved.
+*/
 
 static void
 insert_expr_in_table (rtx x, enum machine_mode mode, rtx insn, int antic_p,
-		      int avail_p, struct hash_table_d *table)
+		      int avail_p, int max_distance, struct hash_table_d *table)
 {
   int found, do_not_record_p;
   unsigned int hash;
@@ -1136,7 +1178,10 @@ insert_expr_in_table (rtx x, enum machine_mode mode, rtx insn, int antic_p,
       cur_expr->next_same_hash = NULL;
       cur_expr->antic_occr = NULL;
       cur_expr->avail_occr = NULL;
+      cur_expr->max_distance = max_distance;
     }
+  else
+    gcc_assert (cur_expr->max_distance == max_distance);
 
   /* Now record the occurrence(s).  */
   if (antic_p)
@@ -1237,6 +1282,7 @@ insert_set_in_table (rtx x, rtx insn, struct hash_table_d *table)
       cur_expr->next_same_hash = NULL;
       cur_expr->antic_occr = NULL;
       cur_expr->avail_occr = NULL;
+      cur_expr->max_distance = 0; /* Not used for set_p tables.  */
     }
 
   /* Now record the occurrence.  */
@@ -1306,6 +1352,7 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
     {
       unsigned int regno = REGNO (dest);
       rtx tmp;
+      int max_distance = 0;
 
       /* See if a REG_EQUAL note shows this equivalent to a simpler expression.
 
@@ -1328,7 +1375,7 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 	  && !REG_P (src)
 	  && (table->set_p
 	      ? gcse_constant_p (XEXP (note, 0))
-	      : want_to_gcse_p (XEXP (note, 0))))
+	      : want_to_gcse_p (XEXP (note, 0), NULL)))
 	src = XEXP (note, 0), pat = gen_rtx_SET (VOIDmode, dest, src);
 
       /* Only record sets of pseudo-regs in the hash table.  */
@@ -1343,7 +1390,7 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 	     can't do the same thing at the rtl level.  */
 	  && !can_throw_internal (insn)
 	  /* Is SET_SRC something we want to gcse?  */
-	  && want_to_gcse_p (src)
+	  && want_to_gcse_p (src, &max_distance)
 	  /* Don't CSE a nop.  */
 	  && ! set_noop_p (pat)
 	  /* Don't GCSE if it has attached REG_EQUIV note.
@@ -1367,7 +1414,8 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 	  int avail_p = (oprs_available_p (src, insn)
 			 && ! JUMP_P (insn));
 
-	  insert_expr_in_table (src, GET_MODE (dest), insn, antic_p, avail_p, table);
+	  insert_expr_in_table (src, GET_MODE (dest), insn, antic_p, avail_p,
+				max_distance, table);
 	}
 
       /* Record sets for constant/copy propagation.  */
@@ -1404,7 +1452,7 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 	      do that easily for EH edges so disable GCSE on these for now.  */
 	   && !can_throw_internal (insn)
 	   /* Is SET_DEST something we want to gcse?  */
-	   && want_to_gcse_p (dest)
+	   && want_to_gcse_p (dest, NULL)
 	   /* Don't CSE a nop.  */
 	   && ! set_noop_p (pat)
 	   /* Don't GCSE if it has attached REG_EQUIV note.
@@ -1425,7 +1473,7 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 			     && ! JUMP_P (insn);
 
 	       /* Record the memory expression (DEST) in the hash table.  */
 	       insert_expr_in_table (dest, GET_MODE (dest), insn,
-				     antic_p, avail_p, table);
+				     antic_p, avail_p, 0, table);
              }
       }
@@ -1512,8 +1560,8 @@ dump_hash_table (FILE *file, const char *name, struct hash_table_d *table)
     if (flat_table[i] != 0)
       {
 	expr = flat_table[i];
-	fprintf (file, "Index %d (hash value %d)\n  ",
-		 expr->bitmap_index, hash_val[i]);
+	fprintf (file, "Index %d (hash value %d; max distance %d)\n  ",
+		 expr->bitmap_index, hash_val[i], expr->max_distance);
 	print_rtl (file, expr->expr);
 	fprintf (file, "\n");
       }
@@ -4284,6 +4332,8 @@ compute_code_hoist_data (void)
 
 /* Determine if the expression identified by EXPR_INDEX would
    reach BB unimpared if it was placed at the end of EXPR_BB.
+   Stop the search if the expression would need to be moved more
+   than DISTANCE instructions.
 
    It's unclear exactly what Muchnick meant by "unimpared".  It seems
    to me that the expression must either be computed or transparent in
@@ -4296,7 +4346,8 @@ compute_code_hoist_data (void)
    paths.  */
 
 static int
-hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb, char *visited)
+hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
+			   char *visited, int distance, int *bb_size)
 {
   edge pred;
   edge_iterator ei;
@@ -4309,6 +4360,18 @@ hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
       visited = XCNEWVEC (char, last_basic_block);
     }
 
+  /* Terminate the search if distance, for which EXPR is allowed to move,
+     is exhausted.  */
+  if (distance > 0)
+    {
+      distance -= bb_size[bb->index];
+
+      if (distance <= 0)
+	return 0;
+    }
+  else
+    gcc_assert (distance == 0);
+
   FOR_EACH_EDGE (pred, ei, bb->preds)
     {
       basic_block pred_bb = pred->src;
@@ -4330,8 +4393,8 @@ hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
       else
 	{
 	  visited[pred_bb->index] = 1;
-	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index,
-					   pred_bb, visited))
+	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
+					   visited, distance, bb_size))
 	    break;
 	}
     }
@@ -4341,6 +4404,19 @@ hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
   return (pred == NULL);
 }
 \f
+/* Find anticipatable occurrence of EXPR in BB.  */
+static struct occr *
+find_antic_occr_in_bb (struct expr *expr, basic_block bb)
+{
+  struct occr *occr = expr->antic_occr;
+
+  /* Find the right occurrence of this expression.  */
+  while (occr && BLOCK_FOR_INSN (occr->insn) != bb)
+    occr = occr->next;
+
+  return occr;
+}
+
 /* Actually perform code hoisting.  */
 
 static int
@@ -4351,6 +4427,7 @@ hoist_code (void)
   unsigned int i,j;
   struct expr **index_map;
   struct expr *expr;
+  int *bb_size;
   int changed = 0;
 
   sbitmap_vector_zero (hoist_exprs, last_basic_block);
@@ -4363,6 +4440,28 @@ hoist_code (void)
     for (expr = expr_hash_table.table[i]; expr != NULL; expr = expr->next_same_hash)
       index_map[expr->bitmap_index] = expr;
 
+  bb_size = XCNEWVEC (int, last_basic_block);
+  FOR_EACH_BB (bb)
+    {
+      rtx bb_head;
+      rtx bb_end;
+
+      bb_head = next_nonnote_insn (BB_HEAD (bb));
+      bb_end = BB_END (bb);
+      bb_end = INSN_P (bb_end) ? bb_end : prev_nonnote_insn (bb_end);
+
+      if (bb_head && BLOCK_FOR_INSN (bb_head) == bb
+	  && bb_end && BLOCK_FOR_INSN (bb_end) == bb)
+	{
+	  gcc_assert (INSN_P (bb_head) && INSN_P (bb_end));
+	  bb_size[bb->index] = (DF_INSN_LUID (bb_end) - DF_INSN_LUID (bb_head)
+				+ 1);
+	  gcc_assert (bb_size[bb->index] >= 1);
+	}
+      else
+	bb_size[bb->index] = 0;
+    }
+
   /* Walk over each basic block looking for potentially hoistable
      expressions, nothing gets hoisted from the entry block.  */
   FOR_EACH_BB (bb)
@@ -4391,6 +4490,8 @@ hoist_code (void)
 		 computes the expression.  */
 	      for (j = 0; VEC_iterate (basic_block, domby, j, dominated); j++)
 		{
+		  int max_distance;
+
 		  /* Ignore self dominance.  */
 		  if (bb == dominated)
 		    continue;
@@ -4400,12 +4501,42 @@ hoist_code (void)
 		  if (!TEST_BIT (antloc[dominated->index], i))
 		    continue;
 
+		  expr = index_map[i];
+
+		  max_distance = expr->max_distance;
+		  if (max_distance > 0)
+		    {
+		      struct occr *occr;
+
+		      occr = find_antic_occr_in_bb (expr, dominated);
+		      gcc_assert (occr);
+
+		      /* An occurrence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
+
+		      /* Adjust MAX_DISTANCE to account for the fact that
+			 OCCR won't have to travel all of DOMINATED, but
+			 only part of it.
+			 Note: DF_INSN_LUIDs should be used cautiously once
+			 we start emitting new instructions.  Luckily, we
+			 are sure that occr->insn was present at the time of
+			 df_analyze, so it has valid DF_INSN_LUID.  */
+		      max_distance += (bb_size[dominated->index]
+				       - DF_INSN_LUID (occr->insn));
+		    }
+
 		  /* Note if the expression would reach the dominated block
 		     unimpared if it was placed at the end of BB.
 
 		     Keep track of how many times this expression is hoistable
 		     from a dominated block into BB.  */
-		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL))
+		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
+						 max_distance, bb_size))
 		    hoistable++;
 		}
 
@@ -4448,6 +4579,9 @@ hoist_code (void)
 		 computes the expression.  */
 	      for (j = 0; VEC_iterate (basic_block, domby, j, dominated); j++)
 		{
+		  struct occr *occr = NULL;
+		  int max_distance;
+
 		  /* Ignore self dominance.  */
 		  if (bb == dominated)
 		    continue;
@@ -4458,6 +4592,33 @@ hoist_code (void)
 		  if (!TEST_BIT (antloc[dominated->index], i))
 		    continue;
 
+		  expr = index_map[i];
+
+		  max_distance = expr->max_distance;
+		  if (max_distance > 0)
+		    {
+		      occr = find_antic_occr_in_bb (index_map[i], dominated);
+		      gcc_assert (occr);
+
+		      /* An occurrence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
+
+		      /* Adjust MAX_DISTANCE to account for the fact that
+			 OCCR won't have to travel all of DOMINATED, but
+			 only part of it.
+			 Note: DF_INSN_LUIDs should be used cautiously once
+			 we start emitting new instructions.  Luckily, we
+			 are sure that occr->insn was present at the time of
+			 df_analyze, so it has valid DF_INSN_LUID.  */
+		      max_distance += (bb_size[dominated->index]
+				       - DF_INSN_LUID (occr->insn));
+		    }
+
 		  /* The expression is computed in the dominated block and
 		     it would be safe to compute it at the start of the
 		     dominated block.  Now we have to determine if the
@@ -4467,23 +4628,25 @@ hoist_code (void)
 		     that /some/, not necesserilly all, occurences from
 		     the dominated blocks can be hoisted to BB.  Here we check
 		     if a specific occurence can be hoisted to BB.  */
-		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL))
+		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
+						 max_distance, bb_size))
 		    {
-		      struct expr *expr = index_map[i];
-		      struct occr *occr = expr->antic_occr;
 		      rtx insn;
 		      rtx set;
 
-		      /* Find the right occurrence of this expression.  */
-		      while (BLOCK_FOR_INSN (occr->insn) != dominated && occr)
-			occr = occr->next;
-
-		      gcc_assert (occr);
+		      if (!occr)
+			{
+			  occr = find_antic_occr_in_bb (expr, dominated);
+			  gcc_assert (occr);
+			}
 
 		      /* An occurence might've been already deleted
 			 while processing a dominator of BB.  */
 		      if (occr->deleted_p)
-			continue;
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
 
 		      insn = occr->insn;
 		      set = single_set (insn);
@@ -4519,6 +4682,7 @@ hoist_code (void)
       VEC_free (basic_block, heap, domby);
     }
 
+  free (bb_size);
   free (index_map);
 
   return changed;
diff --git a/gcc/params.def b/gcc/params.def
index f08d482..2329767 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -219,6 +219,21 @@ DEFPARAM(PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION,
 	"gcse-after-reload-critical-fraction",
 	"The threshold ratio of critical edges execution count that permit performing redundancy elimination after reload",
         10, 0, 0)
+
+/* GCSE will use GCSE_COST_DISTANCE_RATIO as a scaling factor
+   to calculate maximum distance for which an expression is allowed to move
+   from its rtx_cost.  */
+DEFPARAM(PARAM_GCSE_COST_DISTANCE_RATIO,
+	 "gcse-cost-distance-ratio",
+	 "Scaling factor in calculation of maximum distance an expression can be moved by GCSE optimizations",
+	 2, 0, 0)
+/* GCSE won't restrict distance for which an expression with rtx_cost of at
+   least COSTS_N_INSNS (GCSE_UNRESTRICTED_COST) is allowed to move.  */
+DEFPARAM(PARAM_GCSE_UNRESTRICTED_COST,
+	 "gcse-unrestricted-cost",
+	 "Cost at which GCSE optimizations will not constrain the distance an expression can travel",
+	 3, 0, 0)
+
 /* How deep from a given basic block the dominator tree should be searched
    for expressions to hoist to the block.  The value of 0 will avoid limiting
    the search.  */
diff --git a/gcc/params.h b/gcc/params.h
index c0404ca..aa96c81 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -125,6 +125,10 @@ typedef enum compiler_param
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_PARTIAL_FRACTION)
 #define GCSE_AFTER_RELOAD_CRITICAL_FRACTION \
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION)
+#define GCSE_COST_DISTANCE_RATIO \
+  PARAM_VALUE (PARAM_GCSE_COST_DISTANCE_RATIO)
+#define GCSE_UNRESTRICTED_COST \
+  PARAM_VALUE (PARAM_GCSE_UNRESTRICTED_COST)
 #define MAX_HOIST_DEPTH \
   PARAM_VALUE (PARAM_MAX_HOIST_DEPTH)
 #define MAX_UNROLLED_INSNS \
-- 
1.6.2.4
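
For readers skimming the patch above, the way want_to_gcse_p turns an
rtx_cost into a distance limit can be sketched in isolation.  This is only
an illustration, not the code in gcse.c: the names prefixed with sketch_
and SKETCH_ are hypothetical stand-ins, and the constants merely mirror the
defaults documented for gcse-cost-distance-ratio (2) and
gcse-unrestricted-cost (3).

```c
#include <assert.h>

/* Hypothetical stand-ins for COSTS_N_INSNS and the two new params.
   The values mirror the defaults documented in the patch.  */
#define SKETCH_COSTS_N_INSNS(n) ((n) * 4)
enum { SKETCH_DISTANCE_RATIO = 2, SKETCH_UNRESTRICTED_COST = 3 };

/* Return 0 if an expression of cost COST should not be GCSE'd at all,
   1 otherwise.  On success *MAX_DISTANCE holds the limit in instructions;
   0 means "no limit", as in the new max_distance field.  */
int
sketch_distance_for_cost (int cost, int *max_distance)
{
  assert (cost >= 0 && max_distance != 0);

  if (cost < SKETCH_COSTS_N_INSNS (SKETCH_UNRESTRICTED_COST))
    {
      /* Cheap expression: the allowed distance scales with the cost.  */
      *max_distance = SKETCH_DISTANCE_RATIO * cost;
      if (*max_distance == 0)
        return 0;
    }
  else
    /* Expensive expression: may travel any distance.  */
    *max_distance = 0;

  return 1;
}
```

Under these assumptions a zero-cost expression is rejected outright, a
one-instruction expression (cost 4) may move 8 instructions, and anything
costing three instructions or more is unrestricted.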


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Update compute_transpout
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (7 preceding siblings ...)
  2010-06-16 16:54 ` Improvements to code hoisting Richard Guenther
@ 2010-06-23 20:42 ` Maxim Kuvyrkov
  2010-06-23 21:57   ` Jeff Law
  2010-06-23 21:20 ` ARM improvements for GCSE Maxim Kuvyrkov
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 20:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 264 bytes --]

This patch addresses 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c5 .

The comment is indeed outdated, and harmless CALL instructions (those that 
do not cause abnormal control flow) are being treated unjustly.

OK to check in?

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0007-Update-compute_transpout.ChangeLog --]
[-- Type: text/plain, Size: 170 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (compute_transpout): Remove outdated comment.  Handle
	call instructions that do not cause abnormal flow.

[-- Attachment #3: 0007-Update-compute_transpout.patch --]
[-- Type: text/plain, Size: 1129 bytes --]

From 960bb5dbe59db9a5555d26d5425074e18f238aec Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Wed, 23 Jun 2010 09:17:50 -0700
Subject: [PATCH 07/14] Update compute_transpout

---
 gcc/gcse.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 45eb7bc..09f8ddf 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4117,12 +4117,18 @@ compute_transpout (void)
 
   FOR_EACH_BB (bb)
     {
-      /* Note that flow inserted a nop at the end of basic blocks that
-	 end in call instructions for reasons other than abnormal
-	 control flow.  */
       if (! CALL_P (BB_END (bb)))
 	continue;
 
+      if (EDGE_COUNT (bb->succs) == 1
+	  && !(EDGE_SUCC (bb, 0)->flags & EDGE_COMPLEX))
+	/* The call insn doesn't involve any special control flow and
+	   just happens to be the last in basic block.  */
+	{
+	  gcc_assert (EDGE_SUCC (bb, 0)->flags & EDGE_FALLTHRU);
+	  continue;
+	}
+
       for (i = 0; i < expr_hash_table.size; i++)
 	for (expr = expr_hash_table.table[i]; expr ; expr = expr->next_same_hash)
 	  if (MEM_P (expr->expr))
-- 
1.6.2.4
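
The new early exit in compute_transpout can be read as a small predicate
on the block's outgoing edges.  The sketch below is an illustration under
simplifying assumptions: struct sketch_block and the SKETCH_EDGE_* flags
are hypothetical, and GCC's real EDGE_COMPLEX covers more flag bits than
the two shown here.

```c
#include <assert.h>

/* Hypothetical flag bits; GCC defines the real EDGE_* values itself.  */
enum
{
  SKETCH_EDGE_FALLTHRU = 1 << 0,
  SKETCH_EDGE_ABNORMAL = 1 << 1,
  SKETCH_EDGE_EH = 1 << 2
};
#define SKETCH_EDGE_COMPLEX (SKETCH_EDGE_ABNORMAL | SKETCH_EDGE_EH)

/* Hypothetical, much simplified view of a basic block.  */
struct sketch_block
{
  int ends_in_call;   /* Nonzero if the last insn is a call.  */
  int n_succs;        /* Number of successor edges (at most 2 here).  */
  int succ_flags[2];  /* Flags of each successor edge.  */
};

/* Return 1 if memory expressions must be removed from the block's
   transparent-out set because of the trailing call, 0 otherwise.  */
int
sketch_call_kills_mems_p (const struct sketch_block *bb)
{
  if (!bb->ends_in_call)
    return 0;

  /* A call that merely happens to end the block, with one ordinary
     successor edge, involves no special control flow.  */
  if (bb->n_succs == 1 && !(bb->succ_flags[0] & SKETCH_EDGE_COMPLEX))
    return 0;

  return 1;
}
```

Only blocks whose trailing call has more than one successor, or a complex
successor edge, keep the conservative treatment.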


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0007-Add-open-ended-comments.patch
  2010-06-16 17:46   ` 0007-Add-open-ended-comments.patch Jeff Law
@ 2010-06-23 20:45     ` Maxim Kuvyrkov
  0 siblings, 0 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 20:45 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

On 6/16/10 9:18 PM, Jeff Law wrote:
>> This patch adds several open-ended comments in gcse.c.  I'll be happy
>> if anyone can answer some of them, in which case I'll check in the
>> answers, rather than questions :).
>>
>> Thank you,
>>
> @@ -3431,7 +3431,9 @@ process_insert_insn (struct expr *expr)
>
> For PRE, we want to verify that the expr is either transparent
> or locally anticipatable in the target block. This check makes
> - no sense for code hoisting. */
> + no sense for code hoisting.
> + ??? We always call this function with (PRE == 0), which makes the checks
> + useless. */
> See pre_edge_insert and search for EDGE_ABNORMAL.
>
> That code went through several iterations and may no longer be
> necessary. So we really should extend the existing comment before the
> call to insert_insn_end_basic_block from pre_edge_insert.

I tried making pre_edge_insert call insert_insn_end_basic_block with 
(pre == 1), and that failed with a segmentation fault due to antloc being 
NULL.

Anyway, the attached patch removes unused checks.

OK to check in?

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0009-Fix-call-to-insert_insn_end_basic_block.ChangeLog --]
[-- Type: text/plain, Size: 177 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (insert_insn_end_basic_block): Update signature, remove
	unused checks.
	(pre_edge_insert, hoist_code): Update.


[-- Attachment #3: 0009-Fix-call-to-insert_insn_end_basic_block.patch --]
[-- Type: text/plain, Size: 3542 bytes --]

From d8595bfdd77c24cb5e5be2f40bbaba89689b1ce4 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Wed, 23 Jun 2010 08:51:10 -0700
Subject: [PATCH 09/14] Fix call to insert_insn_end_basic_block

---
 gcc/gcse.c |   30 ++++++------------------------
 1 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index ff8dbc2..b95be91 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -491,7 +491,7 @@ static void free_pre_mem (void);
 static void compute_pre_data (void);
 static int pre_expr_reaches_here_p (basic_block, struct expr *,
 				    basic_block);
-static void insert_insn_end_basic_block (struct expr *, basic_block, int);
+static void insert_insn_end_basic_block (struct expr *, basic_block);
 static void pre_insert_copy_insn (struct expr *, rtx);
 static void pre_insert_copies (void);
 static int pre_delete (void);
@@ -3453,14 +3453,10 @@ process_insert_insn (struct expr *expr)
 
 /* Add EXPR to the end of basic block BB.
 
-   This is used by both the PRE and code hoisting.
-
-   For PRE, we want to verify that the expr is either transparent
-   or locally anticipatable in the target block.  This check makes
-   no sense for code hoisting.  */
+   This is used by both the PRE and code hoisting.  */
 
 static void
-insert_insn_end_basic_block (struct expr *expr, basic_block bb, int pre)
+insert_insn_end_basic_block (struct expr *expr, basic_block bb)
 {
   rtx insn = BB_END (bb);
   rtx new_insn;
@@ -3487,12 +3483,6 @@ insert_insn_end_basic_block (struct expr *expr, basic_block bb, int pre)
 #ifdef HAVE_cc0
       rtx note;
 #endif
-      /* It should always be the case that we can put these instructions
-	 anywhere in the basic block with performing PRE optimizations.
-	 Check this.  */
-      gcc_assert (!NONJUMP_INSN_P (insn) || !pre
-		  || TEST_BIT (antloc[bb->index], expr->bitmap_index)
-		  || TEST_BIT (transp[bb->index], expr->bitmap_index));
 
       /* If this is a jump table, then we can't insert stuff here.  Since
 	 we know the previous real insn must be the tablejump, we insert
@@ -3529,15 +3519,7 @@ insert_insn_end_basic_block (struct expr *expr, basic_block bb, int pre)
       /* Keeping in mind targets with small register classes and parameters
          in registers, we search backward and place the instructions before
 	 the first parameter is loaded.  Do this for everyone for consistency
-	 and a presumption that we'll get better code elsewhere as well.
-
-	 It should always be the case that we can put these instructions
-	 anywhere in the basic block with performing PRE optimizations.
-	 Check this.  */
-
-      gcc_assert (!pre
-		  || TEST_BIT (antloc[bb->index], expr->bitmap_index)
-		  || TEST_BIT (transp[bb->index], expr->bitmap_index));
+	 and a presumption that we'll get better code elsewhere as well.  */
 
       /* Since different machines initialize their parameter registers
 	 in different orders, assume nothing.  Collect the set of all
@@ -3634,7 +3616,7 @@ pre_edge_insert (struct edge_list *edge_list, struct expr **index_map)
 			   now.  */
 
 			if (eg->flags & EDGE_ABNORMAL)
-			  insert_insn_end_basic_block (index_map[j], bb, 0);
+			  insert_insn_end_basic_block (index_map[j], bb);
 			else
 			  {
 			    insn = process_insert_insn (index_map[j]);
@@ -4685,7 +4667,7 @@ hoist_code (void)
 
 		      if (!insn_inserted_p)
 			{
-			  insert_insn_end_basic_block (index_map[i], bb, 0);
+			  insert_insn_end_basic_block (index_map[i], bb);
 			  insn_inserted_p = 1;
 			}
 		    }
-- 
1.6.2.4


^ permalink raw reply	[flat|nested] 94+ messages in thread

* ARM improvements for GCSE
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (8 preceding siblings ...)
  2010-06-23 20:42 ` Update compute_transpout Maxim Kuvyrkov
@ 2010-06-23 21:20 ` Maxim Kuvyrkov
  2010-06-23 21:22   ` Maxim Kuvyrkov
                     ` (3 more replies)
  2010-07-01 11:05 ` 0008-Don-t-kill-generated-expressions-in-hoist_expr_reach Maxim Kuvyrkov
  2010-07-27 21:21 ` Improvements to code hoisting Maxim Kuvyrkov
  11 siblings, 4 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 21:20 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

Richard,

I will post ARM-specific changes in this subthread of GCSE improvements.

I have yet to do the final benchmarking and tuning of the code hoisting 
changes, but preliminary results show a size decrease of about 0.5% when 
compiling with -Os for Thumb1 mode and an even greater improvement for 
Thumb1 PIC code.

Thumb2 and ARM code size also goes down but to a lesser degree.  I 
focused on Thumb1 mode in this work.

I hope you'll have time to review ARM-specific changes.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: ARM improvements for GCSE
  2010-06-23 21:20 ` ARM improvements for GCSE Maxim Kuvyrkov
@ 2010-06-23 21:22   ` Maxim Kuvyrkov
  2010-06-24 11:24     ` Richard Earnshaw
  2010-06-23 21:30   ` Fix thumb1 size cost of small constants Maxim Kuvyrkov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 21:22 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 269 bytes --]

This patch improves handling of "J" and "K" constants.

If a pseudo is assigned a constant and is set only once, IRA/reload can 
rematerialize it to relieve high register pressure.

OK to check in?

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0010-Allow-reload-rematerialize-J-and-K-constants.ChangeLog --]
[-- Type: text/plain, Size: 213 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/arm/arm.c (thumb1_size_rtx_costs): Add cost for "J" constants.
	* config/arm/arm.md (define_split "J", define_split "K"): Make
	IRA/reload friendly.

[-- Attachment #3: 0010-Allow-reload-rematerialize-J-and-K-constants.patch --]
[-- Type: text/plain, Size: 2647 bytes --]

From 10bc95586c6fc745b2cddb959d61e97cb190a7e9 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:32:25 -0700
Subject: [PATCH 10/14] Allow reload rematerialize J and K constants

---
 gcc/config/arm/arm.c  |    4 ++++
 gcc/config/arm/arm.md |   19 ++++++++++++-------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5057bac..5671587 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6931,6 +6931,10 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
         {
           if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
             return 0;
+	  /* See split "TARGET_THUMB1 && satisfies_constraint_J".  */
+	  if (INTVAL (x) >= -255 && INTVAL (x) <= -1)
+            return COSTS_N_INSNS (2);
+	  /* See split "TARGET_THUMB1 && satisfies_constraint_K".  */
           if (thumb_shiftable_const (INTVAL (x)))
             return COSTS_N_INSNS (2);
           return COSTS_N_INSNS (3);
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 628bd62..b6cca49 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5191,17 +5191,21 @@
   [(set (match_operand:SI 0 "register_operand" "")
 	(match_operand:SI 1 "const_int_operand" ""))]
   "TARGET_THUMB1 && satisfies_constraint_J (operands[1])"
-  [(set (match_dup 0) (match_dup 1))
-   (set (match_dup 0) (neg:SI (match_dup 0)))]
-  "operands[1] = GEN_INT (- INTVAL (operands[1]));"
+  [(set (match_dup 2) (match_dup 1))
+   (set (match_dup 0) (neg:SI (match_dup 2)))]
+  "
+  {
+    operands[1] = GEN_INT (- INTVAL (operands[1]));
+    operands[2] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  }"
 )
 
 (define_split 
   [(set (match_operand:SI 0 "register_operand" "")
 	(match_operand:SI 1 "const_int_operand" ""))]
   "TARGET_THUMB1 && satisfies_constraint_K (operands[1])"
-  [(set (match_dup 0) (match_dup 1))
-   (set (match_dup 0) (ashift:SI (match_dup 0) (match_dup 2)))]
+  [(set (match_dup 2) (match_dup 1))
+   (set (match_dup 0) (ashift:SI (match_dup 2) (match_dup 3)))]
   "
   {
     unsigned HOST_WIDE_INT val = INTVAL (operands[1]) & 0xffffffffu;
@@ -5212,12 +5216,13 @@
       if ((val & (mask << i)) == val)
         break;
 
-    /* Shouldn't happen, but we don't want to split if the shift is zero.  */
+    /* Don't split if the shift is zero.  */
     if (i == 0)
       FAIL;
 
     operands[1] = GEN_INT (val >> i);
-    operands[2] = GEN_INT (i);
+    operands[2] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+    operands[3] = GEN_INT (i);
   }"
 )
 
-- 
1.6.2.4
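
The arithmetic behind the two splits can be sketched outside the machine
description.  This is an illustration only: the sketch_ functions are
hypothetical, they model just the constant decomposition (not the new
temporary pseudo), and the "val < 256" early return is an assumption made
here because such values need no split.

```c
#include <assert.h>
#include <stdint.h>

/* "J"-style constant: VAL in -255..-1 is loaded as -VAL and then negated.
   Return 1 and store the positive value to load in *BASE, else 0.  */
int
sketch_split_negated_const (int32_t val, int32_t *base)
{
  assert (base != 0);
  if (val < -255 || val > -1)
    return 0;
  *base = -val;
  return 1;
}

/* "K"-style constant: VAL is an 8-bit value shifted left by 1..24 bits.
   Return 1 and store the 8-bit value and the shift count, else 0.  */
int
sketch_split_shifted_const (uint32_t val, uint32_t *base, int *shift)
{
  const uint32_t mask = 0xff;
  int i;

  assert (base != 0 && shift != 0);
  if (val < 256)
    return 0;  /* Fits a plain move; nothing to split.  */

  for (i = 1; i < 25; i++)
    if ((val & (mask << i)) == val)
      {
        *base = val >> i;
        *shift = i;
        return 1;
      }
  return 0;
}
```

With the patch, the first instruction of each pair writes a fresh pseudo
when one can be created, so the constant load is a single set that reload
can rematerialize.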


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-23 20:30                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-23 21:23                   ` Steven Bosscher
  2010-06-23 21:30                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Steven Bosscher @ 2010-06-23 21:23 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Paolo Bonzini, Jeff Law, gcc-patches

On Wed, Jun 23, 2010 at 9:52 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
>>> The attached patch adds max-hoist-depth parameter to control depth of
>>> descend in dominator tree. The default value of 30 should be enough for
>>> most practical purposes.
>>
>> 30 seems like "infinite" for most practical cases.
>
> Exactly.  The purpose of the parameter is to avoid quadratic behavior on
> weird CFGs.  For normal graphs code hoisting should traverse the whole
> structure.
>
> The problem of excessive code hoisting that arises when looking for
> expressions in the whole dominator subtree [instead of just immediately
> dominated blocks] will be addressed in another patch I'm about to post; see
> the GCSE-complex-constants thread.

Actually, the parameter also makes a difference for code size. When I
experimented with this a long time ago, I got the best CSiBE size
scores with a depth of 5 or 6.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-23 21:23                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-23 21:30                     ` Maxim Kuvyrkov
  0 siblings, 0 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 21:30 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Paolo Bonzini, Jeff Law, gcc-patches

On 6/24/10 12:49 AM, Steven Bosscher wrote:
...
> Actually, the parameter also makes a difference for code size. When I
> experimented with this a long time ago, I got the best CSiBE size
> scores with a depth of 5 or 6.

Yes, I got similar results when I tested this patch without the
max_distance restriction on expressions.  A depth of 5 or 6 avoids nasty
regressions in code with high register pressure; with a greater depth all
small expressions get hoisted to the first basic block and register
pressure skyrockets.

However, it is still better to address this problem by restricting 
individual expressions with max_distance.  Using depth of the search 
seems like too big a hammer.
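
To make the depth knob concrete: it bounds how far below a block the
search for hoistable expressions descends, whatever the expression is.
Below is a minimal sketch of such a level-by-level walk on a toy
array-based dominator tree.  struct dom_tree and dominated_to_depth are
hypothetical names used only for illustration; this is not GCC's API.

```c
#define MAX_BLOCKS 16

/* Toy dominator tree: parent[i] is the immediate dominator of block i,
   or -1 for the root.  A hypothetical stand-in for real dominance info.  */
struct dom_tree
{
  int n_blocks;
  int parent[MAX_BLOCKS];
};

/* Collect ROOT and the blocks it dominates into OUT, descending at most
   DEPTH levels below ROOT.  A DEPTH of zero means no limit.  Returns the
   number of blocks written; OUT must have room for MAX_BLOCKS entries.  */
static int
dominated_to_depth (const struct dom_tree *t, int root, int depth, int *out)
{
  int len = 0;
  int i = 0;
  int next_level_start;

  out[len++] = root;
  next_level_start = 1;

  do
    {
      int bb = out[i++];
      int son;

      /* Append the dominator-tree children of BB.  */
      for (son = 0; son < t->n_blocks; son++)
	if (t->parent[son] == bb)
	  out[len++] = son;

      /* At the end of a level, open the next one unless the depth budget
	 is used up.  Starting from zero, DEPTH only goes negative, so the
	 walk is never cut short.  */
      if (i == next_level_start && --depth)
	next_level_start = len;
    }
  while (i < next_level_start);

  return len;
}
```

With a depth of 1 only the block and its immediate children in the
dominator tree are searched; a depth of 0 searches the whole subtree.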

Regards,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Fix thumb1 size cost of small constants
  2010-06-23 21:20 ` ARM improvements for GCSE Maxim Kuvyrkov
  2010-06-23 21:22   ` Maxim Kuvyrkov
@ 2010-06-23 21:30   ` Maxim Kuvyrkov
  2010-06-24 11:28     ` Richard Earnshaw
  2010-06-23 21:35   ` Wrap calculation of PIC address into a single instruction Maxim Kuvyrkov
  2010-07-17 16:52   ` Tune hoisting for ARM Maxim Kuvyrkov
  3 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 21:30 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 443 bytes --]

This patch fixes thumb1 size cost of small constants.

Currently, the cost of a SET of a small constant is zero, which is odd
considering that it still takes one instruction to do the operation.
The code for Thumb2 and ARM modes returns COSTS_N_INSNS (1) for a similar
case, so the patch makes the Thumb1 cost agree with the ARM and Thumb2 costs.
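
As a rough illustration of the effect, here is a simplified standalone
model of the two cases touched by the hunk.  The function name is
hypothetical, a plain long stands in for the CONST_INT, and constants
outside the two ranges shown in the patch are deliberately not modelled.

```c
/* GCC expresses RTX costs in units where one instruction costs 4.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model of the size cost of (set (reg) (const_int VAL))
   on Thumb-1 after the patch.  Returns -1 for constants this sketch
   does not model.  */
static int
thumb1_const_set_size_cost (long val)
{
  /* Fits an 8-bit immediate: one move.  Before the patch this case
     returned 0.  */
  if (val >= 0 && val < 256)
    return COSTS_N_INSNS (1);

  /* Small negative constant: handled by the "J" split, two insns.  */
  if (val >= -255 && val <= -1)
    return COSTS_N_INSNS (2);

  return -1;
}
```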

OK to check in?

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0014-Fix-thumb1-size-cost-of-small-constants.ChangeLog --]
[-- Type: text/plain, Size: 130 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/arm/arm.c (thumb1_size_rtx_costs): Fix cost of simple
	constants.

[-- Attachment #3: 0014-Fix-thumb1-size-cost-of-small-constants.patch --]
[-- Type: text/plain, Size: 860 bytes --]

From 0e6dd3f28b8d2c9142f3d5d31b3d0684785b4ce0 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Wed, 23 Jun 2010 07:20:14 -0700
Subject: [PATCH 14/14] Fix thumb1 size cost of small constants

---
 gcc/config/arm/arm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d846557..b2186d8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6941,7 +6941,7 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
       if (outer == SET)
         {
           if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
-            return 0;
+            return COSTS_N_INSNS (1);
 	  /* See split "TARGET_THUMB1 && satisfies_constraint_J".  */
 	  if (INTVAL (x) >= -255 && INTVAL (x) <= -1)
             return COSTS_N_INSNS (2);
-- 
1.6.2.4


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Wrap calculation of PIC address into a single instruction
  2010-06-23 21:20 ` ARM improvements for GCSE Maxim Kuvyrkov
  2010-06-23 21:22   ` Maxim Kuvyrkov
  2010-06-23 21:30   ` Fix thumb1 size cost of small constants Maxim Kuvyrkov
@ 2010-06-23 21:35   ` Maxim Kuvyrkov
  2010-06-23 21:38     ` Andrew Pinski
                       ` (2 more replies)
  2010-07-17 16:52   ` Tune hoisting for ARM Maxim Kuvyrkov
  3 siblings, 3 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 21:35 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 475 bytes --]

This patch enables optimizations, particularly GCSE, to handle calculation
of PIC addresses.  GCSE tracks only single instructions, so it can't
handle a two-instruction calculation of a PIC address.

With this patch, calculations of PIC addresses are represented as single
instructions, allowing GCSE to eliminate all but the first address
calculation for global variables.

Any comments?  OK to check in?

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0011-Wrap-calculation-of-PIC-address-into-a-single-instru.ChangeLog --]
[-- Type: text/plain, Size: 453 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/arm/arm.c (legitimize_pic_address): Use calculate_pic_address.
	Imitate the effect of gen_const_mem() on the memory reference.
	(will_be_in_index_register): New static function.
	(arm_legitimate_address_outer_p, thumb2_legitimate_address_p,)
	(thumb1_legitimate_address_p): Use it.
	* config/arm/arm.md (calculate_pic_address): Define expand and split
	to emit calculation of PIC address.

[-- Attachment #3: 0011-Wrap-calculation-of-PIC-address-into-a-single-instru.patch --]
[-- Type: text/plain, Size: 5898 bytes --]

From a48037bce0fefdebb5220b39e718dc186b2d7f69 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Fri, 18 Jun 2010 14:01:32 -0700
Subject: [PATCH 11/14] Wrap calculation of PIC address into a single instruction

---
 gcc/config/arm/arm.c  |   43 +++++++++++++++++++++++++++----------------
 gcc/config/arm/arm.md |   28 ++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5671587..d846557 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -4897,17 +4897,13 @@ legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg)
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
     {
-      rtx pic_ref, address;
       rtx insn;
 
       if (reg == 0)
 	{
 	  gcc_assert (can_create_pseudo_p ());
 	  reg = gen_reg_rtx (Pmode);
-	  address = gen_reg_rtx (Pmode);
 	}
-      else
-	address = reg;
 
       /* VxWorks does not impose a fixed gap between segments; the run-time
 	 gap can be different from the object-file gap.  We therefore can't
@@ -4923,18 +4919,21 @@ legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg)
 	insn = arm_pic_static_addr (orig, reg);
       else
 	{
+	  rtx pat;
+	  rtx mem;
+
 	  /* If this function doesn't have a pic register, create one now.  */
 	  require_pic_register ();
 
-	  if (TARGET_32BIT)
-	    emit_insn (gen_pic_load_addr_32bit (address, orig));
-	  else /* TARGET_THUMB1 */
-	    emit_insn (gen_pic_load_addr_thumb1 (address, orig));
+	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
 
-	  pic_ref = gen_const_mem (Pmode,
-				   gen_rtx_PLUS (Pmode, cfun->machine->pic_reg,
-					         address));
-	  insn = emit_move_insn (reg, pic_ref);
+	  /* Make the MEM as close to a constant as possible.  */
+	  mem = SET_SRC (pat);
+	  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
+	  MEM_READONLY_P (mem) = 1;
+	  MEM_NOTRAP_P (mem) = 1;
+
+	  insn = emit_insn (pat);
 	}
 
       /* Put a REG_EQUAL note on this insn, so that it can be optimized
@@ -5214,6 +5213,15 @@ pcrel_constant_p (rtx x)
   return FALSE;
 }
 
+/* Return true if X will surely end up in an index register after the first
+   splitting pass.  */
+static bool
+will_be_in_index_register (const_rtx x)
+{
+  /* arm.md: calculate_pic_address will split this into a register.  */
+  return GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_PIC_SYM;
+}
+
 /* Return nonzero if X is a valid ARM state address operand.  */
 int
 arm_legitimate_address_outer_p (enum machine_mode mode, rtx x, RTX_CODE outer,
@@ -5271,8 +5279,9 @@ arm_legitimate_address_outer_p (enum machine_mode mode, rtx x, RTX_CODE outer,
       rtx xop1 = XEXP (x, 1);
 
       return ((arm_address_register_rtx_p (xop0, strict_p)
-	       && GET_CODE(xop1) == CONST_INT
-	       && arm_legitimate_index_p (mode, xop1, outer, strict_p))
+	       && ((GET_CODE(xop1) == CONST_INT
+		    && arm_legitimate_index_p (mode, xop1, outer, strict_p))
+		   || (!strict_p && will_be_in_index_register (xop1))))
 	      || (arm_address_register_rtx_p (xop1, strict_p)
 		  && arm_legitimate_index_p (mode, xop0, outer, strict_p)));
     }
@@ -5358,7 +5367,8 @@ thumb2_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p)
       rtx xop1 = XEXP (x, 1);
 
       return ((arm_address_register_rtx_p (xop0, strict_p)
-	       && thumb2_legitimate_index_p (mode, xop1, strict_p))
+	       && (thumb2_legitimate_index_p (mode, xop1, strict_p)
+		   || (!strict_p && will_be_in_index_register (xop1))))
 	      || (arm_address_register_rtx_p (xop1, strict_p)
 		  && thumb2_legitimate_index_p (mode, xop0, strict_p)));
     }
@@ -5661,7 +5671,8 @@ thumb1_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p)
 	  && XEXP (x, 0) != frame_pointer_rtx
 	  && XEXP (x, 1) != frame_pointer_rtx
 	  && thumb1_index_register_rtx_p (XEXP (x, 0), strict_p)
-	  && thumb1_index_register_rtx_p (XEXP (x, 1), strict_p))
+	  && (thumb1_index_register_rtx_p (XEXP (x, 1), strict_p)
+	      || (!strict_p && will_be_in_index_register (XEXP (x, 1)))))
 	return 1;
 
       /* REG+const has 5-7 bit offset for non-SP registers.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b6cca49..534bfc7 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5231,6 +5231,34 @@
 ;; we use an unspec.  The offset will be loaded from a constant pool entry,
 ;; since that is the only type of relocation we can use.
 
+;; Wrap calculation of the whole PIC address in a single pattern for the
+;; benefit of optimizers, particularly, PRE and HOIST.  Calculation of
+;; a PIC address involves two loads from memory, so we want to CSE it
+;; as often as possible.
+;; This pattern will be split into one of the pic_load_addr_* patterns
+;; and a move after GCSE optimizations.
+;;
+;; Note: Update arm.c: legitimize_pic_address() when changing this pattern.
+(define_expand "calculate_pic_address"
+  [(set (match_operand:SI 0 "register_operand" "")
+	(mem:SI (plus:SI (match_operand:SI 1 "register_operand" "")
+			 (unspec:SI [(match_operand:SI 2 "" "")]
+				    UNSPEC_PIC_SYM))))]
+  "flag_pic"
+)
+
+;; Split calculate_pic_address into pic_load_addr_* and a move.
+(define_split
+  [(set (match_operand:SI 0 "register_operand" "")
+	(mem:SI (plus:SI (match_operand:SI 1 "register_operand" "")
+			 (unspec:SI [(match_operand:SI 2 "" "")]
+				    UNSPEC_PIC_SYM))))]
+  "flag_pic"
+  [(set (match_dup 3) (unspec:SI [(match_dup 2)] UNSPEC_PIC_SYM))
+   (set (match_dup 0) (mem:SI (plus:SI (match_dup 1) (match_dup 3))))]
+  "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];"
+)
+
 ;; The rather odd constraints on the following are to force reload to leave
 ;; the insn alone, and to force the minipool generation pass to then move
 ;; the GOT symbol to memory.
-- 
1.6.2.4


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Wrap calculation of PIC address into a single instruction
  2010-06-23 21:35   ` Wrap calculation of PIC address into a single instruction Maxim Kuvyrkov
@ 2010-06-23 21:38     ` Andrew Pinski
  2010-06-23 21:41     ` Steven Bosscher
  2010-07-01 12:40     ` Richard Earnshaw
  2 siblings, 0 replies; 94+ messages in thread
From: Andrew Pinski @ 2010-06-23 21:38 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Richard Earnshaw, gcc-patches

On Wed, Jun 23, 2010 at 2:19 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> This patch enables optimizations, particularly GCSE, to handle calculation
> of PIC addresses.  GCSE tracks only single instructions, so it can't handle
> a two-instruction calculation of a PIC address.

The REG_EQUAL note on the second instruction should have been enough
to solve this issue.  This is how it is optimized on PowerPC and some
other targets.  Why is it not working for ARM?

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Wrap calculation of PIC address into a single instruction
  2010-06-23 21:35   ` Wrap calculation of PIC address into a single instruction Maxim Kuvyrkov
  2010-06-23 21:38     ` Andrew Pinski
@ 2010-06-23 21:41     ` Steven Bosscher
  2010-06-23 22:23       ` Maxim Kuvyrkov
  2010-07-01 12:40     ` Richard Earnshaw
  2 siblings, 1 reply; 94+ messages in thread
From: Steven Bosscher @ 2010-06-23 21:41 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Richard Earnshaw, gcc-patches

On Wed, Jun 23, 2010 at 11:19 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> This patch enables optimizations, particularly GCSE, to handle calculation
> of PIC addresses.  GCSE tracks only single instructions, so it can't handle
> a two-instruction calculation of a PIC address.
>
> With this patch, calculations of PIC addresses are represented as single
> instructions, allowing GCSE to eliminate all but the first address
> calculation for global variables.
>
> Any comments?

Yes. This is what we added GCSE's ability to eliminate redundancies
from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note,
GCSE is (or should be) already able to eliminate redundant address
calculations.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Update compute_transpout
  2010-06-23 20:42 ` Update compute_transpout Maxim Kuvyrkov
@ 2010-06-23 21:57   ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-23 21:57 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 06/23/10 14:16, Maxim Kuvyrkov wrote:
> This patch addresses 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c5 .
>
> The comment is indeed outdated and harmless CALL instructions are 
> being unjustly treated.
>
> OK to check in?
>
OK.  Thanks,

Jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0004-Set-pseudos-only-once.patch
  2010-06-22 12:34     ` 0004-Set-pseudos-only-once.patch Maxim Kuvyrkov
@ 2010-06-23 22:01       ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-23 22:01 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

On 06/22/10 06:08, Maxim Kuvyrkov wrote:
> On 6/21/10 9:55 PM, Jeff Law wrote:
>> On 06/16/10 09:57, Maxim Kuvyrkov wrote:
>>> IRA and reload have a special relationship with pseudos that are set only
>>> once. When such pseudos are initialized with constants or instances that
>>> can be considered constant across the function, reload can
>>> rematerialize them instead of spilling or apply other optimizations.
>>>
>>> This patch makes sure that we don't unnecessarily set same pseudo more
>>> than once.
>>>
>>> OK to apply?
>> OK. Thanks,
>
> Thank you for reviewing this and other patches.
>
> There is similar code in gcse.c:pre_delete():
>
>         /* Create a pseudo-reg to store the result of reaching
>            expressions into.  Get the mode for the new pseudo from
>            the mode of the original destination pseudo.  */
>         if (expr->reaching_reg == NULL)
>           expr->reaching_reg = gen_reg_rtx_and_attrs (SET_DEST (set));
>
>         gcse_emit_move_after (expr->reaching_reg, SET_DEST (set), insn);
>         delete_insn (insn);
>         occr->deleted_p = 1;
>         changed = 1;
>         gcse_subst_count++;
>
> From a quick look at PRE, it seems that creating a new pseudo for PRE is
> also correct.  Do you know off-hand if this indeed is the case?
Creating a new pseudo for PRE would be good; however, it's not 
immediately clear to me if creating a new pseudo is safe given the way 
deletion & insertion work for our implementation of PRE.

jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Wrap calculation of PIC address into a single instruction
  2010-06-23 21:41     ` Steven Bosscher
@ 2010-06-23 22:23       ` Maxim Kuvyrkov
  2010-06-24 11:56         ` Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-23 22:23 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Richard Earnshaw, gcc-patches, Andrew Pinski

On 6/24/10 1:22 AM, Steven Bosscher wrote:
> On Wed, Jun 23, 2010 at 11:19 PM, Maxim Kuvyrkov<maxim@codesourcery.com>  wrote:
>> This patch enables optimizations, particularly GCSE, to handle calculation
>> of PIC addresses.  GCSE tracks only single instructions, so it can't handle
>> a two-instruction calculation of a PIC address.
>>
>> With this patch, calculations of PIC addresses are represented as single
>> instructions, allowing GCSE to eliminate all but the first address
>> calculation for global variables.
>>
>> Any comments?
>
> Yes. This is what we added GCSE's ability to eliminate redundancies
> from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note,
> GCSE is (or should be) already able to eliminate redundant address
> calculations.

You know, it turns out GCSE can eliminate calculation of PIC addresses 
on ARM.  When I started working on improving code hoisting the example 
with PIC address wasn't fully optimized without this patch.

Now, apparently, one of the other GCSE improvements (VBEout computation, 
probably) fixed the underlying problem.

Richard, unless you think the patch may be valuable for some other 
reason, I'm dropping it.

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: ARM improvements for GCSE
  2010-06-23 21:22   ` Maxim Kuvyrkov
@ 2010-06-24 11:24     ` Richard Earnshaw
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Earnshaw @ 2010-06-24 11:24 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches


On Thu, 2010-06-24 at 00:49 +0400, Maxim Kuvyrkov wrote:
> This patch improves handling of "J" and "K" constants.
> 
> If a pseudo is assigned a constant and is set only once, IRA/reload can
> rematerialize it to relieve high register pressure.
> 
> OK to check in?


OK.

R.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Fix thumb1 size cost of small constants
  2010-06-23 21:30   ` Fix thumb1 size cost of small constants Maxim Kuvyrkov
@ 2010-06-24 11:28     ` Richard Earnshaw
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Earnshaw @ 2010-06-24 11:28 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches


On Thu, 2010-06-24 at 01:05 +0400, Maxim Kuvyrkov wrote:
> This patch fixes thumb1 size cost of small constants.
> 
> Currently, the cost of a SET of a small constant is zero, which is odd
> considering that it still takes one instruction to do the operation.
> The code for Thumb2 and ARM modes returns COSTS_N_INSNS (1) for a similar
> case, so the patch makes the Thumb1 cost agree with the ARM and Thumb2 costs.
> 
> OK to check in?
> 
> Thank you,

OK.

R.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Wrap calculation of PIC address into a single instruction
  2010-06-23 22:23       ` Maxim Kuvyrkov
@ 2010-06-24 11:56         ` Maxim Kuvyrkov
  2010-06-29 19:18           ` Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-24 11:56 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Richard Earnshaw, gcc-patches, Andrew Pinski

On 6/24/10 1:50 AM, Maxim Kuvyrkov wrote:
> On 6/24/10 1:22 AM, Steven Bosscher wrote:
>> On Wed, Jun 23, 2010 at 11:19 PM, Maxim
>> Kuvyrkov<maxim@codesourcery.com> wrote:
>>> This patch enables optimizations, particularly GCSE, to handle calculation
>>> of PIC addresses. GCSE tracks only single instructions, so it can't handle
>>> a two-instruction calculation of a PIC address.
>>>
>>> With this patch, calculations of PIC addresses are represented as single
>>> instructions, allowing GCSE to eliminate all but the first address
>>> calculation for global variables.
>>>
>>> Any comments?
>>
>> Yes. This is what we added GCSE's ability to eliminate redundancies
>> from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note,
>> GCSE is (or should be) already able to eliminate redundant address
>> calculations.
>
> You know, it turns out GCSE can eliminate calculation of PIC addresses
> on ARM. When I started working on improving code hoisting the example
> with PIC address wasn't fully optimized without this patch.

It was late at night when I checked the generated code.  Although GCSE
of PIC addresses for ARM is now better, it is still not as good as with
this patch.

GCSE cannot use the (REG_EQUAL (symbol_ref)) note on the second instruction
because can_assign_to_reg_without_clobbers returns false for symbol_ref
when compiling PIC code.  A (symbol_ref) is not a
LEGITIMATE_PIC_OPERAND_P, so it is not a general_operand either.  The
second check in can_assign_to_reg_without_clobbers returns false as (set
(reg) (symbol_ref)) yields an invalid instruction.

AFAICT, GCSE cannot optimize a bare symbol_ref for ARM PIC code because 
it has no guarantee that emit_move_insn (reg, symbol_ref) will generate 
simple enough code.

>
> Now, apparently, one of the other GCSE improvements (VBEout computation,
> probably) fixed the underlying problem.

Improvements to GCSE'ing constants were able to optimize half of the second
and subsequent address calculations.

>
> Richard, unless you think the patch may be valuable for some other
> reason, I'm dropping it.

The patch stands for now.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-23 19:50             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  2010-06-23 20:06               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Paolo Bonzini
@ 2010-06-24 17:11               ` Maxim Kuvyrkov
  2010-06-29 19:12                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  2010-06-30 18:46                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  1 sibling, 2 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-24 17:11 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1448 bytes --]

On 6/23/10 11:08 PM, Maxim Kuvyrkov wrote:
...
>> This can be addressed with a walk over the dominator tree after we
>> compute VBEout. Start with the root and descend in the tree keeping a
>> bitset of expressions that should be alive up the tree. If current node
>>
>> 1. has a single successor,
>> 2. has i'th expression set in VBEout,
>> 3. the successor has i'th expression set in VBEout,
>> 4. current node doesn't generate i'th expression,
>> 5. i'th expression is not marked in the bitset as required up the tree,
>>
>> then we can hoist i'th expression in the successor with the same result
>> as in the current node and not unnecessarily extend live ranges. There
>> may be a couple more details to the above, but the problem should be
>> easily fixable.
>
> This is implemented as cleanup_code_hoist_vbeout() function. The
> solution it produces is OK from correctness point of view (it removes
> bits from VBEout), but, please, *check my reasoning* to make sure it
> doesn't remove from VBEout expressions it shouldn't.
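
For reference, the five conditions quoted above can be read literally as
a predicate over bitsets.  The sketch below does only that: it models
the originally proposed rule, not the corrected cleanup routine in the
attached patch.  struct block_sets, its fields and the function name are
hypothetical, and expression sets are limited to 64 entries.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-block data: one bit per expression.  */
struct block_sets
{
  int n_succs;		/* Number of successors of the block.  */
  uint64_t vbeout;	/* Expressions very busy at the block's end.  */
  uint64_t comp;	/* Expressions the block itself generates.  */
};

/* Return true if expression EXPR (0..63) can be hoisted to SUCC instead
   of BB.  NEEDED_UP marks expressions required further up the tree.  */
static bool
can_defer_to_successor (const struct block_sets *bb,
			const struct block_sets *succ,
			uint64_t needed_up, unsigned expr)
{
  uint64_t bit = (uint64_t) 1 << expr;

  return (bb->n_succs == 1		/* 1. single successor */
	  && (bb->vbeout & bit) != 0	/* 2. in VBEout of BB */
	  && (succ->vbeout & bit) != 0	/* 3. in VBEout of successor */
	  && (bb->comp & bit) == 0	/* 4. not generated in BB */
	  && (needed_up & bit) == 0);	/* 5. not needed up the tree */
}
```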

There is a flaw in the implementation I posted yesterday.  VBEout sets
have to be cleaned up considering data both down and up the dominator
tree; see the new example and comments in compute_code_hoist_vbeinout.

This updated patch corrects the cleaning routine and adds several 
comments to annotate its actions.

Does this look OK?

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0005-Also-search-non-immediately-dominated-blocks-for-exp.ChangeLog --]
[-- Type: text/plain, Size: 643 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* basic-block.h (get_dominated_to_depth): Declare.
	* dominance.c (get_dominated_to_depth): New function, use
	get_all_dominated_blocks as a base.
	(get_all_dominated_blocks): Use get_dominated_to_depth.
	
	* gcse.c (compute_code_hoist_vbeinout): Clean up vbeout sets.
	Add debug print outs.
	(compute_code_hoist_data): Compute dominators earlier.
	(hoist_code): Use get_dominated_to_depth.  Update.  Add comment.

	* params.def (PARAM_MAX_HOIST_DEPTH): New parameter to avoid
	quadratic behavior.
	* params.h (MAX_HOIST_DEPTH): New macro.
	* doc/invoke.texi (max-hoist-depth): Document.

[-- Attachment #3: 0005-Also-search-non-immediately-dominated-blocks-for-exp.patch --]
[-- Type: text/plain, Size: 11255 bytes --]

From b4feb3a1e0c9b4bb2b15585d1a55e9f5ef423847 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:24:56 -0700
Subject: [PATCH 05/14] Also search non-immediately dominated blocks for expressions to hoist

---
 gcc/basic-block.h   |    2 +
 gcc/doc/invoke.texi |    6 ++
 gcc/dominance.c     |   22 +++++++-
 gcc/gcse.c          |  142 +++++++++++++++++++++++++++++++++++++++++++++++++-
 gcc/params.def      |    8 +++
 gcc/params.h        |    2 +
 6 files changed, 176 insertions(+), 6 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index 135c0c2..1bf192d 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -854,6 +854,8 @@ extern VEC (basic_block, heap) *get_dominated_by (enum cdi_direction, basic_bloc
 extern VEC (basic_block, heap) *get_dominated_by_region (enum cdi_direction,
 							 basic_block *,
 							 unsigned);
+extern VEC (basic_block, heap) *get_dominated_to_depth (enum cdi_direction,
+							basic_block, int);
 extern VEC (basic_block, heap) *get_all_dominated_blocks (enum cdi_direction,
 							  basic_block);
 extern void add_to_dominance_info (enum cdi_direction, basic_block);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9e517e9..05ebcf0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8195,6 +8195,12 @@ when @option{-ftree-vectorize} is used.  The number of iterations after
 vectorization needs to be greater than the value specified by this option
 to allow vectorization.  The default value is 0.
 
+@item max-hoist-depth
+The depth of search in the dominator tree for expressions to hoist.
+This is used to avoid quadratic behavior in hoisting algorithm.
+The value of 0 will avoid limiting the search, but may slow down compilation
+of huge functions.  The default value is 30.
+
 @item max-unrolled-insns
 The maximum number of instructions that a loop should have if that loop
 is unrolled, and if the loop is unrolled, it determines how many times
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c2dcf0..7861439 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -783,16 +783,20 @@ get_dominated_by_region (enum cdi_direction dir, basic_block *region,
 }
 
 /* Returns the list of basic blocks including BB dominated by BB, in the
-   direction DIR.  The vector will be sorted in preorder.  */
+   direction DIR up to DEPTH in the dominator tree.  The DEPTH of zero will
+   produce a vector containing all dominated blocks.  The vector will be sorted
+   in preorder.  */
 
 VEC (basic_block, heap) *
-get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+get_dominated_to_depth (enum cdi_direction dir, basic_block bb, int depth)
 {
   VEC(basic_block, heap) *bbs = NULL;
   unsigned i;
+  unsigned next_level_start;
 
   i = 0;
   VEC_safe_push (basic_block, heap, bbs, bb);
+  next_level_start = 1; /* = VEC_length (basic_block, bbs); */
 
   do
     {
@@ -803,12 +807,24 @@ get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
 	   son;
 	   son = next_dom_son (dir, son))
 	VEC_safe_push (basic_block, heap, bbs, son);
+
+      if (i == next_level_start && --depth)
+	next_level_start = VEC_length (basic_block, bbs);
     }
-  while (i < VEC_length (basic_block, bbs));
+  while (i < next_level_start);
 
   return bbs;
 }
 
+/* Returns the list of basic blocks including BB dominated by BB, in the
+   direction DIR.  The vector will be sorted in preorder.  */
+
+VEC (basic_block, heap) *
+get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+{
+  return get_dominated_to_depth (dir, bb, 0);
+}
+
 /* Redirect all edges pointing to BB to TO.  */
 void
 redirect_immediate_dominators (enum cdi_direction dir, basic_block bb,
diff --git a/gcc/gcse.c b/gcc/gcse.c
index 39660d5..572cfdb 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4155,6 +4155,7 @@ compute_code_hoist_vbeinout (void)
 {
   int changed, passes;
   basic_block bb;
+  sbitmap tmp1, tmp2;
 
   sbitmap_vector_zero (hoist_vbeout, last_basic_block);
   sbitmap_vector_zero (hoist_vbein, last_basic_block);
@@ -4203,6 +4204,130 @@ compute_code_hoist_vbeinout (void)
 	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
 	}
     }
+
+  /* Now cleanup VBEout to avoid moving expressions too far up.
+
+     We follow two rules to clean up VBEout[BB]:
+
+     1. If BB does not have any dominated blocks, nothing will ever be hoisted
+     to BB, so we can just wipe its VBEout clean.
+
+     2. If an expression can be hoisted both to BB and to a *single* successor
+     of BB in the dominator tree, then there is no point of hoisting
+     the expression to BB over BB's successor.  Doing otherwise would
+     unnecessarily extend live ranges.  */
+
+  /* Wipe VBEout of leaf blocks in the dominator tree.  */
+  FOR_EACH_BB (bb)
+    if (first_dom_son (CDI_DOMINATORS, bb) == NULL)
+      sbitmap_zero (hoist_vbeout[bb->index]);
+
+  tmp1 = sbitmap_alloc (expr_hash_table.n_elems);
+  tmp2 = sbitmap_alloc (expr_hash_table.n_elems);
+
+  /* We cannot cleanup VBEout in a single traversal.  There has to be both
+     upward and downward links when computing VBEout of current block to
+     avoid removing bits that shouldn't be removed.  E.g., consider
+     the following dominator tree; '*' marks blocks which compute same
+     expression, the expression can be freely moved; the expected result
+     is that we move computations of '*' from (3) and (6) to (2).
+
+       2
+      / \
+     3*  4
+        / \
+       5   6*
+
+     Doing a depth-first search over this tree without an upward link
+     will remove the expression from VBEout[4] (there's no point in hoisting
+     the expression to (4) if it's not computed in both (5) and (6)).
+     When cleaning up VBEout[2] we won't see the expression as needed in (4),
+     so we will remove it from VBEout[2], leaving it to (3) to calculate
+     its own copy of '*'.
+
+     Therefore, we use iterative algorithm to solve this problem with both
+     upward and downward links.  The algorithm obviously converges as at
+     each iteration we make VBEout sets only smaller.  */
+
+  passes = 0;
+  changed = 1;
+
+  while (changed)
+    {
+      changed = 0;
+
+      FOR_EACH_BB (bb)
+        {
+	  basic_block son;
+	  bool first_p = true;
+
+	  /* Walk through dominated blocks and calculate the set of expressions
+	     that are needed in any one, and only one, of the blocks.
+	     TMP1 is the basis of what we want to remove from VBEout[BB].  */
+	  for (son = first_dom_son (CDI_DOMINATORS, bb);
+	       son != NULL;
+	       son = next_dom_son (CDI_DOMINATORS, son))
+	    {
+	      if (first_p)
+		{
+		  sbitmap_copy (tmp1, hoist_vbeout[son->index]);
+		  first_p = false;
+		}
+	      else
+		sbitmap_difference (tmp1, tmp1, hoist_vbeout[son->index]);
+	    }
+
+	  if (!first_p)
+	    {
+	      /* Now trim TMP1 to avoid removing too much.  */
+
+	      if (bb->prev_bb != ENTRY_BLOCK_PTR)
+		/* Remove expressions from TMP1 that are needed upwards.
+		   These are VBEout[parent] minus expressions that are
+		   killed in BB (and, hence, don't get to VBEout[parent] from
+		   BB).  */
+		{
+		  basic_block parent;
+
+		  parent = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+		  sbitmap_difference (tmp2, hoist_vbeout[parent->index],
+				      transp[bb->index]);
+
+		  sbitmap_difference (tmp1, tmp1, tmp2);
+		}
+
+	      /* Never remove any of expressions computed in BB from
+		 VBEout[BB].  */
+	      sbitmap_difference (tmp1, tmp1, comp[bb->index]);
+
+	      if (sbitmap_any_common_bits (hoist_vbeout[bb->index], tmp1))
+		/* There is at least one bit that can be removed from
+		   VBEout[BB].  */
+		{
+		  sbitmap_difference (hoist_vbeout[bb->index],
+				      hoist_vbeout[bb->index], tmp1);
+		  changed = 1;
+		}
+	    }
+	}
+
+      passes++;
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "hoisting vbeout cleanup: %d passes\n", passes);
+
+      FOR_EACH_BB (bb)
+        {
+	  fprintf (dump_file, "vbeout(%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	}
+    }
+
+  sbitmap_free (tmp1);
+  sbitmap_free (tmp2);
 }
 
 /* Top level routine to do the dataflow analysis needed by code hoisting.  */
@@ -4212,8 +4337,8 @@ compute_code_hoist_data (void)
 {
   compute_local_properties (transp, comp, antloc, &expr_hash_table);
   compute_transpout ();
-  compute_code_hoist_vbeinout ();
   calculate_dominance_info (CDI_DOMINATORS);
+  compute_code_hoist_vbeinout ();
   if (dump_file)
     fprintf (dump_file, "\n");
 }
@@ -4306,7 +4431,8 @@ hoist_code (void)
       int found = 0;
       int insn_inserted_p;
 
-      domby = get_dominated_by (CDI_DOMINATORS, bb);
+      domby = get_dominated_to_depth (CDI_DOMINATORS, bb, MAX_HOIST_DEPTH);
+
       /* Examine each expression that is very busy at the exit of this
 	 block.  These are the potentially hoistable expressions.  */
       for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
@@ -4397,7 +4523,11 @@ hoist_code (void)
 		     it would be safe to compute it at the start of the
 		     dominated block.  Now we have to determine if the
 		     expression would reach the dominated block if it was
-		     placed at the end of BB.  */
+		     placed at the end of BB.
+		     Note: the fact that hoist_exprs has i-th bit set means
+		     that /some/, not necessarily all, occurrences from
+		     the dominated blocks can be hoisted to BB.  Here we check
+		     if a specific occurrence can be hoisted to BB.  */
 		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL))
 		    {
 		      struct expr *expr = index_map[i];
@@ -4410,6 +4540,12 @@ hoist_code (void)
 			occr = occr->next;
 
 		      gcc_assert (occr);
+
+		      /* An occurrence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			continue;
+
 		      insn = occr->insn;
 		      set = single_set (insn);
 		      gcc_assert (set);
diff --git a/gcc/params.def b/gcc/params.def
index 35650ff..f08d482 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -219,6 +219,14 @@ DEFPARAM(PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION,
 	"gcse-after-reload-critical-fraction",
 	"The threshold ratio of critical edges execution count that permit performing redundancy elimination after reload",
         10, 0, 0)
+/* How deep from a given basic block the dominator tree should be searched
+   for expressions to hoist to the block.  The value of 0 will avoid limiting
+   the search.  */
+DEFPARAM(PARAM_MAX_HOIST_DEPTH,
+	 "max-hoist-depth",
+	 "Maximum depth of search in the dominator tree for expressions to hoist",
+	 30, 0, 0)
+
 /* This parameter limits the number of insns in a loop that will be unrolled,
    and by how much the loop is unrolled.
 
diff --git a/gcc/params.h b/gcc/params.h
index 833fc3b..c0404ca 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -125,6 +125,8 @@ typedef enum compiler_param
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_PARTIAL_FRACTION)
 #define GCSE_AFTER_RELOAD_CRITICAL_FRACTION \
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION)
+#define MAX_HOIST_DEPTH \
+  PARAM_VALUE (PARAM_MAX_HOIST_DEPTH)
 #define MAX_UNROLLED_INSNS \
   PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS)
 #define MAX_SMS_LOOP_NUMBER \
-- 
1.6.2.4


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-23 19:25       ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
@ 2010-06-29 19:08         ` Maxim Kuvyrkov
  2010-06-30 17:14           ` 0003-Improve-VBEout-computation.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-29 19:08 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On 6/23/10 10:53 PM, Maxim Kuvyrkov wrote:
> On 6/22/10 4:02 PM, Maxim Kuvyrkov wrote:
> ...
>> I'll post another version of the patch in a couple of days when I finish
>> reworking other pieces of improvements to hoisting.
>
> Updated version. OK to check in?

Ping?  Bootstrapped and regtested on {i686,x86_64,arm}-linux-gnu.

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-24 17:11               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-29 19:12                 ` Maxim Kuvyrkov
  2010-06-30  1:43                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  2010-06-30 16:33                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  2010-06-30 18:46                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  1 sibling, 2 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-29 19:12 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 901 bytes --]

On 6/24/10 7:53 PM, Maxim Kuvyrkov wrote:
...
> This updated patch corrects the cleaning routine and adds several
> comments to annotate its actions.

Ping.

Also, in case you haven't looked at the patch yet, here is yet another 
version with a fix for a potential miscompilation of code with EH. 
Otherwise the patch is the same.

A miscompilation can occur because the VBEout sets do not filter out 
expressions that die due to abnormal control flow; these expressions 
are represented in the TRANSPOUT set.  This updated version (a) filters out 
VBEout sets with !TRANSPOUT and (b) adds a check to 
hoist_expr_reaches_here_p() that accounts for TRANSPOUT.  Previously, 
the check in hoist_code() would suffice because we never looked too far 
down the CFG.
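
As an illustration of step (a) only, and not code taken from the patch: 
the filtering amounts to intersecting VBEin over a block's successors 
and then masking the result with that block's TRANSPOUT.  The sketch 
below assumes one bit per expression in a plain integer; `exprset` and 
`filter_vbeout` are hypothetical names standing in for the sbitmap 
routines, and at least one successor is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an sbitmap: one bit per expression.  */
typedef uint32_t exprset;

/* Return the expressions that are very busy on entry to every successor
   and that also survive past the end of the block (TRANSPOUT).
   This sketch assumes the block has at least one successor.  */
static exprset
filter_vbeout (const exprset *succ_vbein, int n_succs, exprset transpout)
{
  exprset out = ~(exprset) 0;
  int i;

  assert (n_succs > 0);

  /* Intersection of VBEin over all successors.  */
  for (i = 0; i < n_succs; i++)
    out &= succ_vbein[i];

  /* Drop expressions that die right after the block.  */
  return out & transpout;
}
```

With successor sets 0x7 and 0x6 and a TRANSPOUT of 0x5, only bit 2 
(0x4) remains in the filtered set.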

Bootstrapped and regtested on {x86_64,i686,arm}-linux-gnu.

OK to check in?

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0005-Also-search-non-immediately-dominated-blocks-for-exp.ChangeLog --]
[-- Type: text/plain, Size: 707 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* basic-block.h (get_dominated_to_depth): Declare.
	* dominance.c (get_dominated_to_depth): New function, use
	get_all_dominated_blocks as a base.
	(get_all_dominated_blocks): Use get_dominated_to_depth.

	* gcse.c (compute_code_hoist_vbeinout): Clean up vbeout sets.
	Add debug print outs.
	(compute_code_hoist_data): Compute dominators earlier.
	(hoist_expr_reaches_here_p): Account for abnormal control flow.
	(hoist_code): Use get_dominated_to_depth.  Update.  Add comment.

	* params.def (PARAM_MAX_HOIST_DEPTH): New parameter to avoid
	quadratic behavior.
	* params.h (MAX_HOIST_DEPTH): New macro.
	* doc/invoke.texi (max-hoist-depth): Document.

[-- Attachment #3: 0003-Also-search-non-immediately-dominated-blocks-for-exp.patch --]
[-- Type: text/plain, Size: 13329 bytes --]

From d187bcc076a776101c01c2821804af8e7a6dfb83 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:24:56 -0700
Subject: [PATCH 03/15] Also search non-immediately dominated blocks for expressions to hoist

---
 gcc/basic-block.h   |    2 +
 gcc/doc/invoke.texi |    6 ++
 gcc/dominance.c     |   22 +++++-
 gcc/gcse.c          |  189 ++++++++++++++++++++++++++++++++++++++++++++++++---
 gcc/params.def      |    8 ++
 gcc/params.h        |    2 +
 6 files changed, 216 insertions(+), 13 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index 135c0c2..1bf192d 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -854,6 +854,8 @@ extern VEC (basic_block, heap) *get_dominated_by (enum cdi_direction, basic_bloc
 extern VEC (basic_block, heap) *get_dominated_by_region (enum cdi_direction,
 							 basic_block *,
 							 unsigned);
+extern VEC (basic_block, heap) *get_dominated_to_depth (enum cdi_direction,
+							basic_block, int);
 extern VEC (basic_block, heap) *get_all_dominated_blocks (enum cdi_direction,
 							  basic_block);
 extern void add_to_dominance_info (enum cdi_direction, basic_block);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3d576bf..83d019c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8232,6 +8232,12 @@ an expression can travel.  This is currently supported only in
 code hoisting pass.  The lesser the cost, the more aggressive code hoisting
 will be.  The default value is 3.
 
+@item max-hoist-depth
+The depth of search in the dominator tree for expressions to hoist.
+This is used to avoid quadratic behavior in the hoisting algorithm.
+The value of 0 will avoid limiting the search, but may slow down compilation
+of huge functions.  The default value is 30.
+
 @item max-unrolled-insns
 The maximum number of instructions that a loop should have if that loop
 is unrolled, and if the loop is unrolled, it determines how many times
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c2dcf0..7861439 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -783,16 +783,20 @@ get_dominated_by_region (enum cdi_direction dir, basic_block *region,
 }
 
 /* Returns the list of basic blocks including BB dominated by BB, in the
-   direction DIR.  The vector will be sorted in preorder.  */
+   direction DIR up to DEPTH in the dominator tree.  The DEPTH of zero will
+   produce a vector containing all dominated blocks.  The vector will be sorted
+   in preorder.  */
 
 VEC (basic_block, heap) *
-get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+get_dominated_to_depth (enum cdi_direction dir, basic_block bb, int depth)
 {
   VEC(basic_block, heap) *bbs = NULL;
   unsigned i;
+  unsigned next_level_start;
 
   i = 0;
   VEC_safe_push (basic_block, heap, bbs, bb);
+  next_level_start = 1; /* = VEC_length (basic_block, bbs); */
 
   do
     {
@@ -803,12 +807,24 @@ get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
 	   son;
 	   son = next_dom_son (dir, son))
 	VEC_safe_push (basic_block, heap, bbs, son);
+
+      if (i == next_level_start && --depth)
+	next_level_start = VEC_length (basic_block, bbs);
     }
-  while (i < VEC_length (basic_block, bbs));
+  while (i < next_level_start);
 
   return bbs;
 }
 
+/* Returns the list of basic blocks including BB dominated by BB, in the
+   direction DIR.  The vector will be sorted in preorder.  */
+
+VEC (basic_block, heap) *
+get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+{
+  return get_dominated_to_depth (dir, bb, 0);
+}
+
 /* Redirect all edges pointing to BB to TO.  */
 void
 redirect_immediate_dominators (enum cdi_direction dir, basic_block bb,
diff --git a/gcc/gcse.c b/gcc/gcse.c
index d734fa4..d00a788 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4219,6 +4219,7 @@ compute_code_hoist_vbeinout (void)
 {
   int changed, passes;
   basic_block bb;
+  sbitmap tmp1, tmp2;
 
   sbitmap_vector_zero (hoist_vbeout, last_basic_block);
   sbitmap_vector_zero (hoist_vbein, last_basic_block);
@@ -4235,8 +4236,14 @@ compute_code_hoist_vbeinout (void)
       FOR_EACH_BB_REVERSE (bb)
 	{
 	  if (bb->next_bb != EXIT_BLOCK_PTR)
-	    sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
-					   hoist_vbein, bb->index);
+	    {
+	      sbitmap_intersection_of_succs (hoist_vbeout[bb->index],
+					     hoist_vbein, bb->index);
+
+	      /* Remove from VBEout expressions that die right after BB.  */
+	      sbitmap_a_and_b (hoist_vbeout[bb->index],
+			       hoist_vbeout[bb->index], transpout[bb->index]);
+	    }
 
 	  changed |= sbitmap_a_or_b_and_c_cg (hoist_vbein[bb->index],
 					      antloc[bb->index],
@@ -4248,7 +4255,144 @@ compute_code_hoist_vbeinout (void)
     }
 
   if (dump_file)
-    fprintf (dump_file, "hoisting vbeinout computation: %d passes\n", passes);
+    {
+      fprintf (dump_file, "hoisting vbeinout computation: %d passes\n", passes);
+
+      FOR_EACH_BB (bb)
+        {
+	  fprintf (dump_file, "vbein (%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbein[bb->index]);
+	  fprintf (dump_file, "vbeout(%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	}
+    }
+
+  /* Now clean up VBEout to avoid moving expressions too far up.
+
+     We follow two rules to clean up VBEout[BB]:
+
+     1. If BB does not have any dominated blocks, nothing will ever be hoisted
+     to BB, so we can just wipe its VBEout clean.
+
+     2. If an expression can be hoisted both to BB and to a *single* successor
+     of BB in the dominator tree, then there is no point of hoisting
+     the expression to BB over BB's successor.  Doing otherwise would
+     unnecessarily extend live ranges.  */
+
+  /* Wipe VBEout of leaf blocks in the dominator tree.  */
+  FOR_EACH_BB (bb)
+    if (first_dom_son (CDI_DOMINATORS, bb) == NULL)
+      sbitmap_zero (hoist_vbeout[bb->index]);
+
+  tmp1 = sbitmap_alloc (expr_hash_table.n_elems);
+  tmp2 = sbitmap_alloc (expr_hash_table.n_elems);
+
+  /* We cannot clean up VBEout in a single traversal.  There has to be both
+     upward and downward links when computing VBEout of current block to
+     avoid removing bits that shouldn't be removed.  E.g., consider
+     the following dominator tree; '*' marks blocks which compute same
+     expression, the expression can be freely moved; the expected result
+     is that we move computations of '*' from (3) and (6) to (2).
+
+       2
+      / \
+     3*  4
+        / \
+       5   6*
+
+     A walk over the above tree considering only downward links will first
+     remove '*' from VBEout[4] [as there's no point of hoisting
+     the expression to (4) if it's not computed in both (5) and (6)].
+     Then, when processing VBEout[2], we won't see '*' as needed in (4),
+     so '*' will be removed from VBEout[2] too, leaving a copy of '*' in (3).
+
+     Therefore, we use an iterative algorithm with both upward and downward
+     links to solve this problem.  The algorithm obviously converges as at
+     each iteration we make VBEout sets only smaller.  */
+
+  passes = 0;
+  changed = 1;
+
+  while (changed)
+    {
+      changed = 0;
+
+      FOR_EACH_BB (bb)
+        {
+	  basic_block son;
+	  bool first_p = true;
+
+	  /* Walk through dominated blocks and calculate the set of expressions
+	     that are needed in any one, and only one, of the blocks.
+	     TMP1 is the basis of what we want to remove from VBEout[BB].  */
+	  for (son = first_dom_son (CDI_DOMINATORS, bb);
+	       son != NULL;
+	       son = next_dom_son (CDI_DOMINATORS, son))
+	    {
+	      if (first_p)
+		{
+		  sbitmap_copy (tmp1, hoist_vbeout[son->index]);
+		  first_p = false;
+		}
+	      else
+		sbitmap_difference (tmp1, tmp1, hoist_vbeout[son->index]);
+	    }
+
+	  if (!first_p)
+	    {
+	      /* Now trim TMP1 to avoid removing too much.  */
+
+	      if (bb->prev_bb != ENTRY_BLOCK_PTR)
+		/* Remove expressions from TMP1 that are needed upwards.
+		   These are VBEout[parent] minus expressions that are
+		   killed in BB (and, hence, don't get to VBEout[parent] from
+		   BB).  */
+		{
+		  basic_block parent;
+
+		  parent = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+		  /* Expressions killed in BB.  */
+		  sbitmap_not (tmp2, transp[bb->index]);
+
+		  /* Expressions that reach from PARENT to the end of BB.  */
+		  sbitmap_difference (tmp2, hoist_vbeout[parent->index],
+				      tmp2);
+
+		  sbitmap_difference (tmp1, tmp1, tmp2);
+		}
+
+	      /* Never remove any of expressions computed in BB from
+		 VBEout[BB].  */
+	      sbitmap_difference (tmp1, tmp1, comp[bb->index]);
+
+	      if (sbitmap_any_common_bits (hoist_vbeout[bb->index], tmp1))
+		/* There is at least one bit that can be removed from
+		   VBEout[BB].  */
+		{
+		  sbitmap_difference (hoist_vbeout[bb->index],
+				      hoist_vbeout[bb->index], tmp1);
+		  changed = 1;
+		}
+	    }
+	}
+
+      passes++;
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "hoisting vbeout cleanup: %d passes\n", passes);
+
+      FOR_EACH_BB (bb)
+        {
+	  fprintf (dump_file, "vbeout(%d): ", bb->index);
+	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
+	}
+    }
+
+  sbitmap_free (tmp1);
+  sbitmap_free (tmp2);
 }
 
 /* Top level routine to do the dataflow analysis needed by code hoisting.  */
@@ -4258,8 +4402,8 @@ compute_code_hoist_data (void)
 {
   compute_local_properties (transp, comp, antloc, &expr_hash_table);
   compute_transpout ();
-  compute_code_hoist_vbeinout ();
   calculate_dominance_info (CDI_DOMINATORS);
+  compute_code_hoist_vbeinout ();
   if (dump_file)
     fprintf (dump_file, "\n");
 }
@@ -4311,11 +4455,12 @@ hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
 
       if (pred->src == ENTRY_BLOCK_PTR)
 	break;
-      else if (pred_bb == expr_bb)
-	continue;
       else if (visited[pred_bb->index])
 	continue;
-
+      else if (!TEST_BIT (transpout[pred_bb->index], expr_index))
+	break;
+      else if (pred_bb == expr_bb)
+	continue;
       /* Does this predecessor generate this expression?  */
       else if (TEST_BIT (comp[pred_bb->index], expr_index))
 	break;
@@ -4403,15 +4548,15 @@ hoist_code (void)
       int found = 0;
       int insn_inserted_p;
 
-      domby = get_dominated_by (CDI_DOMINATORS, bb);
+      domby = get_dominated_to_depth (CDI_DOMINATORS, bb, MAX_HOIST_DEPTH);
+
       /* Examine each expression that is very busy at the exit of this
 	 block.  These are the potentially hoistable expressions.  */
       for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
 	{
 	  int hoistable = 0;
 
-	  if (TEST_BIT (hoist_vbeout[bb->index], i)
-	      && TEST_BIT (transpout[bb->index], i))
+	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
 	      /* We've found a potentially hoistable expression, now
 		 we look at every block BB dominates to see if it
@@ -4438,6 +4583,14 @@ hoist_code (void)
 		      occr = find_occr_in_bb (expr->antic_occr, dominated);
 		      gcc_assert (occr);
 
+		      /* An occurrence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
+
 		      gcc_assert (NONDEBUG_INSN_P (occr->insn));
 
 		      /* Adjust MAX_DISTANCE to account for the fact that
@@ -4516,6 +4669,14 @@ hoist_code (void)
 		      occr = find_occr_in_bb (expr->antic_occr, dominated);
 		      gcc_assert (occr);
 
+		      /* An occurrence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
+
 		      gcc_assert (NONDEBUG_INSN_P (occr->insn));
 
 		      /* Adjust MAX_DISTANCE to account for the fact that
@@ -4546,6 +4707,14 @@ hoist_code (void)
 			  gcc_assert (occr);
 			}
 
+		      /* An occurrence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			{
+			  gcc_assert (MAX_HOIST_DEPTH > 1);
+			  continue;
+			}
+
 		      insn = occr->insn;
 		      set = single_set (insn);
 		      gcc_assert (set);
diff --git a/gcc/params.def b/gcc/params.def
index 551e8e2..22737a0 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -240,6 +240,14 @@ DEFPARAM(PARAM_GCSE_UNRESTRICTED_COST,
 	 "Cost at which GCSE optimizations will not constraint the distance an expression can travel",
 	 3, 0, 0)
 
+/* How deep from a given basic block the dominator tree should be searched
+   for expressions to hoist to the block.  The value of 0 will avoid limiting
+   the search.  */
+DEFPARAM(PARAM_MAX_HOIST_DEPTH,
+	 "max-hoist-depth",
+	 "Maximum depth of search in the dominator tree for expressions to hoist",
+	 30, 0, 0)
+
 /* This parameter limits the number of insns in a loop that will be unrolled,
    and by how much the loop is unrolled.
 
diff --git a/gcc/params.h b/gcc/params.h
index 174edc1..aa96c81 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -129,6 +129,8 @@ typedef enum compiler_param
   PARAM_VALUE (PARAM_GCSE_COST_DISTANCE_RATIO)
 #define GCSE_UNRESTRICTED_COST \
   PARAM_VALUE (PARAM_GCSE_UNRESTRICTED_COST)
+#define MAX_HOIST_DEPTH \
+  PARAM_VALUE (PARAM_MAX_HOIST_DEPTH)
 #define MAX_UNROLLED_INSNS \
   PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS)
 #define MAX_SMS_LOOP_NUMBER \
-- 
1.6.2.4


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Wrap calculation of PIC address into a single instruction
  2010-06-24 11:56         ` Maxim Kuvyrkov
@ 2010-06-29 19:18           ` Maxim Kuvyrkov
  0 siblings, 0 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-29 19:18 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Richard Earnshaw, gcc-patches, Andrew Pinski

On 6/24/10 3:04 PM, Maxim Kuvyrkov wrote:
...
> GCSE cannot use the (REG_EQUAL (symbol_ref)) note on the second instruction
> because can_assign_to_reg_without_clobbers returns false for symbol_ref
> when compiling PIC code. (Symbol_ref) is not a LEGITIMATE_PIC_OPERAND_P,
> so it is not a general_operand either. The second check in
> can_assign_to_reg_without_clobbers returns false as (set (reg)
> (symbol_ref)) yields an invalid instruction.
...

Ping.

Improvements to code hoisting provide about 0.8% size reduction on Thumb 
PIC code.  Without this patch, the size reduction is much smaller.

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-21 19:45     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  2010-06-21 20:27       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-29 19:22       ` Maxim Kuvyrkov
  1 sibling, 0 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-29 19:22 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

On 6/21/10 10:46 PM, Jeff Law wrote:
...
> Technically true, but we only care about the dominance tree here, not
> the entire CFG. The change which made code hoisting only look at the
> immediate dominators was a mistake. It's unfortunate that
> get_dominated_by only returns immediate dominators -- based on the name,
> one could reasonably expect to get the full set of dominators which I
> suspect happened back in 2002 when that change was made.
>
> Maxim -- can you test f90-intrinsic-bit.f with and without this change
> and report back on how the compilation time changes?

Compile time of f90-intrinsic-bit.f doesn't change in any meaningful way.
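
For readers following the thread, the difference between looking only 
at immediately dominated blocks and looking a bounded number of levels 
down the dominator tree can be sketched as below.  This is an 
illustration under stated assumptions, not the code in dominance.c: 
`struct dom_node` and `dominated_to_depth` are hypothetical names, the 
tree is a plain array indexed by block number, and blocks are stored in 
level order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dominator-tree node, indexed by block number.
   A value of -1 means "no such block".  */
struct dom_node
{
  int first_son;   /* First block immediately dominated by this one.  */
  int next_son;    /* Next sibling under the same immediate dominator.  */
};

/* Store ROOT followed by the blocks it dominates, down to DEPTH levels
   below ROOT, into OUT (capacity CAP) in level order.  A DEPTH of 0
   means no limit.  Return the number of blocks stored.  */
static size_t
dominated_to_depth (const struct dom_node *tree, int root, int depth,
                    int *out, size_t cap)
{
  size_t n = 0;    /* Blocks stored so far.  */
  size_t i = 0;    /* Next stored block whose sons are still to be added.  */
  int level = 0;   /* Levels below ROOT that are fully stored.  */
  int son;

  assert (cap > 0);
  out[n++] = root;

  while (i < n && (depth == 0 || level < depth))
    {
      size_t level_end = n;

      /* Add the sons of every block on the current level.  */
      for (; i < level_end; i++)
        for (son = tree[out[i]].first_son; son != -1;
             son = tree[son].next_son)
          {
            assert (n < cap);
            out[n++] = son;
          }

      level++;
    }

  return n;
}
```

In this sketch a depth of 1 yields the root plus its immediate sons, 
and a depth of 0 yields every dominated block, so the work per root is 
bounded by the number of blocks within the chosen depth.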

Regards,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-29 19:12                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-30  1:43                   ` Steven Bosscher
  2010-06-30  9:39                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  2010-06-30 16:41                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  2010-06-30 16:33                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  1 sibling, 2 replies; 94+ messages in thread
From: Steven Bosscher @ 2010-06-30  1:43 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, gcc-patches

On Tue, Jun 29, 2010 at 8:23 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> On 6/24/10 7:53 PM, Maxim Kuvyrkov wrote:
> ...
>>
>> This updated patch corrects the cleaning routine and adds several
>> comments to annotate its actions.
>
> Ping.
>
> Also, in case you haven't looked at the patch yet, here is yet another version
> with a fix for a potential miscompilation of code with EH. Otherwise the patch
> is the same.
>
> A miscompilation can occur because the VBEout sets do not filter out expressions
> that die due to abnormal control flow; these expressions are represented
> in the TRANSPOUT set.  This updated version (a) filters out VBEout sets with
> !TRANSPOUT and (b) adds a check to hoist_expr_reaches_here_p() that accounts
> for TRANSPOUT.  Previously, the check in hoist_code() would suffice because
> we never looked too far down the CFG.

The ideal patch would remove TRANSPOUT and clean out those expressions
earlier, see what RTL PRE does.

If you clean that up first, this latest version 0003 patch will
probably look better/simpler.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-30  1:43                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-30  9:39                     ` Maxim Kuvyrkov
  2010-06-30 12:14                       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  2010-06-30 16:41                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  1 sibling, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-30  9:39 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Jeff Law, gcc-patches

On 6/30/10 1:08 AM, Steven Bosscher wrote:
...
> The ideal patch would remove TRANSPOUT and clean out those expressions
> earlier, see what RTL PRE does.

Removing TRANSPOUT would be a separate change and there's no need to 
mash it together with this patch.

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-30  9:39                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-30 12:14                       ` Steven Bosscher
  0 siblings, 0 replies; 94+ messages in thread
From: Steven Bosscher @ 2010-06-30 12:14 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jeff Law, gcc-patches

On Wed, Jun 30, 2010 at 8:23 AM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> On 6/30/10 1:08 AM, Steven Bosscher wrote:
> ...
>>
>> The ideal patch would remove TRANSPOUT and clean out those expressions
>> earlier, see what RTL PRE does.
>
> Removing TRANSPOUT would be a separate change and there's no need to mash it
> together with this patch.

Indeed. This is why I said:

"If you clean that up first, this latest version 0003 patch will
probably look better/simpler."

Note the word "first".

But it's only a suggestion. It just seems to me that doing this would
make verifying the implementation of your new algorithm a bit easier.
And that'd be a good thing...

Ciao!
Steven

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-29 19:12                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  2010-06-30  1:43                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-30 16:33                   ` Jeff Law
  1 sibling, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-30 16:33 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 06/29/10 12:23, Maxim Kuvyrkov wrote:
> On 6/24/10 7:53 PM, Maxim Kuvyrkov wrote:
> ...
>> This updated patch corrects the cleaning routine and adds several
>> comments to annotate its actions.
>
> Ping.
>
> Also, in case you haven't looked at the patch yet, here is yet another 
> version with a fix for a potential miscompilation of code with EH. 
> Otherwise the patch is the same.
>
> A miscompilation can occur because the VBEout sets do not filter out 
> expressions that die due to abnormal control flow; these expressions 
> are represented in the TRANSPOUT set.  This updated version (a) filters 
> out VBEout sets with !TRANSPOUT and (b) adds a check to 
> hoist_expr_reaches_here_p() that accounts for TRANSPOUT.  Previously, 
> the check in hoist_code() would suffice because we never looked too 
> far down the CFG.
But aren't we just moving an expression that we were going to evaluate 
anyway from one block to another without splitting edges?  Doesn't that 
avoid the problems commonly associated with code motion with abnormal edges?

What am I missing?

jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-30  1:43                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  2010-06-30  9:39                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-30 16:41                     ` Jeff Law
  2010-06-30 16:42                       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
  1 sibling, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-06-30 16:41 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Maxim Kuvyrkov, gcc-patches

On 06/29/10 15:08, Steven Bosscher wrote:
> On Tue, Jun 29, 2010 at 8:23 PM, Maxim Kuvyrkov<maxim@codesourcery.com>  wrote:
>    
>> On 6/24/10 7:53 PM, Maxim Kuvyrkov wrote:
>> ...
>>      
>>> This updated patch corrects the cleaning routine and adds several
>>> comments to annotate its actions.
>>>        
>> Ping.
>>
>> Also, in case you haven't looked at the patch yet, here is yet another version
>> with a fix for a potential miscompilation of code with EH. Otherwise the patch
>> is the same.
>>
>> A miscompilation can occur because the VBEout sets do not filter out expressions
>> that die due to abnormal control flow; these expressions are represented
>> in the TRANSPOUT set.  This updated version (a) filters out VBEout sets with
>> !TRANSPOUT and (b) adds a check to hoist_expr_reaches_here_p() that accounts
>> for TRANSPOUT.  Previously, the check in hoist_code() would suffice because
>> we never looked too far down the CFG.
>>      
> The ideal patch would remove TRANSPOUT and clean out those expressions
> earlier, see what RTL PRE does.
>
> If you clean that up first, this latest version 0003 patch will
> probably look better/simpler.
>    
If there is a real issue here, the PRE approach is required for 
correctness.  TRANSPOUT misses a boatload of stuff (Integer DIV/MOD, 
most FP operations, etc).

What I'm struggling with is why we need this hair at all for hoisting 
since we're not inserting on edges.

jeff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-30 16:41                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-06-30 16:42                       ` Steven Bosscher
  2010-06-30 16:48                         ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Steven Bosscher @ 2010-06-30 16:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: Maxim Kuvyrkov, gcc-patches

On Wed, Jun 30, 2010 at 5:11 PM, Jeff Law <law@redhat.com> wrote:
> If there is a real issue here, the PRE approach is required for correctness.
>  TRANSPOUT misses a boatload of stuff (Integer DIV/MOD, most FP operations,
> etc).
>
> What I'm struggling with is why we need this hair at all for hoisting since
> we're not inserting on edges.

IIRC it has something to do with being unable to fixup the CFG if we
move a potentially throwing expression.

Ciao!
Steven

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-30 16:42                       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Steven Bosscher
@ 2010-06-30 16:48                         ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-30 16:48 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Maxim Kuvyrkov, gcc-patches

On 06/30/10 09:26, Steven Bosscher wrote:
> On Wed, Jun 30, 2010 at 5:11 PM, Jeff Law<law@redhat.com>  wrote:
>    
>> If there is a real issue here, the PRE approach is required for correctness.
>>   TRANSPOUT misses a boatload of stuff (Integer DIV/MOD, most FP operations,
>> etc).
>>
>> What I'm struggling with is why we need this hair at all for hoisting since
>> we're not inserting on edges.
>>      
> IIRC it has something to do with being unable to fixup the CFG if we
> move a potentially throwing expression.
>    
But there aren't any fixups to do -- the fixups are necessary when we 
insert on an abnormal edge.  For hoisting that never happens.

jeff

* Re: 0003-Improve-VBEout-computation.patch
  2010-06-29 19:08         ` 0003-Improve-VBEout-computation.patch Maxim Kuvyrkov
@ 2010-06-30 17:14           ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-06-30 17:14 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

On 06/29/10 12:12, Maxim Kuvyrkov wrote:
> On 6/23/10 10:53 PM, Maxim Kuvyrkov wrote:
>> On 6/22/10 4:02 PM, Maxim Kuvyrkov wrote:
>> ...
>>> I'll post another version of the patch in a couple of days when I finish
>>> reworking other pieces of improvements to hoisting.
>>
>> Updated version. OK to check in?
>
> Ping?  Bootstrapped and regtested on {i686,x86_64,arm}-linux-gnu.
>
> Thanks,
>
OK.
Jeff

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-24 17:11               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  2010-06-29 19:12                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-06-30 18:46                 ` Jeff Law
  2010-06-30 20:53                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  1 sibling, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-06-30 18:46 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 06/24/10 09:53, Maxim Kuvyrkov wrote:
> On 6/23/10 11:08 PM, Maxim Kuvyrkov wrote:
> ...
>>> This can be addressed with a walk over the dominator tree after we
>>> compute VBEout. Start with the root and descend in the tree keeping a
>>> bitset of expressions that should be alive up the tree. If current node
>>>
>>> 1. has a single successor,
>>> 2. has i'th expression set in VBEout,
>>> 3. the successor has i'th expression set in VBEout,
>>> 4. current node doesn't generate i'th expression,
>>> 5. i'th expression is not marked in the bitset as required up the tree,
>>>
>>> then we can hoist i'th expression in the successor with the same result
>>> as in the current node and not unnecessarily extend live ranges. There
>>> may be a couple more details to the above, but the problem should be
>>> easily fixable.
>>
>> This is implemented as cleanup_code_hoist_vbeout() function. The
>> solution it produces is OK from correctness point of view (it removes
>> bits from VBEout), but, please, *check my reasoning* to make sure it
>> doesn't remove from VBEout expressions it shouldn't.
>
> There is a flaw in the implementation I posted yesterday.  VBEout sets 
> have to be cleaned up considering data both downward and upward the 
> dominator tree; see new example and comments in 
> compute_code_hoist_vbeinout.
>
> This updated patch corrects the cleaning routine and adds several 
> comments to annotate its actions.
>
> Does this look OK?


It looks like you've got a bi-directional dataflow problem to clean up 
VBEout, in general we really want to avoid bi-directional problems.   Is 
there some reason you're not using a lowest common ancestor algorithm here?

In your comments you have the following CFG:



+  /* We cannot cleanup VBEout in a single traversal.  There has to be both
+     upward and downward links when computing VBEout of current block to
+     avoid removing bits that shouldn't be removed.  E.g., consider
+     the following dominator tree; '*' marks blocks which compute same
+     expression, the expression can be freely moved; the expected result
+     is that we move computations of '*' from (3) and (6) to (2).
+
+       2
+      / \
+     3*  4
+        / \
+       5   6*
+
+     Doing a depth-first search over this tree without an upward link
+     will remove the expression from VBEout[4] (there's no point in hoisting
+     the expression to (4) if it's not computed in both (5) and (6)).
+     When cleaning up VBEout[2] we won't see the expression as needed in (4),
+     so we will remove it from VBEout[2] leaving it to (3) to calculate
+     its own copy of '*'.

The first paragraph of your comment implies that we'd want to hoist the 
expression from 3 & 6 into 2.  However, that's not a valid 
transformation as the path 2, 4, 5 does not evaluate the expression in 
the original CFG.  VBEout for block 4 should be false since the 
expression is not evaluated in block #5.
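
As a rough illustration of the point above, here is a minimal sketch that evaluates the very-busy-expression equations on the example tree.  It assumes the tree edges are also the CFG edges, that every block is transparent for the expression, and that a block without successors has an empty VBEout.  The names vbe_in, vbe_out, succs and antloc are made up for this sketch and are not taken from gcse.c.

```c
#include <stdbool.h>

/* Blocks are indexed by the numbers used in the example; 0 and 1 are unused.  */
enum { NBLOCKS = 7 };

/* Successor lists, assuming the tree edges are the CFG edges.
   A negative entry terminates each list.  */
const int succs[NBLOCKS][3] = {
  {-1}, {-1}, {3, 4, -1}, {-1}, {5, 6, -1}, {-1}, {-1}
};

/* antloc[b] is true if block b computes the expression ('*' in the example).  */
const bool antloc[NBLOCKS] = {
  false, false, false, true, false, false, true
};

bool vbe_in (int bb);

/* VBEout[bb]: the expression is very busy on entry to every successor.
   A block without successors is assumed to have an empty VBEout.  */
bool
vbe_out (int bb)
{
  if (succs[bb][0] < 0)
    return false;
  for (int i = 0; succs[bb][i] >= 0; i++)
    if (!vbe_in (succs[bb][i]))
      return false;
  return true;
}

/* VBEin[bb] = antloc[bb] | (VBEout[bb] & transp[bb]), with transp assumed
   true everywhere.  Plain recursion suffices because the example is acyclic.  */
bool
vbe_in (int bb)
{
  return antloc[bb] || vbe_out (bb);
}
```

Under these assumptions vbe_out (4) and vbe_out (2) both come out false, because block 5 never evaluates the expression.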

Jeff

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-30 18:46                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-06-30 20:53                   ` Maxim Kuvyrkov
  2010-07-01 16:54                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-06-30 20:53 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

On 6/30/10 10:12 PM, Jeff Law wrote:
> On 06/24/10 09:53, Maxim Kuvyrkov wrote:
>> On 6/23/10 11:08 PM, Maxim Kuvyrkov wrote:
>> ...
>>>> This can be addressed with a walk over the dominator tree after we
>>>> compute VBEout. Start with the root and descend in the tree keeping a
>>>> bitset of expressions that should be alive up the tree. If current node
>>>>
>>>> 1. has a single successor,
>>>> 2. has i'th expression set in VBEout,
>>>> 3. the successor has i'th expression set in VBEout,
>>>> 4. current node doesn't generate i'th expression,
>>>> 5. i'th expression is not marked in the bitset as required up the tree,
>>>>
>>>> then we can hoist i'th expression in the successor with the same result
>>>> as in the current node and not unnecessarily extend live ranges. There
>>>> may be a couple more details to the above, but the problem should be
>>>> easily fixable.
>>>
>>> This is implemented as cleanup_code_hoist_vbeout() function. The
>>> solution it produces is OK from correctness point of view (it removes
>>> bits from VBEout), but, please, *check my reasoning* to make sure it
>>> doesn't remove from VBEout expressions it shouldn't.
>>
>> There is a flaw in the implementation I posted yesterday. VBEout sets
>> have to be cleaned up considering data both downward and upward the
>> dominator tree; see new example and comments in
>> compute_code_hoist_vbeinout.
>>
>> This updated patch corrects the cleaning routine and adds several
>> comments to annotate its actions.
>>
>> Does this look OK?
>
>
> It looks like you've got a bi-directional dataflow problem to clean up
> VBEout, in general we really want to avoid bi-directional problems. Is
> there some reason you're not using a lowest common ancestor algorithm here?

A dataflow problem seems simpler for this case.  The problem uses 
bi-directional links to compute a set of expressions that will be 
subtracted from VBEout on each iteration, it never adds new expressions 
to destination sets.  I, therefore, claim that the fact that this 
particular problem uses bi-directional links does not really have any 
significant negative impact.

Is there any reason bi-directional dataflow problems should be avoided
at all costs?

>
> In your comments you have the following CFG:
>
>
>
> + /* We cannot cleanup VBEout in a single traversal. There has to be both
> + upward and downward links when computing VBEout of current block to
> + avoid removing bits that shouldn't be removed. E.g., consider
> + the following dominator tree; '*' marks blocks which compute same
> + expression, the expression can be freely moved; the expected result
> + is that we move computations of '*' from (3) and (6) to (2).
> +
> +       2
> +      / \
> +     3*  4
> +        / \
> +       5   6*
> +
> + Doing a depth-first search over this tree without an upward link
> + will remove the expression from VBEout[4] (there's no point in hoisting
> + the expression to (4) if it's not computed in both (5) and (6)).
> + When cleaning up VBEout[2] we won't see the expression as needed in (4),
> + so we will remove it from VBEout[2] leaving it to (3) to calculate
> + its own copy of '*'.
>
> The first paragraph of your comment implies that we'd want to hoist the
> expression from 3 & 6 into 2. However, that's not a valid transformation
> as the path 2, 4, 5 does not evaluate the expression in the original
> CFG. VBEout for block 4 should be false since the expression is not
> evaluated in block #5.

Interesting.  While working on implementing cleanup of VBEout sets I
oscillated several times between a bottom-up dominator tree walk and an
iterative walk.  At some point I convinced myself that a dominator
walk would do the job just as well as an iterative walk.  Then I came up
with the above case, which needs more than a single walk to get to the
right solution.  Now you have pointed out that the above case is invalid,
which makes me think that a dominator walk would suffice after all.

Still, the iterative solution looks better to me as it makes it crystal 
clear that only expressions that definitely won't be hoisted to BB will 
be removed from BB's VBEout.

Does this make sense?

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

* Re: Improvements to code hoisting
  2010-06-16 16:54 ` Improvements to code hoisting Richard Guenther
@ 2010-07-01  9:00   ` Maxim Kuvyrkov
  0 siblings, 0 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-01  9:00 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jeff Law, gcc-patches, Richard Earnshaw

On 6/16/10 8:38 PM, Richard Guenther wrote:
> On Wed, Jun 16, 2010 at 5:47 PM, Maxim Kuvyrkov<maxim@codesourcery.com>  wrote:
>> The following series of patches improves code hoisting and PRE RTL-level
>> optimizations.  The two threads of the patches correspond to
>> target-independent changes to gcse.c and to changes to ARM backend to make
>> it emit RTL that is better suited for optimizers.
>>
>> Motivating examples for this work are ARM PRs
>> http://gcc.gnu.org/PR42495
>> http://gcc.gnu.org/PR42574
>> With the patches applied GCC produces perfect code for these examples.
...
> What is the compile-time effect of the cumulative patch on GCSE time?

The compile-time effect on code hoisting is hard to measure reliably; on
the whole of SPEC2K code hoisting takes 1.3-2.6 seconds.  In some cases
compile time goes up, in others it goes down.  Generally, it seems that
the patches make GCC a tiny bit faster, about 0.2%.  I attribute this
speedup to cleaning up VBEout sets.

There is no effect on PRE and other GCSE passes as the patches don't 
change anything for those.

As to the size reduction, the results are:

ARM non-PIC: -0.2%
ARM     PIC: -0.8%

x86 non-PIC: -0.0%
x86     PIC: -0.1%

x64 non-PIC: -0.1%
x64     PIC: -0.2%

The flags used are "-Os -fno-common {-m32/m64}" for x86[_64] and "-Os 
-fno-common -mthumb -march={armv5te/armv7-a}" for ARM.

The code size metric is geomean across all object files in SPEC2K.

Regards,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

* 0008-Don-t-kill-generated-expressions-in-hoist_expr_reach
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (9 preceding siblings ...)
  2010-06-23 21:20 ` ARM improvements for GCSE Maxim Kuvyrkov
@ 2010-07-01 11:05 ` Maxim Kuvyrkov
  2010-07-01 14:26   ` 0008-Don-t-kill-generated-expressions-in-hoist_expr_reach Jeff Law
  2010-07-27 21:21 ` Improvements to code hoisting Maxim Kuvyrkov
  11 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-01 11:05 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: Steven Bosscher

[-- Attachment #1: Type: text/plain, Size: 407 bytes --]

This patch fixes a quirk in hoist_expr_reaches_here_p that makes it
avoid moving an expression across its computation in another block.
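
A minimal sketch of the kind of backward walk involved, to show what dropping the check changes.  This is not the gcse.c code: the struct cfg layout, the reaches () helper and the check_comp flag are all invented for illustration, only one expression is tracked, and block 0 stands in for the entry block.

```c
enum { MAXB = 4, ENTRY = 0 };

/* Hypothetical CFG description for a single expression.  */
struct cfg
{
  int npreds[MAXB];        /* number of predecessors of each block */
  int preds[MAXB][MAXB];   /* predecessor lists */
  int comp[MAXB];          /* nonzero if the block computes the expression */
  int transp[MAXB];        /* nonzero if the block does not kill it */
};

/* Return 1 if the expression, placed at the end of EXPR_BB, would reach BB
   on every backward path.  With CHECK_COMP set, a predecessor that computes
   the expression itself stops the walk, which is the behaviour the patch
   removes.  VISITED must be zeroed by the caller.  */
int
reaches (const struct cfg *g, int expr_bb, int bb, int *visited,
         int check_comp)
{
  for (int i = 0; i < g->npreds[bb]; i++)
    {
      int p = g->preds[bb][i];

      if (p == ENTRY)
        return 0;               /* walked past EXPR_BB */
      if (p == expr_bb || visited[p])
        continue;
      if (check_comp && g->comp[p])
        return 0;               /* the check under discussion */
      if (!g->transp[p])
        return 0;               /* expression is killed in P */
      visited[p] = 1;
      if (!reaches (g, expr_bb, p, visited, check_comp))
        return 0;
    }
  return 1;
}
```

On a straight-line chain 1 -> 2 -> 3 where blocks 2 and 3 both compute the expression and nothing kills it, the walk from block 3 back to block 1 fails with check_comp set and succeeds without it.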

Bootstrapped and regtested on x86[_64]-linux-gnu and regtested on 
arm-linux-gnu.  This change reduces PIC code size on ARM by 0.3% and 
doesn't increase non-PIC code size.

OK to check in?

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0008-Don-t-kill-generated-expressions-in-hoist_expr_reach.ChangeLog --]
[-- Type: text/plain, Size: 117 bytes --]

2010-07-01  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (hoist_expr_reaches_here_p): Remove excessive check.

[-- Attachment #3: 0008-Don-t-kill-generated-expressions-in-hoist_expr_reach.patch --]
[-- Type: text/plain, Size: 801 bytes --]

From 3a7e54fd7ffb36e20a454daa4889d52542ac911c Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 22 Jun 2010 08:40:15 -0700
Subject: [PATCH 08/15] Don't kill generated expressions in hoist_expr_reaches_here_p.

---
 gcc/gcse.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 47d0dba..f37e486 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4478,9 +4478,6 @@ hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
 	break;
       else if (pred_bb == expr_bb)
 	continue;
-      /* Does this predecessor generate this expression?  */
-      else if (TEST_BIT (comp[pred_bb->index], expr_index))
-	break;
       else if (! TEST_BIT (transp[pred_bb->index], expr_index))
 	break;
 
-- 
1.6.2.4


* Re: Wrap calculation of PIC address into a single instruction
  2010-06-23 21:35   ` Wrap calculation of PIC address into a single instruction Maxim Kuvyrkov
  2010-06-23 21:38     ` Andrew Pinski
  2010-06-23 21:41     ` Steven Bosscher
@ 2010-07-01 12:40     ` Richard Earnshaw
  2 siblings, 0 replies; 94+ messages in thread
From: Richard Earnshaw @ 2010-07-01 12:40 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches


On Thu, 2010-06-24 at 01:19 +0400, Maxim Kuvyrkov wrote:
> This patch enables optimizations, particularly GCSE, to handle calculation
> of PIC addresses.  GCSE tracks only single instructions, so it can't
> handle a two-instruction calculation of a PIC address.
>
> With this patch, calculations of PIC addresses are represented as single
> instructions, allowing GCSE to eliminate all but the first address
> calculation for global variables.
> 
> Any comments?  OK to check in?

+/* Return true to X will surely end up in an index register after the first
+   splitting pass.  */

s/Return true to/Return true if/

Other than that, this is ok.

R.

* Re: 0008-Don-t-kill-generated-expressions-in-hoist_expr_reach
  2010-07-01 11:05 ` 0008-Don-t-kill-generated-expressions-in-hoist_expr_reach Maxim Kuvyrkov
@ 2010-07-01 14:26   ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-07-01 14:26 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Steven Bosscher

On 07/01/10 05:05, Maxim Kuvyrkov wrote:
> This patch fixes a quirk in hoist_expr_reaches_here_p that makes it
> avoid moving an expression across its computation in another block.
>
> Bootstrapped and regtested on x86[_64]-linux-gnu and regtested on 
> arm-linux-gnu.  This change reduces PIC code size on ARM by 0.3% and 
> doesn't increase non-PIC code size.
>
> OK to check in?
OK.
jeff

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-06-30 20:53                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-07-01 16:54                     ` Jeff Law
  2010-07-02 16:08                       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-07-01 16:54 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 06/30/10 13:30, Maxim Kuvyrkov wrote:
>>
>> It looks like you've got a bi-directional dataflow problem to clean up
>> VBEout, in general we really want to avoid bi-directional problems. Is
>> there some reason you're not using a lowest common ancestor algorithm 
>> here?
>
>
> A dataflow problem seems simpler for this case.  The problem uses 
> bi-directional links to compute a set of expressions that will be 
> subtracted from VBEout on each iteration, it never adds new 
> expressions to destination sets.  I, therefore, claim that the fact 
> that this particular problem uses bi-directional links does not really 
> have any significant negative impact.
While this specific case may be reasonably safe, in general we really 
want to avoid bi-directional dataflow solvers as it's often hard to 
prove termination, it's hard to evaluate their compile-time performance, 
they're typically more difficult to understand for anyone reading the 
code, and (personal opinion here) often they're a symptom of not solving 
the right problem.

> Interesting.  While working on implementing cleanup of VBEout sets I
> oscillated several times between a bottom-up dominator tree walk and an
> iterative walk.  At some point I convinced myself that a dominator
> walk would do the job just as well as an iterative walk.  Then I came up
> with the above case, which needs more than a single walk to get to the
> right solution.  Now you have pointed out that the above case is invalid,
> which makes me think that a dominator walk would suffice after all.
>
> Still, the iterative solution looks better to me as it makes it 
> crystal clear that only expressions that definitely won't be hoisted 
> to BB will be removed from BB's VBEout.
>
> Does this make sense?
A little bit.  But it's still unclear why we're not using lowest common 
ancestor here.  It seems that tells us precisely where we want to hoist 
to avoid unnecessary movements.  Or is this to prevent hoisting into an 
LCA, then hoisting the expression again on a later pass further up the 
dominator tree?

The other issues that I think are still unanswered:

   1. Why precisely do we need transpout for hoisting?  We should
only be hoisting a very busy expression into a block which dominates the
original evaluations.  We don't insert on edges, so we don't have to
worry about splitting abnormals.  The only thing I can think of is
that perhaps the hoisting might require new edges in the destination block if
the expression potentially traps?!?

   2. Assuming there's a good reason for #1, for correctness we should 
drop transpout and instead use the same method as PRE.

Jeff
>
> Thanks,
>

* Re: 0006-GCSE-complex-constants.patch
       [not found]       ` <4C2BBEB5.4080209@codesourcery.com>
@ 2010-07-01 17:01         ` Jeff Law
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff Law @ 2010-07-01 17:01 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

On 06/30/10 16:01, Maxim Kuvyrkov wrote:
> On 6/24/10 12:10 AM, Maxim Kuvyrkov wrote:
>> On 6/16/10 8:54 PM, Jeff Law wrote:
>>>> Certain architectures (e.g., ARM) cannot easily operate with
>>>> constants, they need to emit sequences of several instructions to load
>>>> constants into registers. The common procedure to do this is to emit a
>>>> (parallel [(set) (clobber (reg1)) ... (clobber (regN))]) instruction
>>>> which later splits into several instructions using pseudos (regX) to
>>>> store intermediate values.
>>>>
>>>> Currently PRE and hoist do not GCSE constants, and there is a good
>>>> reason for that, to avoid increasing register pressure; interestingly,
>>>> symbol_refs are allowed to be GCSE'ed, is this intentional or by
>>>> accident?
>>> It's intentional; a SYMBOL_REF can often be rather expensive. Some
>>> CONST_INTs can have that same property. One could argue that an
>>> expensive CONST_INT shouldn't be appearing in RTL, but certainly some
>>> ports have chosen to handle splitting insns with expensive constants
>>> later in the pipeline.
>>>
>>>>
>>>> In any case, it seems like a good idea to GCSE constants and
>>>> symbol_refs that need something beyond a simple (set) to get into a
>>>> register, and not GCSE them otherwise.
>>> Rather than triggering this on the PARALLEL it might be better to
>>> trigger it on the cost of the RTX. Triggering on the PARALLEL looks like
>>> a hack to me -- IMHO we'd be better off fixing the costing mechanism and
>>> using costing as the trigger.
>>
>> Here is reworked patch that
>>
>> (a) introduces max_distance property to expressions (counted in
>> instructions),
>>
>> (b) uses RTX cost model to estimate how far an expression can travel
>> (the greater the cost, the farther the distance), and
>>
>> (c) adds two new parameters to tweak the above.
>>
>> I am yet to do benchmarking on ARM and x86[_64] to find out what the
>> optimal parameter values are. Before starting with testing I would like
>> to get feedback on the concept and its implementation.
>
> Ping.
>
> The optimal settings turned out to be almost exactly as expected: 
> unrestricted hoisting is best for expressions with "rtx_cost >= 
> COSTS_N_INSNS(3)" and for expressions cheaper than that, movement
> freedom is about 4 instructions per one unit of COSTS_N_INSNS.  E.g., 
> an expression of COSTS_N_INSNS(2) will be allowed to move at most 8 
> instructions up.
>
> I did the tuning for ARM -Os with all other patches applied.  The 
> result is 0.2% size reduction for non-PIC code and 0.8% size reduction 
> for PIC code.
>
> For x86 and x86_64 I verified that the patches provide size
> improvement, though a smaller one: 0.1-0.2% for PIC and 0.0-0.1% for
> non-PIC.
>
> In both cases code size metric is geomean across all object files in 
> SPEC2K.
>
> OK to check in?
@item gcse-cost-distance-ratio
+Scaling factor in calculation of maximum distance an expression
+can be moved by GCSE optimizations.  This is currently supported only in
+code hoisting pass.  The bigger the ratio, the more agressive code hoisting
+will be with expressions which have cost less than
+@option{gcse-unrestricted-cost}.  The default value is 10.
+
+@item gcse-unrestricted-cost
+Cost at which GCSE optimizations will not constraint the distance
+an expression can travel.  This is currently supported only in
+code hoisting pass.  The lesser the cost, the more aggressive code hoisting
+will be.  The default value is 3.


For unrestricted-cost, I think you should change the first sentence to
"Cost, roughly measured as the cost of a single typical machine 
instruction, at which GCSE optimizations will not constrain the distance 
an expression can travel."    Are there any special values (0, 
negative?)  that need to be documented?  I think some similar 
clarifications for cost-distance-ratio would help.  What happens if the 
user enters negative numbers for these two knobs?
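
To make the numbers quoted above concrete, here is a small sketch of how a cost could map to a maximum hoisting distance.  It is only an illustration: max_hoist_distance and UNRESTRICTED are hypothetical names, and the division by 10 is an assumption chosen so that the documented defaults (ratio 10, unrestricted cost 3) reproduce the figures in this thread.  COSTS_N_INSNS is written the way GCC defines it, as N * 4.  The sketch deliberately does not say what zero or negative values should do, since that is the open question above.

```c
/* GCC's cost scaling: the cost of N typical instructions is N * 4.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical marker meaning "no limit on the distance".  */
enum { UNRESTRICTED = -1 };

/* Hypothetical helper: maximum number of instructions an expression of
   the given COST may be moved up.  Assumes COST, RATIO and
   UNRESTRICTED_COST are all positive.  */
int
max_hoist_distance (int cost, int ratio, int unrestricted_cost)
{
  /* Expensive expressions may travel any distance.  */
  if (cost >= COSTS_N_INSNS (unrestricted_cost))
    return UNRESTRICTED;

  /* Assumed scaling: a ratio of 10 gives one instruction per cost unit.  */
  return cost * ratio / 10;
}
```

With the defaults, an expression costing COSTS_N_INSNS (2) gets a limit of 8 instructions, and one costing COSTS_N_INSNS (3) is unrestricted.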


+   MAX_DISTANCE is the maximum distance in instructions this expression can
+   be moved.
+*/

Remove the newline between the end of the comment and the comment-close
marker, i.e. "be moved.  */".


+      cur_expr->max_distance = 0; /* Not used for set_p tables.  */
We generally don't use this style of comment.  Put the comment on the line
before the initialization.

A comment before the initialization of bb_size and to_bb_head would be 
useful.


Approved with those minor documentation & comment tweaks.

jeff

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-01 16:54                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-07-02 16:08                       ` Maxim Kuvyrkov
  2010-07-07 16:56                         ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-02 16:08 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

On 7/1/10 8:36 PM, Jeff Law wrote:
> On 06/30/10 13:30, Maxim Kuvyrkov wrote:
>>>
>>> It looks like you've got a bi-directional dataflow problem to clean up
>>> VBEout, in general we really want to avoid bi-directional problems. Is
>>> there some reason you're not using a lowest common ancestor algorithm
>>> here?
>>
>>
>> A dataflow problem seems simpler for this case. The problem uses
>> bi-directional links to compute a set of expressions that will be
>> subtracted from VBEout on each iteration, it never adds new
>> expressions to destination sets. I, therefore, claim that the fact
>> that this particular problem uses bi-directional links does not really
>> have any significant negative impact.
> While this specific case may be reasonably safe, in general we really
> want to avoid bi-directional dataflow solvers as it's often hard to
> prove termination, it's hard to evaluate their compile-time performance,
> they're typically more difficult to understand for anyone reading the
> code, and (personal opinion here) often they're a symptom of not solving
> the right problem.
>
>> Interesting. While working on implementing cleanup of VBEout sets I
>> oscillated several times between a bottom-up dominator tree walk and an
>> iterative walk. At some point I convinced myself that a dominator
>> walk would do the job just as well as an iterative walk. Then I came up
>> with the above case, which needs more than a single walk to get to the
>> right solution. Now you have pointed out that the above case is invalid,
>> which makes me think that a dominator walk would suffice after all.
>>
>> Still, the iterative solution looks better to me as it makes it
>> crystal clear that only expressions that definitely won't be hoisted
>> to BB will be removed from BB's VBEout.
>>
>> Does this make sense?
> A little bit. But it's still unclear why we're not using lowest common
> ancestor here.

It appears we were thinking about different approaches to using LCA to 
solve this problem, the algorithm I thought you were suggesting would've 
been bulky and slow.

Now I see that the problem can be solved reasonably fast with LCA too. 
I don't yet have all the details figured out, so if you have a clear 
picture of the algorithm in mind, please don't keep it to yourself ;)

> 1. Why precisely do we need transpout for hoisting? We should only be
> hoisting a very busy expression into a block which dominates the
> original evaluations. We don't insert on edges, so we don't have to
> worry about splitting abnormals. The only thing I can think of is
> that perhaps the hoisting might require new edges in the destination block if
> the expression potentially traps?!?

I can't give a definitive answer as to whether and why hoisting needs transpout.
It seems to me transpout can be safely removed if we just avoid
propagating data across complex edges in compute_vbeinout and 
hoist_expr_reaches_here_p.

That said, I would not check in such a clean up in the same patch as 
improving code hoisting to look into non-immediately-dominated blocks. 
Let's keep the bugs these two changes can introduce separate.

>
> 2. Assuming there's a good reason for #1, for correctness we should drop
> transpout and instead use the same method as PRE.

I don't think there is.  Anyway, we will find out once I or someone else
implements removal of transpout.

Jeff, do you remember why transpout sets were introduced in the first place?

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-02 16:08                       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-07-07 16:56                         ` Jeff Law
  2010-07-09 20:18                           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-07-07 16:56 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 07/02/10 10:08, Maxim Kuvyrkov wrote:
>
> It appears we were thinking about different approaches to using LCA to 
> solve this problem, the algorithm I thought you were suggesting 
> would've been bulky and slow.
>
> Now I see that the problem can be solved reasonably fast with LCA too. 
> I don't yet have all the details figured out, so if you have a clear 
> picture of the algorithm in mind, please don't keep it to yourself ;)
Isn't it fairly simple?  If we have an expression we want to hoist from
some set of blocks, isn't the destination the LCA within the dominator
tree of those blocks?

I realize that LCA is usually formulated as the LCA between two blocks, 
but isn't it relatively easy to compute the LCA for pairs and 
recurse/iterate?
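
A minimal sketch of that pairwise iteration, assuming the dominator tree is available as an immediate-dominator array plus a depth array.  The names lca_pair and lca_set and the array layout are made up for this example; they are not functions from GCC.

```c
/* idom[b] is the immediate dominator of block b (the root is its own idom)
   and depth[b] is b's depth in the dominator tree, with the root at 0.  */

/* Lowest common ancestor of blocks A and B in the dominator tree.  */
int
lca_pair (const int *idom, const int *depth, int a, int b)
{
  while (a != b)
    {
      /* Move the deeper block up; on a tie, move A.  */
      if (depth[a] >= depth[b])
        a = idom[a];
      else
        b = idom[b];
    }
  return a;
}

/* Lowest common ancestor of BLOCKS[0..N-1], N >= 1, computed by folding
   the pairwise LCA over the set.  */
int
lca_set (const int *idom, const int *depth, const int *blocks, int n)
{
  int lca = blocks[0];
  for (int i = 1; i < n; i++)
    lca = lca_pair (idom, depth, lca, blocks[i]);
  return lca;
}
```

For the dominator tree discussed earlier in this thread (2 above 3 and 4, 4 above 5 and 6), the set {3, 6} yields block 2 and the set {5, 6} yields block 4.  The LCA only names a candidate destination; whether the expression may actually be placed there still depends on VBEout.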

Or is this problem some kind of implementation detail that I've missed?

>
>> 1. Why precisely do we need transpout for hoisting? We should only be
>> hoisting a very busy expression into a block which dominates the
>> original evaluations. We don't insert on edges, so we don't have to
>> worry about splitting abnormals. The only thing I can think of is
>> that perhaps the hoisting might require new edges in the destination block if
>> the expression potentially traps?!?
>
> I can't give a definitive answer as to whether and why hoisting needs
> transpout.  It seems to me transpout can be safely removed if we just
> avoid propagating data across complex edges in compute_vbeinout and 
> hoist_expr_reaches_here_p.
>
> That said, I would not check in such a clean up in the same patch as 
> improving code hoisting to look into non-immediately-dominated blocks. 
> Let's keep the bugs these two changes can introduce separate.
If you want to keep the changes separate (and there's certainly value in 
doing that), the way to go is fix the correctness issue first, then the 
improvement of the optimization.  Particularly when we're still 
iterating on the implementation of the optimization.


> I don't think there is.  Anyway, we will find out once I or someone
> else implements removal of transpout.
>
> Jeff, do you remember why transpout sets were introduced in the first 
> place?
I don't.  It was in the first external version of hoisting I posted and 
the first development version I checked into Cygnus's internal tree.

Since there are no edge insertions with hoisting, the only potential 
problem I can see is when the hoisted expression itself can trigger 
traversal of an abnormal edge and the block we want to put the 
expression in does not have an edge to the handler.  In that case we'd 
need to add edges in the cfg and I don't see any compensation code in 
gcse.c to deal with that case.

Assuming that's the situation we need to avoid, then we really need to 
switch to the PRE-like code, since it detects expressions which can cause 
traversal of the abnormal edge much better.  It's a fairly simple patch 
since it just prunes some expressions from the local tables before 
running the dataflow solver.

Jeff


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-07 16:56                         ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-07-09 20:18                           ` Maxim Kuvyrkov
  2010-07-14 20:58                             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-09 20:18 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1232 bytes --]

On 7/7/10 8:40 PM, Jeff Law wrote:
...
> Since there's no edge insertions with hoisting, the only potential
> problem I can see is when the hoisted expression itself can trigger
> traversal of an abnormal edge and the block we want to put the
> expression does not have an edge to the handler. In that case we'd need
> to add edges in the cfg and I don't see any compensation code in gcse.c
> to deal with that case.

I agree.

>
> Assuming that's the situation we need to avoid, then we really need to
> switch to the PRE-like code, since it detects expressions which can cause
> traversal of the abnormal edge much better. It's a fairly simple patch
> since it just prunes some expressions from the local tables before
> running the dataflow solver.

The first of the attached patches replaces transpout with an additional 
check in determining if an expression is anticipatable.

The second patch implements the LCA approach to avoid hoisting expressions 
too far up.  As a side effect of the implementation, it somewhat simplifies 
the control flow of hoist_code.
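
The second patch also bounds how far down the dominator tree hoist_code
searches (get_dominated_to_depth, max-hoist-depth).  A rough standalone
sketch of such a level-bounded breadth-first walk, assuming a toy
first_son[]/next_son[] encoding of the tree; the function name and both
arrays are hypothetical and this is not the code from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Collect ROOT and the blocks it dominates, at most MAX_DEPTH levels
   below ROOT, in breadth-first order.  A MAX_DEPTH of 0 means "no
   limit".  first_son[b] is the first dominator-tree child of b and
   next_son[b] is the next sibling of b; -1 terminates either chain.
   OUT must have room for every block in the tree.  Returns the number
   of blocks stored in OUT.  */
static size_t
dominated_to_depth (const int *first_son, const int *next_son,
                    int root, int max_depth, int *out)
{
  size_t len = 0;
  size_t level_begin = 0;
  int level = 0;

  assert (max_depth >= 0);
  out[len++] = root;

  /* Expand one whole level of the tree per iteration.  */
  while (level_begin < len && (max_depth == 0 || level < max_depth))
    {
      size_t level_end = len;
      size_t i;

      for (i = level_begin; i < level_end; i++)
        {
          int son;

          for (son = first_son[out[i]]; son != -1; son = next_son[son])
            out[len++] = son;
        }

      level_begin = level_end;
      level++;
    }

  return len;
}
```

With a depth of 1 this yields the block plus its immediate dominator-tree
children, which corresponds to the search range hoist_code had before the
patch.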

I really hope this is the last iteration on the one-line change this 
problem initially was :).

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0013-Replace-transpout.ChangeLog --]
[-- Type: text/plain, Size: 265 bytes --]

2010-07-10  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (compute_transpout, transpout): Remove.
	(causes_abnormal_control_flow_p): New static function.
	(hash_scan_set): Use it.
	(alloc_code_hoist_mem, free_code_hoist_mem, compute_code_hoist_data):
	Update.

[-- Attachment #3: 0013-Replace-transpout.patch --]
[-- Type: text/plain, Size: 5435 bytes --]

From 32aa317c164722a5eefc5ad5ac9ce95912cc3120 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:03:18 -0700
Subject: [PATCH 13/14] Replace transpout

---
 gcc/gcse.c |   86 +++++++++++++++++++-----------------------------------------
 1 files changed, 27 insertions(+), 59 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 4f6dc83..9479304 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -470,7 +470,6 @@ static void mark_oprs_set (rtx);
 static void alloc_cprop_mem (int, int);
 static void free_cprop_mem (void);
 static void compute_transp (const_rtx, int, sbitmap *, int);
-static void compute_transpout (void);
 static void compute_local_properties (sbitmap *, sbitmap *, sbitmap *,
 				      struct hash_table_d *);
 static void compute_cprop_data (void);
@@ -1357,6 +1356,27 @@ gcse_constant_p (const_rtx x)
   return CONSTANT_P (x) && (GET_CODE (x) != CONST || shared_const_p (x));
 }
 
+/* Return true if INSN can cause abnormal control flow.  */
+
+static bool
+causes_abnormal_control_flow_p (const_rtx insn)
+{
+  basic_block bb;
+  edge e;
+  edge_iterator ei;
+
+  bb = BLOCK_FOR_INSN (insn);
+
+  if (BB_END (bb) != insn)
+    return false;
+
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    if (e->flags & EDGE_ABNORMAL)
+      return true;
+
+  return false;
+}
+
 /* Scan pattern PAT of INSN and add an entry to the hash TABLE (set or
    expression one).  */
 
@@ -1424,11 +1444,13 @@ hash_scan_set (rtx pat, rtx insn, struct hash_table_d *table)
 	{
 	  /* An expression is not anticipatable if its operands are
 	     modified before this insn or if this is not the only SET in
-	     this insn.  The latter condition does not have to mean that
+	     this insn.  The latter conditions do not have to mean that
 	     SRC itself is not anticipatable, but we just will not be
-	     able to handle code motion of insns with multiple sets.  */
-	  int antic_p = oprs_anticipatable_p (src, insn)
-			&& !multiple_sets (insn);
+	     able to handle code motion of insns with multiple sets or
+	     abnormal control flow.  */
+	  int antic_p = (oprs_anticipatable_p (src, insn)
+			 && !multiple_sets (insn)
+			 && !causes_abnormal_control_flow_p (insn));
 	  /* An expression is not available if its operands are
 	     subsequently modified, including this insn.  It's also not
 	     available if this is a branch, because we can't insert
@@ -3237,11 +3259,6 @@ bypass_conditional_jumps (void)
 /* Nonzero for expressions that are transparent in the block.  */
 static sbitmap *transp;
 
-/* Nonzero for expressions that are transparent at the end of the block.
-   This is only zero for expressions killed by abnormal critical edge
-   created by a calls.  */
-static sbitmap *transpout;
-
 /* Nonzero for expressions that are computed (available) in the block.  */
 static sbitmap *comp;
 
@@ -4097,52 +4114,6 @@ add_label_notes (rtx x, rtx insn)
     }
 }
 
-/* Compute transparent outgoing information for each block.
-
-   An expression is transparent to an edge unless it is killed by
-   the edge itself.  This can only happen with abnormal control flow,
-   when the edge is traversed through a call.  This happens with
-   non-local labels and exceptions.
-
-   This would not be necessary if we split the edge.  While this is
-   normally impossible for abnormal critical edges, with some effort
-   it should be possible with exception handling, since we still have
-   control over which handler should be invoked.  But due to increased
-   EH table sizes, this may not be worthwhile.  */
-
-static void
-compute_transpout (void)
-{
-  basic_block bb;
-  unsigned int i;
-  struct expr *expr;
-
-  sbitmap_vector_ones (transpout, last_basic_block);
-
-  FOR_EACH_BB (bb)
-    {
-      /* Note that flow inserted a nop at the end of basic blocks that
-	 end in call instructions for reasons other than abnormal
-	 control flow.  */
-      if (! CALL_P (BB_END (bb)))
-	continue;
-
-      for (i = 0; i < expr_hash_table.size; i++)
-	for (expr = expr_hash_table.table[i]; expr ; expr = expr->next_same_hash)
-	  if (MEM_P (expr->expr))
-	    {
-	      if (GET_CODE (XEXP (expr->expr, 0)) == SYMBOL_REF
-		  && CONSTANT_POOL_ADDRESS_P (XEXP (expr->expr, 0)))
-		continue;
-
-	      /* ??? Optimally, we would use interprocedural alias
-		 analysis to determine if this mem is actually killed
-		 by this call.  */
-	      RESET_BIT (transpout[bb->index], expr->bitmap_index);
-	    }
-    }
-}
-
 /* Code Hoisting variables and subroutines.  */
 
 /* Very busy expressions.  */
@@ -4171,7 +4142,6 @@ alloc_code_hoist_mem (int n_blocks, int n_exprs)
   hoist_vbein = sbitmap_vector_alloc (n_blocks, n_exprs);
   hoist_vbeout = sbitmap_vector_alloc (n_blocks, n_exprs);
   hoist_exprs = sbitmap_vector_alloc (n_blocks, n_exprs);
-  transpout = sbitmap_vector_alloc (n_blocks, n_exprs);
 }
 
 /* Free vars used for code hoisting analysis.  */
@@ -4186,7 +4156,6 @@ free_code_hoist_mem (void)
   sbitmap_vector_free (hoist_vbein);
   sbitmap_vector_free (hoist_vbeout);
   sbitmap_vector_free (hoist_exprs);
-  sbitmap_vector_free (transpout);
 
   free_dominance_info (CDI_DOMINATORS);
 }
@@ -4246,7 +4215,6 @@ static void
 compute_code_hoist_data (void)
 {
   compute_local_properties (transp, comp, antloc, &expr_hash_table);
-  compute_transpout ();
   compute_code_hoist_vbeinout ();
   calculate_dominance_info (CDI_DOMINATORS);
   if (dump_file)
-- 
1.6.2.4


[-- Attachment #4: 0005-Also-search-non-immediately-dominated-blocks-for-exp.ChangeLog --]
[-- Type: text/plain, Size: 687 bytes --]

2010-06-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* basic-block.h (get_dominated_to_depth): Declare.
	* dominance.c (get_dominated_to_depth): New function, use
	get_all_dominated_blocks as a base.
	(get_all_dominated_blocks): Use get_dominated_to_depth.

	* gcse.c (occr_t, VEC (occr_t, heap)): Define.
	(hoist_exprs): Remove.
	(alloc_code_hoist_mem, free_code_hoist_mem): Update.
	(compute_code_hoist_vbeinout): Add debug print outs.
	(hoist_code): Partially rewrite, simplify.  Use get_dominated_to_depth.

	* params.def (PARAM_MAX_HOIST_DEPTH): New parameter to avoid
	quadratic behavior.
	* params.h (MAX_HOIST_DEPTH): New macro.
	* doc/invoke.texi (max-hoist-depth): Document.

[-- Attachment #5: 0015-Also-search-non-immediately-dominated-blocks-for-exp.patch --]
[-- Type: text/plain, Size: 20724 bytes --]

From 7ff6ff43205f47bcf1a38c4c24140c8e50ca1bb3 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 15 Jun 2010 11:24:56 -0700
Subject: [PATCH 15/16] Also search non-immediately dominated blocks for expressions to hoist

---
 gcc/basic-block.h   |    2 +
 gcc/doc/invoke.texi |    6 +
 gcc/dominance.c     |   22 ++++-
 gcc/gcse.c          |  276 ++++++++++++++++++++++++++-------------------------
 gcc/params.def      |    8 ++
 gcc/params.h        |    2 +
 6 files changed, 179 insertions(+), 137 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index 135c0c2..1bf192d 100644
*** a/gcc/basic-block.h
--- b/gcc/basic-block.h
***************
*** 854,859 ****
--- 854,861 ----
  extern VEC (basic_block, heap) *get_dominated_by_region (enum cdi_direction,
  							 basic_block *,
  							 unsigned);
+ extern VEC (basic_block, heap) *get_dominated_to_depth (enum cdi_direction,
+ 							basic_block, int);
  extern VEC (basic_block, heap) *get_all_dominated_blocks (enum cdi_direction,
  							  basic_block);
  extern void add_to_dominance_info (enum cdi_direction, basic_block);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f6d4c04..5ef5026 100644
*** a/gcc/doc/invoke.texi
--- b/gcc/doc/invoke.texi
***************
*** 8232,8237 ****
--- 8232,8243 ----
  code hoisting pass.  The lesser the cost, the more aggressive code hoisting
  will be.  The default value is 3.
  
+ @item max-hoist-depth
+ The depth of search in the dominator tree for expressions to hoist.
+ This is used to avoid quadratic behavior in hoisting algorithm.
+ The value of 0 will avoid limiting the search, but may slow down compilation
+ of huge functions.  The default value is 30.
+ 
  @item max-unrolled-insns
  The maximum number of instructions that a loop should have if that loop
  is unrolled, and if the loop is unrolled, it determines how many times
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c2dcf0..7861439 100644
*** a/gcc/dominance.c
--- b/gcc/dominance.c
***************
*** 783,798 ****
  }
  
  /* Returns the list of basic blocks including BB dominated by BB, in the
!    direction DIR.  The vector will be sorted in preorder.  */
  
  VEC (basic_block, heap) *
! get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
  {
    VEC(basic_block, heap) *bbs = NULL;
    unsigned i;
  
    i = 0;
    VEC_safe_push (basic_block, heap, bbs, bb);
  
    do
      {
--- 783,802 ----
  }
  
  /* Returns the list of basic blocks including BB dominated by BB, in the
!    direction DIR up to DEPTH in the dominator tree.  The DEPTH of zero will
!    produce a vector containing all dominated blocks.  The vector will be sorted
!    in preorder.  */
  
  VEC (basic_block, heap) *
! get_dominated_to_depth (enum cdi_direction dir, basic_block bb, int depth)
  {
    VEC(basic_block, heap) *bbs = NULL;
    unsigned i;
+   unsigned next_level_start;
  
    i = 0;
    VEC_safe_push (basic_block, heap, bbs, bb);
+   next_level_start = 1; /* = VEC_length (basic_block, bbs); */
  
    do
      {
***************
*** 803,814 ****
  	   son;
  	   son = next_dom_son (dir, son))
  	VEC_safe_push (basic_block, heap, bbs, son);
      }
!   while (i < VEC_length (basic_block, bbs));
  
    return bbs;
  }
  
  /* Redirect all edges pointing to BB to TO.  */
  void
  redirect_immediate_dominators (enum cdi_direction dir, basic_block bb,
--- 807,830 ----
  	   son;
  	   son = next_dom_son (dir, son))
  	VEC_safe_push (basic_block, heap, bbs, son);
+ 
+       if (i == next_level_start && --depth)
+ 	next_level_start = VEC_length (basic_block, bbs);
      }
!   while (i < next_level_start);
  
    return bbs;
  }
  
+ /* Returns the list of basic blocks including BB dominated by BB, in the
+    direction DIR.  The vector will be sorted in preorder.  */
+ 
+ VEC (basic_block, heap) *
+ get_all_dominated_blocks (enum cdi_direction dir, basic_block bb)
+ {
+   return get_dominated_to_depth (dir, bb, 0);
+ }
+ 
  /* Redirect all edges pointing to BB to TO.  */
  void
  redirect_immediate_dominators (enum cdi_direction dir, basic_block bb,
diff --git a/gcc/gcse.c b/gcc/gcse.c
index 44e7d2f..3006348 100644
*** a/gcc/gcse.c
--- b/gcc/gcse.c
***************
*** 322,327 ****
--- 322,331 ----
    char copied_p;
  };
  
+ typedef struct occr *occr_t;
+ DEF_VEC_P (occr_t);
+ DEF_VEC_ALLOC_P (occr_t, heap);
+ 
  /* Expression and copy propagation hash tables.
     Each hash table is an array of buckets.
     ??? It is known that if it were an array of entries, structure elements
***************
*** 4161,4169 ****
  static sbitmap *hoist_vbein;
  static sbitmap *hoist_vbeout;
  
- /* Hoistable expressions.  */
- static sbitmap *hoist_exprs;
- 
  /* ??? We could compute post dominators and run this algorithm in
     reverse to perform tail merging, doing so would probably be
     more effective than the tail merging code in jump.c.
--- 4165,4170 ----
***************
*** 4182,4188 ****
  
    hoist_vbein = sbitmap_vector_alloc (n_blocks, n_exprs);
    hoist_vbeout = sbitmap_vector_alloc (n_blocks, n_exprs);
-   hoist_exprs = sbitmap_vector_alloc (n_blocks, n_exprs);
    transpout = sbitmap_vector_alloc (n_blocks, n_exprs);
  }
  
--- 4183,4188 ----
***************
*** 4197,4203 ****
  
    sbitmap_vector_free (hoist_vbein);
    sbitmap_vector_free (hoist_vbeout);
-   sbitmap_vector_free (hoist_exprs);
    sbitmap_vector_free (transpout);
  
    free_dominance_info (CDI_DOMINATORS);
--- 4197,4202 ----
***************
*** 4237,4242 ****
--- 4236,4245 ----
  		 in BB and available at its end.  */
  	      sbitmap_a_or_b (hoist_vbeout[bb->index],
  			      hoist_vbeout[bb->index], comp[bb->index]);
+ 
+ 	      /* Remove from VBEout expressions that die right after BB.  */
+ 	      sbitmap_a_and_b (hoist_vbeout[bb->index],
+ 			       hoist_vbeout[bb->index], transpout[bb->index]);
  	    }
  
  	  changed |= sbitmap_a_or_b_and_c_cg (hoist_vbein[bb->index],
***************
*** 4249,4255 ****
      }
  
    if (dump_file)
!     fprintf (dump_file, "hoisting vbeinout computation: %d passes\n", passes);
  }
  
  /* Top level routine to do the dataflow analysis needed by code hoisting.  */
--- 4252,4268 ----
      }
  
    if (dump_file)
!     {
!       fprintf (dump_file, "hoisting vbeinout computation: %d passes\n", passes);
! 
!       FOR_EACH_BB (bb)
!         {
! 	  fprintf (dump_file, "vbein (%d): ", bb->index);
! 	  dump_sbitmap_file (dump_file, hoist_vbein[bb->index]);
! 	  fprintf (dump_file, "vbeout(%d): ", bb->index);
! 	  dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]);
! 	}
!     }
  }
  
  /* Top level routine to do the dataflow analysis needed by code hoisting.  */
***************
*** 4312,4322 ****
  
        if (pred->src == ENTRY_BLOCK_PTR)
  	break;
-       else if (pred_bb == expr_bb)
- 	continue;
        else if (visited[pred_bb->index])
  	continue;
! 
        else if (! TEST_BIT (transp[pred_bb->index], expr_index))
  	break;
  
--- 4325,4336 ----
  
        if (pred->src == ENTRY_BLOCK_PTR)
  	break;
        else if (visited[pred_bb->index])
  	continue;
!       else if (!TEST_BIT (transpout[pred_bb->index], expr_index))
! 	break;
!       else if (pred_bb == expr_bb)
! 	continue;
        else if (! TEST_BIT (transp[pred_bb->index], expr_index))
  	break;
  
***************
*** 4360,4367 ****
    int *bb_size;
    int changed = 0;
  
-   sbitmap_vector_zero (hoist_exprs, last_basic_block);
- 
    /* Compute a mapping from expression number (`bitmap_index') to
       hash table entry.  */
  
--- 4374,4379 ----
***************
*** 4405,4434 ****
       Currently, we assume that this case is rare enough to worth fixing.  */
    FOR_EACH_BB (bb)
      {
!       int found = 0;
!       int insn_inserted_p;
  
-       domby = get_dominated_by (CDI_DOMINATORS, bb);
        /* Examine each expression that is very busy at the exit of this
  	 block.  These are the potentially hoistable expressions.  */
        for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
  	{
! 	  int hoistable = 0;
! 
! 	  if (TEST_BIT (hoist_vbeout[bb->index], i)
! 	      && TEST_BIT (transpout[bb->index], i))
  	    {
  	      /* If an expression is computed in BB and is available at end of
  		 BB, hoist all occurences dominated by BB to BB.  */
  	      if (TEST_BIT (comp[bb->index], i))
! 		hoistable++;
  
  	      /* We've found a potentially hoistable expression, now
  		 we look at every block BB dominates to see if it
  		 computes the expression.  */
  	      for (j = 0; VEC_iterate (basic_block, domby, j, dominated); j++)
  		{
- 		  struct expr *expr = index_map[i];
  		  int max_distance;
  
  		  /* Ignore self dominance.  */
--- 4417,4475 ----
       Currently, we assume that this case is rare enough to worth fixing.  */
    FOR_EACH_BB (bb)
      {
!       domby = get_dominated_to_depth (CDI_DOMINATORS, bb, MAX_HOIST_DEPTH);
! 
!       if (VEC_length (basic_block, domby) == 0)
! 	continue;
  
        /* Examine each expression that is very busy at the exit of this
  	 block.  These are the potentially hoistable expressions.  */
        for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
  	{
! 	  if (TEST_BIT (hoist_vbeout[bb->index], i))
  	    {
+ 	      /* Current expression.  */
+ 	      struct expr *expr = index_map[i];
+ 	      /* Number of occurences of EXPR that can be hoisted to BB.  */
+ 	      int hoistable = 0;
+ 	      /* Basic blocks that have occurences reachable from BB.  */
+ 	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
+ 	      /* Occurences reachable from BB.  */
+ 	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
+ 	      /* We want to insert the expression into BB only once, so
+ 		 note when we've inserted it.  */
+ 	      int insn_inserted_p;
+ 	      occr_t occr;
+ 
+ 	      bitmap_initialize (from_bbs, 0);
+ 
  	      /* If an expression is computed in BB and is available at end of
  		 BB, hoist all occurences dominated by BB to BB.  */
  	      if (TEST_BIT (comp[bb->index], i))
! 		{
! 		  occr = find_occr_in_bb (expr->antic_occr, bb);
! 
! 		  if (occr)
! 		    {
! 		      /* An occurence might've been already deleted
! 			 while processing a dominator of BB.  */
! 		      if (occr->deleted_p)
! 			gcc_assert (MAX_HOIST_DEPTH > 1);
! 		      else
! 			{
! 			  gcc_assert (NONDEBUG_INSN_P (occr->insn));
! 			  hoistable++;
! 			}
! 		    }
! 		  else
! 		    hoistable++;
! 		}
  
  	      /* We've found a potentially hoistable expression, now
  		 we look at every block BB dominates to see if it
  		 computes the expression.  */
  	      for (j = 0; VEC_iterate (basic_block, domby, j, dominated); j++)
  		{
  		  int max_distance;
  
  		  /* Ignore self dominance.  */
***************
*** 4440,4461 ****
  		  if (!TEST_BIT (antloc[dominated->index], i))
  		    continue;
  
! 		  max_distance = expr->max_distance;
! 		  if (max_distance > 0)
! 		    {
! 		      struct occr *occr;
! 
! 		      occr = find_occr_in_bb (expr->antic_occr, dominated);
! 		      gcc_assert (occr);
  
! 		      gcc_assert (NONDEBUG_INSN_P (occr->insn));
! 
! 		      /* Adjust MAX_DISTANCE to account for the fact that
! 			 OCCR won't have to travel all of DOMINATED, but
! 			 only part of it.  */
! 		      max_distance += (bb_size[dominated->index]
! 				       - to_bb_head[INSN_UID (occr->insn)]);
  		    }
  
  		  /* Note if the expression would reach the dominated block
  		     unimpared if it was placed at the end of BB.
--- 4481,4505 ----
  		  if (!TEST_BIT (antloc[dominated->index], i))
  		    continue;
  
! 		  occr = find_occr_in_bb (expr->antic_occr, dominated);
! 		  gcc_assert (occr);
  
! 		  /* An occurence might've been already deleted
! 		     while processing a dominator of BB.  */
! 		  if (occr->deleted_p)
! 		    {
! 		      gcc_assert (MAX_HOIST_DEPTH > 1);
! 		      continue;
  		    }
+ 		  gcc_assert (NONDEBUG_INSN_P (occr->insn));
+ 
+ 		  max_distance = expr->max_distance;
+ 		  if (max_distance > 0)
+ 		    /* Adjust MAX_DISTANCE to account for the fact that
+ 		       OCCR won't have to travel all of DOMINATED, but
+ 		       only part of it.  */
+ 		    max_distance += (bb_size[dominated->index]
+ 				     - to_bb_head[INSN_UID (occr->insn)]);
  
  		  /* Note if the expression would reach the dominated block
  		     unimpared if it was placed at the end of BB.
***************
*** 4464,4474 ****
  		     from a dominated block into BB.  */
  		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
  						 max_distance, bb_size))
! 		    hoistable++;
  		}
  
  	      /* If we found more than one hoistable occurrence of this
! 		 expression, then note it in the bitmap of expressions to
  		 hoist.  It makes no sense to hoist things which are computed
  		 in only one BB, and doing so tends to pessimize register
  		 allocation.  One could increase this value to try harder
--- 4508,4523 ----
  		     from a dominated block into BB.  */
  		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
  						 max_distance, bb_size))
! 		    {
! 		      hoistable++;
! 		      VEC_safe_push (occr_t, heap,
! 				     occrs_to_hoist, occr);
! 		      bitmap_set_bit (from_bbs, dominated->index);
! 		    }
  		}
  
  	      /* If we found more than one hoistable occurrence of this
! 		 expression, then note it in the vector of expressions to
  		 hoist.  It makes no sense to hoist things which are computed
  		 in only one BB, and doing so tends to pessimize register
  		 allocation.  One could increase this value to try harder
***************
*** 4479,4589 ****
  		 to nullify any benefit we get from code hoisting.  */
  	      if (hoistable > 1 && dbg_cnt (hoist_insn))
  		{
! 		  SET_BIT (hoist_exprs[bb->index], i);
! 		  found = 1;
  		}
! 	    }
! 	}
!       /* If we found nothing to hoist, then quit now.  */
!       if (! found)
!         {
! 	  VEC_free (basic_block, heap, domby);
! 	  continue;
! 	}
  
!       /* Loop over all the hoistable expressions.  */
!       for (i = 0; i < hoist_exprs[bb->index]->n_bits; i++)
! 	{
! 	  /* We want to insert the expression into BB only once, so
! 	     note when we've inserted it.  */
! 	  insn_inserted_p = 0;
  
! 	  /* These tests should be the same as the tests above.  */
! 	  if (TEST_BIT (hoist_exprs[bb->index], i))
! 	    {
! 	      /* We've found a potentially hoistable expression, now
! 		 we look at every block BB dominates to see if it
! 		 computes the expression.  */
! 	      for (j = 0; VEC_iterate (basic_block, domby, j, dominated); j++)
  		{
! 		  struct expr *expr = index_map[i];
! 		  struct occr *occr = NULL;
! 		  int max_distance;
! 
! 		  /* Ignore self dominance.  */
! 		  if (bb == dominated)
! 		    continue;
! 
! 		  /* We've found a dominated block, now see if it computes
! 		     the busy expression and whether or not moving that
! 		     expression to the "beginning" of that block is safe.  */
! 		  if (!TEST_BIT (antloc[dominated->index], i))
! 		    continue;
! 
! 		  max_distance = expr->max_distance;
! 		  if (max_distance > 0)
! 		    {
! 		      occr = find_occr_in_bb (expr->antic_occr, dominated);
! 		      gcc_assert (occr);
! 
! 		      gcc_assert (NONDEBUG_INSN_P (occr->insn));
! 
! 		      /* Adjust MAX_DISTANCE to account for the fact that
! 			 OCCR won't have to travel all of DOMINATED, but
! 			 only part of it.  */
! 		      max_distance += (bb_size[dominated->index]
! 				       - to_bb_head[INSN_UID (occr->insn)]);
! 		    }
! 
! 		  /* The expression is computed in the dominated block and
! 		     it would be safe to compute it at the start of the
! 		     dominated block.  Now we have to determine if the
! 		     expression would reach the dominated block if it was
! 		     placed at the end of BB.
! 		     Note: the fact that hoist_exprs has i-th bit set means
! 		     that /some/, not necesserilly all, occurences from
! 		     the dominated blocks can be hoisted to BB.  Here we check
! 		     if a specific occurence can be hoisted to BB.  */
! 		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
! 						 max_distance, bb_size))
  		    {
! 		      rtx insn;
! 		      rtx set;
! 
! 		      if (!occr)
! 			{
! 			  occr = find_occr_in_bb (expr->antic_occr, dominated);
! 			  gcc_assert (occr);
! 			}
! 
! 		      insn = occr->insn;
! 		      set = single_set (insn);
! 		      gcc_assert (set);
! 
! 		      /* Create a pseudo-reg to store the result of reaching
! 			 expressions into.  Get the mode for the new pseudo
! 			 from the mode of the original destination pseudo.
! 
! 			 It is important to use new pseudos whenever we
! 			 emit a set.  This will allow reload to use
! 			 rematerialization for such registers.  */
! 		      if (!insn_inserted_p)
! 			expr->reaching_reg
! 			  = gen_reg_rtx_and_attrs (SET_DEST (set));
! 
! 		      gcse_emit_move_after (expr->reaching_reg, SET_DEST (set), insn);
! 		      delete_insn (insn);
! 		      occr->deleted_p = 1;
! 		      changed = 1;
! 		      gcse_subst_count++;
! 
! 		      if (!insn_inserted_p)
! 			{
! 			  insert_insn_end_basic_block (index_map[i], bb);
! 			  insn_inserted_p = 1;
! 			}
  		    }
  		}
  	    }
  	}
        VEC_free (basic_block, heap, domby);
--- 4528,4597 ----
  		 to nullify any benefit we get from code hoisting.  */
  	      if (hoistable > 1 && dbg_cnt (hoist_insn))
  		{
! 		  /* If (hoistable != VEC_length), then there is
! 		     an occurence of EXPR in BB itself.  Don't waste
! 		     time looking for LCA in this case.  */
! 		  if ((unsigned) hoistable
! 		      == VEC_length (occr_t, occrs_to_hoist))
! 		    {
! 		      basic_block lca;
! 
! 		      lca = nearest_common_dominator_for_set (CDI_DOMINATORS,
! 							      from_bbs);
! 		      if (lca != bb)
! 			/* Punt, it's better to hoist these occurences to
! 			   LCA.  */
! 			VEC_free (occr_t, heap, occrs_to_hoist);
! 		    }
  		}
! 	      else
! 		/* Punt, no point hoisting a single occurence.  */
! 		VEC_free (occr_t, heap, occrs_to_hoist);
  
! 	      insn_inserted_p = 0;
  
! 	      /* Walk through occurences of I'th expressions we want
! 		 to hoist to BB and make the transformations.  */
! 	      for (j = 0;
! 		   VEC_iterate (occr_t, occrs_to_hoist, j, occr);
! 		   j++)
  		{
! 		  rtx insn;
! 		  rtx set;
! 
! 		  gcc_assert (!occr->deleted_p);
! 
! 		  insn = occr->insn;
! 		  set = single_set (insn);
! 		  gcc_assert (set);
! 
! 		  /* Create a pseudo-reg to store the result of reaching
! 		     expressions into.  Get the mode for the new pseudo
! 		     from the mode of the original destination pseudo.
! 
! 		     It is important to use new pseudos whenever we
! 		     emit a set.  This will allow reload to use
! 		     rematerialization for such registers.  */
! 		  if (!insn_inserted_p)
! 		    expr->reaching_reg
! 		      = gen_reg_rtx_and_attrs (SET_DEST (set));
! 
! 		  gcse_emit_move_after (expr->reaching_reg, SET_DEST (set),
! 					insn);
! 		  delete_insn (insn);
! 		  occr->deleted_p = 1;
! 		  changed = 1;
! 		  gcse_subst_count++;
! 
! 		  if (!insn_inserted_p)
  		    {
! 		      insert_insn_end_basic_block (expr, bb);
! 		      insn_inserted_p = 1;
  		    }
  		}
+ 
+ 	      VEC_free (occr_t, heap, occrs_to_hoist);
+ 	      bitmap_clear (from_bbs);
  	    }
  	}
        VEC_free (basic_block, heap, domby);
diff --git a/gcc/params.def b/gcc/params.def
index 551e8e2..22737a0 100644
*** a/gcc/params.def
--- b/gcc/params.def
***************
*** 240,245 ****
--- 240,253 ----
  	 "Cost at which GCSE optimizations will not constraint the distance an expression can travel",
  	 3, 0, 0)
  
+ /* How deep from a given basic block the dominator tree should be searched
+    for expressions to hoist to the block.  The value of 0 will avoid limiting
+    the search.  */
+ DEFPARAM(PARAM_MAX_HOIST_DEPTH,
+ 	 "max-hoist-depth",
+ 	 "Maximum depth of search in the dominator tree for expressions to hoist",
+ 	 30, 0, 0)
+ 
  /* This parameter limits the number of insns in a loop that will be unrolled,
     and by how much the loop is unrolled.
  
diff --git a/gcc/params.h b/gcc/params.h
index 174edc1..aa96c81 100644
*** a/gcc/params.h
--- b/gcc/params.h
***************
*** 129,134 ****
    PARAM_VALUE (PARAM_GCSE_COST_DISTANCE_RATIO)
  #define GCSE_UNRESTRICTED_COST \
    PARAM_VALUE (PARAM_GCSE_UNRESTRICTED_COST)
  #define MAX_UNROLLED_INSNS \
    PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS)
  #define MAX_SMS_LOOP_NUMBER \
- - 
--- 129,136 ----
    PARAM_VALUE (PARAM_GCSE_COST_DISTANCE_RATIO)
  #define GCSE_UNRESTRICTED_COST \
    PARAM_VALUE (PARAM_GCSE_UNRESTRICTED_COST)
+ #define MAX_HOIST_DEPTH \
+   PARAM_VALUE (PARAM_MAX_HOIST_DEPTH)
  #define MAX_UNROLLED_INSNS \
    PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS)
  #define MAX_SMS_LOOP_NUMBER \
1.6.2.4


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-09 20:18                           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-07-14 20:58                             ` Jeff Law
  2010-07-14 21:42                               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-07-14 20:58 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 07/09/10 14:17, Maxim Kuvyrkov wrote:
>
> The first of the attached patches replaces transpout with an 
> additional check in determining if an expression is anticipatable.
First, don't forget -p when generating your diffs.  -p includes the 
function's name in the diff output which helps the reviewer understand 
where a hunk was changed. Many reviewers prefer unidiffs (-u) as well.

Anyway, onward to the patches...


I'm not sure why you didn't just factor out the code from 
compute_pre_data and use it for both pre and hoisting.


Pull this fragment from compute_pre_data into its own function and call 
it from compute_pre_data and compute_code_hoist_data:

   /* Collect expressions which might trap.  */
   trapping_expr = sbitmap_alloc (expr_hash_table.n_elems);
   sbitmap_zero (trapping_expr);
   for (ui = 0; ui < expr_hash_table.size; ui++)
     {
       struct expr *e;
       for (e = expr_hash_table.table[ui]; e != NULL; e = e->next_same_hash)
         if (may_trap_p (e->expr))
           SET_BIT (trapping_expr, e->bitmap_index);
     }

Which gives you a bitmap of potentially trapping expressions.

Then pull this fragment from compute_pre_data into a function:


       FOR_EACH_EDGE (e, ei, bb->preds)
         if (e->flags & EDGE_ABNORMAL)
           {
             sbitmap_difference (antloc[bb->index], antloc[bb->index],
                                 trapping_expr);
             sbitmap_difference (transp[bb->index], transp[bb->index],
                                 trapping_expr);
             break;
           }

  compute_pre_data would then look like:

  compute_local_properties ( ... )
  sbitmap_vector_zero ( ... )
  compute_potentially_trapping_expressions ( ...)

  FOR_EACH_BB (bb)
     {
       /* If the current block is the destination of an abnormal edge, we
          kill all trapping expressions because we won't be able to properly
          place the instruction on the edge.  So make them neither
          anticipatable nor transparent.  This is fairly conservative.  */
       prune_antloc & transp (bb, antloc, transp, trapping_expr)

       sbitmap_a_or_b (ae_kill[bb->index], transp[bb->index],
                       comp[bb->index]);
       sbitmap_not (ae_kill[bb->index], ae_kill[bb->index]);
     }

And compute_code_hoist_data would look like:

   compute_local_properties ( ... )
   compute_potentially_trapping_expressions ( ... )

   FOR_EACH_BB (bb)
     prune_antloc & transp (bb, antloc, transp, trapping_expr);

Or something very close to that.


That avoids having two hunks of code that are supposed to do the same 
thing, but due to implementation differences actually behave differently.
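
A minimal sketch of the factoring suggested above, assuming toy stand-ins
for GCC's sbitmap and basic_block types.  Every name in it (exprset,
toy_block, toy_collect_trapping, toy_prune_block) is hypothetical and is
not taken from gcse.c; it only illustrates one helper that collects the
potentially trapping expressions and one that prunes them from blocks
with an incoming abnormal edge, so that PRE and hoisting can share both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an sbitmap: bit i set means that
   expression i is a member of the set.  */
typedef uint32_t exprset;

/* Hypothetical stand-in for the per-block data used by the pass.  */
struct toy_block
{
  bool has_abnormal_pred;  /* true if any incoming edge is abnormal */
  exprset antloc;          /* locally anticipatable expressions */
  exprset transp;          /* transparent expressions */
};

/* Collect the expressions that might trap.  MAY_TRAP[i] plays the
   role of may_trap_p for expression i.  */
exprset
toy_collect_trapping (const bool *may_trap, size_t n_exprs)
{
  exprset set = 0;

  assert (n_exprs <= 32);
  for (size_t i = 0; i < n_exprs; i++)
    if (may_trap[i])
      set |= (exprset) 1 << i;
  return set;
}

/* If BB is the destination of an abnormal edge, make the trapping
   expressions neither anticipatable nor transparent in it.  */
void
toy_prune_block (struct toy_block *bb, exprset trapping)
{
  if (!bb->has_abnormal_pred)
    return;
  bb->antloc &= ~trapping;
  bb->transp &= ~trapping;
}
```

Both the PRE and the hoisting driver would then call toy_collect_trapping
once and toy_prune_block for each block, which is the single shared code
path argued for above.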



>
> The second patch implements the LCA approach to avoid hoisting
> expressions too far up.  As a side effect of the implementation, it
> somewhat simplifies the control flow of hoist_code.
This looks pretty good.  I still see a reference to compute_transpout, 
which presumably will go away once we settle on the final form of the 
first patch.  This patch is OK.

>
> I really hope this is the last iteration on the one-line change this 
> problem initially was :).
:-)  It happens like this sometimes.

Jeff


* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-14 20:58                             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-07-14 21:42                               ` Maxim Kuvyrkov
  2010-07-15 16:06                                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-14 21:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

On 7/15/10 12:58 AM, Jeff Law wrote:
> /* Collect expressions which might trap. */
> trapping_expr = sbitmap_alloc (expr_hash_table.n_elems);
> sbitmap_zero (trapping_expr);
> for (ui = 0; ui < expr_hash_table.size; ui++)
> {
> struct expr *e;
> for (e = expr_hash_table.table[ui]; e != NULL; e = e->next_same_hash)
> if (may_trap_p (e->expr))
> SET_BIT (trapping_expr, e->bitmap_index);
> }
...
> FOR_EACH_EDGE (e, ei, bb->preds)
> if (e->flags & EDGE_ABNORMAL)
> {
> sbitmap_difference (antloc[bb->index], antloc[bb->index], trapping_expr);
> sbitmap_difference (transp[bb->index], transp[bb->index], trapping_expr);
> break;
> }
>
...
> compute_local_properties ( ... )
> compute_potentially_trapping_expressions ( ... )
>
> FOR_EACH_BB (bb)
> prune_antloc & transp (bb, antloc, transp, trapping_expr);

This will have the effect of disabling hoisting for all potentially 
trapping expressions, even if they don't result in abnormal control 
flow.  E.g., a memory reference will not be eligible for hoisting 
under these conditions.

[IIUC, it is OK to hoist a potentially trapping expression as long as it 
will trap on every path and provided non-call exceptions are disabled.]

> That avoids having two hunks of code that are supposed to do the same
> thing, but due to implementation differences actually behave differently.

PRE code for handling trapping expressions removes a considerably larger 
set of expressions from the optimization space, and that seems like the 
right thing to do for PRE.  Hoisting, on the other hand, can relax 
conditions on trapping expressions by compensating with the VBE 
requirement.  It seems to me that we shouldn't disable hoisting of all 
potentially trapping expressions, but only of those that have an 
abnormal edge sticking out of them.


>> The second patch implements the LCA approach to avoid hoisting
>> expressions too far up.  As a side effect of the implementation, it
>> somewhat simplifies the control flow of hoist_code.
> This looks pretty good. I still see a reference to compute_transpout,
> which presumably will go away once we settle on the final form of the
> first patch. This patch is OK.

Thanks!

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724


* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-14 21:42                               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-07-15 16:06                                 ` Jeff Law
  2010-07-15 19:22                                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-07-15 16:06 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 07/14/10 15:42, Maxim Kuvyrkov wrote:
> On 7/15/10 12:58 AM, Jeff Law wrote:
>> /* Collect expressions which might trap. */
>> trapping_expr = sbitmap_alloc (expr_hash_table.n_elems);
>> sbitmap_zero (trapping_expr);
>> for (ui = 0; ui < expr_hash_table.size; ui++)
>> {
>> struct expr *e;
>> for (e = expr_hash_table.table[ui]; e != NULL; e = e->next_same_hash)
>> if (may_trap_p (e->expr))
>> SET_BIT (trapping_expr, e->bitmap_index);
>> }
> ...
>> FOR_EACH_EDGE (e, ei, bb->preds)
>> if (e->flags & EDGE_ABNORMAL)
>> {
>> sbitmap_difference (antloc[bb->index], antloc[bb->index], 
>> trapping_expr);
>> sbitmap_difference (transp[bb->index], transp[bb->index], 
>> trapping_expr);
>> break;
>> }
>>
> ...
>> compute_local_properties ( ... )
>> compute_potentially_trapping_expressions ( ... )
>>
>> FOR_EACH_BB (bb)
>> prune_antloc & transp (bb, antloc, transp, trapping_expr);
>
> This will have the effect of disabling hoisting for all potentially 
> trapping expressions, even if they don't result in abnormal control 
> flow.  E.g., a memory reference will not be eligible for hoisting 
> under these conditions.
?  See the e->flags & EDGE_ABNORMAL test prior to removing elements of 
trapping_expr from antloc or transp.

> PRE code for handling trapping expressions removes a considerably 
> larger set of expressions from the optimization space, and that seems 
> like the right thing to do for PRE.  Hoisting, on the other hand, can 
> relax conditions on trapping expressions by compensating with the VBE 
> requirement.  It seems to me that we shouldn't disable hoisting of 
> all potentially trapping expressions, but only of those that have an 
> abnormal edge sticking out of them.
Partially correct.  If there aren't abnormal edges in the source blocks, 
then hoisting can relax based on the VBE requirement.  If there are 
abnormal edges, then hoisting can not relax because nothing adds the 
abnormal edges to the destination blocks.
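
For reference, the VBE (very busy expressions) requirement can be
sketched with the textbook backward dataflow equations: an expression is
very busy at the exit of a block if it is very busy at the entry of every
successor, and very busy at the entry of a block if the block computes it
or passes it through untouched.  The code below is only a sketch under
those assumptions, on a toy CFG with at most two successors per block;
toy_vbe_block and toy_compute_vbe are hypothetical names, and 32-bit
masks stand in for sbitmaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_SUCCS 2

/* Toy basic block.  Bit i of each mask stands for expression i.  */
struct toy_vbe_block
{
  int succs[TOY_MAX_SUCCS];  /* successor indices, -1 when unused */
  uint32_t antloc;           /* computed in the block before any kill */
  uint32_t transp;           /* operands not modified in the block */
  uint32_t vbein;            /* result: very busy at block entry */
  uint32_t vbeout;           /* result: very busy at block exit */
};

/* Iterate to a fixed point, starting from empty sets:
     vbeout(B) = intersection of vbein(S) over all successors S of B
     vbein(B)  = antloc(B) | (vbeout(B) & transp(B))  */
void
toy_compute_vbe (struct toy_vbe_block *blocks, int n_blocks)
{
  bool changed = true;

  assert (n_blocks >= 0);
  for (int i = 0; i < n_blocks; i++)
    blocks[i].vbein = blocks[i].vbeout = 0;

  while (changed)
    {
      changed = false;
      /* Visit blocks in reverse index order; any order reaches the
         same fixed point, this one just tends to converge sooner.  */
      for (int i = n_blocks - 1; i >= 0; i--)
        {
          struct toy_vbe_block *bb = &blocks[i];
          uint32_t out = 0;
          bool seen_succ = false;

          for (int s = 0; s < TOY_MAX_SUCCS; s++)
            if (bb->succs[s] >= 0)
              {
                uint32_t succ_in = blocks[bb->succs[s]].vbein;
                out = seen_succ ? (out & succ_in) : succ_in;
                seen_succ = true;
              }

          uint32_t in = bb->antloc | (out & bb->transp);
          if (in != bb->vbein || out != bb->vbeout)
            {
              bb->vbein = in;
              bb->vbeout = out;
              changed = true;
            }
        }
    }
}
```

An expression in vbeout of a block is evaluated on every path leaving
that block, which is the property that lets hoisting tolerate a
potentially trapping expression when no abnormal edges are involved.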


Jeff


* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-15 16:06                                 ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-07-15 19:22                                   ` Maxim Kuvyrkov
  2010-07-16 18:37                                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-15 19:22 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 373 bytes --]

On 7/15/10 8:06 PM, Jeff Law wrote:
...
> ? See the e->flags & EDGE_ABNORMAL test prior to removing elements of
> trapping_expr from antloc or transp.

I missed the point that we should avoid emitting anything at the ends 
of basic blocks with abnormal edges.

Is the attached patch OK?

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0002-Replace-transpout.ChangeLog --]
[-- Type: text/plain, Size: 363 bytes --]

2010-07-15  Jeff Law  <law@redhat.com>
	    Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (compute_transpout, transpout): Remove.
	(compute_pre_data): Move pruning of trapping expressions ...
	(prune_trapping_expressions): ... here.  New static function.
	(compute_code_hoist_data): Use it.
	(alloc_code_hoist_mem, free_code_hoist_mem, hoist_code): Update.

[-- Attachment #3: 0002-Replace-transpout.patch --]
[-- Type: text/plain, Size: 6415 bytes --]

From 41bd3914ab86d8154cb9fdc148559c459e29a816 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Thu, 15 Jul 2010 10:12:17 -0700
Subject: [PATCH 02/13] Replace transpout

---
 gcc/gcse.c |  103 ++++++++++++++++++-----------------------------------------
 1 files changed, 32 insertions(+), 71 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index e506d47..1d9c6c9 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -468,7 +468,6 @@ static void mark_oprs_set (rtx);
 static void alloc_cprop_mem (int, int);
 static void free_cprop_mem (void);
 static void compute_transp (const_rtx, int, sbitmap *, int);
-static void compute_transpout (void);
 static void compute_local_properties (sbitmap *, sbitmap *, sbitmap *,
 				      struct hash_table_d *);
 static void compute_cprop_data (void);
@@ -3172,11 +3171,6 @@ bypass_conditional_jumps (void)
 /* Nonzero for expressions that are transparent in the block.  */
 static sbitmap *transp;
 
-/* Nonzero for expressions that are transparent at the end of the block.
-   This is only zero for expressions killed by abnormal critical edge
-   created by a calls.  */
-static sbitmap *transpout;
-
 /* Nonzero for expressions that are computed (available) in the block.  */
 static sbitmap *comp;
 
@@ -3240,17 +3234,15 @@ free_pre_mem (void)
   pre_optimal = pre_redundant = pre_insert_map = pre_delete_map = NULL;
 }
 
-/* Top level routine to do the dataflow analysis needed by PRE.  */
+/* Remove potentially trapping expressions from anticipatable and transparent
+   sets of basic blocks that have incoming abnormal edge.  */
 
 static void
-compute_pre_data (void)
+prune_trapping_expressions (void)
 {
   sbitmap trapping_expr;
-  basic_block bb;
   unsigned int ui;
-
-  compute_local_properties (transp, comp, antloc, &expr_hash_table);
-  sbitmap_vector_zero (ae_kill, last_basic_block);
+  basic_block bb;
 
   /* Collect expressions which might trap.  */
   trapping_expr = sbitmap_alloc (expr_hash_table.n_elems);
@@ -3263,11 +3255,6 @@ compute_pre_data (void)
 	  SET_BIT (trapping_expr, e->bitmap_index);
     }
 
-  /* Compute ae_kill for each basic block using:
-
-     ~(TRANSP | COMP)
-  */
-
   FOR_EACH_BB (bb)
     {
       edge e;
@@ -3280,11 +3267,35 @@ compute_pre_data (void)
       FOR_EACH_EDGE (e, ei, bb->preds)
 	if (e->flags & EDGE_ABNORMAL)
 	  {
-	    sbitmap_difference (antloc[bb->index], antloc[bb->index], trapping_expr);
-	    sbitmap_difference (transp[bb->index], transp[bb->index], trapping_expr);
+	    sbitmap_difference (antloc[bb->index],
+				antloc[bb->index], trapping_expr);
+	    sbitmap_difference (transp[bb->index],
+				transp[bb->index], trapping_expr);
 	    break;
 	  }
+    }
+
+  sbitmap_free (trapping_expr);
+}
+
+/* Top level routine to do the dataflow analysis needed by PRE.  */
+
+static void
+compute_pre_data (void)
+{
+  basic_block bb;
+
+  compute_local_properties (transp, comp, antloc, &expr_hash_table);
+  prune_trapping_expressions ();
+  sbitmap_vector_zero (ae_kill, last_basic_block);
 
+  /* Compute ae_kill for each basic block using:
+
+     ~(TRANSP | COMP)
+  */
+
+  FOR_EACH_BB (bb)
+    {
       sbitmap_a_or_b (ae_kill[bb->index], transp[bb->index], comp[bb->index]);
       sbitmap_not (ae_kill[bb->index], ae_kill[bb->index]);
     }
@@ -3295,7 +3306,6 @@ compute_pre_data (void)
   antloc = NULL;
   sbitmap_vector_free (ae_kill);
   ae_kill = NULL;
-  sbitmap_free (trapping_expr);
 }
 \f
 /* PRE utilities */
@@ -4050,52 +4060,6 @@ add_label_notes (rtx x, rtx insn)
     }
 }
 
-/* Compute transparent outgoing information for each block.
-
-   An expression is transparent to an edge unless it is killed by
-   the edge itself.  This can only happen with abnormal control flow,
-   when the edge is traversed through a call.  This happens with
-   non-local labels and exceptions.
-
-   This would not be necessary if we split the edge.  While this is
-   normally impossible for abnormal critical edges, with some effort
-   it should be possible with exception handling, since we still have
-   control over which handler should be invoked.  But due to increased
-   EH table sizes, this may not be worthwhile.  */
-
-static void
-compute_transpout (void)
-{
-  basic_block bb;
-  unsigned int i;
-  struct expr *expr;
-
-  sbitmap_vector_ones (transpout, last_basic_block);
-
-  FOR_EACH_BB (bb)
-    {
-      /* Note that flow inserted a nop at the end of basic blocks that
-	 end in call instructions for reasons other than abnormal
-	 control flow.  */
-      if (! CALL_P (BB_END (bb)))
-	continue;
-
-      for (i = 0; i < expr_hash_table.size; i++)
-	for (expr = expr_hash_table.table[i]; expr ; expr = expr->next_same_hash)
-	  if (MEM_P (expr->expr))
-	    {
-	      if (GET_CODE (XEXP (expr->expr, 0)) == SYMBOL_REF
-		  && CONSTANT_POOL_ADDRESS_P (XEXP (expr->expr, 0)))
-		continue;
-
-	      /* ??? Optimally, we would use interprocedural alias
-		 analysis to determine if this mem is actually killed
-		 by this call.  */
-	      RESET_BIT (transpout[bb->index], expr->bitmap_index);
-	    }
-    }
-}
-
 /* Code Hoisting variables and subroutines.  */
 
 /* Very busy expressions.  */
@@ -4124,7 +4088,6 @@ alloc_code_hoist_mem (int n_blocks, int n_exprs)
   hoist_vbein = sbitmap_vector_alloc (n_blocks, n_exprs);
   hoist_vbeout = sbitmap_vector_alloc (n_blocks, n_exprs);
   hoist_exprs = sbitmap_vector_alloc (n_blocks, n_exprs);
-  transpout = sbitmap_vector_alloc (n_blocks, n_exprs);
 }
 
 /* Free vars used for code hoisting analysis.  */
@@ -4139,7 +4102,6 @@ free_code_hoist_mem (void)
   sbitmap_vector_free (hoist_vbein);
   sbitmap_vector_free (hoist_vbeout);
   sbitmap_vector_free (hoist_exprs);
-  sbitmap_vector_free (transpout);
 
   free_dominance_info (CDI_DOMINATORS);
 }
@@ -4192,7 +4154,7 @@ static void
 compute_code_hoist_data (void)
 {
   compute_local_properties (transp, comp, antloc, &expr_hash_table);
-  compute_transpout ();
+  prune_trapping_expressions ();
   compute_code_hoist_vbeinout ();
   calculate_dominance_info (CDI_DOMINATORS);
   if (dump_file)
@@ -4294,8 +4256,7 @@ hoist_code (void)
 	{
 	  int hoistable = 0;
 
-	  if (TEST_BIT (hoist_vbeout[bb->index], i)
-	      && TEST_BIT (transpout[bb->index], i))
+	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
 	      /* We've found a potentially hoistable expression, now
 		 we look at every block BB dominates to see if it
-- 
1.6.2.4



* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-15 19:22                                   ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-07-16 18:37                                     ` Jeff Law
  2010-07-17 16:41                                       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-07-16 18:37 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 07/15/10 13:22, Maxim Kuvyrkov wrote:
> On 7/15/10 8:06 PM, Jeff Law wrote:
> ...
>> ? See the e->flags & EDGE_ABNORMAL test prior to removing elements of
>> trapping_expr from antloc or transp.
>
> I missed the point that we should avoid emitting anything at the ends 
> of basic blocks with abnormal edges.
>
> Is the attached patch OK?
Yes as long as it passes the usual bootstrap & regression test.

Thanks for your patience,
Jeff


* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-16 18:37                                     ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-07-17 16:41                                       ` Maxim Kuvyrkov
  2010-07-19 18:08                                         ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-17 16:41 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]

On 7/16/10 10:37 PM, Jeff Law wrote:
> On 07/15/10 13:22, Maxim Kuvyrkov wrote:
>> On 7/15/10 8:06 PM, Jeff Law wrote:
>> ...
>>> ? See the e->flags & EDGE_ABNORMAL test prior to removing elements of
>>> trapping_expr from antloc or transp.
>>
>> I missed the point that we should avoid emitting anything at the ends
>> of basic blocks with abnormal edges.
>>
>> Is the attached patch OK?
> Yes as long as it passes the usual bootstrap & regression test.

Well, after the regtest I now know the exact purpose of transpout.  It 
is to avoid moving memory references across calls that can clobber 
memory.  Without the checks done in compute_transpout a memory reference 
can be moved along an abnormal edge and, when that happens, it gets 
placed /before/ the call.

Similarly, we want to avoid hoisting potentially trapping expressions 
along abnormal edges as that would allow the trap to happen prematurely.
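
A source-level illustration of the hazard, as a sketch only: the pass
works on RTL, and toy_shared, toy_clobber and the two reader functions
are hypothetical names.  Moving the read of a memory location from after
a call that writes it to before that call changes the value that is
observed.

```c
#include <assert.h>

/* Hypothetical memory location that the call below may clobber.  */
int toy_shared = 1;

/* Stands in for a call that writes to memory.  */
void
toy_clobber (void)
{
  toy_shared = 42;
}

/* Original order: the memory reference is evaluated after the call.  */
int
toy_read_after_call (void)
{
  toy_clobber ();
  return toy_shared;
}

/* What results if the memory reference is placed before the call:
   the value read is the stale one.  */
int
toy_read_hoisted_before_call (void)
{
  int tmp = toy_shared;

  assert (tmp == toy_shared);
  toy_clobber ();
  return tmp;
}
```

With toy_shared reset to 1 before each call, the first function returns
42 and the second returns 1.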

Jeff, I'm sorry this is taking so long; would you please review this 
incarnation of the patch?

Bootstrapped and tested on i686-linux (biarch) and arm-linux.

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0002-Replace-transpout.ChangeLog --]
[-- Type: text/plain, Size: 388 bytes --]

2010-07-15  Jeff Law  <law@redhat.com>
	    Maxim Kuvyrkov  <maxim@codesourcery.com>

	* gcse.c (compute_transpout, transpout): Remove, move logic
	to prune_expressions.
	(compute_pre_data): Move pruning of trapping expressions ...
	(prune_expressions): ... here.  New static function.
	(compute_code_hoist_data): Use it.
	(alloc_code_hoist_mem, free_code_hoist_mem, hoist_code): Update.

[-- Attachment #3: 0002-Replace-transpout.patch --]
[-- Type: text/plain, Size: 8485 bytes --]

From e17982a7ba86c9dd1023704a57317af1cff41787 Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Thu, 15 Jul 2010 10:12:17 -0700
Subject: [PATCH 02/12] Replace transpout

---
 gcc/gcse.c |  156 +++++++++++++++++++++++++++++-------------------------------
 1 files changed, 76 insertions(+), 80 deletions(-)

diff --git a/gcc/gcse.c b/gcc/gcse.c
index e506d47..0d80fb2 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -468,7 +468,6 @@ static void mark_oprs_set (rtx);
 static void alloc_cprop_mem (int, int);
 static void free_cprop_mem (void);
 static void compute_transp (const_rtx, int, sbitmap *, int);
-static void compute_transpout (void);
 static void compute_local_properties (sbitmap *, sbitmap *, sbitmap *,
 				      struct hash_table_d *);
 static void compute_cprop_data (void);
@@ -3172,11 +3171,6 @@ bypass_conditional_jumps (void)
 /* Nonzero for expressions that are transparent in the block.  */
 static sbitmap *transp;
 
-/* Nonzero for expressions that are transparent at the end of the block.
-   This is only zero for expressions killed by abnormal critical edge
-   created by a calls.  */
-static sbitmap *transpout;
-
 /* Nonzero for expressions that are computed (available) in the block.  */
 static sbitmap *comp;
 
@@ -3240,33 +3234,59 @@ free_pre_mem (void)
   pre_optimal = pre_redundant = pre_insert_map = pre_delete_map = NULL;
 }
 
-/* Top level routine to do the dataflow analysis needed by PRE.  */
+/* Remove certain expressions from anticipatable and transparent
+   sets of basic blocks that have incoming abnormal edge.
+   For PRE remove potentially trapping expressions to avoid placing
+   them on abnormal edges.  For HOIST remove memory references that
+   can be clobbered by calls.  */
 
 static void
-compute_pre_data (void)
+prune_expressions (bool pre_p)
 {
-  sbitmap trapping_expr;
-  basic_block bb;
+  sbitmap prune_exprs;
   unsigned int ui;
+  basic_block bb;
 
-  compute_local_properties (transp, comp, antloc, &expr_hash_table);
-  sbitmap_vector_zero (ae_kill, last_basic_block);
-
-  /* Collect expressions which might trap.  */
-  trapping_expr = sbitmap_alloc (expr_hash_table.n_elems);
-  sbitmap_zero (trapping_expr);
+  prune_exprs = sbitmap_alloc (expr_hash_table.n_elems);
+  sbitmap_zero (prune_exprs);
   for (ui = 0; ui < expr_hash_table.size; ui++)
     {
       struct expr *e;
       for (e = expr_hash_table.table[ui]; e != NULL; e = e->next_same_hash)
-	if (may_trap_p (e->expr))
-	  SET_BIT (trapping_expr, e->bitmap_index);
-    }
+	{
+	  /* Note potentially trapping expressions.  */
+	  if (may_trap_p (e->expr))
+	    {
+	      SET_BIT (prune_exprs, e->bitmap_index);
+	      continue;
+	    }
 
-  /* Compute ae_kill for each basic block using:
+	  if (!pre_p && MEM_P (e->expr))
+	    /* Note memory references that can be clobbered by a call.
+	       We do not split abnormal edges in HOIST, so would
+	       a memory reference get hoisted along an abnormal edge,
+	       it would be placed /before/ the call.  Therefore, only
+	       constant memory references can be hoisted along abnormal
+	       edges.  */
+	    {
+	      if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF
+		  && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0)))
+		continue;
 
-     ~(TRANSP | COMP)
-  */
+	      if (MEM_READONLY_P (e->expr)
+		  && !MEM_VOLATILE_P (e->expr)
+		  && MEM_NOTRAP_P (e->expr))
+		/* Constant memory reference, e.g., a PIC address.  */
+		continue;
+
+	      /* ??? Optimally, we would use interprocedural alias
+		 analysis to determine if this mem is actually killed
+		 by this call.  */
+
+	      SET_BIT (prune_exprs, e->bitmap_index);
+	    }
+	}
+    }
 
   FOR_EACH_BB (bb)
     {
@@ -3274,17 +3294,43 @@ compute_pre_data (void)
       edge_iterator ei;
 
       /* If the current block is the destination of an abnormal edge, we
-	 kill all trapping expressions because we won't be able to properly
-	 place the instruction on the edge.  So make them neither
-	 anticipatable nor transparent.  This is fairly conservative.  */
+	 kill all trapping (for PRE) or memory (for HOIST) expressions
+	 because we won't be able to properly place the instruction on
+	 the edge.  So make them neither anticipatable nor transparent.
+	 This is fairly conservative.  */
       FOR_EACH_EDGE (e, ei, bb->preds)
-	if (e->flags & EDGE_ABNORMAL)
+	if ((e->flags & EDGE_ABNORMAL)
+	    && (pre_p || CALL_P (BB_END (e->src))))
 	  {
-	    sbitmap_difference (antloc[bb->index], antloc[bb->index], trapping_expr);
-	    sbitmap_difference (transp[bb->index], transp[bb->index], trapping_expr);
+	    sbitmap_difference (antloc[bb->index],
+				antloc[bb->index], prune_exprs);
+	    sbitmap_difference (transp[bb->index],
+				transp[bb->index], prune_exprs);
 	    break;
 	  }
+    }
+
+  sbitmap_free (prune_exprs);
+}
+
+/* Top level routine to do the dataflow analysis needed by PRE.  */
 
+static void
+compute_pre_data (void)
+{
+  basic_block bb;
+
+  compute_local_properties (transp, comp, antloc, &expr_hash_table);
+  prune_expressions (true);
+  sbitmap_vector_zero (ae_kill, last_basic_block);
+
+  /* Compute ae_kill for each basic block using:
+
+     ~(TRANSP | COMP)
+  */
+
+  FOR_EACH_BB (bb)
+    {
       sbitmap_a_or_b (ae_kill[bb->index], transp[bb->index], comp[bb->index]);
       sbitmap_not (ae_kill[bb->index], ae_kill[bb->index]);
     }
@@ -3295,7 +3341,6 @@ compute_pre_data (void)
   antloc = NULL;
   sbitmap_vector_free (ae_kill);
   ae_kill = NULL;
-  sbitmap_free (trapping_expr);
 }
 \f
 /* PRE utilities */
@@ -4050,52 +4095,6 @@ add_label_notes (rtx x, rtx insn)
     }
 }
 
-/* Compute transparent outgoing information for each block.
-
-   An expression is transparent to an edge unless it is killed by
-   the edge itself.  This can only happen with abnormal control flow,
-   when the edge is traversed through a call.  This happens with
-   non-local labels and exceptions.
-
-   This would not be necessary if we split the edge.  While this is
-   normally impossible for abnormal critical edges, with some effort
-   it should be possible with exception handling, since we still have
-   control over which handler should be invoked.  But due to increased
-   EH table sizes, this may not be worthwhile.  */
-
-static void
-compute_transpout (void)
-{
-  basic_block bb;
-  unsigned int i;
-  struct expr *expr;
-
-  sbitmap_vector_ones (transpout, last_basic_block);
-
-  FOR_EACH_BB (bb)
-    {
-      /* Note that flow inserted a nop at the end of basic blocks that
-	 end in call instructions for reasons other than abnormal
-	 control flow.  */
-      if (! CALL_P (BB_END (bb)))
-	continue;
-
-      for (i = 0; i < expr_hash_table.size; i++)
-	for (expr = expr_hash_table.table[i]; expr ; expr = expr->next_same_hash)
-	  if (MEM_P (expr->expr))
-	    {
-	      if (GET_CODE (XEXP (expr->expr, 0)) == SYMBOL_REF
-		  && CONSTANT_POOL_ADDRESS_P (XEXP (expr->expr, 0)))
-		continue;
-
-	      /* ??? Optimally, we would use interprocedural alias
-		 analysis to determine if this mem is actually killed
-		 by this call.  */
-	      RESET_BIT (transpout[bb->index], expr->bitmap_index);
-	    }
-    }
-}
-
 /* Code Hoisting variables and subroutines.  */
 
 /* Very busy expressions.  */
@@ -4124,7 +4123,6 @@ alloc_code_hoist_mem (int n_blocks, int n_exprs)
   hoist_vbein = sbitmap_vector_alloc (n_blocks, n_exprs);
   hoist_vbeout = sbitmap_vector_alloc (n_blocks, n_exprs);
   hoist_exprs = sbitmap_vector_alloc (n_blocks, n_exprs);
-  transpout = sbitmap_vector_alloc (n_blocks, n_exprs);
 }
 
 /* Free vars used for code hoisting analysis.  */
@@ -4139,7 +4137,6 @@ free_code_hoist_mem (void)
   sbitmap_vector_free (hoist_vbein);
   sbitmap_vector_free (hoist_vbeout);
   sbitmap_vector_free (hoist_exprs);
-  sbitmap_vector_free (transpout);
 
   free_dominance_info (CDI_DOMINATORS);
 }
@@ -4192,7 +4189,7 @@ static void
 compute_code_hoist_data (void)
 {
   compute_local_properties (transp, comp, antloc, &expr_hash_table);
-  compute_transpout ();
+  prune_expressions (false);
   compute_code_hoist_vbeinout ();
   calculate_dominance_info (CDI_DOMINATORS);
   if (dump_file)
@@ -4294,8 +4291,7 @@ hoist_code (void)
 	{
 	  int hoistable = 0;
 
-	  if (TEST_BIT (hoist_vbeout[bb->index], i)
-	      && TEST_BIT (transpout[bb->index], i))
+	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
 	      /* We've found a potentially hoistable expression, now
 		 we look at every block BB dominates to see if it
-- 
1.6.2.4



* Tune hoisting for ARM
  2010-06-23 21:20 ` ARM improvements for GCSE Maxim Kuvyrkov
                     ` (2 preceding siblings ...)
  2010-06-23 21:35   ` Wrap calculation of PIC address into a single instruction Maxim Kuvyrkov
@ 2010-07-17 16:52   ` Maxim Kuvyrkov
  2010-07-19  8:09     ` Richard Earnshaw
  3 siblings, 1 reply; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-17 16:52 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 335 bytes --]

The attached patch tunes the code hoisting parameter 
gcse-unrestricted-cost for ARM.  Using a slightly lower cost cutoff for 
ARM PIC provides a small (~0.3%) but measurable size reduction.

Tuning was done on SPEC2K benchmarks.
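
The shape of the override can be sketched as follows.  This is a toy
model only: toy_param and toy_tune_unrestricted_cost are hypothetical
names, not GCC's params interface.  The point it shows is that the
backend lowers the cutoff to 2 only for PIC code, and only when the user
has not set the parameter explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one tunable parameter.  */
struct toy_param
{
  int value;         /* current value of the parameter */
  bool set_by_user;  /* true if given explicitly on the command line */
};

/* Lower the unrestricted-hoisting cost cutoff to 2 (the cost of a PIC
   address calculation in the patch) unless the user chose a value.  */
void
toy_tune_unrestricted_cost (struct toy_param *cost, bool flag_pic)
{
  assert (cost != NULL);
  if (!cost->set_by_user && flag_pic)
    cost->value = 2;
}
```

Checking set_by_user first keeps an explicit --param setting
authoritative, so the backend default never silently overrides it.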

OK to check in?

Thank you,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724

[-- Attachment #2: 0012-ARM-tuning-parameters.ChangeLog --]
[-- Type: text/plain, Size: 199 bytes --]

2010-07-15  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/arm/arm.c (params.h): Include.
	(arm_override_options): Tune gcse-unrestricted-cost.
	* config/arm/t-arm (arm.o): Define dependencies.

[-- Attachment #3: 0012-ARM-tuning-parameters.patch --]
[-- Type: text/plain, Size: 2187 bytes --]

From b923a9faac6b18e30d86c8bf66a26a493a08e88f Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim@codesourcery.com>
Date: Tue, 29 Jun 2010 07:07:21 -0700
Subject: [PATCH 12/12] ARM tuning parameters

---
 gcc/config/arm/arm.c |    9 +++++++++
 gcc/config/arm/t-arm |    9 +++++++++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9f00416..b5cc3ed 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -56,6 +56,7 @@
 #include "df.h"
 #include "intl.h"
 #include "libfuncs.h"
+#include "params.h"
 
 /* Forward definitions of types.  */
 typedef struct minipool_node    Mnode;
@@ -1872,6 +1873,14 @@ arm_override_options (void)
       flag_reorder_blocks = 1;
     }
 
+  if (!PARAM_SET_P (PARAM_GCSE_UNRESTRICTED_COST)
+      && flag_pic)
+    /* Hoisting PIC address calculations more aggressively provides a small,
+       but measurable, size reduction for PIC code.  Therefore, we decrease
+       the bar for unrestricted expression hoisting to the cost of PIC address
+       calculation, which is 2 instructions.  */
+    set_param_value ("gcse-unrestricted-cost", 2);
+
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
 }
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index de2bbc4..4879211 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -45,6 +45,15 @@ $(srcdir)/config/arm/arm-tune.md: $(srcdir)/config/arm/gentune.sh \
 		$(srcdir)/config/arm/arm-cores.def > \
 		$(srcdir)/config/arm/arm-tune.md
 
+arm.o: $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+  $(RTL_H) $(TREE_H) $(OBSTACK_H) $(REGS_H) hard-reg-set.h \
+  insn-config.h conditions.h output.h \
+  $(INSN_ATTR_H) $(FLAGS_H) reload.h $(FUNCTION_H) \
+  $(EXPR_H) $(OPTABS_H) toplev.h $(RECOG_H) $(CGRAPH_H) \
+  $(GGC_H) except.h $(C_PRAGMA_H) $(INTEGRATE_H) $(TM_P_H) \
+  $(TARGET_H) $(TARGET_DEF_H) debug.h langhooks.h $(DF_H) \
+  intl.h libfuncs.h $(PARAMS_H)
+
 arm-c.o: $(srcdir)/config/arm/arm-c.c $(CONFIG_H) $(SYSTEM_H) \
     coretypes.h $(TM_H) $(TREE_H) output.h $(C_COMMON_H)
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-- 
1.6.2.4



* Re: Tune hoisting for ARM
  2010-07-17 16:52   ` Tune hoisting for ARM Maxim Kuvyrkov
@ 2010-07-19  8:09     ` Richard Earnshaw
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Earnshaw @ 2010-07-19  8:09 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches

On Sat, 2010-07-17 at 20:52 +0400, Maxim Kuvyrkov wrote:
> The attached patch tunes the code hoisting parameter 
> gcse-unrestricted-cost for ARM.  Using a slightly lower cost cutoff for 
> ARM PIC provides a small (~0.3%) but measurable size reduction.
> 
> Tuning was done on SPEC2K benchmarks.
> 
> OK to check in?

This is OK

R.



* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-17 16:41                                       ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-07-19 18:08                                         ` Jeff Law
  2010-07-19 18:39                                           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-07-19 18:08 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

On 07/17/10 10:41, Maxim Kuvyrkov wrote:
> On 7/16/10 10:37 PM, Jeff Law wrote:
>> On 07/15/10 13:22, Maxim Kuvyrkov wrote:
>>> On 7/15/10 8:06 PM, Jeff Law wrote:
>>> ...
>>>> ? See the e->flags & EDGE_ABNORMAL test prior to removing elements of
>>>> trapping_expr from antloc or transp.
>>>
>>> I missed the point that we should avoid emitting anything at the ends
>>> of basic blocks with abnormal edges.
>>>
>>> Is the attached patch OK?
>> Yes as long as it passes the usual bootstrap & regression test.
>
> Well, after the regtest I now know the exact purpose of transpout.  It 
> is to avoid moving memory references across calls that can clobber 
> memory.  Without the checks done in compute_transpout a memory 
> reference can be moved along an abnormal edge and, when that happens, 
> it gets placed /before/ the call.
Placement before the call is intentional.  The CALL_INSN (if it has 
abnormal edges) has to be considered to change control flow.  If the 
hoisted insn is placed after the CALL_INSN, the hoisted insn may not be 
executed if the CALL_INSN throws (for example).  
insert_insn_end_basic_block contains the logic to determine where in a 
block to place the new insn.

I *think* PRE avoids this by only inserting in a block where it knows 
the inserted insn is safe to place anywhere within the block.  In the 
case of a MEM, we must have already determined that the block (and thus 
any CALL_INSN within the block)  doesn't clobber the MEM.  Hoisting is a 
little different in that it only verifies that it should be safe to 
place the MEM at the end of the block.

Just so I'm certain I understand the issue.  Is the problem that the 
CALL_INSN is clobbering the value in the MEM, which normally wouldn't be 
a problem except that we can (in certain cases) place the MEM before the 
CALL_INSN.  Right?  One could argue this is a failing of 
hoist_expr_reaches_here_p which fails to take the corner cases of 
inserting at the end of a block into account.
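
To make the failure mode concrete, here is a minimal standalone sketch in plain C rather than RTL.  It is only a toy model: struct toy_mem, toy_call, load_after_call and load_before_call are hypothetical names that do not exist in gcse.c, and the "call" simply overwrites the location to stand in for a CALL_INSN that clobbers a non-constant MEM.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a non-constant memory location (hypothetical).  */
struct toy_mem { int value; };

/* Models a CALL_INSN that clobbers the location.  */
void toy_call(struct toy_mem *m)
{
  assert(m != NULL);
  m->value = 42;
}

/* Original program order: the MEM is read after the call returns.  */
int load_after_call(struct toy_mem *m)
{
  assert(m != NULL);
  toy_call(m);
  return m->value;
}

/* Hoisted order: the read was inserted at the "end" of the block, which
   for a block ending in a call with abnormal edges means before the
   call, so it observes the value from before the clobber.  */
int load_before_call(struct toy_mem *m)
{
  assert(m != NULL);
  int hoisted = m->value;
  toy_call(m);
  return hoisted;
}
```

Starting from the same initial value, the two functions return different results, which is the miscompilation that pruning such MEMs is meant to prevent.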

Looking at insert_insn_end_basic_block we have another class of 
problems.  Say the target block ends with a set & jump insn.  The 
destination of the set might clobber a value needed by an insn hoisted 
to the end of the block.  What a mess we've stumbled into.

While logically it makes more sense to fix these issues in 
hoist_expr_reaches_here_p, I'm not sure I want to force you to open that 
can-o-worms right now.

> Similarly, we want to avoid hoisting potentially trapping expressions 
> along abnormal edges as that would allow the trap to happen prematurely.
Agreed.


>
> Jeff, I'm sorry this is taking so long; would you please review this 
> incarnation of the patch.
No worries.  Sometimes what looks like a simple change turns into a 
major hairball.  It's just part of the development process.
>
> Bootstrapped and tested on i686-linux (biarch) and arm-linux.
Looks pretty good.
> +      if (!pre_p && MEM_P (e->expr))
> +        /* Note memory references that can be clobbered by a call.
> +           We do not split abnormal edges in HOIST, so would
> +           a memory reference get hoisted along an abnormal edge,
> +           it would be placed /before/ the call.  Therefore, only
> +           constant memory references can be hoisted along abnormal
> +           edges.  */
> +        {
> +          if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF
> +              && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0)))
> +            continue;

This is the only part I'm struggling with.  It looks like you're trying 
to avoid setting prune_exprs if the MEM's address has certain properties 
(such as a reference to static memory).   However, if my analysis of the 
problem is correct, that's not sufficient to solve the problem.  Plus 
there are probably better ways to detect static memory references.

jeff



* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-19 18:08                                         ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-07-19 18:39                                           ` Maxim Kuvyrkov
  2010-07-23  8:36                                             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  2010-07-27 18:41                                             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  0 siblings, 2 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-19 18:39 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

On 7/19/10 10:08 PM, Jeff Law wrote:
> On 07/17/10 10:41, Maxim Kuvyrkov wrote:
>> On 7/16/10 10:37 PM, Jeff Law wrote:
>>> On 07/15/10 13:22, Maxim Kuvyrkov wrote:
>>>> On 7/15/10 8:06 PM, Jeff Law wrote:
>>>> ...
>>>>> ? See the e->flags & EDGE_ABNORMAL test prior to removing elements of
>>>>> trapping_expr from antloc or transp.
>>>>
>>>> I missed the point that we should avoid emitting anything at the ends
>>>> of basic blocks with abnormal edges.
>>>>
>>>> Is the attached patch OK?
>>> Yes as long as it passes the usual bootstrap & regression test.
>>
>> Well, after the regtest I now know the exact purpose of transpout. It
>> is to avoid moving memory references across calls that can clobber
>> memory. Without the checks done in compute_transpout a memory
>> reference can be moved along an abnormal edge and, when that happens,
>> it gets placed /before/ the call.
> Placement before the call is intentional.

Certainly.

> The CALL_INSN (if it has
> abnormal edges) has to be considered to change control flow. If the
> hoisted insn is placed after the CALL_INSN, the hoisted insn may not be
> executed if the CALL_INSN throws (for example).
> insert_insn_end_basic_block contains the logic to determine where in a
> block to place the new insn.
>
> I *think* PRE avoids this by only inserting in a block where it knows
> the inserted insn is safe to place anywhere within the block. In the
> case of a MEM, we must have already determined that the block (and thus
> any CALL_INSN within the block) doesn't clobber the MEM. Hoisting is a
> little different in that it only verifies that it should be safe to
> place the MEM at the end of block.

Agreed.

>
> Just so I'm certain I understand the issue. Is the problem that the
> CALL_INSN is clobbering the value in the MEM, which normally wouldn't be
> a problem except that we can (in certain cases) place the MEM before the
> CALL_INSN. Right?

Correct.

> One could argue this is a failing of
> hoist_expr_reaches_here_p which fails to take the corner cases of
> inserting at the end of a block into account.

While that's technically correct, I see hoist_expr_reaches_here_p as 
designed to perform its job using antloc, transp, and other data sets, 
i.e., without examining the instruction stream.  It seems simpler to 
handle corner cases like this one by cleaning up the data sets earlier 
in the pass.

>
> Looking at insert_insn_end_basic_block we have another class of
> problems. Say the target block ends with a set & jump insn. The
> destination of the set might clobber a value needed by an insn hoisted
> to the end of the block. What a mess we've stumbled into.

In theory, yes, it may happen that set and/or jump insns modify 
memory or have other interesting side effects.  In practice, we don't 
seem to have this problem with any of the ports.

>> + if (!pre_p && MEM_P (e->expr))
>> + /* Note memory references that can be clobbered by a call.
>> + We do not split abnormal edges in HOIST, so would
>> + a memory reference get hoisted along an abnormal edge,
>> + it would be placed /before/ the call. Therefore, only
>> + constant memory references can be hoisted along abnormal
>> + edges. */
>> + {
>> + if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF
>> + && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0)))
>> + continue;
>
> This is the only part I'm struggling with. It looks like you're trying
> to avoid setting prune_exprs if the MEM's address has certain properties
> (such as a reference to static memory).  However, if my analysis of the
> problem is correct, that's not sufficient to solve the problem.

This code comes straight from compute_transpout.

Why do you think these checks aren't sufficient?  A memory reference to 
a constant pool is, well, a constant.

It may be that the second check

	      if (MEM_READONLY_P (e->expr)
		  && !MEM_VOLATILE_P (e->expr)
		  && MEM_NOTRAP_P (e->expr))
		/* Constant memory reference, e.g., a PIC address.  */
		continue;

implies CONSTANT_POOL_ADDRESS_P, but I prefer to err on the side of 
caution and have both checks in place.
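
As a rough illustration of how the two checks combine, here is a minimal standalone C model.  It is not gcse.c code: struct toy_mem_expr and toy_mem_is_constant are hypothetical names, and each boolean field merely stands in for the result of the RTL predicate named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of a MEM expression.  */
struct toy_mem_expr {
  bool addr_is_symbol_ref;  /* GET_CODE (XEXP (x, 0)) == SYMBOL_REF */
  bool addr_in_const_pool;  /* CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)) */
  bool readonly;            /* MEM_READONLY_P (x) */
  bool is_volatile;         /* MEM_VOLATILE_P (x) */
  bool notrap;              /* MEM_NOTRAP_P (x) */
};

/* Return true if either check classifies the MEM as constant, i.e. a
   call cannot clobber it and it need not be pruned.  */
bool toy_mem_is_constant(const struct toy_mem_expr *x)
{
  assert(x != NULL);

  /* First check: the address is a constant pool symbol.  */
  if (x->addr_is_symbol_ref && x->addr_in_const_pool)
    return true;

  /* Second check: read-only, non-volatile, non-trapping reference.  */
  if (x->readonly && !x->is_volatile && x->notrap)
    return true;

  return false;
}
```

Keeping both tests means a MEM is exempted when either one holds, so the exemption can only shrink if one of the tests is later dropped.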

Thanks,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724


* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-19 18:39                                           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-07-23  8:36                                             ` Maxim Kuvyrkov
  2010-07-27 18:41                                             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
  1 sibling, 0 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-23  8:36 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

On 7/19/10 10:39 PM, Maxim Kuvyrkov wrote:
> On 7/19/10 10:08 PM, Jeff Law wrote:

Ping.

>>> + if (!pre_p && MEM_P (e->expr))
>>> + /* Note memory references that can be clobbered by a call.
>>> + We do not split abnormal edges in HOIST, so would
>>> + a memory reference get hoisted along an abnormal edge,
>>> + it would be placed /before/ the call. Therefore, only
>>> + constant memory references can be hoisted along abnormal
>>> + edges. */
>>> + {
>>> + if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF
>>> + && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0)))
>>> + continue;
>>
>> This is the only part I'm struggling with. It looks like you're trying
>> to avoid setting prune_exprs if the MEM's address has certain properties
>> (such as a reference to static memory). However, if my analysis of the
>> problem is correct, that's not sufficient to solve the problem.
>
> This code comes straight from compute_transpout.
>
> Why do you think these checks aren't sufficient? A memory reference to a
> constant pool is, well, a constant.
>
> It may be that the second check
>
> if (MEM_READONLY_P (e->expr)
> && !MEM_VOLATILE_P (e->expr)
> && MEM_NOTRAP_P (e->expr))
> /* Constant memory reference, e.g., a PIC address. */
> continue;
>
> implies CONSTANT_POOL_ADDRESS_P, but I prefer to err on the side of
> caution and have both checks in place.


-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724


* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-19 18:39                                           ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  2010-07-23  8:36                                             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
@ 2010-07-27 18:41                                             ` Jeff Law
  2010-07-27 19:02                                               ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Maxim Kuvyrkov
  1 sibling, 1 reply; 94+ messages in thread
From: Jeff Law @ 2010-07-27 18:41 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Steven Bosscher, gcc-patches

  On 07/19/10 12:39, Maxim Kuvyrkov wrote:
>
>
>>
>> Just so I'm certain I understand the issue. Is the problem that the
>> CALL_INSN is clobbering the value in the MEM, which normally wouldn't be
>> a problem except that we can (in certain cases) place the MEM before the
>> CALL_INSN. Right?
>
> Correct.
>
>> One could argue this is a failing of
>> hoist_expr_reaches_here_p which fails to take the corner cases of
>> inserting at the end of a block into account.
>
> While that's technically correct, I see hoist_expr_reaches_here_p as 
> designed to perform its job using antloc, transp, and other data 
> sets, i.e., without examining the instruction stream.  It seems 
> simpler to handle corner cases like this one by cleaning up the data 
> sets earlier in the pass.
But you don't know the target block for the insertion when you're 
pruning the data sets.  Thus you don't know if the target block has any 
of the odd situations which require insertion earlier in the block.    
hoist_expr_reaches_here_p is supposed to determine if an expression, if 
inserted at the "end" of a specific block reaches a later point in the 
cfg unchanged.  Since we know the target block of the insertion we can 
check for the various odd conditions that require insertion before the 
actual end of the target block.

>
>>
>> Looking at insert_insn_end_basic_block we have another class of
>> problems. Say the target block ends with a set & jump insn. The
>> destination of the set might clobber a value needed by an insn hoisted
>> to the end of the block. What a mess we've stumbled into.
>
> In theory, yes, it may happen that set and/or jump insns modify 
> memory or have other interesting side effects.  In practice, we don't 
> seem to have this problem with any of the ports.
The fact that we haven't stumbled across this bug doesn't mean the 
problem does not exist.  I think we've largely been lucky because we 
don't run code hoisting unless optimizing for size and because the 
combiner is the pass most likely to create these insns and the combiner 
runs after gcse.

I'm not saying you have to address this problem right now (though a 
comment discussing the issue would be greatly appreciated).
>
>>> + if (!pre_p && MEM_P (e->expr))
>>> + /* Note memory references that can be clobbered by a call.
>>> + We do not split abnormal edges in HOIST, so would
>>> + a memory reference get hoisted along an abnormal edge,
>>> + it would be placed /before/ the call. Therefore, only
>>> + constant memory references can be hoisted along abnormal
>>> + edges. */
>>> + {
>>> + if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF
>>> + && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0)))
>>> + continue;
>>
>> This is the only part I'm struggling with. It looks like you're trying
>> to avoid setting prune_exprs if the MEM's address has certain properties
>> (such as a reference to static memory).  However, if my analysis of the
>> problem is correct, that's not sufficient to solve the problem.
>
> This code comes straight from compute_transpout.
I misunderstood what the code was checking for -- it's dealing with 
MEMs which are known to be constant and thus can't be clobbered by the 
call, which clearly isn't a problem.  My bad.


I think at this point we should go forward with your patch and if you 
could add a pair of comments to expr_reaches_here_p and the pruning code 
which touch on the problems with set-and-jump insns as a separate patch, 
that would be greatly appreciated.

Thanks for your patience,
Jeff


* Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch
  2010-07-27 18:41                                             ` 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch Jeff Law
@ 2010-07-27 19:02                                               ` Maxim Kuvyrkov
  0 siblings, 0 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-27 19:02 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steven Bosscher, gcc-patches

On 7/27/10 10:07 PM, Jeff Law wrote:
> On 07/19/10 12:39, Maxim Kuvyrkov wrote:

>> While that's technically correct, I see hoist_expr_reaches_here_p as
>> designed to perform its job using antloc, transp, and other data
>> sets, i.e., without examining the instruction stream. It seems simpler
>> to handle corner cases like this one by cleaning up the data sets
>> earlier in the pass.
> But you don't know the target block for the insertion when you're
> pruning the data sets. Thus you don't know if the target block has any
> of the odd situations which require insertion earlier in the block.

I think the fact that there is an "odd" instruction on the path from 
the expression's initial location to the target makes it impossible to 
hoist it to the target block.  Even if the target block is "normal" and 
instructions can be added immediately at its end, the "odd" instruction 
will clobber the expression.

> hoist_expr_reaches_here_p is supposed to determine if an expression, if
> inserted at the "end" of a specific block reaches a later point in the
> cfg unchanged. Since we know the target block of the insertion we can
> check for the various odd conditions that require insertion before the
> actual end of the target block.

In general, I agree.  I just cannot readily come up with a case that 
would be under-optimized by pruning data sets compared to 
hoist_expr_reaches_here_p.  Anyway, that's probably moot at this point.

> I think at this point we should go forward with your patch and if you
> could add a pair of comments to expr_reaches_here_p and the pruning code
> which touch on the problems with set-and-jump insns as a separate patch,
> that would be greatly appreciated.

Thanks for the review!

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724


* Re: Improvements to code hoisting
  2010-06-16 15:57 Improvements to code hoisting Maxim Kuvyrkov
                   ` (10 preceding siblings ...)
  2010-07-01 11:05 ` 0008-Don-t-kill-generated-expressions-in-hoist_expr_reach Maxim Kuvyrkov
@ 2010-07-27 21:21 ` Maxim Kuvyrkov
  11 siblings, 0 replies; 94+ messages in thread
From: Maxim Kuvyrkov @ 2010-07-27 21:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jeff Law, Richard Earnshaw

On 6/16/10 7:47 PM, Maxim Kuvyrkov wrote:
> The following series of patches improves code hoisting and PRE RTL-level
> optimizations. The two threads of the patches correspond to
> target-independent changes to gcse.c and to changes to ARM backend to
> make it emit RTL that is better suited for optimizers.
>
> Motivating examples for this work are ARM PRs
> http://gcc.gnu.org/PR42495
> http://gcc.gnu.org/PR42574
> With the patches applied GCC produces perfect code for these examples.

Jeff,
Richard,

The whole patch series is now committed.  Thank you for reviewing it.

I've also checked in a follow-up patch adding testcases from PRs 40956, 
42495 and 42574 that these patches fixed.

Regards,

-- 
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724
