public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses
@ 2015-09-28 10:19 Bin Cheng
  2015-09-28 12:19 ` Bernd Schmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Bin Cheng @ 2015-09-28 10:19 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4002 bytes --]

Hi,
For below rtl dump before loop invariant pass:

LOOP:
 1482: r1838:DI=0xffffffffffffa880
 1483: r1837:DI=sfp:DI+r1838:DI
      REG_DEAD r1838:DI
      REG_EQUAL sfp:DI-0x5780
 1484: r1839:V4SI=r910:V4SI>>const_vector
 1485: r1840:V4SI=r1067:V4SI+r910:V4SI
      REG_DEAD r910:V4SI
 1486: r1841:V4SI=r1840:V4SI>>const_vector
      REG_DEAD r1840:V4SI
 1487: r1842:V8HI=vec_concat(trunc(r1839:V4SI),trunc(r1841:V4SI))
      REG_DEAD r1841:V4SI
      REG_DEAD r1839:V4SI
 1488: [r870:DI+r1837:DI]=r1842:V8HI
 ...

While the dump for loop invariant pass is as below:

;;Set in insn 1471 is invariant (0), cost 4, depends on 
;;Set in insn 1482 is invariant (1), cost 4, depends on 
;;Set in insn 1483 is invariant (2), cost 4, depends on 1
;;Decided to move invariant 0 -- gain 4
;;Decided to move invariant 1 -- gain 4
 
 2034: r2163:DI=0xffffffffffffa880
LOOP:
 1483: r1837:DI=sfp:DI+r2163:DI
      REG_DEAD r1838:DI
      REG_EQUAL sfp:DI-0x5780
 1484: r1839:V4SI=r910:V4SI>>const_vector
 1485: r1840:V4SI=r1067:V4SI+r910:V4SI
      REG_DEAD r910:V4SI
 1486: r1841:V4SI=r1840:V4SI>>const_vector
      REG_DEAD r1840:V4SI
 1487: r1842:V8HI=vec_concat(trunc(r1839:V4SI),trunc(r1841:V4SI))
      REG_DEAD r1841:V4SI
      REG_DEAD r1839:V4SI
 1488: [r870:DI+r1837:DI]=r1842:V8HI
 ...

Note instructions 1482 and 1483 both compute loop invariant values, but only
1482 is hoisted out of loop. Since computation in 1483 uses sfp, the final
assembly code is even worse because we need another one or two instructions
to compute sfp, depending on the immediate constant value.

The direct reason that r1837 isn't hoisted is its cost is computed as 0,
rather than 4.  I believe this is a known issue and we have tried more than
once by tuning the famous magic number "3" in loop-invariant.c.  After
investigation, I believe the problem lies in cost computation, rather than
the magic number itself.  Maybe that's one reason those experiments didn't
end with good results.

The below check conditions count invariant expr's cost only if the expr is
used outside of address expression, or the address expression is too
expensive. There is an implicit assumption in it: If the invariant
expression is not referred outside of address expression, it can be forward
propagated into address expressions. But this assumption is not always true,
especially on target with limited addressing modes.

  if (!inv->cheap_address
      || inv->def->n_uses = 0
      || inv->def->n_addr_uses < inv->def->n_uses)
    (*comp_cost) += inv->cost * inv->eqno;

Look at the example, r1837 computed in insn1483 is used in address
expression in insn1488, but it can't be forward propagated into it because
"r1870 + sfp + 0xffffa880" isn't a valid address expression on aarch64.
Which means r1837 has to be computed as an independent instruction and the
cost should be counted.

IMHO, we need to track if loop invariant expression can/cant be propagated
into address expressions and use that information to compute the cost, as
below:

  if (!inv->cheap_address
      || inv->def->n_uses = 0
      || inv->def->n_addr_uses < inv->def->n_uses
      || inv->def->cant_prop_to_addr_use)
    (*comp_cost) += inv->cost * inv->eqno;

Though this patch can be improved by analyze address expression propagation
more precisely, experiments shows spec2k/fp is already improved on aarch64.
I will collect data for spec2k6 later but would like to start discussing
before my holiday.  I also collected spec2k6 on x86_64, no regression.

Bootstrap and test on x86_64 and x86_32.  Will test it on aarch64.  So any
comments?

Thanks,
bin

2015-09-28  Bin Cheng  <bin.cheng@arm.com>

	* loop-invariant.c (struct def): New field cant_fwprop_to_addr_uses.
	(inv_cant_fwprop_to_addr_use): New function.
	(record_use): Call inv_cant_fwprop_to_addr_use, set the new field.
	(get_inv_cost): Count cost if inv can't be propagated into its
	address uses.

[-- Attachment #2: loop-invariant-addr-cost-20150928.txt --]
[-- Type: text/plain, Size: 2308 bytes --]

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..3c2395c 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -99,6 +99,8 @@ struct def
   unsigned n_uses;		/* Number of such uses.  */
   unsigned n_addr_uses;		/* Number of uses in addresses.  */
   unsigned invno;		/* The corresponding invariant.  */
+  bool cant_prop_to_addr_uses;	/* True if the corresponding inv can't be
+				   propagated into its address uses.  */
 };
 
 /* The data stored for each invariant.  */
@@ -762,6 +764,34 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Given invariant DEF and its address USE, check if the corresponding
+   invariant expr can be propagated into the use or not.  */
+
+static bool
+inv_cant_prop_to_addr_use (struct def *def, df_ref use)
+{
+  struct invariant *inv;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx_insn *use_insn = DF_REF_INSN (use);
+  rtx_insn *def_insn;
+  bool ok;
+
+  inv = invariants[def->invno];
+  /* No need to check if address expression is expensive.  */
+  if (!inv->cheap_address)
+    return true;
+
+  def_insn = inv->insn;
+  def_set = single_set (def_insn);
+  if (!def_set)
+    return true;
+
+  validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
+  ok = verify_changes (0);
+  cancel_changes (0);
+  return !ok;
+}
+
 /* Record USE at DEF.  */
 
 static void
@@ -777,7 +807,11 @@ record_use (struct def *def, df_ref use)
   def->uses = u;
   def->n_uses++;
   if (u->addr_use_p)
-    def->n_addr_uses++;
+    {
+      def->n_addr_uses++;
+      if (!def->cant_prop_to_addr_uses && inv_cant_prop_to_addr_use (def, use))
+	def->cant_prop_to_addr_uses = true;
+    }
 }
 
 /* Finds the invariants USE depends on and store them to the DEPENDS_ON
@@ -1158,7 +1192,9 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed,
 
   if (!inv->cheap_address
       || inv->def->n_uses == 0
-      || inv->def->n_addr_uses < inv->def->n_uses)
+      || inv->def->n_addr_uses < inv->def->n_uses
+      /* Count cost if the inv can't be propagated into address uses.  */
+      || inv->def->cant_prop_to_addr_uses)
     (*comp_cost) += inv->cost * inv->eqno;
 
 #ifdef STACK_REGS

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses
  2015-09-28 10:19 [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses Bin Cheng
@ 2015-09-28 12:19 ` Bernd Schmidt
  2015-09-28 18:32   ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: Bernd Schmidt @ 2015-09-28 12:19 UTC (permalink / raw)
  To: Bin Cheng, gcc-patches

On 09/28/2015 11:43 AM, Bin Cheng wrote:
> Bootstrap and test on x86_64 and x86_32.  Will test it on aarch64.  So any
> comments?
>
> Thanks,
> bin
>
> 2015-09-28  Bin Cheng  <bin.cheng@arm.com>
>
> 	* loop-invariant.c (struct def): New field cant_fwprop_to_addr_uses.
> 	(inv_cant_fwprop_to_addr_use): New function.
> 	(record_use): Call inv_cant_fwprop_to_addr_use, set the new field.
> 	(get_inv_cost): Count cost if inv can't be propagated into its
> 	address uses.

It looks at least plausible. Another option which I think has had some 
discussion recently would be to just move everything, and leave it to 
cprop to put things back together if the costs allow it.


Bernd

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses
  2015-09-28 12:19 ` Bernd Schmidt
@ 2015-09-28 18:32   ` Jeff Law
  2015-09-30  6:11     ` Bin.Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2015-09-28 18:32 UTC (permalink / raw)
  To: Bernd Schmidt, Bin Cheng, gcc-patches

On 09/28/2015 05:28 AM, Bernd Schmidt wrote:
> On 09/28/2015 11:43 AM, Bin Cheng wrote:
>> Bootstrap and test on x86_64 and x86_32.  Will test it on aarch64.  So
>> any
>> comments?
>>
>> Thanks,
>> bin
>>
>> 2015-09-28  Bin Cheng  <bin.cheng@arm.com>
>>
>>     * loop-invariant.c (struct def): New field cant_fwprop_to_addr_uses.
>>     (inv_cant_fwprop_to_addr_use): New function.
>>     (record_use): Call inv_cant_fwprop_to_addr_use, set the new field.
>>     (get_inv_cost): Count cost if inv can't be propagated into its
>>     address uses.
>
> It looks at least plausible.
Definitely plausible.  Many targets have restrictions on the immediate 
offsets, so this potentially affects many targets (in a good way).



  Another option which I think has had some
> discussion recently would be to just move everything, and leave it to
> cprop to put things back together if the costs allow it.
I go back and forth on this kind of approach.

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses
  2015-09-28 18:32   ` Jeff Law
@ 2015-09-30  6:11     ` Bin.Cheng
  2015-10-09 12:00       ` Bin.Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Bin.Cheng @ 2015-09-30  6:11 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, Bin Cheng, gcc-patches List

On Tue, Sep 29, 2015 at 1:21 AM, Jeff Law <law@redhat.com> wrote:
> On 09/28/2015 05:28 AM, Bernd Schmidt wrote:
>>
>> On 09/28/2015 11:43 AM, Bin Cheng wrote:
>>>
>>> Bootstrap and test on x86_64 and x86_32.  Will test it on aarch64.  So
>>> any
>>> comments?
>>>
>>> Thanks,
>>> bin
>>>
>>> 2015-09-28  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>     * loop-invariant.c (struct def): New field cant_fwprop_to_addr_uses.
>>>     (inv_cant_fwprop_to_addr_use): New function.
>>>     (record_use): Call inv_cant_fwprop_to_addr_use, set the new field.
>>>     (get_inv_cost): Count cost if inv can't be propagated into its
>>>     address uses.
>>
>>
>> It looks at least plausible.
>
> Definitely plausible.  Many targets have restrictions on the immediate
> offsets, so this potentially affects many targets (in a good way).
Here is some more information.
For spec2k6 on x86, no regression or obvious improvement.
For spec2k6 on aarch64, several cases are improved.
Given the results and your positive feedback, I will continue to work
on this patch when I get back from holiday.

>
>
>  Another option which I think has had some
>>
>> discussion recently would be to just move everything, and leave it to
>> cprop to put things back together if the costs allow it.
>
> I go back and forth on this kind of approach.
Hmm, either way we need to model rtx/register pressure costs, in loop
invariant or cprop.

Thanks,
bin

>
> jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses
  2015-09-30  6:11     ` Bin.Cheng
@ 2015-10-09 12:00       ` Bin.Cheng
  2015-10-09 12:04         ` Bernd Schmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Bin.Cheng @ 2015-10-09 12:00 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, Bin Cheng, gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 2175 bytes --]

On Wed, Sep 30, 2015 at 11:33 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Sep 29, 2015 at 1:21 AM, Jeff Law <law@redhat.com> wrote:
>> On 09/28/2015 05:28 AM, Bernd Schmidt wrote:
>>>
>>> On 09/28/2015 11:43 AM, Bin Cheng wrote:
>>>>
>>>> Bootstrap and test on x86_64 and x86_32.  Will test it on aarch64.  So
>>>> any
>>>> comments?
>>>>
>>>> Thanks,
>>>> bin
>>>>
>>>> 2015-09-28  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>     * loop-invariant.c (struct def): New field cant_fwprop_to_addr_uses.
>>>>     (inv_cant_fwprop_to_addr_use): New function.
>>>>     (record_use): Call inv_cant_fwprop_to_addr_use, set the new field.
>>>>     (get_inv_cost): Count cost if inv can't be propagated into its
>>>>     address uses.
>>>
>>>
>>> It looks at least plausible.
>>
>> Definitely plausible.  Many targets have restrictions on the immediate
>> offsets, so this potentially affects many targets (in a good way).
> Here is some more information.
> For spec2k6 on x86, no regression or obvious improvement.
> For spec2k6 on aarch64, several cases are improved.
> Given the results and your positive feedback, I will continue to work
> on this patch when I get back from holiday.
I further bootstrap and test attached patch on aarch64.  Also three
cases in spec2k6/fp are improved by 3~6%, two cases in spec2k6/fp are
regressed by ~2%.  Overall score is improved by ~0.8% for spec2k6/fp
on aarch64 of my run.  I may later analyze the regression.

So is this patch OK?

Thanks,
bin

2015-10-09  Bin Cheng  <bin.cheng@arm.com>

    * loop-invariant.c (struct def): New field cant_prop_to_addr_uses.
    (inv_cant_prop_to_addr_use): New function.
    (record_use): Call inv_cant_prop_to_addr_use, set the new field.
    (get_inv_cost): Count cost if inv can't be propagated into its
    address uses.

>
>>
>>
>>  Another option which I think has had some
>>>
>>> discussion recently would be to just move everything, and leave it to
>>> cprop to put things back together if the costs allow it.
>>
>> I go back and forth on this kind of approach.
> Hmm, either way we need to model rtx/register pressure costs, in loop
> invariant or cprop.
>
> Thanks,
> bin
>
>>
>> jeff

[-- Attachment #2: loop-invariant-addr-cost-20150928.txt --]
[-- Type: text/plain, Size: 2236 bytes --]

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..3c2395c 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -99,6 +99,8 @@ struct def
   unsigned n_uses;		/* Number of such uses.  */
   unsigned n_addr_uses;		/* Number of uses in addresses.  */
   unsigned invno;		/* The corresponding invariant.  */
+  bool cant_prop_to_addr_uses;	/* True if the corresponding inv can't be
+				   propagated into its address uses.  */
 };
 
 /* The data stored for each invariant.  */
@@ -762,6 +764,34 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Given invariant DEF and its address USE, check if the corresponding
+   invariant expr can be propagated into the use or not.  */
+
+static bool
+inv_cant_prop_to_addr_use (struct def *def, df_ref use)
+{
+  struct invariant *inv;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx_insn *use_insn = DF_REF_INSN (use);
+  rtx_insn *def_insn;
+  bool ok;
+
+  inv = invariants[def->invno];
+  /* No need to check if address expression is expensive.  */
+  if (!inv->cheap_address)
+    return true;
+
+  def_insn = inv->insn;
+  def_set = single_set (def_insn);
+  if (!def_set)
+    return true;
+
+  validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
+  ok = verify_changes (0);
+  cancel_changes (0);
+  return !ok;
+}
+
 /* Record USE at DEF.  */
 
 static void
@@ -777,7 +807,11 @@ record_use (struct def *def, df_ref use)
   def->uses = u;
   def->n_uses++;
   if (u->addr_use_p)
-    def->n_addr_uses++;
+    {
+      def->n_addr_uses++;
+      if (!def->cant_prop_to_addr_uses && inv_cant_prop_to_addr_use (def, use))
+	def->cant_prop_to_addr_uses = true;
+    }
 }
 
 /* Finds the invariants USE depends on and store them to the DEPENDS_ON
@@ -1158,7 +1192,9 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed,
 
   if (!inv->cheap_address
       || inv->def->n_uses == 0
-      || inv->def->n_addr_uses < inv->def->n_uses)
+      || inv->def->n_addr_uses < inv->def->n_uses
+      /* Count cost if the inv can't be propagated into address uses.  */
+      || inv->def->cant_prop_to_addr_uses)
     (*comp_cost) += inv->cost * inv->eqno;
 
 #ifdef STACK_REGS

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses
  2015-10-09 12:00       ` Bin.Cheng
@ 2015-10-09 12:04         ` Bernd Schmidt
  2015-10-21  4:01           ` Bin.Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Bernd Schmidt @ 2015-10-09 12:04 UTC (permalink / raw)
  To: Bin.Cheng, Jeff Law; +Cc: Bin Cheng, gcc-patches List

On 10/09/2015 02:00 PM, Bin.Cheng wrote:
> I further bootstrap and test attached patch on aarch64.  Also three
> cases in spec2k6/fp are improved by 3~6%, two cases in spec2k6/fp are
> regressed by ~2%.  Overall score is improved by ~0.8% for spec2k6/fp
> on aarch64 of my run.  I may later analyze the regression.
>
> So is this patch OK?

I'll approve this with one change, but please keep an eye out for 
performance regressions on other targets.

>      * loop-invariant.c (struct def): New field cant_prop_to_addr_uses.
>      (inv_cant_prop_to_addr_use): New function.

I would like these to have switched truthvalues, i.e. 
can_prop_to_addr_uses, inv_can_prop_to_addr_use. Otherwise we end up 
with double negations like !def->cant_prop_to_addr_uses which can be 
slightly confusing.

You'll probably slightly need to tweak the initialization when 
n_addr_uses goes from zero to one.


Bernd

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses
  2015-10-09 12:04         ` Bernd Schmidt
@ 2015-10-21  4:01           ` Bin.Cheng
  2015-10-26  8:31             ` Bin.Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Bin.Cheng @ 2015-10-21  4:01 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, Bin Cheng, gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 1889 bytes --]

On Fri, Oct 9, 2015 at 8:04 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 10/09/2015 02:00 PM, Bin.Cheng wrote:
>>
>> I further bootstrap and test attached patch on aarch64.  Also three
>> cases in spec2k6/fp are improved by 3~6%, two cases in spec2k6/fp are
>> regressed by ~2%.  Overall score is improved by ~0.8% for spec2k6/fp
>> on aarch64 of my run.  I may later analyze the regression.
>>
>> So is this patch OK?
>
Hi Bernd,
Thanks for reviewing this patch.  I further collected perf data for
spec2k on AArch64.  Three fp cases are improved by 3-5%, no obvious
regression.  As for int cases, perlbmk is improved by 8%, but crafty
is regressed by 3.8%.  Together with spec2k6 data, I think this patch
is generally good.  I scanned hot functions in crafty but didn't find
obvious regression because lim hoist decision is very different
because of this change.  The regression could be caused by register
pressure..

>
> I'll approve this with one change, but please keep an eye out for
> performance regressions on other targets.
Sure.

>
>>      * loop-invariant.c (struct def): New field cant_prop_to_addr_uses.
>>      (inv_cant_prop_to_addr_use): New function.
>
>
> I would like these to have switched truthvalues, i.e. can_prop_to_addr_uses,
> inv_can_prop_to_addr_use. Otherwise we end up with double negations like
> !def->cant_prop_to_addr_uses which can be slightly confusing.
>
> You'll probably slightly need to tweak the initialization when n_addr_uses
> goes from zero to one.
Here is the new version patch with your comments incorporated.

Thanks,
bin

2015-10-19  Bin Cheng  <bin.cheng@arm.com>

    * loop-invariant.c (struct def): New field can_prop_to_addr_uses.
    (inv_can_prop_to_addr_use): New function.
    (record_use): Call can_prop_to_addr_uses, set the new field.
    (get_inv_cost): Count cost if inv can't be propagated into its
    address uses.

[-- Attachment #2: loop-invariant-addr-cost-20151019.txt --]
[-- Type: text/plain, Size: 2402 bytes --]

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..7ac38c6 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -99,6 +99,8 @@ struct def
   unsigned n_uses;		/* Number of such uses.  */
   unsigned n_addr_uses;		/* Number of uses in addresses.  */
   unsigned invno;		/* The corresponding invariant.  */
+  bool can_prop_to_addr_uses;	/* True if the corresponding inv can be
+				   propagated into its address uses.  */
 };
 
 /* The data stored for each invariant.  */
@@ -762,6 +764,34 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Given invariant DEF and its address USE, check if the corresponding
+   invariant expr can be propagated into the use or not.  */
+
+static bool
+inv_can_prop_to_addr_use (struct def *def, df_ref use)
+{
+  struct invariant *inv;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx_insn *use_insn = DF_REF_INSN (use);
+  rtx_insn *def_insn;
+  bool ok;
+
+  inv = invariants[def->invno];
+  /* No need to check if address expression is expensive.  */
+  if (!inv->cheap_address)
+    return false;
+
+  def_insn = inv->insn;
+  def_set = single_set (def_insn);
+  if (!def_set)
+    return false;
+
+  validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
+  ok = verify_changes (0);
+  cancel_changes (0);
+  return ok;
+}
+
 /* Record USE at DEF.  */
 
 static void
@@ -777,7 +807,16 @@ record_use (struct def *def, df_ref use)
   def->uses = u;
   def->n_uses++;
   if (u->addr_use_p)
-    def->n_addr_uses++;
+    {
+      /* Initialize propagation information if this is the first addr
+	 use of the inv def.  */
+      if (def->n_addr_uses == 0)
+	def->can_prop_to_addr_uses = true;
+
+      def->n_addr_uses++;
+      if (def->can_prop_to_addr_uses && !inv_can_prop_to_addr_use (def, use))
+	def->can_prop_to_addr_uses = false;
+    }
 }
 
 /* Finds the invariants USE depends on and store them to the DEPENDS_ON
@@ -1158,7 +1197,9 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed,
 
   if (!inv->cheap_address
       || inv->def->n_uses == 0
-      || inv->def->n_addr_uses < inv->def->n_uses)
+      || inv->def->n_addr_uses < inv->def->n_uses
+      /* Count cost if the inv can't be propagated into address uses.  */
+      || !inv->def->can_prop_to_addr_uses)
     (*comp_cost) += inv->cost * inv->eqno;
 
 #ifdef STACK_REGS

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses
  2015-10-21  4:01           ` Bin.Cheng
@ 2015-10-26  8:31             ` Bin.Cheng
  0 siblings, 0 replies; 8+ messages in thread
From: Bin.Cheng @ 2015-10-26  8:31 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, Bin Cheng, gcc-patches List

On Wed, Oct 21, 2015 at 11:55 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Oct 9, 2015 at 8:04 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>> On 10/09/2015 02:00 PM, Bin.Cheng wrote:
>>>
>>> I further bootstrap and test attached patch on aarch64.  Also three
>>> cases in spec2k6/fp are improved by 3~6%, two cases in spec2k6/fp are
>>> regressed by ~2%.  Overall score is improved by ~0.8% for spec2k6/fp
>>> on aarch64 of my run.  I may later analyze the regression.
>>>
>>> So is this patch OK?
>>
> Hi Bernd,
> Thanks for reviewing this patch.  I further collected perf data for
> spec2k on AArch64.  Three fp cases are improved by 3-5%, no obvious
> regression.  As for int cases, perlbmk is improved by 8%, but crafty
> is regressed by 3.8%.  Together with spec2k6 data, I think this patch
> is generally good.  I scanned hot functions in crafty but didn't find
> obvious regression because lim hoist decision is very different
> because of this change.  The regression could be caused by register
> pressure..
>
>>
>> I'll approve this with one change, but please keep an eye out for
>> performance regressions on other targets.
> Sure.
>
>>
>>>      * loop-invariant.c (struct def): New field cant_prop_to_addr_uses.
>>>      (inv_cant_prop_to_addr_use): New function.
>>
>>
>> I would like these to have switched truthvalues, i.e. can_prop_to_addr_uses,
>> inv_can_prop_to_addr_use. Otherwise we end up with double negations like
>> !def->cant_prop_to_addr_uses which can be slightly confusing.
>>
>> You'll probably slightly need to tweak the initialization when n_addr_uses
>> goes from zero to one.
> Here is the new version patch with your comments incorporated.
>
Given the patch was pre-approved and there is no other comments, I
will apply it later.

Thanks,
bin

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-10-26  5:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-28 10:19 [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses Bin Cheng
2015-09-28 12:19 ` Bernd Schmidt
2015-09-28 18:32   ` Jeff Law
2015-09-30  6:11     ` Bin.Cheng
2015-10-09 12:00       ` Bin.Cheng
2015-10-09 12:04         ` Bernd Schmidt
2015-10-21  4:01           ` Bin.Cheng
2015-10-26  8:31             ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).