public inbox for gcc-patches@gcc.gnu.org
* [PATCH][compare-elim] Merge zero-comparisons with normal ops
@ 2017-08-10 22:03 Michael Collison
  2017-08-28 18:43 ` Jeff Law
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Collison @ 2017-08-10 22:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd


Hi all,

One issue that we keep encountering on aarch64 is GCC not making good use of the flag-setting arithmetic instructions
like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare the result against zero.
They are represented in a fairly standard way in the backend as PARALLEL patterns:
(parallel [(set (reg x1) (op (reg x2) (reg x3)))
           (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])

GCC isn't forming these from separate arithmetic and comparison instructions as aggressively as it could.
A particular pain point is when the result of the arithmetic insn is used before the comparison instruction.
The testcase in this patch is one such example where we have:
(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
        (plus:SI (reg:SI 0 x0 [ x ])
            (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
     (nil))
(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
        (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
            (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
     (nil))
(insn 34 33 17 2 (set (reg:CC 66 cc)
        (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
            (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
     (nil))

This scares combine away as x0 is used in insn 33 as well as the comparison in insn 34.
I think the compare-elim pass can help us here.

This patch extends the pass to handle comparisons against zero: it finds the instruction defining the compared
register and merges the comparison with that defining instruction into a PARALLEL that will hopefully match the form
described above. We bail out if, between the comparison and the defining insn, we find an instruction that uses
the condition register or any control flow, and we don't cross basic blocks.
This simple technique allows us to catch cases such as this example.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and x86_64.

Ok for trunk?

2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
	    Michael Collison <michael.collison@arm.com>

	* compare-elim.c: Include emit-rtl.h.
	(can_merge_compare_into_arith): New function.
	(try_validate_parallel): Likewise.
	(try_merge_compare): Likewise.
	(try_eliminate_compare): Call the above when no previous clobber
	is available.
	(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
	dataflow problems.

2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
	    Michael Collison <michael.collison@arm.com>
	    
	* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.

[-- Attachment #2: pr5198v1.patch --]
[-- Type: application/octet-stream, Size: 6340 bytes --]

diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index 7e557a2..c65d155 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "insn-config.h"
 #include "recog.h"
+#include "emit-rtl.h"
 #include "cfgrtl.h"
 #include "tree-pass.h"
 #include "domwalk.h"
@@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn *start)
   return reg;
 }
 
+/* Return true if it is okay to merge the comparison CMP_INSN with
+   the instruction ARITH_INSN.  Both instructions are assumed to be in the
+   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
+   that there are no uses or defs of the condition flags or control flow
+   changes between the two instructions.  */
+
+static bool
+can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
+{
+  for (rtx_insn *insn = PREV_INSN (cmp_insn);
+       insn && insn != arith_insn;
+       insn = PREV_INSN (insn))
+    {
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      /* Bail if there are jumps or calls in between.  */
+      if (!NONJUMP_INSN_P (insn))
+	return false;
+
+      df_ref ref;
+      /* Find a USE of the flags register.  */
+      FOR_EACH_INSN_USE (ref, insn)
+	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+	  return false;
+
+      /* Find a DEF of the flags register.  */
+      FOR_EACH_INSN_DEF (ref, insn)
+	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+	  return false;
+    }
+  return true;
+}
+
+/* Given two SET expressions, SET_A and SET_B determine whether they form
+   a recognizable pattern when emitted in parallel.  Return that parallel
+   if so.  Otherwise return NULL.  This tries both permutations of SET_A
+   and SET_B within the PARALLEL.  */
+
+static rtx
+try_validate_parallel (rtx set_a, rtx set_b)
+{
+  rtx par
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
+  rtx_insn *seq;
+  start_sequence ();
+  seq = emit_insn (par);
+  end_sequence ();
+  if (recog_memoized (seq) > 0)
+    return par;
+
+  par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_b, set_a));
+  start_sequence ();
+  seq = emit_insn (par);
+  end_sequence ();
+  return recog_memoized (seq) > 0 ? par : NULL_RTX;
+}
+
+/* For a comparison instruction described by CMP check if it compares a
+   register with zero i.e. it is of the form CC := CMP R1, 0.
+   If it is, find the instruction defining R1 (say I1) and try to create a
+   PARALLEL consisting of I1 and the comparison, representing a flag-setting
+   arithmetic instruction.  Example:
+   I1: R1 := R2 + R3
+   <instructions that don't read the condition register>
+   I2: CC := CMP R1 0
+   I2 can be merged with I1 into:
+   I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 }
+   This catches cases where R1 is used between I1 and I2 and therefore
+   combine and other RTL optimisations will not try to propagate it into
+   I2.  Return true if we succeeded in merging CMP.  */
+
+static bool
+try_merge_compare (struct comparison *cmp)
+{
+  rtx_insn *cmp_insn = cmp->insn;
+
+  if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx)
+    return false;
+  rtx in_a = cmp->in_a;
+  if (!in_a)
+    return false;
+  df_ref use;
+
+  FOR_EACH_INSN_USE (use, cmp_insn)
+    if (DF_REF_REGNO (use) == REGNO (in_a))
+      break;
+  if (!use)
+    return false;
+
+  struct df_link *ref_chain;
+  ref_chain = DF_REF_CHAIN (use);
+  if (!ref_chain || !ref_chain->ref
+      || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL)
+    return false;
+
+  rtx_insn *def_insn = DF_REF_INSN (ref_chain->ref);
+  /* We found the insn that defines in_a.  Only consider the cases where
+     it is in the same block as the comparison.  */
+  if (BLOCK_FOR_INSN (cmp_insn) != BLOCK_FOR_INSN (def_insn))
+    return false;
+
+  rtx set = single_set (def_insn);
+  if (!set)
+    return false;
+
+  if (!can_merge_compare_into_arith (cmp_insn, def_insn))
+    return false;
+
+  rtx src = SET_SRC (set);
+  rtx flags = maybe_select_cc_mode (cmp, src, CONST0_RTX (GET_MODE (src)));
+  if (!flags)
+    {
+    /* We may already have a change group going through maybe_select_cc_mode.
+       Discard it properly.  */
+      cancel_changes (0);
+      return false;
+    }
+
+  rtx flag_set
+    = gen_rtx_SET (flags, gen_rtx_COMPARE (GET_MODE (flags),
+					   copy_rtx (src),
+					   CONST0_RTX (GET_MODE (src))));
+  rtx arith_set = copy_rtx (PATTERN (def_insn));
+  rtx par = try_validate_parallel (flag_set, arith_set);
+  if (!par)
+    {
+      /* We may already have a change group going through maybe_select_cc_mode.
+	 Discard it properly.  */
+      cancel_changes (0);
+      return false;
+    }
+  if (!apply_change_group ())
+    return false;
+  emit_insn_after (par, def_insn);
+  delete_insn (def_insn);
+  delete_insn (cmp->insn);
+  return true;
+}
+
 /* Attempt to replace a comparison with a prior arithmetic insn that can
    compute the same flags value as the comparison itself.  Return true if
    successful, having made all rtl modifications necessary.  */
@@ -588,7 +728,9 @@ try_eliminate_compare (struct comparison *cmp)
 {
   rtx flags, in_a, in_b, cmp_src;
 
-  /* We must have found an interesting "clobber" preceding the compare.  */
+  if (try_merge_compare (cmp))
+    return true;
+
   if (cmp->prev_clobber == NULL)
     return false;
 
@@ -714,6 +856,7 @@ try_eliminate_compare (struct comparison *cmp)
 static unsigned int
 execute_compare_elim_after_reload (void)
 {
+  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
   df_analyze ();
 
   gcc_checking_assert (!all_compares.exists ());
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c b/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c
new file mode 100644
index 0000000..953c388
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* X is both compared against zero and used.  Make sure we can still
+   generate an ADDS and avoid an explicit comparison against zero.  */
+
+int
+foo (int x, int y)
+{
+  x += y;
+  if (x != 0)
+    x = x + 2;
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "adds\\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-not "cmp\\tw\[0-9\]+, 0" } } */
-- 
1.9.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-08-10 22:03 [PATCH][compare-elim] Merge zero-comparisons with normal ops Michael Collison
@ 2017-08-28 18:43 ` Jeff Law
  2017-08-29  9:25   ` Kyrill Tkachov
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Law @ 2017-08-28 18:43 UTC (permalink / raw)
  To: Michael Collison, gcc-patches; +Cc: nd

On 08/10/2017 03:14 PM, Michael Collison wrote:
> Hi all,
> 
> One issue that we keep encountering on aarch64 is GCC not making good use of the flag-setting arithmetic instructions
> like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare the result against zero.
> They are represented in a fairly standard way in the backend as PARALLEL patterns:
> (parallel [(set (reg x1) (op (reg x2) (reg x3)))
>            (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])
> 
> GCC isn't forming these from separate arithmetic and comparison instructions as aggressively as it could.
> A particular pain point is when the result of the arithmetic insn is used before the comparison instruction.
> The testcase in this patch is one such example where we have:
> (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>         (plus:SI (reg:SI 0 x0 [ x ])
>             (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
>      (nil))
> (insn 33 7 34 2 (set (reg:SI 1 x1 [77])
>         (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>             (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
>      (nil))
> (insn 34 33 17 2 (set (reg:CC 66 cc)
>         (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>             (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
>      (nil))
> 
> This scares combine away as x0 is used in insn 33 as well as the comparison in insn 34.
> I think the compare-elim pass can help us here.
Is it the multiple use or the hard register that combine doesn't
appreciate?  The latter would definitely steer us towards compare-elim.

> 
> This patch extends it by handling comparisons against zero, finding the defining instruction of the compare
> and merging the comparison with the defining instruction into a PARALLEL that will hopefully match the form
> described above. If between the comparison and the defining insn we find an instruction that uses the condition
> registers or any control flow we bail out, and we don't cross basic blocks.
> This simple technique allows us to catch cases such as this example.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and x86_64.
> 
> Ok for trunk?
> 
> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 	    Michael Collison <michael.collison@arm.com>
> 
> 	* compare-elim.c: Include emit-rtl.h.
> 	(can_merge_compare_into_arith): New function.
> 	(try_validate_parallel): Likewise.
> 	(try_merge_compare): Likewise.
> 	(try_eliminate_compare): Call the above when no previous clobber
> 	is available.
> 	(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
> 	dataflow problems.
> 
> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 	    Michael Collison <michael.collison@arm.com>
> 	    
> 	* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
> 
> 
> pr5198v1.patch
> 
> 
> diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
> index 7e557a2..c65d155 100644
> --- a/gcc/compare-elim.c
> +++ b/gcc/compare-elim.c
> @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm_p.h"
>  #include "insn-config.h"
>  #include "recog.h"
> +#include "emit-rtl.h"
>  #include "cfgrtl.h"
>  #include "tree-pass.h"
>  #include "domwalk.h"
> @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn *start)
>    return reg;
>  }
>  
> +/* Return true if it is okay to merge the comparison CMP_INSN with
> +   the instruction ARITH_INSN.  Both instructions are assumed to be in the
> +   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
> +   that there are no uses or defs of the condition flags or control flow
> +   changes between the two instructions.  */
> +
> +static bool
> +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
> +{
> +  for (rtx_insn *insn = PREV_INSN (cmp_insn);
> +       insn && insn != arith_insn;
> +       insn = PREV_INSN (insn))
> +    {
> +      if (!NONDEBUG_INSN_P (insn))
> +	continue;
> +      /* Bail if there are jumps or calls in between.  */
> +      if (!NONJUMP_INSN_P (insn))
> +	return false;
> +
> +      df_ref ref;
> +      /* Find a USE of the flags register.  */
> +      FOR_EACH_INSN_USE (ref, insn)
> +	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +	  return false;
> +
> +      /* Find a DEF of the flags register.  */
> +      FOR_EACH_INSN_DEF (ref, insn)
> +	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +	  return false;
> +    }
> +  return true;
> +}
What about old-style asms?  Do you need to explicitly punt those?  I don't
think they carry DF info.

Don't you also have to verify that the inputs to ARITH_INSN are
unchanged between ARITH_INSN and CMP_INSN?


> +
> +/* Given two SET expressions, SET_A and SET_B determine whether they form
> +   a recognizable pattern when emitted in parallel.  Return that parallel
> +   if so.  Otherwise return NULL.  This tries both permutations of SET_A
> +   and SET_B within the PARALLEL.  */
> +
> +static rtx
> +try_validate_parallel (rtx set_a, rtx set_b)
> +{
> +  rtx par
> +    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
> +  rtx_insn *seq;
> +  start_sequence ();
> +  seq = emit_insn (par);
> +  end_sequence ();
> +  if (recog_memoized (seq) > 0)
> +    return par;
> +
> +  par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_b, set_a));
> +  start_sequence ();
> +  seq = emit_insn (par);
> +  end_sequence ();
> +  return recog_memoized (seq) > 0 ? par : NULL_RTX;
> +}
I don't think you need to build up a sequence here.   Can't you just
allocate the INSN container and set its PATTERN?

> +
> +/* For a comparison instruction described by CMP check if it compares a
> +   register with zero i.e. it is of the form CC := CMP R1, 0.
> +   If it is, find the instruction defining R1 (say I1) and try to create a
> +   PARALLEL consisting of I1 and the comparison, representing a flag-setting
> +   arithmetic instruction.  Example:
> +   I1: R1 := R2 + R3
> +   <instructions that don't read the condition register>
> +   I2: CC := CMP R1 0
> +   I2 can be merged with I1 into:
> +   I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 }
> +   This catches cases where R1 is used between I1 and I2 and therefore
> +   combine and other RTL optimisations will not try to propagate it into
> +   I2.  Return true if we succeeded in merging CMP.  */
> +
> +static bool
> +try_merge_compare (struct comparison *cmp)
> +{
> +  rtx_insn *cmp_insn = cmp->insn;
> +
> +  if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx)
> +    return false;
> +  rtx in_a = cmp->in_a;
> +  if (!in_a)
> +    return false;
Isn't the second IF redundant?  How can !in_a ever be true here given we
tested REG_P (cmp->in_a)?


> +  df_ref use;
> +
> +  FOR_EACH_INSN_USE (use, cmp_insn)
> +    if (DF_REF_REGNO (use) == REGNO (in_a))
> +      break;
> +  if (!use)
> +    return false;
> +
> +  struct df_link *ref_chain;
> +  ref_chain = DF_REF_CHAIN (use);
> +  if (!ref_chain || !ref_chain->ref
> +      || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL)
> +    return false;
So what are you checking for here?  I've got a pretty good guess, but
let's just make it explicit in a comment.


Generally this looks pretty close.  The biggest concern I see is I think
you need to verify that the inputs on the arithmetic insn don't change
between the arithmetic and the compare.

jeff


* Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-08-28 18:43 ` Jeff Law
@ 2017-08-29  9:25   ` Kyrill Tkachov
  2017-09-01 23:07     ` Segher Boessenkool
  0 siblings, 1 reply; 13+ messages in thread
From: Kyrill Tkachov @ 2017-08-29  9:25 UTC (permalink / raw)
  To: Jeff Law, Michael Collison, gcc-patches; +Cc: nd

Hi Jeff,
Some comments since I wrote this patch some time ago...

On 28/08/17 19:26, Jeff Law wrote:
> On 08/10/2017 03:14 PM, Michael Collison wrote:
>> Hi all,
>>
>> One issue that we keep encountering on aarch64 is GCC not making good use of the flag-setting arithmetic instructions
>> like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare the result against zero.
>> They are represented in a fairly standard way in the backend as PARALLEL patterns:
>> (parallel [(set (reg x1) (op (reg x2) (reg x3)))
>>             (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])
>>
>> GCC isn't forming these from separate arithmetic and comparison instructions as aggressively as it could.
>> A particular pain point is when the result of the arithmetic insn is used before the comparison instruction.
>> The testcase in this patch is one such example where we have:
>> (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>>          (plus:SI (reg:SI 0 x0 [ x ])
>>              (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
>>       (nil))
>> (insn 33 7 34 2 (set (reg:SI 1 x1 [77])
>>          (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>>              (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
>>       (nil))
>> (insn 34 33 17 2 (set (reg:CC 66 cc)
>>          (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>>              (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
>>       (nil))
>>
>> This scares combine away as x0 is used in insn 33 as well as the comparison in insn 34.
>> I think the compare-elim pass can help us here.
> Is it the multiple use or the hard register that combine doesn't
> appreciate.  The latter would definitely steer us towards compare-elim.

It's the multiple use IIRC.

>> This patch extends it by handling comparisons against zero, finding the defining instruction of the compare
>> and merging the comparison with the defining instruction into a PARALLEL that will hopefully match the form
>> described above. If between the comparison and the defining insn we find an instruction that uses the condition
>> registers or any control flow we bail out, and we don't cross basic blocks.
>> This simple technique allows us to catch cases such as this example.
>>
>> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and x86_64.
>>
>> Ok for trunk?
>>
>> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>> 	    Michael Collison <michael.collison@arm.com>
>>
>> 	* compare-elim.c: Include emit-rtl.h.
>> 	(can_merge_compare_into_arith): New function.
>> 	(try_validate_parallel): Likewise.
>> 	(try_merge_compare): Likewise.
>> 	(try_eliminate_compare): Call the above when no previous clobber
>> 	is available.
>> 	(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
>> 	dataflow problems.
>>
>> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>> 	    Michael Collison <michael.collison@arm.com>
>> 	
>> 	* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
>>
>>
>> pr5198v1.patch
>>
>>
>> diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
>> index 7e557a2..c65d155 100644
>> --- a/gcc/compare-elim.c
>> +++ b/gcc/compare-elim.c
>> @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "tm_p.h"
>>   #include "insn-config.h"
>>   #include "recog.h"
>> +#include "emit-rtl.h"
>>   #include "cfgrtl.h"
>>   #include "tree-pass.h"
>>   #include "domwalk.h"
>> @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn *start)
>>     return reg;
>>   }
>>   
>> +/* Return true if it is okay to merge the comparison CMP_INSN with
>> +   the instruction ARITH_INSN.  Both instructions are assumed to be in the
>> +   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
>> +   that there are no uses or defs of the condition flags or control flow
>> +   changes between the two instructions.  */
>> +
>> +static bool
>> +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
>> +{
>> +  for (rtx_insn *insn = PREV_INSN (cmp_insn);
>> +       insn && insn != arith_insn;
>> +       insn = PREV_INSN (insn))
>> +    {
>> +      if (!NONDEBUG_INSN_P (insn))
>> +	continue;
>> +      /* Bail if there are jumps or calls in between.  */
>> +      if (!NONJUMP_INSN_P (insn))
>> +	return false;
>> +
>> +      df_ref ref;
>> +      /* Find a USE of the flags register.  */
>> +      FOR_EACH_INSN_USE (ref, insn)
>> +	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
>> +	  return false;
>> +
>> +      /* Find a DEF of the flags register.  */
>> +      FOR_EACH_INSN_DEF (ref, insn)
>> +	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
>> +	  return false;
>> +    }
>> +  return true;
>> +}
> What about old-style asms?  Do you need to explicitly punt those?  I don't
> think they carry DF info.

I think we want to bail out on those to be safe. How are they 
represented in RTL?

> Don't you also have to verify that the inputs to ARITH_INSN are
> unchanged between ARITH_INSN and CMP_INSN?

I don't think so. As long as there are no flag-clobbering instructions
between them we should be ok.

Consider:
r1 := r2 + r3  // arith_insn
r2 := r4 + r5  // arith_insn input modified before cmp_insn
cc := CMP r1 0 // cmp_insn

this would be transformed into
{r1 := r2 + r3 ; cc := CMP (r2 + r3) 0} //parallel
r2 := r4 + r5



>
>
>> +
>> +/* Given two SET expressions, SET_A and SET_B determine whether they form
>> +   a recognizable pattern when emitted in parallel.  Return that parallel
>> +   if so.  Otherwise return NULL.  This tries both permutations of SET_A
>> +   and SET_B within the PARALLEL.  */
>> +
>> +static rtx
>> +try_validate_parallel (rtx set_a, rtx set_b)
>> +{
>> +  rtx par
>> +    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
>> +  rtx_insn *seq;
>> +  start_sequence ();
>> +  seq = emit_insn (par);
>> +  end_sequence ();
>> +  if (recog_memoized (seq) > 0)
>> +    return par;
>> +
>> +  par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_b, set_a));
>> +  start_sequence ();
>> +  seq = emit_insn (par);
>> +  end_sequence ();
>> +  return recog_memoized (seq) > 0 ? par : NULL_RTX;
>> +}
> I don't think you need to build up a sequence here.   Can't you just
> allocate the INSN container and set its PATTERN?

Yes, I think you're right.

>> +
>> +/* For a comparison instruction described by CMP check if it compares a
>> +   register with zero i.e. it is of the form CC := CMP R1, 0.
>> +   If it is, find the instruction defining R1 (say I1) and try to create a
>> +   PARALLEL consisting of I1 and the comparison, representing a flag-setting
>> +   arithmetic instruction.  Example:
>> +   I1: R1 := R2 + R3
>> +   <instructions that don't read the condition register>
>> +   I2: CC := CMP R1 0
>> +   I2 can be merged with I1 into:
>> +   I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 }
>> +   This catches cases where R1 is used between I1 and I2 and therefore
>> +   combine and other RTL optimisations will not try to propagate it into
>> +   I2.  Return true if we succeeded in merging CMP.  */
>> +
>> +static bool
>> +try_merge_compare (struct comparison *cmp)
>> +{
>> +  rtx_insn *cmp_insn = cmp->insn;
>> +
>> +  if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx)
>> +    return false;
>> +  rtx in_a = cmp->in_a;
>> +  if (!in_a)
>> +    return false;
> Isn't the second IF redundant?  How can !in_a ever be true here given we
> tested REG_P (cmp->in_a)?

Err... yes.

>
>> +  df_ref use;
>> +
>> +  FOR_EACH_INSN_USE (use, cmp_insn)
>> +    if (DF_REF_REGNO (use) == REGNO (in_a))
>> +      break;
>> +  if (!use)
>> +    return false;
>> +
>> +  struct df_link *ref_chain;
>> +  ref_chain = DF_REF_CHAIN (use);
>> +  if (!ref_chain || !ref_chain->ref
>> +      || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL)
>> +    return false;
> So what are you checking for here?  I've got a pretty good guess, but
> let's just make it explicit in a comment.

This snippet just uses df to find the instruction that defines in_a.
From dealing with the df infrastructure in the past I found that it
needed some of this NULL-checking. Adding a comment here would be good,
I agree.

Thanks,
Kyrill

>
> Generally this looks pretty close.  The biggest concern I see is I think
> you need to verify that the inputs on the arithmetic insn don't change
> between the arithmetic and the compare.
>
> jeff


* Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-08-29  9:25   ` Kyrill Tkachov
@ 2017-09-01 23:07     ` Segher Boessenkool
  2017-09-06 15:56       ` Michael Collison
  0 siblings, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-09-01 23:07 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Jeff Law, Michael Collison, gcc-patches, nd

Hi!

On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote:
> On 28/08/17 19:26, Jeff Law wrote:
> >On 08/10/2017 03:14 PM, Michael Collison wrote:
> >>One issue that we keep encountering on aarch64 is GCC not making good use 
> >>of the flag-setting arithmetic instructions
> >>like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and 
> >>compare the result against zero.
> >>They are represented in a fairly standard way in the backend as PARALLEL 
> >>patterns:
> >>(parallel [(set (reg x1) (op (reg x2) (reg x3)))
> >>            (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 
> >>            0)))])

That is incorrect: the compare has to come first.  From md.texi:

@cindex @code{compare}, canonicalization of
[ ... ]

@item
For instructions that inherently set a condition code register, the
@code{compare} operator is always written as the first RTL expression of
the @code{parallel} instruction pattern.  For example,
[ ... ]

aarch64.md seems to do this correctly, fwiw.

> >>GCC isn't forming these from separate arithmetic and comparison 
> >>instructions as aggressively as it could.
> >>A particular pain point is when the result of the arithmetic insn is used 
> >>before the comparison instruction.
> >>The testcase in this patch is one such example where we have:
> >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>         (plus:SI (reg:SI 0 x0 [ x ])
> >>             (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
> >>      (nil))
> >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
> >>         (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>             (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
> >>      (nil))
> >>(insn 34 33 17 2 (set (reg:CC 66 cc)
> >>         (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>             (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
> >>      (nil))
> >>
> >>This scares combine away as x0 is used in insn 33 as well as the 
> >>comparison in insn 34.
> >>I think the compare-elim pass can help us here.
> >Is it the multiple use or the hard register that combine doesn't
> >appreciate.  The latter would definitely steer us towards compare-elim.
> 
> It's the multiple use IIRC.

Multiple use, and multiple set (of x1), and more complications...

7+33 won't combine to an existing insn.

7+34 will not even be tried (insn 33 is the first use of x0, not insn 34).
But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so
7 cannot be merged into 34.

7+33+34 results in a parallel of a compare with the same invalid insn
as in the 7+33 case.  Combine would try to split it to two insns again,
except it already has two insns (the arith and the compare).  It does
not see that when it splits the insn it can combine the first half with
the compare.

What would be needed is pulling insn 34 before insn 33 (which is fine, no
conflicts there), and then we could combine 7+34 just fine.  But combine
tries to be linear complexity, and it really cannot change insns around
anyway.


Segher


* RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-09-01 23:07     ` Segher Boessenkool
@ 2017-09-06 15:56       ` Michael Collison
  2017-10-13 18:04         ` Jeff Law
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Collison @ 2017-09-06 15:56 UTC (permalink / raw)
  To: Segher Boessenkool, Kyrill Tkachov; +Cc: Jeff Law, gcc-patches, nd


Patch updated to incorporate all relevant comments and suggestions.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and x86_64.

Ok for trunk?

2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
	    Michael Collison <michael.collison@arm.com>

	* compare-elim.c: Include emit-rtl.h.
	(can_merge_compare_into_arith): New function.
	(try_validate_parallel): Likewise.
	(try_merge_compare): Likewise.
	(try_eliminate_compare): Call the above when no previous clobber
	is available.
	(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
	dataflow problems.

2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
	    Michael Collison <michael.collison@arm.com>
	    
	* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.

-----Original Message-----
From: Segher Boessenkool [mailto:segher@kernel.crashing.org] 
Sent: Saturday, September 2, 2017 12:07 AM
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: Jeff Law <law@redhat.com>; Michael Collison <Michael.Collison@arm.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>
Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

Hi!

On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote:
> On 28/08/17 19:26, Jeff Law wrote:
> >On 08/10/2017 03:14 PM, Michael Collison wrote:
> >>One issue that we keep encountering on aarch64 is GCC not making 
> >>good use of the flag-setting arithmetic instructions like ADDS, 
> >>SUBS, ANDS etc. that perform an arithmetic operation and compare the 
> >>result against zero.
> >>They are represented in a fairly standard way in the backend as 
> >>PARALLEL
> >>patterns:
> >>(parallel [(set (reg x1) (op (reg x2) (reg x3)))
> >>            (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 
> >>            0)))])

That is incorrect: the compare has to come first.  From md.texi:

@cindex @code{compare}, canonicalization of [ ... ]

@item
For instructions that inherently set a condition code register, the @code{compare} operator is always written as the first RTL expression of the @code{parallel} instruction pattern.  For example, [ ... ]

aarch64.md seems to do this correctly, fwiw.
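Concretely, the canonical form of the example pattern quoted above would place the compare first (a sketch, reusing the same placeholder operator and registers):

```
(parallel [(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))
           (set (reg x1) (op (reg x2) (reg x3)))])
```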

> >>GCC isn't forming these from separate arithmetic and comparison 
> >>instructions as aggressively as it could.
> >>A particular pain point is when the result of the arithmetic insn is 
> >>used before the comparison instruction.
> >>The testcase in this patch is one such example where we have:
> >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>         (plus:SI (reg:SI 0 x0 [ x ])
> >>             (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
> >>      (nil))
> >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
> >>         (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>             (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
> >>      (nil))
> >>(insn 34 33 17 2 (set (reg:CC 66 cc)
> >>         (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>             (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
> >>      (nil))
> >>
> >>This scares combine away as x0 is used in insn 33 as well as the 
> >>comparison in insn 34.
> >>I think the compare-elim pass can help us here.
> >Is it the multiple use or the hard register that combine doesn't 
> >appreciate.  The latter would definitely steer us towards compare-elim.
> 
> It's the multiple use IIRC.

Multiple use, and multiple set (of x1), and more complications...

7+33 won't combine to an existing insn.

7+34 will not even be tried (insn 33 is the first use of x0, not insn 34).
But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so
7 cannot be merged into 34.

7+33+34 results in a parallel of a compare with the same invalid insn
as in the 7+33 case.  Combine would try to split it into two insns again, except it already has two insns (the arith and the compare).  It does not see that when it splits the insn it can combine the first half with the compare.

What would be needed is pulling insn 34 before insn 33 (which is fine, no conflicts there), and then we could combine 7+34 just fine.  But combine tries to be linear complexity, and it really cannot change insns around anyway.
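For reference, the insn stream being discussed arises from source like the following (this mirrors the test case included in the patch; the insn-number comments map it back to the RTL dump above):

```c
/* x is both compared against zero and used again, so combine cannot
   merge the add with the compare, but compare-elim can.  */
int
foo (int x, int y)
{
  x += y;       /* insn 7:  x0 := x0 + x1     */
  if (x != 0)   /* insn 34: cc := cmp x0, 0   */
    x = x + 2;  /* insn 33: x1 := x0 + 2      */
  return x;
}
```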


Segher

[-- Attachment #2: pr5198v2.patch --]
[-- Type: application/octet-stream, Size: 6317 bytes --]

diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index 7e557a2..794a452 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "insn-config.h"
 #include "recog.h"
+#include "emit-rtl.h"
 #include "cfgrtl.h"
 #include "tree-pass.h"
 #include "domwalk.h"
@@ -579,6 +580,143 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn *start)
   return reg;
 }
 
+/* Return true if it is okay to merge the comparison CMP_INSN with
+   the instruction ARITH_INSN.  Both instructions are assumed to be in the
+   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
+   that there are no uses or defs of the condition flags or control flow
+   changes between the two instructions.  */
+
+static bool
+can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
+{
+  for (rtx_insn *insn = PREV_INSN (cmp_insn);
+       insn && insn != arith_insn;
+       insn = PREV_INSN (insn))
+    {
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      /* Bail if there are jumps or calls in between.  */
+      if (!NONJUMP_INSN_P (insn))
+	return false;
+
+      /* Bail on old-style asm statements because they lack
+	 data flow information.  */
+      if (GET_CODE (PATTERN (insn)) == ASM_INPUT)
+	return false;
+
+      df_ref ref;
+      /* Find a USE of the flags register.  */
+      FOR_EACH_INSN_USE (ref, insn)
+	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+	  return false;
+
+      /* Find a DEF of the flags register.  */
+      FOR_EACH_INSN_DEF (ref, insn)
+	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+	  return false;
+    }
+  return true;
+}
+
+/* Given two SET expressions, SET_A and SET_B determine whether they form
+   a recognizable pattern when emitted in parallel.  Return that parallel
+   if so.  Otherwise return NULL.  */
+
+static rtx
+try_validate_parallel (rtx set_a, rtx set_b)
+{
+  rtx par
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
+
+  rtx_insn *insn;
+  insn = gen_rtx_INSN (VOIDmode, 0, 0, 0, par, 0, -1, 0);
+
+  return recog_memoized (insn) > 0 ? par : NULL_RTX;
+}
+
+/* For a comparison instruction described by CMP check if it compares a
+   register with zero i.e. it is of the form CC := CMP R1, 0.
+   If it is, find the instruction defining R1 (say I1) and try to create a
+   PARALLEL consisting of I1 and the comparison, representing a flag-setting
+   arithmetic instruction.  Example:
+   I1: R1 := R2 + R3
+   <instructions that don't read the condition register>
+   I2: CC := CMP R1 0
+   I2 can be merged with I1 into:
+   I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 }
+   This catches cases where R1 is used between I1 and I2 and therefore
+   combine and other RTL optimisations will not try to propagate it into
+   I2.  Return true if we succeeded in merging CMP.  */
+
+static bool
+try_merge_compare (struct comparison *cmp)
+{
+  rtx_insn *cmp_insn = cmp->insn;
+
+  if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx)
+    return false;
+  rtx in_a = cmp->in_a;
+  df_ref use;
+
+  FOR_EACH_INSN_USE (use, cmp_insn)
+    if (DF_REF_REGNO (use) == REGNO (in_a))
+      break;
+  if (!use)
+    return false;
+
+  /* Validate the data flow information before attempting to
+     find the instruction that defines in_a.  */
+
+  struct df_link *ref_chain;
+  ref_chain = DF_REF_CHAIN (use);
+  if (!ref_chain || !ref_chain->ref
+      || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL)
+    return false;
+
+  rtx_insn *def_insn = DF_REF_INSN (ref_chain->ref);
+  /* We found the insn that defines in_a.  Only consider the cases where
+     it is in the same block as the comparison.  */
+  if (BLOCK_FOR_INSN (cmp_insn) != BLOCK_FOR_INSN (def_insn))
+    return false;
+
+  rtx set = single_set (def_insn);
+  if (!set)
+    return false;
+
+  if (!can_merge_compare_into_arith (cmp_insn, def_insn))
+    return false;
+
+  rtx src = SET_SRC (set);
+  rtx flags = maybe_select_cc_mode (cmp, src, CONST0_RTX (GET_MODE (src)));
+  if (!flags)
+    {
+    /* We may already have a change group going through maybe_select_cc_mode.
+       Discard it properly.  */
+      cancel_changes (0);
+      return false;
+    }
+
+  rtx flag_set
+    = gen_rtx_SET (flags, gen_rtx_COMPARE (GET_MODE (flags),
+					   copy_rtx (src),
+					   CONST0_RTX (GET_MODE (src))));
+  rtx arith_set = copy_rtx (PATTERN (def_insn));
+  rtx par = try_validate_parallel (flag_set, arith_set);
+  if (!par)
+    {
+      /* We may already have a change group going through maybe_select_cc_mode.
+	 Discard it properly.  */
+      cancel_changes (0);
+      return false;
+    }
+  if (!apply_change_group ())
+    return false;
+  emit_insn_after (par, def_insn);
+  delete_insn (def_insn);
+  delete_insn (cmp->insn);
+  return true;
+}
+
 /* Attempt to replace a comparison with a prior arithmetic insn that can
    compute the same flags value as the comparison itself.  Return true if
    successful, having made all rtl modifications necessary.  */
@@ -588,7 +726,9 @@ try_eliminate_compare (struct comparison *cmp)
 {
   rtx flags, in_a, in_b, cmp_src;
 
-  /* We must have found an interesting "clobber" preceding the compare.  */
+  if (try_merge_compare (cmp))
+    return true;
+
   if (cmp->prev_clobber == NULL)
     return false;
 
@@ -714,6 +854,7 @@ try_eliminate_compare (struct comparison *cmp)
 static unsigned int
 execute_compare_elim_after_reload (void)
 {
+  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
   df_analyze ();
 
   gcc_checking_assert (!all_compares.exists ());
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c b/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c
new file mode 100644
index 0000000..953c388
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* X is both compared against zero and used.  Make sure we can still
+   generate an ADDS and avoid an explicit comparison against zero.  */
+
+int
+foo (int x, int y)
+{
+  x += y;
+  if (x != 0)
+    x = x + 2;
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "adds\\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-not "cmp\\tw\[0-9\]+, 0" } } */
-- 
1.9.1



* Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-09-06 15:56       ` Michael Collison
@ 2017-10-13 18:04         ` Jeff Law
  2017-10-14  8:45           ` Eric Botcazou
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Law @ 2017-10-13 18:04 UTC (permalink / raw)
  To: Michael Collison, Segher Boessenkool, Kyrill Tkachov; +Cc: gcc-patches, nd

On 09/06/2017 09:56 AM, Michael Collison wrote:
> Patch updated with all relevant comments and suggestions.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, and aarch64-none-linux-gnu and x86_64.
> 
> Ok for trunk?
> 
> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 	    Michael Collison <michael.collison@arm.com>
> 
> 	* compare-elim.c: Include emit-rtl.h.
> 	(can_merge_compare_into_arith): New function.
> 	(try_validate_parallel): Likewise.
> 	(try_merge_compare): Likewise.
> 	(try_eliminate_compare): Call the above when no previous clobber
> 	is available.
> 	(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
> 	dataflow problems.
> 
> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 	    Michael Collison <michael.collison@arm.com>
> 	    
> 	* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
Sorry for the delay.

This looks good.  OK for the trunk.

jeff


* Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-10-13 18:04         ` Jeff Law
@ 2017-10-14  8:45           ` Eric Botcazou
  2017-10-17 11:57             ` Richard Biener
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Botcazou @ 2017-10-14  8:45 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Michael Collison, Segher Boessenkool, Kyrill Tkachov, nd

> This looks good.  OK for the trunk.

FWIW I disagree.  The patch completely shuns the existing implementation of 
the pass, which is based on a forward scan within basic blocks to identify the 
various interesting instructions and record them, and uses full-blown def-use 
and use-def chains instead, which are much more costly to compute.  It's not 
clear to me why the existing implementation couldn't have been extended.

The result is that, for targets for which the pass was initially written, i.e. 
targets for which most (all) arithmetic instructions clobber the flags, the 
pass will be slower for absolutely no benefits, as the existing implementation 
would already have caught all the interesting cases.

So it's again a case of a generic change made for a specific target without 
consideration for other, admittedly less mainstream, targets...

-- 
Eric Botcazou


* Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-10-14  8:45           ` Eric Botcazou
@ 2017-10-17 11:57             ` Richard Biener
  2017-10-17 19:29               ` Michael Collison
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2017-10-17 11:57 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: Jeff Law, GCC Patches, Michael Collison, Segher Boessenkool,
	Kyrill Tkachov, nd

On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> This looks good.  OK for the trunk.
>
> FWIW I disagree.  The patch completely shuns the existing implementation of
> the pass, which is based on a forward scan within basic blocks to identify the
> various interesting instructions and record them, and uses full-blown def-use
> and use-def chains instead, which are much more costly to compute.  It's not
> clear to me why the existing implementation couldn't have been extended.
>
> The result is that, for targets for which the pass was initially written, i.e.
> targets for which most (all) arithmetic instructions clobber the flags, the
> pass will be slower for absolutely no benefits, as the existing implementation
> would already have caught all the interesting cases.
>
> So it's again a case of a generic change made for a specific target without
> consideration for other, admittedly less mainstream, targets...

I agree with Eric here.

Richard.

> --
> Eric Botcazou


* RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-10-17 11:57             ` Richard Biener
@ 2017-10-17 19:29               ` Michael Collison
  2017-10-17 20:05                 ` Eric Botcazou
  2017-10-17 20:12                 ` Richard Biener
  0 siblings, 2 replies; 13+ messages in thread
From: Michael Collison @ 2017-10-17 19:29 UTC (permalink / raw)
  To: Richard Biener, Eric Botcazou
  Cc: Jeff Law, GCC Patches, Segher Boessenkool, Kyrill Tkachov, nd

Richard and Eric,

I see you have objected and indicated the additional cost. Have you quantified how much more expensive the pass is?

-----Original Message-----
From: Richard Biener [mailto:richard.guenther@gmail.com] 
Sent: Tuesday, October 17, 2017 4:45 AM
To: Eric Botcazou <ebotcazou@adacore.com>
Cc: Jeff Law <law@redhat.com>; GCC Patches <gcc-patches@gcc.gnu.org>; Michael Collison <Michael.Collison@arm.com>; Segher Boessenkool <segher@kernel.crashing.org>; Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>; nd <nd@arm.com>
Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> This looks good.  OK for the trunk.
>
> FWIW I disagree.  The patch completely shuns the existing 
> implementation of the pass, which is based on a forward scan within 
> basic blocks to identify the various interesting instructions and 
> record them, and uses full-blown def-use and use-def chains instead, 
> which are much more costly to compute.  It's not clear to me why the existing implementation couldn't have been extended.
>
> The result is that, for targets for which the pass was initially written, i.e.
> targets for which most (all) arithmetic instructions clobber the 
> flags, the pass will be slower for absolutely no benefits, as the 
> existing implementation would already have caught all the interesting cases.
>
> So it's again a case of a generic change made for a specific target 
> without consideration for other, admittedly less mainstream, targets...

I agree with Eric here.

Richard.

> --
> Eric Botcazou


* Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-10-17 19:29               ` Michael Collison
@ 2017-10-17 20:05                 ` Eric Botcazou
  2017-10-17 20:12                 ` Richard Biener
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Botcazou @ 2017-10-17 20:05 UTC (permalink / raw)
  To: Michael Collison
  Cc: gcc-patches, Richard Biener, Jeff Law, Segher Boessenkool,
	Kyrill Tkachov, nd

> I see you have objected and indicated the additional cost. Have you
> quantified how much more expensive the pass is?

No, but use-def chains are known to be slow because DF is slow, see e.g. the 
comment located a few lines below the call to try_merge_compare:

  /* ??? This is one point at which one could argue that DF_REF_CHAIN would
     be useful, but it is thought to be too heavy-weight a solution here.  */

Note that the patch breaks e.g. the Visium port, because the pass now sends 
all kinds of junk RTXes to the select_cc_mode target hook, which was written to 
be in exact keeping with the arithmetic patterns of the MD file.  I'm going to 
fix the breakage of course, but this shows that the patch burns a large amount 
of cycles for targets like Visium for no benefit.

Index: config/visium/visium.c
===================================================================
--- config/visium/visium.c      (revision 253767)
+++ config/visium/visium.c      (working copy)
@@ -2938,12 +2938,6 @@ visium_select_cc_mode (enum rtx_code cod
       /* This is a btst, the result is in C instead of Z.  */
       return CCCmode;
 
-    case CONST_INT:
-      /* This is a degenerate case, typically an uninitialized variable.  */
-      gcc_assert (op0 == constm1_rtx);
-
-      /* ... fall through ... */
-
     case REG:
     case AND:
     case IOR:
@@ -2953,6 +2947,7 @@ visium_select_cc_mode (enum rtx_code cod
     case LSHIFTRT:
     case TRUNCATE:
     case SIGN_EXTEND:
+    case ZERO_EXTEND:
       /* Pretend that the flags are set as for a COMPARE with zero.
         That's mostly true, except for the 2 right shift insns that
         will set the C flag.  But the C flag is relevant only for
@@ -2960,6 +2955,16 @@ visium_select_cc_mode (enum rtx_code cod
         when applied to a comparison with zero.  */
       return CCmode;
 
+    /* ??? Cater to the junk RTXes sent by try_merge_compare.  */
+    case ASM_OPERANDS:
+    case CALL:
+    case CONST_INT:
+    case LO_SUM:
+    case HIGH:
+    case MEM:
+    case UNSPEC:
+      return CCmode;
+
     default:
       gcc_unreachable ();


-- 
Eric Botcazou


* RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-10-17 19:29               ` Michael Collison
  2017-10-17 20:05                 ` Eric Botcazou
@ 2017-10-17 20:12                 ` Richard Biener
  2017-10-17 20:29                   ` Michael Collison
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Biener @ 2017-10-17 20:12 UTC (permalink / raw)
  To: Michael Collison, Eric Botcazou
  Cc: Jeff Law, GCC Patches, Segher Boessenkool, Kyrill Tkachov, nd

On October 17, 2017 9:08:31 PM GMT+02:00, Michael Collison <Michael.Collison@arm.com> wrote:
>Richard and Eric,
>
>I see you have objected and indicated the additional cost. Have you
>quantified how much more expensive the pass is?

DF has known quadratic behavior in memory for certain problems. Not sure off head if DU and UD fall into this category. 

Richard. 

>-----Original Message-----
>From: Richard Biener [mailto:richard.guenther@gmail.com] 
>Sent: Tuesday, October 17, 2017 4:45 AM
>To: Eric Botcazou <ebotcazou@adacore.com>
>Cc: Jeff Law <law@redhat.com>; GCC Patches <gcc-patches@gcc.gnu.org>;
>Michael Collison <Michael.Collison@arm.com>; Segher Boessenkool
><segher@kernel.crashing.org>; Kyrill Tkachov
><kyrylo.tkachov@foss.arm.com>; nd <nd@arm.com>
>Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal
>ops
>
>On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou <ebotcazou@adacore.com>
>wrote:
>>> This looks good.  OK for the trunk.
>>
>> FWIW I disagree.  The patch completely shuns the existing 
>> implementation of the pass, which is based on a forward scan within 
>> basic blocks to identify the various interesting instructions and 
>> record them, and uses full-blown def-use and use-def chains instead, 
>> which are much more costly to compute.  It's not clear to me why the
>existing implementation couldn't have been extended.
>>
>> The result is that, for targets for which the pass was initially
>written, i.e.
>> targets for which most (all) arithmetic instructions clobber the 
>> flags, the pass will be slower for absolutely no benefits, as the 
>> existing implementation would already have caught all the interesting
>cases.
>>
>> So it's again a case of a generic change made for a specific target 
>> without consideration for other, admittedly less mainstream,
>targets...
>
>I agree with Eric here.
>
>Richard.
>
>> --
>> Eric Botcazou


* RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-10-17 20:12                 ` Richard Biener
@ 2017-10-17 20:29                   ` Michael Collison
  2017-10-18 14:14                     ` Eric Botcazou
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Collison @ 2017-10-17 20:29 UTC (permalink / raw)
  To: Richard Biener, Eric Botcazou
  Cc: Jeff Law, GCC Patches, Segher Boessenkool, Kyrill Tkachov, nd

Are we in agreement that I should revert the patch?

-----Original Message-----
From: Richard Biener [mailto:richard.guenther@gmail.com] 
Sent: Tuesday, October 17, 2017 1:10 PM
To: Michael Collison <Michael.Collison@arm.com>; Eric Botcazou <ebotcazou@adacore.com>
Cc: Jeff Law <law@redhat.com>; GCC Patches <gcc-patches@gcc.gnu.org>; Segher Boessenkool <segher@kernel.crashing.org>; Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>; nd <nd@arm.com>
Subject: RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

On October 17, 2017 9:08:31 PM GMT+02:00, Michael Collison <Michael.Collison@arm.com> wrote:
>Richard and Eric,
>
>I see you have objected and indicated the additional cost. Have you 
>quantified how much more expensive the pass is?

DF has known quadratic behavior in memory for certain problems. Not sure off head if DU and UD fall into this category. 

Richard. 

>-----Original Message-----
>From: Richard Biener [mailto:richard.guenther@gmail.com]
>Sent: Tuesday, October 17, 2017 4:45 AM
>To: Eric Botcazou <ebotcazou@adacore.com>
>Cc: Jeff Law <law@redhat.com>; GCC Patches <gcc-patches@gcc.gnu.org>; 
>Michael Collison <Michael.Collison@arm.com>; Segher Boessenkool 
><segher@kernel.crashing.org>; Kyrill Tkachov 
><kyrylo.tkachov@foss.arm.com>; nd <nd@arm.com>
>Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal 
>ops
>
>On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou <ebotcazou@adacore.com>
>wrote:
>>> This looks good.  OK for the trunk.
>>
>> FWIW I disagree.  The patch completely shuns the existing 
>> implementation of the pass, which is based on a forward scan within 
>> basic blocks to identify the various interesting instructions and 
>> record them, and uses full-blown def-use and use-def chains instead, 
>> which are much more costly to compute.  It's not clear to me why the
>existing implementation couldn't have been extended.
>>
>> The result is that, for targets for which the pass was initially
>written, i.e.
>> targets for which most (all) arithmetic instructions clobber the 
>> flags, the pass will be slower for absolutely no benefits, as the 
>> existing implementation would already have caught all the interesting
>cases.
>>
>> So it's again a case of a generic change made for a specific target 
>> without consideration for other, admittedly less mainstream,
>targets...
>
>I agree with Eric here.
>
>Richard.
>
>> --
>> Eric Botcazou



* Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
  2017-10-17 20:29                   ` Michael Collison
@ 2017-10-18 14:14                     ` Eric Botcazou
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Botcazou @ 2017-10-18 14:14 UTC (permalink / raw)
  To: Michael Collison
  Cc: gcc-patches, Richard Biener, Jeff Law, Segher Boessenkool,
	Kyrill Tkachov, nd

> Are we in agreement that I should revert the patch?

I think that you can leave it for now in order to have some feedback (see for 
example PR rtl-optimization/82597) but ideally it should be rewritten so as to 
reuse the existing infrastructure of the pass.

-- 
Eric Botcazou



Thread overview: 13+ messages
2017-08-10 22:03 [PATCH][compare-elim] Merge zero-comparisons with normal ops Michael Collison
2017-08-28 18:43 ` Jeff Law
2017-08-29  9:25   ` Kyrill Tkachov
2017-09-01 23:07     ` Segher Boessenkool
2017-09-06 15:56       ` Michael Collison
2017-10-13 18:04         ` Jeff Law
2017-10-14  8:45           ` Eric Botcazou
2017-10-17 11:57             ` Richard Biener
2017-10-17 19:29               ` Michael Collison
2017-10-17 20:05                 ` Eric Botcazou
2017-10-17 20:12                 ` Richard Biener
2017-10-17 20:29                   ` Michael Collison
2017-10-18 14:14                     ` Eric Botcazou
