RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Michael Collison <Michael.Collison@arm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>,
	Kyrill Tkachov	<kyrylo.tkachov@foss.arm.com>
Cc: Jeff Law <law@redhat.com>,
	"gcc-patches@gcc.gnu.org"	<gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>
Subject: RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
Date: Wed, 06 Sep 2017 15:56:00 -0000	[thread overview]
Message-ID: <HE1PR0802MB2377818A883A7B663207D6B395970@HE1PR0802MB2377.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <20170901230726.GS13471@gate.crashing.org>

[-- Attachment #1: Type: text/plain, Size: 4273 bytes --]

Patch updated with all relevant comments and suggestions.

Bootstrapped and tested on arm-none-linux-gnueabihf, and aarch64-none-linux-gnu and x86_64.

Ok for trunk?

2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
	    Michael Collison <michael.collison@arm.com>

	* compare-elim.c: Include emit-rtl.h.
	(can_merge_compare_into_arith): New function.
	(try_validate_parallel): Likewise.
	(try_merge_compare): Likewise.
	(try_eliminate_compare): Call the above when no previous clobber
	is available.
	(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
	dataflow problems.

2017-08-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
	    Michael Collison <michael.collison@arm.com>
	    
	* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.

-----Original Message-----
From: Segher Boessenkool [mailto:segher@kernel.crashing.org] 
Sent: Saturday, September 2, 2017 12:07 AM
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: Jeff Law <law@redhat.com>; Michael Collison <Michael.Collison@arm.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>
Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

Hi!

On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote:
> On 28/08/17 19:26, Jeff Law wrote:
> >On 08/10/2017 03:14 PM, Michael Collison wrote:
> >>One issue that we keep encountering on aarch64 is GCC not making 
> >>good use of the flag-setting arithmetic instructions like ADDS, 
> >>SUBS, ANDS etc. that perform an arithmetic operation and compare the 
> >>result against zero.
> >>They are represented in a fairly standard way in the backend as 
> >>PARALLEL
> >>patterns:
> >>(parallel [(set (reg x1) (op (reg x2) (reg x3)))
> >>            (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 
> >>            0)))])

That is incorrect: the compare has to come first.  From md.texi:

@cindex @code{compare}, canonicalization of [ ... ]

@item
For instructions that inherently set a condition code register, the @code{compare} operator is always written as the first RTL expression of the @code{parallel} instruction pattern.  For example, [ ... ]

aarch64.md seems to do this correctly, fwiw.

> >>GCC isn't forming these from separate arithmetic and comparison 
> >>instructions as aggressively as it could.
> >>A particular pain point is when the result of the arithmetic insn is 
> >>used before the comparison instruction.
> >>The testcase in this patch is one such example where we have:
> >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>         (plus:SI (reg:SI 0 x0 [ x ])
> >>             (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
> >>      (nil))
> >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
> >>         (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>             (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
> >>      (nil))
> >>(insn 34 33 17 2 (set (reg:CC 66 cc)
> >>         (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
> >>             (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
> >>      (nil))
> >>
> >>This scares combine away as x0 is used in insn 33 as well as the 
> >>comparison in insn 34.
> >>I think the compare-elim pass can help us here.
> >Is it the multiple use or the hard register that combine doesn't 
> >appreciate.  The latter would definitely steer us towards compare-elim.
> 
> It's the multiple use IIRC.

Multiple use, and multiple set (of x1), and more complications...

7+33 won't combine to an existing insn.

7+34 will not even be tried (insn 33 is the first use of x0, not insn 34).
But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so
7 cannot be merged into 34.

7+33+34 results in a parallel of a compare with the same invalid insn
as in the 7+33 case.  Combine would try to split it to two insns again, except it already has two insns (the arith and the compare).  It does not see that when it splits the insn it can combine the first half with the compare.

What would be needed is pulling insn 34 before insn 33 (which is fine, no conflicts there), and then we could combine 7+34 just fine.  But combine tries to be linear complexity, and it really cannot change insns around anyway.


Segher

[-- Attachment #2: pr5198v2.patch --]
[-- Type: application/octet-stream, Size: 6317 bytes --]

diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index 7e557a2..794a452 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "insn-config.h"
 #include "recog.h"
+#include "emit-rtl.h"
 #include "cfgrtl.h"
 #include "tree-pass.h"
 #include "domwalk.h"
@@ -579,6 +580,143 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn *start)
   return reg;
 }
 
+/* Return true if it is okay to merge the comparison CMP_INSN with
+   the instruction ARITH_INSN.  Both instructions are assumed to be in the
+   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
+   that there are no uses or defs of the condition flags or control flow
+   changes between the two instructions.  */
+
+static bool
+can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
+{
+  for (rtx_insn *insn = PREV_INSN (cmp_insn);
+       insn && insn != arith_insn;
+       insn = PREV_INSN (insn))
+    {
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      /* Bail if there are jumps or calls in between.  */
+      if (!NONJUMP_INSN_P (insn))
+	return false;
+
+      /* Bail on old-style asm statements because they lack
+	 data flow information.  */
+      if (GET_CODE (PATTERN (insn)) == ASM_INPUT)
+	return false;
+
+      df_ref ref;
+      /* Find a USE of the flags register.  */
+      FOR_EACH_INSN_USE (ref, insn)
+	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+	  return false;
+
+      /* Find a DEF of the flags register.  */
+      FOR_EACH_INSN_DEF (ref, insn)
+	if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+	  return false;
+    }
+  return true;
+}
+
+/* Given two SET expressions, SET_A and SET_B determine whether they form
+   a recognizable pattern when emitted in parallel.  Return that parallel
+   if so.  Otherwise return NULL.  */
+
+static rtx
+try_validate_parallel (rtx set_a, rtx set_b)
+{
+  rtx par
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
+
+  rtx_insn *insn;
+  insn = gen_rtx_INSN (VOIDmode, 0, 0, 0, par, 0, -1, 0);
+
+  return recog_memoized (insn) > 0 ? par : NULL_RTX;
+}
+
+/* For a comparison instruction described by CMP check if it compares a
+   register with zero i.e. it is of the form CC := CMP R1, 0.
+   If it is, find the instruction defining R1 (say I1) and try to create a
+   PARALLEL consisting of I1 and the comparison, representing a flag-setting
+   arithmetic instruction.  Example:
+   I1: R1 := R2 + R3
+   <instructions that don't read the condition register>
+   I2: CC := CMP R1 0
+   I2 can be merged with I1 into:
+   I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 }
+   This catches cases where R1 is used between I1 and I2 and therefore
+   combine and other RTL optimisations will not try to propagate it into
+   I2.  Return true if we succeeded in merging CMP.  */
+
+static bool
+try_merge_compare (struct comparison *cmp)
+{
+  rtx_insn *cmp_insn = cmp->insn;
+
+  if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx)
+    return false;
+  rtx in_a = cmp->in_a;
+  df_ref use;
+
+  FOR_EACH_INSN_USE (use, cmp_insn)
+    if (DF_REF_REGNO (use) == REGNO (in_a))
+      break;
+  if (!use)
+    return false;
+
+  /* Validate the data flow information before attempting to
+     find the instruction that defines in_a.  */
+
+  struct df_link *ref_chain;
+  ref_chain = DF_REF_CHAIN (use);
+  if (!ref_chain || !ref_chain->ref
+      || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL)
+    return false;
+
+  rtx_insn *def_insn = DF_REF_INSN (ref_chain->ref);
+  /* We found the insn that defines in_a.  Only consider the cases where
+     it is in the same block as the comparison.  */
+  if (BLOCK_FOR_INSN (cmp_insn) != BLOCK_FOR_INSN (def_insn))
+    return false;
+
+  rtx set = single_set (def_insn);
+  if (!set)
+    return false;
+
+  if (!can_merge_compare_into_arith (cmp_insn, def_insn))
+    return false;
+
+  rtx src = SET_SRC (set);
+  rtx flags = maybe_select_cc_mode (cmp, src, CONST0_RTX (GET_MODE (src)));
+  if (!flags)
+    {
+    /* We may already have a change group going through maybe_select_cc_mode.
+       Discard it properly.  */
+      cancel_changes (0);
+      return false;
+    }
+
+  rtx flag_set
+    = gen_rtx_SET (flags, gen_rtx_COMPARE (GET_MODE (flags),
+					   copy_rtx (src),
+					   CONST0_RTX (GET_MODE (src))));
+  rtx arith_set = copy_rtx (PATTERN (def_insn));
+  rtx par = try_validate_parallel (flag_set, arith_set);
+  if (!par)
+    {
+      /* We may already have a change group going through maybe_select_cc_mode.
+	 Discard it properly.  */
+      cancel_changes (0);
+      return false;
+    }
+  if (!apply_change_group ())
+    return false;
+  emit_insn_after (par, def_insn);
+  delete_insn (def_insn);
+  delete_insn (cmp->insn);
+  return true;
+}
+
 /* Attempt to replace a comparison with a prior arithmetic insn that can
    compute the same flags value as the comparison itself.  Return true if
    successful, having made all rtl modifications necessary.  */
@@ -588,7 +726,9 @@ try_eliminate_compare (struct comparison *cmp)
 {
   rtx flags, in_a, in_b, cmp_src;
 
-  /* We must have found an interesting "clobber" preceding the compare.  */
+  if (try_merge_compare (cmp))
+    return true;
+
   if (cmp->prev_clobber == NULL)
     return false;
 
@@ -714,6 +854,7 @@ try_eliminate_compare (struct comparison *cmp)
 static unsigned int
 execute_compare_elim_after_reload (void)
 {
+  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
   df_analyze ();
 
   gcc_checking_assert (!all_compares.exists ());
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c b/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c
new file mode 100644
index 0000000..953c388
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpelim_mult_uses_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* X is both compared against zero and used.  Make sure we can still
+   generate an ADDS and avoid an explicit comparison against zero.  */
+
+int
+foo (int x, int y)
+{
+  x += y;
+  if (x != 0)
+    x = x + 2;
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "adds\\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-not "cmp\\tw\[0-9\]+, 0" } } */
-- 
1.9.1

next prev parent reply	other threads:[~2017-09-06 15:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-10 22:03 Michael Collison
2017-08-28 18:43 ` Jeff Law
2017-08-29  9:25   ` Kyrill Tkachov
2017-09-01 23:07     ` Segher Boessenkool
2017-09-06 15:56       ` Michael Collison [this message]
2017-10-13 18:04         ` Jeff Law
2017-10-14  8:45           ` Eric Botcazou
2017-10-17 11:57             ` Richard Biener
2017-10-17 19:29               ` Michael Collison
2017-10-17 20:05                 ` Eric Botcazou
2017-10-17 20:12                 ` Richard Biener
2017-10-17 20:29                   ` Michael Collison
2017-10-18 14:14                     ` Eric Botcazou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=HE1PR0802MB2377818A883A7B663207D6B395970@HE1PR0802MB2377.eurprd08.prod.outlook.com \
    --to=michael.collison@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kyrylo.tkachov@foss.arm.com \
    --cc=law@redhat.com \
    --cc=nd@arm.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).