public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes
@ 2015-12-17 15:36 Kyrill Tkachov
  2015-12-17 15:58 ` Bernd Schmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Kyrill Tkachov @ 2015-12-17 15:36 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 3523 bytes --]

Hi all,

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
  will be written using @code{zero_extract} rather than the equivalent
  @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.
This means that for the example:
int
f255 (int x)
{
   if (x & 255)
     return 1;
   return x;
}

it ends up trying to make a QImode comparison against zero, for which targets like
aarch64 have no pattern.

This patch attempts to fix this in two places in combine.
First is simplify_comparison when handling the and-bitmask case.
Currently it will call gen_lowpart_or_truncate on the argument to produce the short subreg.
With this patch we don't do that when comparing against zero.
This way the and-bitmask form is preserved for make_extraction later on to convert
into a zero_extract.
The second place is in make_extraction itself where it tries to avoid creating a zero_extract,
but the canonicalisation rules and the function comment for make_extraction say that it should
try hard create a zero_extraction when inside a comparison in particular
(" IN_COMPARE is nonzero if we are in a COMPARE.  This means that a
    ZERO_EXTRACT should be built even for bits starting at bit 0.")

With this patch for the testcases:
int
f255 (int x)
{
   if (x & 255)
     return 1;
   return x;
}

int
foo (long x)
{
    return ((short) x != 0) ? x : 1;
}

we now generate for aarch64 at -O2:
f255:
         tst     x0, 255
         csinc   w0, w0, wzr, eq
         ret

and
foo:
         tst     x0, 65535
         csinc   x0, x0, xzr, ne
         ret


instead of the previous:
f255:
         and     w1, w0, 255
         cmp     w1, wzr
         csinc   w0, w0, wzr, eq
         ret

foo:
         sxth    w1, w0
         cmp     w1, wzr
         csinc   x0, x0, xzr, ne
         ret


Bootstrapped and tested on arm, aarch64, x86_64.
To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64 pattern
for comparing a zero_extract with zero.
On aarch64 this greatly increases the usage of the TST instruction by about 54% on SPEC2006.
Performance-wise there were no regressions and slight improvements on SPECINT that may just
be above normal noise (overall 0.5% improvement).
On arm it makes very little difference (arm already defines QI and HImode comparisons against zero)
but makes more use of the lsrs-immediate instruction in place of the arm tst instruction, which has
a shorter encoding in Thumb2 state.
On x86_64 I saw no difference in code size for SPEC2006 on my setup.

What do people think of this approach?
I hope this just enforces the already documented canonicalisation rules with minimal(none?) negative
fallout.

Thanks,
Kyrill


2015-12-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68796
     * combine.c (make_extraction): Don't try to avoid the extraction if
     inside a compare.
     (simplify_comparison): Don't truncate to lowpart if comparing against
     zero and target doesn't have a native compare instruction in the
     required short mode.

2015-12-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68796
     * gcc.target/aarch64/tst_5.c: New test.
     * gcc.target/aarch64/tst_6.c: Likewise.

[-- Attachment #2: combine-compare-extract.patch --]
[-- Type: text/x-patch, Size: 2975 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index 8601d8983ce345e2129dd047b3520d98c0582842..345e63f9a05f2310a5c9e5b239ed069d22565d1c 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7337,10 +7337,13 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
      low-order bit and this is either not in the destination or we have the
      appropriate STRICT_LOW_PART operation available.
 
+     Don't do this if we are inside a comparison, as the canonicalization
+     rules call for a zero_extract form.
      For MEM, we can avoid an extract if the field starts on an appropriate
      boundary and we can change the mode of the memory reference.  */
 
   if (tmode != BLKmode
+      && !in_compare
       && ((pos_rtx == 0 && (pos % BITS_PER_WORD) == 0
 	   && !MEM_P (inner)
 	   && (inner_mode == tmode
@@ -12108,14 +12111,19 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 
 	     unless TRULY_NOOP_TRUNCATION allows it or the register is
 	     known to hold a value of the required mode the
-	     transformation is invalid.  */
+	     transformation is invalid.
+	     If the target does not have a compare instruction of that mode
+	     don't do this when comparing against 0 since the canonicalization
+	     rules require such an operation to be represented as a
+	     zero_extract, which make_extraction will produce later on.  */
 	  if ((equality_comparison_p || unsigned_comparison_p)
 	      && CONST_INT_P (XEXP (op0, 1))
 	      && (i = exact_log2 ((UINTVAL (XEXP (op0, 1))
 				   & GET_MODE_MASK (mode))
 				  + 1)) >= 0
 	      && const_op >> i == 0
-	      && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode)
+	      && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode
+	      && (const_op != 0 || have_insn_for (COMPARE, tmode)))
 	    {
 	      op0 = gen_lowpart_or_truncate (tmode, XEXP (op0, 0));
 	      continue;
diff --git a/gcc/testsuite/gcc.target/aarch64/tst_5.c b/gcc/testsuite/gcc.target/aarch64/tst_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..868a3be502e1468592c17b8a9c6a359af00e61a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tst_5.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+f255 (int x)
+{
+  if (x & 255)
+    return 1;
+  return x;
+}
+
+int
+f65535 (int x)
+{
+  if (x & 65535)
+    return 1;
+  return x;
+}
+
+/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*255" } } */
+/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*65535" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/tst_6.c b/gcc/testsuite/gcc.target/aarch64/tst_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..2f1c78d873445b11562c5e2863a3f642ff11f318
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tst_6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (long x)
+{
+   return ((short) x != 0) ? x : 1;
+}
+
+/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*65535" } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-12-17 18:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-17 15:36 [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes Kyrill Tkachov
2015-12-17 15:58 ` Bernd Schmidt
2015-12-17 16:10   ` Kyrill Tkachov
2015-12-17 16:12     ` Bernd Schmidt
2015-12-17 16:26       ` Kyrill Tkachov
2015-12-17 16:59         ` Jeff Law
2015-12-17 17:05           ` Kyrill Tkachov
2015-12-17 17:33             ` Jeff Law
2015-12-17 17:27       ` Segher Boessenkool
2015-12-17 17:44         ` Kyrill Tkachov
2015-12-17 18:05           ` Bernd Schmidt
2015-12-17 16:46   ` Jeff Law
2015-12-17 16:52   ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).