public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
@ 2015-07-29 14:16 Kyrill Tkachov
  2015-07-29 22:46 ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: Kyrill Tkachov @ 2015-07-29 14:16 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3018 bytes --]

Hi all,

This patch improves RTL if-conversion on sequences that perform a conditional select on integer constants.
Most of the smart logic to do that already exists in the noce_try_store_flag_constants function.
However, currently that function is tried after noce_try_cmove.
noce_try_cmove is a simple catch-all function that just loads the two immediates and performs a conditional
select between them. It returns true and then the caller noce_process_if_block doesn't try any other transformations,
completely skipping the more aggressive transformations that noce_try_store_flag_constants allows!

Calling noce_try_store_flag_constants before noce_try_cmove allows for the smarter if-conversion transformations
to be used. An example that occurs a lot in the gcc code itself is for the C code:
int
foo (int a, int b)
{
   return ((a & (1 << 25)) ? 5 : 4);
}

i.e. test a bit in a and return 5 or 4. Currently on aarch64 this generates the naive:
         and     w2, w0, 33554432  // mask away all bits except bit 25
         mov     w1, 4
         cmp     w2, wzr
         mov     w0, 5
         csel    w0, w0, w1, ne


whereas with this patch this can be transformed into the much better:
         ubfx    x0, x0, 25, 1  // extract bit 25
         add     w0, w0, 4

Another issue I encountered is that the code that claims to perform the transformation:
       /* if (test) x = 3; else x = 4;
      =>   x = 3 + (test == 0);  */

doesn't seem to do exactly that in all cases. In fact for that case it will try something like:
x = 4 - (test == 0)
which is suboptimal for targets like aarch64 which have a conditional increment operation.
This patch tweaks that code to always try to generate an addition of the condition rather than
a subtraction.

Anyway, for code:
int
fooinc (int x)
{
   return x ? 1025 : 1026;
}

we currently generate:
         mov     w2, 1025
         mov     w1, 1026
         cmp     w0, wzr
         csel    w0, w2, w1, ne

whereas with this patch we will generate:
         cmp     w0, wzr
         cset    w0, eq
         add     w0, w0, 1025

Bootstrapped and tested on arm, aarch64, x86_64.
Ok for trunk?

Thanks,
Kyrill

P.S. noce_try_store_flag_constants is currently gated on !targetm.have_conditional_execution () but I don't see
any reason to restrict it on targets with conditional execution. For example, I think the first example above
would show a benefit on arm if it was enabled there. But that can be a separate investigation.

2015-07-29  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
     -STORE_FLAG and condition is reversable.  Prefer to add to the
     flag value.
     (noce_process_if_block): Try noce_try_store_flag_constants before
     noce_try_cmove.

2015-07-29  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/csel_bfx_1.c: New test.
     * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ifcvt-csel-imms.patch --]
[-- Type: text/x-patch; name=ifcvt-csel-imms.patch, Size: 3312 bytes --]

commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Jul 28 14:59:46 2015 +0100

    [RTL-ifcvt] Improve conditional increment ops on immediates

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a57d78c..80d0285 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 
       reversep = 0;
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
-	normalize = 0;
+	normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
       else if (ifalse == 0 && exact_log2 (itrue) >= 0
 	       && (STORE_FLAG_VALUE == 1
 		   || if_info->branch_cost >= 2))
@@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	 =>   x = 3 + (test == 0);  */
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
 	{
-	  target = expand_simple_binop (mode,
-					(diff == STORE_FLAG_VALUE
-					 ? PLUS : MINUS),
-					gen_int_mode (ifalse, mode), target,
+	  rtx_code code = reversep ? PLUS :
+				    (diff == STORE_FLAG_VALUE ? PLUS
+							       : MINUS);
+	  HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
+
+	  target = expand_simple_binop (mode, code,
+					gen_int_mode (to_add, mode), target,
 					if_info->x, 0, OPTAB_WIDEN);
 	}
 
@@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info)
     goto success;
   if (noce_try_abs (if_info))
     goto success;
+  if (!targetm.have_conditional_execution ()
+      && noce_try_store_flag_constants (if_info))
+    goto success;
   if (HAVE_conditional_move
       && noce_try_cmove (if_info))
     goto success;
   if (! targetm.have_conditional_execution ())
     {
-      if (noce_try_store_flag_constants (if_info))
-	goto success;
       if (noce_try_addcc (if_info))
 	goto success;
       if (noce_try_store_flag_mask (if_info))
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
new file mode 100644
index 0000000..c20597f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+int
+foo (int a, int b)
+{
+  return ((a & (1 << 25)) ? 5 : 4);
+}
+
+/* { dg-final { scan-assembler "ubfx\t\[xw\]\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
new file mode 100644
index 0000000..2ae434d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+extern void abort (void);
+
+int
+fooinc (int x)
+{
+  if (x)
+    return 1025;
+  else
+    return 1026;
+}
+
+int
+fooinc2 (int x)
+{
+  if (x)
+    return 1026;
+  else
+    return 1025;
+}
+
+int
+main (void)
+{
+  if (fooinc (0) != 1026)
+    abort ();
+
+  if (fooinc (1) != 1025)
+    abort ();
+
+  if (fooinc2 (0) != 1025)
+    abort ();
+
+  if (fooinc2 (1) != 1026)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
@ 2015-08-03 12:33 Uros Bizjak
  2015-08-03 13:02 ` Kyrill Tkachov
  0 siblings, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2015-08-03 12:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kyrill Tkachov, Jeff Law, H.J. Lu

Hello!

> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>     when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>     explicit.  Prefer to add the flag whenever possible.
>     (noce_process_if_block): Try noce_try_store_flag_constants before
>     noce_try_cmove.
>
> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/csel_bfx_1.c: New test.
>     * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

This patch regressed following tests on x86_64:

FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]

While cmov3 case is questionable by itself in light of PR56309 [1],
the cnov2 case regressed from:

       cmpl    %edi, %esi
       sbbl    %eax, %eax
       andl    $-10, %eax
       addl    $5, %eax
       ret

to:

       xorl    %eax, %eax
       cmpl    %esi, %edi
       setbe   %al
       negl    %eax
       andl    $10, %eax
       subl    $5, %eax
       ret

Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
anymore. Non-QImode setcc instructions are problematic on x86, since
they need to be zero-extended to their full length.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-08-12 17:52 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-29 14:16 [PATCH][RTL-ifcvt] Improve conditional select ops on immediates Kyrill Tkachov
2015-07-29 22:46 ` Jeff Law
2015-07-30 14:30   ` Kyrill Tkachov
2015-07-31 16:16     ` Jeff Law
2015-08-03 12:33 Uros Bizjak
2015-08-03 13:02 ` Kyrill Tkachov
2015-08-03 13:37   ` Uros Bizjak
2015-08-03 13:43     ` Kyrill Tkachov
2015-08-03 14:12       ` Kyrill Tkachov
2015-08-03 14:15       ` Uros Bizjak
2015-08-03 15:37         ` Kyrill Tkachov
2015-08-03 15:46           ` Uros Bizjak
2015-08-03 15:53             ` Kyrill Tkachov
2015-08-03 17:20               ` Kyrill Tkachov
2015-08-03 17:37                 ` Uros Bizjak
2015-08-04  8:44                   ` Kyrill Tkachov
2015-08-10  9:36                     ` Kyrill Tkachov
2015-08-10  9:43                       ` Uros Bizjak
2015-08-10  9:44                         ` Kyrill Tkachov
2015-08-12 17:52                     ` Jeff Law
2015-08-03 14:04   ` Uros Bizjak
2015-08-03 14:37     ` H.J. Lu
2015-08-03 15:50     ` Ilya Enkovich
2015-08-03 16:29       ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).