public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
@ 2015-07-29 14:16 Kyrill Tkachov
  2015-07-29 22:46 ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: Kyrill Tkachov @ 2015-07-29 14:16 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3018 bytes --]

Hi all,

This patch improves RTL if-conversion on sequences that perform a conditional select on integer constants.
Most of the smart logic to do that already exists in the noce_try_store_flag_constants function.
However, currently that function is tried after noce_try_cmove.
noce_try_cmove is a simple catch-all function that just loads the two immediates and performs a conditional
select between them. It returns true and then the caller noce_process_if_block doesn't try any other transformations,
completely skipping the more aggressive transformations that noce_try_store_flag_constants allows!

Calling noce_try_store_flag_constants before noce_try_cmove allows for the smarter if-conversion transformations
to be used. An example that occurs a lot in the gcc code itself is for the C code:
int
foo (int a, int b)
{
   return ((a & (1 << 25)) ? 5 : 4);
}

i.e. test a bit in a and return 5 or 4. Currently on aarch64 this generates the naive:
         and     w2, w0, 33554432  // mask away all bits except bit 25
         mov     w1, 4
         cmp     w2, wzr
         mov     w0, 5
         csel    w0, w0, w1, ne


whereas with this patch this can be transformed into the much better:
         ubfx    x0, x0, 25, 1  // extract bit 25
         add     w0, w0, 4

Another issue I encountered is that the code that claims to perform the transformation:
       /* if (test) x = 3; else x = 4;
      =>   x = 3 + (test == 0);  */

doesn't seem to do exactly that in all cases. In fact for that case it will try something like:
x = 4 - (test == 0)
which is suboptimal for targets like aarch64 which have a conditional increment operation.
This patch tweaks that code to always try to generate an addition of the condition rather than
a subtraction.

Anyway, for code:
int
fooinc (int x)
{
   return x ? 1025 : 1026;
}

we currently generate:
         mov     w2, 1025
         mov     w1, 1026
         cmp     w0, wzr
         csel    w0, w2, w1, ne

whereas with this patch we will generate:
         cmp     w0, wzr
         cset    w0, eq
         add     w0, w0, 1025

Bootstrapped and tested on arm, aarch64, x86_64.
Ok for trunk?

Thanks,
Kyrill

P.S. noce_try_store_flag_constants is currently gated on !targetm.have_conditional_execution () but I don't see
any reason to restrict it on targets with conditional execution. For example, I think the first example above
would show a benefit on arm if it was enabled there. But that can be a separate investigation.

2015-07-29  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
     -STORE_FLAG and condition is reversable.  Prefer to add to the
     flag value.
     (noce_process_if_block): Try noce_try_store_flag_constants before
     noce_try_cmove.

2015-07-29  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/csel_bfx_1.c: New test.
     * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ifcvt-csel-imms.patch --]
[-- Type: text/x-patch; name=ifcvt-csel-imms.patch, Size: 3312 bytes --]

commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Jul 28 14:59:46 2015 +0100

    [RTL-ifcvt] Improve conditional increment ops on immediates

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a57d78c..80d0285 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 
       reversep = 0;
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
-	normalize = 0;
+	normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
       else if (ifalse == 0 && exact_log2 (itrue) >= 0
 	       && (STORE_FLAG_VALUE == 1
 		   || if_info->branch_cost >= 2))
@@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	 =>   x = 3 + (test == 0);  */
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
 	{
-	  target = expand_simple_binop (mode,
-					(diff == STORE_FLAG_VALUE
-					 ? PLUS : MINUS),
-					gen_int_mode (ifalse, mode), target,
+	  rtx_code code = reversep ? PLUS :
+				    (diff == STORE_FLAG_VALUE ? PLUS
+							       : MINUS);
+	  HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
+
+	  target = expand_simple_binop (mode, code,
+					gen_int_mode (to_add, mode), target,
 					if_info->x, 0, OPTAB_WIDEN);
 	}
 
@@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info)
     goto success;
   if (noce_try_abs (if_info))
     goto success;
+  if (!targetm.have_conditional_execution ()
+      && noce_try_store_flag_constants (if_info))
+    goto success;
   if (HAVE_conditional_move
       && noce_try_cmove (if_info))
     goto success;
   if (! targetm.have_conditional_execution ())
     {
-      if (noce_try_store_flag_constants (if_info))
-	goto success;
       if (noce_try_addcc (if_info))
 	goto success;
       if (noce_try_store_flag_mask (if_info))
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
new file mode 100644
index 0000000..c20597f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+int
+foo (int a, int b)
+{
+  return ((a & (1 << 25)) ? 5 : 4);
+}
+
+/* { dg-final { scan-assembler "ubfx\t\[xw\]\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
new file mode 100644
index 0000000..2ae434d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+extern void abort (void);
+
+int
+fooinc (int x)
+{
+  if (x)
+    return 1025;
+  else
+    return 1026;
+}
+
+int
+fooinc2 (int x)
+{
+  if (x)
+    return 1026;
+  else
+    return 1025;
+}
+
+int
+main (void)
+{
+  if (fooinc (0) != 1026)
+    abort ();
+
+  if (fooinc (1) != 1025)
+    abort ();
+
+  if (fooinc2 (0) != 1025)
+    abort ();
+
+  if (fooinc2 (1) != 1026)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-07-29 14:16 [PATCH][RTL-ifcvt] Improve conditional select ops on immediates Kyrill Tkachov
@ 2015-07-29 22:46 ` Jeff Law
  2015-07-30 14:30   ` Kyrill Tkachov
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Law @ 2015-07-29 22:46 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 07/29/2015 07:49 AM, Kyrill Tkachov wrote:
> Hi all,
>
> This patch improves RTL if-conversion on sequences that perform a
> conditional select on integer constants.
> Most of the smart logic to do that already exists in the
> noce_try_store_flag_constants function.
> However, currently that function is tried after noce_try_cmove.
> noce_try_cmove is a simple catch-all function that just loads the two
> immediates and performs a conditional
> select between them. It returns true and then the caller
> noce_process_if_block doesn't try any other transformations,
> completely skipping the more aggressive transformations that
> noce_try_store_flag_constants allows!
>
> Calling noce_try_store_flag_constants before noce_try_cmove allows for
> the smarter if-conversion transformations
> to be used. An example that occurs a lot in the gcc code itself is for
> the C code:
> int
> foo (int a, int b)
> {
>    return ((a & (1 << 25)) ? 5 : 4);
> }
>
> i.e. test a bit in a and return 5 or 4. Currently on aarch64 this
> generates the naive:
>          and     w2, w0, 33554432  // mask away all bits except bit 25
>          mov     w1, 4
>          cmp     w2, wzr
>          mov     w0, 5
>          csel    w0, w0, w1, ne
>
>
> whereas with this patch this can be transformed into the much better:
>          ubfx    x0, x0, 25, 1  // extract bit 25
>          add     w0, w0, 4
I suspect the PA would benefit from this as well, probably several 
other architectures with reasonable bitfield extraction capabilities.

>
> Another issue I encountered is that the code that claims to perform the
> transformation:
>        /* if (test) x = 3; else x = 4;
>       =>   x = 3 + (test == 0);  */
>
> doesn't seem to do exactly that in all cases. In fact for that case it
> will try something like:
> x = 4 - (test == 0)
> which is suboptimal for targets like aarch64 which have a conditional
> increment operation.
I vaguely recall targets that don't have const - X insns, but do have X 
+ const style insns.  And more generally I think we're better off 
generating the PLUS version.


> This patch tweaks that code to always try to generate an addition of the
> condition rather than
> a subtraction.
>
> Anyway, for code:
> int
> fooinc (int x)
> {
>    return x ? 1025 : 1026;
> }
>
> we currently generate:
>          mov     w2, 1025
>          mov     w1, 1026
>          cmp     w0, wzr
>          csel    w0, w2, w1, ne
>
> whereas with this patch we will generate:
>          cmp     w0, wzr
>          cset    w0, eq
>          add     w0, w0, 1025
>
> Bootstrapped and tested on arm, aarch64, x86_64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> P.S. noce_try_store_flag_constants is currently gated on
> !targetm.have_conditional_execution () but I don't see
> any reason to restrict it on targets with conditional execution. For
> example, I think the first example above
> would show a benefit on arm if it was enabled there. But that can be a
> separate investigation.
>
> 2015-07-29  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
>      -STORE_FLAG and condition is reversable.  Prefer to add to the
>      flag value.
>      (noce_process_if_block): Try noce_try_store_flag_constants before
>      noce_try_cmove.
>
> 2015-07-29  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * gcc.target/aarch64/csel_bfx_1.c: New test.
>      * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>
> ifcvt-csel-imms.patch
>
>
> commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
> Date:   Tue Jul 28 14:59:46 2015 +0100
>
>      [RTL-ifcvt] Improve conditional increment ops on immediates
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index a57d78c..80d0285 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
>
>         reversep = 0;
>         if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
> -	normalize = 0;
> +	normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
We generally avoid using a ',' operator like this.  However, I can see 
that you're just following existing convention in that code.  So I won't 
object.



>         else if (ifalse == 0 && exact_log2 (itrue) >= 0
>   	       && (STORE_FLAG_VALUE == 1
>   		   || if_info->branch_cost >= 2))
> @@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
>   	 =>   x = 3 + (test == 0);  */
>         if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
>   	{
> -	  target = expand_simple_binop (mode,
> -					(diff == STORE_FLAG_VALUE
> -					 ? PLUS : MINUS),
> -					gen_int_mode (ifalse, mode), target,
> +	  rtx_code code = reversep ? PLUS :
> +				    (diff == STORE_FLAG_VALUE ? PLUS
> +							       : MINUS);
> +	  HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
> +
> +	  target = expand_simple_binop (mode, code,
> +					gen_int_mode (to_add, mode), target,
>   					if_info->x, 0, OPTAB_WIDEN);
>   	}
Note that STORE_FLAG_VALUE can potentially be any value.  Most targets 
use "1", but others use -1 (m68k for example, there are others).  I'm 
concerned that the reversep ? MIN (ifalse, itrue) won't do what we want 
on targets such as the m68k.

The logic here is also somewhat convoluted.  When reversep is true, we 
always use PLUS, but is that actually correct?  Normally we'd compute 
the code, then invert the code.  ie, if we normally want PLUS, then 
we've flip to MINUS and vice-versa.


>
> @@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info)
>       goto success;
>     if (noce_try_abs (if_info))
>       goto success;
> +  if (!targetm.have_conditional_execution ()
> +      && noce_try_store_flag_constants (if_info))
> +    goto success;
>     if (HAVE_conditional_move
>         && noce_try_cmove (if_info))
>       goto success;
>     if (! targetm.have_conditional_execution ())
>       {
> -      if (noce_try_store_flag_constants (if_info))
> -	goto success;
>         if (noce_try_addcc (if_info))
>   	goto success;
>         if (noce_try_store_flag_mask (if_info))
This part seems fine and ought to be able to go forward independently. 
Consider this hunk and its associated testcase approved if you want to 
test it independently and get it installed while we iterate on the other 
hunks,

jeff


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-07-29 22:46 ` Jeff Law
@ 2015-07-30 14:30   ` Kyrill Tkachov
  2015-07-31 16:16     ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: Kyrill Tkachov @ 2015-07-30 14:30 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 8530 bytes --]

Hi Jeff,

On 29/07/15 23:38, Jeff Law wrote:
> On 07/29/2015 07:49 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This patch improves RTL if-conversion on sequences that perform a
>> conditional select on integer constants.
>> Most of the smart logic to do that already exists in the
>> noce_try_store_flag_constants function.
>> However, currently that function is tried after noce_try_cmove.
>> noce_try_cmove is a simple catch-all function that just loads the two
>> immediates and performs a conditional
>> select between them. It returns true and then the caller
>> noce_process_if_block doesn't try any other transformations,
>> completely skipping the more aggressive transformations that
>> noce_try_store_flag_constants allows!
>>
>> Calling noce_try_store_flag_constants before noce_try_cmove allows for
>> the smarter if-conversion transformations
>> to be used. An example that occurs a lot in the gcc code itself is for
>> the C code:
>> int
>> foo (int a, int b)
>> {
>>     return ((a & (1 << 25)) ? 5 : 4);
>> }
>>
>> i.e. test a bit in a and return 5 or 4. Currently on aarch64 this
>> generates the naive:
>>           and     w2, w0, 33554432  // mask away all bits except bit 25
>>           mov     w1, 4
>>           cmp     w2, wzr
>>           mov     w0, 5
>>           csel    w0, w0, w1, ne
>>
>>
>> whereas with this patch this can be transformed into the much better:
>>           ubfx    x0, x0, 25, 1  // extract bit 25
>>           add     w0, w0, 4
> I suspect the PA would benefit from this as well, probably several
> other architectures with reasonable bitfield extraction capabilities.

This helps on arm as well but isn't enabled there at the moment
because this function is gated on !have_conditional_execution ().
We can look at enabling this for conditional execution
as well as a separate piece of work.

>
>> Another issue I encountered is that the code that claims to perform the
>> transformation:
>>         /* if (test) x = 3; else x = 4;
>>        =>   x = 3 + (test == 0);  */
>>
>> doesn't seem to do exactly that in all cases. In fact for that case it
>> will try something like:
>> x = 4 - (test == 0)
>> which is suboptimal for targets like aarch64 which have a conditional
>> increment operation.
> I vaguely recall targets that don't have const - X insns, but do have X
> + const style insns.  And more generally I think we're better off
> generating the PLUS version.

Now that I think of it, aarch64 does have the CSETM instruction that
conditionally sets a register to 0 or -1, but I didn't see it being utilised
as frequently as I'd like to. Anyway, I do prefer generating the PLUS version
when possible too, which is what this patch tries to do.

>
>
>> This patch tweaks that code to always try to generate an addition of the
>> condition rather than
>> a subtraction.
>>
>> Anyway, for code:
>> int
>> fooinc (int x)
>> {
>>     return x ? 1025 : 1026;
>> }
>>
>> we currently generate:
>>           mov     w2, 1025
>>           mov     w1, 1026
>>           cmp     w0, wzr
>>           csel    w0, w2, w1, ne
>>
>> whereas with this patch we will generate:
>>           cmp     w0, wzr
>>           cset    w0, eq
>>           add     w0, w0, 1025
>>
>> Bootstrapped and tested on arm, aarch64, x86_64.
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> P.S. noce_try_store_flag_constants is currently gated on
>> !targetm.have_conditional_execution () but I don't see
>> any reason to restrict it on targets with conditional execution. For
>> example, I think the first example above
>> would show a benefit on arm if it was enabled there. But that can be a
>> separate investigation.
>>
>> 2015-07-29  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>       * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
>>       -STORE_FLAG and condition is reversable.  Prefer to add to the
>>       flag value.
>>       (noce_process_if_block): Try noce_try_store_flag_constants before
>>       noce_try_cmove.
>>
>> 2015-07-29  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>       * gcc.target/aarch64/csel_bfx_1.c: New test.
>>       * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>
>> ifcvt-csel-imms.patch
>>
>>
>> commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
>> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
>> Date:   Tue Jul 28 14:59:46 2015 +0100
>>
>>       [RTL-ifcvt] Improve conditional increment ops on immediates
>>
>> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
>> index a57d78c..80d0285 100644
>> --- a/gcc/ifcvt.c
>> +++ b/gcc/ifcvt.c
>> @@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
>>
>>          reversep = 0;
>>          if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
>> -	normalize = 0;
>> +	normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
> We generally avoid using a ',' operator like this.  However, I can see
> that you're just following existing convention in that code.  So I won't
> object.

Ok, I've respun the patch to convert this part to use a more conventional
style and also took the liberty of converting reversep and can_reverse into
bool variables.

>
>
>
>>          else if (ifalse == 0 && exact_log2 (itrue) >= 0
>>    	       && (STORE_FLAG_VALUE == 1
>>    		   || if_info->branch_cost >= 2))
>> @@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
>>    	 =>   x = 3 + (test == 0);  */
>>          if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
>>    	{
>> -	  target = expand_simple_binop (mode,
>> -					(diff == STORE_FLAG_VALUE
>> -					 ? PLUS : MINUS),
>> -					gen_int_mode (ifalse, mode), target,
>> +	  rtx_code code = reversep ? PLUS :
>> +				    (diff == STORE_FLAG_VALUE ? PLUS
>> +							       : MINUS);
>> +	  HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
>> +
>> +	  target = expand_simple_binop (mode, code,
>> +					gen_int_mode (to_add, mode), target,
>>    					if_info->x, 0, OPTAB_WIDEN);
>>    	}
> Note that STORE_FLAG_VALUE can potentially be any value.  Most targets
> use "1", but others use -1 (m68k for example, there are others).  I'm
> concerned that the reversep ? MIN (ifalse, itrue) won't do what we want
> on targets such as the m68k.
>
> The logic here is also somewhat convoluted.  When reversep is true, we
> always use PLUS, but is that actually correct?  Normally we'd compute
> the code, then invert the code.  ie, if we normally want PLUS, then
> we've flip to MINUS and vice-versa.

I agree that it's hard to follow. In this respin I have
explicitly laid out the different combinations of diff
and STORE_FLAG_VALUE. It is more verbose now, but hopefully
it's easier to follow.

>
>
>> @@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info)
>>        goto success;
>>      if (noce_try_abs (if_info))
>>        goto success;
>> +  if (!targetm.have_conditional_execution ()
>> +      && noce_try_store_flag_constants (if_info))
>> +    goto success;
>>      if (HAVE_conditional_move
>>          && noce_try_cmove (if_info))
>>        goto success;
>>      if (! targetm.have_conditional_execution ())
>>        {
>> -      if (noce_try_store_flag_constants (if_info))
>> -	goto success;
>>          if (noce_try_addcc (if_info))
>>    	goto success;
>>          if (noce_try_store_flag_mask (if_info))
> This part seems fine and ought to be able to go forward independently.
> Consider this hunk and its associated testcase approved if you want to
> test it independently and get it installed while we iterate on the other
> hunks,

This respin includes that hunk, however if this patch goes through any more
respins then I'll separate it out into a separate patch.

This version has been bootstrapped and tested on x86_64 and aarch64.

Thanks for the feedback!
Kyrill

2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
     when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
     explicit.  Prefer to add the flag whenever possible.
     (noce_process_if_block): Try noce_try_store_flag_constants before
     noce_try_cmove.

2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/csel_bfx_1.c: New test.
     * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.



>
> jeff
>
>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ifcvt-csel-imms.patch --]
[-- Type: text/x-patch; name=ifcvt-csel-imms.patch, Size: 5658 bytes --]

commit bd720b6a5a298449ad850517e0fc29e825c702d9
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Jul 28 14:59:46 2015 +0100

    [RTL-ifcvt] Improve conditional increment ops on immediates

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a57d78c..909c674 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1194,9 +1194,10 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 {
   rtx target;
   rtx_insn *seq;
-  int reversep;
+  bool reversep;
   HOST_WIDE_INT itrue, ifalse, diff, tmp;
-  int normalize, can_reverse;
+  int normalize;
+  bool can_reverse;
   machine_mode mode;
 
   if (!noce_simple_bbs (if_info))
@@ -1208,6 +1209,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
       mode = GET_MODE (if_info->x);
       ifalse = INTVAL (if_info->a);
       itrue = INTVAL (if_info->b);
+      bool subtract_flag_p = false;
 
       diff = (unsigned HOST_WIDE_INT) itrue - ifalse;
       /* Make sure we can represent the difference between the two values.  */
@@ -1220,23 +1222,61 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
       can_reverse = (reversed_comparison_code (if_info->cond, if_info->jump)
 		     != UNKNOWN);
 
-      reversep = 0;
+      reversep = false;
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
-	normalize = 0;
+	{
+	  normalize = 0;
+	  /* We could collapse these cases but it is easier to follow the
+	     diff/STORE_FLAG_VALUE combinations when they are listed
+	     explicitly.  */
+
+	  /* test ? 3 : 4
+	     => 4 + (test != 0).  */
+	  if (diff < 0 && STORE_FLAG_VALUE < 0)
+	      reversep = false;
+	  /* test ? 4 : 3
+	     => can_reverse  | 4 + (test == 0)
+		!can_reverse | 3 - (test != 0).  */
+	  else if (diff > 0 && STORE_FLAG_VALUE < 0)
+	    {
+	      reversep = can_reverse;
+	      subtract_flag_p = !can_reverse;
+	    }
+	  /* test ? 3 : 4
+	     => can_reverse  | 3 + (test == 0)
+		!can_reverse | 4 - (test != 0).  */
+	  else if (diff < 0 && STORE_FLAG_VALUE > 0)
+	    {
+	      reversep = can_reverse;
+	      subtract_flag_p = !can_reverse;
+	    }
+	  /* test ? 4 : 3
+	     => 4 + (test != 0).  */
+	  else if (diff > 0 && STORE_FLAG_VALUE > 0)
+	    reversep = false;
+	  else
+	    gcc_unreachable ();
+	}
       else if (ifalse == 0 && exact_log2 (itrue) >= 0
 	       && (STORE_FLAG_VALUE == 1
 		   || if_info->branch_cost >= 2))
 	normalize = 1;
       else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
 	       && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
-	normalize = 1, reversep = 1;
+	{
+	  normalize = 1;
+	  reversep = true;
+	}
       else if (itrue == -1
 	       && (STORE_FLAG_VALUE == -1
 		   || if_info->branch_cost >= 2))
 	normalize = -1;
       else if (ifalse == -1 && can_reverse
 	       && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
-	normalize = -1, reversep = 1;
+	{
+	  normalize = -1;
+	  reversep = true;
+	}
       else if ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
 	       || if_info->branch_cost >= 3)
 	normalize = -1;
@@ -1261,9 +1301,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	 =>   x = 3 + (test == 0);  */
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
 	{
-	  target = expand_simple_binop (mode,
-					(diff == STORE_FLAG_VALUE
-					 ? PLUS : MINUS),
+	  /* Always use ifalse here.  It should have been swapped with itrue
+	     when appropriate when reversep is true.  */
+	  target = expand_simple_binop (mode, subtract_flag_p ? MINUS : PLUS,
 					gen_int_mode (ifalse, mode), target,
 					if_info->x, 0, OPTAB_WIDEN);
 	}
@@ -3120,13 +3160,14 @@ noce_process_if_block (struct noce_if_info *if_info)
     goto success;
   if (noce_try_abs (if_info))
     goto success;
+  if (!targetm.have_conditional_execution ()
+      && noce_try_store_flag_constants (if_info))
+    goto success;
   if (HAVE_conditional_move
       && noce_try_cmove (if_info))
     goto success;
   if (! targetm.have_conditional_execution ())
     {
-      if (noce_try_store_flag_constants (if_info))
-	goto success;
       if (noce_try_addcc (if_info))
 	goto success;
       if (noce_try_store_flag_mask (if_info))
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
new file mode 100644
index 0000000..c20597f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+int
+foo (int a, int b)
+{
+  return ((a & (1 << 25)) ? 5 : 4);
+}
+
+/* { dg-final { scan-assembler "ubfx\t\[xw\]\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
new file mode 100644
index 0000000..2ae434d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+extern void abort (void);
+
+int
+fooinc (int x)
+{
+  if (x)
+    return 1025;
+  else
+    return 1026;
+}
+
+int
+fooinc2 (int x)
+{
+  if (x)
+    return 1026;
+  else
+    return 1025;
+}
+
+int
+main (void)
+{
+  if (fooinc (0) != 1026)
+    abort ();
+
+  if (fooinc (1) != 1025)
+    abort ();
+
+  if (fooinc2 (0) != 1025)
+    abort ();
+
+  if (fooinc2 (1) != 1026)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-07-30 14:30   ` Kyrill Tkachov
@ 2015-07-31 16:16     ` Jeff Law
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2015-07-31 16:16 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 07/30/2015 07:42 AM, Kyrill Tkachov wrote:
> Hi Jeff,
>
> On 29/07/15 23:38, Jeff Law wrote:
>> On 07/29/2015 07:49 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> This patch improves RTL if-conversion on sequences that perform a
>>> conditional select on integer constants.
>>> Most of the smart logic to do that already exists in the
>>> noce_try_store_flag_constants function.
>>> However, currently that function is tried after noce_try_cmove.
>>> noce_try_cmove is a simple catch-all function that just loads the two
>>> immediates and performs a conditional
>>> select between them. It returns true and then the caller
>>> noce_process_if_block doesn't try any other transformations,
>>> completely skipping the more aggressive transformations that
>>> noce_try_store_flag_constants allows!
>>>
>>> Calling noce_try_store_flag_constants before noce_try_cmove allows for
>>> the smarter if-conversion transformations
>>> to be used. An example that occurs a lot in the gcc code itself is for
>>> the C code:
>>> int
>>> foo (int a, int b)
>>> {
>>>     return ((a & (1 << 25)) ? 5 : 4);
>>> }
>>>
>>> i.e. test a bit in a and return 5 or 4. Currently on aarch64 this
>>> generates the naive:
>>>           and     w2, w0, 33554432  // mask away all bits except bit 25
>>>           mov     w1, 4
>>>           cmp     w2, wzr
>>>           mov     w0, 5
>>>           csel    w0, w0, w1, ne
>>>
>>>
>>> whereas with this patch this can be transformed into the much better:
>>>           ubfx    x0, x0, 25, 1  // extract bit 25
>>>           add     w0, w0, 4
>> I suspect the PA would benefit from this as well, probably several
>> other architectures with reasonable bitfield extraction capabilities.
>
> This helps on arm as well but isn't enabled there at the moment
> because this function is gated on !have_conditional_execution ().
> We can look at enabling this for conditional execution
> as well as a separate piece of work.
>
>>
>>> Another issue I encountered is that the code that claims to perform the
>>> transformation:
>>>         /* if (test) x = 3; else x = 4;
>>>        =>   x = 3 + (test == 0);  */
>>>
>>> doesn't seem to do exactly that in all cases. In fact for that case it
>>> will try something like:
>>> x = 4 - (test == 0)
>>> which is suboptimal for targets like aarch64 which have a conditional
>>> increment operation.
>> I vaguely recall targets that don't have const - X insns, but do have X
>> + const style insns.  And more generally I think we're better off
>> generating the PLUS version.
>
> Now that I think of it, aarch64 does have the CSETM instruction that
> conditionally sets a register to 0 or -1, but I didn't see it being
> utilised
> as frequently as I'd like to. Anyway, I do prefer generating the PLUS
> version
> when possible too, which is what this patch tries to do.
>
>>
>>
>>> This patch tweaks that code to always try to generate an addition of the
>>> condition rather than
>>> a subtraction.
>>>
>>> Anyway, for code:
>>> int
>>> fooinc (int x)
>>> {
>>>     return x ? 1025 : 1026;
>>> }
>>>
>>> we currently generate:
>>>           mov     w2, 1025
>>>           mov     w1, 1026
>>>           cmp     w0, wzr
>>>           csel    w0, w2, w1, ne
>>>
>>> whereas with this patch we will generate:
>>>           cmp     w0, wzr
>>>           cset    w0, eq
>>>           add     w0, w0, 1025
>>>
>>> Bootstrapped and tested on arm, aarch64, x86_64.
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> P.S. noce_try_store_flag_constants is currently gated on
>>> !targetm.have_conditional_execution () but I don't see
>>> any reason to restrict it on targets with conditional execution. For
>>> example, I think the first example above
>>> would show a benefit on arm if it was enabled there. But that can be a
>>> separate investigation.
>>>
>>> 2015-07-29  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>>       * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
>>>       -STORE_FLAG and condition is reversable.  Prefer to add to the
>>>       flag value.
>>>       (noce_process_if_block): Try noce_try_store_flag_constants before
>>>       noce_try_cmove.
>>>
>>> 2015-07-29  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>>       * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>       * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>>
>>> ifcvt-csel-imms.patch
>>>
>>>
>>> commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
>>> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
>>> Date:   Tue Jul 28 14:59:46 2015 +0100
>>>
>>>       [RTL-ifcvt] Improve conditional increment ops on immediates
>>>
>>> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
>>> index a57d78c..80d0285 100644
>>> --- a/gcc/ifcvt.c
>>> +++ b/gcc/ifcvt.c
>>> @@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct
>>> noce_if_info *if_info)
>>>
>>>          reversep = 0;
>>>          if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
>>> -    normalize = 0;
>>> +    normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) &&
>>> can_reverse;
>> We generally avoid using a ',' operator like this.  However, I can see
>> that you're just following existing convention in that code.  So I won't
>> object.
>
> Ok, I've respun the patch to convert this part to use a more conventional
> style and also took the liberty of converting reversep and can_reverse into
> bool variables.
>
>>
>>
>>
>>>          else if (ifalse == 0 && exact_log2 (itrue) >= 0
>>>               && (STORE_FLAG_VALUE == 1
>>>               || if_info->branch_cost >= 2))
>>> @@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct
>>> noce_if_info *if_info)
>>>         =>   x = 3 + (test == 0);  */
>>>          if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
>>>        {
>>> -      target = expand_simple_binop (mode,
>>> -                    (diff == STORE_FLAG_VALUE
>>> -                     ? PLUS : MINUS),
>>> -                    gen_int_mode (ifalse, mode), target,
>>> +      rtx_code code = reversep ? PLUS :
>>> +                    (diff == STORE_FLAG_VALUE ? PLUS
>>> +                                   : MINUS);
>>> +      HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
>>> +
>>> +      target = expand_simple_binop (mode, code,
>>> +                    gen_int_mode (to_add, mode), target,
>>>                        if_info->x, 0, OPTAB_WIDEN);
>>>        }
>> Note that STORE_FLAG_VALUE can potentially be any value.  Most targets
>> use "1", but others use -1 (m68k for example, there are others).  I'm
>> concerned that the reversep ? MIN (ifalse, itrue) won't do what we want
>> on targets such as the m68k.
>>
>> The logic here is also somewhat convoluted.  When reversep is true, we
>> always use PLUS, but is that actually correct?  Normally we'd compute
>> the code, then invert the code.  ie, if we normally want PLUS, then
>> we've flip to MINUS and vice-versa.
>
> I agree that it's hard to follow. In this respin I have
> explicitly laid out the different combinations of diff
> and STORE_FLAG_VALUE. It is more verbose now, but hopefully
> it's easier to follow.
>
>>
>>
>>> @@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info
>>> *if_info)
>>>        goto success;
>>>      if (noce_try_abs (if_info))
>>>        goto success;
>>> +  if (!targetm.have_conditional_execution ()
>>> +      && noce_try_store_flag_constants (if_info))
>>> +    goto success;
>>>      if (HAVE_conditional_move
>>>          && noce_try_cmove (if_info))
>>>        goto success;
>>>      if (! targetm.have_conditional_execution ())
>>>        {
>>> -      if (noce_try_store_flag_constants (if_info))
>>> -    goto success;
>>>          if (noce_try_addcc (if_info))
>>>        goto success;
>>>          if (noce_try_store_flag_mask (if_info))
>> This part seems fine and ought to be able to go forward independently.
>> Consider this hunk and its associated testcase approved if you want to
>> test it independently and get it installed while we iterate on the other
>> hunks,
>
> This respin includes that hunk, however if this patch goes through any more
> respins then I'll separate it out into a separate patch.
>
> This version has been bootstrapped and tested on x86_64 and aarch64.
>
> Thanks for the feedback!
> Kyrill
>
> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>      when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>      explicit.  Prefer to add the flag whenever possible.
>      (noce_process_if_block): Try noce_try_store_flag_constants before
>      noce_try_cmove.
>
> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      * gcc.target/aarch64/csel_bfx_1.c: New test.
>      * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
OK.  Thanks for taking care of this.

jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-04  8:44                   ` Kyrill Tkachov
  2015-08-10  9:36                     ` Kyrill Tkachov
@ 2015-08-12 17:52                     ` Jeff Law
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff Law @ 2015-08-12 17:52 UTC (permalink / raw)
  To: Kyrill Tkachov, Uros Bizjak; +Cc: gcc-patches, H.J. Lu

On 08/04/2015 02:44 AM, Kyrill Tkachov wrote:
>
> On 03/08/15 18:37, Uros Bizjak wrote:
>> On Mon, Aug 3, 2015 at 7:20 PM, Kyrill Tkachov
>> <kyrylo.tkachov@arm.com> wrote:
>>
>>>>>>> Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
>>>>>>> don't think this is a good idea. In the expander, there is already
>>>>>>> quite some target-dependent code that goes great length to
>>>>>>> utilize sbb
>>>>>>> insn as much as possible, before cmove is used.
>>>>>>>
>>>>>>> IMO, as far as x86 is concerned, the best solution would be to
>>>>>>> revert
>>>>>>> the change. ix86_expand_int_movcc already does some tricks from your
>>>>>>> patch in a target-efficient way. Generic change that was
>>>>>>> introduced by
>>>>>>> your patch now interferes with this expansion.
>>>>>> Well, technically the transformation was already there, it was just
>>>>>> never
>>>>>> reached for an x86 compilation because noce_try_cmove was tried in
>>>>>> front
>>>>>> of
>>>>>> it
>>>>>> and used a target-specific expansion.
>>>>>> In any case, how's this proposal?
>>>>>> The transformation noce_try_store_flag_constants
>>>>>>          /* if (test) x = a; else x = b;
>>>>>>         =>   x = (-(test != 0) & (b - a)) + a;  */
>>>>>>
>>>>>> Is a catch-all-immediates transformation in
>>>>>> noce_try_store_flag_constants.
>>>>>> What if we moved it to noce_try_cmove and performed it only if the
>>>>>> target-specific
>>>>>> conditional move expansion there failed?
>>>>>>
>>>>>> That way we can try the x86_64-specific sequence first and still give
>>>>>> the
>>>>>> opportunity
>>>>>> to noce_try_store_flag_constants to perform the transformations
>>>>>> that can
>>>>>> benefit targets
>>>>>> that don't have highly specific conditional move expanders.
>>>>> Yes, let's try this approach. As was found out, some targets (e.g.
>>>>> x86) hide lots of different target-dependent expansion strategies into
>>>>> movcc expander. Perhaps this fact should be documented in the comment
>>>>> in the generic code?
>>>> Ok, I'll work on that approach and add a comment.
>>>
>>> I'm testing a patch that fix the testcases on x86_64 and does not
>>> harm codegen on aarch64. Feel free to file a PR and assign it to me.
>> PR67103 [1]
>>
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103
>
> Thanks,
> Here's the patch to move that transformation from
> noce_try_store_flag_constants
> to noce_try_cmove after the target-specific expansion has had a go.
>
> This fixes the testcases for me on x86_64.
> In i386.exp I only see:
> FAIL: gcc.target/i386/pr49781-1.c scan-assembler-not lea[lq]?[
> \t]\\((%|)r[a-z0-9]*
> FAIL: gcc.target/i386/pr61403.c scan-assembler blend
>
> which were there before my patch.
> Bootstrap and testing on x86_64, arm and aarch64 is successful for me.
>
> Is this ok?
>
> Thanks,
> Kyrill
>
> 2015-08-04  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      PR rtl-optimization/67103
>      * ifcvt.c (noce_try_store_flag_constants): Move
>      x = (-(test != 0) & (b - a)) + a transformation to...
>      (noce_try_cmove): ... Here.  Try it if normal conditional
>      move fails.
OK.

jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-10  9:43                       ` Uros Bizjak
@ 2015-08-10  9:44                         ` Kyrill Tkachov
  0 siblings, 0 replies; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-10  9:44 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H.J. Lu


On 10/08/15 10:43, Uros Bizjak wrote:
> On Mon, Aug 10, 2015 at 11:36 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>>>>> I'm testing a patch that fix the testcases on x86_64 and does not
>>>>> harm codegen on aarch64. Feel free to file a PR and assign it to me.
>>>> PR67103 [1]
>>>>
>>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103
>>> Thanks,
>>> Here's the patch to move that transformation from
>>> noce_try_store_flag_constants
>>> to noce_try_cmove after the target-specific expansion has had a go.
>>>
>>> This fixes the testcases for me on x86_64.
>>> In i386.exp I only see:
>>> FAIL: gcc.target/i386/pr49781-1.c scan-assembler-not lea[lq]?[
>>> \t]\\((%|)r[a-z0-9]*
>>> FAIL: gcc.target/i386/pr61403.c scan-assembler blend
>>>
>>> which were there before my patch.
>>> Bootstrap and testing on x86_64, arm and aarch64 is successful for me.
>>>
>>> Is this ok?
>>
>> Ping.
>> Uros, does the codegen with this patch look ok to you?
> Yes, the code of previously failing testcases looks OK.
>
> You will need an approval from rtl-optimization maintainer, though.

Sure, thanks for confirming.

Kyrill

>
> Uros.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-10  9:36                     ` Kyrill Tkachov
@ 2015-08-10  9:43                       ` Uros Bizjak
  2015-08-10  9:44                         ` Kyrill Tkachov
  0 siblings, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2015-08-10  9:43 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc-patches, Jeff Law, H.J. Lu

On Mon, Aug 10, 2015 at 11:36 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

>>>> I'm testing a patch that fix the testcases on x86_64 and does not
>>>> harm codegen on aarch64. Feel free to file a PR and assign it to me.
>>>
>>> PR67103 [1]
>>>
>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103
>>
>> Thanks,
>> Here's the patch to move that transformation from
>> noce_try_store_flag_constants
>> to noce_try_cmove after the target-specific expansion has had a go.
>>
>> This fixes the testcases for me on x86_64.
>> In i386.exp I only see:
>> FAIL: gcc.target/i386/pr49781-1.c scan-assembler-not lea[lq]?[
>> \t]\\((%|)r[a-z0-9]*
>> FAIL: gcc.target/i386/pr61403.c scan-assembler blend
>>
>> which were there before my patch.
>> Bootstrap and testing on x86_64, arm and aarch64 is successful for me.
>>
>> Is this ok?
>
>
> Ping.
> Uros, does the codegen with this patch look ok to you?

Yes, the code of previously failing testcases looks OK.

You will need an approval from rtl-optimization maintainer, though.

Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-04  8:44                   ` Kyrill Tkachov
@ 2015-08-10  9:36                     ` Kyrill Tkachov
  2015-08-10  9:43                       ` Uros Bizjak
  2015-08-12 17:52                     ` Jeff Law
  1 sibling, 1 reply; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-10  9:36 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H.J. Lu


On 04/08/15 09:44, Kyrill Tkachov wrote:
> On 03/08/15 18:37, Uros Bizjak wrote:
>> On Mon, Aug 3, 2015 at 7:20 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>>>>>>> Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
>>>>>>> don't think this is a good idea. In the expander, there is already
>>>>>>> quite some target-dependent code that goes great length to utilize sbb
>>>>>>> insn as much as possible, before cmove is used.
>>>>>>>
>>>>>>> IMO, as far as x86 is concerned, the best solution would be to revert
>>>>>>> the change. ix86_expand_int_movcc already does some tricks from your
>>>>>>> patch in a target-efficient way. Generic change that was introduced by
>>>>>>> your patch now interferes with this expansion.
>>>>>> Well, technically the transformation was already there, it was just
>>>>>> never
>>>>>> reached for an x86 compilation because noce_try_cmove was tried in front
>>>>>> of
>>>>>> it
>>>>>> and used a target-specific expansion.
>>>>>> In any case, how's this proposal?
>>>>>> The transformation noce_try_store_flag_constants
>>>>>>           /* if (test) x = a; else x = b;
>>>>>>          =>   x = (-(test != 0) & (b - a)) + a;  */
>>>>>>
>>>>>> Is a catch-all-immediates transformation in
>>>>>> noce_try_store_flag_constants.
>>>>>> What if we moved it to noce_try_cmove and performed it only if the
>>>>>> target-specific
>>>>>> conditional move expansion there failed?
>>>>>>
>>>>>> That way we can try the x86_64-specific sequence first and still give
>>>>>> the
>>>>>> opportunity
>>>>>> to noce_try_store_flag_constants to perform the transformations that can
>>>>>> benefit targets
>>>>>> that don't have highly specific conditional move expanders.
>>>>> Yes, let's try this approach. As was found out, some targets (e.g.
>>>>> x86) hide lots of different target-dependent expansion strategies into
>>>>> movcc expander. Perhaps this fact should be documented in the comment
>>>>> in the generic code?
>>>> Ok, I'll work on that approach and add a comment.
>>> I'm testing a patch that fix the testcases on x86_64 and does not
>>> harm codegen on aarch64. Feel free to file a PR and assign it to me.
>> PR67103 [1]
>>
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103
> Thanks,
> Here's the patch to move that transformation from noce_try_store_flag_constants
> to noce_try_cmove after the target-specific expansion has had a go.
>
> This fixes the testcases for me on x86_64.
> In i386.exp I only see:
> FAIL: gcc.target/i386/pr49781-1.c scan-assembler-not lea[lq]?[ \t]\\((%|)r[a-z0-9]*
> FAIL: gcc.target/i386/pr61403.c scan-assembler blend
>
> which were there before my patch.
> Bootstrap and testing on x86_64, arm and aarch64 is successful for me.
>
> Is this ok?

Ping.
Uros, does the codegen with this patch look ok to you?

Thanks,
Kyrill

>
> Thanks,
> Kyrill
>
> 2015-08-04  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       PR rtl-optimization/67103
>       * ifcvt.c (noce_try_store_flag_constants): Move
>       x = (-(test != 0) & (b - a)) + a transformation to...
>       (noce_try_cmove): ... Here.  Try it if normal conditional
>       move fails.
>
>
>> Thanks,
>> Uros.
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 17:37                 ` Uros Bizjak
@ 2015-08-04  8:44                   ` Kyrill Tkachov
  2015-08-10  9:36                     ` Kyrill Tkachov
  2015-08-12 17:52                     ` Jeff Law
  0 siblings, 2 replies; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-04  8:44 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 3031 bytes --]


On 03/08/15 18:37, Uros Bizjak wrote:
> On Mon, Aug 3, 2015 at 7:20 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>>>>>> Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
>>>>>> don't think this is a good idea. In the expander, there is already
>>>>>> quite some target-dependent code that goes great length to utilize sbb
>>>>>> insn as much as possible, before cmove is used.
>>>>>>
>>>>>> IMO, as far as x86 is concerned, the best solution would be to revert
>>>>>> the change. ix86_expand_int_movcc already does some tricks from your
>>>>>> patch in a target-efficient way. Generic change that was introduced by
>>>>>> your patch now interferes with this expansion.
>>>>> Well, technically the transformation was already there, it was just
>>>>> never
>>>>> reached for an x86 compilation because noce_try_cmove was tried in front
>>>>> of
>>>>> it
>>>>> and used a target-specific expansion.
>>>>> In any case, how's this proposal?
>>>>> The transformation noce_try_store_flag_constants
>>>>>          /* if (test) x = a; else x = b;
>>>>>         =>   x = (-(test != 0) & (b - a)) + a;  */
>>>>>
>>>>> Is a catch-all-immediates transformation in
>>>>> noce_try_store_flag_constants.
>>>>> What if we moved it to noce_try_cmove and performed it only if the
>>>>> target-specific
>>>>> conditional move expansion there failed?
>>>>>
>>>>> That way we can try the x86_64-specific sequence first and still give
>>>>> the
>>>>> opportunity
>>>>> to noce_try_store_flag_constants to perform the transformations that can
>>>>> benefit targets
>>>>> that don't have highly specific conditional move expanders.
>>>> Yes, let's try this approach. As was found out, some targets (e.g.
>>>> x86) hide lots of different target-dependent expansion strategies into
>>>> movcc expander. Perhaps this fact should be documented in the comment
>>>> in the generic code?
>>> Ok, I'll work on that approach and add a comment.
>>
>> I'm testing a patch that fix the testcases on x86_64 and does not
>> harm codegen on aarch64. Feel free to file a PR and assign it to me.
> PR67103 [1]
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103

Thanks,
Here's the patch to move that transformation from noce_try_store_flag_constants
to noce_try_cmove after the target-specific expansion has had a go.

This fixes the testcases for me on x86_64.
In i386.exp I only see:
FAIL: gcc.target/i386/pr49781-1.c scan-assembler-not lea[lq]?[ \t]\\((%|)r[a-z0-9]*
FAIL: gcc.target/i386/pr61403.c scan-assembler blend

which were there before my patch.
Bootstrap and testing on x86_64, arm and aarch64 is successful for me.

Is this ok?

Thanks,
Kyrill

2015-08-04  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/67103
     * ifcvt.c (noce_try_store_flag_constants): Move
     x = (-(test != 0) & (b - a)) + a transformation to...
     (noce_try_cmove): ... Here.  Try it if normal conditional
     move fails.


>
> Thanks,
> Uros.
>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: uros.patch --]
[-- Type: text/x-patch; name=uros.patch, Size: 3221 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 0ebe107..b06eaab 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1410,9 +1410,6 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  normalize = -1;
 	  reversep = true;
 	}
-      else if ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
-	       || if_info->branch_cost >= 3)
-	normalize = -1;
       else
 	return FALSE;
 
@@ -1481,18 +1478,10 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 					target, gen_int_mode (ifalse, mode),
 					if_info->x, 0, OPTAB_WIDEN);
 	}
-
-      /* if (test) x = a; else x = b;
-	 =>   x = (-(test != 0) & (b - a)) + a;  */
       else
 	{
-	  target = expand_simple_binop (mode, AND,
-					target, gen_int_mode (diff, mode),
-					if_info->x, 0, OPTAB_WIDEN);
-	  if (target)
-	    target = expand_simple_binop (mode, PLUS,
-					  target, gen_int_mode (ifalse, mode),
-					  if_info->x, 0, OPTAB_WIDEN);
+	  end_sequence ();
+	  return FALSE;
 	}
 
       if (! target)
@@ -1818,11 +1807,67 @@ noce_try_cmove (struct noce_if_info *if_info)
 				   INSN_LOCATION (if_info->insn_a));
 	  return TRUE;
 	}
-      else
+      /* If both a and b are constants try a last-ditch transformation:
+	 if (test) x = a; else x = b;
+	 =>   x = (-(test != 0) & (b - a)) + a;
+	 Try this only if the target-specific expansion above has failed.
+	 The target-specific expander may want to generate sequences that
+	 we don't know about, so give them a chance before trying this
+	 approach.  */
+      else if (!targetm.have_conditional_execution ()
+		&& CONST_INT_P (if_info->a) && CONST_INT_P (if_info->b)
+		&& ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
+		    || if_info->branch_cost >= 3))
 	{
-	  end_sequence ();
-	  return FALSE;
+	  machine_mode mode = GET_MODE (if_info->x);
+	  HOST_WIDE_INT ifalse = INTVAL (if_info->a);
+	  HOST_WIDE_INT itrue = INTVAL (if_info->b);
+	  rtx target = noce_emit_store_flag (if_info, if_info->x, false, -1);
+	  if (!target)
+	    {
+	      end_sequence ();
+	      return FALSE;
+	    }
+
+	  HOST_WIDE_INT diff = (unsigned HOST_WIDE_INT) itrue - ifalse;
+	  /* Make sure we can represent the difference
+	     between the two values.  */
+	  if ((diff > 0)
+	      != ((ifalse < 0) != (itrue < 0) ? ifalse < 0 : ifalse < itrue))
+	    {
+	      end_sequence ();
+	      return FALSE;
+	    }
+
+	  diff = trunc_int_for_mode (diff, mode);
+	  target = expand_simple_binop (mode, AND,
+					target, gen_int_mode (diff, mode),
+					if_info->x, 0, OPTAB_WIDEN);
+	  if (target)
+	    target = expand_simple_binop (mode, PLUS,
+					  target, gen_int_mode (ifalse, mode),
+					  if_info->x, 0, OPTAB_WIDEN);
+	  if (target)
+	    {
+	      if (target != if_info->x)
+		noce_emit_move_insn (if_info->x, target);
+
+	      seq = end_ifcvt_sequence (if_info);
+	      if (!seq)
+		return FALSE;
+
+	      emit_insn_before_setloc (seq, if_info->jump,
+				   INSN_LOCATION (if_info->insn_a));
+	      return TRUE;
+	    }
+	  else
+	    {
+	      end_sequence ();
+	      return FALSE;
+	    }
 	}
+      else
+	end_sequence ();
     }
 
   return FALSE;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 17:20               ` Kyrill Tkachov
@ 2015-08-03 17:37                 ` Uros Bizjak
  2015-08-04  8:44                   ` Kyrill Tkachov
  0 siblings, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2015-08-03 17:37 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc-patches, Jeff Law, H.J. Lu

On Mon, Aug 3, 2015 at 7:20 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

>>>>> Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
>>>>> don't think this is a good idea. In the expander, there is already
>>>>> quite some target-dependent code that goes great length to utilize sbb
>>>>> insn as much as possible, before cmove is used.
>>>>>
>>>>> IMO, as far as x86 is concerned, the best solution would be to revert
>>>>> the change. ix86_expand_int_movcc already does some tricks from your
>>>>> patch in a target-efficient way. Generic change that was introduced by
>>>>> your patch now interferes with this expansion.
>>>>
>>>> Well, technically the transformation was already there, it was just
>>>> never
>>>> reached for an x86 compilation because noce_try_cmove was tried in front
>>>> of
>>>> it
>>>> and used a target-specific expansion.
>>>> In any case, how's this proposal?
>>>> The transformation noce_try_store_flag_constants
>>>>         /* if (test) x = a; else x = b;
>>>>        =>   x = (-(test != 0) & (b - a)) + a;  */
>>>>
>>>> Is a catch-all-immediates transformation in
>>>> noce_try_store_flag_constants.
>>>> What if we moved it to noce_try_cmove and performed it only if the
>>>> target-specific
>>>> conditional move expansion there failed?
>>>>
>>>> That way we can try the x86_64-specific sequence first and still give
>>>> the
>>>> opportunity
>>>> to noce_try_store_flag_constants to perform the transformations that can
>>>> benefit targets
>>>> that don't have highly specific conditional move expanders.
>>>
>>> Yes, let's try this approach. As was found out, some targets (e.g.
>>> x86) hide lots of different target-dependent expansion strategies into
>>> movcc expander. Perhaps this fact should be documented in the comment
>>> in the generic code?
>>
>> Ok, I'll work on that approach and add a comment.
>
>
> I'm testing a patch that fix the testcases on x86_64 and does not
> harm codegen on aarch64. Feel free to file a PR and assign it to me.

PR67103 [1]

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 15:53             ` Kyrill Tkachov
@ 2015-08-03 17:20               ` Kyrill Tkachov
  2015-08-03 17:37                 ` Uros Bizjak
  0 siblings, 1 reply; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-03 17:20 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H.J. Lu


On 03/08/15 16:53, Kyrill Tkachov wrote:
> On 03/08/15 16:45, Uros Bizjak wrote:
>> On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>>>> Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
>>>> don't think this is a good idea. In the expander, there is already
>>>> quite some target-dependent code that goes great length to utilize sbb
>>>> insn as much as possible, before cmove is used.
>>>>
>>>> IMO, as far as x86 is concerned, the best solution would be to revert
>>>> the change. ix86_expand_int_movcc already does some tricks from your
>>>> patch in a target-efficient way. Generic change that was introduced by
>>>> your patch now interferes with this expansion.
>>> Well, technically the transformation was already there, it was just never
>>> reached for an x86 compilation because noce_try_cmove was tried in front of
>>> it
>>> and used a target-specific expansion.
>>> In any case, how's this proposal?
>>> The transformation noce_try_store_flag_constants
>>>         /* if (test) x = a; else x = b;
>>>        =>   x = (-(test != 0) & (b - a)) + a;  */
>>>
>>> Is a catch-all-immediates transformation in noce_try_store_flag_constants.
>>> What if we moved it to noce_try_cmove and performed it only if the
>>> target-specific
>>> conditional move expansion there failed?
>>>
>>> That way we can try the x86_64-specific sequence first and still give the
>>> opportunity
>>> to noce_try_store_flag_constants to perform the transformations that can
>>> benefit targets
>>> that don't have highly specific conditional move expanders.
>> Yes, let's try this approach. As was found out, some targets (e.g.
>> x86) hide lots of different target-dependent expansion strategies into
>> movcc expander. Perhaps this fact should be documented in the comment
>> in the generic code?
> Ok, I'll work on that approach and add a comment.

I'm testing a patch that fix the testcases on x86_64 and does not
harm codegen on aarch64. Feel free to file a PR and assign it to me.

Thanks,
Kyrill

>
> Thanks,
> Kyrill
>
>
>> Uros.
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 15:50     ` Ilya Enkovich
@ 2015-08-03 16:29       ` Jeff Law
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2015-08-03 16:29 UTC (permalink / raw)
  To: Ilya Enkovich, Uros Bizjak; +Cc: Kyrill Tkachov, gcc-patches, H.J. Lu

On 08/03/2015 09:50 AM, Ilya Enkovich wrote:

>
> The original code looks better, tree height is just 2 and therefore it
> can be executed in 2 cycles. New code has more dependencies and tree
> height becomes 5. It is always hard to say for all x86 targets but as
> a generic code the original version is better.
Agreed.  Reducing tree height is definitely a good thing as a general rule.

jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 15:46           ` Uros Bizjak
@ 2015-08-03 15:53             ` Kyrill Tkachov
  2015-08-03 17:20               ` Kyrill Tkachov
  0 siblings, 1 reply; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-03 15:53 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H.J. Lu


On 03/08/15 16:45, Uros Bizjak wrote:
> On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>>> Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
>>> don't think this is a good idea. In the expander, there is already
>>> quite some target-dependent code that goes great length to utilize sbb
>>> insn as much as possible, before cmove is used.
>>>
>>> IMO, as far as x86 is concerned, the best solution would be to revert
>>> the change. ix86_expand_int_movcc already does some tricks from your
>>> patch in a target-efficient way. Generic change that was introduced by
>>> your patch now interferes with this expansion.
>>
>> Well, technically the transformation was already there, it was just never
>> reached for an x86 compilation because noce_try_cmove was tried in front of
>> it
>> and used a target-specific expansion.
>> In any case, how's this proposal?
>> The transformation noce_try_store_flag_constants
>>        /* if (test) x = a; else x = b;
>>       =>   x = (-(test != 0) & (b - a)) + a;  */
>>
>> Is a catch-all-immediates transformation in noce_try_store_flag_constants.
>> What if we moved it to noce_try_cmove and performed it only if the
>> target-specific
>> conditional move expansion there failed?
>>
>> That way we can try the x86_64-specific sequence first and still give the
>> opportunity
>> to noce_try_store_flag_constants to perform the transformations that can
>> benefit targets
>> that don't have highly specific conditional move expanders.
> Yes, let's try this approach. As was found out, some targets (e.g.
> x86) hide lots of different target-dependent expansion strategies into
> movcc expander. Perhaps this fact should be documented in the comment
> in the generic code?

Ok, I'll work on that approach and add a comment.

Thanks,
Kyrill


>
> Uros.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 14:04   ` Uros Bizjak
  2015-08-03 14:37     ` H.J. Lu
@ 2015-08-03 15:50     ` Ilya Enkovich
  2015-08-03 16:29       ` Jeff Law
  1 sibling, 1 reply; 24+ messages in thread
From: Ilya Enkovich @ 2015-08-03 15:50 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Kyrill Tkachov, gcc-patches, Jeff Law, H.J. Lu

2015-08-03 17:04 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>> On 03/08/15 13:33, Uros Bizjak wrote:
>>>
>>> Hello!
>>>
>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>>      when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>>      explicit.  Prefer to add the flag whenever possible.
>>>>      (noce_process_if_block): Try noce_try_store_flag_constants before
>>>>      noce_try_cmove.
>>>>
>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>>      * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>>
>>> This patch regressed following tests on x86_64:
>>>
>>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
>
> The difference for cmov3.c on x86_64 is:
>
>        cmpl    %esi, %edi
>        movl    $-5, %edx
>        movl    $5, %eax
>        cmovg   %edx, %eax
>        ret
>
> vs. new code:
>
>        xorl    %eax, %eax
>        cmpl    %esi, %edi
>        setle   %al
>        negl    %eax
>        andl    $10, %eax
>        subl    $5, %eax
>        ret
>
> I'm not sure old code is really better than new. HJ, do you have any
> better insight?
>
> Uros.

The original code looks better, tree height is just 2 and therefore it
can be executed in 2 cycles. New code has more dependencies and tree
height becomes 5. It is always hard to say for all x86 targets but as
a generic code the original version is better.

Thanks,
Ilya

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 15:37         ` Kyrill Tkachov
@ 2015-08-03 15:46           ` Uros Bizjak
  2015-08-03 15:53             ` Kyrill Tkachov
  0 siblings, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2015-08-03 15:46 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc-patches, Jeff Law, H.J. Lu

On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

>> Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
>> don't think this is a good idea. In the expander, there is already
>> quite some target-dependent code that goes great length to utilize sbb
>> insn as much as possible, before cmove is used.
>>
>> IMO, as far as x86 is concerned, the best solution would be to revert
>> the change. ix86_expand_int_movcc already does some tricks from your
>> patch in a target-efficient way. Generic change that was introduced by
>> your patch now interferes with this expansion.
>
>
> Well, technically the transformation was already there, it was just never
> reached for an x86 compilation because noce_try_cmove was tried in front of
> it
> and used a target-specific expansion.
> In any case, how's this proposal?
> The transformation noce_try_store_flag_constants
>       /* if (test) x = a; else x = b;
>      =>   x = (-(test != 0) & (b - a)) + a;  */
>
> Is a catch-all-immediates transformation in noce_try_store_flag_constants.
> What if we moved it to noce_try_cmove and performed it only if the
> target-specific
> conditional move expansion there failed?
>
> That way we can try the x86_64-specific sequence first and still give the
> opportunity
> to noce_try_store_flag_constants to perform the transformations that can
> benefit targets
> that don't have highly specific conditional move expanders.

Yes, let's try this approach. As was found out, some targets (e.g.
x86) hide lots of different target-dependent expansion strategies into
movcc expander. Perhaps this fact should be documented in the comment
in the generic code?

Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 14:15       ` Uros Bizjak
@ 2015-08-03 15:37         ` Kyrill Tkachov
  2015-08-03 15:46           ` Uros Bizjak
  0 siblings, 1 reply; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-03 15:37 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H.J. Lu


On 03/08/15 15:15, Uros Bizjak wrote:
> On Mon, Aug 3, 2015 at 3:43 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> On 03/08/15 14:37, Uros Bizjak wrote:
>>> On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com>
>>> wrote:
>>>> On 03/08/15 13:33, Uros Bizjak wrote:
>>>>> Hello!
>>>>>
>>>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>>
>>>>>>        * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>>>>        when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>>>>        explicit.  Prefer to add the flag whenever possible.
>>>>>>        (noce_process_if_block): Try noce_try_store_flag_constants before
>>>>>>        noce_try_cmove.
>>>>>>
>>>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>>
>>>>>>        * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>>>>        * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>>>> This patch regressed following tests on x86_64:
>>>>>
>>>>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>>>>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
>>>>
>>>> Sorry for that :(
>>>> I could have sworn they passed for me during my run...
>>>>
>>>>> While cmov3 case is questionable by itself in light of PR56309 [1],
>>>>> the cnov2 case regressed from:
>>>>>
>>>>>           cmpl    %edi, %esi
>>>>>           sbbl    %eax, %eax
>>>>>           andl    $-10, %eax
>>>>>           addl    $5, %eax
>>>>>           ret
>>>>>
>>>>> to:
>>>>>
>>>>>           xorl    %eax, %eax
>>>>>           cmpl    %esi, %edi
>>>>>           setbe   %al
>>>>>           negl    %eax
>>>>>           andl    $10, %eax
>>>>>           subl    $5, %eax
>>>>>           ret
>>>>>
>>>>> Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
>>>>> anymore. Non-QImode setcc instructions are problematic on x86, since
>>>>> they need to be zero-extended to their full length.
>>> IMO, we can check if the target is able to generate insn that
>>> implements conditional move between 0 and -1 for certain comparison
>>> mode, like:
>>>
>>> #(insn:TI 27 26 28 2 (parallel [
>>> #            (set (reg:SI 0 ax [orig:87 D.1845 ] [87])
>>> #                (if_then_else:SI (ltu:SI (reg:CC 17 flags)
>>> #                        (const_int 0 [0]))
>>> #                    (const_int -1 [0xffffffffffffffff])
>>> #                    (const_int 0 [0])))
>>> #            (clobber (reg:CC 17 flags))
>>> #        ]) cmov2.c:9 947 {*x86_movsicc_0_m1}
>>> #     (expr_list:REG_DEAD (reg:CC 17 flags)
>>> #        (expr_list:REG_UNUSED (reg:CC 17 flags)
>>> #            (nil))))
>>>          sbbl    %eax, %eax      # 27    *x86_movsicc_0_m1       [length =
>>> 2]
>>>
>>> this is the key insn, and as shown above, further asm sequence is
>>> similar between patched and unpatched compiler.
>>
>> Hmm yes.
>> We have a HAVE_conditional_move check, but that's not fine grained enough.
>> How about trying an emit_conditional_move and checking that the result is a
>> single insn?
> Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
> don't think this is a good idea. In the expander, there is already
> quite some target-dependent code that goes great length to utilize sbb
> insn as much as possible, before cmove is used.
>
> IMO, as far as x86 is concerned, the best solution would be to revert
> the change. ix86_expand_int_movcc already does some tricks from your
> patch in a target-efficient way. Generic change that was introduced by
> your patch now interferes with this expansion.

Well, technically the transformation was already there, it was just never
reached for an x86 compilation because noce_try_cmove was tried in front of it
and used a target-specific expansion.
In any case, how's this proposal?
The transformation noce_try_store_flag_constants
       /* if (test) x = a; else x = b;
      =>   x = (-(test != 0) & (b - a)) + a;  */

Is a catch-all-immediates transformation in noce_try_store_flag_constants.
What if we moved it to noce_try_cmove and performed it only if the target-specific
conditional move expansion there failed?

That way we can try the x86_64-specific sequence first and still give the opportunity
to noce_try_store_flag_constants to perform the transformations that can benefit targets
that don't have highly specific conditional move expanders.

Kyrill

>
> Uros.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 14:04   ` Uros Bizjak
@ 2015-08-03 14:37     ` H.J. Lu
  2015-08-03 15:50     ` Ilya Enkovich
  1 sibling, 0 replies; 24+ messages in thread
From: H.J. Lu @ 2015-08-03 14:37 UTC (permalink / raw)
  To: Uros Bizjak, Zamyatin, Igor; +Cc: Kyrill Tkachov, gcc-patches, Jeff Law

On Mon, Aug 3, 2015 at 7:04 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>> On 03/08/15 13:33, Uros Bizjak wrote:
>>>
>>> Hello!
>>>
>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>>      when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>>      explicit.  Prefer to add the flag whenever possible.
>>>>      (noce_process_if_block): Try noce_try_store_flag_constants before
>>>>      noce_try_cmove.
>>>>
>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>>      * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>>
>>> This patch regressed following tests on x86_64:
>>>
>>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
>
> The difference for cmov3.c on x86_64 is:
>
>        cmpl    %esi, %edi
>        movl    $-5, %edx
>        movl    $5, %eax
>        cmovg   %edx, %eax
>        ret
>
> vs. new code:
>
>        xorl    %eax, %eax
>        cmpl    %esi, %edi
>        setle   %al
>        negl    %eax
>        andl    $10, %eax
>        subl    $5, %eax
>        ret
>
> I'm not sure old code is really better than new. HJ, do you have any
> better insight?

We are looking into it.

-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 13:43     ` Kyrill Tkachov
  2015-08-03 14:12       ` Kyrill Tkachov
@ 2015-08-03 14:15       ` Uros Bizjak
  2015-08-03 15:37         ` Kyrill Tkachov
  1 sibling, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2015-08-03 14:15 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc-patches, Jeff Law, H.J. Lu

On Mon, Aug 3, 2015 at 3:43 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
> On 03/08/15 14:37, Uros Bizjak wrote:
>>
>> On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com>
>> wrote:
>>>
>>> On 03/08/15 13:33, Uros Bizjak wrote:
>>>>
>>>> Hello!
>>>>
>>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>       * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>>>       when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>>>       explicit.  Prefer to add the flag whenever possible.
>>>>>       (noce_process_if_block): Try noce_try_store_flag_constants before
>>>>>       noce_try_cmove.
>>>>>
>>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>       * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>>>       * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>>>
>>>> This patch regressed following tests on x86_64:
>>>>
>>>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>>>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
>>>
>>>
>>> Sorry for that :(
>>> I could have sworn they passed for me during my run...
>>>
>>>> While cmov3 case is questionable by itself in light of PR56309 [1],
>>>> the cnov2 case regressed from:
>>>>
>>>>          cmpl    %edi, %esi
>>>>          sbbl    %eax, %eax
>>>>          andl    $-10, %eax
>>>>          addl    $5, %eax
>>>>          ret
>>>>
>>>> to:
>>>>
>>>>          xorl    %eax, %eax
>>>>          cmpl    %esi, %edi
>>>>          setbe   %al
>>>>          negl    %eax
>>>>          andl    $10, %eax
>>>>          subl    $5, %eax
>>>>          ret
>>>>
>>>> Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
>>>> anymore. Non-QImode setcc instructions are problematic on x86, since
>>>> they need to be zero-extended to their full length.
>>
>> IMO, we can check if the target is able to generate insn that
>> implements conditional move between 0 and -1 for certain comparison
>> mode, like:
>>
>> #(insn:TI 27 26 28 2 (parallel [
>> #            (set (reg:SI 0 ax [orig:87 D.1845 ] [87])
>> #                (if_then_else:SI (ltu:SI (reg:CC 17 flags)
>> #                        (const_int 0 [0]))
>> #                    (const_int -1 [0xffffffffffffffff])
>> #                    (const_int 0 [0])))
>> #            (clobber (reg:CC 17 flags))
>> #        ]) cmov2.c:9 947 {*x86_movsicc_0_m1}
>> #     (expr_list:REG_DEAD (reg:CC 17 flags)
>> #        (expr_list:REG_UNUSED (reg:CC 17 flags)
>> #            (nil))))
>>         sbbl    %eax, %eax      # 27    *x86_movsicc_0_m1       [length =
>> 2]
>>
>> this is the key insn, and as shown above, further asm sequence is
>> similar between patched and unpatched compiler.
>
>
> Hmm yes.
> We have a HAVE_conditional_move check, but that's not fine grained enough.
> How about trying an emit_conditional_move and checking that the result is a
> single insn?

Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
don't think this is a good idea. In the expander, there is already
quite some target-dependent code that goes great length to utilize sbb
insn as much as possible, before cmove is used.

IMO, as far as x86 is concerned, the best solution would be to revert
the change. ix86_expand_int_movcc already does some tricks from your
patch in a target-efficient way. Generic change that was introduced by
your patch now interferes with this expansion.

Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 13:43     ` Kyrill Tkachov
@ 2015-08-03 14:12       ` Kyrill Tkachov
  2015-08-03 14:15       ` Uros Bizjak
  1 sibling, 0 replies; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-03 14:12 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H.J. Lu


On 03/08/15 14:43, Kyrill Tkachov wrote:
> On 03/08/15 14:37, Uros Bizjak wrote:
>> On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>> On 03/08/15 13:33, Uros Bizjak wrote:
>>>> Hello!
>>>>
>>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>        * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>>>        when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>>>        explicit.  Prefer to add the flag whenever possible.
>>>>>        (noce_process_if_block): Try noce_try_store_flag_constants before
>>>>>        noce_try_cmove.
>>>>>
>>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>        * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>>>        * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>>> This patch regressed following tests on x86_64:
>>>>
>>>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>>>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
>>> Sorry for that :(
>>> I could have sworn they passed for me during my run...
>>>
>>>> While cmov3 case is questionable by itself in light of PR56309 [1],
>>>> the cnov2 case regressed from:
>>>>
>>>>           cmpl    %edi, %esi
>>>>           sbbl    %eax, %eax
>>>>           andl    $-10, %eax
>>>>           addl    $5, %eax
>>>>           ret
>>>>
>>>> to:
>>>>
>>>>           xorl    %eax, %eax
>>>>           cmpl    %esi, %edi
>>>>           setbe   %al
>>>>           negl    %eax
>>>>           andl    $10, %eax
>>>>           subl    $5, %eax
>>>>           ret
>>>>
>>>> Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
>>>> anymore. Non-QImode setcc instructions are problematic on x86, since
>>>> they need to be zero-extended to their full length.
>> IMO, we can check if the target is able to generate insn that
>> implements conditional move between 0 and -1 for certain comparison
>> mode, like:
>>
>> #(insn:TI 27 26 28 2 (parallel [
>> #            (set (reg:SI 0 ax [orig:87 D.1845 ] [87])
>> #                (if_then_else:SI (ltu:SI (reg:CC 17 flags)
>> #                        (const_int 0 [0]))
>> #                    (const_int -1 [0xffffffffffffffff])
>> #                    (const_int 0 [0])))
>> #            (clobber (reg:CC 17 flags))
>> #        ]) cmov2.c:9 947 {*x86_movsicc_0_m1}
>> #     (expr_list:REG_DEAD (reg:CC 17 flags)
>> #        (expr_list:REG_UNUSED (reg:CC 17 flags)
>> #            (nil))))
>>          sbbl    %eax, %eax      # 27    *x86_movsicc_0_m1       [length = 2]
>>
>> this is the key insn, and as shown above, further asm sequence is
>> similar between patched and unpatched compiler.
> Hmm yes.
> We have a HAVE_conditional_move check, but that's not fine grained enough.
> How about trying an emit_conditional_move and checking that the result is a
> single insn?

Hmm, looking deeper into it, I see that:

        /* if (test) x = a; else x = b;
       =>   x = (-(test != 0) & (b - a)) + a;  */

is exactly the transformation that:
        cmpl    %edi, %esi
        sbbl    %eax, %eax
        andl    $-10, %eax
        addl    $5, %eax

is supposed to represent, right? It seems that noce_try_store_flag_constants is just
not emitting the optimal rtx forms for x86
So, if before the patch the code in noce_try_store_flag_constants was never being called
what performed this transformation? Turns out it's the conditional move expander in
emit_conditional_move that ends up calling the ix86_expand_int_movcc that performs lots of
these complex transformations, including this one in an x86-specific way.

I suppose that's why noce_try_store_flag_constants was tried after noce_try_cmove.
Because, it is expected that noce_try_store_flag_constants will call the conditional move
expander and the target-specific conditional move expander is expected to do all the complex
transformations beneficial to the target. I suppose the comment "Try only simple constants and registers here"
above noce_try_cmove is somewhat misleading then.

I see a couple of avenues here. We try both noce_try_cmove and noce_try_store_flag_constants, compare their
seq_cost and pick the cheapest, which sounds somewhat ugly to me, or in noce_try_store_flag_constants we
try to be more fine-grained and try to restrict some of the transformations, but on what criteria?

Kyrill




>
> Kyrill
>
>
>> Uros.
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 13:02 ` Kyrill Tkachov
  2015-08-03 13:37   ` Uros Bizjak
@ 2015-08-03 14:04   ` Uros Bizjak
  2015-08-03 14:37     ` H.J. Lu
  2015-08-03 15:50     ` Ilya Enkovich
  1 sibling, 2 replies; 24+ messages in thread
From: Uros Bizjak @ 2015-08-03 14:04 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc-patches, Jeff Law, H.J. Lu

On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
> On 03/08/15 13:33, Uros Bizjak wrote:
>>
>> Hello!
>>
>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>      when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>      explicit.  Prefer to add the flag whenever possible.
>>>      (noce_process_if_block): Try noce_try_store_flag_constants before
>>>      noce_try_cmove.
>>>
>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>      * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>
>> This patch regressed following tests on x86_64:
>>
>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]

The difference for cmov3.c on x86_64 is:

       cmpl    %esi, %edi
       movl    $-5, %edx
       movl    $5, %eax
       cmovg   %edx, %eax
       ret

vs. new code:

       xorl    %eax, %eax
       cmpl    %esi, %edi
       setle   %al
       negl    %eax
       andl    $10, %eax
       subl    $5, %eax
       ret

I'm not sure old code is really better than new. HJ, do you have any
better insight?

Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 13:37   ` Uros Bizjak
@ 2015-08-03 13:43     ` Kyrill Tkachov
  2015-08-03 14:12       ` Kyrill Tkachov
  2015-08-03 14:15       ` Uros Bizjak
  0 siblings, 2 replies; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-03 13:43 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H.J. Lu


On 03/08/15 14:37, Uros Bizjak wrote:
> On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> On 03/08/15 13:33, Uros Bizjak wrote:
>>> Hello!
>>>
>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>>       when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>>       explicit.  Prefer to add the flag whenever possible.
>>>>       (noce_process_if_block): Try noce_try_store_flag_constants before
>>>>       noce_try_cmove.
>>>>
>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>>       * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>> This patch regressed following tests on x86_64:
>>>
>>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
>>
>> Sorry for that :(
>> I could have sworn they passed for me during my run...
>>
>>> While cmov3 case is questionable by itself in light of PR56309 [1],
>>> the cnov2 case regressed from:
>>>
>>>          cmpl    %edi, %esi
>>>          sbbl    %eax, %eax
>>>          andl    $-10, %eax
>>>          addl    $5, %eax
>>>          ret
>>>
>>> to:
>>>
>>>          xorl    %eax, %eax
>>>          cmpl    %esi, %edi
>>>          setbe   %al
>>>          negl    %eax
>>>          andl    $10, %eax
>>>          subl    $5, %eax
>>>          ret
>>>
>>> Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
>>> anymore. Non-QImode setcc instructions are problematic on x86, since
>>> they need to be zero-extended to their full length.
> IMO, we can check if the target is able to generate insn that
> implements conditional move between 0 and -1 for certain comparison
> mode, like:
>
> #(insn:TI 27 26 28 2 (parallel [
> #            (set (reg:SI 0 ax [orig:87 D.1845 ] [87])
> #                (if_then_else:SI (ltu:SI (reg:CC 17 flags)
> #                        (const_int 0 [0]))
> #                    (const_int -1 [0xffffffffffffffff])
> #                    (const_int 0 [0])))
> #            (clobber (reg:CC 17 flags))
> #        ]) cmov2.c:9 947 {*x86_movsicc_0_m1}
> #     (expr_list:REG_DEAD (reg:CC 17 flags)
> #        (expr_list:REG_UNUSED (reg:CC 17 flags)
> #            (nil))))
>         sbbl    %eax, %eax      # 27    *x86_movsicc_0_m1       [length = 2]
>
> this is the key insn, and as shown above, further asm sequence is
> similar between patched and unpatched compiler.

Hmm yes.
We have a HAVE_conditional_move check, but that's not fine grained enough.
How about trying an emit_conditional_move and checking that the result is a
single insn?

Kyrill


>
> Uros.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 13:02 ` Kyrill Tkachov
@ 2015-08-03 13:37   ` Uros Bizjak
  2015-08-03 13:43     ` Kyrill Tkachov
  2015-08-03 14:04   ` Uros Bizjak
  1 sibling, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2015-08-03 13:37 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc-patches, Jeff Law, H.J. Lu

On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
> On 03/08/15 13:33, Uros Bizjak wrote:
>>
>> Hello!
>>
>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>      when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>      explicit.  Prefer to add the flag whenever possible.
>>>      (noce_process_if_block): Try noce_try_store_flag_constants before
>>>      noce_try_cmove.
>>>
>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>      * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>
>> This patch regressed following tests on x86_64:
>>
>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
>
>
> Sorry for that :(
> I could have sworn they passed for me during my run...
>
>> While cmov3 case is questionable by itself in light of PR56309 [1],
>> the cnov2 case regressed from:
>>
>>         cmpl    %edi, %esi
>>         sbbl    %eax, %eax
>>         andl    $-10, %eax
>>         addl    $5, %eax
>>         ret
>>
>> to:
>>
>>         xorl    %eax, %eax
>>         cmpl    %esi, %edi
>>         setbe   %al
>>         negl    %eax
>>         andl    $10, %eax
>>         subl    $5, %eax
>>         ret
>>
>> Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
>> anymore. Non-QImode setcc instructions are problematic on x86, since
>> they need to be zero-extended to their full length.

IMO, we can check if the target is able to generate insn that
implements conditional move between 0 and -1 for certain comparison
mode, like:

#(insn:TI 27 26 28 2 (parallel [
#            (set (reg:SI 0 ax [orig:87 D.1845 ] [87])
#                (if_then_else:SI (ltu:SI (reg:CC 17 flags)
#                        (const_int 0 [0]))
#                    (const_int -1 [0xffffffffffffffff])
#                    (const_int 0 [0])))
#            (clobber (reg:CC 17 flags))
#        ]) cmov2.c:9 947 {*x86_movsicc_0_m1}
#     (expr_list:REG_DEAD (reg:CC 17 flags)
#        (expr_list:REG_UNUSED (reg:CC 17 flags)
#            (nil))))
       sbbl    %eax, %eax      # 27    *x86_movsicc_0_m1       [length = 2]

this is the key insn, and as shown above, further asm sequence is
similar between patched and unpatched compiler.

Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
  2015-08-03 12:33 Uros Bizjak
@ 2015-08-03 13:02 ` Kyrill Tkachov
  2015-08-03 13:37   ` Uros Bizjak
  2015-08-03 14:04   ` Uros Bizjak
  0 siblings, 2 replies; 24+ messages in thread
From: Kyrill Tkachov @ 2015-08-03 13:02 UTC (permalink / raw)
  To: Uros Bizjak, gcc-patches; +Cc: Jeff Law, H.J. Lu


On 03/08/15 13:33, Uros Bizjak wrote:
> Hello!
>
>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>      when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>      explicit.  Prefer to add the flag whenever possible.
>>      (noce_process_if_block): Try noce_try_store_flag_constants before
>>      noce_try_cmove.
>>
>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * gcc.target/aarch64/csel_bfx_1.c: New test.
>>      * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
> This patch regressed following tests on x86_64:
>
> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]

Sorry for that :(
I could have sworn they passed for me during my run...

> While cmov3 case is questionable by itself in light of PR56309 [1],
> the cnov2 case regressed from:
>
>         cmpl    %edi, %esi
>         sbbl    %eax, %eax
>         andl    $-10, %eax
>         addl    $5, %eax
>         ret
>
> to:
>
>         xorl    %eax, %eax
>         cmpl    %esi, %edi
>         setbe   %al
>         negl    %eax
>         andl    $10, %eax
>         subl    $5, %eax
>         ret
>
> Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
> anymore. Non-QImode setcc instructions are problematic on x86, since
> they need to be zero-extended to their full length.

The transformation at fault in noce_try_store_flag_constants is:
       /* if (test) x = a; else x = b;
      =>   x = (-(test != 0) & (b - a)) + a;  */

I suppose we want to not perform this transformation on x86.
The question is, on what characteristic of the target do we want to gate this
on?


>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
>
> Uros.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
@ 2015-08-03 12:33 Uros Bizjak
  2015-08-03 13:02 ` Kyrill Tkachov
  0 siblings, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2015-08-03 12:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kyrill Tkachov, Jeff Law, H.J. Lu

Hello!

> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>     when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>     explicit.  Prefer to add the flag whenever possible.
>     (noce_process_if_block): Try noce_try_store_flag_constants before
>     noce_try_cmove.
>
> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/csel_bfx_1.c: New test.
>     * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

This patch regressed following tests on x86_64:

FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]

While cmov3 case is questionable by itself in light of PR56309 [1],
the cnov2 case regressed from:

       cmpl    %edi, %esi
       sbbl    %eax, %eax
       andl    $-10, %eax
       addl    $5, %eax
       ret

to:

       xorl    %eax, %eax
       cmpl    %esi, %edi
       setbe   %al
       negl    %eax
       andl    $10, %eax
       subl    $5, %eax
       ret

Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
anymore. Non-QImode setcc instructions are problematic on x86, since
they need to be zero-extended to their full length.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

Uros.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-08-12 17:52 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-29 14:16 [PATCH][RTL-ifcvt] Improve conditional select ops on immediates Kyrill Tkachov
2015-07-29 22:46 ` Jeff Law
2015-07-30 14:30   ` Kyrill Tkachov
2015-07-31 16:16     ` Jeff Law
2015-08-03 12:33 Uros Bizjak
2015-08-03 13:02 ` Kyrill Tkachov
2015-08-03 13:37   ` Uros Bizjak
2015-08-03 13:43     ` Kyrill Tkachov
2015-08-03 14:12       ` Kyrill Tkachov
2015-08-03 14:15       ` Uros Bizjak
2015-08-03 15:37         ` Kyrill Tkachov
2015-08-03 15:46           ` Uros Bizjak
2015-08-03 15:53             ` Kyrill Tkachov
2015-08-03 17:20               ` Kyrill Tkachov
2015-08-03 17:37                 ` Uros Bizjak
2015-08-04  8:44                   ` Kyrill Tkachov
2015-08-10  9:36                     ` Kyrill Tkachov
2015-08-10  9:43                       ` Uros Bizjak
2015-08-10  9:44                         ` Kyrill Tkachov
2015-08-12 17:52                     ` Jeff Law
2015-08-03 14:04   ` Uros Bizjak
2015-08-03 14:37     ` H.J. Lu
2015-08-03 15:50     ` Ilya Enkovich
2015-08-03 16:29       ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).