[PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]
@ 2024-01-17  9:50 Monk Chiang
  2024-01-17 12:14 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Monk Chiang @ 2024-01-17  9:50 UTC (permalink / raw)
  To: gcc-patches, rguenther, apinski, jeffreyalaw; +Cc: Monk Chiang

This allows the backend to generate movcc instructions, if target
machine has movcc pattern.

branchless-cond.c needs to be updated since some target machines have
conditional move instructions, and the experssion will not change to
branchless expression.

gcc/ChangeLog:
	PR target/113095
	* match.pd (`(zero_one == 0) ? y : z <op> y`,
	`(zero_one != 0) ? z <op> y : y`): Do not match to branchless
	expression, if target machine has conditional move pattern.

gcc/testsuite/ChangeLog:
	* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
---
 gcc/match.pd                                  | 30 +++++++++++++++++--
 .../gcc.dg/tree-ssa/branchless-cond.c         |  6 ++--
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index e42ecaf9ec7..a1f90b1cd41 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
        && TYPE_PRECISION (type) > 1
        && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
-       (op (mult (convert:type @0) @2) @1))))
+   (with {
+      bool can_movecc_p = false;
+      if (can_conditionally_move_p (TYPE_MODE (type)))
+	can_movecc_p = true;
+
+      /* Some target only support word_mode for movcc pattern, if type can
+	 extend to word_mode then use conditional move. Even if there is a
+	 extend instruction, the cost is lower than branchless.  */
+      if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+	  && can_conditionally_move_p (word_mode))
+	can_movecc_p = true;
+    }
+    (if (!can_movecc_p)
+     (op (mult (convert:type @0) @2) @1))))))
 
 /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior plus)
@@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
        && TYPE_PRECISION (type) > 1
        && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
-       (op (mult (convert:type @0) @2) @1))))
+   (with {
+      bool can_movecc_p = false;
+      if (can_conditionally_move_p (TYPE_MODE (type)))
+	can_movecc_p = true;
+
+      /* Some target only support word_mode for movcc pattern, if type can
+	 extend to word_mode then use conditional move. Even if there is a
+	 extend instruction, the cost is lower than branchless.  */
+      if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+	  && can_conditionally_move_p (word_mode))
+	can_movecc_p = true;
+    }
+    (if (!can_movecc_p)
+     (op (mult (convert:type @0) @2) @1))))))
 
 /* ?: Value replacement. */
 /* a == 0 ? b : b + a  -> b + a */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
index e063dc4bb5f..c002ed97364 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
   return ((x & 1) != 0) ? z | y : y;
 }
 
-/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
-- 
2.40.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]
  2024-01-17  9:50 [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095] Monk Chiang
@ 2024-01-17 12:14 ` Richard Biener
  2024-01-17 14:37   ` Jeff Law
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2024-01-17 12:14 UTC (permalink / raw)
  To: Monk Chiang; +Cc: gcc-patches, apinski, jeffreyalaw

On Wed, 17 Jan 2024, Monk Chiang wrote:

> This allows the backend to generate movcc instructions, if target
> machine has movcc pattern.
> 
> branchless-cond.c needs to be updated since some target machines have
> conditional move instructions, and the experssion will not change to
> branchless expression.

While I agree this pattern should possibly be applied during RTL
expansion or instruction selection on x86 which also has movcc
the multiplication is cheaper.  So I don't think this isn't the way to go.

I'd rather revert the change than trying to "fix" it this way?

Thanks,
Richard.

> gcc/ChangeLog:
> 	PR target/113095
> 	* match.pd (`(zero_one == 0) ? y : z <op> y`,
> 	`(zero_one != 0) ? z <op> y : y`): Do not match to branchless
> 	expression, if target machine has conditional move pattern.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
> ---
>  gcc/match.pd                                  | 30 +++++++++++++++++--
>  .../gcc.dg/tree-ssa/branchless-cond.c         |  6 ++--
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e42ecaf9ec7..a1f90b1cd41 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (INTEGRAL_TYPE_P (type)
>         && TYPE_PRECISION (type) > 1
>         && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
> -       (op (mult (convert:type @0) @2) @1))))
> +   (with {
> +      bool can_movecc_p = false;
> +      if (can_conditionally_move_p (TYPE_MODE (type)))
> +	can_movecc_p = true;
> +
> +      /* Some target only support word_mode for movcc pattern, if type can
> +	 extend to word_mode then use conditional move. Even if there is a
> +	 extend instruction, the cost is lower than branchless.  */
> +      if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +	  && can_conditionally_move_p (word_mode))
> +	can_movecc_p = true;
> +    }
> +    (if (!can_movecc_p)
> +     (op (mult (convert:type @0) @2) @1))))))
>  
>  /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
>  (for op (bit_xor bit_ior plus)
> @@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (INTEGRAL_TYPE_P (type)
>         && TYPE_PRECISION (type) > 1
>         && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
> -       (op (mult (convert:type @0) @2) @1))))
> +   (with {
> +      bool can_movecc_p = false;
> +      if (can_conditionally_move_p (TYPE_MODE (type)))
> +	can_movecc_p = true;
> +
> +      /* Some target only support word_mode for movcc pattern, if type can
> +	 extend to word_mode then use conditional move. Even if there is a
> +	 extend instruction, the cost is lower than branchless.  */
> +      if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +	  && can_conditionally_move_p (word_mode))
> +	can_movecc_p = true;
> +    }
> +    (if (!can_movecc_p)
> +     (op (mult (convert:type @0) @2) @1))))))
>  
>  /* ?: Value replacement. */
>  /* a == 0 ? b : b + a  -> b + a */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> index e063dc4bb5f..c002ed97364 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> @@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
>    return ((x & 1) != 0) ? z | y : y;
>  }
>  
> -/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]
  2024-01-17 12:14 ` Richard Biener
@ 2024-01-17 14:37   ` Jeff Law
  2024-01-18  3:19     ` Monk Chiang
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Law @ 2024-01-17 14:37 UTC (permalink / raw)
  To: Richard Biener, Monk Chiang; +Cc: gcc-patches, apinski



On 1/17/24 05:14, Richard Biener wrote:
> On Wed, 17 Jan 2024, Monk Chiang wrote:
> 
>> This allows the backend to generate movcc instructions, if target
>> machine has movcc pattern.
>>
>> branchless-cond.c needs to be updated since some target machines have
>> conditional move instructions, and the experssion will not change to
>> branchless expression.
> 
> While I agree this pattern should possibly be applied during RTL
> expansion or instruction selection on x86 which also has movcc
> the multiplication is cheaper.  So I don't think this isn't the way to go.
> 
> I'd rather revert the change than trying to "fix" it this way?
WRT reverting -- the patch in question's sole purpose was to enable 
branchless sequences for that very same code.  Reverting would regress 
performance on a variety of micro-architectures.  IIUC, the issue is 
that the SiFive part in question has a fusion which allows it to do the 
branchy sequence cheaply.

ISTM this really needs to be addressed during expansion and most likely 
with a RISC-V target twiddle for the micro-archs which have 
short-forward-branch optimizations.

jeff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]
  2024-01-17 14:37   ` Jeff Law
@ 2024-01-18  3:19     ` Monk Chiang
  2024-01-18  3:30       ` Palmer Dabbelt
  0 siblings, 1 reply; 5+ messages in thread
From: Monk Chiang @ 2024-01-18  3:19 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, gcc-patches, apinski

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

Thanks for your advice!! I agree it should be fixed in the RISC-V backend
when expansion.


On Wed, Jan 17, 2024 at 10:37 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 1/17/24 05:14, Richard Biener wrote:
> > On Wed, 17 Jan 2024, Monk Chiang wrote:
> >
> >> This allows the backend to generate movcc instructions, if target
> >> machine has movcc pattern.
> >>
> >> branchless-cond.c needs to be updated since some target machines have
> >> conditional move instructions, and the experssion will not change to
> >> branchless expression.
> >
> > While I agree this pattern should possibly be applied during RTL
> > expansion or instruction selection on x86 which also has movcc
> > the multiplication is cheaper.  So I don't think this isn't the way to
> go.
> >
> > I'd rather revert the change than trying to "fix" it this way?
> WRT reverting -- the patch in question's sole purpose was to enable
> branchless sequences for that very same code.  Reverting would regress
> performance on a variety of micro-architectures.  IIUC, the issue is
> that the SiFive part in question has a fusion which allows it to do the
> branchy sequence cheaply.
>
> ISTM this really needs to be addressed during expansion and most likely
> with a RISC-V target twiddle for the micro-archs which have
> short-forward-branch optimizations.
>
> jeff
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]
  2024-01-18  3:19     ` Monk Chiang
@ 2024-01-18  3:30       ` Palmer Dabbelt
  0 siblings, 0 replies; 5+ messages in thread
From: Palmer Dabbelt @ 2024-01-18  3:30 UTC (permalink / raw)
  To: monk.chiang; +Cc: jeffreyalaw, rguenther, gcc-patches, apinski

On Wed, 17 Jan 2024 19:19:58 PST (-0800), monk.chiang@sifive.com wrote:
> Thanks for your advice!! I agree it should be fixed in the RISC-V backend
> when expansion.
>
>
> On Wed, Jan 17, 2024 at 10:37 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>>
>>
>> On 1/17/24 05:14, Richard Biener wrote:
>> > On Wed, 17 Jan 2024, Monk Chiang wrote:
>> >
>> >> This allows the backend to generate movcc instructions, if target
>> >> machine has movcc pattern.
>> >>
>> >> branchless-cond.c needs to be updated since some target machines have
>> >> conditional move instructions, and the experssion will not change to
>> >> branchless expression.
>> >
>> > While I agree this pattern should possibly be applied during RTL
>> > expansion or instruction selection on x86 which also has movcc
>> > the multiplication is cheaper.  So I don't think this isn't the way to
>> go.
>> >
>> > I'd rather revert the change than trying to "fix" it this way?
>> WRT reverting -- the patch in question's sole purpose was to enable
>> branchless sequences for that very same code.  Reverting would regress
>> performance on a variety of micro-architectures.  IIUC, the issue is
>> that the SiFive part in question has a fusion which allows it to do the
>> branchy sequence cheaply.
>>
>> ISTM this really needs to be addressed during expansion and most likely
>> with a RISC-V target twiddle for the micro-archs which have
>> short-forward-branch optimizations.

IIRC I ran into some of these middle-end interactions a year or two ago 
and determined that we'd need middle-end changes to get this working 
smoothly -- essentially replacing the expander checks for a MOVCC insn  
with some sort of costing.

Without that, we're just going to end up with some missed optimizations 
that favor one way or the other.

>>
>> jeff
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-01-18  3:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-17  9:50 [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095] Monk Chiang
2024-01-17 12:14 ` Richard Biener
2024-01-17 14:37   ` Jeff Law
2024-01-18  3:19     ` Monk Chiang
2024-01-18  3:30       ` Palmer Dabbelt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).