* Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
@ 2017-06-29 11:20 Wilco Dijkstra
2017-06-29 11:41 ` Richard Biener
2017-06-29 13:43 ` Jeff Law
0 siblings, 2 replies; 10+ messages in thread
From: Wilco Dijkstra @ 2017-06-29 11:20 UTC (permalink / raw)
To: Richard Biener, Naveen.Hurugalawadi; +Cc: nd, GCC Patches
Richard Biener wrote:
> Hurugalawadi, Naveen wrote:
> > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
> What's the reason of this transform? I expect that the HW multiplier
> is quite fast given one operand is either zero or one and a multiplication
> is a gimple operation that's better handled in optimizations than
> COND_EXPRs which eventually expand to conditional code which
> would be much slower.
Even really fast multipliers have several cycles latency, and this is generally
fixed irrespectively of the inputs. Maybe you were thinking about division?
Additionally integer multiply typically has much lower throughput than other
ALU operations like conditional move - a modern CPU may have 4 ALUs
but only 1 multiplier, so removing redundant integer multiplies is always good.
Note (m1 > m2) is also a conditional expression which will result in branches
for floating point expressions and on some targets even for integers. Moving
the multiply into the conditional expression generates the best code:
Integer version:
f1:
cmp w0, 100
csel w0, w1, wzr, gt
ret
f2:
cmp w0, 100
cset w0, gt
mul w0, w0, w1
ret
Float version:
f3:
movi v1.2s, #0
cmp w0, 100
fcsel s0, s0, s1, gt
ret
f4:
cmp w0, 100
bgt .L8
movi v1.2s, #0
fmul s0, s0, s1 // eh???
.L8:
ret
Wilco
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
2017-06-29 11:20 [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0 Wilco Dijkstra
@ 2017-06-29 11:41 ` Richard Biener
2017-06-29 15:10 ` Wilco Dijkstra
2017-06-29 13:43 ` Jeff Law
1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2017-06-29 11:41 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: Naveen.Hurugalawadi, nd, GCC Patches
On Thu, Jun 29, 2017 at 1:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
>> > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
>
>> What's the reason of this transform? I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
>
> Even really fast multipliers have several cycles latency, and this is generally
> fixed irrespectively of the inputs. Maybe you were thinking about division?
>
> Additionally integer multiply typically has much lower throughput than other
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always good.
>
> Note (m1 > m2) is also a conditional expression which will result in branches
> for floating point expressions and on some targets even for integers. Moving
> the multiply into the conditional expression generates the best code:
>
> Integer version:
> f1:
> cmp w0, 100
> csel w0, w1, wzr, gt
> ret
> f2:
> cmp w0, 100
> cset w0, gt
> mul w0, w0, w1
> ret
>
> Float version:
> f3:
> movi v1.2s, #0
> cmp w0, 100
> fcsel s0, s0, s1, gt
> ret
> f4:
> cmp w0, 100
> bgt .L8
> movi v1.2s, #0
> fmul s0, s0, s1 // eh???
> .L8:
> ret
But then
int f (int m, int c)
{
return (m & 1) * c;
}
int g (int m, int c)
{
if (m & 1 != 0)
return c;
return 0;
}
f:
.LFB0:
.cfi_startproc
andl $1, %edi
movl %edi, %eax
imull %esi, %eax
ret
g:
.LFB1:
.cfi_startproc
movl %edi, %eax
andl $1, %eax
cmovne %esi, %eax
ret
anyway. As a general comment to the patch please do it as
a pattern in match.pd
(match boolean_value_range_p
@0
(if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) == 1)))
(match boolean_value_range_p
INTEGER_CST
(if (integer_zerop (t) || integer_onep (t))))
(match boolean_value_range_p
SSA_NAME
(if (INTEGRAL_TYPE_P (type)
&& ~get_nonzero_bits (t) == 1)))
(simplify
(mult:c boolean_value_range_p@0 @1)
(cond @0 @1 @0))
or something like that.
Richard.
> Wilco
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
2017-06-29 11:20 [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0 Wilco Dijkstra
2017-06-29 11:41 ` Richard Biener
@ 2017-06-29 13:43 ` Jeff Law
1 sibling, 0 replies; 10+ messages in thread
From: Jeff Law @ 2017-06-29 13:43 UTC (permalink / raw)
To: Wilco Dijkstra, Richard Biener, Naveen.Hurugalawadi; +Cc: nd, GCC Patches
On 06/29/2017 05:20 AM, Wilco Dijkstra wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
>>> The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
>
>> What's the reason of this transform? I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
>
> Even really fast multipliers have several cycles latency, and this is generally
> fixed irrespectively of the inputs. Maybe you were thinking about division?
And on some targets, just getting the arguments into the right register
bank is many cycles. Think HPPA where integer multiply occurs in the
floating point unit. Though I don't think that oddity should drive this
discussion.
>
> Additionally integer multiply typically has much lower throughput than other
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always good.
I'd tend to agree in general.
jeff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
2017-06-29 11:41 ` Richard Biener
@ 2017-06-29 15:10 ` Wilco Dijkstra
2017-06-30 8:16 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Wilco Dijkstra @ 2017-06-29 15:10 UTC (permalink / raw)
To: Richard Biener; +Cc: Naveen.Hurugalawadi, nd, GCC Patches
Richard Biener wrote:
> int f (int m, int c)
> {
> return (m & 1) * c;
> }
This case (integer[0,1] rather than boolean input) should be transformed into c & -(m & 1).
Wilco
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
2017-06-29 15:10 ` Wilco Dijkstra
@ 2017-06-30 8:16 ` Richard Biener
2017-07-04 11:13 ` Hurugalawadi, Naveen
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2017-06-30 8:16 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: Naveen.Hurugalawadi, nd, GCC Patches
On Thu, Jun 29, 2017 at 5:10 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Richard Biener wrote:
>
>> int f (int m, int c)
>> {
>> return (m & 1) * c;
>> }
>
> This case (integer[0,1] rather than boolean input) should be transformed into c & -(m & 1).
The proposed patch handled both the same. This means the pattern
shouldn't use range-info
but instead match a more complex
(mult (convert (cmp @0 @1)) @2)
? Note that from a gimple perspective c & -(m & 1) is more complex
than (m & 1) * c
and thus non-canonical.
> Wilco
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
2017-06-30 8:16 ` Richard Biener
@ 2017-07-04 11:13 ` Hurugalawadi, Naveen
2017-07-19 2:58 ` [PING} " Hurugalawadi, Naveen
2017-07-19 6:09 ` Jeff Law
0 siblings, 2 replies; 10+ messages in thread
From: Hurugalawadi, Naveen @ 2017-07-04 11:13 UTC (permalink / raw)
To: Richard Biener, Wilco Dijkstra; +Cc: nd, GCC Patches
[-- Attachment #1: Type: text/plain, Size: 610 bytes --]
Hi,
Thanks for the review and comments on the patch.
>> The proposed patch handled both the same. This means the pattern
>> shouldn't use range-info but instead match a more complex
The patch handles as per the discussion by matching the pattern
in match.pd.
Bootstrapped and Regression tested on AArch64 and X86_64.
Please review the patch and let us know if its okay?
Thanks,
Naveen
2017-07-04 Naveen H.S <Naveen.Hurugalawadi@cavium.com>
gcc
* match.pd (((m1 >/</>=/<= m2) * d -> (m1 >/</>=/<= m2) ? d : 0) New
pattern.
gcc/testsuite
* gcc.dg/tree-ssa/vrp116.c: New Test.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: mul-match.patch --]
[-- Type: text/x-diff; name="mul-match.patch", Size: 1030 bytes --]
diff --git a/gcc/match.pd b/gcc/match.pd
index 4c64b21..d914db1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1088,6 +1088,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_nop_conversion_p (type, TREE_TYPE (@1)))
(convert (bit_and (bit_not @1) @0))))
+/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0 */
+(for cmp (gt lt ge le)
+(simplify
+ (mult (convert (cmp @0 @1)) @2)
+ (cond (cmp @0 @1) @2 { build_zero_cst (type); })))
+
/* For integral types with undefined overflow and C != 0 fold
x * C EQ/NE y * C into x EQ/NE y. */
(for cmp (eq ne)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
new file mode 100644
index 0000000..d9d7b23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+int
+f (int m1, int m2, int c)
+{
+ int d = m1 > m2;
+ int e = d * c;
+ return e ? m1 : m2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "vrp1" } } */
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PING} [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
2017-07-04 11:13 ` Hurugalawadi, Naveen
@ 2017-07-19 2:58 ` Hurugalawadi, Naveen
2017-07-19 6:09 ` Jeff Law
1 sibling, 0 replies; 10+ messages in thread
From: Hurugalawadi, Naveen @ 2017-07-19 2:58 UTC (permalink / raw)
To: Richard Biener, Wilco Dijkstra; +Cc: nd, GCC Patches
Hi,
Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00178.html
Thanks,
Naveen
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
2017-07-04 11:13 ` Hurugalawadi, Naveen
2017-07-19 2:58 ` [PING} " Hurugalawadi, Naveen
@ 2017-07-19 6:09 ` Jeff Law
1 sibling, 0 replies; 10+ messages in thread
From: Jeff Law @ 2017-07-19 6:09 UTC (permalink / raw)
To: Hurugalawadi, Naveen, Richard Biener, Wilco Dijkstra; +Cc: nd, GCC Patches
On 07/04/2017 05:13 AM, Hurugalawadi, Naveen wrote:
> Hi,
>
> Thanks for the review and comments on the patch.
>
>>> The proposed patch handled both the same. This means the pattern
>>> shouldn't use range-info but instead match a more complex
>
> The patch handles as per the discussion by matching the pattern
> in match.pd.
>
> Bootstrapped and Regression tested on AArch64 and X86_64.
> Please review the patch and let us know if its okay?
>
> Thanks,
> Naveen
>
> 2017-07-04 Naveen H.S <Naveen.Hurugalawadi@cavium.com>
>
> gcc
> * match.pd (((m1 >/</>=/<= m2) * d -> (m1 >/</>=/<= m2) ? d : 0) New
> pattern.
>
> gcc/testsuite
> * gcc.dg/tree-ssa/vrp116.c: New Test.
>
OK for the trunk. Sorry for the delay.
Presumably EQ/NE are handled by a different pattern?
Jeff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
2017-06-29 5:06 Hurugalawadi, Naveen
@ 2017-06-29 9:08 ` Richard Biener
0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2017-06-29 9:08 UTC (permalink / raw)
To: Hurugalawadi, Naveen; +Cc: gcc-patches
On Thu, Jun 29, 2017 at 7:06 AM, Hurugalawadi, Naveen
<Naveen.Hurugalawadi@cavium.com> wrote:
> Hi,
>
> The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
>
> The patch optimizes it inside tree-vrp.c when simplifying with the range
> inside simplify_stmt_using_ranges. If a multiply is found and either side
> has a range [0,1], then transform it.
>
> Ex:- d * c where d has a range of [0,1] transform it to be
> COND_EXPR(d != 0, c, 0).
> The other optimization passes should prop m1 > m2.
>
> Bootstrapped and Regression tested on AArch64 and X86_64.
> Please review the patch and let us know if its okay?
What's the reason of this transform? I expect that the HW multiplier
is quite fast given one operand is either zero or one and a multiplication
is a gimple operation that's better handled in optimizations than
COND_EXPRs which eventually expand to conditional code which
would be much slower.
Thanks,
Richard.
> Thanks,
> Naveen
>
> 2017-06-28 Naveen H.S <Naveen.Hurugalawadi@cavium.com>
>
> gcc
> * tree-vrp.c (simplify_stmt_using_ranges): Add case for
> optimizing a case of multiplication.
> (simplify_mult_ops_using_ranges): New.
>
> gcc/testsuite
> * gcc.dg/tree-ssa/vrp116.c: New Test.
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0
@ 2017-06-29 5:06 Hurugalawadi, Naveen
2017-06-29 9:08 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Hurugalawadi, Naveen @ 2017-06-29 5:06 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 809 bytes --]
Hi,
The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
The patch optimizes it inside tree-vrp.c when simplifying with the range
inside simplify_stmt_using_ranges. If a multiply is found and either side
has a range [0,1], then transform it.
Ex:- d * c where d has a range of [0,1] transform it to be
COND_EXPR(d != 0, c, 0).
The other optimization passes should prop m1 > m2.
Bootstrapped and Regression tested on AArch64 and X86_64.
Please review the patch and let us know if its okay?
Thanks,
Naveen
2017-06-28 Naveen H.S <Naveen.Hurugalawadi@cavium.com>
gcc
* tree-vrp.c (simplify_stmt_using_ranges): Add case for
optimizing a case of multiplication.
(simplify_mult_ops_using_ranges): New.
gcc/testsuite
* gcc.dg/tree-ssa/vrp116.c: New Test.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: mul-vrp.patch --]
[-- Type: text/x-diff; name="mul-vrp.patch", Size: 2329 bytes --]
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
new file mode 100644
index 0000000..d9d7b23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+int
+f (int m1, int m2, int c)
+{
+ int d = m1 > m2;
+ int e = d * c;
+ return e ? m1 : m2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "vrp1" } } */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 9ca3924..291b87f 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9146,6 +9146,46 @@ vrp_visit_phi_node (gphi *phi)
return SSA_PROP_NOT_INTERESTING;
}
+static bool
+simplify_mult_ops_using_ranges (gimple_stmt_iterator * gsi, gimple *stmt)
+{
+ enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
+ tree op0, op1, lhs;
+
+ op0 = gimple_assign_rhs1 (stmt);
+ op1 = gimple_assign_rhs2 (stmt);
+ lhs = gimple_assign_lhs (stmt);
+
+ if (!op_with_boolean_value_range_p (op0)
+ && !op_with_boolean_value_range_p (op1))
+ return false;
+
+ if (rhs_code == MULT_EXPR)
+ {
+ if (op_with_boolean_value_range_p (op0))
+ {
+ tree t = build_int_cst (TREE_TYPE (lhs), 0);
+ tree tmp = build3 (COND_EXPR, TREE_TYPE (lhs),
+ build2 (NE_EXPR, boolean_type_node, op0, t),
+ op1, t);
+ gimple *new_assign = gimple_build_assign (lhs, tmp);
+ gsi_replace (gsi, new_assign, true);
+ return true;
+ }
+ if (op_with_boolean_value_range_p (op1))
+ {
+ tree t = build_int_cst (TREE_TYPE (lhs), 0);
+ tree tmp = build3 (COND_EXPR, TREE_TYPE (lhs),
+ build2 (NE_EXPR, boolean_type_node, op1, t),
+ op0, t);
+ gimple *new_assign = gimple_build_assign (lhs, tmp);
+ gsi_replace (gsi, new_assign, true);
+ return true;
+ }
+ }
+ return false;
+}
+
/* Simplify boolean operations if the source is known
to be already a boolean. */
static bool
@@ -10345,6 +10385,11 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
return simplify_div_or_mod_using_ranges (gsi, stmt);
break;
+ case MULT_EXPR:
+ if (INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
+ return simplify_mult_ops_using_ranges (gsi, stmt);
+ break;
+
/* Transform ABS (X) into X or -X as appropriate. */
case ABS_EXPR:
if (TREE_CODE (rhs1) == SSA_NAME
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-07-19 6:09 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-29 11:20 [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0 Wilco Dijkstra
2017-06-29 11:41 ` Richard Biener
2017-06-29 15:10 ` Wilco Dijkstra
2017-06-30 8:16 ` Richard Biener
2017-07-04 11:13 ` Hurugalawadi, Naveen
2017-07-19 2:58 ` [PING} " Hurugalawadi, Naveen
2017-07-19 6:09 ` Jeff Law
2017-06-29 13:43 ` Jeff Law
-- strict thread matches above, loose matches on Subject: below --
2017-06-29 5:06 Hurugalawadi, Naveen
2017-06-29 9:08 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).