public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y
@ 2022-02-22  3:57 Zhao Wei Liew
  2022-02-22  4:00 ` Zhao Wei Liew
  0 siblings, 1 reply; 7+ messages in thread
From: Zhao Wei Liew @ 2022-02-22  3:57 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 512 bytes --]

Hi,

This is a partial optimization for PR103855.

Initially, I looked into optimizing RTL generation or a more complex
GIMPLE transformation so that we could optimize other cases as well,
such as ((unsigned long long) short / int).

However, that is a bit too complex for now. While I continue to look
into that change, I've decided to implement this simpler match.pd
transformation.

Greatly appreciate any feedback on this patch or guidance for
implementing the more advanced optimizations!

Thanks,
Zhao Wei

[-- Attachment #2: 0001-tree-optimization-PR103855-Fold-type-X-type-Y.patch --]
[-- Type: application/octet-stream, Size: 3754 bytes --]

From dd3bb05cd7be72d080598cb693549ac74d5cb02d Mon Sep 17 00:00:00 2001
From: Zhao Wei Liew <zhaoweiliew@gmail.com>
Date: Sat, 19 Feb 2022 16:28:38 +0800
Subject: [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y

This pattern converts (trunc_div (convert a) (convert b)) to
(convert (trunc_div a b)) when:

1. type, a, and b all have unsigned integeral types
2. a and b have the same type precision
3. type has type precision at least as larger as a and b

This is useful as wider divisions are typically more expensive.

To illustrate the effects, consider the following code snippet:

unsigned long long f(unsigned int a, unsigned int b) {
	unsigned long long all = a;
	return all / b;
}

Without the pattern, g++ -std=c++20 -O2 generates the following
assembly:

f(unsigned int, unsigned int):
	mov eax, edi
	mov esi, esi
	xor edx, edx
	div rsi
	ret

With the pattern, it generates this:

f(unsigned int, unsigned int):
	mov eax, edi
	xor edx, edx
	div esi
	ret

This is identical to what clang++ -std=c++20 -O2 generates.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>

	PR tree-optimization/103855

gcc/ChangeLog:

	* match.pd: Add pattern for (type)X / (type)Y.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/divide-8.c: New test.
	* gcc.dg/tree-ssa/divide-9.c: New test.
---
 gcc/match.pd                             | 15 +++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c |  9 +++++++++
 gcc/testsuite/gcc.dg/tree-ssa/divide-9.c |  9 +++++++++
 3 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-9.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 10f62284862..393b43756dd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -684,6 +684,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
   (convert (trunc_mod @0 @1))))
 
+/* (type)X / (type)Y -> (type)(X / Y)
+   when the resulting type is at least precise as the original types
+   and when all the types are unsigned integral types. */
+(simplify
+ (trunc_div (convert @0) (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+      && TYPE_UNSIGNED (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && TYPE_UNSIGNED (TREE_TYPE (@1))
+      && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
+  (convert (trunc_div @0 @1))))
+
 /* x * (1 + y / x) - y -> x - y % x */
 (simplify
  (minus (mult:cs @0 (plus:s (trunc_div:s @1 @0) integer_onep)) @1)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
new file mode 100644
index 00000000000..dc3dc9ca769
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/103855 */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned int f(unsigned int a, unsigned int b) {
+    unsigned long long all = a;
+    return all / b;
+}
+
+/* { dg-final { scan-tree-dump-not "\\\(long long unsigned int\\\)" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
new file mode 100644
index 00000000000..6986b5484e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/103855 */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned long long f(unsigned int a, unsigned int b) {
+    unsigned long long all = a;
+    return all / b;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\(long long unsigned int\\\)" 1 "optimized" } } */
-- 
2.35.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y
  2022-02-22  3:57 [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y Zhao Wei Liew
@ 2022-02-22  4:00 ` Zhao Wei Liew
  2022-02-22  7:53   ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Zhao Wei Liew @ 2022-02-22  4:00 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

On Tue, 22 Feb 2022 at 11:57, Zhao Wei Liew <zhaoweiliew@gmail.com> wrote:
>
> Hi,
>
> This is a partial optimization for PR103855.
>
> Initially, I looked into optimizing RTL generation or a more complex
> GIMPLE transformation so that we could optimize other cases as well,
> such as ((unsigned long long) short / int).
>
> However, that is a bit too complex for now. While I continue to look
> into that change, I've decided to implement this simpler match.pd
> transformation.
>
> Greatly appreciate any feedback on this patch or guidance for
> implementing the more advanced optimizations!
>
> Thanks,
> Zhao Wei

Sorry, the original patch wasn't recognized as a text file. I've added
a .txt extension to make it explicit.

[-- Attachment #2: 0001-tree-optimization-PR103855-Fold-type-X-type-Y.patch.txt --]
[-- Type: text/plain, Size: 3754 bytes --]

From dd3bb05cd7be72d080598cb693549ac74d5cb02d Mon Sep 17 00:00:00 2001
From: Zhao Wei Liew <zhaoweiliew@gmail.com>
Date: Sat, 19 Feb 2022 16:28:38 +0800
Subject: [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y

This pattern converts (trunc_div (convert a) (convert b)) to
(convert (trunc_div a b)) when:

1. type, a, and b all have unsigned integeral types
2. a and b have the same type precision
3. type has type precision at least as larger as a and b

This is useful as wider divisions are typically more expensive.

To illustrate the effects, consider the following code snippet:

unsigned long long f(unsigned int a, unsigned int b) {
	unsigned long long all = a;
	return all / b;
}

Without the pattern, g++ -std=c++20 -O2 generates the following
assembly:

f(unsigned int, unsigned int):
	mov eax, edi
	mov esi, esi
	xor edx, edx
	div rsi
	ret

With the pattern, it generates this:

f(unsigned int, unsigned int):
	mov eax, edi
	xor edx, edx
	div esi
	ret

This is identical to what clang++ -std=c++20 -O2 generates.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>

	PR tree-optimization/103855

gcc/ChangeLog:

	* match.pd: Add pattern for (type)X / (type)Y.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/divide-8.c: New test.
	* gcc.dg/tree-ssa/divide-9.c: New test.
---
 gcc/match.pd                             | 15 +++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c |  9 +++++++++
 gcc/testsuite/gcc.dg/tree-ssa/divide-9.c |  9 +++++++++
 3 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-9.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 10f62284862..393b43756dd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -684,6 +684,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
   (convert (trunc_mod @0 @1))))
 
+/* (type)X / (type)Y -> (type)(X / Y)
+   when the resulting type is at least precise as the original types
+   and when all the types are unsigned integral types. */
+(simplify
+ (trunc_div (convert @0) (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+      && TYPE_UNSIGNED (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && TYPE_UNSIGNED (TREE_TYPE (@1))
+      && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
+  (convert (trunc_div @0 @1))))
+
 /* x * (1 + y / x) - y -> x - y % x */
 (simplify
  (minus (mult:cs @0 (plus:s (trunc_div:s @1 @0) integer_onep)) @1)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
new file mode 100644
index 00000000000..dc3dc9ca769
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/103855 */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned int f(unsigned int a, unsigned int b) {
+    unsigned long long all = a;
+    return all / b;
+}
+
+/* { dg-final { scan-tree-dump-not "\\\(long long unsigned int\\\)" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
new file mode 100644
index 00000000000..6986b5484e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/103855 */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned long long f(unsigned int a, unsigned int b) {
+    unsigned long long all = a;
+    return all / b;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\(long long unsigned int\\\)" 1 "optimized" } } */
-- 
2.35.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y
  2022-02-22  4:00 ` Zhao Wei Liew
@ 2022-02-22  7:53   ` Richard Biener
  2022-03-17  0:49     ` [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855] Zhao Wei Liew
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2022-02-22  7:53 UTC (permalink / raw)
  To: Zhao Wei Liew; +Cc: GCC Patches

On Tue, Feb 22, 2022 at 5:01 AM Zhao Wei Liew via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, 22 Feb 2022 at 11:57, Zhao Wei Liew <zhaoweiliew@gmail.com> wrote:
> >
> > Hi,
> >
> > This is a partial optimization for PR103855.
> >
> > Initially, I looked into optimizing RTL generation or a more complex
> > GIMPLE transformation so that we could optimize other cases as well,
> > such as ((unsigned long long) short / int).
> >
> > However, that is a bit too complex for now. While I continue to look
> > into that change, I've decided to implement this simpler match.pd
> > transformation.
> >
> > Greatly appreciate any feedback on this patch or guidance for
> > implementing the more advanced optimizations!
> >
> > Thanks,
> > Zhao Wei
>
> Sorry, the original patch wasn't recognized as a text file. I've added
> a .txt extension to make it explicit.

A few comments - note the change is only appropriate for next stage1.
Since we're
currently in a regression fixing period reviews can be slow - do not
hesitate to ping
forgotten patches when stage1 opens again.

+/* (type)X / (type)Y -> (type)(X / Y)
+   when the resulting type is at least precise as the original types
+   and when all the types are unsigned integral types. */
+(simplify
+ (trunc_div (convert @0) (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+      && TYPE_UNSIGNED (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && TYPE_UNSIGNED (TREE_TYPE (@1))
+      && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
+  (convert (trunc_div @0 @1))))

since you are requiring the types of @0 and @1 to match it's easier to write

     && types_match (TREE_TYPE(@0), TREE_TYPE (@1))

that allows you to elide checks on either @0 or @1.  I suppose the transform
does not work for signed types because of the -INT_MIN / -1 overflow case?
It might be possible to use expr_not_equal_to (@0, -INT_MIN) ||
expr_not_equal_to (@1, -1)
(correctly written, lookup the existing examples in match.pd for the X
% -Y transform)

I'll note that as written the transform will not catch CST / (T)x or
(T)x / CST since
you'll not see conversions around constants.  I'm not sure whether
using (convert[12]? ...)
or writing special patterns with INTEGER_CST operands is more convenient.
There is int_fits_type_p to check whether a constant will fit in a
type without truncation
or sign change.

When @0 and @1 do not have the same type there might still be a common type
that can be used and is smaller than 'type', it might be as simple as using
build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/).

In the past there have been attempts to more globally narrow operations using
a new pass rather than using individual patterns.  So for more complicated cases
that might be the way to go.  There's now also the ISEL pass which does
pre-RTL expansion transforms that need some global context and for example
can look at SSA uses.

Richard.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]
  2022-02-22  7:53   ` Richard Biener
@ 2022-03-17  0:49     ` Zhao Wei Liew
  2022-07-09 18:19       ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Zhao Wei Liew @ 2022-03-17  0:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3309 bytes --]

Thanks for the detailed review. I have uploaded patch v2 based on the review.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html
Changes since v1:
1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix
test regressions caused by those patterns.
2. Support signed integers except for the possible INT_MIN / -1 case.
3. Support cases where X and Y have different precisions.

On Tue, 22 Feb 2022 at 15:54, Richard Biener <richard.guenther@gmail.com> wrote:
> +/* (type)X / (type)Y -> (type)(X / Y)
> +   when the resulting type is at least precise as the original types
> +   and when all the types are unsigned integral types. */
> +(simplify
> + (trunc_div (convert @0) (convert @1))
> + (if (INTEGRAL_TYPE_P (type)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +      && TYPE_UNSIGNED (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && TYPE_UNSIGNED (TREE_TYPE (@1))
> +      && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> +      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
> +  (convert (trunc_div @0 @1))))
>
> since you are requiring the types of @0 and @1 to match it's easier to write
>
>      && types_match (TREE_TYPE(@0), TREE_TYPE (@1))
>

Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now
have different precisions, so I believe that I can't use types_match()
anymore.

> that allows you to elide checks on either @0 or @1.  I suppose the transform
> does not work for signed types because of the -INT_MIN / -1 overflow case?
> It might be possible to use expr_not_equal_to (@0, -INT_MIN) ||
> expr_not_equal_to (@1, -1)
> (correctly written, lookup the existing examples in match.pd for the X
> % -Y transform)

That's right. I have made changes to support signed types except for
the INT_MIN / -1 case.

> I'll note that as written the transform will not catch CST / (T)x or
> (T)x / CST since
> you'll not see conversions around constants.  I'm not sure whether
> using (convert[12]? ...)
> or writing special patterns with INTEGER_CST operands is more convenient.
> There is int_fits_type_p to check whether a constant will fit in a
> type without truncation
> or sign change.

I see. I couldn't find an easy way to use (convert[12]? ...) in this
case so I added 2 other patterns for the CST / (T)x and (T)x / CST
cases.

> When @0 and @1 do not have the same type there might still be a common type
> that can be used and is smaller than 'type', it might be as simple as using
> build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/).

Thanks for the tip. I've made a similar change, but using either
TREE_TYPE (@0) or TREE_TYPE (@1) instead of
build_nonstandard_integer_type().

>
> In the past there have been attempts to more globally narrow operations using
> a new pass rather than using individual patterns.  So for more complicated cases
> that might be the way to go.  There's now also the ISEL pass which does
> pre-RTL expansion transforms that need some global context and for example
> can look at SSA uses.

I had wanted to do something similar to handle all integral
trunc_divs, but I'm not sure where/how to start. It seems out of my
league at this moment. I'll gladly explore it after this change
though!

[-- Attachment #2: 0001-tree-optimization-PR103855-Fold-type-X-type-Y.patch.txt --]
[-- Type: text/plain, Size: 6817 bytes --]

From f5f768b55f6cadcd9eba459561abfc1d7e28f94e Mon Sep 17 00:00:00 2001
From: Zhao Wei Liew <zhaoweiliew@gmail.com>
Date: Sat, 19 Feb 2022 16:28:38 +0800
Subject: [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y

This pattern converts (trunc_div (convert X) (convert Y)) to
(convert (trunc_div X Y)) when:

1. type, X, and Y are all INTEGRAL_TYPES with the same signedness.
2. type has TYPE_PRECISION at least as large as those of X and Y.
3. the result of the division is guaranteed to fit in either
   the type of Y or the type of Y.

This patch also adds corresponding patterns for CST / (type)Y
and (type)X / CST.

These patterns useful as wider divisions are typically more expensive.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>

	PR tree-optimization/103855

gcc/ChangeLog:

	* match.pd: Add patterns for (type)X / (type)Y,
          CST / (type)Y, and (type)X / CST.

gcc/testsuite/ChangeLog:

	* gcc.dg/ipa/pr91088.c: Fix a test regression.
	* gcc.dg/tree-ssa/divide-8.c: New test.
---
 gcc/match.pd                             | 57 ++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/ipa/pr91088.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c | 45 +++++++++++++++++++
 3 files changed, 104 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8b44b5cc92c..b0bfb94f506 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -687,6 +687,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
   (convert (trunc_mod @0 @1))))
 
+/* (type)X / (type)Y -> (type)(X / Y)
+   when all types are integral and have the same sign and
+   the resulting type is at least precise as the original types, */
+(simplify
+ (trunc_div (convert @0) (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@1))
+      && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (TREE_TYPE (@0))
+      && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (TREE_TYPE (@1)))
+  (with
+   {
+    tree stype = TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
+                 ? TREE_TYPE (@0) : TREE_TYPE (@1);
+   }
+   (if (TYPE_UNSIGNED (type)
+        /* Avoid this transformation if X might be INT_MIN of the larger type or
+           Y might be -1 because the result might overflow. */
+        || TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (stype)
+        || expr_not_equal_to (@0, wi::to_wide (TYPE_MIN_VALUE (TREE_TYPE (@0))))
+        || expr_not_equal_to (@1, wi::minus_one (TYPE_PRECISION (TREE_TYPE (@1)))))
+    (convert (trunc_div (convert:stype @0) (convert:stype @1)))))))
+
+/* (type)X / CST -> (type)(X / CST) */
+(simplify
+ (trunc_div (convert @0) INTEGER_CST@1)
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (type)
+      && int_fits_type_p (@1, TREE_TYPE (@0))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0))
+      && (TYPE_UNSIGNED (type)
+       /* Avoid this transformation if X might be MIN_VALUE or
+          Y might be -1 because the result might overflow. */
+       || expr_not_equal_to (@0, wi::to_wide (TYPE_MIN_VALUE (TREE_TYPE (@0))))
+       || wi::to_wide (@1) != -1))
+   (with { tree stype = TREE_TYPE (@0); }
+    (convert (trunc_div @0 (convert:stype @1))))))
+
+/* CST / (type)Y -> (type)(CST / Y) */
+(simplify
+ (trunc_div INTEGER_CST@0 (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+      && TYPE_UNSIGNED (TREE_TYPE (@1)) == TYPE_UNSIGNED (type)
+      && int_fits_type_p (@0, TREE_TYPE (@1))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@1))
+      && (TYPE_UNSIGNED (type)
+          /* Avoid this transformation if X might be MIN_VALUE or
+             Y might be -1 because the result might overflow. */
+          || !tree_int_cst_equal (TYPE_MIN_VALUE (TREE_TYPE (@1)), @0)
+          || expr_not_equal_to (@1, wi::minus_one (TYPE_PRECISION (TREE_TYPE (@1))))))
+   (with { tree stype = TREE_TYPE (@1); }
+    (convert (trunc_div (convert:stype @0) @1)))))
+
 /* x * (1 + y / x) - y -> x - y % x */
 (simplify
  (minus (mult:cs @0 (plus:s (trunc_div:s @1 @0) integer_onep)) @1)
diff --git a/gcc/testsuite/gcc.dg/ipa/pr91088.c b/gcc/testsuite/gcc.dg/ipa/pr91088.c
index cc146a88134..d7332974e37 100644
--- a/gcc/testsuite/gcc.dg/ipa/pr91088.c
+++ b/gcc/testsuite/gcc.dg/ipa/pr91088.c
@@ -60,7 +60,7 @@ int callee1 (struct A a)
   if ((a.f2 + 7) & 17)
     foo ();
 
-  if ((1300 / (short)a.f3) == 19)
+  if ((1300 / (a.f3 & 0xffff)) == 19)
     large_code;
 
   return 1;
@@ -115,6 +115,6 @@ int caller ()
 /* { dg-final { scan-ipa-dump-times "Creating a specialized node of callee1" 1 "cp" } } */
 /* { dg-final { scan-ipa-dump-times "Creating a specialized node of callee2" 1 "cp" } } */
 /* { dg-final { scan-ipa-dump-times "Creating a specialized node of callee3" 1 "cp" } } */
-/* { dg-final { scan-ipa-dump "op0\\\[offset: 32],\\(\\(short int\\) #\\),\\(\\(int\\) #\\),\\(1300 / #\\) == 19" "cp" } } */
+/* { dg-final { scan-ipa-dump "op0\\\[offset: 32],\\(# & 65535\\),\\(1300 / #\\) == 19" "cp" } } */
 /* { dg-final { scan-ipa-dump "op0\\\[ref offset: 0],\\(# \\^ 1\\) <" "cp" } } */
 /* { dg-final { scan-ipa-dump "op0,\\(# & 255\\),\\(1 - #\\),\\(# \\* 3\\),\\(27 % #\\) <" "cp" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
new file mode 100644
index 00000000000..e389e85a04f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
@@ -0,0 +1,45 @@
+/* PR tree-optimization/103855 */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned int f1(unsigned int a, unsigned int b) {
+	unsigned long long all = a;
+	return all / b;
+}
+
+unsigned int f2(unsigned char a, unsigned int b) {
+	unsigned long long all = a;
+	return all / b;
+}
+
+unsigned int f3(unsigned int a, unsigned char b) {
+	unsigned long long all = a;
+	return all / b;
+}
+
+int f4(char a, int b) {
+	long long all = a;
+	return all / b;
+}
+
+unsigned int f5(unsigned int a) {
+	unsigned long long all = a;
+	return all / 123456789;
+}
+
+unsigned int f6(unsigned int b) {
+	unsigned long long bll = b;
+	return 123456789 / bll;
+}
+
+int f7(int a) {
+  long long all = a;
+	return all / 123456789;
+}
+
+int f8(int b) {
+	long long bll = b;
+	return 123456789 / bll;
+}
+
+/* { dg-final { scan-tree-dump-not "\\\(long long unsigned int\\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\\(long long int\\\)" "optimized" } } */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]
  2022-03-17  0:49     ` [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855] Zhao Wei Liew
@ 2022-07-09 18:19       ` Jeff Law
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Law @ 2022-07-09 18:19 UTC (permalink / raw)
  To: gcc-patches, Zhao Wei Liew



On 3/16/2022 6:49 PM, Zhao Wei Liew via Gcc-patches wrote:
> Thanks for the detailed review. I have uploaded patch v2 based on the review.
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html
> Changes since v1:
> 1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix
> test regressions caused by those patterns.
> 2. Support signed integers except for the possible INT_MIN / -1 case.
> 3. Support cases where X and Y have different precisions.
>
> On Tue, 22 Feb 2022 at 15:54, Richard Biener <richard.guenther@gmail.com> wrote:
>> +/* (type)X / (type)Y -> (type)(X / Y)
>> +   when the resulting type is at least precise as the original types
>> +   and when all the types are unsigned integral types. */
>> +(simplify
>> + (trunc_div (convert @0) (convert @1))
>> + (if (INTEGRAL_TYPE_P (type)
>> +      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
>> +      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>> +      && TYPE_UNSIGNED (type)
>> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
>> +      && TYPE_UNSIGNED (TREE_TYPE (@1))
>> +      && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
>> +      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
>> +  (convert (trunc_div @0 @1))))
>>
>> since you are requiring the types of @0 and @1 to match it's easier to write
>>
>>       && types_match (TREE_TYPE(@0), TREE_TYPE (@1))
>>
> Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now
> have different precisions, so I believe that I can't use types_match()
> anymore.
>
>> that allows you to elide checks on either @0 or @1.  I suppose the transform
>> does not work for signed types because of the -INT_MIN / -1 overflow case?
>> It might be possible to use expr_not_equal_to (@0, -INT_MIN) ||
>> expr_not_equal_to (@1, -1)
>> (correctly written, lookup the existing examples in match.pd for the X
>> % -Y transform)
> That's right. I have made changes to support signed types except for
> the INT_MIN / -1 case.
>
>> I'll note that as written the transform will not catch CST / (T)x or
>> (T)x / CST since
>> you'll not see conversions around constants.  I'm not sure whether
>> using (convert[12]? ...)
>> or writing special patterns with INTEGER_CST operands is more convenient.
>> There is int_fits_type_p to check whether a constant will fit in a
>> type without truncation
>> or sign change.
> I see. I couldn't find an easy way to use (convert[12]? ...) in this
> case so I added 2 other patterns for the CST / (T)x and (T)x / CST
> cases.
>
>> When @0 and @1 do not have the same type there might still be a common type
>> that can be used and is smaller than 'type', it might be as simple as using
>> build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/).
> Thanks for the tip. I've made a similar change, but using either
> TREE_TYPE (@0) or TREE_TYPE (@1) instead of
> build_nonstandard_integer_type().
>
>> In the past there have been attempts to more globally narrow operations using
>> a new pass rather than using individual patterns.  So for more complicated cases
>> that might be the way to go.  There's now also the ISEL pass which does
>> pre-RTL expansion transforms that need some global context and for example
>> can look at SSA uses.
> I had wanted to do something similar to handle all integral
> trunc_divs, but I'm not sure where/how to start. It seems out of my
> league at this moment. I'll gladly explore it after this change
> though!
Just a couple general notes in case you want to poke further in this space.

Earlier you mentioned you were trying to do some of this work in 
expand_divmod, but the operands had rtx types rather than tree types, 
thus you're unable to get at the range data.

In expand_expr_divmod there's treeop0, treeop1.  So if they are 
SSA_NAMEs, then you can query for their range.

Richi mentioned attempts to globally narrow during a new pass. That's a 
much broader chance, but has the potential to apply in a lot more 
places.    Narrowing early would potentially expose the narrowed 
statements to the vectorizer which could be a win. Narrowing late is 
likely a win too, but it likely only helps the expansion phase generate 
narrower operations.  This is obviously a larger project than just 
handling the division cases in match.pd. Tackling it is not (IMHO) a 
requirement to move forward.

Finally, narrowing isn't always a win.  There have been 
micro-architectures were sub-word accesses are slower than word 
accesses.  I'm not too worried about it for this patch, but it could 
become more important if you were to look into a more general solution.

As far as the patch is concerned.  At one point I was a bit worried that 
expr_not_equal_to wasn't right.  But after digging a bit deeper into its 
implementation, it should do the right thing here.  Note if you wanted 
to see how to get to underlying range data, you can find a simple 
example in that function.


Your new testcase needs to limit itself to targets where integers are at 
least 32 bits wide.    Something like this at the top of the new test:

/* { dg-require-effective-target int32plus } */

There are additional cases Andrew Pinski pointed out in BZ.  Does your 
new code handle all of them.  It's OK if it doesn't, but obviously not 
optimal.  Consider adding his additional testcases, potentially xfailed 
if they're not handled yet.

You changed gcc.dg/ipa/pr91088.c and the ChangeLog indicates "fix a test 
regression".   Can you say a bit more about what's going on there.  It's 
likely find, but we want to make sure that your change hasn't 
compromised the test.

Otherwise it looks pretty good.

jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y
  2022-02-19  9:36 [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y Zhao Wei Liew
@ 2022-02-19 10:05 ` Zhao Wei Liew
  0 siblings, 0 replies; 7+ messages in thread
From: Zhao Wei Liew @ 2022-02-19 10:05 UTC (permalink / raw)
  To: GCC Patches


> On 19 Feb 2022, at 5:36 PM, Zhao Wei Liew <zhaoweiliew@gmail.com> wrote:
> 
> This pattern converts (trunc_div (convert a) (convert b)) to
> (convert (trunc_div a b)) when:
> 
> 1. type, a, and b all have unsigned integeral types
> 2. a and b have the same type precision
> 3. type has type precision at least as larger as a and b
> 
> This is useful as wider divisions are typically more expensive.
> 
> To illustrate the effects, consider the following code snippet:
> 
> unsigned long long f(unsigned int a, unsigned int b) {
> 	unsigned long long all = a;
> 	return all / b;
> }
> 
> Without the pattern, g++ -std=c++20 -O2 generates the following
> assembly:
> 
> f(unsigned int, unsigned int):
> 	mov eax, edi
> 	mov esi, esi
> 	xor edx, edx
> 	div rsi
> 	ret
> 
> With the pattern, it generates this:
> 
> f(unsigned int, unsigned int):
> 	mov eax, edi
> 	xor edx, edx
> 	div esi
> 	ret
> 
> This is identical to what clang++ -std=c++20 -O2 generates.
> 
> Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
> 
> 	PR tree-optimization/103855
> 
> gcc/ChangeLog:
> 
> 	* match.pd: Add pattern for (type)X / (type)Y.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/tree-ssa/divide-8.c: New test.
> 	* gcc.dg/tree-ssa/divide-9.c: New test.
> ---
> gcc/match.pd                             | 15 +++++++++++++++
> gcc/testsuite/gcc.dg/tree-ssa/divide-8.c |  9 +++++++++
> gcc/testsuite/gcc.dg/tree-ssa/divide-9.c | 10 ++++++++++
> 3 files changed, 34 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 10f62284862..393b43756dd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -684,6 +684,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>   (convert (trunc_mod @0 @1))))
> 
> +/* (type)X / (type)Y -> (type)(X / Y)
> +   when the resulting type is at least precise as the original types
> +   and when all the types are unsigned integral types. */
> +(simplify
> + (trunc_div (convert @0) (convert @1))
> + (if (INTEGRAL_TYPE_P (type)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +      && TYPE_UNSIGNED (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && TYPE_UNSIGNED (TREE_TYPE (@1))
> +      && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> +      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
> +  (convert (trunc_div @0 @1))))
> +
> /* x * (1 + y / x) - y -> x - y % x */
> (simplify
>  (minus (mult:cs @0 (plus:s (trunc_div:s @1 @0) integer_onep)) @1)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
> new file mode 100644
> index 00000000000..489604c4eb6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/103855 */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +unsigned int f(unsigned int a, unsigned int b) {
> +    unsigned long long all = a;
> +    return all / b;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "\(unsigned long long int)" "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
> new file mode 100644
> index 00000000000..3e75a49b509
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
> @@ -0,0 +1,10 @@
> +/* PR tree-optimization/103855 */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +unsigned long long f(unsigned int a, unsigned int b) {
> +    unsigned long long all = a;
> +    return all / b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\\(unsigned long long int\\\)" 1 "optimized" } } */
> +
> -- 
> 2.35.1
> 

Sorry, I noticed issues with the test cases when running a regression test.
I’ll complete regression testing before uploading a v2.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y
@ 2022-02-19  9:36 Zhao Wei Liew
  2022-02-19 10:05 ` Zhao Wei Liew
  0 siblings, 1 reply; 7+ messages in thread
From: Zhao Wei Liew @ 2022-02-19  9:36 UTC (permalink / raw)
  To: GCC Patches

This pattern converts (trunc_div (convert a) (convert b)) to
(convert (trunc_div a b)) when:

1. type, a, and b all have unsigned integeral types
2. a and b have the same type precision
3. type has type precision at least as larger as a and b

This is useful as wider divisions are typically more expensive.

To illustrate the effects, consider the following code snippet:

unsigned long long f(unsigned int a, unsigned int b) {
	unsigned long long all = a;
	return all / b;
}

Without the pattern, g++ -std=c++20 -O2 generates the following
assembly:

f(unsigned int, unsigned int):
	mov eax, edi
	mov esi, esi
	xor edx, edx
	div rsi
	ret

With the pattern, it generates this:

f(unsigned int, unsigned int):
	mov eax, edi
	xor edx, edx
	div esi
	ret

This is identical to what clang++ -std=c++20 -O2 generates.

Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>

	PR tree-optimization/103855

gcc/ChangeLog:

	* match.pd: Add pattern for (type)X / (type)Y.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/divide-8.c: New test.
	* gcc.dg/tree-ssa/divide-9.c: New test.
---
 gcc/match.pd                             | 15 +++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c |  9 +++++++++
 gcc/testsuite/gcc.dg/tree-ssa/divide-9.c | 10 ++++++++++
 3 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-9.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 10f62284862..393b43756dd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -684,6 +684,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
   (convert (trunc_mod @0 @1))))
 
+/* (type)X / (type)Y -> (type)(X / Y)
+   when the resulting type is at least precise as the original types
+   and when all the types are unsigned integral types. */
+(simplify
+ (trunc_div (convert @0) (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+      && TYPE_UNSIGNED (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && TYPE_UNSIGNED (TREE_TYPE (@1))
+      && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
+  (convert (trunc_div @0 @1))))
+
 /* x * (1 + y / x) - y -> x - y % x */
 (simplify
  (minus (mult:cs @0 (plus:s (trunc_div:s @1 @0) integer_onep)) @1)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
new file mode 100644
index 00000000000..489604c4eb6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-8.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/103855 */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned int f(unsigned int a, unsigned int b) {
+    unsigned long long all = a;
+    return all / b;
+}
+
+/* { dg-final { scan-tree-dump-not "\(unsigned long long int)" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
new file mode 100644
index 00000000000..3e75a49b509
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-9.c
@@ -0,0 +1,10 @@
+/* PR tree-optimization/103855 */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned long long f(unsigned int a, unsigned int b) {
+    unsigned long long all = a;
+    return all / b;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\(unsigned long long int\\\)" 1 "optimized" } } */
+
-- 
2.35.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-07-09 18:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-22  3:57 [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y Zhao Wei Liew
2022-02-22  4:00 ` Zhao Wei Liew
2022-02-22  7:53   ` Richard Biener
2022-03-17  0:49     ` [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855] Zhao Wei Liew
2022-07-09 18:19       ` Jeff Law
  -- strict thread matches above, loose matches on Subject: below --
2022-02-19  9:36 [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y Zhao Wei Liew
2022-02-19 10:05 ` Zhao Wei Liew

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).