* [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd @ 2021-11-09 4:11 Navid Rahimi 2021-11-09 4:24 ` Navid Rahimi 2021-11-09 10:36 ` Richard Biener 0 siblings, 2 replies; 7+ messages in thread From: Navid Rahimi @ 2021-11-09 4:11 UTC (permalink / raw) To: gcc-patches [-- Attachment #1: Type: text/plain, Size: 582 bytes --] Hi GCC community, This patch will add the missed pattern described in bug 102232 [1] to the match.pd. The testcase will test whether the multiplication and division has been removed from the code or not. The correctness proof for this pattern is here [2] in case anyone is curious. PR tree-optimization/102232 * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102232 2) https://alive2.llvm.org/ce/z/2VScjD Best wishes, Navid. [-- Attachment #2: 0001-PR-tree-optimization-102232.patch --] [-- Type: application/octet-stream, Size: 2454 bytes --] From 20dcbad29f4d41cce09ac799e9b5093167f3aaa0 Mon Sep 17 00:00:00 2001 From: Navid Rahimi <navidrahimi@microsoft.com> Date: Mon, 8 Nov 2021 13:57:19 -0800 Subject: [PATCH] PR tree-optimization/102232 * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. --- gcc/match.pd | 7 ++++ gcc/testsuite/gcc.dg/tree-ssa/pr102232.c | 52 ++++++++++++++++++++++++ 2 files changed, 59 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102232.c diff --git a/gcc/match.pd b/gcc/match.pd index 71cf6f9df0a..37c01e79d97 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1295,6 +1295,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (bit_xor (bit_ior:c (bit_not @0) @1) (bit_ior:c @0 (bit_not @1))) (bit_xor @0 @1)) +/* x * (1 + y / x) - y -> x - y % x */ +(simplify + (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && types_match (@0, @1)) + (minus @0 (trunc_mod @1 @0)))) + /* ((x & y) - (x | y)) - 1 -> ~(x ^ y) */ (simplify (plus (nop_convert1? (minus@2 (nop_convert2? (bit_and:c @0 @1)) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c new file mode 100644 index 00000000000..63fe37c0936 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c @@ -0,0 +1,52 @@ +/* PR tree-optimization/102232 */ +/* { dg-do run } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +int __attribute__ ((noipa)) foo (int a, int b) +{ + return b * (1 + a / b) - a; +} + +int +main (void) +{ + // few randomly generated test cases + if (foo (71856034, 238) != 212) + { + return 1; + } + if (foo (71856034, 10909) != 1549) + { + return 1; + } + if (foo (20350, 1744) != 578) + { + return 1; + } + if (foo (444813, 88563) != 86565) + { + return 1; + } + if (foo (112237, 63004) != 13771) + { + return 1; + } + if (foo (68268386, 787116) != 210706) + { + return 1; + } + if (foo (-444813, 88563) != 90561) + { + return 1; + } + if (foo (-68268386, 787116) != 1363526) + { + return 1; + } + + return 0; +} + +/* Verify that multiplication and division has been removed. */ +/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */ +/* { dg-final { scan-tree-dump-not " / " "optimized" } } */ \ No newline at end of file -- 2.25.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd 2021-11-09 4:11 [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd Navid Rahimi @ 2021-11-09 4:24 ` Navid Rahimi 2021-11-09 10:36 ` Richard Biener 1 sibling, 0 replies; 7+ messages in thread From: Navid Rahimi @ 2021-11-09 4:24 UTC (permalink / raw) To: gcc-patches [-- Attachment #1: Type: text/plain, Size: 1941 bytes --] Sorry, a last minute update via my editor changed the MIME type to application/octet-stream. You can find the patch with correct MIME (text/x-diff) attached to this email. (Same file and content, fixed MIME). Best wishes, Navid. ________________________________________ From: Gcc-patches <gcc-patches-bounces+navidrahimi=microsoft.com@gcc.gnu.org> on behalf of Navid Rahimi via Gcc-patches <gcc-patches@gcc.gnu.org> Sent: Monday, November 8, 2021 20:11 To: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd Hi GCC community, This patch will add the missed pattern described in bug 102232 [1] to the match.pd. The testcase will test whether the multiplication and division has been removed from the code or not. The correctness proof for this pattern is here [2] in case anyone is curious. PR tree-optimization/102232 * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. 1) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D102232&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cded0a3258b594204129608d9a3371763%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720279311390932%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=AaXdpyn%2BNQ2BizC6FILJfL3B8WV84v5FuLuU1d8MJxU%3D&reserved=0 2) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Falive2.llvm.org%2Fce%2Fz%2F2VScjD&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cded0a3258b594204129608d9a3371763%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720279311400927%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3zM4awjl2LxhsfMd1VAOjH4JQ9621Q7Mg61gEB2Su%2Bk%3D&reserved=0 Best wishes, Navid. [-- Attachment #2: 0001-PR-tree-optimization-102232.patch --] [-- Type: application/octet-stream, Size: 2454 bytes --] From 7c2abb0eab05766ab879066b000c13de827e3b3d Mon Sep 17 00:00:00 2001 From: Navid Rahimi <navidrahimi@microsoft.com> Date: Mon, 8 Nov 2021 13:57:19 -0800 Subject: [PATCH] PR tree-optimization/102232 * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. --- gcc/match.pd | 7 ++++ gcc/testsuite/gcc.dg/tree-ssa/pr102232.c | 52 ++++++++++++++++++++++++ 2 files changed, 59 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102232.c diff --git a/gcc/match.pd b/gcc/match.pd index 71cf6f9df0a..37c01e79d97 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1295,6 +1295,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (bit_xor (bit_ior:c (bit_not @0) @1) (bit_ior:c @0 (bit_not @1))) (bit_xor @0 @1)) +/* x * (1 + y / x) - y -> x - y % x */ +(simplify + (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && types_match (@0, @1)) + (minus @0 (trunc_mod @1 @0)))) + /* ((x & y) - (x | y)) - 1 -> ~(x ^ y) */ (simplify (plus (nop_convert1? (minus@2 (nop_convert2? (bit_and:c @0 @1)) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c new file mode 100644 index 00000000000..e7485cf24e9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c @@ -0,0 +1,52 @@ +/* PR tree-optimization/102232 */ +/* { dg-do run } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +int __attribute__ ((noipa)) foo (int a, int b) +{ + return b * (1 + a / b) - a; +} + +int +main (void) +{ + // few randomly generated test cases + if (foo (71856034, 238) != 212) + { + return 1; + } + if (foo (71856034, 10909) != 1549) + { + return 1; + } + if (foo (20350, 1744) != 578) + { + return 1; + } + if (foo (444813, 88563) != 86565) + { + return 1; + } + if (foo (112237, 63004) != 13771) + { + return 1; + } + if (foo (68268386, 787116) != 210706) + { + return 1; + } + if (foo (-444813, 88563) != 90561) + { + return 1; + } + if (foo (-68268386, 787116) != 1363526) + { + return 1; + } + + return 0; +} + +/* Verify that multiplication and division has been removed. */ +/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */ +/* { dg-final { scan-tree-dump-not " / " "optimized" } } */ \ No newline at end of file -- 2.25.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd 2021-11-09 4:11 [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd Navid Rahimi 2021-11-09 4:24 ` Navid Rahimi @ 2021-11-09 10:36 ` Richard Biener 2021-11-09 16:25 ` [EXTERNAL] " Navid Rahimi 1 sibling, 1 reply; 7+ messages in thread From: Richard Biener @ 2021-11-09 10:36 UTC (permalink / raw) To: Navid Rahimi; +Cc: gcc-patches On Tue, Nov 9, 2021 at 5:12 AM Navid Rahimi via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi GCC community, > > This patch will add the missed pattern described in bug 102232 [1] to the match.pd. The testcase will test whether the multiplication and division has been removed from the code or not. The correctness proof for this pattern is here [2] in case anyone is curious. > > PR tree-optimization/102232 > * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. +/* x * (1 + y / x) - y -> x - y % x */ +(simplify + (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1) the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as constants come last - you can then remove the 'c' + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) you can use INTEGRAL_TYPE_P (type). + && types_match (@0, @1)) this test is not necessary + (minus @0 (trunc_mod @1 @0)))) But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So it looks like a conflict with the x * (1 + b) <-> x + x * b transform (fold_plusminus_mult)? That said, the special case of one definitely makes the result cheaper (one less operation). Please move the pattern next to the most relatest which I think is /* X - (X / Y) * Y is the same as X % Y. */ (simplify (minus (convert1? @0) (convert2? (mult:c (trunc_div @@0 @@1) @1))) (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) (convert (trunc_mod @0 @1)))) +int +main (void) +{ + // few randomly generated test cases + if (foo (71856034, 238) != 212) + { + return 1; the return value of 1 is an unreliable way to fail, please instead call __builtin_abort (); +/* { dg-options "-O3 -fdump-tree-optimized" } */ do we really need -O3 for this? I think that -O should be enough here? Thanks, Richard. > * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. > > > 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102232 > 2) https://alive2.llvm.org/ce/z/2VScjD > > Best wishes, > Navid. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd 2021-11-09 10:36 ` Richard Biener @ 2021-11-09 16:25 ` Navid Rahimi 2021-11-10 8:35 ` Richard Biener 0 siblings, 1 reply; 7+ messages in thread From: Navid Rahimi @ 2021-11-09 16:25 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 5318 bytes --] Hi Richard, Thank you so much for your detailed feedback. I am attaching another version of the patch which does include all the changes you mentioned. Bellow you can see my response to your feedbacks: > the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as > constants come last - you can then remove the 'c' Fixed. I was not aware of the canonical order. > you can use INTEGRAL_TYPE_P (type). Fixed. Didn't know about "type" either. > this test is not necessary Fixed. > But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So > it looks like a conflict with the x * (1 + b) <-> x + x * b transform > (fold_plusminus_mult)? That said, the special case of one > definitely makes the result cheaper (one less operation). For this special case, it does remove operation indeed. But I was not able to reproduce it for any other constant [1]. If it was possible to do it with other constants I would've changed the pattern to have be more general like "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to me when there is a "- y" at the end "x * (1 + b)", there is opportunity to optimize. Without that "- y" I was not able to make any operation more performant. Either direction, looks like same amount of computation. 1) https://compiler-explorer.com/z/dWsq7Tzf4 > Please move the pattern next to the most relatest which I think is Fixed. > the return value of 1 is an unreliable way to fail, please instead call > __builtin_abort (); Fixed. > do we really need -O3 for this? I think that -O should be enough here? We don't specifically need that. But I realized that the optimization can happen in two different level at the compiler. It seems if you spread the computation over multiple statement like: int c = a/b; int y = b * (1 + c); return y - a; instead of : return b * (1 + a / b) - a; Then you have to have at least -O1 to have it optimized. Granted, I am not doing that in the testcase. In the new patch I am changing it to -O. Let me know if you have any suggestions. Best wishes, Navid. ________________________________________ From: Richard Biener <richard.guenther@gmail.com> Sent: Tuesday, November 9, 2021 02:36 To: Navid Rahimi Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd On Tue, Nov 9, 2021 at 5:12 AM Navid Rahimi via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi GCC community, > > This patch will add the missed pattern described in bug 102232 [1] to the match.pd. The testcase will test whether the multiplication and division has been removed from the code or not. The correctness proof for this pattern is here [2] in case anyone is curious. > > PR tree-optimization/102232 > * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. +/* x * (1 + y / x) - y -> x - y % x */ +(simplify + (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1) the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as constants come last - you can then remove the 'c' + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) you can use INTEGRAL_TYPE_P (type). + && types_match (@0, @1)) this test is not necessary + (minus @0 (trunc_mod @1 @0)))) But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So it looks like a conflict with the x * (1 + b) <-> x + x * b transform (fold_plusminus_mult)? That said, the special case of one definitely makes the result cheaper (one less operation). Please move the pattern next to the most relatest which I think is /* X - (X / Y) * Y is the same as X % Y. */ (simplify (minus (convert1? @0) (convert2? (mult:c (trunc_div @@0 @@1) @1))) (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) (convert (trunc_mod @0 @1)))) +int +main (void) +{ + // few randomly generated test cases + if (foo (71856034, 238) != 212) + { + return 1; the return value of 1 is an unreliable way to fail, please instead call __builtin_abort (); +/* { dg-options "-O3 -fdump-tree-optimized" } */ do we really need -O3 for this? I think that -O should be enough here? Thanks, Richard. > * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. > > > 1) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D102232&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334697749%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Br7B1ShCT4ynLwypyM29LqaNF9GBJF3%2BIR6sZrDRTKM%3D&reserved=0 > 2) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Falive2.llvm.org%2Fce%2Fz%2F2VScjD&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334707741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FH0%2BwEHmaGjyclNFD2HrKMXtXDscPAlc9uUs3p6UMkw%3D&reserved=0 > > Best wishes, > Navid. [-- Attachment #2: 0001-PR-tree-optimization-102232.patch --] [-- Type: application/octet-stream, Size: 2563 bytes --] From 3d7355450f018898e50c4a5c9c1951c93b8224ae Mon Sep 17 00:00:00 2001 From: Navid Rahimi <navidrahimi@microsoft.com> Date: Mon, 8 Nov 2021 13:57:19 -0800 Subject: [PATCH] PR tree-optimization/102232 * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. --- gcc/match.pd | 6 +++ gcc/testsuite/gcc.dg/tree-ssa/pr102232.c | 52 ++++++++++++++++++++++++ 2 files changed, 58 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102232.c diff --git a/gcc/match.pd b/gcc/match.pd index 71cf6f9df0a..c91fff46936 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -608,6 +608,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) (convert (trunc_mod @0 @1)))) +/* x * (1 + y / x) - y -> x - y % x */ +(simplify + (minus (mult:cs @0 (plus:s (trunc_div:s @1 @0) integer_onep)) @1) + (if (INTEGRAL_TYPE_P (type)) + (minus @0 (trunc_mod @1 @0)))) + /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR, i.e. "X % C" into "X & (C - 1)", if X and C are positive. Also optimize A % (C << N) where C is a power of 2, diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c new file mode 100644 index 00000000000..62bca6922ab --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c @@ -0,0 +1,52 @@ +/* PR tree-optimization/102232 */ +/* { dg-do run } */ +/* { dg-options "-O -fdump-tree-optimized" } */ + +int __attribute__ ((noipa)) foo (int a, int b) +{ + return b * (1 + a / b) - a; +} + +int +main (void) +{ + // few randomly generated test cases + if (foo (71856034, 238) != 212) + { + __builtin_abort (); + } + if (foo (71856034, 10909) != 1549) + { + __builtin_abort (); + } + if (foo (20350, 1744) != 578) + { + __builtin_abort (); + } + if (foo (444813, 88563) != 86565) + { + __builtin_abort (); + } + if (foo (112237, 63004) != 13771) + { + __builtin_abort (); + } + if (foo (68268386, 787116) != 210706) + { + __builtin_abort (); + } + if (foo (-444813, 88563) != 90561) + { + __builtin_abort (); + } + if (foo (-68268386, 787116) != 1363526) + { + __builtin_abort (); + } + + return 0; +} + +/* Verify that multiplication and division has been removed. */ +/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */ +/* { dg-final { scan-tree-dump-not " / " "optimized" } } */ \ No newline at end of file -- 2.25.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd 2021-11-09 16:25 ` [EXTERNAL] " Navid Rahimi @ 2021-11-10 8:35 ` Richard Biener 2021-11-23 3:09 ` Jeff Law 0 siblings, 1 reply; 7+ messages in thread From: Richard Biener @ 2021-11-10 8:35 UTC (permalink / raw) To: Navid Rahimi; +Cc: gcc-patches On Tue, Nov 9, 2021 at 5:25 PM Navid Rahimi <navidrahimi@microsoft.com> wrote: > > Hi Richard, > > Thank you so much for your detailed feedback. I am attaching another version of the patch which does include all the changes you mentioned. > > Bellow you can see my response to your feedbacks: > > > the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as > > constants come last - you can then remove the 'c' > Fixed. I was not aware of the canonical order. > > > you can use INTEGRAL_TYPE_P (type). > Fixed. Didn't know about "type" either. > > > this test is not necessary > Fixed. > > > But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So > > it looks like a conflict with the x * (1 + b) <-> x + x * b transform > > (fold_plusminus_mult)? That said, the special case of one > > definitely makes the result cheaper (one less operation). > For this special case, it does remove operation indeed. But I was not able to reproduce it for any other constant [1]. If it was possible to do it with other constants I would've changed the pattern to have be more general like "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to me when there is a "- y" at the end "x * (1 + b)", there is opportunity to optimize. Without that "- y" I was not able to make any operation more performant. Either direction, looks like same amount of computation. > > 1) https://compiler-explorer.com/z/dWsq7Tzf4 > > > Please move the pattern next to the most relatest which I think is > Fixed. > > > the return value of 1 is an unreliable way to fail, please instead call > > __builtin_abort (); > Fixed. > > > do we really need -O3 for this? I think that -O should be enough here? > We don't specifically need that. But I realized that the optimization can happen in two different level at the compiler. It seems if you spread the computation over multiple statement like: > int c = a/b; > int y = b * (1 + c); > return y - a; > > instead of : > return b * (1 + a / b) - a; > > Then you have to have at least -O1 to have it optimized. Granted, I am not doing that in the testcase. In the new patch I am changing it to -O. Let me know if you have any suggestions. -O is fine, generally at -O0 we shouldn't expect such transforms to happen (but they still do, of course). The patch looks OK now. Thanks, Richard. > > Best wishes, > Navid. > > ________________________________________ > From: Richard Biener <richard.guenther@gmail.com> > Sent: Tuesday, November 9, 2021 02:36 > To: Navid Rahimi > Cc: gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd > > On Tue, Nov 9, 2021 at 5:12 AM Navid Rahimi via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > Hi GCC community, > > > > This patch will add the missed pattern described in bug 102232 [1] to the match.pd. The testcase will test whether the multiplication and division has been removed from the code or not. The correctness proof for this pattern is here [2] in case anyone is curious. > > > > PR tree-optimization/102232 > > * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. > > +/* x * (1 + y / x) - y -> x - y % x */ > +(simplify > + (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1) > > the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as > constants come last - you can then remove the 'c' > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > you can use INTEGRAL_TYPE_P (type). > > + && types_match (@0, @1)) > > this test is not necessary > > + (minus @0 (trunc_mod @1 @0)))) > > But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So > it looks like a conflict with the x * (1 + b) <-> x + x * b transform > (fold_plusminus_mult)? That said, the special case of one > definitely makes the result cheaper (one less operation). > > Please move the pattern next to the most relatest which I think is > > /* X - (X / Y) * Y is the same as X % Y. */ > (simplify > (minus (convert1? @0) (convert2? (mult:c (trunc_div @@0 @@1) @1))) > (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) > (convert (trunc_mod @0 @1)))) > > +int > +main (void) > +{ > + // few randomly generated test cases > + if (foo (71856034, 238) != 212) > + { > + return 1; > > the return value of 1 is an unreliable way to fail, please instead call > __builtin_abort (); > > +/* { dg-options "-O3 -fdump-tree-optimized" } */ > > do we really need -O3 for this? I think that -O should be enough here? > > Thanks, > Richard. > > > * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. > > > > > > 1) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D102232&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334697749%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Br7B1ShCT4ynLwypyM29LqaNF9GBJF3%2BIR6sZrDRTKM%3D&reserved=0 > > 2) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Falive2.llvm.org%2Fce%2Fz%2F2VScjD&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334707741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FH0%2BwEHmaGjyclNFD2HrKMXtXDscPAlc9uUs3p6UMkw%3D&reserved=0 > > > > Best wishes, > > Navid. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd 2021-11-10 8:35 ` Richard Biener @ 2021-11-23 3:09 ` Jeff Law 2021-11-23 3:10 ` Navid Rahimi 0 siblings, 1 reply; 7+ messages in thread From: Jeff Law @ 2021-11-23 3:09 UTC (permalink / raw) To: Richard Biener, Navid Rahimi; +Cc: gcc-patches On 11/10/2021 1:35 AM, Richard Biener via Gcc-patches wrote: > On Tue, Nov 9, 2021 at 5:25 PM Navid Rahimi <navidrahimi@microsoft.com> wrote: >> Hi Richard, >> >> Thank you so much for your detailed feedback. I am attaching another version of the patch which does include all the changes you mentioned. >> >> Bellow you can see my response to your feedbacks: >> >>> the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as >>> constants come last - you can then remove the 'c' >> Fixed. I was not aware of the canonical order. >> >>> you can use INTEGRAL_TYPE_P (type). >> Fixed. Didn't know about "type" either. >> >>> this test is not necessary >> Fixed. >> >>> But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So >>> it looks like a conflict with the x * (1 + b) <-> x + x * b transform >>> (fold_plusminus_mult)? That said, the special case of one >>> definitely makes the result cheaper (one less operation). >> For this special case, it does remove operation indeed. But I was not able to reproduce it for any other constant [1]. If it was possible to do it with other constants I would've changed the pattern to have be more general like "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to me when there is a "- y" at the end "x * (1 + b)", there is opportunity to optimize. Without that "- y" I was not able to make any operation more performant. Either direction, looks like same amount of computation. >> >> 1) https://compiler-explorer.com/z/dWsq7Tzf4 >> >>> Please move the pattern next to the most relatest which I think is >> Fixed. >> >>> the return value of 1 is an unreliable way to fail, please instead call >>> __builtin_abort (); >> Fixed. >> >>> do we really need -O3 for this? I think that -O should be enough here? >> We don't specifically need that. But I realized that the optimization can happen in two different level at the compiler. It seems if you spread the computation over multiple statement like: >> int c = a/b; >> int y = b * (1 + c); >> return y - a; >> >> instead of : >> return b * (1 + a / b) - a; >> >> Then you have to have at least -O1 to have it optimized. Granted, I am not doing that in the testcase. In the new patch I am changing it to -O. Let me know if you have any suggestions. > -O is fine, generally at -O0 we shouldn't expect such transforms to > happen (but they still do, of course). > > The patch looks OK now. I've pushed this to the trunk for Navid. jeff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd 2021-11-23 3:09 ` Jeff Law @ 2021-11-23 3:10 ` Navid Rahimi 0 siblings, 0 replies; 7+ messages in thread From: Navid Rahimi @ 2021-11-23 3:10 UTC (permalink / raw) To: Jeff Law, Richard Biener; +Cc: gcc-patches Thanks Jeff for this too. Best wishes, Navid. ________________________________________ From: Jeff Law <jeffreyalaw@gmail.com> Sent: Monday, November 22, 2021 19:09 To: Richard Biener; Navid Rahimi Cc: gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd [You don't often get email from jeffreyalaw@gmail.com. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] On 11/10/2021 1:35 AM, Richard Biener via Gcc-patches wrote: > On Tue, Nov 9, 2021 at 5:25 PM Navid Rahimi <navidrahimi@microsoft.com> wrote: >> Hi Richard, >> >> Thank you so much for your detailed feedback. I am attaching another version of the patch which does include all the changes you mentioned. >> >> Bellow you can see my response to your feedbacks: >> >>> the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as >>> constants come last - you can then remove the 'c' >> Fixed. I was not aware of the canonical order. >> >>> you can use INTEGRAL_TYPE_P (type). >> Fixed. Didn't know about "type" either. >> >>> this test is not necessary >> Fixed. >> >>> But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So >>> it looks like a conflict with the x * (1 + b) <-> x + x * b transform >>> (fold_plusminus_mult)? That said, the special case of one >>> definitely makes the result cheaper (one less operation). >> For this special case, it does remove operation indeed. But I was not able to reproduce it for any other constant [1]. If it was possible to do it with other constants I would've changed the pattern to have be more general like "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to me when there is a "- y" at the end "x * (1 + b)", there is opportunity to optimize. Without that "- y" I was not able to make any operation more performant. Either direction, looks like same amount of computation. >> >> 1) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcompiler-explorer.com%2Fz%2FdWsq7Tzf4&data=04%7C01%7Cnavidrahimi%40microsoft.com%7C1ddd355d86ee4a5d645808d9ae2e9e73%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637732337521412439%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=K4ecZcigYkh8%2BSiqDegRxm72gkL%2FnsbuDDp5nsM3ZqA%3D&reserved=0 >> >>> Please move the pattern next to the most relatest which I think is >> Fixed. >> >>> the return value of 1 is an unreliable way to fail, please instead call >>> __builtin_abort (); >> Fixed. >> >>> do we really need -O3 for this? I think that -O should be enough here? >> We don't specifically need that. But I realized that the optimization can happen in two different level at the compiler. It seems if you spread the computation over multiple statement like: >> int c = a/b; >> int y = b * (1 + c); >> return y - a; >> >> instead of : >> return b * (1 + a / b) - a; >> >> Then you have to have at least -O1 to have it optimized. Granted, I am not doing that in the testcase. In the new patch I am changing it to -O. Let me know if you have any suggestions. > -O is fine, generally at -O0 we shouldn't expect such transforms to > happen (but they still do, of course). > > The patch looks OK now. I've pushed this to the trunk for Navid. jeff ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-11-23 3:10 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-11-09 4:11 [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd Navid Rahimi 2021-11-09 4:24 ` Navid Rahimi 2021-11-09 10:36 ` Richard Biener 2021-11-09 16:25 ` [EXTERNAL] " Navid Rahimi 2021-11-10 8:35 ` Richard Biener 2021-11-23 3:09 ` Jeff Law 2021-11-23 3:10 ` Navid Rahimi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).