Hi Richard, Thank you so much for your detailed feedback. I am attaching another version of the patch which does include all the changes you mentioned. Bellow you can see my response to your feedbacks: > the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as > constants come last - you can then remove the 'c' Fixed. I was not aware of the canonical order. > you can use INTEGRAL_TYPE_P (type). Fixed. Didn't know about "type" either. > this test is not necessary Fixed. > But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So > it looks like a conflict with the x * (1 + b) <-> x + x * b transform > (fold_plusminus_mult)? That said, the special case of one > definitely makes the result cheaper (one less operation). For this special case, it does remove operation indeed. But I was not able to reproduce it for any other constant [1]. If it was possible to do it with other constants I would've changed the pattern to have be more general like "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to me when there is a "- y" at the end "x * (1 + b)", there is opportunity to optimize. Without that "- y" I was not able to make any operation more performant. Either direction, looks like same amount of computation. 1) https://compiler-explorer.com/z/dWsq7Tzf4 > Please move the pattern next to the most relatest which I think is Fixed. > the return value of 1 is an unreliable way to fail, please instead call > __builtin_abort (); Fixed. > do we really need -O3 for this? I think that -O should be enough here? We don't specifically need that. But I realized that the optimization can happen in two different level at the compiler. It seems if you spread the computation over multiple statement like: int c = a/b; int y = b * (1 + c); return y - a; instead of : return b * (1 + a / b) - a; Then you have to have at least -O1 to have it optimized. Granted, I am not doing that in the testcase. In the new patch I am changing it to -O. Let me know if you have any suggestions. Best wishes, Navid. ________________________________________ From: Richard Biener Sent: Tuesday, November 9, 2021 02:36 To: Navid Rahimi Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd On Tue, Nov 9, 2021 at 5:12 AM Navid Rahimi via Gcc-patches wrote: > > Hi GCC community, > > This patch will add the missed pattern described in bug 102232 [1] to the match.pd. The testcase will test whether the multiplication and division has been removed from the code or not. The correctness proof for this pattern is here [2] in case anyone is curious. > > PR tree-optimization/102232 > * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. +/* x * (1 + y / x) - y -> x - y % x */ +(simplify + (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1) the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as constants come last - you can then remove the 'c' + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) you can use INTEGRAL_TYPE_P (type). + && types_match (@0, @1)) this test is not necessary + (minus @0 (trunc_mod @1 @0)))) But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So it looks like a conflict with the x * (1 + b) <-> x + x * b transform (fold_plusminus_mult)? That said, the special case of one definitely makes the result cheaper (one less operation). Please move the pattern next to the most relatest which I think is /* X - (X / Y) * Y is the same as X % Y. */ (simplify (minus (convert1? @0) (convert2? (mult:c (trunc_div @@0 @@1) @1))) (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) (convert (trunc_mod @0 @1)))) +int +main (void) +{ + // few randomly generated test cases + if (foo (71856034, 238) != 212) + { + return 1; the return value of 1 is an unreliable way to fail, please instead call __builtin_abort (); +/* { dg-options "-O3 -fdump-tree-optimized" } */ do we really need -O3 for this? I think that -O should be enough here? Thanks, Richard. > * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. > > > 1) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D102232&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334697749%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Br7B1ShCT4ynLwypyM29LqaNF9GBJF3%2BIR6sZrDRTKM%3D&reserved=0 > 2) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Falive2.llvm.org%2Fce%2Fz%2F2VScjD&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334707741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FH0%2BwEHmaGjyclNFD2HrKMXtXDscPAlc9uUs3p6UMkw%3D&reserved=0 > > Best wishes, > Navid.