* [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding. @ 2023-01-13 21:23 Andrew MacLeod 2023-01-13 21:54 ` Jakub Jelinek 0 siblings, 1 reply; 6+ messages in thread From: Andrew MacLeod @ 2023-01-13 21:23 UTC (permalink / raw) To: gcc-patches; +Cc: hernandez, aldy [-- Attachment #1: Type: text/plain, Size: 1231 bytes --] fold_range() already invokes wi_fold_in_parts to try to get more refined information. If the subranges are quite small, it will do each individual calculation and combine the results. x * y with x = [1,3] and y = [1,3] is broken down and we calculate each possibility and we end up with [1,4][6,6][9,9] instead of [1,9] We limit this as the time is between quadratic to exponential depending on the number of elements in x and y. If we also check the relation and determine that x == y, we don't need to worry about that growth as this process is linear. The above case will be broken down to just 1*1, 2*2 and 3*3, resulting in a range of [1, 1][4,4][9,9]. In the testcase, it happens to be the right_shift operation, but this solution is generic and applies to all range-op operations. I added a testcase which checks >>, *, + and %. I also arbitrarily chose 8 elements as the limit for breaking down individual operations. The overall compile time change for this is negligible. Although this is a regression fix, it will affect all operations where x == y, which is where my initial hesitancy arose. Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for trunk? Andrew [-- Attachment #2: 0002-Utilize-op1-op2-when-invoking-range-ops-folding.patch --] [-- Type: text/x-patch, Size: 5107 bytes --] From fd50dabc626cea1886ebb517ca24c8b8f199c3aa Mon Sep 17 00:00:00 2001 From: Andrew MacLeod <amacleod@redhat.com> Date: Wed, 11 Jan 2023 18:12:51 -0500 Subject: [PATCH 2/3] Utilize op1 == op2 when invoking range-ops folding. If there exists an equivalence relationship between op1 and op2, any binary operation can be broken into individual operations and unioned if there are sufficently few elements in the set. PR tree-optimization/108359 gcc/ * range-op.cc (range_operator::wi_fold_in_parts_equiv): New. (range_operator::fold_range): If op1 is equivalent to op2 then invoke new fold_in_parts_equiv to operate on sub-components. * range-op.h (wi_fold_in_parts_equiv): New prototype. gcc/testsuite/ * gcc.dg/pr108359.c: New. --- gcc/range-op.cc | 50 +++++++++++++++++++++++++++++++++ gcc/range-op.h | 5 ++++ gcc/testsuite/gcc.dg/pr108359.c | 50 +++++++++++++++++++++++++++++++++ 3 files changed, 105 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/pr108359.c diff --git a/gcc/range-op.cc b/gcc/range-op.cc index ec75e07bc8a..2cb2c1344f1 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -160,6 +160,36 @@ range_operator::wi_fold (irange &r, tree type, r.set_varying (type); } +// Call wi_fold when both op1 and op2 are equivalent. Further split small +// subranges into constants. This can provide better precision. +// For x + y, when x == y with a range of [0,4] instead of [0, 8] produce +// [0,0][2, 2][4,4][6, 6][8, 8] + +void +range_operator::wi_fold_in_parts_equiv (irange &r, tree type, + const wide_int &lh_lb, + const wide_int &lh_ub) const +{ + int_range_max tmp; + widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)), + widest_int::from (lh_lb, TYPE_SIGN (type))); + // if there are 1 to 8 values in the LH range, split them up. + r.set_undefined (); + if (lh_range >= 0 && lh_range <= 7) + { + unsigned x; + for (x = 0; x <= lh_range; x++) + { + wide_int val = lh_lb + x; + wi_fold (tmp, type, val, val, val, val); + r.union_ (tmp); + } + } + // Otherwise just call wi_fold. + else + wi_fold (r, type, lh_lb, lh_ub, lh_lb, lh_ub); +} + // Call wi_fold, except further split small subranges into constants. // This can provide better precision. For something 8 >> [0,1] // Instead of [8, 16], we will produce [8,8][16,16] @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type, unsigned num_lh = lh.num_pairs (); unsigned num_rh = rh.num_pairs (); + // If op1 and op2 are equivalences, then we don't need a complete cross + // product, just pairs of matching elements. + if (relation_equiv_p (rel) && (lh == rh)) + { + int_range_max tmp; + r.set_undefined (); + for (unsigned x = 0; x < num_lh; ++x) + { + wide_int lh_lb = lh.lower_bound (x); + wide_int lh_ub = lh.upper_bound (x); + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); + r.union_ (tmp); + if (r.varying_p ()) + break; + } + op1_op2_relation_effect (r, type, lh, rh, rel); + update_known_bitmask (r, m_code, lh, rh); + return true; + } + // If both ranges are single pairs, fold directly into the result range. // If the number of subranges grows too high, produce a summary result as the // loop becomes exponential with little benefit. See PR 103821. diff --git a/gcc/range-op.h b/gcc/range-op.h index b7b8a3b9473..998aeedb0d9 100644 --- a/gcc/range-op.h +++ b/gcc/range-op.h @@ -109,6 +109,11 @@ protected: const wide_int &rh_lb, const wide_int &rh_ub) const; + // Called by fold range to split small subranges into parts when op1 == op2 + void wi_fold_in_parts_equiv (irange &r, tree type, + const wide_int &lb, + const wide_int &ub) const; + // Tree code of the range operator or ERROR_MARK if unknown. tree_code m_code; }; diff --git a/gcc/testsuite/gcc.dg/pr108359.c b/gcc/testsuite/gcc.dg/pr108359.c new file mode 100644 index 00000000000..00fd2de6dc7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr108359.c @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +/* PR test case. */ +int b = 10; +int c; +char e; +void foo(); +static char(a)(char f, char g) { return f && g == 1 ? 0 : f % g; } +short(d)(short f, short g) { return f * g; } +int main() { + short h; + int i; + unsigned j; + h = d(b && c, 5); + j = h; + i = a(h, 237); + unsigned k = i; + e = i < 0 || k >= 32 ? 0 : i >> k; + if (e) { + c = 0; + foo(); + } +} + + +/* Also Check that small ranges are broken down and optimized properly + in the egneric case. This function should always return 0. */ + +int otherfunc (int x, int z) { + if (x < 0 || x > 6 ) + return 0; + + if (x == z) + { + if (x >> z > 0) + return 1; + if (x * z > 26 && x * z < 35) + return 1; + if (x + z == 5) + return 1; + if ((x + z) % 2 == 1) + return 1; + } + return 0; +} + + +/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "return 1" "optimized" } } */ -- 2.38.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding. 2023-01-13 21:23 [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding Andrew MacLeod @ 2023-01-13 21:54 ` Jakub Jelinek 2023-01-13 22:07 ` Andrew MacLeod 0 siblings, 1 reply; 6+ messages in thread From: Jakub Jelinek @ 2023-01-13 21:54 UTC (permalink / raw) To: Andrew MacLeod; +Cc: gcc-patches, hernandez, aldy On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches wrote: > fold_range() already invokes wi_fold_in_parts to try to get more refined > information. If the subranges are quite small, it will do each individual > calculation and combine the results. > > x * y with x = [1,3] and y = [1,3] is broken down and we calculate each > possibility and we end up with [1,4][6,6][9,9] instead of [1,9] > > We limit this as the time is between quadratic to exponential depending on > the number of elements in x and y. > > If we also check the relation and determine that x == y, we don't need to > worry about that growth as this process is linear. The above case will be > broken down to just 1*1, 2*2 and 3*3, resulting in a range of [1, > 1][4,4][9,9]. > > In the testcase, it happens to be the right_shift operation, but this > solution is generic and applies to all range-op operations. I added a > testcase which checks >>, *, + and %. > > I also arbitrarily chose 8 elements as the limit for breaking down > individual operations. The overall compile time change for this is > negligible. > > Although this is a regression fix, it will affect all operations where x == > y, which is where my initial hesitancy arose. > > Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for > trunk? Will defer to Aldy, just some nits. > + // if there are 1 to 8 values in the LH range, split them up. > + r.set_undefined (); > + if (lh_range >= 0 && lh_range <= 7) > + { > + unsigned x; > + for (x = 0; x <= lh_range; x++) Nothing uses x after the loop, so why not for (unsigned x = 0; x <= lh_range; x++) instead? > @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type, > unsigned num_lh = lh.num_pairs (); > unsigned num_rh = rh.num_pairs (); > > + // If op1 and op2 are equivalences, then we don't need a complete cross > + // product, just pairs of matching elements. > + if (relation_equiv_p (rel) && (lh == rh)) The ()s around lh == rh look superfluous to me. > + { > + int_range_max tmp; > + r.set_undefined (); > + for (unsigned x = 0; x < num_lh; ++x) fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something like that be there for this case too? I mean, every wi_fold_in_parts_equiv can result in 8 subranges, but num_lh could be up to 255 here, it is true it is linear and union_ should merge excess ones, but still I wonder if some larger num_lh upper bound like 20 or 32 wouldn't be useful. Up to you... > + { > + wide_int lh_lb = lh.lower_bound (x); > + wide_int lh_ub = lh.upper_bound (x); > + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); > + r.union_ (tmp); > + if (r.varying_p ()) > + break; > + } > + op1_op2_relation_effect (r, type, lh, rh, rel); > + update_known_bitmask (r, m_code, lh, rh); > + return true; > + } > + Jakub ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding. 2023-01-13 21:54 ` Jakub Jelinek @ 2023-01-13 22:07 ` Andrew MacLeod 2023-01-16 7:19 ` Richard Biener 0 siblings, 1 reply; 6+ messages in thread From: Andrew MacLeod @ 2023-01-13 22:07 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, hernandez, aldy, Richard Biener [-- Attachment #1: Type: text/plain, Size: 3505 bytes --] On 1/13/23 16:54, Jakub Jelinek wrote: > On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches wrote: >> fold_range() already invokes wi_fold_in_parts to try to get more refined >> information. If the subranges are quite small, it will do each individual >> calculation and combine the results. >> >> x * y with x = [1,3] and y = [1,3] is broken down and we calculate each >> possibility and we end up with [1,4][6,6][9,9] instead of [1,9] >> >> We limit this as the time is between quadratic to exponential depending on >> the number of elements in x and y. >> >> If we also check the relation and determine that x == y, we don't need to >> worry about that growth as this process is linear. The above case will be >> broken down to just 1*1, 2*2 and 3*3, resulting in a range of [1, >> 1][4,4][9,9]. >> >> In the testcase, it happens to be the right_shift operation, but this >> solution is generic and applies to all range-op operations. I added a >> testcase which checks >>, *, + and %. >> >> I also arbitrarily chose 8 elements as the limit for breaking down >> individual operations. The overall compile time change for this is >> negligible. >> >> Although this is a regression fix, it will affect all operations where x == >> y, which is where my initial hesitancy arose. >> >> Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for >> trunk? > Will defer to Aldy, just some nits. Did you mean Richi? > >> + // if there are 1 to 8 values in the LH range, split them up. >> + r.set_undefined (); >> + if (lh_range >= 0 && lh_range <= 7) >> + { >> + unsigned x; >> + for (x = 0; x <= lh_range; x++) > Nothing uses x after the loop, so why not > for (unsigned x = 0; x <= lh_range; x++) > instead? Just old habits. >> @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type, >> unsigned num_lh = lh.num_pairs (); >> unsigned num_rh = rh.num_pairs (); >> >> + // If op1 and op2 are equivalences, then we don't need a complete cross >> + // product, just pairs of matching elements. >> + if (relation_equiv_p (rel) && (lh == rh)) > The ()s around lh == rh look superfluous to me. Yeah I just found it marginally more readable, but it is superfluous >> + { >> + int_range_max tmp; >> + r.set_undefined (); >> + for (unsigned x = 0; x < num_lh; ++x) > fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something > like that be there for this case too? > I mean, every wi_fold_in_parts_equiv can result in 8 subranges, > but num_lh could be up to 255 here, it is true it is linear and union_ > should merge excess ones, but still I wonder if some larger num_lh upper > bound like 20 or 32 wouldn't be useful. Up to you... fold_range has the num_lh * num_rh limit because it was quadratic/exponential and changes rapidly. Since this was linear based on the number of sub ranges I didn't think it would matter much, but sure, we can put a similar limit on it.. 16 seems reasonable. >> + { >> + wide_int lh_lb = lh.lower_bound (x); >> + wide_int lh_ub = lh.upper_bound (x); >> + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); >> + r.union_ (tmp); >> + if (r.varying_p ()) >> + break; >> + } >> + op1_op2_relation_effect (r, type, lh, rh, rel); >> + update_known_bitmask (r, m_code, lh, rh); >> + return true; >> + } >> + > Jakub > Updated patch attached... I'll run it through testing whe the current one is done. Andrew [-- Attachment #2: 0002-Utilize-op1-op2-when-invoking-range-ops-folding.patch --] [-- Type: text/x-patch, Size: 5110 bytes --] From 42010868ff3cdbb5b9ad3484f115b7c23f9e14e7 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod <amacleod@redhat.com> Date: Wed, 11 Jan 2023 18:12:51 -0500 Subject: [PATCH 2/3] Utilize op1 == op2 when invoking range-ops folding. If there exists an equivalence relationship between op1 and op2, any binary operation can be broken into individual operations and unioned if there are sufficently few elements in the set. PR tree-optimization/108359 gcc/ * range-op.cc (range_operator::wi_fold_in_parts_equiv): New. (range_operator::fold_range): If op1 is equivalent to op2 then invoke new fold_in_parts_equiv to operate on sub-components. * range-op.h (wi_fold_in_parts_equiv): New prototype. gcc/testsuite/ * gcc.dg/pr108359.c: New. --- gcc/range-op.cc | 49 ++++++++++++++++++++++++++++++++ gcc/range-op.h | 5 ++++ gcc/testsuite/gcc.dg/pr108359.c | 50 +++++++++++++++++++++++++++++++++ 3 files changed, 104 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/pr108359.c diff --git a/gcc/range-op.cc b/gcc/range-op.cc index ec75e07bc8a..33bc4dcb4b4 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -160,6 +160,35 @@ range_operator::wi_fold (irange &r, tree type, r.set_varying (type); } +// Call wi_fold when both op1 and op2 are equivalent. Further split small +// subranges into constants. This can provide better precision. +// For x + y, when x == y with a range of [0,4] instead of [0, 8] produce +// [0,0][2, 2][4,4][6, 6][8, 8] + +void +range_operator::wi_fold_in_parts_equiv (irange &r, tree type, + const wide_int &lh_lb, + const wide_int &lh_ub) const +{ + int_range_max tmp; + widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)), + widest_int::from (lh_lb, TYPE_SIGN (type))); + // if there are 1 to 8 values in the LH range, split them up. + r.set_undefined (); + if (lh_range >= 0 && lh_range <= 7) + { + for (unsigned x = 0; x <= lh_range; x++) + { + wide_int val = lh_lb + x; + wi_fold (tmp, type, val, val, val, val); + r.union_ (tmp); + } + } + // Otherwise just call wi_fold. + else + wi_fold (r, type, lh_lb, lh_ub, lh_lb, lh_ub); +} + // Call wi_fold, except further split small subranges into constants. // This can provide better precision. For something 8 >> [0,1] // Instead of [8, 16], we will produce [8,8][16,16] @@ -234,6 +263,26 @@ range_operator::fold_range (irange &r, tree type, unsigned num_lh = lh.num_pairs (); unsigned num_rh = rh.num_pairs (); + // If op1 and op2 are equivalences, then we don't need a complete cross + // product, just pairs of matching elements. + if (relation_equiv_p (rel) && lh == rh && num_lh <= 16) + { + int_range_max tmp; + r.set_undefined (); + for (unsigned x = 0; x < num_lh; ++x) + { + wide_int lh_lb = lh.lower_bound (x); + wide_int lh_ub = lh.upper_bound (x); + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); + r.union_ (tmp); + if (r.varying_p ()) + break; + } + op1_op2_relation_effect (r, type, lh, rh, rel); + update_known_bitmask (r, m_code, lh, rh); + return true; + } + // If both ranges are single pairs, fold directly into the result range. // If the number of subranges grows too high, produce a summary result as the // loop becomes exponential with little benefit. See PR 103821. diff --git a/gcc/range-op.h b/gcc/range-op.h index b7b8a3b9473..998aeedb0d9 100644 --- a/gcc/range-op.h +++ b/gcc/range-op.h @@ -109,6 +109,11 @@ protected: const wide_int &rh_lb, const wide_int &rh_ub) const; + // Called by fold range to split small subranges into parts when op1 == op2 + void wi_fold_in_parts_equiv (irange &r, tree type, + const wide_int &lb, + const wide_int &ub) const; + // Tree code of the range operator or ERROR_MARK if unknown. tree_code m_code; }; diff --git a/gcc/testsuite/gcc.dg/pr108359.c b/gcc/testsuite/gcc.dg/pr108359.c new file mode 100644 index 00000000000..00fd2de6dc7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr108359.c @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +/* PR test case. */ +int b = 10; +int c; +char e; +void foo(); +static char(a)(char f, char g) { return f && g == 1 ? 0 : f % g; } +short(d)(short f, short g) { return f * g; } +int main() { + short h; + int i; + unsigned j; + h = d(b && c, 5); + j = h; + i = a(h, 237); + unsigned k = i; + e = i < 0 || k >= 32 ? 0 : i >> k; + if (e) { + c = 0; + foo(); + } +} + + +/* Also Check that small ranges are broken down and optimized properly + in the egneric case. This function should always return 0. */ + +int otherfunc (int x, int z) { + if (x < 0 || x > 6 ) + return 0; + + if (x == z) + { + if (x >> z > 0) + return 1; + if (x * z > 26 && x * z < 35) + return 1; + if (x + z == 5) + return 1; + if ((x + z) % 2 == 1) + return 1; + } + return 0; +} + + +/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "return 1" "optimized" } } */ -- 2.38.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding. 2023-01-13 22:07 ` Andrew MacLeod @ 2023-01-16 7:19 ` Richard Biener 2023-01-16 7:32 ` Aldy Hernandez 2023-01-16 8:32 ` Jakub Jelinek 0 siblings, 2 replies; 6+ messages in thread From: Richard Biener @ 2023-01-16 7:19 UTC (permalink / raw) To: Andrew MacLeod; +Cc: Jakub Jelinek, gcc-patches, hernandez, aldy On Fri, Jan 13, 2023 at 11:07 PM Andrew MacLeod <amacleod@redhat.com> wrote: > > > On 1/13/23 16:54, Jakub Jelinek wrote: > > On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches wrote: > >> fold_range() already invokes wi_fold_in_parts to try to get more refined > >> information. If the subranges are quite small, it will do each individual > >> calculation and combine the results. > >> > >> x * y with x = [1,3] and y = [1,3] is broken down and we calculate each > >> possibility and we end up with [1,4][6,6][9,9] instead of [1,9] > >> > >> We limit this as the time is between quadratic to exponential depending on > >> the number of elements in x and y. > >> > >> If we also check the relation and determine that x == y, we don't need to > >> worry about that growth as this process is linear. The above case will be > >> broken down to just 1*1, 2*2 and 3*3, resulting in a range of [1, > >> 1][4,4][9,9]. > >> > >> In the testcase, it happens to be the right_shift operation, but this > >> solution is generic and applies to all range-op operations. I added a > >> testcase which checks >>, *, + and %. > >> > >> I also arbitrarily chose 8 elements as the limit for breaking down > >> individual operations. The overall compile time change for this is > >> negligible. > >> > >> Although this is a regression fix, it will affect all operations where x == > >> y, which is where my initial hesitancy arose. > >> > >> Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for > >> trunk? > > Will defer to Aldy, just some nits. > Did you mean Richi? I suppose Aldy, since it's a regression fix it's OK even during stage4. I do have a comment as well though, you do + // If op1 and op2 are equivalences, then we don't need a complete cross + // product, just pairs of matching elements. + if (relation_equiv_p (rel) && lh == rh && num_lh <= 16) + { + int_range_max tmp; + r.set_undefined (); + for (unsigned x = 0; x < num_lh; ++x) + { + wide_int lh_lb = lh.lower_bound (x); + wide_int lh_ub = lh.upper_bound (x); + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); and that does + widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)), + widest_int::from (lh_lb, TYPE_SIGN (type))); + // if there are 1 to 8 values in the LH range, split them up. + r.set_undefined (); + if (lh_range >= 0 && lh_range <= 7) + { + for (unsigned x = 0; x <= lh_range; x++) which in total limits the number of sub-ranges in the output but in an odd way. It's also all-or-nothing. IIRC there's a hard limit on the number of sub-ranges in the output anyway via int_range<N>, so why not use that and always do the first loop over the sub-ranges of the inputs and the second loop over the range members but stop when we reach N-1 and then use wi_fold on the remainds? Your above code suggests we go up to 112 sub-ranges and once we'd reach 113 we'd fold down to a single. Maybe my "heuristic" wouldn't be much better, but then somehow breaking the heuristic down to a single magic number would be better, esp. since .union_ will undo some of the breakup when reaching N? > > > >> + // if there are 1 to 8 values in the LH range, split them up. > >> + r.set_undefined (); > >> + if (lh_range >= 0 && lh_range <= 7) > >> + { > >> + unsigned x; > >> + for (x = 0; x <= lh_range; x++) > > Nothing uses x after the loop, so why not > > for (unsigned x = 0; x <= lh_range; x++) > > instead? > > Just old habits. > > > >> @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type, > >> unsigned num_lh = lh.num_pairs (); > >> unsigned num_rh = rh.num_pairs (); > >> > >> + // If op1 and op2 are equivalences, then we don't need a complete cross > >> + // product, just pairs of matching elements. > >> + if (relation_equiv_p (rel) && (lh == rh)) > > The ()s around lh == rh look superfluous to me. > Yeah I just found it marginally more readable, but it is superfluous > >> + { > >> + int_range_max tmp; > >> + r.set_undefined (); > >> + for (unsigned x = 0; x < num_lh; ++x) > > fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something > > like that be there for this case too? > > I mean, every wi_fold_in_parts_equiv can result in 8 subranges, > > but num_lh could be up to 255 here, it is true it is linear and union_ > > should merge excess ones, but still I wonder if some larger num_lh upper > > bound like 20 or 32 wouldn't be useful. Up to you... > fold_range has the num_lh * num_rh limit because it was > quadratic/exponential and changes rapidly. Since this was linear based > on the number of sub ranges I didn't think it would matter much, but > sure, we can put a similar limit on it.. 16 seems reasonable. > >> + { > >> + wide_int lh_lb = lh.lower_bound (x); > >> + wide_int lh_ub = lh.upper_bound (x); > >> + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); > >> + r.union_ (tmp); > >> + if (r.varying_p ()) > >> + break; > >> + } > >> + op1_op2_relation_effect (r, type, lh, rh, rel); > >> + update_known_bitmask (r, m_code, lh, rh); > >> + return true; > >> + } > >> + > > Jakub > > > Updated patch attached... I'll run it through testing whe the current > one is done. > > > Andrew ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding. 2023-01-16 7:19 ` Richard Biener @ 2023-01-16 7:32 ` Aldy Hernandez 2023-01-16 8:32 ` Jakub Jelinek 1 sibling, 0 replies; 6+ messages in thread From: Aldy Hernandez @ 2023-01-16 7:32 UTC (permalink / raw) To: Richard Biener, Andrew MacLeod; +Cc: Jakub Jelinek, gcc-patches On 1/16/23 08:19, Richard Biener wrote: > On Fri, Jan 13, 2023 at 11:07 PM Andrew MacLeod <amacleod@redhat.com> wrote: >> >> >> On 1/13/23 16:54, Jakub Jelinek wrote: >>> On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches wrote: >>>> fold_range() already invokes wi_fold_in_parts to try to get more refined >>>> information. If the subranges are quite small, it will do each individual >>>> calculation and combine the results. >>>> >>>> x * y with x = [1,3] and y = [1,3] is broken down and we calculate each >>>> possibility and we end up with [1,4][6,6][9,9] instead of [1,9] >>>> >>>> We limit this as the time is between quadratic to exponential depending on >>>> the number of elements in x and y. >>>> >>>> If we also check the relation and determine that x == y, we don't need to >>>> worry about that growth as this process is linear. The above case will be >>>> broken down to just 1*1, 2*2 and 3*3, resulting in a range of [1, >>>> 1][4,4][9,9]. >>>> >>>> In the testcase, it happens to be the right_shift operation, but this >>>> solution is generic and applies to all range-op operations. I added a >>>> testcase which checks >>, *, + and %. >>>> >>>> I also arbitrarily chose 8 elements as the limit for breaking down >>>> individual operations. The overall compile time change for this is >>>> negligible. >>>> >>>> Although this is a regression fix, it will affect all operations where x == >>>> y, which is where my initial hesitancy arose. >>>> >>>> Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for >>>> trunk? >>> Will defer to Aldy, just some nits. >> Did you mean Richi? > > I suppose Aldy, since it's a regression fix it's OK even during stage4. I don't have any issues with the patch. Whatever the release managers agree to, I'm fine with. Aldy > > I do have a comment as well though, you do > > + // If op1 and op2 are equivalences, then we don't need a complete cross > + // product, just pairs of matching elements. > + if (relation_equiv_p (rel) && lh == rh && num_lh <= 16) > + { > + int_range_max tmp; > + r.set_undefined (); > + for (unsigned x = 0; x < num_lh; ++x) > + { > + wide_int lh_lb = lh.lower_bound (x); > + wide_int lh_ub = lh.upper_bound (x); > + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); > > and that does > > + widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)), > + widest_int::from (lh_lb, TYPE_SIGN (type))); > + // if there are 1 to 8 values in the LH range, split them up. > + r.set_undefined (); > + if (lh_range >= 0 && lh_range <= 7) > + { > + for (unsigned x = 0; x <= lh_range; x++) > > which in total limits the number of sub-ranges in the output but in an > odd way. It's also all-or-nothing. IIRC there's a hard limit on the > number of sub-ranges in the output anyway via int_range<N>, so > why not use that and always do the first loop over the sub-ranges > of the inputs and the second loop over the range members but > stop when we reach N-1 and then use wi_fold on the remainds? > > Your above code suggests we go up to 112 sub-ranges and once > we'd reach 113 we'd fold down to a single. > > Maybe my "heuristic" wouldn't be much better, but then somehow > breaking the heuristic down to a single magic number would be > better, esp. since .union_ will undo some of the breakup when > reaching N? > >>> >>>> + // if there are 1 to 8 values in the LH range, split them up. >>>> + r.set_undefined (); >>>> + if (lh_range >= 0 && lh_range <= 7) >>>> + { >>>> + unsigned x; >>>> + for (x = 0; x <= lh_range; x++) >>> Nothing uses x after the loop, so why not >>> for (unsigned x = 0; x <= lh_range; x++) >>> instead? >> >> Just old habits. >> >> >>>> @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type, >>>> unsigned num_lh = lh.num_pairs (); >>>> unsigned num_rh = rh.num_pairs (); >>>> >>>> + // If op1 and op2 are equivalences, then we don't need a complete cross >>>> + // product, just pairs of matching elements. >>>> + if (relation_equiv_p (rel) && (lh == rh)) >>> The ()s around lh == rh look superfluous to me. >> Yeah I just found it marginally more readable, but it is superfluous >>>> + { >>>> + int_range_max tmp; >>>> + r.set_undefined (); >>>> + for (unsigned x = 0; x < num_lh; ++x) >>> fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something >>> like that be there for this case too? >>> I mean, every wi_fold_in_parts_equiv can result in 8 subranges, >>> but num_lh could be up to 255 here, it is true it is linear and union_ >>> should merge excess ones, but still I wonder if some larger num_lh upper >>> bound like 20 or 32 wouldn't be useful. Up to you... >> fold_range has the num_lh * num_rh limit because it was >> quadratic/exponential and changes rapidly. Since this was linear based >> on the number of sub ranges I didn't think it would matter much, but >> sure, we can put a similar limit on it.. 16 seems reasonable. >>>> + { >>>> + wide_int lh_lb = lh.lower_bound (x); >>>> + wide_int lh_ub = lh.upper_bound (x); >>>> + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); >>>> + r.union_ (tmp); >>>> + if (r.varying_p ()) >>>> + break; >>>> + } >>>> + op1_op2_relation_effect (r, type, lh, rh, rel); >>>> + update_known_bitmask (r, m_code, lh, rh); >>>> + return true; >>>> + } >>>> + >>> Jakub >>> >> Updated patch attached... I'll run it through testing whe the current >> one is done. >> >> >> Andrew > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding. 2023-01-16 7:19 ` Richard Biener 2023-01-16 7:32 ` Aldy Hernandez @ 2023-01-16 8:32 ` Jakub Jelinek 1 sibling, 0 replies; 6+ messages in thread From: Jakub Jelinek @ 2023-01-16 8:32 UTC (permalink / raw) To: Richard Biener; +Cc: Andrew MacLeod, gcc-patches, hernandez, aldy On Mon, Jan 16, 2023 at 08:19:25AM +0100, Richard Biener wrote: > + // If op1 and op2 are equivalences, then we don't need a complete cross > + // product, just pairs of matching elements. > + if (relation_equiv_p (rel) && lh == rh && num_lh <= 16) > + { > + int_range_max tmp; > + r.set_undefined (); > + for (unsigned x = 0; x < num_lh; ++x) > + { > + wide_int lh_lb = lh.lower_bound (x); > + wide_int lh_ub = lh.upper_bound (x); > + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); > > and that does > > + widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)), > + widest_int::from (lh_lb, TYPE_SIGN (type))); > + // if there are 1 to 8 values in the LH range, split them up. > + r.set_undefined (); > + if (lh_range >= 0 && lh_range <= 7) > + { > + for (unsigned x = 0; x <= lh_range; x++) > > which in total limits the number of sub-ranges in the output but in an > odd way. It's also all-or-nothing. IIRC there's a hard limit on the > number of sub-ranges in the output anyway via int_range<N>, so > why not use that and always do the first loop over the sub-ranges > of the inputs and the second loop over the range members but > stop when we reach N-1 and then use wi_fold on the remainds? > > Your above code suggests we go up to 112 sub-ranges and once > we'd reach 113 we'd fold down to a single. > > Maybe my "heuristic" wouldn't be much better, but then somehow > breaking the heuristic down to a single magic number would be > better, esp. since .union_ will undo some of the breakup when > reaching N? The <= X in the first case is what I've suggested, the previous behavior didn't suffer from this all or nothing case, but I was worried because the behavior is that we create many subranges and then in the end just merge the last ones into a single subrange such that it fits into the limit. So we could create 255 * 8 subranges and then merge the last 255 * 7 + 1 into one. We could call that wi_fold_in_parts_equiv only if r.num_parts () < m_max_ranges and wi_fold otherwise... Jakub ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-01-16 8:32 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-13 21:23 [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding Andrew MacLeod 2023-01-13 21:54 ` Jakub Jelinek 2023-01-13 22:07 ` Andrew MacLeod 2023-01-16 7:19 ` Richard Biener 2023-01-16 7:32 ` Aldy Hernandez 2023-01-16 8:32 ` Jakub Jelinek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).