* [PATCH] ranger: Optimise irange_union
@ 2021-12-05 21:55 Richard Sandiford
2021-12-06 7:34 ` Richard Biener
2021-12-06 13:41 ` Andrew MacLeod
0 siblings, 2 replies; 3+ messages in thread
From: Richard Sandiford @ 2021-12-05 21:55 UTC (permalink / raw)
To: gcc-patches
When compiling an optabs.ii at -O2 with a release-checking build,
the hottest function in the profile was irange_union. This patch
tries to optimise it a bit. The specific changes are:
- Use quick_push rather than safe_push, since the final number
of entries is known in advance.
- Avoid assigning wi::to_wide & co. to a temporary wide_int,
such as in:
wide_int val_j = wi::to_wide (res[j]);
wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST
storage. Assigning the result to wide_int forces an unnecessary
copy to temporary storage.
This is one area where "auto" helps a lot. In the end though,
it seemed more readable to inline the wi::to_*s rather than
use auto.
- Use to_widest_int rather than to_wide_int. Both are functionally
correct, but to_widest_int is more efficient, for three reasons:
- to_wide returns a wide-int representation in which the most
significant element might not be canonically sign-extended.
This is because we want to allow the storage of an INTEGER_CST
like 0x1U << 31 to be accessed directly with both a wide_int view
(where only 32 bits matter) and a widest_int view (where many more
bits matter, and where the 32 bits are zero-extended to match the
unsigned type). However, operating on uncanonicalised wide_int
forms is less efficient than operating on canonicalised forms.
- to_widest_int has a constant rather than variable precision and
there are never any redundant upper bits to worry about.
- Using widest_int avoids the need for an overflow check, since
there is enough precision to add 1 to any IL constant without
wrap-around.
This gives a ~2% compile-time speed up with the test above.
I also tried adding a path for two single-pair ranges, but it
wasn't a win.
Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
Richard
gcc/
* value-range.cc (irange::irange_union): Use quick_push rather
than safe_push. Use widest_int rather than wide_int. Avoid
assigning wi::to_* results to wide*_int temporaries.
---
gcc/value-range.cc | 46 +++++++++++++---------------------------------
1 file changed, 13 insertions(+), 33 deletions(-)
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 82509fa55a7..d38d0786072 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1550,70 +1550,50 @@ irange::irange_union (const irange &r)
// the merge is performed.
//
// [Xi,Yi]..[Xn,Yn] U [Xj,Yj]..[Xm,Ym] --> [Xk,Yk]..[Xp,Yp]
- tree ttype = r.type ();
- signop sign = TYPE_SIGN (ttype);
-
- auto_vec<tree, 20> res;
- wide_int u1 ;
- wi::overflow_type ovf;
+ auto_vec<tree, 20> res (m_num_ranges * 2 + r.m_num_ranges * 2);
unsigned i = 0, j = 0, k = 0;
while (i < m_num_ranges * 2 && j < r.m_num_ranges * 2)
{
// lower of Xi and Xj is the lowest point.
- if (wi::le_p (wi::to_wide (m_base[i]), wi::to_wide (r.m_base[j]), sign))
+ if (wi::to_widest (m_base[i]) <= wi::to_widest (r.m_base[j]))
{
- res.safe_push (m_base[i]);
- res.safe_push (m_base[i + 1]);
+ res.quick_push (m_base[i]);
+ res.quick_push (m_base[i + 1]);
k += 2;
i += 2;
}
else
{
- res.safe_push (r.m_base[j]);
- res.safe_push (r.m_base[j + 1]);
+ res.quick_push (r.m_base[j]);
+ res.quick_push (r.m_base[j + 1]);
k += 2;
j += 2;
}
}
for ( ; i < m_num_ranges * 2; i += 2)
{
- res.safe_push (m_base[i]);
- res.safe_push (m_base[i + 1]);
+ res.quick_push (m_base[i]);
+ res.quick_push (m_base[i + 1]);
k += 2;
}
for ( ; j < r.m_num_ranges * 2; j += 2)
{
- res.safe_push (r.m_base[j]);
- res.safe_push (r.m_base[j + 1]);
+ res.quick_push (r.m_base[j]);
+ res.quick_push (r.m_base[j + 1]);
k += 2;
}
// Now normalize the vector removing any overlaps.
i = 2;
- int prec = TYPE_PRECISION (ttype);
- wide_int max_val = wi::max_value (prec, sign);
for (j = 2; j < k ; j += 2)
{
- wide_int val_im1 = wi::to_wide (res[i - 1]);
- if (val_im1 == max_val)
- break;
- u1 = wi::add (val_im1, 1, sign, &ovf);
-
- // Overflow indicates we are at MAX already.
- // A wide int bug requires the previous max_val check
- // trigger: gcc.c-torture/compile/pr80443.c with -O3
- if (ovf == wi::OVF_OVERFLOW)
- break;
-
- wide_int val_j = wi::to_wide (res[j]);
- wide_int val_jp1 = wi::to_wide (res[j+1]);
// Current upper+1 is >= lower bound next pair, then we merge ranges.
- if (wi::ge_p (u1, val_j, sign))
+ if (wi::to_widest (res[i - 1]) + 1 >= wi::to_widest (res[j]))
{
// New upper bounds is greater of current or the next one.
- if (wi::gt_p (val_jp1, val_im1, sign))
- res [i - 1] = res[j + 1];
+ if (wi::to_widest (res[j + 1]) > wi::to_widest (res[i - 1]))
+ res[i - 1] = res[j + 1];
}
else
{
--
2.31.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] ranger: Optimise irange_union
2021-12-05 21:55 [PATCH] ranger: Optimise irange_union Richard Sandiford
@ 2021-12-06 7:34 ` Richard Biener
2021-12-06 13:41 ` Andrew MacLeod
1 sibling, 0 replies; 3+ messages in thread
From: Richard Biener @ 2021-12-06 7:34 UTC (permalink / raw)
To: Richard Sandiford, GCC Patches
On Sun, Dec 5, 2021 at 10:55 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> When compiling an optabs.ii at -O2 with a release-checking build,
> the hottest function in the profile was irange_union. This patch
> tries to optimise it a bit. The specific changes are:
>
> - Use quick_push rather than safe_push, since the final number
> of entries is known in advance.
>
> - Avoid assigning wi::to_wide & co. to a temporary wide_int,
> such as in:
>
> wide_int val_j = wi::to_wide (res[j]);
>
> wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST
> storage. Assigning the result to wide_int forces an unnecessary
> copy to temporary storage.
>
> This is one area where "auto" helps a lot. In the end though,
> it seemed more readable to inline the wi::to_*s rather than
> use auto.
>
> - Use to_widest_int rather than to_wide_int. Both are functionally
> correct, but to_widest_int is more efficient, for three reasons:
>
> - to_wide returns a wide-int representation in which the most
> significant element might not be canonically sign-extended.
> This is because we want to allow the storage of an INTEGER_CST
> like 0x1U << 31 to be accessed directly with both a wide_int view
> (where only 32 bits matter) and a widest_int view (where many more
> bits matter, and where the 32 bits are zero-extended to match the
> unsigned type). However, operating on uncanonicalised wide_int
> forms is less efficient than operating on canonicalised forms.
>
> - to_widest_int has a constant rather than variable precision and
> there are never any redundant upper bits to worry about.
>
> - Using widest_int avoids the need for an overflow check, since
> there is enough precision to add 1 to any IL constant without
> wrap-around.
>
> This gives a ~2% compile-time speed up with the test above.
>
> I also tried adding a path for two single-pair ranges, but it
> wasn't a win.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
OK.
Thanks,
Richard.
> Richard
>
>
> gcc/
> * value-range.cc (irange::irange_union): Use quick_push rather
> than safe_push. Use widest_int rather than wide_int. Avoid
> assigning wi::to_* results to wide*_int temporaries.
> ---
> gcc/value-range.cc | 46 +++++++++++++---------------------------------
> 1 file changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 82509fa55a7..d38d0786072 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -1550,70 +1550,50 @@ irange::irange_union (const irange &r)
> // the merge is performed.
> //
> // [Xi,Yi]..[Xn,Yn] U [Xj,Yj]..[Xm,Ym] --> [Xk,Yk]..[Xp,Yp]
> - tree ttype = r.type ();
> - signop sign = TYPE_SIGN (ttype);
> -
> - auto_vec<tree, 20> res;
> - wide_int u1 ;
> - wi::overflow_type ovf;
> + auto_vec<tree, 20> res (m_num_ranges * 2 + r.m_num_ranges * 2);
> unsigned i = 0, j = 0, k = 0;
>
> while (i < m_num_ranges * 2 && j < r.m_num_ranges * 2)
> {
> // lower of Xi and Xj is the lowest point.
> - if (wi::le_p (wi::to_wide (m_base[i]), wi::to_wide (r.m_base[j]), sign))
> + if (wi::to_widest (m_base[i]) <= wi::to_widest (r.m_base[j]))
> {
> - res.safe_push (m_base[i]);
> - res.safe_push (m_base[i + 1]);
> + res.quick_push (m_base[i]);
> + res.quick_push (m_base[i + 1]);
> k += 2;
> i += 2;
> }
> else
> {
> - res.safe_push (r.m_base[j]);
> - res.safe_push (r.m_base[j + 1]);
> + res.quick_push (r.m_base[j]);
> + res.quick_push (r.m_base[j + 1]);
> k += 2;
> j += 2;
> }
> }
> for ( ; i < m_num_ranges * 2; i += 2)
> {
> - res.safe_push (m_base[i]);
> - res.safe_push (m_base[i + 1]);
> + res.quick_push (m_base[i]);
> + res.quick_push (m_base[i + 1]);
> k += 2;
> }
> for ( ; j < r.m_num_ranges * 2; j += 2)
> {
> - res.safe_push (r.m_base[j]);
> - res.safe_push (r.m_base[j + 1]);
> + res.quick_push (r.m_base[j]);
> + res.quick_push (r.m_base[j + 1]);
> k += 2;
> }
>
> // Now normalize the vector removing any overlaps.
> i = 2;
> - int prec = TYPE_PRECISION (ttype);
> - wide_int max_val = wi::max_value (prec, sign);
> for (j = 2; j < k ; j += 2)
> {
> - wide_int val_im1 = wi::to_wide (res[i - 1]);
> - if (val_im1 == max_val)
> - break;
> - u1 = wi::add (val_im1, 1, sign, &ovf);
> -
> - // Overflow indicates we are at MAX already.
> - // A wide int bug requires the previous max_val check
> - // trigger: gcc.c-torture/compile/pr80443.c with -O3
> - if (ovf == wi::OVF_OVERFLOW)
> - break;
> -
> - wide_int val_j = wi::to_wide (res[j]);
> - wide_int val_jp1 = wi::to_wide (res[j+1]);
> // Current upper+1 is >= lower bound next pair, then we merge ranges.
> - if (wi::ge_p (u1, val_j, sign))
> + if (wi::to_widest (res[i - 1]) + 1 >= wi::to_widest (res[j]))
> {
> // New upper bounds is greater of current or the next one.
> - if (wi::gt_p (val_jp1, val_im1, sign))
> - res [i - 1] = res[j + 1];
> + if (wi::to_widest (res[j + 1]) > wi::to_widest (res[i - 1]))
> + res[i - 1] = res[j + 1];
> }
> else
> {
> --
> 2.31.1
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] ranger: Optimise irange_union
2021-12-05 21:55 [PATCH] ranger: Optimise irange_union Richard Sandiford
2021-12-06 7:34 ` Richard Biener
@ 2021-12-06 13:41 ` Andrew MacLeod
1 sibling, 0 replies; 3+ messages in thread
From: Andrew MacLeod @ 2021-12-06 13:41 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 12/5/21 16:55, Richard Sandiford via Gcc-patches wrote:
>
> This gives a ~2% compile-time speed up with the test above.
>
> I also tried adding a path for two single-pair ranges, but it
> wasn't a win.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>
> Richard
>
Thanks for these performance tweaks!
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-12-06 13:41 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-05 21:55 [PATCH] ranger: Optimise irange_union Richard Sandiford
2021-12-06 7:34 ` Richard Biener
2021-12-06 13:41 ` Andrew MacLeod
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).