On 5/15/23 13:08, Richard Biener wrote:
> On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez wrote:
>>
>> We can now have int_range<N, RESIZABLE=false> for automatically
>> resizable ranges.  int_range_max is now int_range<3, true> for a 69X
>> reduction in size from current trunk, and a 6.9X reduction from GCC12.
>> This incurs a 5% performance penalty for VRP that is more than covered
>> by our >13% improvements recently.
>>
>> int_range_max is the temporary range object we use in the ranger for
>> integers.  With the conversion to wide_int, this structure bloated up
>> significantly because wide_ints are huge (80 bytes apiece) and are
>> about 10 times as big as a plain tree.  Since the temporary object
>> requires 255 sub-ranges, that's 255 * 80 * 2 (each sub-range holds two
>> wide_int endpoints), plus the control word.  This means the structure
>> grew from 4112 bytes to 40912 bytes.
>>
>> This patch adds the ability to resize ranges as needed, defaulting to
>> no resizing, while int_range_max now defaults to 3 sub-ranges (instead
>> of 255) and grows to 255 when the range being calculated does not fit.
>>
>> For example:
>>
>> int_range<1> foo;        // 1 sub-range with no resizing.
>> int_range<5> foo;        // 5 sub-ranges with no resizing.
>> int_range<5, true> foo;  // 5 sub-ranges with resizing.
>>
>> I ran some tests and found that 3 sub-ranges cover 99% of cases, so
>> I've set the int_range_max default to that:
>>
>> typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
>>
>> We don't bother growing incrementally, since the default covers most
>> cases and we have a 255 hard limit.  This hard limit could be reduced
>> to 128, since my tests never saw a range needing more than 124, but we
>> could do that as a follow-up if needed.
>>
>> With 3 sub-ranges, int_range_max is now 592 bytes versus 40912 for
>> trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
>> and 3.02% for threading, with no noticeable change in overall
>> compilation (0.27%).  This is more than covered by our 13.26%
>> improvements for the legacy removal + wide_int conversion.
>
> Thanks for doing this.
>
>> I think this approach is a good alternative, while providing us with
>> flexibility going forward.  For example, we could try defaulting to
>> 8 sub-ranges for a noticeable improvement in VRP.  We could also use
>> large sub-ranges for switch analysis to avoid resizing.
>>
>> Another approach I tried was always resizing.  With this, we could
>> drop the whole int_range nonsense, and have irange just hold a
>> resizable range.  This simplified things, but incurred a 7% penalty on
>> ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced it
>> wasn't some artifact of valgrind.  However, until we're sure, let's
>> avoid massive changes, especially since IPA changes are coming up.
>>
>> For the curious, a particular hot spot for IPA in this area was:
>>
>> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
>> {
>>   ...
>>   ...
>>   value_range save (m_vr);
>>   m_vr.union_ (*other_vr);
>>   return m_vr != save;
>> }
>>
>> The problem isn't the resizing (since we do that at most once) but the
>> fact that for some functions with lots of callers we end up with a
>> huge range that gets copied and compared for every meet operation.
>> Maybe the IPA algorithm could be adjusted somehow?
>
> Well, the above just wants to know whether the union_ operation changed
> the range.  I suppose that would be an interesting (and easy to
> compute?) secondary output of union_ and it seems it already computes
> that (but maybe not correctly?).  So I suggest to change the above to
>
>   bool res;
>   if (flag_checking)
>     {
>       value_range save (m_vr);
>       res = m_vr.union_ (*other_vr);
>       gcc_assert (res == (m_vr != save));
>     }
>   else
>     res = m_vr.union_ (*other_vr);
>   return res;

union_ returns a value specifically for that, which Andrew uses for
cache optimization.

For that matter, your suggestion was my first approach, but I quickly
found out we were being overly pessimistic in some cases, and I was too
lazy to figure out why.
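To make that contract concrete, here is a toy sketch (made-up type and
plain longs, not the actual irange code) of a union that computes the
changed bit as a byproduct, so the caller needs no save/compare:

#include <algorithm>

struct toy_range
{
  long m_lo, m_hi;

  // Union OTHER into this range; return true iff *this changed.
  // Conservatively returning true when nothing changed would be safe
  // but pessimistic: callers like the IPA lattice would keep iterating.
  bool union_ (const toy_range &other)
  {
    long lo = std::min (m_lo, other.m_lo);
    long hi = std::max (m_hi, other.m_hi);
    bool changed = (lo != m_lo || hi != m_hi);
    m_lo = lo;
    m_hi = hi;
    return changed;
  }
};

With that, meet_with_1 reduces to "return m_vr.union_ (*other_vr);"
with no temporary copy of a potentially huge range.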
With your suggested sanity check I chased the problem down to a minor
inconsistency when unioning nonzero masks.  The issue wasn't a bug,
just a pessimization.

I'm attaching a patch that corrects the oversight (well, not an
oversight; everything was more expensive with trees)...  It yields a
6.89% improvement to the ipa-cp pass!!!

Thanks.  I'll push it if it passes tests.

BTW, without the annoying IPA-cp performance regression, this paves the
way for nuking int_range in favor of just irange, and having everything
resize as needed.  I'll wait for Andrew to chime in when he returns
from PTO, since we may want to leave int_range around, as it does
provide flexibility (at the expense of fugly-looking declarations).

Aldy
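P.S. For the archives, the resizing scheme boils down to something like
this toy sketch (made-up names, with plain longs standing in for the
wide_int pairs; the real implementation is in the patch, and copying is
omitted for brevity):

#include <algorithm>

template <unsigned N, bool RESIZABLE = false>
class toy_int_range
{
  static const unsigned HARD_LIMIT = 255;
  unsigned m_max_pairs = N;
  long m_inline[N * 2] = {};   // Inline storage for N sub-ranges.
  long *m_base = m_inline;

public:
  ~toy_int_range ()
  {
    if (m_base != m_inline)
      delete[] m_base;
  }

  // Called when a computed range needs more than N sub-ranges: grow
  // straight to the hard limit (no incremental growth), so resizing
  // happens at most once per object.
  void maybe_resize (unsigned needed_pairs)
  {
    if (!RESIZABLE || needed_pairs <= m_max_pairs)
      return;
    long *bigger = new long[HARD_LIMIT * 2];
    std::copy (m_base, m_base + m_max_pairs * 2, bigger);
    m_base = bigger;
    m_max_pairs = HARD_LIMIT;
  }
};

// Mirrors the typedef in the patch:
typedef toy_int_range<3, /*RESIZABLE=*/true> toy_int_range_max;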