On 5/15/23 13:08, Richard Biener wrote:
> On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez wrote:
>>
>> We can now have int_range<N, RESIZABLE=false> for automatically
>> resizable ranges.  int_range_max is now int_range<3, true> for a 69X
>> reduction in size from current trunk, and a 6.9X reduction from GCC12.
>> This incurs a 5% performance penalty for VRP that is more than covered
>> by our >13% improvements recently.
>>
>> int_range_max is the temporary range object we use in the ranger for
>> integers.  With the conversion to wide_int, this structure bloated up
>> significantly because wide_ints are huge (80 bytes apiece) and are
>> about 10 times as big as a plain tree.  Since the temporary object
>> requires 255 sub-ranges, that's 255 * 80 * 2 (each sub-range holds two
>> wide_int endpoints), plus the control word.  This means the structure
>> grew from 4112 bytes to 40912 bytes.
>>
>> This patch adds the ability to resize ranges as needed, defaulting to
>> no resizing, while int_range_max now defaults to 3 sub-ranges (instead
>> of 255) and grows to 255 when the range being calculated does not fit.
>>
>> For example:
>>
>> int_range<1> foo;        // 1 sub-range with no resizing.
>> int_range<5> foo;        // 5 sub-ranges with no resizing.
>> int_range<5, true> foo;  // 5 sub-ranges with resizing.
>>
>> I ran some tests and found that 3 sub-ranges cover 99% of cases, so
>> I've set the int_range_max default to that:
>>
>> typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
>>
>> We don't bother growing incrementally, since the default covers most
>> cases and we have a 255 hard limit.  This hard limit could be reduced
>> to 128, since my tests never saw a range needing more than 124, but we
>> could do that as a follow-up if needed.
>>
>> With 3 sub-ranges, int_range_max is now 592 bytes versus 40912 for
>> trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
>> and 3.02% for threading, with no noticeable change in overall
>> compilation (0.27%).  This is more than covered by our 13.26%
>> improvements for the legacy removal + wide_int conversion.
>
> Thanks for doing this.
>
>> I think this approach is a good alternative, while providing us with
>> flexibility going forward.  For example, we could try defaulting to
>> 8 sub-ranges for a noticeable improvement in VRP.  We could also use
>> large sub-ranges for switch analysis to avoid resizing.
>>
>> Another approach I tried was always resizing.  With this, we could
>> drop the whole int_range nonsense, and have irange just hold a
>> resizable range.  This simplified things, but incurred a 7% penalty on
>> ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced it
>> wasn't some artifact of valgrind.  However, until we're sure, let's
>> avoid massive changes, especially since IPA changes are coming up.
>>
>> For the curious, a particular hot spot for IPA in this area was:
>>
>> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
>> {
>>   ...
>>   ...
>>   value_range save (m_vr);
>>   m_vr.union_ (*other_vr);
>>   return m_vr != save;
>> }
>>
>> The problem isn't the resizing (since we do that at most once) but the
>> fact that for some functions with lots of callers we end up with a
>> huge range that gets copied and compared for every meet operation.
>> Maybe the IPA algorithm could be adjusted somehow?
>
> Well, the above just wants to know whether the union_ operation changed
> the range.  I suppose that would be an interesting (and easy to
> compute?) secondary output of union_ and it seems it already computes
> that (but maybe not correctly?).  So I suggest to change the above to
>
>   bool res;
>   if (flag_checking)
>     {
>       value_range save (m_vr);
>       res = m_vr.union_ (*other_vr);
>       gcc_assert (res == (m_vr != save));
>     }
>   else
>     res = m_vr.union_ (*other_vr);
>   return res;

union_ returns a value specifically for that, which Andrew uses for
cache optimization.

For that matter, your suggestion was my first approach, but I quickly
found out we were being overly pessimistic in some cases, and I was too
lazy to figure out why.
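To make that contract concrete, here is a toy sketch (made-up type and
plain longs, not the actual irange code) of a union that computes the
changed bit as a byproduct, so the caller needs no save/compare:

#include <algorithm>

struct toy_range
{
  long m_lo, m_hi;

  // Union OTHER into this range; return true iff *this changed.
  // Conservatively returning true when nothing changed would be safe
  // but pessimistic: callers like the IPA lattice would keep iterating.
  bool union_ (const toy_range &other)
  {
    long lo = std::min (m_lo, other.m_lo);
    long hi = std::max (m_hi, other.m_hi);
    bool changed = (lo != m_lo || hi != m_hi);
    m_lo = lo;
    m_hi = hi;
    return changed;
  }
};

With that, meet_with_1 reduces to "return m_vr.union_ (*other_vr);"
with no temporary copy of a potentially huge range.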
With your suggested sanity check I chased the problem down to a minor
inconsistency when unioning nonzero masks.  The issue wasn't a bug,
just a pessimization.

I'm attaching a patch that corrects the oversight (well, not an
oversight; everything was more expensive with trees)...  It yields a
6.89% improvement to the ipa-cp pass!!!

Thanks.  I'll push it if it passes tests.

BTW, without the annoying IPA-cp performance regression, this paves the
way for nuking int_range in favor of just irange, and having everything
resize as needed.  I'll wait for Andrew to chime in when he returns
from PTO, since we may want to leave int_range around, as it does
provide flexibility (at the expense of fugly-looking declarations).

Aldy
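P.S. For the archives, the resizing scheme boils down to something like
this toy sketch (made-up names, with plain longs standing in for the
wide_int pairs; the real implementation is in the patch, and copying is
omitted for brevity):

#include <algorithm>

template <unsigned N, bool RESIZABLE = false>
class toy_int_range
{
  static const unsigned HARD_LIMIT = 255;
  unsigned m_max_pairs = N;
  long m_inline[N * 2] = {};   // Inline storage for N sub-ranges.
  long *m_base = m_inline;

public:
  ~toy_int_range ()
  {
    if (m_base != m_inline)
      delete[] m_base;
  }

  // Called when a computed range needs more than N sub-ranges: grow
  // straight to the hard limit (no incremental growth), so resizing
  // happens at most once per object.
  void maybe_resize (unsigned needed_pairs)
  {
    if (!RESIZABLE || needed_pairs <= m_max_pairs)
      return;
    long *bigger = new long[HARD_LIMIT * 2];
    std::copy (m_base, m_base + m_max_pairs * 2, bigger);
    m_base = bigger;
    m_max_pairs = HARD_LIMIT;
  }
};

// Mirrors the typedef in the patch:
typedef toy_int_range<3, /*RESIZABLE=*/true> toy_int_range_max;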