public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
@ 2021-04-14 14:21 marxin at gcc dot gnu.org
  2021-04-14 14:21 ` [Bug tree-optimization/100081] " marxin at gcc dot gnu.org
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-04-14 14:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

            Bug ID: 100081
           Summary: [11 Regression] Compile time hog in irange since
                    r11-4135-ge864d395b4e862ce
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: compile-time-hog
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marxin at gcc dot gnu.org
                CC: aldyh at gcc dot gnu.org, amacleod at redhat dot com
  Target Milestone: ---

Created attachment 50593
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50593&action=edit
test-case

Reduced from a yarpgen test-case:

$ g++ -O3 func.ii -c
... takes at least 8 minutes ...

while GCC 10 finishes in 20s.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
@ 2021-04-14 14:21 ` marxin at gcc dot gnu.org
  2021-04-14 14:22 ` marxin at gcc dot gnu.org
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-04-14 14:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |10.3.0
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |11.0
   Last reconfirmed|                            |2021-04-14
      Known to fail|                            |11.0
     Ever confirmed|0                           |1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
  2021-04-14 14:21 ` [Bug tree-optimization/100081] " marxin at gcc dot gnu.org
@ 2021-04-14 14:22 ` marxin at gcc dot gnu.org
  2021-04-15  6:45 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-04-14 14:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
I see the following in perf top:

     9.84%  cc1plus        [.] wide_int_to_tree_1
     6.59%  cc1plus        [.] irange::varying_p
     6.13%  cc1plus        [.] bitmap_bit_p
     4.35%  cc1plus        [.] wi::force_to_size
     3.69%  cc1plus        [.] cache_wide_int_in_type_cache
     2.83%  cc1plus        [.] irange::operator=
     2.81%  cc1plus        [.] get_int_cst_ext_nunits
     2.77%  cc1plus        [.] logical_stmt_cache::cacheable_p
     2.77%  cc1plus        [.] gori_compute::compute_logical_operands_in_chain
     2.43%  cc1plus        [.] gori_compute::compute_operand_range
     2.33%  cc1plus        [.] compare_values_warnv
     2.16%  cc1plus        [.] wi::max_value

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
  2021-04-14 14:21 ` [Bug tree-optimization/100081] " marxin at gcc dot gnu.org
  2021-04-14 14:22 ` marxin at gcc dot gnu.org
@ 2021-04-15  6:45 ` rguenth at gcc dot gnu.org
  2021-04-16 12:30 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-15  6:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
>From the profile it looks like there's a lot tree INTEGER_CST work being done
rather than sticking to wide_ints.  That's always (constant-time) more
expensive.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-04-15  6:45 ` rguenth at gcc dot gnu.org
@ 2021-04-16 12:30 ` rguenth at gcc dot gnu.org
  2021-04-16 12:39 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-16 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
inline bool
irange::varying_p () const
{
  if (legacy_mode_p ())
    return m_kind == VR_VARYING;

  if (m_num_ranges != 1)
    return false;

  tree l = m_base[0];
  tree u = m_base[1];
  tree t = TREE_TYPE (l);
  unsigned prec = TYPE_PRECISION (t);
  signop sign = TYPE_SIGN (t);
  if (INTEGRAL_TYPE_P (t))
    return (wi::to_wide (l) == wi::min_value (prec, sign)
            && wi::to_wide (u) == wi::max_value (prec, sign));
  if (POINTER_TYPE_P (t))
    return (wi::to_wide (l) == 0
            && wi::to_wide (u) == wi::max_value (prec, sign));
  return true;


is truly excessive for !legacy_mode_p () ..., I think the pointer vs.
int case is premature optimization and we might see more inlining
and optimization when doing unconditional

  return (wi::to_wide (l) == wi::min_value (prec, sign)
            && wi::to_wide (u) == wi::max_value (prec, sign));

possibly there are simply too many varying_p calls as well.  Like

inline bool
range_includes_zero_p (const irange *vr)
{
  if (vr->undefined_p ())
    return false;

  if (vr->varying_p ())
    return true;

  return vr->may_contain_p (build_zero_cst (vr->type ()));

also

inline value_range_kind
irange::kind () const
{
  if (legacy_mode_p ())
    return m_kind;

  if (undefined_p ())
    return VR_UNDEFINED;

  if (varying_p ())
    return VR_VARYING;

  return VR_RANGE;

looks like quite expensive.  IMHO since we have m_kind we should
keep it up-to-date and make those checks cheap.

And kill that legacy_mode_p () stuff!?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-04-16 12:30 ` rguenth at gcc dot gnu.org
@ 2021-04-16 12:39 ` rguenth at gcc dot gnu.org
  2021-04-16 13:54 ` aldyh at gcc dot gnu.org
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-16 12:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Or

bool
irange::symbolic_p () const
{
  return (!varying_p ()
          && !undefined_p ()
          && (!is_gimple_min_invariant (min ())
              || !is_gimple_min_invariant (max ())));
}

which should be simply

bool
irange::symbolic_p () const
{
  return (m_num_ranges == 1
          && (!is_gimple_min_invariant (min ())
              || !is_gimple_min_invariant (max ())));
}

?  Or do we have symbolic anti ranges represented with two ranges?

Likewise

bool
irange::constant_p () const
{
  return (!varying_p ()
          && !undefined_p ()
          && TREE_CODE (min ()) == INTEGER_CST
          && TREE_CODE (max ()) == INTEGER_CST);
}

err - I thought varying == constant...


Note the testcase is fully accounted to

 rest of compilation                : 850.63 ( 97%)   0.12 ( 24%) 864.41 ( 97%)
 4351M ( 96%)

so not sure where it is actually spent.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-04-16 12:39 ` rguenth at gcc dot gnu.org
@ 2021-04-16 13:54 ` aldyh at gcc dot gnu.org
  2021-04-16 14:04 ` aldyh at gcc dot gnu.org
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-04-16 13:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #5 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> Or
> 
> bool
> irange::symbolic_p () const
> {
>   return (!varying_p ()
>           && !undefined_p ()
>           && (!is_gimple_min_invariant (min ())
>               || !is_gimple_min_invariant (max ())));
> }
> 
> which should be simply
> 
> bool
> irange::symbolic_p () const
> {
>   return (m_num_ranges == 1
>           && (!is_gimple_min_invariant (min ())
>               || !is_gimple_min_invariant (max ())));
> }
> 
> ?  Or do we have symbolic anti ranges represented with two ranges?
> 
> Likewise
> 
> bool
> irange::constant_p () const
> {
>   return (!varying_p ()
>           && !undefined_p ()
>           && TREE_CODE (min ()) == INTEGER_CST
>           && TREE_CODE (max ()) == INTEGER_CST);
> }
> 
> err - I thought varying == constant...

Those varying_p checks definitely look suspect.  You should be able to just
look at the min/max as you suggest.  However, the undefined_p check must stay
because it is really a check for num_ranges > 0, otherwise in the case of
undefined_p, the is_gimple_*_invariant would dereference m_base[] which has
nothing of interest.

Perhaps a more obvious check would be m_num_ranges > 0, instead of the
confusing undefined_p.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-04-16 13:54 ` aldyh at gcc dot gnu.org
@ 2021-04-16 14:04 ` aldyh at gcc dot gnu.org
  2021-04-16 14:24 ` marxin at gcc dot gnu.org
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-04-16 14:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #6 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
BTW, we're looking as to why there are so many calls to varying_p.  Something
seems off.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-04-16 14:04 ` aldyh at gcc dot gnu.org
@ 2021-04-16 14:24 ` marxin at gcc dot gnu.org
  2021-04-16 18:06 ` amacleod at redhat dot com
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-04-16 14:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-reduction

--- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
I'm reducing the test-case right now...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-04-16 14:24 ` marxin at gcc dot gnu.org
@ 2021-04-16 18:06 ` amacleod at redhat dot com
  2021-04-16 18:39 ` amacleod at redhat dot com
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: amacleod at redhat dot com @ 2021-04-16 18:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #8 from Andrew Macleod <amacleod at redhat dot com> ---
OMG.  Don't bother reducing. I can see the problem.

EVRP is fine, but when wrestrict runs, its quite late, and the CFG has

<bb 560> [local count: 28382607]:
  <...>
  _571 = _61 >= _593;
  _3583 = &arr_724 + _3992;
  _2220 = _831 <= _3583;
  _47 = _571 | _2220;
  _2935 = _376 * 2;
  _3966 = &arr_725 + _2935;
  _3024 = _61 >= _3966;
  _4219 = _3992 * 2;
  _4218 = &arr_725 + _4219;
  _1836 = _831 <= _4218;
  _3080 = _1836 | _3024;
<...>
  _5348 = _5347 & _32080;
  _5349 = _5348 & _32151;
  _5350 = _5349 & _32176;
  _5351 = _5350 & _32488;
  _5352 = _5351 & _33691;
  _5353 = _5352 & _33762;
  _5354 = _5353 & _34753;
  _35662 = _5354 & _34824;
  if (_35662 != 0)
    goto <bb 561>; [90.00%]
  else
    goto <bb 1510>; [10.00%]

Its a 7200 stmt basic block, made up of calculations and 2614 ORs and 1480
ANDs.

A request is made for a range which can be exported from this block, and ranger
is dutifully trying everything it can to process those blocks.

 Each AND/OR is a logical expression which evaluates a TRUE and FALSE range for
each operands, so it calculates up to 4 ranges for each pair of operands. I
knew this could get out of hand in pathological cases, so we introduced a
logical cache to help resolve this and avoid extra work.  Its actually making
this one worse I think.

Regardless, I know what the issue is.  I have 2 things to try.
1) We have a patch in our branch that gives up early.. once it finds the result
is going to be varying...   I'll give that a shot first, it may stop this
lookup quickly. Its possible it won't. 

if not, then 
2) we've also discussed that in ridiculously large combinations of &&/|| we are
unlikely to be able to haul a useful range out of it, so limit the depth of
logical processing to something in a reasonable range. 4000 logicals operations
is not reasonable to look thru.  something more akin to 10 maybe at most..

Anyway, I know what the issue is and will have it resolved early next week at
the latest.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-04-16 18:06 ` amacleod at redhat dot com
@ 2021-04-16 18:39 ` amacleod at redhat dot com
  2021-04-19  6:56 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: amacleod at redhat dot com @ 2021-04-16 18:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |amacleod at redhat dot com

--- Comment #9 from Andrew Macleod <amacleod at redhat dot com> ---
Created attachment 50619
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50619&action=edit
proposed fix

I've added a depth limited for logical combinations.   In this patch its set to
6.. I'm going to try running it against various scenarios to make sure it
doesn't miss anything we do want... and of course I'll bootstrap and regtest
it.

If you want to try it, this should resolve the issue.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-04-16 18:39 ` amacleod at redhat dot com
@ 2021-04-19  6:56 ` rguenth at gcc dot gnu.org
  2021-04-19  7:30 ` marxin at gcc dot gnu.org
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-19  6:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Macleod from comment #8)
> OMG.  Don't bother reducing. I can see the problem.
> 
> EVRP is fine, but when wrestrict runs, its quite late, and the CFG has
> 
> <bb 560> [local count: 28382607]:
>   <...>
>   _571 = _61 >= _593;
>   _3583 = &arr_724 + _3992;
>   _2220 = _831 <= _3583;
>   _47 = _571 | _2220;
>   _2935 = _376 * 2;
>   _3966 = &arr_725 + _2935;
>   _3024 = _61 >= _3966;
>   _4219 = _3992 * 2;
>   _4218 = &arr_725 + _4219;
>   _1836 = _831 <= _4218;
>   _3080 = _1836 | _3024;
> <...>
>   _5348 = _5347 & _32080;
>   _5349 = _5348 & _32151;
>   _5350 = _5349 & _32176;
>   _5351 = _5350 & _32488;
>   _5352 = _5351 & _33691;
>   _5353 = _5352 & _33762;
>   _5354 = _5353 & _34753;
>   _35662 = _5354 & _34824;
>   if (_35662 != 0)
>     goto <bb 561>; [90.00%]
>   else
>     goto <bb 1510>; [10.00%]
> 
> Its a 7200 stmt basic block, made up of calculations and 2614 ORs and 1480
> ANDs.
> 
> A request is made for a range which can be exported from this block, and
> ranger is dutifully trying everything it can to process those blocks.
> 
>  Each AND/OR is a logical expression which evaluates a TRUE and FALSE range
> for each operands, so it calculates up to 4 ranges for each pair of
> operands. I knew this could get out of hand in pathological cases, so we
> introduced a logical cache to help resolve this and avoid extra work.  Its
> actually making this one worse I think.

Hmm, still the overall work should be linear to produce ranges for all
of the SSA defs in this BB, no?  As heuristic you might want to avoid
producing ranges for single-use defs, like those that are just used in
another & or | combination?  Wrestrict should only be interested in
ranges for the "tails" of this &| tree (for example _61 in _61 >= _3966).

But yes, if you have any worse than O(n log n) algorithm then artificially
limiting it's cost by capping 'n' at some (--param controlled) value is
the way to go.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2021-04-19  6:56 ` rguenth at gcc dot gnu.org
@ 2021-04-19  7:30 ` marxin at gcc dot gnu.org
  2021-04-19 14:09 ` amacleod at redhat dot com
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-04-19  7:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-reduction             |

--- Comment #11 from Martin Liška <marxin at gcc dot gnu.org> ---
> If you want to try it, this should resolve the issue.

I can confirm the patch resolves that.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2021-04-19  7:30 ` marxin at gcc dot gnu.org
@ 2021-04-19 14:09 ` amacleod at redhat dot com
  2021-04-19 19:49 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: amacleod at redhat dot com @ 2021-04-19 14:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #12 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Richard Biener from comment #10)
> (In reply to Andrew Macleod from comment #8)
> > OMG.  Don't bother reducing. I can see the problem.
> > 
> > EVRP is fine, but when wrestrict runs, its quite late, and the CFG has
> > 
> > <bb 560> [local count: 28382607]:
> >   <...>
> >   _571 = _61 >= _593;
> >   _3583 = &arr_724 + _3992;
> >   _2220 = _831 <= _3583;
> >   _47 = _571 | _2220;
> >   _2935 = _376 * 2;
> >   _3966 = &arr_725 + _2935;
> >   _3024 = _61 >= _3966;
> >   _4219 = _3992 * 2;
> >   _4218 = &arr_725 + _4219;
> >   _1836 = _831 <= _4218;
> >   _3080 = _1836 | _3024;
> > <...>
> >   _5348 = _5347 & _32080;
> >   _5349 = _5348 & _32151;
> >   _5350 = _5349 & _32176;
> >   _5351 = _5350 & _32488;
> >   _5352 = _5351 & _33691;
> >   _5353 = _5352 & _33762;
> >   _5354 = _5353 & _34753;
> >   _35662 = _5354 & _34824;
> >   if (_35662 != 0)
> >     goto <bb 561>; [90.00%]
> >   else
> >     goto <bb 1510>; [10.00%]
> > 
> > Its a 7200 stmt basic block, made up of calculations and 2614 ORs and 1480
> > ANDs.
> > 
> > A request is made for a range which can be exported from this block, and
> > ranger is dutifully trying everything it can to process those blocks.
> > 
> >  Each AND/OR is a logical expression which evaluates a TRUE and FALSE range
> > for each operands, so it calculates up to 4 ranges for each pair of
> > operands. I knew this could get out of hand in pathological cases, so we
> > introduced a logical cache to help resolve this and avoid extra work.  Its
> > actually making this one worse I think.
> 
> Hmm, still the overall work should be linear to produce ranges for all
> of the SSA defs in this BB, no?  As heuristic you might want to avoid
> producing ranges for single-use defs, like those that are just used in
> another & or | combination?  Wrestrict should only be interested in
> ranges for the "tails" of this &| tree (for example _61 in _61 >= _3966).
> 

Since the direction is bottom up, it is no longer linear. This has probably
never been explain very well.  lets make up a simple example:

    if (x > 2 && x < 10 || x == 15)
for unsigned x turns into:

    _1 = x_8(D) + 4294967293;
    _2 = _1 <= 6;
    _3 = x_8(D) == 15;
    _4 = _2 | _3;
    if (_4 != 0)
      goto <bb 3>; [INV]
    else
      goto <bb 5>; [INV]

and we can calculate the following ranges (note none of them are calculated in
advance, only if asked/required) :

2->3  (T) _4 :  bool [1, 1]
2->3  (T) x_8(D) :      unsigned int [3, 9][15, 15]
2->5  (F) _1 :  unsigned int [7, +INF]
2->5  (F) _2 :  bool [0, 0]
2->5  (F) _3 :  bool [0, 0]
2->5  (F) _4 :  bool [0, 0]
2->5  (F) x_8(D) :      unsigned int [0, 2][10, 14][16, +INF]

When a client asks for the range of x_8 on the true edge, we have to solve
[1,1] = _4 != 0, which in turn feeds back to the def of _4 as:
[1,1] = _2 | _3

There are 3 possible ways this branch can be taken..
a) _2 = [1, 1], _3 = [1, 1]
b) _2 = [0, 0], _3 = [1, 1]
c) _2 = [1, 1], _3 = [0, 0]

In order to calculate a precise range for x_8, we need to calculate the range
of x_8 for both possible values of _2 and _3  and combine them.. 

I wont trace the actual calculation for each one, but it boils down to
_2 = [0, 0] produces x_8 = ~[3, 9]
_2 = [1, 1] produces x_8 = [3, 9]
_3 = [0, 0] produces x_8 = ~[15, 15]
_3 = [1, 1] produces x_8 = [15, 15]

Then we combine them with the 2 possible combinations, and produce the final
range of unsigned int [3, 9][15, 15].

Once upon a time I tried to "simplify" this a couple of different ways, but in
more complex situations, it inevitably fails to produce the correct range.. so
instead, we simply do the calculations exactly as the statement requires and
combine them.

The logical OR spawned 4 requests for the range of x_8.. so when these logical
expressions feed each other, we get the exponential growth of computations.

The logical cache was suppose to resolve this by caching the true and false
values of x_8 for _2 and _3 eliminating the need to recalculate them.   More
complex cases with many ssa_names feeding through a boolean condition cause it
to not function well.


As for single use-use defs.. There is nothing special about them. We never
produce ranges for anything that is not used an an outgoing edge calculation,
regardless of how many uses there are.  Those are tagged and we simply use
their global value.

Furthermore, we never produce ranges for anything that isn't either explicitly
requested, or used in a calculation that affects an explicit request.

In this case for instance, I forget the name that restrict asked for, but lets
say it was  _3992.  we start at the bottom of the block and work back to the
definition of _3992.  During the evaluation, we go through many single-use
cases which we will need the ranges for as they feed the condition at the
bottom and may therefore affect the outcome.  Anything above _3992's def is
never evaluated.

Up until now, I haven't really throttled anything.. Since we only calculate
ranges for things that are actually useful, that helps to compensate for the
time spent doing the computations.

We knew that the logical combination was potentially an issue, and
thought/hoped the cache would contain bad behaviour...  but not in this case.

I plan to focus more time in the next release trying to evaluate when a good
time to "give up" is, and maybe find additional efficiencies, but for now, just
limited the depth of logical evaluation should suffice.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2021-04-19 14:09 ` amacleod at redhat dot com
@ 2021-04-19 19:49 ` cvs-commit at gcc dot gnu.org
  2021-04-27 11:40 ` [Bug tree-optimization/100081] [11/12 " jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-04-19 19:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:329d2f0df7d6d22c87ab3338b94caef68139cd58

commit r11-8251-g329d2f0df7d6d22c87ab3338b94caef68139cd58
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Fri Apr 16 17:08:51 2021 -0400

    tree-optimization/100081 - Limit depth of logical expression windback.

    Limit how many logical expressions GORI will look back through when
    evaluating outgoing edge range.

            PR tree-optimization/100081
            * gimple-range-cache.h (ranger_cache): Inherit from gori_compute
            rather than gori_compute_cache.
            * gimple-range-gori.cc (is_gimple_logical_p): Move to top of file.
            (range_def_chain::m_logical_depth): New member.
            (range_def_chain::range_def_chain): Initialize m_logical_depth.
            (range_def_chain::get_def_chain): Don't build defchains through
more
            than LOGICAL_LIMIT logical expressions.
            * params.opt (param_ranger_logical_depth): New.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11/12 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2021-04-19 19:49 ` cvs-commit at gcc dot gnu.org
@ 2021-04-27 11:40 ` jakub at gcc dot gnu.org
  2021-07-14 18:32 ` amacleod at redhat dot com
  2022-01-26 15:45 ` amacleod at redhat dot com
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-04-27 11:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.0                        |11.2

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 11.1 has been released, retargeting bugs to GCC 11.2.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11/12 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2021-04-27 11:40 ` [Bug tree-optimization/100081] [11/12 " jakub at gcc dot gnu.org
@ 2021-07-14 18:32 ` amacleod at redhat dot com
  2022-01-26 15:45 ` amacleod at redhat dot com
  16 siblings, 0 replies; 18+ messages in thread
From: amacleod at redhat dot com @ 2021-07-14 18:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Andrew Macleod <amacleod at redhat dot com> ---
Fixed in both gcc 11 and trunk I believe.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug tree-optimization/100081] [11/12 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
  2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2021-07-14 18:32 ` amacleod at redhat dot com
@ 2022-01-26 15:45 ` amacleod at redhat dot com
  16 siblings, 0 replies; 18+ messages in thread
From: amacleod at redhat dot com @ 2022-01-26 15:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #16 from Andrew Macleod <amacleod at redhat dot com> ---
Looking back at this, I explained why its pathological, but I realize I wasn't
clear about a couple of things.   So for the record :-)

> > <bb 560> [local count: 28382607]:
> >   <...>
> >   _571 = _61 >= _593;
> >   _3583 = &arr_724 + _3992;
> >   _2220 = _831 <= _3583;
> >   _47 = _571 | _2220;
> >   _2935 = _376 * 2;
> >   _3966 = &arr_725 + _2935;
> >   _3024 = _61 >= _3966;
> >   _4219 = _3992 * 2;
> >   _4218 = &arr_725 + _4219;
> >   _1836 = _831 <= _4218;
> >   _3080 = _1836 | _3024;
> > <...>
> >   _5348 = _5347 & _32080;
> >   _5349 = _5348 & _32151;
> >   _5350 = _5349 & _32176;
> >   _5351 = _5350 & _32488;
> >   _5352 = _5351 & _33691;
> >   _5353 = _5352 & _33762;
> >   _5354 = _5353 & _34753;
> >   _35662 = _5354 & _34824;
> >   if (_35662 != 0)
> >     goto <bb 561>; [90.00%]
> >   else
> >     goto <bb 1510>; [10.00%]

We do create ranges for everything as we go thru this block. the forward walk
which calculates ranges is linear, and everything does get calculated.

_61 will have a range, and when we get to
   _3024 = _61 >= _3966;
it will either evaluate to [0,0], [1,1], or [0,1] depending on its value.
Assume that it evaluates to [0,1] AKA varying.

What will not happen now is we do not try to figure out if its [0,0] or [1,1]
on either of the outgoing edges. To answer that question, we have to know
whether _3024 is [0,0] or [1,1] on the outgoing edge.  Trying to determine that
through too many combinations of || and && is expensive/exponential, and I
detailed in my description why that is. 

As long as there are not too many logical combining operations, we will make an
attempt to determine its range by checking the possible paths thru those
logical expressions.

If it too complicated, you are stuck with whatever range we could figure out
via the linear walk.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-01-26 15:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-14 14:21 [Bug tree-optimization/100081] New: [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce marxin at gcc dot gnu.org
2021-04-14 14:21 ` [Bug tree-optimization/100081] " marxin at gcc dot gnu.org
2021-04-14 14:22 ` marxin at gcc dot gnu.org
2021-04-15  6:45 ` rguenth at gcc dot gnu.org
2021-04-16 12:30 ` rguenth at gcc dot gnu.org
2021-04-16 12:39 ` rguenth at gcc dot gnu.org
2021-04-16 13:54 ` aldyh at gcc dot gnu.org
2021-04-16 14:04 ` aldyh at gcc dot gnu.org
2021-04-16 14:24 ` marxin at gcc dot gnu.org
2021-04-16 18:06 ` amacleod at redhat dot com
2021-04-16 18:39 ` amacleod at redhat dot com
2021-04-19  6:56 ` rguenth at gcc dot gnu.org
2021-04-19  7:30 ` marxin at gcc dot gnu.org
2021-04-19 14:09 ` amacleod at redhat dot com
2021-04-19 19:49 ` cvs-commit at gcc dot gnu.org
2021-04-27 11:40 ` [Bug tree-optimization/100081] [11/12 " jakub at gcc dot gnu.org
2021-07-14 18:32 ` amacleod at redhat dot com
2022-01-26 15:45 ` amacleod at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).