public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/102253] New: scalability issues with large loop depth
@ 2021-09-09  9:45 rguenth at gcc dot gnu.org
  2021-09-09 10:04 ` [Bug middle-end/102253] " rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-09  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

            Bug ID: 102253
           Summary: scalability issues with large loop depth
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When investigating an improvement to LIMs fill_always_executed_in I created
the following testcase which creates a loop nest of depth N with conditionally
executed subloops.

extern void foobar (int);
template <int a>
struct bar
{
  static void baz(int b, int)
  {
    if (b & (1 << (a % 32)))
      for (int i = 0; i < 1024; ++i)
        bar<a-1>::baz (b, i);
  }
};
template <>
struct bar<0>
{
  static void baz (int, int i) { foobar (i); }
};
void __attribute__((flatten)) foo(int b)
{
#ifndef N
#define N 10
#endif
  bar<N>::baz (b, 0);
}

For N == 900 (the maximum unless you also specify -ftemplate-depth) and -O1 we
see

 tree canonical iv                  :   1.42 ( 13%)   0.00 (  0%)   1.42 ( 13%)
   28M ( 13%)
 complete unrolling                 :   2.80 ( 27%)   0.00 (  0%)   2.81 ( 26%)
   42M ( 19%)
 integrated RA                      :   3.41 ( 32%)   0.32 ( 80%)   3.72 ( 34%)
  640k (  0%)
 TOTAL                              :  10.54          0.40         10.96       
  224M

For N == 1800 and -O1 it is already

 tree canonical iv                  :  30.43 ( 28%)   0.05 ( 14%)  30.50 ( 28%)
  116M ( 15%)
 complete unrolling                 :  63.96 ( 59%)   0.06 ( 17%)  64.04 ( 59%)
  175M ( 22%)
 tree iv optimization               :   5.75 (  5%)   0.00 (  0%)   5.77 (  5%)
  126M ( 16%)
 integrated RA                      :   1.40 (  1%)   0.12 ( 34%)   1.53 (  1%)
 1754k (  0%)
 TOTAL                              : 108.35          0.35        108.75       
  796M

For reference compile-time with N == 450 is 2.5s with

 tree canonical iv                  :   0.18 (  7%)   0.00 (  0%)   0.19 (  7%)
 6904k ( 10%)
 complete unrolling                 :   0.34 ( 14%)   0.00 (  0%)   0.34 ( 13%)
 8412k ( 13%)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/102253] scalability issues with large loop depth
  2021-09-09  9:45 [Bug middle-end/102253] New: scalability issues with large loop depth rguenth at gcc dot gnu.org
@ 2021-09-09 10:04 ` rguenth at gcc dot gnu.org
  2021-09-09 10:53 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-09 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
replacing the loop bound 1024 with 'b' moves the two offenders off the radar
leaving, with N == 10000

 ipa function summary               :  12.91 ( 11%)   0.00 (  0%)  12.92 ( 11%)
 9800k (  0%)
 loop init                          :  11.08 (  9%)   0.12 ( 15%)  11.19 (  9%)
 1677M ( 65%)
 branch prediction                  :  81.17 ( 68%)   0.05 (  6%)  81.12 ( 67%)
   51M (  2%)
 TOTAL                              : 119.93          0.78        120.77       
 2567M

where IPA function summary might be because of the artificial testcase
construction relying on inlining.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/102253] scalability issues with large loop depth
  2021-09-09  9:45 [Bug middle-end/102253] New: scalability issues with large loop depth rguenth at gcc dot gnu.org
  2021-09-09 10:04 ` [Bug middle-end/102253] " rguenth at gcc dot gnu.org
@ 2021-09-09 10:53 ` rguenth at gcc dot gnu.org
  2023-06-24 19:40 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-09 10:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
           Keywords|                            |compile-time-hog

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The unroll and ivcanon issues are all from invoking
estimate_numbers_of_iterations which for each loop walks the whole loop body
(including conditionally executed parts) and on the signed IV increment
invokes infer_loop_bounds_from_signedness which performs SCEV analysis.

It's highly unlikely that there's useful info from stmts in inner loop
given we're looking for an evolution in the outer loop.

The code is also calling analyze_scalar_evolution with a loop that's not the
loop the stmt is in which might be a correctness issue.  idx_infer_loop_bounds
seems to do the correct thing here.

Possibly all SCEV analysis could be performed once and instantiated in the
outermost loop we're interested in.  That is, we'd walk the outermost loop
and fill in bounds for the whole subnest with a single IL walk.

While scalar evolutions are cached, the instantiations are not.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/102253] scalability issues with large loop depth
  2021-09-09  9:45 [Bug middle-end/102253] New: scalability issues with large loop depth rguenth at gcc dot gnu.org
  2021-09-09 10:04 ` [Bug middle-end/102253] " rguenth at gcc dot gnu.org
  2021-09-09 10:53 ` rguenth at gcc dot gnu.org
@ 2023-06-24 19:40 ` pinskia at gcc dot gnu.org
  2023-06-24 19:41 ` pinskia at gcc dot gnu.org
  2023-06-24 20:13 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-24 19:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luydorarko at vusra dot com

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 110396 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/102253] scalability issues with large loop depth
  2021-09-09  9:45 [Bug middle-end/102253] New: scalability issues with large loop depth rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-06-24 19:40 ` pinskia at gcc dot gnu.org
@ 2023-06-24 19:41 ` pinskia at gcc dot gnu.org
  2023-06-24 20:13 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-24 19:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
VRP/ranger uses SCEV now so it might even be worse, the testcase from PR 110396
has that behavior too.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/102253] scalability issues with large loop depth
  2021-09-09  9:45 [Bug middle-end/102253] New: scalability issues with large loop depth rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-06-24 19:41 ` pinskia at gcc dot gnu.org
@ 2023-06-24 20:13 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-24 20:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
On the trunk with the original testcase here we get:
 tree copy headers                  :  12.16 ( 19%)   0.01 (  2%)  21.57 ( 28%)
  771k (  0%)


(I suspect the rest is due to not setting release checking ...)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-06-24 20:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-09  9:45 [Bug middle-end/102253] New: scalability issues with large loop depth rguenth at gcc dot gnu.org
2021-09-09 10:04 ` [Bug middle-end/102253] " rguenth at gcc dot gnu.org
2021-09-09 10:53 ` rguenth at gcc dot gnu.org
2023-06-24 19:40 ` pinskia at gcc dot gnu.org
2023-06-24 19:41 ` pinskia at gcc dot gnu.org
2023-06-24 20:13 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).