[Bug tree-optimization/57249] New: Unrolling too late for inlining

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/57249] New: Unrolling too late for inlining
From: glisse at gcc dot gnu.org @ 2013-05-11  8:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57249

            Bug ID: 57249
           Summary: Unrolling too late for inlining
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org

Hello,

This code is a variant of the code at
http://stackoverflow.com/questions/16493290/why-is-inlined-function-slower-than-function-pointer

typedef void (*Fn)();

long sum = 0;

inline void accu() {
  sum+=4;
}

static const Fn map[4] = {&accu, &accu, &accu, &accu};

void f(bool opt) {
  const long N = 10000000L;
  if (opt)
  {
    // Direct calls: the early inliner (einline) can inline them right away.
    for (long i = 0; i < N; i++)
    {
      accu();
      accu();
      accu();
      accu();
    }
  }
  else
  {
    // Indirect calls through map[]: targets are known only after unrolling.
    for (long i = 0; i < N; i++)
    {
      for (int j = 0; j < 4; j++)
        (*map[j])();
    }
  }
}


In the first loop, g++ -O3 inlines the 4 accu() calls in the einline (early
inlining) pass, and later passes optimize the whole loop to a single +=. In the
second loop, the accu() calls only become visible once the inner loop has been
unrolled and the loads from map[] folded, and there is no inlining pass after
that. Even if there were, the right passes would still be needed to reduce the
outer loop to sum+=160000000.

I am not sure what the right solution is, since overly aggressive early
unrolling can hurt other optimizations. Note that LLVM manages to optimize the
whole function to a single +=.
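The arithmetic behind the expected result: each outer iteration of either branch makes four calls that each add 4, so N iterations add 16 × N, and 16 × 10,000,000 = 160,000,000. The sketch below checks that both branches agree with this closed form. It turns N into a parameter so the check stays cheap; f_n, closed_form and run are hypothetical names, not part of the testcase.

```cpp
#include <cassert>

typedef void (*Fn)();

long sum = 0;

inline void accu() {
  sum += 4;
}

static const Fn map[4] = {&accu, &accu, &accu, &accu};

// The testcase's f with the trip count passed in instead of fixed at 10000000.
void f_n(bool opt, long n) {
  if (opt) {
    for (long i = 0; i < n; i++) {
      accu();
      accu();
      accu();
      accu();
    }
  } else {
    for (long i = 0; i < n; i++)
      for (int j = 0; j < 4; j++)
        (*map[j])();
  }
}

// Amount both branches add to sum: 4 calls per iteration, 4 per call.
long closed_form(long n) {
  return 16 * n;
}

// Runs one branch starting from sum == 0 and returns the final value.
long run(bool opt, long n) {
  sum = 0;
  f_n(opt, n);
  return sum;
}

// With n = 10000000 the single += the report asks for is sum += 160000000.
void check_closed_form() {
  assert(closed_form(10000000L) == 160000000L);
  assert(run(true, 1000) == closed_form(1000));
  assert(run(false, 1000) == closed_form(1000));
}
```

An optimizer that proves this equivalence can replace the whole body of f with sum += 160000000, independent of opt.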



* [Bug tree-optimization/57249] Unrolling too late for inlining
From: rguenth at gcc dot gnu.org @ 2013-05-13  8:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57249

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-05-13
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
There is the long-standing idea of moving cunrolli from after IPA inlining
into the early optimization pipeline.  But we'd have to tame it down
quite a bit to make it viable there.  Note that originally cunrolli was
designed to remove the C++ abstraction penalty from code like

template <int i>
struct S {
  int a[i];
  S() { for (int j = 0; j < i; ++j) a[j] = 0; }
};

for small i (tramp3d has code similar to the above for i == 3 and uses
template metaprogramming to avoid loops and force unrolling by abusing
the inliner ...)
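For a concrete picture of what cunrolli is meant to achieve here, the sketch below places the constructor from the comment next to a hand-unrolled equivalent for i == 3, and next to a recursive-template version of the kind of metaprogramming the comment attributes to tramp3d. The names S3Unrolled, ZeroFill and SMeta are hypothetical, and the template is an illustration, not tramp3d's actual code. In the template version the "loop" is expanded by instantiation and inlining, so nothing is left for cunrolli to unroll.

```cpp
#include <cassert>

// The constructor from the comment: a loop with a compile-time trip count.
template <int i>
struct S {
  int a[i];
  S() { for (int j = 0; j < i; ++j) a[j] = 0; }
};

// What complete unrolling of S<3>::S() amounts to: three stores, no loop.
struct S3Unrolled {
  int a[3];
  S3Unrolled() {
    a[0] = 0;
    a[1] = 0;
    a[2] = 0;
  }
};

// Loop-free alternative: ZeroFill<K>::apply zeroes a[0..K-1] through recursive
// instantiation; each level is a tiny function that the inliner flattens.
template <int K>
struct ZeroFill {
  static void apply(int* a) {
    ZeroFill<K - 1>::apply(a);
    a[K - 1] = 0;
  }
};

// Recursion terminator: nothing left to zero.
template <>
struct ZeroFill<0> {
  static void apply(int*) {}
};

template <int i>
struct SMeta {
  int a[i];
  SMeta() { ZeroFill<i>::apply(a); }
};

// ZeroFill<3> must clear exactly the first three of four elements.
bool zero_fill_demo() {
  int buf[4] = {1, 2, 3, 4};
  ZeroFill<3>::apply(buf);
  return buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 4;
}

// All three constructors leave a zeroed array.
void check_constructors() {
  S<3> looped;
  S3Unrolled unrolled;
  SMeta<3> meta;
  for (int j = 0; j < 3; ++j)
    assert(looped.a[j] == 0 && unrolled.a[j] == 0 && meta.a[j] == 0);
}
```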

In the meantime, cunrolli does more than that (which is not necessarily good).

Note that I wouldn't unroll loops with calls in them (possibly
special-casing the indirect-call-via-global-constructor-with-initializer case...)
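The rule about calls can be stated as a predicate over a loop body. The sketch below uses a deliberately tiny, hypothetical statement model (StmtKind and may_unroll_early are illustrative names, not GCC internals): a loop containing calls is rejected, except for indirect calls through a constant global table with an initializer, which is one reading of the special case mentioned above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified classification of statements in a loop body.
enum class StmtKind {
  Plain,                     // arithmetic, loads, stores
  DirectCall,                // call to a known function
  IndirectCall,              // call through an arbitrary function pointer
  IndirectCallViaConstTable  // call through a const global table with an initializer
};

// Refuse loops with calls, except calls whose targets a constant index would resolve.
bool may_unroll_early(const std::vector<StmtKind>& body) {
  for (StmtKind s : body) {
    if (s == StmtKind::DirectCall || s == StmtKind::IndirectCall)
      return false;
  }
  return true;
}

// The inner loop of the testcase's second branch is a single (*map[j])() call.
void check_call_rule() {
  assert(may_unroll_early({StmtKind::IndirectCallViaConstTable}));
  assert(!may_unroll_early({StmtKind::Plain, StmtKind::DirectCall}));
}
```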

For early optimizations unrolling should only be applied if the code size
shrinks by the transform.
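The size condition can be sketched as a comparison of two estimates. The cost model below is invented for illustration (LoopSizeEstimate and its fields are hypothetical, and GCC's real heuristics differ): complete unrolling removes the loop overhead but duplicates the body trip_count times, minus whatever each copy is expected to fold away once its index is a constant.

```cpp
#include <cassert>

// Hypothetical size estimate for a loop with a known constant trip count.
struct LoopSizeEstimate {
  int trip_count;       // number of iterations
  int body_size;        // size of one copy of the loop body
  int overhead_size;    // induction variable update, compare, branch
  int folded_per_copy;  // size each unrolled copy loses once its index is constant
};

// Size of the loop as it stands: one body plus the loop control.
int rolled_size(const LoopSizeEstimate& e) {
  return e.body_size + e.overhead_size;
}

// Size after complete unrolling: trip_count simplified copies, no loop control.
int unrolled_size(const LoopSizeEstimate& e) {
  return e.trip_count * (e.body_size - e.folded_per_copy);
}

// Early-pipeline rule: unroll only if code size shrinks.
bool early_unroll_shrinks(const LoopSizeEstimate& e) {
  return unrolled_size(e) < rolled_size(e);
}

// Illustrative numbers only.
void check_size_examples() {
  // 3 copies of a 2-unit body that folds to 1 unit: 3 < 2 + 3.
  assert(early_unroll_shrinks({3, 2, 3, 1}));
  // 4 copies of a 2-unit body with nothing folded: 8 > 2 + 3.
  assert(!early_unroll_shrinks({4, 2, 3, 0}));
}
```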

Btw, this case could be handled by value numbering / folding, too, given
that all of map[]'s elements have the same value.  But the testcase is very
artificial, so this will never help a real case.
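The folding alternative needs no unrolling: because every element of map[] holds &accu, a load map[j] has the same value for any in-range j, so (*map[j])() could become a direct accu() call and be inlined early. The sketch below checks the "all entries equal" property at compile time. all_entries_equal, call_indirect and call_folded are hypothetical names, and the table is declared constexpr instead of static const so the check can be a static_assert. It shows only the C++-level fact such a pass would exploit, not an implementation of value numbering.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*Fn)();

long sum = 0;

inline void accu() {
  sum += 4;
}

// constexpr rather than the testcase's static const, so the check below can run
// at compile time.
constexpr Fn map[4] = {&accu, &accu, &accu, &accu};

// True if every entry of a constant table holds the same value, the property that
// makes a load table[j] independent of j.
template <std::size_t N>
constexpr bool all_entries_equal(const Fn (&table)[N]) {
  for (std::size_t k = 1; k < N; ++k)
    if (table[k] != table[0])
      return false;
  return true;
}

static_assert(all_entries_equal(map), "every map[] entry is &accu");

// Indirect call with a variable index, starting from sum == 0.
long call_indirect(int j) {
  sum = 0;
  (*map[j])();
  return sum;
}

// The direct call it can be folded to for any j in [0, 4).
long call_folded(int /*j*/) {
  sum = 0;
  accu();
  return sum;
}
```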



* [Bug tree-optimization/57249] Unrolling too late for inlining
From: hubicka at gcc dot gnu.org @ 2014-11-27 21:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57249

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We should probably simply move cunrolli to the early passes and let it unroll
completely only when code size is going to decrease (i.e. when we know the late
cunrolli would unroll anyway, regardless of the profile).
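The criterion can be phrased as: the early pass unrolls a loop only if the late pass would unroll it for every possible profile. In the hypothetical model below (UnrollQuery, late_would_unroll and the growth budget of 16 are invented for illustration; GCC's real parameters differ), the late pass accepts any shrinking unroll and, for hot loops only, some bounded growth. Requiring agreement for both a cold and a hot loop then leaves exactly the size-decreasing case.

```cpp
#include <cassert>

// Hypothetical size estimates for one loop.
struct UnrollQuery {
  int rolled_size;    // loop as it stands
  int unrolled_size;  // after complete unrolling and expected folding
};

// Hypothetical late-pass rule: always unroll when code shrinks; allow bounded
// growth only for loops the profile marks as hot.
bool late_would_unroll(const UnrollQuery& q, bool loop_is_hot) {
  if (q.unrolled_size < q.rolled_size)
    return true;
  const int growth_budget = 16;  // made-up value
  return loop_is_hot && q.unrolled_size - q.rolled_size <= growth_budget;
}

// Early pass: unroll only when the late decision does not depend on the profile,
// which in this model means only when code size decreases.
bool early_should_unroll(const UnrollQuery& q) {
  return late_would_unroll(q, false) && late_would_unroll(q, true);
}
```

A loop that grows by a small amount is left alone early and may still be unrolled late if the profile says it is hot.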


