Empty loops

public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed

* Empty loops
@ 2004-11-24 14:19 jlh
  2004-11-24 14:47 ` Nathan Sidwell
  0 siblings, 1 reply; 2+ messages in thread
From: jlh @ 2004-11-24 14:19 UTC (permalink / raw)
  To: gcc-help

[-- Attachment #1: Type: text/plain, Size: 4529 bytes --]

Hi everyone,

This is a quote from the docs of gcc-3.4.3:

docs> Historically, GCC has not deleted empty loops under the assumption
docs> that the most likely reason you would put one in a program is to
docs> have a delay

It might have once been a common thing to do, but using empty loops as
delays is bad style (there probably are exceptions), most will agree on
that.  And history is a bad reason to do things the non-optimal way.
But this is not my point.

docs> ...so deleting them will not make real programs run any faster.
docs> However, the rationale here is that optimization of a nonempty
docs> loop cannot produce an empty one, which holds for C but is not
docs> always the case for C++.

This is the part I don't get.  Why can the optimization of a C loop
never yield an empty loop?  Take this trivial example:

long f()
{
         long i, ret = 0;

         for (i = 0; i < 1000000000; i++)
                 ret++;

         return(ret);
}

Of course no-one would write this, but it's an example where an
obviously non-empty loop gets converted to an empty one after
optimization.  "gcc-3.4.3 -c -O2" produces (on x86):

f:      pushl   %ebp
         movl    $999999999, %eax
         movl    %esp, %ebp
.L5:    decl    %eax
         jns     .L5
         popl    %ebp
         movl    $1000000000, %eax
         ret

.L5 is an empty loop because the "ret++" has been moved out of the loop
and the return value can be set directly.  Removing this loop would make
the code faster, because looping to a billion is very expensive, and in
this situation, the programmer probably didn't intend to cause a delay.

Of course the example above is silly, but there surely are situations
where such trivial loops like the one above can result, without that the
programmer obviously notices it (for example after macro expansion).

For C++, the docs say that things are different.  Here's another
example, that is somewhat more realistic.  First we define a function
template that counts how many numbers in a hardcoded range have a given
property (passed as a functor).

template<class F>
static long f(F func)
{
         long ret = 0;

         for (long l = 0; l < 1000000003; l++)
                 if (func(l))
                         ret++;

         return(ret);
}

Such a function template might very well appear in a real life example.
Now I call this using a functor that happens to always return true.

class Functor
{
public:
         inline bool operator()(long l)
         {
                 return(true);
         }
};

class Functor my_functor;

long g()
{
         return(f(my_functor));
}

gcc-3.4.3 inlines the template function f<Functor>() in g() and
optimizes the loop in a similar way as in the first C example.  The
result is as follows:

_Z1gv:  pushl   %ebp
         movl    $999999999, %eax
         movl    %esp, %ebp
.L7:    subl    $4, %eax
         jns     .L7
         popl    %ebp
         movl    $1000000003, %eax
         ret

Again, I get an empty loop that delays the execution unecessarily.
Though, you'll notice that the loop has been partially unrolled to take
steps of 4 in every iteration.  If you let the loop go to 1'000'000'000
instead of 1'000'000'003, you'll even get steps of 25.  But this would
then still mean 40'000'000 iterations, which take about 70ms to execute
on my system.  Too much for something that does nothing.

Removing loops that were initially empty in the source is one thing.
(But removing them would encourage people to write proper delays; and
there would still be the work-arounds to keep empty loops.)

IMHO, removing initially non-empty loops that were transformed into
empty ones by optimizations is a different thing, which should be done.

docs> Moreover, with -funroll-loops small "empty" loops are already
docs> removed, so the current behavior is both sub-optimal and
docs> inconsistent and will change in the future.

Ok, so it's sub-optimal and it is known.  I have only a very vague
understanding of GCC's internals, but removing empty loops can't be very
complicated, can it?  Under this assumption (correct me if I'm wrong),
there must be other reasons to keep them.  So what are they?

(btw, I tried gcc-3.4-20041119, with the same results and
gcc-4.0-20041121 with IMHO worse results. )

Ps: As a side question: Why does gcc not partially unroll the loop to
take several steps per iteration, as g++ did?  (Note: gcc-4.0-20041121
didn't do it either.)  There are probably options to enable it, but
shouldn't this be enabled with -O2?

jlh

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Empty loops
  2004-11-24 14:19 Empty loops jlh
@ 2004-11-24 14:47 ` Nathan Sidwell
  0 siblings, 0 replies; 2+ messages in thread
From: Nathan Sidwell @ 2004-11-24 14:47 UTC (permalink / raw)
  To: jlh; +Cc: gcc-help

jlh wrote:

> docs> ...so deleting them will not make real programs run any faster.
> docs> However, the rationale here is that optimization of a nonempty
> docs> loop cannot produce an empty one, which holds for C but is not
> docs> always the case for C++.
> 
> This is the part I don't get.  Why can the optimization of a C loop
> never yield an empty loop?  Take this trivial example:

> Of course no-one would write this, but it's an example where an
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
you've answered your own question.  The writer of the docs meant to
say 'holds for C written by someone who is careful and compiled
by __DATE__ era compilers'.  Unfortunately

1) programmers' concentration can slip

2) optimizers have become more powerful



> For C++, the docs say that things are different.  Here's another
> example, that is somewhat more realistic.  First we define a function
What's being hinted at is that the definition
	A ary[50];
needs to emit code to construct the elements, that will be a loop.  If
the ctor turns out to be a nop, the loop should evaporate.

The documentation needs revising, methinks

nathan

-- 
Nathan Sidwell    ::   http://www.codesourcery.com   ::     CodeSourcery LLC
nathan@codesourcery.com    ::     http://www.planetfall.pwp.blueyonder.co.uk

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2004-11-24 14:47 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-24 14:19 Empty loops jlh
2004-11-24 14:47 ` Nathan Sidwell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).