public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* empty loop elimination (lack of) ?
@ 2002-01-28 14:59 Cyrille Chepelov
  2002-01-28 15:37 ` Andrew Pinski
  0 siblings, 1 reply; 3+ messages in thread
From: Cyrille Chepelov @ 2002-01-28 14:59 UTC (permalink / raw)
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 2159 bytes --]

Hi,

  while testing some code on Borland C++, and as a matter of cross-checking,
I've found that there is an opportunty for optimisation that g++ misses
(bcc32 also, but that is off-topic).

Consider the attached loop-destroy.cpp file, in particular the compilation
of List<int>::isValid(). gcc (2.95.3 and 3.0.3 on ix86 give me the following
results (with -O3):

_isValid__Ct4List1Zi:
        pushl %ebp
        xorl %eax,%eax
        movl %esp,%ebp
        .align 4
L15:
        incl %eax
        cmpl $3,%eax
        jbe L15
        movl %ebp,%esp
        movb $1,%al
        popl %ebp
        ret

... including an empty, useless stunt with %eax, which is finally replaced
with the contents of the last movb. (There is also some games with %esp and
%ebp, which don't look totally useful, but 3.0.3 looks better than 2.95.3
with that respect).

(in case you wonder, bcc32 makes even more stupid things).

Compiling with -O3 -funroll-loops gives this:
_isValid__Ct4List1Zi:
        pushl %ebp
        movb $1,%al
        movl %esp,%ebp
        movl %ebp,%esp
        popl %ebp
        ret

.. somewhat better, still some stuff which probably ought to be optimised.

As a point of comparison, VC++ 6.0 is completely inlining this stuff (with
fairly default optimisations) into a single (true) constant. Just as what 
g++ does when the original function is simply a "return true;"

2.95.3 and 3.0.3 give the same results, give or take a movl %ebp,%esp.

1) How hard would it be to notice that the loop is empty and produces no 
useful results (without unrolling it), and then to remove it ?

2) How hard would it be to re-do an inlining pass once the loop
unrolling/empty loop removal pass has been done ?

3) How hard would it be to omit stack frames whenever there is no
dereference in a tail function ? Wouldn't that be a sensible default ?

I'd be more than happy to give that a try, if someone gently shows me where
to fiddle with (no warranties, though).

	-- Cyrille

For comparison's sake, there is a tarball at
http://www.chepelov.org/cyrille/loop-destroy.tar.gz, which contains this
.cpp file, and assembly outputs of a few compilers.
-- 
Grumpf.


[-- Attachment #2: loop-destroy.cpp --]
[-- Type: text/x-c++src, Size: 580 bytes --]

#include <cstdio>

template<typename T>
class Validator {
public:
    static bool isValid(const T* t) { return true; }
};

template<typename T> class List {
private:
    T a_values[4];

    Validator<T> a_validator;
public:
    List();

    bool isValid() const;
};

template<typename T> bool List<T>::isValid() const {
    for (unsigned int i = 0; i < 4; ++i) {
        if (!a_validator.isValid(&(a_values[i]))) return false;
    }
    return true;
}
    

int main (int argc, char **argv) {

    List<int> foo;
    std::printf("Hello: %d\n",foo.isValid());
    
    return 0;
}

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: empty loop elimination (lack of) ?
  2002-01-28 14:59 empty loop elimination (lack of) ? Cyrille Chepelov
@ 2002-01-28 15:37 ` Andrew Pinski
  2002-02-10 12:26   ` Cyrille Chepelov
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Pinski @ 2002-01-28 15:37 UTC (permalink / raw)
  To: Cyrille Chepelov; +Cc: gcc

I compiled you test with the following options:
-O3 -W -Wall -fssa-dce -fssa-ccp -fregmove -fcprop-registers 
-ffunction-cse -fbranch-count-reg -fsched-interblock -frename-registers 
-freorder-blocks -fschedule-insns2 -march=pentiumpro 
-foptimize-register-move -fpeephole2 -frerun-cse-after-loop -fident 
-fmerge-all-constants -fmerge-constants -fmessage-length=0 -fmem-report 
-ftime-report -fthread-jumps -freduce-all-givs 
-fguess-branch-probability -frerun-loop-opt -finline 
-fexpensive-optimizations -foptimize-sibling-calls -fomit-frame-pointer 
-save-temps -fmove-all-movables -fprefetch-loop-arrays 
-funroll-all-loops -mmmx -msse --param max-gcse-passes=10 --param 
max-gcse-memory=100 --param max-inline-insns=10000 
-Wdisabled-optimization

I did not see the results you got.
I even don't get a List<int>::isValid() const

I get the following asm:

main:
.LFB1:
         pushl   %ebp
.LCFI0:
         movl    %esp, %ebp
.LCFI1:
         subl    $56, %esp
.LCFI2:
         andl    $-16, %esp
         leal    -40(%ebp), %edx
         movl    %edx, (%esp)
         call    List<int>::List()
         movl    $.LC0, (%esp)
         movl    $1, 4(%esp)
         call    printf
         movl    %ebp, %esp
         xorl    %eax, %eax
         popl    %ebp
         ret



But if I compiled with just -O3 I get that results.
So one of the options I used just fixed that problem, I don't know which 
one.
Thanks,
Andrew Pinski

On Monday, January 28, 2002, at 04:03 , Cyrille Chepelov wrote:

> Hi,
>
>   while testing some code on Borland C++, and as a matter of 
> cross-checking,
> I've found that there is an opportunty for optimisation that g++ misses
> (bcc32 also, but that is off-topic).
>
> Consider the attached loop-destroy.cpp file, in particular the 
> compilation
> of List<int>::isValid(). gcc (2.95.3 and 3.0.3 on ix86 give me the 
> following
> results (with -O3):
>
> _isValid__Ct4List1Zi:
>         pushl %ebp
>         xorl %eax,%eax
>         movl %esp,%ebp
>         .align 4
> L15:
>         incl %eax
>         cmpl $3,%eax
>         jbe L15
>         movl %ebp,%esp
>         movb $1,%al
>         popl %ebp
>         ret
>
> ... including an empty, useless stunt with %eax, which is finally 
> replaced
> with the contents of the last movb. (There is also some games with %esp 
> and
> %ebp, which don't look totally useful, but 3.0.3 looks better than 
> 2.95.3
> with that respect).
>
> (in case you wonder, bcc32 makes even more stupid things).
>
> Compiling with -O3 -funroll-loops gives this:
> _isValid__Ct4List1Zi:
>         pushl %ebp
>         movb $1,%al
>         movl %esp,%ebp
>         movl %ebp,%esp
>         popl %ebp
>         ret
>
> .. somewhat better, still some stuff which probably ought to be 
> optimised.
>
> As a point of comparison, VC++ 6.0 is completely inlining this stuff 
> (with
> fairly default optimisations) into a single (true) constant. Just as 
> what
> g++ does when the original function is simply a "return true;"
>
> 2.95.3 and 3.0.3 give the same results, give or take a movl %ebp,%esp.
>
> 1) How hard would it be to notice that the loop is empty and produces no
> useful results (without unrolling it), and then to remove it ?
>
> 2) How hard would it be to re-do an inlining pass once the loop
> unrolling/empty loop removal pass has been done ?
>
> 3) How hard would it be to omit stack frames whenever there is no
> dereference in a tail function ? Wouldn't that be a sensible default ?
>
> I'd be more than happy to give that a try, if someone gently shows me 
> where
> to fiddle with (no warranties, though).
>
> 	-- Cyrille
>
> For comparison's sake, there is a tarball at
> http://www.chepelov.org/cyrille/loop-destroy.tar.gz, which contains this
> .cpp file, and assembly outputs of a few compilers.
> --
> Grumpf.
>
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: empty loop elimination (lack of) ?
  2002-01-28 15:37 ` Andrew Pinski
@ 2002-02-10 12:26   ` Cyrille Chepelov
  0 siblings, 0 replies; 3+ messages in thread
From: Cyrille Chepelov @ 2002-02-10 12:26 UTC (permalink / raw)
  To: gcc


> I did not see the results you got.
> I even don't get a List<int>::isValid() const

OK, it took me some time to investigate, and even more to report something
meaningful out of the brute force results I've produced.

At first, I noticed that you had -funroll-loops in your command line; I had
mentioned that -funroll-loops gives the result I'm looking for (elimination
of the loop and inlinage of the resulting function), but I don't always want
to unroll loops, merely eliminating empty loops would be enough for me.

So, I tried a brute force solution: I've tried all combinations of 1, 2, 3,
4 and 5 arguments picked in the whole list of arguments in your command
line, in addition to -O3 -Wall. The results are at
http://www.chepelov.org/cyrille/loop-destroy/ (warning, lots of .s files)
matches.log contains all the "good" command lines

It looks like the following arguments are winners (they appear in every line
in the previously mentioned matches.log file):
    --param max-inline-insns=10000 -save-temps

Would that be sane defaults for, say, -O3, or do these arguments carry
unwanted behaviour ?

	-- Cyrille

-- 
Grumpf.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2002-02-10 20:22 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-01-28 14:59 empty loop elimination (lack of) ? Cyrille Chepelov
2002-01-28 15:37 ` Andrew Pinski
2002-02-10 12:26   ` Cyrille Chepelov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).