* empty loop elimination (lack of) ?
@ 2002-01-28 14:59 Cyrille Chepelov
2002-01-28 15:37 ` Andrew Pinski
0 siblings, 1 reply; 3+ messages in thread
From: Cyrille Chepelov @ 2002-01-28 14:59 UTC (permalink / raw)
To: gcc
[-- Attachment #1: Type: text/plain, Size: 2159 bytes --]
Hi,
while testing some code on Borland C++, and as a matter of cross-checking,
I've found that there is an opportunty for optimisation that g++ misses
(bcc32 also, but that is off-topic).
Consider the attached loop-destroy.cpp file, in particular the compilation
of List<int>::isValid(). gcc (2.95.3 and 3.0.3 on ix86 give me the following
results (with -O3):
_isValid__Ct4List1Zi:
pushl %ebp
xorl %eax,%eax
movl %esp,%ebp
.align 4
L15:
incl %eax
cmpl $3,%eax
jbe L15
movl %ebp,%esp
movb $1,%al
popl %ebp
ret
... including an empty, useless stunt with %eax, which is finally replaced
with the contents of the last movb. (There is also some games with %esp and
%ebp, which don't look totally useful, but 3.0.3 looks better than 2.95.3
with that respect).
(in case you wonder, bcc32 makes even more stupid things).
Compiling with -O3 -funroll-loops gives this:
_isValid__Ct4List1Zi:
pushl %ebp
movb $1,%al
movl %esp,%ebp
movl %ebp,%esp
popl %ebp
ret
.. somewhat better, still some stuff which probably ought to be optimised.
As a point of comparison, VC++ 6.0 is completely inlining this stuff (with
fairly default optimisations) into a single (true) constant. Just as what
g++ does when the original function is simply a "return true;"
2.95.3 and 3.0.3 give the same results, give or take a movl %ebp,%esp.
1) How hard would it be to notice that the loop is empty and produces no
useful results (without unrolling it), and then to remove it ?
2) How hard would it be to re-do an inlining pass once the loop
unrolling/empty loop removal pass has been done ?
3) How hard would it be to omit stack frames whenever there is no
dereference in a tail function ? Wouldn't that be a sensible default ?
I'd be more than happy to give that a try, if someone gently shows me where
to fiddle with (no warranties, though).
-- Cyrille
For comparison's sake, there is a tarball at
http://www.chepelov.org/cyrille/loop-destroy.tar.gz, which contains this
.cpp file, and assembly outputs of a few compilers.
--
Grumpf.
[-- Attachment #2: loop-destroy.cpp --]
[-- Type: text/x-c++src, Size: 580 bytes --]
#include <cstdio>
template<typename T>
class Validator {
public:
static bool isValid(const T* t) { return true; }
};
template<typename T> class List {
private:
T a_values[4];
Validator<T> a_validator;
public:
List();
bool isValid() const;
};
template<typename T> bool List<T>::isValid() const {
for (unsigned int i = 0; i < 4; ++i) {
if (!a_validator.isValid(&(a_values[i]))) return false;
}
return true;
}
int main (int argc, char **argv) {
List<int> foo;
std::printf("Hello: %d\n",foo.isValid());
return 0;
}
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: empty loop elimination (lack of) ?
2002-01-28 14:59 empty loop elimination (lack of) ? Cyrille Chepelov
@ 2002-01-28 15:37 ` Andrew Pinski
2002-02-10 12:26 ` Cyrille Chepelov
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Pinski @ 2002-01-28 15:37 UTC (permalink / raw)
To: Cyrille Chepelov; +Cc: gcc
I compiled you test with the following options:
-O3 -W -Wall -fssa-dce -fssa-ccp -fregmove -fcprop-registers
-ffunction-cse -fbranch-count-reg -fsched-interblock -frename-registers
-freorder-blocks -fschedule-insns2 -march=pentiumpro
-foptimize-register-move -fpeephole2 -frerun-cse-after-loop -fident
-fmerge-all-constants -fmerge-constants -fmessage-length=0 -fmem-report
-ftime-report -fthread-jumps -freduce-all-givs
-fguess-branch-probability -frerun-loop-opt -finline
-fexpensive-optimizations -foptimize-sibling-calls -fomit-frame-pointer
-save-temps -fmove-all-movables -fprefetch-loop-arrays
-funroll-all-loops -mmmx -msse --param max-gcse-passes=10 --param
max-gcse-memory=100 --param max-inline-insns=10000
-Wdisabled-optimization
I did not see the results you got.
I even don't get a List<int>::isValid() const
I get the following asm:
main:
.LFB1:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
subl $56, %esp
.LCFI2:
andl $-16, %esp
leal -40(%ebp), %edx
movl %edx, (%esp)
call List<int>::List()
movl $.LC0, (%esp)
movl $1, 4(%esp)
call printf
movl %ebp, %esp
xorl %eax, %eax
popl %ebp
ret
But if I compiled with just -O3 I get that results.
So one of the options I used just fixed that problem, I don't know which
one.
Thanks,
Andrew Pinski
On Monday, January 28, 2002, at 04:03 , Cyrille Chepelov wrote:
> Hi,
>
> while testing some code on Borland C++, and as a matter of
> cross-checking,
> I've found that there is an opportunty for optimisation that g++ misses
> (bcc32 also, but that is off-topic).
>
> Consider the attached loop-destroy.cpp file, in particular the
> compilation
> of List<int>::isValid(). gcc (2.95.3 and 3.0.3 on ix86 give me the
> following
> results (with -O3):
>
> _isValid__Ct4List1Zi:
> pushl %ebp
> xorl %eax,%eax
> movl %esp,%ebp
> .align 4
> L15:
> incl %eax
> cmpl $3,%eax
> jbe L15
> movl %ebp,%esp
> movb $1,%al
> popl %ebp
> ret
>
> ... including an empty, useless stunt with %eax, which is finally
> replaced
> with the contents of the last movb. (There is also some games with %esp
> and
> %ebp, which don't look totally useful, but 3.0.3 looks better than
> 2.95.3
> with that respect).
>
> (in case you wonder, bcc32 makes even more stupid things).
>
> Compiling with -O3 -funroll-loops gives this:
> _isValid__Ct4List1Zi:
> pushl %ebp
> movb $1,%al
> movl %esp,%ebp
> movl %ebp,%esp
> popl %ebp
> ret
>
> .. somewhat better, still some stuff which probably ought to be
> optimised.
>
> As a point of comparison, VC++ 6.0 is completely inlining this stuff
> (with
> fairly default optimisations) into a single (true) constant. Just as
> what
> g++ does when the original function is simply a "return true;"
>
> 2.95.3 and 3.0.3 give the same results, give or take a movl %ebp,%esp.
>
> 1) How hard would it be to notice that the loop is empty and produces no
> useful results (without unrolling it), and then to remove it ?
>
> 2) How hard would it be to re-do an inlining pass once the loop
> unrolling/empty loop removal pass has been done ?
>
> 3) How hard would it be to omit stack frames whenever there is no
> dereference in a tail function ? Wouldn't that be a sensible default ?
>
> I'd be more than happy to give that a try, if someone gently shows me
> where
> to fiddle with (no warranties, though).
>
> -- Cyrille
>
> For comparison's sake, there is a tarball at
> http://www.chepelov.org/cyrille/loop-destroy.tar.gz, which contains this
> .cpp file, and assembly outputs of a few compilers.
> --
> Grumpf.
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: empty loop elimination (lack of) ?
2002-01-28 15:37 ` Andrew Pinski
@ 2002-02-10 12:26 ` Cyrille Chepelov
0 siblings, 0 replies; 3+ messages in thread
From: Cyrille Chepelov @ 2002-02-10 12:26 UTC (permalink / raw)
To: gcc
> I did not see the results you got.
> I even don't get a List<int>::isValid() const
OK, it took me some time to investigate, and even more to report something
meaningful out of the brute force results I've produced.
At first, I noticed that you had -funroll-loops in your command line; I had
mentioned that -funroll-loops gives the result I'm looking for (elimination
of the loop and inlinage of the resulting function), but I don't always want
to unroll loops, merely eliminating empty loops would be enough for me.
So, I tried a brute force solution: I've tried all combinations of 1, 2, 3,
4 and 5 arguments picked in the whole list of arguments in your command
line, in addition to -O3 -Wall. The results are at
http://www.chepelov.org/cyrille/loop-destroy/ (warning, lots of .s files)
matches.log contains all the "good" command lines
It looks like the following arguments are winners (they appear in every line
in the previously mentioned matches.log file):
--param max-inline-insns=10000 -save-temps
Would that be sane defaults for, say, -O3, or do these arguments carry
unwanted behaviour ?
-- Cyrille
--
Grumpf.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2002-02-10 20:22 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-01-28 14:59 empty loop elimination (lack of) ? Cyrille Chepelov
2002-01-28 15:37 ` Andrew Pinski
2002-02-10 12:26 ` Cyrille Chepelov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).