public inbox for gcc-bugs@sourceware.org
* [Bug c++/26788] New: optimization of expression templates not as performant as g++ 4.0.2
@ 2006-03-21 21:06 roebel at ircam dot fr
2006-03-21 21:14 ` [Bug c++/26788] " pinskia at gcc dot gnu dot org
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: roebel at ircam dot fr @ 2006-03-21 21:06 UTC (permalink / raw)
To: gcc-bugs
Hi,
I just installed gcc 4.1.0 to compile my expression template
matrix arithmetic library (a la Blitz).
I recently ran benchmarks with g++ 3.4.4
and 4.0.2, and I was quite impressed that g++ 4.0.2 managed to
optimize the expressions so well that the code ran nearly
twice as fast as with g++ 3.4.4 and, even better,
matched the performance of my hand-coded pointer-only implementation.
I was rather happy with this result. It seems that the handling of
pointer arrays that are stored in a struct that represents the expression
has been significantly improved.
Now, the downside. I tried 4.1.0 and noticed that the performance dropped
to a level even worse than gcc 3.4.4. I wondered about the reason and
scanned the optimization parameters. I found salias-max-implicit-fields
with a default value of 5. I guessed that might be the reason
and increased the value to 50. With this value I got back the impressive
performance of g++ 4.0.2.
I wonder why the default value has been set so low that it apparently
cripples the optimizer to a level of optimization considerably
below what was achieved with g++ 4.0.2 (where this option does not exist).
Does this option negatively affect performance elsewhere? If not,
it seems to me that a default value that resembles the
settings in gcc 4.0.2 would be more sensible.
Kind regards,
and thanks anyway for this great compiler suite.
Axel
--
Summary: optimization of expression templates not as performant
as g++ 4.0.2
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: roebel at ircam dot fr
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: pinskia at gcc dot gnu dot org @ 2006-03-21 21:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from pinskia at gcc dot gnu dot org 2006-03-21 21:14 -------
salias-max-implicit-fields did not exist in 4.0.x.
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: pinskia at gcc dot gnu dot org @ 2006-03-21 21:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from pinskia at gcc dot gnu dot org 2006-03-21 21:27 -------
And the reason why salias-max-implicit-fields is set so low is to keep the
compile time in check since we get bug reports about that also.
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: pinskia at gcc dot gnu dot org @ 2006-03-21 21:28 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from pinskia at gcc dot gnu dot org 2006-03-21 21:28 -------
Also do you have a testcase that can be attached to the bug since the
information here is not enough to figure out what is going wrong.
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: rguenth at gcc dot gnu dot org @ 2006-03-22 9:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rguenth at gcc dot gnu dot org 2006-03-22 09:12 -------
If the salias-max-implicit-fields setting helps you then this is a PTA issue.
I never hit PTA issues with the expression templates in POOMA, so it might be
interesting to get a testcase for this. A testcase is also necessary to do
anything about it.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu dot org
Keywords| |alias, missed-optimization
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: roebel at ircam dot fr @ 2006-03-22 11:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from roebel at ircam dot fr 2006-03-22 11:13 -------
Created an attachment (id=11090)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11090&action=view)
Results file for testcase
As you requested I provide a testcase. It consists of 2 shell scripts
that run the different compilers and then run the testcase.
The testcase has two cases and two compilation modes:
switch 1
  compiled with -DHAND it gives the hand optimized pointer only version
  compiled with -DMATMTL it gives the equivalent expression templates version
switch 2
  compiled with -DBENCH=1 it calculates an addition of three vectors
  compiled with -DBENCH=2 it calculates an addition of three vectors with some
  scalar multiplications
The name of the executable indicates the experiment by its two final characters:
H1 stands for hand optimized first benchmark, M2 stands for matmtl second
benchmark ...
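For readers without the attachment, the two variants of the first benchmark can be pictured roughly as follows. This is only a minimal sketch and not the attached testcase: the names `Sum`, `Vec`, `add3_expr` and `add3_hand` are hypothetical and chosen for illustration. It shows why the template version depends on the optimizer: after inlining, the compiler has to see through the nested `Sum` structs (which hold only references to the operands) to arrive at the same loop as the pointer version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical expression node for "lhs + rhs". It only stores references
// to its operands and computes one element on demand.
template <class L, class R>
struct Sum {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Hypothetical vector type. Assigning an expression runs a single fused loop,
// so no temporary vectors are created for the intermediate sums.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    template <class L, class R>
    Vec& operator=(const Sum<L, R>& e) {
        assert(e.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    return Sum<Vec, Vec>(a, b);
}

template <class L, class R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    return Sum<Sum<L, R>, Vec>(a, b);
}

// Roughly what the -DMATMTL variant exercises: one expression, one loop.
inline void add3_expr(Vec& r, const Vec& a, const Vec& b, const Vec& c) {
    r = a + b + c;
}

// Roughly what the -DHAND variant exercises: plain pointers, no abstraction.
inline void add3_hand(double* r, const double* a, const double* b,
                      const double* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) r[i] = a[i] + b[i] + c[i];
}
```

Both functions compute the same result; the benchmark question is only whether the compiler generates equally fast code for `add3_expr` as for `add3_hand`.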
The two scripts comp.sh and master.sh run the whole experiment:
comp.sh runs the experiment for a single compiler and a user supplied set of
vector sizes. Note that each experiment always performs 100000000
vector element operations, so the vector size controls the amount of
per-expression overhead.
master.sh runs comp.sh with a single compiler and the vector size arguments 5
and 1000
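The relation between vector size and overhead can be made concrete with a little arithmetic. The sketch below assumes that the number of expression evaluations is simply the fixed operation count divided by the vector size; the helper name `repetitions_for` is hypothetical and the attached source may organize this differently.

```cpp
#include <cassert>

// Total vector element operations per experiment, as stated in the report.
const unsigned long kTotalOps = 100000000UL;

// Hypothetical helper: how many times the vector expression is evaluated
// so that repetitions * vector_size equals kTotalOps.
inline unsigned long repetitions_for(unsigned long vector_size) {
    assert(vector_size > 0 && kTotalOps % vector_size == 0);
    return kTotalOps / vector_size;
}
```

Under this assumption, size 5 gives 20000000 evaluations and size 1000 gives 100000, so any fixed per-expression cost is paid 200 times more often with the small vectors.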
Results are produced with
./master.sh 2>&1 | tee mout
egrep "#|user" mout
The first result is that gcc 4.1.0 with --param salias-max-implicit-fields=50
is a real success. As you can see, the compile time does not change (at least for
this testcase), but the performance is identical to the pointer-only
case!
The second result is that gcc 4.1.0 with the default parameter set
performs worse than gcc 4.0.2, especially for small vectors
(large overhead). The larger the vectors become, the closer
gcc 4.1.0 gets to 4.0.2.
###############################################################
# g++ 4.0.2 the reference
###################################################
#compile times
user 0m0.702s
user 0m0.697s
user 0m1.066s
user 0m1.077s
#run times : vector size 5
# benchmarkredH1
user 0m0.295s
# benchmarkredM1
user 0m0.307s
# benchmarkredH2
user 0m0.381s
# benchmarkredM2
user 0m0.412s
#run times : vector size 1000
# benchmarkredH1
user 0m0.230s
# benchmarkredM1
user 0m0.243s
# benchmarkredH2
user 0m0.287s
# benchmarkredM2
user 0m0.370s
# g++ 4.1.0 default
###################################################
#compile times
user 0m0.747s
user 0m0.752s
user 0m1.211s
user 0m1.227s
#run times : vector size 5
# benchmarkredH1
user 0m0.264s
# benchmarkredM1
user 0m0.519s
# benchmarkredH2
user 0m0.347s
# benchmarkredM2
user 0m1.211s
#run times : vector size 1000
# benchmarkredH1
user 0m0.222s
# benchmarkredM1
user 0m0.286s
# benchmarkredH2
user 0m0.298s
# benchmarkredM2
user 0m0.375s
# g++ 4.1.0 salias=50
###################################################
#compile times
user 0m0.753s
user 0m0.741s
user 0m1.225s
user 0m1.239s
#run times : vector size 5
# benchmarkredH1
user 0m0.262s
# benchmarkredM1
user 0m0.307s
# benchmarkredH2
user 0m0.344s
# benchmarkredM2
user 0m0.313s
#run times : vector size 1000
# benchmarkredH1
user 0m0.223s
# benchmarkredM1
user 0m0.234s
# benchmarkredH2
user 0m0.299s
# benchmarkredM2
user 0m0.260s
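To read the size 5 numbers more easily, the template times can be divided by the hand-coded times. The sketch below copies the user times from the listing above; the struct `Timing`, the table `kSize5` and the helper `slowdown` are names introduced here for illustration only.

```cpp
#include <cassert>

// User times in seconds copied from the results above (vector size 5).
struct Timing {
    const char* compiler;
    double h1, m1, h2, m2;  // hand (H) and expression template (M) variants
};

const Timing kSize5[] = {
    {"g++ 4.0.2",           0.295, 0.307, 0.381, 0.412},
    {"g++ 4.1.0 default",   0.264, 0.519, 0.347, 1.211},
    {"g++ 4.1.0 salias=50", 0.262, 0.307, 0.344, 0.313},
};

// A ratio above 1 means the template version is slower than the hand-coded one.
inline double slowdown(double template_time, double hand_time) {
    assert(hand_time > 0.0);
    return template_time / hand_time;
}
```

With these figures, `slowdown(kSize5[1].m2, kSize5[1].h2)` comes out at roughly 3.5 for g++ 4.1.0 with default parameters, while the same ratio stays below 1.1 for g++ 4.0.2 and below 1 for g++ 4.1.0 with the parameter raised to 50.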
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: roebel at ircam dot fr @ 2006-03-22 11:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from roebel at ircam dot fr 2006-03-22 11:14 -------
Created an attachment (id=11091)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11091&action=view)
master shell script
for comments
see 11090: Results file for testcase
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: roebel at ircam dot fr @ 2006-03-22 11:15 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from roebel at ircam dot fr 2006-03-22 11:15 -------
Created an attachment (id=11092)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11092&action=view)
single experiment shell script
for comments
see 11090: Results file for testcase
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: roebel at ircam dot fr @ 2006-03-22 11:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from roebel at ircam dot fr 2006-03-22 11:16 -------
Created an attachment (id=11093)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11093&action=view)
testcase source file
for comments
see 11090: Results file for testcase
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: rguenth at gcc dot gnu dot org @ 2006-03-22 11:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from rguenth at gcc dot gnu dot org 2006-03-22 11:39 -------
This is another case of find_used_portions missing explicit uses due to C++ and
lots of inlining without any cleanup after that. Inserting cleanup is
difficult because the structure-aliasing pass runs before going into SSA. A
forwprop pass before it would do wonders here.
Danny - any plans to look at making the salias pass work on SSA form? With inlining
on SSA, as on the IPA branch, this looks necessary anyway (Honza simply moved the
pass to before (final) inlining, which will make the situation just worse).
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dberlin at gcc dot gnu dot org
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: roebel at ircam dot fr @ 2006-03-22 11:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from roebel at ircam dot fr 2006-03-22 11:55 -------
Not that I understand what you just said, but I wanted to mention that,
in contrast to my initial email, the data I just sent
indicates a small performance penalty of about 25% for g++ 4.0.2
for large vectors on a Pentium 4 (those are the results I sent),
while there is no such
penalty for large vectors on a Pentium M. On a Pentium M, g++ 4.0.2
works as well as g++ 4.1.0 on the Pentium 4 with --param salias...=50.
Unfortunately, I don't have gcc 4.1.0 on my Pentium M machine.
thanks,
Axel
* [Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
From: pinskia at gcc dot gnu dot org @ 2006-04-30 4:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from pinskia at gcc dot gnu dot org 2006-04-30 04:11 -------
(In reply to comment #9)
But that only applies to 4.2 and not 4.1.0.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |minor
* [Bug tree-optimization/26788] optimization of expression templates not as performant as g++ 4.0.2
From: rguenth at gcc dot gnu dot org @ 2008-01-26 23:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from rguenth at gcc dot gnu dot org 2008-01-26 22:58 -------
Can you check 4.2 and 4.3?
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
* [Bug tree-optimization/26788] optimization of expression templates not as performant as g++ 4.0.2
From: roebel at ircam dot fr @ 2008-01-27 12:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from roebel at ircam dot fr 2008-01-27 12:35 -------
Hi,
I ran the tests with g++ 4.2.2 and it seems to me the issue is closed.
Compilation without the salias-max-implicit-fields flag no longer produces
any substantial increase in run time, and with and without
this parameter the hand-optimized and template versions
of the code have very similar run times.
I would be really happy with this if gcc 4.2.2 produced
correct code in all my projects. I tried it a while ago
and found a problem with std::set where the optimized version of the program
simply ended up with duplicate entries in the set
(while gcc 4.1.2 has no problems with the very same code)!!!
Besides that show stopper we had other problems with code using
sse/sse2 intrinsics producing wrong results when optimization was enabled.
All this may have changed in gcc 4.3. I will give it another try.
Thanks
--
roebel at ircam dot fr changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |RESOLVED
Resolution| |FIXED