public inbox for gcc-bugs@sourceware.org
* [Bug c++/39046] New: gcc 4.4.0 20090116 loop unrolling causes unaccountable performance degradation
From: blu dot dark at gmail dot com @ 2009-01-30 21:58 UTC (permalink / raw)
To: gcc-bugs
Version info:
$ powerpc-apple-darwin8.11.0-gcc-4.4.0 -v
Using built-in specs.
Target: powerpc-apple-darwin8.11.0
Configured with: ../gcc-4.4-20090116/configure --prefix=/opt/local
--enable-languages=c,c++,objc,obj-c++ --libdir=/opt/local/lib/gcc44
--includedir=/opt/local/include/gcc44 --infodir=/opt/local/share/info
--mandir=/opt/local/share/man --with-local-prefix=/opt/local --with-system-zlib
--disable-nls --program-suffix=-mp-4.4
--with-gxx-include-dir=/opt/local/include/gcc44/c++/ --with-gmp=/opt/local
--with-mpfr=/opt/local --disable-multilib
Thread model: posix
gcc version 4.4.0 20090116 (experimental) (GCC)
The above is a MacPorts (formerly DarwinPorts) build of GCC 4.4.0 on an
OS X 10.4.11 PPC 7450 host.
The following C++ function produces different code depending on whether the
'loop_assignment_ai' or the 'flat_assignment_ai' snippet is used:
#include <stdio.h>

inline static void
mmul(
    float (&c)[4][4],
    const float (&a)[4][4],
    const float (&b)[4][4])
{
    // iterate by product's rows
    for (unsigned i = 0; i < 4; i++)
    {
        register float ai[4][4];

        // swizzle each element of the i-th row of A into a full vector
        for (unsigned j = 0; j < 4; j++)
            // flat_assignment_ai:
/*          ai[j][0] = ai[j][1] = ai[j][2] = ai[j][3] = a[i][j];
*/
            // loop_assignment_ai:
            for (unsigned k = 0; k < 4; k++)
                ai[j][k] = a[i][j];

        // multiply the first element of the i-th row of A by the first row of B
        for (unsigned k = 0; k < 4; k++)
        {
            c[i][k] = ai[0][k] * b[0][k];
        }

        // multiply-add all subsequent elements of the i-th row of A by the
        // respective rows of B
        for (unsigned j = 1; j < 4; j++)
        {
            for (unsigned k = 0; k < 4; k++)
            {
                c[i][k] += ai[j][k] * b[j][k];
            }
        }
    }
}

// function invoked with following parameters (statics)
float a[4][4] __attribute__ ((aligned (16)));
float b[4][4] __attribute__ ((aligned (16)));
float c[4][4] __attribute__ ((aligned (16)));

int main(int argc, char * const argv[])
{
    // omitted here is assignment of sample test values to arguments a & b

    unsigned ndz; // non-deterministic zero

    printf("enter a zero: ");
    if (1 != scanf("%u", &ndz)) // user expected to punch in a zero here
        return -1;

    // non-deterministic const factor: it is meant to be zero, but the cc does
    // not know that, thus it can't declare our loop 'redundant'
    const unsigned ndf = ndz ? 1 : 0;

    unsigned r = 10000000;
    do
    {
        mmul(*(&c + ndf * r), *(&a + ndf * r), *(&b + ndf * r));
    }
    while (--r);

    return r;
}
Observed a ~10% performance degradation when using 'loop_assignment_ai' instead
of 'flat_assignment_ai'. It appears that the differences in the generated PPC
code are mainly in instruction scheduling.
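For side-by-side comparison, both snippets can be built into one translation unit. The following is only a sketch under assumptions: the names Fill, mmul_variant and variants_agree are hypothetical and not part of the report, and the sample values are arbitrary. It shows that the two assignments are meant to be semantically identical, so any difference should come from code generation alone.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical tag selecting how the temporary 'ai' is filled.
enum Fill { FLAT, LOOP };

template <Fill F>
static void mmul_variant(float (&c)[4][4],
                         const float (&a)[4][4],
                         const float (&b)[4][4])
{
    for (unsigned i = 0; i < 4; i++)
    {
        float ai[4][4];

        // splat each element of the i-th row of A across one row of 'ai'
        for (unsigned j = 0; j < 4; j++)
        {
            if (F == FLAT)      // flat_assignment_ai
                ai[j][0] = ai[j][1] = ai[j][2] = ai[j][3] = a[i][j];
            else                // loop_assignment_ai
                for (unsigned k = 0; k < 4; k++)
                    ai[j][k] = a[i][j];
        }

        for (unsigned k = 0; k < 4; k++)
            c[i][k] = ai[0][k] * b[0][k];

        for (unsigned j = 1; j < 4; j++)
            for (unsigned k = 0; k < 4; k++)
                c[i][k] += ai[j][k] * b[j][k];
    }
}

// Returns true when both variants yield bit-identical products.
// The sample inputs are arbitrary small values that are exact in float.
bool variants_agree()
{
    float a[4][4], b[4][4], c_flat[4][4], c_loop[4][4];

    for (unsigned i = 0; i < 4; i++)
        for (unsigned j = 0; j < 4; j++)
        {
            a[i][j] = float(i * 4 + j + 1);
            b[i][j] = float((i + 1) * (j + 2)) * 0.5f;
        }

    mmul_variant<FLAT>(c_flat, a, b);
    mmul_variant<LOOP>(c_loop, a, b);

    // row 0 of A is (1,2,3,4) and column 0 of B is (1,2,3,4): 1+4+9+16
    assert(c_flat[0][0] == 30.0f);

    return std::memcmp(c_flat, c_loop, sizeof c_flat) == 0;
}
```

Timing each instantiation in the same kind of loop as in main above, or diffing the assembly emitted for the two instantiations, would then isolate the effect of the assignment style.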
The following optimization-related compiler options were used for the test:
-fno-exceptions -fno-rtti -faltivec -maltivec -mtune=7450 -O3
-funroll-loops -ffast-math -fstrict-aliasing
-ftree-vectorize -ftree-vectorizer-verbose=3
-fvisibility-inlines-hidden -fno-threadsafe-statics
For the record, the intended vectorization fails, so the resulting code is
entirely scalar.
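Since the auto-vectorizer gives up here, one possible workaround (not something proposed in the report) is to spell out the vector operations with GCC's generic vector extension. The sketch below is untested on the ppc7450 target; the names v4sf, mmul_vec and mmul_vec_self_check are hypothetical, and whether GCC 4.4 maps it to good AltiVec code is an open assumption.

```cpp
#include <cassert>
#include <cstring>

// GCC/Clang generic vector extension: four packed floats (16 bytes).
typedef float v4sf __attribute__ ((vector_size (16)));

// c = a * b, with B and C held as four row vectors each.
static void mmul_vec(v4sf (&c)[4], const float (&a)[4][4], const v4sf (&b)[4])
{
    for (unsigned i = 0; i < 4; i++)
    {
        // splat a[i][0] and multiply by the first row of B
        v4sf s = { a[i][0], a[i][0], a[i][0], a[i][0] };
        v4sf acc = s * b[0];

        // multiply-add the remaining elements of row i by the respective rows
        for (unsigned j = 1; j < 4; j++)
        {
            v4sf t = { a[i][j], a[i][j], a[i][j], a[i][j] };
            acc += t * b[j];
        }
        c[i] = acc;
    }
}

// Compares mmul_vec against a plain scalar triple loop on arbitrary
// sample values that are exactly representable in float.
bool mmul_vec_self_check()
{
    assert(sizeof(v4sf) == sizeof(float[4]));

    float a[4][4], bs[4][4];
    v4sf b[4], c[4];

    for (unsigned i = 0; i < 4; i++)
        for (unsigned j = 0; j < 4; j++)
        {
            a[i][j] = float(i * 4 + j + 1);
            bs[i][j] = float((i + 1) * (j + 2)) * 0.5f;
        }

    for (unsigned j = 0; j < 4; j++)
        std::memcpy(&b[j], bs[j], sizeof b[j]);

    mmul_vec(c, a, b);

    for (unsigned i = 0; i < 4; i++)
    {
        float row[4];
        std::memcpy(row, &c[i], sizeof row);

        for (unsigned k = 0; k < 4; k++)
        {
            float ref = 0.0f;
            for (unsigned j = 0; j < 4; j++)
                ref += a[i][j] * bs[j][k];
            if (row[k] != ref)
                return false;
        }
    }
    return true;
}
```

The splat is written out element by element so the sketch does not rely on scalar-to-vector promotion, which older compilers may not accept.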
-martin
--
Summary: gcc 4.4.0 20090116 loop unrolling causes unaccountable
performance degradation
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: blu dot dark at gmail dot com
GCC build triplet: powerpc-apple-darwin8.11.0
GCC host triplet: powerpc-apple-darwin8.11.0
GCC target triplet: powerpc-apple-darwin8.11.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39046
* [Bug middle-end/39046] gcc 4.4.0 20090116 loop unrolling causes unaccountable performance degradation
From: blu dot dark at gmail dot com @ 2009-01-31 3:24 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from blu dot dark at gmail dot com 2009-01-31 03:23 -------
The result is not reproducible with the same compiler version and the same
compile options on an OS X 10.5.6 Core 2 Duo host. Both the 'flat_assignment_ai'
and 'loop_assignment_ai' versions generate identical code.
$ i386-apple-darwin9.6.0-gcc-4.4.0 -v
Using built-in specs.
Target: i386-apple-darwin9.6.0
Configured with: ../gcc-4.4-20090116/configure --prefix=/opt/local
--enable-languages=c,c++,objc,obj-c++ --libdir=/opt/local/lib/gcc44
--includedir=/opt/local/include/gcc44 --infodir=/opt/local/share/info
--mandir=/opt/local/share/man --with-local-prefix=/opt/local --with-system-zlib
--disable-nls --program-suffix=-mp-4.4
--with-gxx-include-dir=/opt/local/include/gcc44/c++/ --with-gmp=/opt/local
--with-mpfr=/opt/local
Thread model: posix
gcc version 4.4.0 20090116 (experimental) (GCC)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39046