* Strange Performance Hit on 2D-Loop
From: Andreas Schäfer @ 2009-07-09 14:19 UTC
To: gcc
Hey guys,
I noticed a strange performance hit in one of our stencil codes,
causing it to run twice as long.
To nail down the error, I reduced our code to the two attached demo
programs. Basically they take two matrices and average each matrix
element with its four direct neighbors. Depending on how these
matrices are allocated, the performance hit occurs -- or does not.
Here is the diff of the two files:
@@ -17,8 +17,7 @@

 void test(double (*grid)[GRID_WIDTH])
 {
-    double (*gridOld)[GRID_WIDTH] =
-        malloc(GRID_WIDTH * GRID_HEIGHT * sizeof(double));
+    double (*gridOld)[GRID_WIDTH] = gridOldArray;
     double (*gridNew)[GRID_WIDTH] = gridNewArray;
     printAddress(&gridNew[0][0]);
     printAddress(&gridOld[0][0]);
where gridOldArray is a statically allocated array (the attached
slowdown.slow.c is the variant with both grids static). Depending on
the machine's processor, the performance hit varies from negligible to
dramatic:
Processor           GCC Version  Time(slow)  Time(fast)  Performance Hit
------------------  -----------  ----------  ----------  ---------------
Core 2 Quad Q9550   4.3.3        12.19s      5.11s       138%
Athlon 64 X2 3800+  4.3.3        7.34s       6.61s       11%
Opteron 2378        4.3.2        6.13s       5.60s       9%
Opteron 2352        4.3.3        8.16s       7.96s       2%
Xeon 3.00GHz        4.3.3        18.98s      14.67s      29%
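The message does not define the Performance Hit column. The listed figures are consistent with the relative slowdown below, truncated (not rounded) to whole percent; this reading is an inference from the numbers, checked here against the Core 2 Quad row:

```latex
\text{Performance Hit} = \frac{t_{\text{slow}}}{t_{\text{fast}}} - 1,
\qquad
\frac{12.19\,\text{s}}{5.11\,\text{s}} - 1 \approx 1.3855
\quad (\text{listed as } 138\,\%)
```

The Opteron 2352 row fits the same reading: 8.16 / 7.96 - 1 is about 0.0251, listed as 2%.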
Apparently Intel systems are more susceptible to this effect.
Can anyone reproduce these results?
And could anyone explain why this happens?
Thanks in advance
-Andreas
--
============================================
Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
0049/3641-9-46376
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net
============================================
(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!
[-- Attachment #1.2: slowdown.slow.c --]
[-- Type: text/x-csrc, Size: 1880 bytes --]
/* Build with -std=c99 (or -std=gnu99): GCC 4.3 defaults to gnu89,
   which rejects the loop-scoped declarations used below. */
#define GRID_WIDTH 1024
#define GRID_HEIGHT 1024
#define MAX_STEPS 1024

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

double grid[GRID_HEIGHT][GRID_WIDTH];
double gridNewArray[GRID_HEIGHT][GRID_WIDTH];
double gridOldArray[GRID_HEIGHT][GRID_WIDTH];

void printAddress(void *p)
{
    printf("address %p\n", p);
}

void test(double (*grid)[GRID_WIDTH])
{
    double (*gridOld)[GRID_WIDTH] = gridOldArray;
    double (*gridNew)[GRID_WIDTH] = gridNewArray;
    printAddress(&gridNew[0][0]);
    printAddress(&gridOld[0][0]);

    // copy initial state
    for (int y = 0; y < GRID_HEIGHT; ++y) {
        memcpy(&gridOld[y][0], &grid[y][0], GRID_WIDTH * sizeof(double));
        memset(&gridNew[y][0], 0, GRID_WIDTH * sizeof(double));
    }

    // update matrices: each interior cell becomes the mean of itself
    // and its four direct neighbours from the previous step
    for (int step = 0; step < MAX_STEPS; ++step) {
        for (int y = 1; y < GRID_HEIGHT-1; ++y)
            for (int x = 1; x < GRID_WIDTH-1; ++x)
                gridNew[y][x] =
                    (gridOld[y-1][x  ] +
                     gridOld[y  ][x-1] +
                     gridOld[y  ][x  ] +
                     gridOld[y  ][x+1] +
                     gridOld[y+1][x  ]) * 0.2;
        // swap the roles of the two grids for the next step
        double (*tmp)[GRID_WIDTH] = gridOld;
        gridOld = gridNew;
        gridNew = tmp;
    }

    // copy result back
    for (int y = 0; y < GRID_HEIGHT; ++y)
        memcpy(&grid[y][0], &gridOld[y][0], GRID_WIDTH * sizeof(double));
}

void setupGrid(void)
{
    for (int y = 0; y < GRID_HEIGHT; ++y)
        for (int x = 0; x < GRID_WIDTH; ++x)
            grid[y][x] = 0;

    // a 10x10 hot spot as initial state
    for (int y = 10; y < 20; ++y)
        for (int x = 10; x < 20; ++x)
            grid[y][x] = 1;
}

int main(int argc, char** argv)
{
    setupGrid();
    test(grid);
    printf("res: %f\n", grid[10][10]); // prevent dead code elimination
    return 0;
}
[-- Attachment #1.3: slowdown.fast --]
[-- Type: application/octet-stream, Size: 8392 bytes --]
[-- Attachment #1.4: test.sh --]
[-- Type: application/x-sh, Size: 233 bytes --]
* Re: Strange Performance Hit on 2D-Loop
From: Richard Guenther @ 2009-07-09 14:37 UTC
To: Andreas Schäfer; +Cc: gcc
On Thu, Jul 9, 2009 at 4:19 PM, Andreas Schäfer <gentryx@gmx.de> wrote:
> Can anyone reproduce these results?
> And could anyone explain, why this happens?
It depends on the GCC version used. First of all,

    printAddress(&gridNew[0][0]);
    printAddress(&gridOld[0][0]);

makes the addresses escape, and GCC versions other than the
current development trunk assume that the malloced address can
alias the global variables.
Richard.
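In C99 the programmer can also state non-aliasing explicitly with the restrict qualifier, which takes the question Richard raises out of the compiler's hands. The sketch below is not from the thread: stencil_step and its variable-length-array signature are hypothetical, and whether GCC 4.3 actually generates faster code for it is not shown here (Andreas later traced the slowdown to alignment and cache effects instead). It needs -std=c99 or -std=gnu99 on GCC 4.3.

```c
#include <assert.h>

/* Hypothetical variant of the update loop in test(). The C99 `restrict`
 * qualifiers promise that dst and src never refer to overlapping memory,
 * so the compiler need not assume a store to dst[y][x] can change src.
 * Passing overlapping grids is undefined behaviour. */
void stencil_step(int height, int width,
                  double (*restrict dst)[width],
                  double (*restrict src)[width])
{
    /* catches only the most obvious misuse (the same grid passed twice);
       restrict actually forbids any overlap at all */
    assert((void *)dst != (void *)src);

    for (int y = 1; y < height - 1; ++y)
        for (int x = 1; x < width - 1; ++x)
            dst[y][x] = (src[y-1][x  ] +
                         src[y  ][x-1] +
                         src[y  ][x  ] +
                         src[y  ][x+1] +
                         src[y+1][x  ]) * 0.2;
}
```

Inside test(), the inner two loops would then become stencil_step(GRID_HEIGHT, GRID_WIDTH, gridNew, gridOld); followed by the same pointer swap.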
* Re: Strange Performance Hit on 2D-Loop
From: Andreas Schäfer @ 2009-07-09 14:48 UTC
To: Richard Guenther; +Cc: gcc
On 16:37 Thu 09 Jul, Richard Guenther wrote:
> Depends on the GCC version used. First of all
>
> printAddress(&gridNew[0][0]);
> printAddress(&gridOld[0][0]);
>
> makes the addresses escape and GCC versions other than the
> current development trunk think that the malloced address can
> alias the global variables.
AFAICS that doesn't really matter: I still get the same results, even
if I remove printAddress().
-Andreas
* Re: Strange Performance Hit on 2D-Loop
From: Andreas Schäfer @ 2011-01-13 16:28 UTC
To: gcc
Just for the record: I finally found the issue here. It's a problem
of both alignment and cache thrashing. When using aligned memory
(e.g. via posix_memalign()) and a suitable offset within that
memory, the effect goes away. So it's a processor effect, not a
compiler issue. :-)
Best
-Andreas
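Andreas does not say which offset he used. The sketch below is one way to get "aligned memory plus an offset" with posix_memalign(); the helper names alloc_with_offset and cache_set, the 4096-byte alignment and the 512-byte offset in the usage are illustrative assumptions, not his code. The idea it rests on is also an assumption, not spelled out in the thread: each grid row is 1024 x 8 = 8192 bytes and each grid is 8 MiB, both multiples of 4 KiB, so when the two grids start at the same address modulo 4 KiB, gridOld[y-1][x], gridOld[y][x], gridOld[y+1][x] and gridNew[y][x] all map to the same set of an L1 cache with 4 KiB per way (for example 32 KiB, 8-way, 64-byte lines, i.e. 64 sets). Shifting one grid by a few cache lines spreads those accesses over different sets.

```c
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign() under -std=c99 */

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns a pointer `offset` bytes past an
 * `alignment`-aligned block large enough for `bytes` bytes of payload.
 * The pointer to pass to free() later is stored in *raw.
 * Returns NULL if posix_memalign() fails. */
void *alloc_with_offset(size_t bytes, size_t alignment, size_t offset,
                        void **raw)
{
    void *base = NULL;

    /* posix_memalign() needs a power of two (and a multiple of
       sizeof(void *)) as alignment */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (posix_memalign(&base, alignment, bytes + offset) != 0)
        return NULL;
    *raw = base;
    return (char *)base + offset;
}

/* Set index of address p in a cache with 64-byte lines and `sets` sets
 * (assumed geometry, e.g. 32 KiB / 8 ways / 64 B = 64 sets). */
size_t cache_set(const void *p, size_t sets)
{
    return (size_t)(((uintptr_t)p / 64) % sets);
}
```

In test(), gridOld could then be set up as double (*gridOld)[GRID_WIDTH] = alloc_with_offset(GRID_WIDTH * GRID_HEIGHT * sizeof(double), 4096, 512, &rawOld); with free(rawOld) at the end. Keeping the offset a multiple of 64 keeps rows cache-line aligned; which offset works best on a given processor would have to be measured.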
--
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net
==========================================================
(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!