* Utilizing GCC Prefetch Analysis -- Instructions not being generated
@ 2014-08-21 4:33 Malek Musleh
From: Malek Musleh @ 2014-08-21 4:33 UTC
To: gcc-help
Hi,
I am trying to determine the performance impact of gcc's internal
software prefetching analysis. I have compiled my benchmarks with the
following flags:
CFLAGS=-O3 -ffast-math -funroll-loops -fprefetch-loop-arrays
However, after compiling and examining the objdump output for the
binary, I do not see any inserted prefetch instructions. Specifically,
I am using an Alpha cross compiler (gcc version 4.2, so I know it has
prefetching support), and the prefetch instructions that should be
generated are lds, ldl, or ldq, as described here:
http://www.eecg.toronto.edu/~moshovos/ACA05/read/Performance%20tips%20for%20Alpha%20Linux%20C%20programmers.htm
My example program code snippet is:
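/* Declarations were not shown in the original snippet; types and sizes are assumed. */
static int a[10000][10000];
static int b[10001][1];   /* b[j+1][0] is read, so one extra row is needed */
int main (int argc, char *argv[])
{
  int i, j;
  for (i = 0; i < 10000; i++) {
    for (j = 0; j < 10000; j++) {
      a[i][j] = b[j][0] + b[j+1][0];
    }
  }
  return 0;
}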
The loops are large and regular enough that the analysis pass should
determine that prefetching is possible. Would anyone know why the
instructions are not being generated, or whether objdump is simply not
showing those prefetch instructions?
As a separate note, I did try to use the gcc prefetch intrinsics, and
examined the objdump:
__builtin_prefetch (&a[i+j], 1, 1);
12000060c: 20 00 4f a0 .long 0xa04f0020
120000610: 1c 00 2f a0 .long 0xa02f001c
120000614: 01 00 41 40 .long 0x40410001
120000618: 01 00 e1 43 .long 0x43e10001
12000061c: 42 16 20 40 .long 0x40201642
120000620: 30 00 2f 20 lda t0,48(fp)
120000624: 01 04 22 40 .long 0x40220401
120000628: 00 00 e1 8b .long 0x8be10000
__builtin_prefetch (&b[i+j], 0, 1);
12000062c: 20 00 4f a0 .long 0xa04f0020
120000630: 1c 00 2f a0 .long 0xa02f001c
120000634: 01 00 41 40 .long 0x40410001
120000638: 01 00 e1 43 .long 0x43e10001
12000063c: 42 16 20 40 .long 0x40201642
120000640: 70 1f 2f 20 lda t0,8048(fp)
120000644: 01 04 22 40 .long 0x40220401
120000648: 00 00 e1 a3 .long 0xa3e10000
In this case, it seems that the compiler is generating a different set
of instructions for the prefetch intrinsic, and not using what the
Alpha manual says.
Thanks,
Malek
* Re: Utilizing GCC Prefetch Analysis -- Instructions not being generated
From: Malek Musleh @ 2014-08-21 6:17 UTC
To: gcc-help
To clarify the second part: the objdump output I am showing is the
expansion of the __builtin_prefetch intrinsics. I used
objdump --source -d ./my_program. Hence, I was expecting
__builtin_prefetch to use ldl, lds, or ldq rather than lda.
Thanks,
Malek