* Utilizing GCC Prefetch Analysis -- Instructions not being generated
@ 2014-08-21  4:33 Malek Musleh
From: Malek Musleh @ 2014-08-21  4:33 UTC (permalink / raw)
  To: gcc-help

Hi,

I am trying to determine the performance impact of gcc's internal
software prefetching analysis. I have compiled my benchmarks with the
following flags:

CFLAGS=-O3 -ffast-math -funroll-loops -fprefetch-loop-arrays

However, after compiling and examining the objdump output of the
binary, I do not see any inserted prefetch instructions. Specifically,
I am using an Alpha cross compiler (gcc version 4.2, so I know it has
prefetching support), and the prefetch instructions that should be
generated are lds, ldl, or ldq:

http://www.eecg.toronto.edu/~moshovos/ACA05/read/Performance%20tips%20for%20Alpha%20Linux%20C%20programmers.htm


My example program code snippet is:

int main (int argc, char *argv[])
{

  for (i = 0; i < 10000; i++){
    for (j = 0; j < 10000; j++){
      a[i][j] = b[j][0] + b[j+1][0];
    }
  }
}

The loops are large and regular enough that the analysis pass should
determine that prefetching is possible. Would anyone know why the
instructions are not being generated, or whether objdump is failing to
show those prefetch instructions?
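
For anyone trying to reproduce this, a self-contained version of the
loop might look like the sketch below. The array declarations, the
reduced size N, the checksum, and the function name fill_from_column
are assumptions (the snippet above omits the declarations); b gets
N + 1 rows so that b[j + 1][0] stays in bounds. Compiling it to an
object file with the same flags plus -c and disassembling that file is
one way to check whether the prefetch pass emitted anything.

```c
#include <stddef.h>

/* Assumed size: far smaller than the 10000 used in the post, so the
   arrays fit comfortably in static storage. */
#define N 256

static int a[N][N];
static int b[N + 1][N];   /* one extra row so b[j + 1][0] is in bounds */

/* Same access pattern as the loop above: a is written row by row
   (unit stride) while b is read down column 0 (stride of one row).
   Returns the sum of all elements written to a. */
long fill_from_column(void)
{
    long checksum = 0;
    size_t i, j;

    for (j = 0; j <= N; j++)
        b[j][0] = (int)j;

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            a[i][j] = b[j][0] + b[j + 1][0];
            checksum += a[i][j];
        }
    }
    return checksum;
}
```

The checksum is returned only so that the stores to a cannot be
discarded as dead code; it is not part of the original benchmark.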

As a separate note, I did try to use the gcc prefetch intrinsics, and
examined the objdump:

        __builtin_prefetch (&a[i+j], 1, 1);
   12000060c:   20 00 4f a0     .long 0xa04f0020
   120000610:   1c 00 2f a0     .long 0xa02f001c
   120000614:   01 00 41 40     .long 0x40410001
   120000618:   01 00 e1 43     .long 0x43e10001
   12000061c:   42 16 20 40     .long 0x40201642
   120000620:   30 00 2f 20     lda     t0,48(fp)
   120000624:   01 04 22 40     .long 0x40220401
   120000628:   00 00 e1 8b     .long 0x8be10000
        __builtin_prefetch (&b[i+j], 0, 1);
   12000062c:   20 00 4f a0     .long 0xa04f0020
   120000630:   1c 00 2f a0     .long 0xa02f001c
   120000634:   01 00 41 40     .long 0x40410001
   120000638:   01 00 e1 43     .long 0x43e10001
   12000063c:   42 16 20 40     .long 0x40201642
   120000640:   70 1f 2f 20     lda     t0,8048(fp)
   120000644:   01 04 22 40     .long 0x40220401
   120000648:   00 00 e1 a3     .long 0xa3e10000

In this case, it seems that the compiler is generating a different set
of instructions for the prefetch intrinsic, and not the ones the Alpha
manual describes.
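
One way to check what the .long words actually are is to decode them
by hand. The sketch below assumes the standard Alpha memory-instruction
layout (opcode in bits 31..26, Ra in bits 25..21, Rb in bits 20..16,
signed 16-bit displacement in bits 15..0) and the opcode values 0x08
(lda), 0x22 (lds), 0x28 (ldl) and 0x29 (ldq). The struct and function
names are made up for illustration.

```c
#include <stdint.h>

/* Fields of an Alpha memory-format instruction (assumed layout). */
struct alpha_mem_insn {
    unsigned opcode;  /* bits 31..26 */
    unsigned ra;      /* bits 25..21, destination register */
    unsigned rb;      /* bits 20..16, base register */
    int disp;         /* bits 15..0, sign-extended */
};

struct alpha_mem_insn alpha_decode_mem(uint32_t word)
{
    struct alpha_mem_insn d;
    int disp = (int)(word & 0xffffu);

    if (disp & 0x8000)
        disp -= 0x10000;          /* sign-extend the 16-bit field */

    d.opcode = (word >> 26) & 0x3fu;
    d.ra = (word >> 21) & 0x1fu;
    d.rb = (word >> 16) & 0x1fu;
    d.disp = disp;
    return d;
}

/* Mnemonics for the few opcodes relevant here; "?" for anything else. */
const char *alpha_mem_mnemonic(unsigned opcode)
{
    switch (opcode) {
    case 0x08: return "lda";
    case 0x22: return "lds";
    case 0x28: return "ldl";
    case 0x29: return "ldq";
    default:   return "?";
    }
}

/* Nonzero if the word is an lds/ldl/ldq whose destination is register
   31, i.e. a load whose result is discarded. */
int alpha_is_prefetch_form(uint32_t word)
{
    struct alpha_mem_insn d = alpha_decode_mem(word);

    return d.ra == 31 &&
           (d.opcode == 0x22 || d.opcode == 0x28 || d.opcode == 0x29);
}
```

Under those assumptions, 0x8be10000 decodes as opcode 0x22 (lds) with
Ra = 31 and Rb = 1, and 0xa3e10000 as opcode 0x28 (ldl) with Ra = 31
and Rb = 1, both with displacement 0. As a sanity check, 0x202f0030
decodes as opcode 0x08 with displacement 48, matching the lda line that
objdump did print. So the last word after each lda may already be a
prefetch-style load that this objdump build did not disassemble.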

Thanks,

Malek

* Re: Utilizing GCC Prefetch Analysis -- Instructions not being generated
@ 2014-08-21  6:17 ` Malek Musleh
From: Malek Musleh @ 2014-08-21  6:17 UTC (permalink / raw)
  To: gcc-help

To clarify the second part:

The objdump output I am showing is the expansion of the
__builtin_prefetch intrinsics. I used objdump --source -d ./my_program.
Hence, I was expecting __builtin_prefetch to expand to ldl, lds, or ldq
rather than lda.
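
For reference, a hand-placed prefetch loop might look like the sketch
below. __builtin_prefetch(addr, rw, locality) is the GCC builtin, where
rw is 0 for a read and 1 for a write and locality ranges from 0 to 3.
The function names, the prefetch distance of 16 elements, and the
bounds guard are assumptions for illustration, not taken from the
benchmark.

```c
#include <stddef.h>

/* Assumed prefetch distance, in elements; a real value would need tuning. */
#define PREFETCH_DISTANCE 16

/* dst[i] += src[i], prefetching a fixed distance ahead of the current
   element: dst for writing (rw = 1), src for reading (rw = 0), both
   with locality 1 as in the calls shown earlier. */
void add_arrays_with_prefetch(int *dst, const int *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&dst[i + PREFETCH_DISTANCE], 1, 1);
            __builtin_prefetch(&src[i + PREFETCH_DISTANCE], 0, 1);
        }
        dst[i] += src[i];
    }
}

/* Small driver: fills two 64-element arrays, runs the loop, and
   returns the sum of the results. */
long prefetch_demo_checksum(void)
{
    int dst[64], src[64];
    long sum = 0;
    size_t i;

    for (i = 0; i < 64; i++) {
        dst[i] = 1;
        src[i] = (int)i;
    }
    add_arrays_with_prefetch(dst, src, 64);
    for (i = 0; i < 64; i++)
        sum += dst[i];
    return sum;
}
```

The guard keeps the address expressions inside the arrays; the
prefetches only hint at future accesses and do not change the result.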

Thanks,

Malek

