public inbox for gcc@gcc.gnu.org
* prefetch revisited
@ 2001-10-30  8:54 Janis Johnson
  2001-10-30  9:14 ` Jan Hubicka
  2001-10-30  9:20 ` Joseph S. Myers
  0 siblings, 2 replies; 11+ messages in thread
From: Janis Johnson @ 2001-10-30  8:54 UTC
  To: gcc

In April 2000, Jan Hubicka proposed adding prefetch support to GCC
( http://gcc.gnu.org/ml/gcc/2000-04/msg00194.html ).  This was met with
much excitement and discussion, and Jan sent a few versions of a
prefetch patch to gcc-patches.  The discussion died down and Jan
apparently dropped work on the patch, although he added prefetch support
for SSE and 3DNow! to config/i386.md.

I'd like to revisit prefetch support in GCC and start by defining an
infrastructure that can allow various optimizations to eventually take
advantage of the prefetch capabilities of multiple architectures.  I
hope to use it for greedy prefetching of memory referenced by pointers,
as described in the paper "Compiler-Based Prefetching for Recursive
Data Structures" by Chi-Keung Luk and Todd C. Mowry, available via
http://www.cs.cmu.edu/~tcm/Papers.html.  Jan's patch used prefetch in loop
optimizations; that area is apparently undergoing a lot of changes, so
perhaps the people working on that would like to revisit Jan's prefetch
work for loops.  In the meantime I'll be using his old loop optimizer
changes to generate prefetches to let me test the underlying prefetch
support, with machine-specific support for IA-64 and Pentium III.
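
To make "greedy prefetching" concrete, here is a rough hand-written
sketch of the kind of code the optimization would aim to produce
automatically: on reaching a node, prefetch everything it points to
before doing the node's own work.  It uses `__builtin_prefetch`, a
builtin available in later GCC releases and not part of this proposal;
the `struct node` type and `tree_sum` function are made up for the
example.

```c
#include <stddef.h>

/* Hypothetical binary tree node, used only for illustration.  */
struct node {
    struct node *left;
    struct node *right;
    long value;
};

/* Sum all values in the tree.  On visiting a node, greedily prefetch
   every child it points to, in the hope that the children are in cache
   by the time the recursion reaches them.  */
long tree_sum(const struct node *n)
{
    if (n == NULL)
        return 0;

    /* Read prefetch (second argument 0) with high temporal locality
       (third argument 3).  A prefetch of an invalid address does not
       fault, so a NULL child pointer is harmless here.  */
    __builtin_prefetch(n->left, 0, 3);
    __builtin_prefetch(n->right, 0, 3);

    return n->value + tree_sum(n->left) + tree_sum(n->right);
}
```

Whether this helps depends on how much work is done per node and on the
memory latency of the machine; the sketch only shows where the
prefetches would go.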

A new prefetch instruction pattern can take an address operand and a
list of options or flags indicating which kinds of prefetch support to
use, depending on what the machine supports.  The rtl code for prefetch
can be recognized throughout the compiler and handled appropriately.  A
machine description will map the options and flags to the appropriate
instruction for that machine, ignoring the ones that aren't relevant for
its prefetch support.  Each architecture will also define a set of
parameters for prefetching, including the cache line size and the number
of prefetches that can be done in parallel (as in Jan's patches).
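
As a minimal sketch of the idea, assuming nothing about how it would
actually be written in GCC: an optimization requests a set of prefetch
attributes, and the target keeps only the ones it can express.  All of
the names below (`prefetch_flag`, `prefetch_target`,
`prefetch_effective_flags`, `prefetch_count`) are hypothetical and exist
only for this illustration; they are not GCC interfaces.

```c
#include <stddef.h>

/* Hypothetical prefetch attributes an optimization might request.  */
enum prefetch_flag {
    PF_WRITE        = 1 << 0,  /* data will be written, not just read */
    PF_NON_TEMPORAL = 1 << 1,  /* data is expected to be used only once */
    PF_OUTER_CACHE  = 1 << 2   /* fetch into an outer cache level only */
};

/* Hypothetical per-target prefetch parameters.  */
struct prefetch_target {
    const char *name;
    unsigned cache_line_size;          /* in bytes */
    unsigned simultaneous_prefetches;  /* prefetches that can be in flight */
    unsigned supported_flags;          /* mask of enum prefetch_flag bits */
};

/* Keep only the attributes the target can express; the rest are ignored.  */
unsigned prefetch_effective_flags(const struct prefetch_target *t,
                                  unsigned requested)
{
    return requested & t->supported_flags;
}

/* Number of prefetches needed to cover `bytes` bytes, capped by how
   many the target can have in flight at once.  */
unsigned prefetch_count(const struct prefetch_target *t, size_t bytes)
{
    size_t lines = (bytes + t->cache_line_size - 1) / t->cache_line_size;

    return lines < t->simultaneous_prefetches
           ? (unsigned) lines
           : t->simultaneous_prefetches;
}
```

A target that supports only the non-temporal hint would set
`supported_flags` to `PF_NON_TEMPORAL`, and a request for
`PF_WRITE | PF_NON_TEMPORAL` would then degrade to a plain non-temporal
prefetch rather than being rejected.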

The earlier discussions mentioned the following machines as supporting
prefetch: Athlon, IA-64, Pentium III, hppa, mips, 3DNow!, Sparc, PowerPC,
and Alpha.  Some of the variations of prefetch support that might be
taken into consideration are read vs. write accesses, base update form,
spatial and temporal locality, single vs. multiple reads, and multiple
cache levels; some also support both faulting and non-faulting versions,
but I assume that we can limit support to non-faulting prefetches.  Are
there other capabilities of prefetch support to consider?  Which
prefetch attributes are likely to be useful within GCC?

Each prefetch optimization can be controlled by a separate flag.  For
example:

-fprefetch-loop-arrays
      If supported for the target machine, generate prefetch
      instructions to improve the performance of loops that access
      large arrays.

-fprefetch-pointers
      If supported for the target machine, generate prefetch
      instructions to improve the performance of accesses to recursive
      data structures.
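
For the first of these flags, the effect would be roughly what one gets
by writing the prefetches by hand, as in the sketch below.  It uses
`__builtin_prefetch` from later GCC releases; the function name
`array_sum` and the two tuning constants are assumptions chosen for
illustration, not values taken from any target.

```c
#include <stddef.h>

/* Assumed tuning values; real ones would come from the target.  */
#define CACHE_LINE_BYTES     64
#define PREFETCH_AHEAD_LINES 8

/* Sum a large array, prefetching a fixed distance ahead of the
   element currently being read.  */
long array_sum(const long *a, size_t n)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(long);
    const size_t ahead = PREFETCH_AHEAD_LINES * per_line;
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* One prefetch per cache line is enough.  The bounds check
           keeps the prefetched address inside the array.  Arguments:
           0 = read access, 0 = no temporal locality (streamed once).  */
        if (i % per_line == 0 && i + ahead < n)
            __builtin_prefetch(&a[i + ahead], 0, 0);
        sum += a[i];
    }
    return sum;
}
```

A compiler doing this itself would normally unroll the loop by the
number of elements per cache line instead of testing `i % per_line` on
every iteration; the test is kept here only to make the sketch short.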

Am I on the right track?  I'm working on a patch as I figure out how all
of this stuff works in GCC and I'll be asking for advice on
implementation details later, but first I'd like to settle the wider
issues.

Janis


Thread overview: 11+ messages
2001-10-30  8:54 prefetch revisited Janis Johnson
2001-10-30  9:14 ` Jan Hubicka
2001-10-30 10:32   ` Jan Hubicka
2001-10-30 10:25     ` Janis Johnson
2001-10-30 10:34       ` Jan Hubicka
2001-10-30 11:40       ` Jan Hubicka
2001-10-30 15:49         ` Janis Johnson
2001-10-31  5:57           ` Jan Hubicka
2001-10-30  9:20 ` Joseph S. Myers
2001-10-30 14:18 Janis Johnson
2001-10-30 14:36 ` Janis Johnson
