public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* RFC: Representing vector lane load/store operations
@ 2011-03-22 16:52 Richard Sandiford
  2011-03-22 17:10 ` Richard Guenther
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2011-03-22 16:52 UTC (permalink / raw)
  To: gcc

This is an RFC about adding gimple and optab support for things like
ARM's load-lane and store-lane instructions.  It builds on an earlier
discussion between Ira and Julian, with the aim of allowing these
instructions to be used by the vectoriser.

These instructions operate on N vector registers of M elements each and
on a sequence of 1 or M N-element structures.  They come in three forms:

  - full load/store:

      0<=I<N, 0<=J<M, register[I][J] = memory[J*M+I]

    E.g., for N=3, M=4:

         Registers                   Memory
         ----------------            ---------------
         RRRR  GGGG  BBBB    <--->   RGB RGB RGB RGB

  - lane load/store:

      given L, 0<=I<N register[I][L] = memory[I]

    E.g., for N=3. M=4, L=2:

         Registers                   Memory
         ----------------            ---------------
         ..R.  ..G.  ..B.    <--->   RGB

  - load-and-duplicate:

      0<=I<N, 0<=J<M, register[I][J] = memory[I]

    E.g. for N=3 V4HIs:

         Registers                   Memory
         ----------------            ----------------
         RRRR  GGGG  BBBB    <----   RGB

Starting points:

  1) Memory references should be MEM_REFs at the gimple level.
     We shouldn't add new tree codes for memory references.

  2) Because of the large data involved (at least in the "full" case),
     the gimple statement that represents the lane interleaving should
     also have the MEM_REF.  The two shouldn't be split between
     statements.

  3) The ARM doubleword instructions allow the N vectors to be in
     consecutive registers (DM, DM+1, ...) or in every second register
     (DM, DM+2, ...).  However, the latter case is only interesting
     if we're dealing with halves of quadword vectors.  It's therefore
     reasonable to view the N vectors as one big value.

(3) significantly simplifies things at the rtl level for ARM, because it
avoids having to find some way of saying that N separate pseudos must
be allocated to N consecutive hard registers.  If other targets allow the
N vectors to be stored in arbitrary (non-consecutive) registers, then
they could split the register up into subregs at expand time.
The lower-subreg pass should then optimise things nicely.

The easiest way of dealing with (1) and (2) seems to be to model the
operations as built-in functions.  And if we do treat the N vectors as
a single value, the load functions can simply return that value.  So we
could have something like:

  - full load/store:

      combined_vectors = __builtin_load_lanes (memory);
      memory = __builtin_store_lanes (combined_vectors);

  - lane load/store:

      combined_vectors = __builltin_load_lane (memory, combined_vectors, lane);
      memory = __builtin_store_lane (combined_vectors, lane);

  - load-and-duplicate:

      combined_vectors = __builtin_load_dup (memory);

We could then use normal component references to set or get the individual
vectors of combined_vectors.  Does that sound OK so far?

The question then is: what type should combined_vectors have?  (At this
point I'm just talking about types, not modes.)  The main possibilities
seemed to be:

1. an integer type

     Pros
       * Gimple registers can store integers.

     Cons
       * As Julian points out, GCC doesn't really support integer types
         that are wider than 2 HOST_WIDE_INTs.  It would be good to
         remove that restriction, but it might be a lot of work.

       * We're not really using the type as an integer.

       * The combination of the integer type and the __builtin_load_lanes
         array argument wouldn't be enough to determine the correct
         load operation.  __builtin_load_lanes would need something
         like a vector count argument (N in the above description) as well.

2. a vector type

     Pros
       * Gimple registers can store vectors.

     Cons
       * For vld3, this would mean creating vector types with non-power-
         of-two vectors.  GCC doesn't support those yet, and you get
         ICEs as soon as you try to use them.  (Remember that this is
         all about types, not modes.)

         It _might_ be interesting to implement this support, but as
         above, it would be a lot of work.  It also raises some tricky
         semantic questions, such as: what is the alignment of the new
         vectors? Which leads to...

       * The alignment of the type would be strange.  E.g. suppose
         we're dealing with M=2, and use uint32xY_t to represent a
         vector of Y uint32_ts.  The types and alignments would be:

           N=2 uint32x4_t, alignment 16
           N=3 uint32x6_t, alignment 8 (if we follow the convention for modes)
           N=4 uint32x8_t, alignment 32

         We don't need alignments greater than 8 in our intended use;
         16 and 32 are overkill.

       * We're not really using the type as a single vector,
         but as a collection of vectors.

       * The combination of the vector type and the __builtin_load_lanes
         array argument wouldn't be enough to determine the correct
         load operation.  __builtin_load_lanes would need something
         like a vector count argument (N in the above description) as well.

3. an array-of-vectors type

     Pros
       * No support for new GCC features (large integers or non-power-of-two
         vectors) is needed.

       * The alignment of the type would be taken from the alignment of the
         individual vectors, which is correct.

       * It accurately reflects how the loaded value is going to be used.

       * The type uniquely identifies the correct load operation,
         without need for additional arguments.  (This is minor.)

     Cons
       * Gimple registers can't store array values.

So I think the only disadvantage of using an array of vectors is that the
result can never be a gimple register.  But that isn't much of a disadvantage
really; the things we care about are the individual vectors, which can
of course be treated as gimple registers.  I think our tracking of memory
values is good enough for combined_vectors to be treated as such.

These arrays of vectors would still need to have a non-BLK mode,
so that they can be stored in _rtl_ registers.  But we need that anyway
for ARM's arm_neon.h; the code that today's GCC produces for the intrinsic
functions is very poor.

So how about the following functions?  (Forgive the pascally syntax.)

    __builtin_load_lanes (REF : array N*M of X)
      returns array N of vector M of X
      maps to vldN on ARM
      in practice, the result would be used in assignments of the form:
        vectorY = ARRAY_REF <result, Y>

    __builtin_store_lanes (VECTORS : array N of vector M of X)
      returns array N*M of X
      maps to vstN on ARM
      in practice, the argument would be populated by assignments of the form:
        ARRAY_REF <VECTORS, Y> = vectorY

    __builtin_load_lane (REF : array N of X,
			 VECTORS : array N of vector M of X,
			 LANE : integer)
      returns array N of vector M of X
      maps to vldN_lane on ARM

    __builtin_store_lane (VECTORS : array N of vector M of X,
			  LANE : integer)
      returns array N of X
      maps to vstN_lane on ARM

    __builtin_load_dup (REF : array N of X)
      returns array N of vector M of X
      maps to vldN_dup on ARM

I've hacked up a prototype of this and it seems to produce good code.
What do you think?

Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-03-29 13:30 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-22 16:52 RFC: Representing vector lane load/store operations Richard Sandiford
2011-03-22 17:10 ` Richard Guenther
2011-03-22 19:43   ` Richard Sandiford
2011-03-23  9:23     ` Richard Guenther
2011-03-23 10:38       ` Richard Sandiford
2011-03-23 11:52         ` Richard Guenther
2011-03-23 12:18           ` Richard Sandiford
2011-03-23 12:37             ` Richard Guenther
2011-03-23 13:01               ` Richard Sandiford
2011-03-23 13:14                 ` Richard Guenther
2011-03-23 14:14                   ` Richard Sandiford
2011-03-23 14:28                     ` Richard Guenther
2011-03-23 14:41                       ` Richard Sandiford
2011-03-29 12:50                         ` Richard Sandiford
2011-03-29 14:05                           ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).