public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Add an array_mode_supported_p target hook
@ 2011-03-31 13:43 Richard Sandiford
  2011-03-31 13:56 ` Richard Guenther
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2011-03-31 13:43 UTC (permalink / raw)
  To: gcc-patches

This patch adds an array_mode_supported_p hook, which says whether
MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
It follows on from the discussion here:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html

The intended use of the hook is to allow small arrays of vectors
to have a non-BLK mode, and hence to be stored in rtl registers.
These arrays are used both in the ARM arm_neon.h API and in the
optabs proposed in:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html

The tail end of the thread was about the definition of TYPE_MODE:

#define TYPE_MODE(NODE) \
  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
   ? vector_type_mode (NODE) : (NODE)->type.mode)

with this outcome:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html

To summarise my take on it:

- The current definition of TYPE_MODE isn't sufficient even for vector
  modes and vector_mode_supported_p, because non-vector types can have
  vector modes.

- We should no longer treat types as having one mode everywhere.
  We should instead replace TYPE_MODE with a function that takes
  a context.  Tests of things like vector_mode_supported_p would
  move from layout_type to this new function.

I think this patch fits within that scheme.  array_mode_supported_p
would be treated in the same way as vector_mode_supported_p.

I realise the ideal would be to get rid of TYPE_MODE first.
But that's going to be a longer-term thing.  Now that there's
at least a plan, I'd like to press ahead with the array stuff
on the basis that

(a) although the new hook won't work with the "target" attribute,
    our current mode handling doesn't work in just the same way.

(b) the new hook doesn't interfere with the plan.

(c) getting good code from the intrinsics (and support for these
    instructions in the vectoriser) is going to be much more important
    to most ARM users than the ability to turn Neon on and off for
    individual functions in a TU.

To give an example of the difference, the Neon code posted here:

    http://hilbert-space.de/?p=22

produces this inner loop before the patch (but with
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):

.L3:
	vld3.8	{d16-d18}, [r1]!
	vstmia	ip, {d16-d18}
	fldd	d19, [sp, #24]
	adr	r5, .L6
	ldmia	r5, {r4-r5}
	fldd	d16, [sp, #32]
	vmov	d18, r4, r5  @ v8qi
	vmull.u8	q9, d19, d18
	adr	r5, .L6+8
	ldmia	r5, {r4-r5}
	vmov	d17, r4, r5  @ v8qi
	vstmia	sp, {d18-d19}
	vmlal.u8	q9, d16, d17
	fldd	d16, [sp, #40]
	adr	r5, .L6+16
	ldmia	r5, {r4-r5}
	vmov	d17, r4, r5  @ v8qi
	vmlal.u8	q9, d16, d17
	add	r3, r3, #1
	vshrn.i16	d16, q9, #8
	cmp	r3, r2
	vst1.8	{d16}, [r0]!
	bne	.L3

With both patches applied, the inner loop is:

.L3:
	vld3.8	{d18-d20}, [r1]!
	vmull.u8	q8, d18, d21
	vmlal.u8	q8, d19, d22
	vmlal.u8	q8, d20, d23
	add	r3, r3, #1
	vshrn.i16	d16, q8, #8
	cmp	r3, r2
	vst1.8	{d16}, [r0]!
	bne	.L3

Tested on arm-linux-gnueabi.  OK to install?

Richard


gcc/
	* hooks.h (hook_bool_mode_uhwi_false): Declare.
	* hooks.c (hook_bool_mode_uhwi_false): New function.
	* target.def (array_mode_supported_p): New hook.
	* doc/tm.texi.in (TARGET_ARRAY_MODE_SUPPORTED_P): Add @hook.
	* doc/tm.texi: Regenerate.
	* stor-layout.c (mode_for_array): New function.
	(layout_type): Use it.
	* config/arm/arm.c (arm_array_mode_supported_p): New function.
	(TARGET_ARRAY_MODE_SUPPORTED_P): Define.

Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h	2011-03-31 10:57:26.000000000 +0100
+++ gcc/hooks.h	2011-03-31 14:18:21.000000000 +0100
@@ -34,6 +34,8 @@ extern bool hook_bool_mode_false (enum m
 extern bool hook_bool_mode_true (enum machine_mode);
 extern bool hook_bool_mode_const_rtx_false (enum machine_mode, const_rtx);
 extern bool hook_bool_mode_const_rtx_true (enum machine_mode, const_rtx);
+extern bool hook_bool_mode_uhwi_false (enum machine_mode,
+				       unsigned HOST_WIDE_INT);
 extern bool hook_bool_tree_false (tree);
 extern bool hook_bool_const_tree_false (const_tree);
 extern bool hook_bool_tree_true (tree);
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c	2011-03-31 10:57:26.000000000 +0100
+++ gcc/hooks.c	2011-03-31 14:18:21.000000000 +0100
@@ -101,6 +101,15 @@ hook_bool_mode_const_rtx_true (enum mach
   return true;
 }
 
+/* Generic hook that takes (enum machine_mode, unsigned HOST_WIDE_INT)
+   and returns false.  */
+bool
+hook_bool_mode_uhwi_false (enum machine_mode mode ATTRIBUTE_UNUSED,
+			   unsigned HOST_WIDE_INT value ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 /* Generic hook that takes (FILE *, const char *) and does nothing.  */
 void
 hook_void_FILEptr_constcharptr (FILE *a ATTRIBUTE_UNUSED, const char *b ATTRIBUTE_UNUSED)
Index: gcc/target.def
===================================================================
--- gcc/target.def	2011-03-31 10:57:26.000000000 +0100
+++ gcc/target.def	2011-03-31 14:18:41.000000000 +0100
@@ -1611,6 +1611,38 @@ DEFHOOK
  bool, (enum machine_mode mode),
  hook_bool_mode_false)
 
+/* True if we should try to use a scalar mode to represent an array,
+   overriding the usual MAX_FIXED_MODE limit.  */
+DEFHOOK
+(array_mode_supported_p,
+ "Return true if GCC should try to use a scalar mode to store an array\n\
+of @var{nelems} elements, given that each element has mode @var{mode}.\n\
+Returning true here overrides the usual @code{MAX_FIXED_MODE} limit\n\
+and allows GCC to use any defined integer mode.\n\
+\n\
+One use of this hook is to support vector load and store operations\n\
+that operate on several homogeneous vectors.  For example, ARM Neon\n\
+has operations like:\n\
+\n\
+@smallexample\n\
+int8x8x3_t vld3_s8 (const int8_t *)\n\
+@end smallexample\n\
+\n\
+where the return type is defined as:\n\
+\n\
+@smallexample\n\
+typedef struct int8x8x3_t\n\
+@{\n\
+  int8x8_t val[3];\n\
+@} int8x8x3_t;\n\
+@end smallexample\n\
+\n\
+If this hook allows @code{val} to have a scalar mode, then\n\
+@code{int8x8x3_t} can have the same mode.  GCC can then store\n\
+@code{int8x8x3_t}s in registers rather than forcing them onto the stack.",
+ bool, (enum machine_mode mode, unsigned HOST_WIDE_INT nelems),
+ hook_bool_mode_uhwi_false)
+
 /* Compute cost of moving data from a register of class FROM to one of
    TO, using MODE.  */
 DEFHOOK
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2011-03-29 10:32:08.000000000 +0100
+++ gcc/doc/tm.texi.in	2011-03-31 14:27:42.000000000 +0100
@@ -4271,6 +4271,8 @@ insns involving vector mode @var{mode}. 
 must have move patterns for this mode.
 @end deftypefn
 
+@hook TARGET_ARRAY_MODE_SUPPORTED_P
+
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 Define this to return nonzero for machine modes for which the port has
 small register classes.  If this target hook returns nonzero for a given
Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c	2011-03-31 10:57:26.000000000 +0100
+++ gcc/stor-layout.c	2011-03-31 14:22:23.000000000 +0100
@@ -546,6 +546,34 @@ get_mode_alignment (enum machine_mode mo
   return MIN (BIGGEST_ALIGNMENT, MAX (1, mode_base_align[mode]*BITS_PER_UNIT));
 }
 
+/* Return the natural mode of an array, given that it is SIZE bytes in
+   total and has elements of type ELEM_TYPE.  */
+
+static enum machine_mode
+mode_for_array (tree elem_type, tree size)
+{
+  tree elem_size;
+  unsigned HOST_WIDE_INT int_size, int_elem_size;
+  bool limit_p;
+
+  /* One-element arrays get the component type's mode.  */
+  elem_size = TYPE_SIZE (elem_type);
+  if (simple_cst_equal (size, elem_size))
+    return TYPE_MODE (elem_type);
+
+  limit_p = true;
+  if (host_integerp (size, 1) && host_integerp (elem_size, 1))
+    {
+      int_size = tree_low_cst (size, 1);
+      int_elem_size = tree_low_cst (elem_size, 1);
+      if (int_elem_size > 0
+	  && int_size % int_elem_size == 0
+	  && targetm.array_mode_supported_p (TYPE_MODE (elem_type),
+					     int_size / int_elem_size))
+	limit_p = false;
+    }
+  return mode_for_size_tree (size, MODE_INT, limit_p);
+}
 \f
 /* Subroutine of layout_decl: Force alignment required for the data type.
    But if the decl itself wants greater alignment, don't override that.  */
@@ -2039,14 +2067,8 @@ layout_type (tree type)
 	    && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
 		|| TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
 	  {
-	    /* One-element arrays get the component type's mode.  */
-	    if (simple_cst_equal (TYPE_SIZE (type),
-				  TYPE_SIZE (TREE_TYPE (type))))
-	      SET_TYPE_MODE (type, TYPE_MODE (TREE_TYPE (type)));
-	    else
-	      SET_TYPE_MODE (type, mode_for_size_tree (TYPE_SIZE (type),
-						       MODE_INT, 1));
-
+	    SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
+						 TYPE_SIZE (type)));
 	    if (TYPE_MODE (type) != BLKmode
 		&& STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
 		&& TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2011-03-31 14:10:12.000000000 +0100
+++ gcc/config/arm/arm.c	2011-03-31 14:18:21.000000000 +0100
@@ -243,6 +243,8 @@ static rtx arm_pic_static_addr (rtx orig
 static bool cortex_a9_sched_adjust_cost (rtx, rtx, rtx, int *);
 static bool xscale_sched_adjust_cost (rtx, rtx, rtx, int *);
 static bool fa726te_sched_adjust_cost (rtx, rtx, rtx, int *);
+static bool arm_array_mode_supported_p (enum machine_mode,
+					unsigned HOST_WIDE_INT);
 static enum machine_mode arm_preferred_simd_mode (enum machine_mode);
 static bool arm_class_likely_spilled_p (reg_class_t);
 static bool arm_vector_alignment_reachable (const_tree type, bool is_packed);
@@ -403,6 +405,8 @@ #define TARGET_ADDRESS_COST arm_address_
 #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
+#undef TARGET_ARRAY_MODE_SUPPORTED_P
+#define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
 #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
@@ -22377,6 +22381,20 @@ arm_vector_mode_supported_p (enum machin
   return false;
 }
 
+/* Implements target hook array_mode_supported_p.  */
+
+static bool
+arm_array_mode_supported_p (enum machine_mode mode,
+			    unsigned HOST_WIDE_INT nelems)
+{
+  if (TARGET_NEON
+      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
+      && (nelems >= 2 && nelems <= 4))
+    return true;
+
+  return false;
+}
+
 /* Use the option -mvectorize-with-neon-quad to override the use of doubleword
    registers when autovectorizing for Neon, at least until multiple vector
    widths are supported properly by the middle-end.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Add an array_mode_supported_p target hook
  2011-03-31 13:43 Add an array_mode_supported_p target hook Richard Sandiford
@ 2011-03-31 13:56 ` Richard Guenther
  2011-03-31 14:56   ` Richard Sandiford
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Guenther @ 2011-03-31 13:56 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds an array_mode_supported_p hook, which says whether
> MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
> It follows on from the discussion here:
>
>    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
>
> The intended use of the hook is to allow small arrays of vectors
> to have a non-BLK mode, and hence to be stored in rtl registers.
> These arrays are used both in the ARM arm_neon.h API and in the
> optabs proposed in:
>
>    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
>
> The tail end of the thread was about the definition of TYPE_MODE:
>
> #define TYPE_MODE(NODE) \
>  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
>   ? vector_type_mode (NODE) : (NODE)->type.mode)
>
> with this outcome:
>
>    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html
>
> To summarise my take on it:
>
> - The current definition of TYPE_MODE isn't sufficient even for vector
>  modes and vector_mode_supported_p, because non-vector types can have
>  vector modes.
>
> - We should no longer treat types as having one mode everywhere.
>  We should instead replace TYPE_MODE with a function that takes
>  a context.  Tests of things like vector_mode_supported_p would
>  move from layout_type to this new function.
>
> I think this patch fits within that scheme.  array_mode_supported_p
> would be treated in the same way as vector_mode_supported_p.
>
> I realise the ideal would be to get rid of TYPE_MODE first.
> But that's going to be a longer-term thing.  Now that there's
> at least a plan, I'd like to press ahead with the array stuff
> on the basis that
>
> (a) although the new hook won't work with the "target" attribute,
>    our current mode handling doesn't work in just the same way.
>
> (b) the new hook doesn't interfere with the plan.
>
> (c) getting good code from the intrinsics (and support for these
>    instructions in the vectoriser) is going to be much more important
>    to most ARM users than the ability to turn Neon on and off for
>    individual functions in a TU.
>
> To give an example of the difference, the Neon code posted here:
>
>    http://hilbert-space.de/?p=22
>
> produces this inner loop before the patch (but with
> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):
>
> .L3:
>        vld3.8  {d16-d18}, [r1]!
>        vstmia  ip, {d16-d18}
>        fldd    d19, [sp, #24]
>        adr     r5, .L6
>        ldmia   r5, {r4-r5}
>        fldd    d16, [sp, #32]
>        vmov    d18, r4, r5  @ v8qi
>        vmull.u8        q9, d19, d18
>        adr     r5, .L6+8
>        ldmia   r5, {r4-r5}
>        vmov    d17, r4, r5  @ v8qi
>        vstmia  sp, {d18-d19}
>        vmlal.u8        q9, d16, d17
>        fldd    d16, [sp, #40]
>        adr     r5, .L6+16
>        ldmia   r5, {r4-r5}
>        vmov    d17, r4, r5  @ v8qi
>        vmlal.u8        q9, d16, d17
>        add     r3, r3, #1
>        vshrn.i16       d16, q9, #8
>        cmp     r3, r2
>        vst1.8  {d16}, [r0]!
>        bne     .L3
>
> With both patches applied, the inner loop is:
>
> .L3:
>        vld3.8  {d18-d20}, [r1]!
>        vmull.u8        q8, d18, d21
>        vmlal.u8        q8, d19, d22
>        vmlal.u8        q8, d20, d23
>        add     r3, r3, #1
>        vshrn.i16       d16, q8, #8
>        cmp     r3, r2
>        vst1.8  {d16}, [r0]!
>        bne     .L3
>
> Tested on arm-linux-gnueabi.  OK to install?

It looks reasonable given the past discussion, but - can you move forward
with the Neon stuff a bit to see if it really fits?  Or is this all
that is needed
for the load/store lane support as well (apart from vectorizer changes of
course).

Can you check the code generated by for example

float foo(char *p)
{
  float a[2];
  int i;
  ((char *)a)[0] = p[0];
  ((char *)a)[1] = p[1];
  ((char *)a)[2] = p[2];
  ((char *)a)[3] = p[3];
  ((char *)a)[4] = p[4];
  ((char *)a)[5] = p[5];
  ((char *)a)[6] = p[6];
  ((char *)a)[7] = p[7];
  return a[0] + a[1];
}

for an array a that would get such a larger mode?  Thus, check what
happens with partial defs of different types (just to avoid ICEs like the
ones Jakub was fixing yesterday).

Thanks,
Richard.

> Richard
>
>
> gcc/
>        * hooks.h (hook_bool_mode_uhwi_false): Declare.
>        * hooks.c (hook_bool_mode_uhwi_false): New function.
>        * target.def (array_mode_supported_p): New hook.
>        * doc/tm.texi.in (TARGET_ARRAY_MODE_SUPPORTED_P): Add @hook.
>        * doc/tm.texi: Regenerate.
>        * stor-layout.c (mode_for_array): New function.
>        (layout_type): Use it.
>        * config/arm/arm.c (arm_array_mode_supported_p): New function.
>        (TARGET_ARRAY_MODE_SUPPORTED_P): Define.
>
> Index: gcc/hooks.h
> ===================================================================
> --- gcc/hooks.h 2011-03-31 10:57:26.000000000 +0100
> +++ gcc/hooks.h 2011-03-31 14:18:21.000000000 +0100
> @@ -34,6 +34,8 @@ extern bool hook_bool_mode_false (enum m
>  extern bool hook_bool_mode_true (enum machine_mode);
>  extern bool hook_bool_mode_const_rtx_false (enum machine_mode, const_rtx);
>  extern bool hook_bool_mode_const_rtx_true (enum machine_mode, const_rtx);
> +extern bool hook_bool_mode_uhwi_false (enum machine_mode,
> +                                      unsigned HOST_WIDE_INT);
>  extern bool hook_bool_tree_false (tree);
>  extern bool hook_bool_const_tree_false (const_tree);
>  extern bool hook_bool_tree_true (tree);
> Index: gcc/hooks.c
> ===================================================================
> --- gcc/hooks.c 2011-03-31 10:57:26.000000000 +0100
> +++ gcc/hooks.c 2011-03-31 14:18:21.000000000 +0100
> @@ -101,6 +101,15 @@ hook_bool_mode_const_rtx_true (enum mach
>   return true;
>  }
>
> +/* Generic hook that takes (enum machine_mode, unsigned HOST_WIDE_INT)
> +   and returns false.  */
> +bool
> +hook_bool_mode_uhwi_false (enum machine_mode mode ATTRIBUTE_UNUSED,
> +                          unsigned HOST_WIDE_INT value ATTRIBUTE_UNUSED)
> +{
> +  return false;
> +}
> +
>  /* Generic hook that takes (FILE *, const char *) and does nothing.  */
>  void
>  hook_void_FILEptr_constcharptr (FILE *a ATTRIBUTE_UNUSED, const char *b ATTRIBUTE_UNUSED)
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def      2011-03-31 10:57:26.000000000 +0100
> +++ gcc/target.def      2011-03-31 14:18:41.000000000 +0100
> @@ -1611,6 +1611,38 @@ DEFHOOK
>  bool, (enum machine_mode mode),
>  hook_bool_mode_false)
>
> +/* True if we should try to use a scalar mode to represent an array,
> +   overriding the usual MAX_FIXED_MODE limit.  */
> +DEFHOOK
> +(array_mode_supported_p,
> + "Return true if GCC should try to use a scalar mode to store an array\n\
> +of @var{nelems} elements, given that each element has mode @var{mode}.\n\
> +Returning true here overrides the usual @code{MAX_FIXED_MODE} limit\n\
> +and allows GCC to use any defined integer mode.\n\
> +\n\
> +One use of this hook is to support vector load and store operations\n\
> +that operate on several homogeneous vectors.  For example, ARM Neon\n\
> +has operations like:\n\
> +\n\
> +@smallexample\n\
> +int8x8x3_t vld3_s8 (const int8_t *)\n\
> +@end smallexample\n\
> +\n\
> +where the return type is defined as:\n\
> +\n\
> +@smallexample\n\
> +typedef struct int8x8x3_t\n\
> +@{\n\
> +  int8x8_t val[3];\n\
> +@} int8x8x3_t;\n\
> +@end smallexample\n\
> +\n\
> +If this hook allows @code{val} to have a scalar mode, then\n\
> +@code{int8x8x3_t} can have the same mode.  GCC can then store\n\
> +@code{int8x8x3_t}s in registers rather than forcing them onto the stack.",
> + bool, (enum machine_mode mode, unsigned HOST_WIDE_INT nelems),
> + hook_bool_mode_uhwi_false)
> +
>  /* Compute cost of moving data from a register of class FROM to one of
>    TO, using MODE.  */
>  DEFHOOK
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in  2011-03-29 10:32:08.000000000 +0100
> +++ gcc/doc/tm.texi.in  2011-03-31 14:27:42.000000000 +0100
> @@ -4271,6 +4271,8 @@ insns involving vector mode @var{mode}.
>  must have move patterns for this mode.
>  @end deftypefn
>
> +@hook TARGET_ARRAY_MODE_SUPPORTED_P
> +
>  @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
>  Define this to return nonzero for machine modes for which the port has
>  small register classes.  If this target hook returns nonzero for a given
> Index: gcc/stor-layout.c
> ===================================================================
> --- gcc/stor-layout.c   2011-03-31 10:57:26.000000000 +0100
> +++ gcc/stor-layout.c   2011-03-31 14:22:23.000000000 +0100
> @@ -546,6 +546,34 @@ get_mode_alignment (enum machine_mode mo
>   return MIN (BIGGEST_ALIGNMENT, MAX (1, mode_base_align[mode]*BITS_PER_UNIT));
>  }
>
> +/* Return the natural mode of an array, given that it is SIZE bytes in
> +   total and has elements of type ELEM_TYPE.  */
> +
> +static enum machine_mode
> +mode_for_array (tree elem_type, tree size)
> +{
> +  tree elem_size;
> +  unsigned HOST_WIDE_INT int_size, int_elem_size;
> +  bool limit_p;
> +
> +  /* One-element arrays get the component type's mode.  */
> +  elem_size = TYPE_SIZE (elem_type);
> +  if (simple_cst_equal (size, elem_size))
> +    return TYPE_MODE (elem_type);
> +
> +  limit_p = true;
> +  if (host_integerp (size, 1) && host_integerp (elem_size, 1))
> +    {
> +      int_size = tree_low_cst (size, 1);
> +      int_elem_size = tree_low_cst (elem_size, 1);
> +      if (int_elem_size > 0
> +         && int_size % int_elem_size == 0
> +         && targetm.array_mode_supported_p (TYPE_MODE (elem_type),
> +                                            int_size / int_elem_size))
> +       limit_p = false;
> +    }
> +  return mode_for_size_tree (size, MODE_INT, limit_p);
> +}
>
>  /* Subroutine of layout_decl: Force alignment required for the data type.
>    But if the decl itself wants greater alignment, don't override that.  */
> @@ -2039,14 +2067,8 @@ layout_type (tree type)
>            && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
>                || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
>          {
> -           /* One-element arrays get the component type's mode.  */
> -           if (simple_cst_equal (TYPE_SIZE (type),
> -                                 TYPE_SIZE (TREE_TYPE (type))))
> -             SET_TYPE_MODE (type, TYPE_MODE (TREE_TYPE (type)));
> -           else
> -             SET_TYPE_MODE (type, mode_for_size_tree (TYPE_SIZE (type),
> -                                                      MODE_INT, 1));
> -
> +           SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> +                                                TYPE_SIZE (type)));
>            if (TYPE_MODE (type) != BLKmode
>                && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
>                && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c        2011-03-31 14:10:12.000000000 +0100
> +++ gcc/config/arm/arm.c        2011-03-31 14:18:21.000000000 +0100
> @@ -243,6 +243,8 @@ static rtx arm_pic_static_addr (rtx orig
>  static bool cortex_a9_sched_adjust_cost (rtx, rtx, rtx, int *);
>  static bool xscale_sched_adjust_cost (rtx, rtx, rtx, int *);
>  static bool fa726te_sched_adjust_cost (rtx, rtx, rtx, int *);
> +static bool arm_array_mode_supported_p (enum machine_mode,
> +                                       unsigned HOST_WIDE_INT);
>  static enum machine_mode arm_preferred_simd_mode (enum machine_mode);
>  static bool arm_class_likely_spilled_p (reg_class_t);
>  static bool arm_vector_alignment_reachable (const_tree type, bool is_packed);
> @@ -403,6 +405,8 @@ #define TARGET_ADDRESS_COST arm_address_
>  #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask
>  #undef TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
> +#undef TARGET_ARRAY_MODE_SUPPORTED_P
> +#define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
>  #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> @@ -22377,6 +22381,20 @@ arm_vector_mode_supported_p (enum machin
>   return false;
>  }
>
> +/* Implements target hook array_mode_supported_p.  */
> +
> +static bool
> +arm_array_mode_supported_p (enum machine_mode mode,
> +                           unsigned HOST_WIDE_INT nelems)
> +{
> +  if (TARGET_NEON
> +      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
> +      && (nelems >= 2 && nelems <= 4))
> +    return true;
> +
> +  return false;
> +}
> +
>  /* Use the option -mvectorize-with-neon-quad to override the use of doubleword
>    registers when autovectorizing for Neon, at least until multiple vector
>    widths are supported properly by the middle-end.  */
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Add an array_mode_supported_p target hook
  2011-03-31 13:56 ` Richard Guenther
@ 2011-03-31 14:56   ` Richard Sandiford
  2011-04-21 10:43     ` Richard Sandiford
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2011-03-31 14:56 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

Richard Guenther <richard.guenther@gmail.com> writes:
> On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds an array_mode_supported_p hook, which says whether
>> MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
>> It follows on from the discussion here:
>>
>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
>>
>> The intended use of the hook is to allow small arrays of vectors
>> to have a non-BLK mode, and hence to be stored in rtl registers.
>> These arrays are used both in the ARM arm_neon.h API and in the
>> optabs proposed in:
>>
>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
>>
>> The tail end of the thread was about the definition of TYPE_MODE:
>>
>> #define TYPE_MODE(NODE) \
>>  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
>>   ? vector_type_mode (NODE) : (NODE)->type.mode)
>>
>> with this outcome:
>>
>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html
>>
>> To summarise my take on it:
>>
>> - The current definition of TYPE_MODE isn't sufficient even for vector
>>  modes and vector_mode_supported_p, because non-vector types can have
>>  vector modes.
>>
>> - We should no longer treat types as having one mode everywhere.
>>  We should instead replace TYPE_MODE with a function that takes
>>  a context.  Tests of things like vector_mode_supported_p would
>>  move from layout_type to this new function.
>>
>> I think this patch fits within that scheme.  array_mode_supported_p
>> would be treated in the same way as vector_mode_supported_p.
>>
>> I realise the ideal would be to get rid of TYPE_MODE first.
>> But that's going to be a longer-term thing.  Now that there's
>> at least a plan, I'd like to press ahead with the array stuff
>> on the basis that
>>
>> (a) although the new hook won't work with the "target" attribute,
>>    our current mode handling doesn't work in just the same way.
>>
>> (b) the new hook doesn't interfere with the plan.
>>
>> (c) getting good code from the intrinsics (and support for these
>>    instructions in the vectoriser) is going to be much more important
>>    to most ARM users than the ability to turn Neon on and off for
>>    individual functions in a TU.
>>
>> To give an example of the difference, the Neon code posted here:
>>
>>    http://hilbert-space.de/?p=22
>>
>> produces this inner loop before the patch (but with
>> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):
>>
>> .L3:
>>        vld3.8  {d16-d18}, [r1]!
>>        vstmia  ip, {d16-d18}
>>        fldd    d19, [sp, #24]
>>        adr     r5, .L6
>>        ldmia   r5, {r4-r5}
>>        fldd    d16, [sp, #32]
>>        vmov    d18, r4, r5  @ v8qi
>>        vmull.u8        q9, d19, d18
>>        adr     r5, .L6+8
>>        ldmia   r5, {r4-r5}
>>        vmov    d17, r4, r5  @ v8qi
>>        vstmia  sp, {d18-d19}
>>        vmlal.u8        q9, d16, d17
>>        fldd    d16, [sp, #40]
>>        adr     r5, .L6+16
>>        ldmia   r5, {r4-r5}
>>        vmov    d17, r4, r5  @ v8qi
>>        vmlal.u8        q9, d16, d17
>>        add     r3, r3, #1
>>        vshrn.i16       d16, q9, #8
>>        cmp     r3, r2
>>        vst1.8  {d16}, [r0]!
>>        bne     .L3
>>
>> With both patches applied, the inner loop is:
>>
>> .L3:
>>        vld3.8  {d18-d20}, [r1]!
>>        vmull.u8        q8, d18, d21
>>        vmlal.u8        q8, d19, d22
>>        vmlal.u8        q8, d20, d23
>>        add     r3, r3, #1
>>        vshrn.i16       d16, q8, #8
>>        cmp     r3, r2
>>        vst1.8  {d16}, [r0]!
>>        bne     .L3
>>
>> Tested on arm-linux-gnueabi.  OK to install?
>
> It looks reasonable given the past discussion, but - can you move forward
> with the Neon stuff a bit to see if it really fits?  Or is this all
> that is needed
> for the load/store lane support as well (apart from vectorizer changes of
> course).

Yeah, I have a prototype that hacks up some C support for generating the
(otherwise internal-only) load/store built-in functions that the vectoriser
is suppsoed to generate.  This patch is all that seems to be needed for the
types and optabs generation to work in the natural way.

I'm happy to leave it until the vectoriser stuff is in a more
submittable state though.  Especially given:

> Can you check the code generated by for example
>
> float foo(char *p)
> {
>   float a[2];
>   int i;
>   ((char *)a)[0] = p[0];
>   ((char *)a)[1] = p[1];
>   ((char *)a)[2] = p[2];
>   ((char *)a)[3] = p[3];
>   ((char *)a)[4] = p[4];
>   ((char *)a)[5] = p[5];
>   ((char *)a)[6] = p[6];
>   ((char *)a)[7] = p[7];
>   return a[0] + a[1];
> }
>
> for an array a that would get such a larger mode?  Thus, check what
> happens with partial defs of different types (just to avoid ICEs like the
> ones Jakub was fixing yesterday).

OK, I tried:

#include "arm_neon.h"

uint32x2_t foo(char *p)
{
  uint32x2_t a[2];
  int i;
  ((char *)a)[0] = p[0];
  ((char *)a)[1] = p[1];
  ((char *)a)[2] = p[2];
  ((char *)a)[3] = p[3];
  ((char *)a)[4] = p[4];
  ((char *)a)[5] = p[5];
  ((char *)a)[6] = p[6];
  ((char *)a)[7] = p[7];
  ((char *)a)[8] = p[8];
  ((char *)a)[9] = p[9];
  ((char *)a)[10] = p[10];
  ((char *)a)[11] = p[11];
  ((char *)a)[12] = p[12];
  ((char *)a)[13] = p[13];
  ((char *)a)[14] = p[14];
  ((char *)a)[15] = p[15];
  return vadd_u32 (a[0], a[1]);
}

uint32x4_t bar(char *p, uint32x4_t *b)
{
  uint32x4_t a[2];
  int i;
  ((char *)a)[0] = p[0];
  ((char *)a)[1] = p[1];
  ((char *)a)[2] = p[2];
  ((char *)a)[3] = p[3];
  ((char *)a)[4] = p[4];
  ((char *)a)[5] = p[5];
  ((char *)a)[6] = p[6];
  ((char *)a)[7] = p[7];
  ((char *)a)[8] = p[8];
  ((char *)a)[9] = p[9];
  ((char *)a)[10] = p[10];
  ((char *)a)[11] = p[11];
  ((char *)a)[12] = p[12];
  ((char *)a)[13] = p[13];
  ((char *)a)[14] = p[14];
  ((char *)a)[15] = p[15];
  ((char *)a)[16 + 0] = p[16 + 0];
  ((char *)a)[16 + 1] = p[16 + 1];
  ((char *)a)[16 + 2] = p[16 + 2];
  ((char *)a)[16 + 3] = p[16 + 3];
  ((char *)a)[16 + 4] = p[16 + 4];
  ((char *)a)[16 + 5] = p[16 + 5];
  ((char *)a)[16 + 6] = p[16 + 6];
  ((char *)a)[16 + 7] = p[16 + 7];
  ((char *)a)[16 + 8] = p[16 + 8];
  ((char *)a)[16 + 9] = p[16 + 9];
  ((char *)a)[16 + 10] = p[16 + 10];
  ((char *)a)[16 + 11] = p[16 + 11];
  ((char *)a)[16 + 12] = p[16 + 12];
  ((char *)a)[16 + 13] = p[16 + 13];
  ((char *)a)[16 + 14] = p[16 + 14];
  ((char *)a)[16 + 15] = p[16 + 15];
  return vaddq_u32 (a[0], a[1]);
}

It seemed to avoid the problem Jakub was seeing, but the second function
hit the known const_int reload failure for these modes:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46329

Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Add an array_mode_supported_p target hook
  2011-03-31 14:56   ` Richard Sandiford
@ 2011-04-21 10:43     ` Richard Sandiford
  2011-04-21 11:28       ` Richard Guenther
  2011-05-06 10:29       ` Richard Earnshaw
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Sandiford @ 2011-04-21 10:43 UTC (permalink / raw)
  To: Richard Guenther
  Cc: gcc-patches, nickc, richard.earnshaw, paul, ramana.radhakrishnan

To get back to this...

Richard Sandiford <richard.sandiford@linaro.org> writes:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch adds an array_mode_supported_p hook, which says whether
>>> MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
>>> It follows on from the discussion here:
>>>
>>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
>>>
>>> The intended use of the hook is to allow small arrays of vectors
>>> to have a non-BLK mode, and hence to be stored in rtl registers.
>>> These arrays are used both in the ARM arm_neon.h API and in the
>>> optabs proposed in:
>>>
>>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
>>>
>>> The tail end of the thread was about the definition of TYPE_MODE:
>>>
>>> #define TYPE_MODE(NODE) \
>>>  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
>>>   ? vector_type_mode (NODE) : (NODE)->type.mode)
>>>
>>> with this outcome:
>>>
>>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html
>>>
>>> To summarise my take on it:
>>>
>>> - The current definition of TYPE_MODE isn't sufficient even for vector
>>>  modes and vector_mode_supported_p, because non-vector types can have
>>>  vector modes.
>>>
>>> - We should no longer treat types as having one mode everywhere.
>>>  We should instead replace TYPE_MODE with a function that takes
>>>  a context.  Tests of things like vector_mode_supported_p would
>>>  move from layout_type to this new function.
>>>
>>> I think this patch fits within that scheme.  array_mode_supported_p
>>> would be treated in the same way as vector_mode_supported_p.
>>>
>>> I realise the ideal would be to get rid of TYPE_MODE first.
>>> But that's going to be a longer-term thing.  Now that there's
>>> at least a plan, I'd like to press ahead with the array stuff
>>> on the basis that
>>>
>>> (a) although the new hook won't work with the "target" attribute,
>>>    our current mode handling doesn't work in just the same way.
>>>
>>> (b) the new hook doesn't interfere with the plan.
>>>
>>> (c) getting good code from the intrinsics (and support for these
>>>    instructions in the vectoriser) is going to be much more important
>>>    to most ARM users than the ability to turn Neon on and off for
>>>    individual functions in a TU.
>>>
>>> To give an example of the difference, the Neon code posted here:
>>>
>>>    http://hilbert-space.de/?p=22
>>>
>>> produces this inner loop before the patch (but with
>>> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):
>>>
>>> .L3:
>>>        vld3.8  {d16-d18}, [r1]!
>>>        vstmia  ip, {d16-d18}
>>>        fldd    d19, [sp, #24]
>>>        adr     r5, .L6
>>>        ldmia   r5, {r4-r5}
>>>        fldd    d16, [sp, #32]
>>>        vmov    d18, r4, r5  @ v8qi
>>>        vmull.u8        q9, d19, d18
>>>        adr     r5, .L6+8
>>>        ldmia   r5, {r4-r5}
>>>        vmov    d17, r4, r5  @ v8qi
>>>        vstmia  sp, {d18-d19}
>>>        vmlal.u8        q9, d16, d17
>>>        fldd    d16, [sp, #40]
>>>        adr     r5, .L6+16
>>>        ldmia   r5, {r4-r5}
>>>        vmov    d17, r4, r5  @ v8qi
>>>        vmlal.u8        q9, d16, d17
>>>        add     r3, r3, #1
>>>        vshrn.i16       d16, q9, #8
>>>        cmp     r3, r2
>>>        vst1.8  {d16}, [r0]!
>>>        bne     .L3
>>>
>>> With both patches applied, the inner loop is:
>>>
>>> .L3:
>>>        vld3.8  {d18-d20}, [r1]!
>>>        vmull.u8        q8, d18, d21
>>>        vmlal.u8        q8, d19, d22
>>>        vmlal.u8        q8, d20, d23
>>>        add     r3, r3, #1
>>>        vshrn.i16       d16, q8, #8
>>>        cmp     r3, r2
>>>        vst1.8  {d16}, [r0]!
>>>        bne     .L3
>>>
>>> Tested on arm-linux-gnueabi.  OK to install?
>>
>> It looks reasonable given the past discussion, but - can you move forward
>> with the Neon stuff a bit to see if it really fits?  Or is this all
>> that is needed
>> for the load/store lane support as well (apart from vectorizer changes of
>> course).
>
> Yeah, I have a prototype that hacks up some C support for generating the
> (otherwise internal-only) load/store built-in functions that the vectoriser
> is suppsoed to generate.  This patch is all that seems to be needed for the
> types and optabs generation to work in the natural way.
>
> I'm happy to leave it until the vectoriser stuff is in a more
> submittable state though.

The vectorisation stuff has now been approved and uses this hook to
detect whether interleaved loads & stores are supported.  Also...

> Especially given:
>
>> Can you check the code generated by for example
>>
>> float foo(char *p)
>> {
>>   float a[2];
>>   int i;
>>   ((char *)a)[0] = p[0];
>>   ((char *)a)[1] = p[1];
>>   ((char *)a)[2] = p[2];
>>   ((char *)a)[3] = p[3];
>>   ((char *)a)[4] = p[4];
>>   ((char *)a)[5] = p[5];
>>   ((char *)a)[6] = p[6];
>>   ((char *)a)[7] = p[7];
>>   return a[0] + a[1];
>> }
>>
>> for an array a that would get such a larger mode?  Thus, check what
>> happens with partial defs of different types (just to avoid ICEs like the
>> ones Jakub was fixing yesterday).
>
> OK, I tried:
>
> #include "arm_neon.h"
>
> uint32x2_t foo(char *p)
> {
>   uint32x2_t a[2];
>   int i;
>   ((char *)a)[0] = p[0];
>   ((char *)a)[1] = p[1];
>   ((char *)a)[2] = p[2];
>   ((char *)a)[3] = p[3];
>   ((char *)a)[4] = p[4];
>   ((char *)a)[5] = p[5];
>   ((char *)a)[6] = p[6];
>   ((char *)a)[7] = p[7];
>   ((char *)a)[8] = p[8];
>   ((char *)a)[9] = p[9];
>   ((char *)a)[10] = p[10];
>   ((char *)a)[11] = p[11];
>   ((char *)a)[12] = p[12];
>   ((char *)a)[13] = p[13];
>   ((char *)a)[14] = p[14];
>   ((char *)a)[15] = p[15];
>   return vadd_u32 (a[0], a[1]);
> }
>
> uint32x4_t bar(char *p, uint32x4_t *b)
> {
>   uint32x4_t a[2];
>   int i;
>   ((char *)a)[0] = p[0];
>   ((char *)a)[1] = p[1];
>   ((char *)a)[2] = p[2];
>   ((char *)a)[3] = p[3];
>   ((char *)a)[4] = p[4];
>   ((char *)a)[5] = p[5];
>   ((char *)a)[6] = p[6];
>   ((char *)a)[7] = p[7];
>   ((char *)a)[8] = p[8];
>   ((char *)a)[9] = p[9];
>   ((char *)a)[10] = p[10];
>   ((char *)a)[11] = p[11];
>   ((char *)a)[12] = p[12];
>   ((char *)a)[13] = p[13];
>   ((char *)a)[14] = p[14];
>   ((char *)a)[15] = p[15];
>   ((char *)a)[16 + 0] = p[16 + 0];
>   ((char *)a)[16 + 1] = p[16 + 1];
>   ((char *)a)[16 + 2] = p[16 + 2];
>   ((char *)a)[16 + 3] = p[16 + 3];
>   ((char *)a)[16 + 4] = p[16 + 4];
>   ((char *)a)[16 + 5] = p[16 + 5];
>   ((char *)a)[16 + 6] = p[16 + 6];
>   ((char *)a)[16 + 7] = p[16 + 7];
>   ((char *)a)[16 + 8] = p[16 + 8];
>   ((char *)a)[16 + 9] = p[16 + 9];
>   ((char *)a)[16 + 10] = p[16 + 10];
>   ((char *)a)[16 + 11] = p[16 + 11];
>   ((char *)a)[16 + 12] = p[16 + 12];
>   ((char *)a)[16 + 13] = p[16 + 13];
>   ((char *)a)[16 + 14] = p[16 + 14];
>   ((char *)a)[16 + 15] = p[16 + 15];
>   return vaddq_u32 (a[0], a[1]);
> }
>
> It seemed to avoid the problem Jakub was seeing, but the second function
> hit the known const_int reload failure for these modes:
>
>     http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46329

...I've just committed the fix for this PR.  Thanks to everyone for
all the reviews.

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  Do the
target-independent bits look OK?  How about the ARM bits?

Thanks,
Richard


gcc/
	* hooks.h (hook_bool_mode_uhwi_false): Declare.
	* hooks.c (hook_bool_mode_uhwi_false): New function.
	* target.def (array_mode_supported_p): New hook.
	* doc/tm.texi.in (TARGET_ARRAY_MODE_SUPPORTED_P): Add @hook.
	* doc/tm.texi: Regenerate.
	* stor-layout.c (mode_for_array): New function.
	(layout_type): Use it.
	* config/arm/arm.c (arm_array_mode_supported_p): New function.
	(TARGET_ARRAY_MODE_SUPPORTED_P): Define.

Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h	2011-04-21 10:47:30.000000000 +0100
+++ gcc/hooks.h	2011-04-21 10:47:48.000000000 +0100
@@ -36,6 +36,8 @@ extern bool hook_bool_mode_const_rtx_fal
 extern bool hook_bool_mode_const_rtx_true (enum machine_mode, const_rtx);
 extern bool hook_bool_mode_rtx_false (enum machine_mode, rtx);
 extern bool hook_bool_mode_rtx_true (enum machine_mode, rtx);
+extern bool hook_bool_mode_uhwi_false (enum machine_mode,
+				       unsigned HOST_WIDE_INT);
 extern bool hook_bool_tree_false (tree);
 extern bool hook_bool_const_tree_false (const_tree);
 extern bool hook_bool_tree_true (tree);
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c	2011-04-21 10:47:30.000000000 +0100
+++ gcc/hooks.c	2011-04-21 10:47:48.000000000 +0100
@@ -117,6 +117,15 @@ hook_bool_mode_rtx_true (enum machine_mo
   return true;
 }
 
+/* Generic hook that takes (enum machine_mode, unsigned HOST_WIDE_INT)
+   and returns false.  */
+bool
+hook_bool_mode_uhwi_false (enum machine_mode mode ATTRIBUTE_UNUSED,
+			   unsigned HOST_WIDE_INT value ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 /* Generic hook that takes (FILE *, const char *) and does nothing.  */
 void
 hook_void_FILEptr_constcharptr (FILE *a ATTRIBUTE_UNUSED, const char *b ATTRIBUTE_UNUSED)
Index: gcc/target.def
===================================================================
--- gcc/target.def	2011-04-21 10:47:30.000000000 +0100
+++ gcc/target.def	2011-04-21 10:47:48.000000000 +0100
@@ -1565,6 +1565,38 @@ DEFHOOK
  bool, (enum machine_mode mode),
  hook_bool_mode_false)
 
+/* True if we should try to use a scalar mode to represent an array,
+   overriding the usual MAX_FIXED_MODE limit.  */
+DEFHOOK
+(array_mode_supported_p,
+ "Return true if GCC should try to use a scalar mode to store an array\n\
+of @var{nelems} elements, given that each element has mode @var{mode}.\n\
+Returning true here overrides the usual @code{MAX_FIXED_MODE} limit\n\
+and allows GCC to use any defined integer mode.\n\
+\n\
+One use of this hook is to support vector load and store operations\n\
+that operate on several homogeneous vectors.  For example, ARM NEON\n\
+has operations like:\n\
+\n\
+@smallexample\n\
+int8x8x3_t vld3_s8 (const int8_t *)\n\
+@end smallexample\n\
+\n\
+where the return type is defined as:\n\
+\n\
+@smallexample\n\
+typedef struct int8x8x3_t\n\
+@{\n\
+  int8x8_t val[3];\n\
+@} int8x8x3_t;\n\
+@end smallexample\n\
+\n\
+If this hook allows @code{val} to have a scalar mode, then\n\
+@code{int8x8x3_t} can have the same mode.  GCC can then store\n\
+@code{int8x8x3_t}s in registers rather than forcing them onto the stack.",
+ bool, (enum machine_mode mode, unsigned HOST_WIDE_INT nelems),
+ hook_bool_mode_uhwi_false)
+
 /* Compute cost of moving data from a register of class FROM to one of
    TO, using MODE.  */
 DEFHOOK
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2011-04-21 10:47:30.000000000 +0100
+++ gcc/doc/tm.texi.in	2011-04-21 10:47:48.000000000 +0100
@@ -4263,6 +4263,8 @@ insns involving vector mode @var{mode}. 
 must have move patterns for this mode.
 @end deftypefn
 
+@hook TARGET_ARRAY_MODE_SUPPORTED_P
+
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 Define this to return nonzero for machine modes for which the port has
 small register classes.  If this target hook returns nonzero for a given
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2011-04-21 10:47:30.000000000 +0100
+++ gcc/doc/tm.texi	2011-04-21 10:47:48.000000000 +0100
@@ -4277,6 +4277,34 @@ insns involving vector mode @var{mode}. 
 must have move patterns for this mode.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_ARRAY_MODE_SUPPORTED_P (enum machine_mode @var{mode}, unsigned HOST_WIDE_INT @var{nelems})
+Return true if GCC should try to use a scalar mode to store an array
+of @var{nelems} elements, given that each element has mode @var{mode}.
+Returning true here overrides the usual @code{MAX_FIXED_MODE} limit
+and allows GCC to use any defined integer mode.
+
+One use of this hook is to support vector load and store operations
+that operate on several homogeneous vectors.  For example, ARM NEON
+has operations like:
+
+@smallexample
+int8x8x3_t vld3_s8 (const int8_t *)
+@end smallexample
+
+where the return type is defined as:
+
+@smallexample
+typedef struct int8x8x3_t
+@{
+  int8x8_t val[3];
+@} int8x8x3_t;
+@end smallexample
+
+If this hook allows @code{val} to have a scalar mode, then
+@code{int8x8x3_t} can have the same mode.  GCC can then store
+@code{int8x8x3_t}s in registers rather than forcing them onto the stack.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P (enum machine_mode @var{mode})
 Define this to return nonzero for machine modes for which the port has
 small register classes.  If this target hook returns nonzero for a given
Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c	2011-04-21 10:47:30.000000000 +0100
+++ gcc/stor-layout.c	2011-04-21 10:47:48.000000000 +0100
@@ -546,6 +546,34 @@ get_mode_alignment (enum machine_mode mo
   return MIN (BIGGEST_ALIGNMENT, MAX (1, mode_base_align[mode]*BITS_PER_UNIT));
 }
 
+/* Return the natural mode of an array, given that it is SIZE bytes in
+   total and has elements of type ELEM_TYPE.  */
+
+static enum machine_mode
+mode_for_array (tree elem_type, tree size)
+{
+  tree elem_size;
+  unsigned HOST_WIDE_INT int_size, int_elem_size;
+  bool limit_p;
+
+  /* One-element arrays get the component type's mode.  */
+  elem_size = TYPE_SIZE (elem_type);
+  if (simple_cst_equal (size, elem_size))
+    return TYPE_MODE (elem_type);
+
+  limit_p = true;
+  if (host_integerp (size, 1) && host_integerp (elem_size, 1))
+    {
+      int_size = tree_low_cst (size, 1);
+      int_elem_size = tree_low_cst (elem_size, 1);
+      if (int_elem_size > 0
+	  && int_size % int_elem_size == 0
+	  && targetm.array_mode_supported_p (TYPE_MODE (elem_type),
+					     int_size / int_elem_size))
+	limit_p = false;
+    }
+  return mode_for_size_tree (size, MODE_INT, limit_p);
+}
 \f
 /* Subroutine of layout_decl: Force alignment required for the data type.
    But if the decl itself wants greater alignment, don't override that.  */
@@ -2040,14 +2068,8 @@ layout_type (tree type)
 	    && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
 		|| TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
 	  {
-	    /* One-element arrays get the component type's mode.  */
-	    if (simple_cst_equal (TYPE_SIZE (type),
-				  TYPE_SIZE (TREE_TYPE (type))))
-	      SET_TYPE_MODE (type, TYPE_MODE (TREE_TYPE (type)));
-	    else
-	      SET_TYPE_MODE (type, mode_for_size_tree (TYPE_SIZE (type),
-						       MODE_INT, 1));
-
+	    SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
+						 TYPE_SIZE (type)));
 	    if (TYPE_MODE (type) != BLKmode
 		&& STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
 		&& TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2011-04-21 10:47:30.000000000 +0100
+++ gcc/config/arm/arm.c	2011-04-21 10:47:48.000000000 +0100
@@ -243,6 +243,8 @@ static rtx arm_pic_static_addr (rtx orig
 static bool cortex_a9_sched_adjust_cost (rtx, rtx, rtx, int *);
 static bool xscale_sched_adjust_cost (rtx, rtx, rtx, int *);
 static bool fa726te_sched_adjust_cost (rtx, rtx, rtx, int *);
+static bool arm_array_mode_supported_p (enum machine_mode,
+					unsigned HOST_WIDE_INT);
 static enum machine_mode arm_preferred_simd_mode (enum machine_mode);
 static bool arm_class_likely_spilled_p (reg_class_t);
 static bool arm_vector_alignment_reachable (const_tree type, bool is_packed);
@@ -399,6 +401,8 @@ #define TARGET_ADDRESS_COST arm_address_
 #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
+#undef TARGET_ARRAY_MODE_SUPPORTED_P
+#define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
 #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
@@ -22514,6 +22518,20 @@ arm_vector_mode_supported_p (enum machin
   return false;
 }
 
+/* Implements target hook array_mode_supported_p.  */
+
+static bool
+arm_array_mode_supported_p (enum machine_mode mode,
+			    unsigned HOST_WIDE_INT nelems)
+{
+  if (TARGET_NEON
+      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
+      && (nelems >= 2 && nelems <= 4))
+    return true;
+
+  return false;
+}
+
 /* Use the option -mvectorize-with-neon-quad to override the use of doubleword
    registers when autovectorizing for Neon, at least until multiple vector
    widths are supported properly by the middle-end.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Add an array_mode_supported_p target hook
  2011-04-21 10:43     ` Richard Sandiford
@ 2011-04-21 11:28       ` Richard Guenther
  2011-05-06 10:29       ` Richard Earnshaw
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Guenther @ 2011-04-21 11:28 UTC (permalink / raw)
  To: Richard Guenther, gcc-patches, nickc, richard.earnshaw, paul,
	ramana.radhakrishnan, richard.sandiford

On Thu, Apr 21, 2011 at 11:50 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> To get back to this...
>
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> Richard Guenther <richard.guenther@gmail.com> writes:
>>> On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch adds an array_mode_supported_p hook, which says whether
>>>> MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
>>>> It follows on from the discussion here:
>>>>
>>>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
>>>>
>>>> The intended use of the hook is to allow small arrays of vectors
>>>> to have a non-BLK mode, and hence to be stored in rtl registers.
>>>> These arrays are used both in the ARM arm_neon.h API and in the
>>>> optabs proposed in:
>>>>
>>>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
>>>>
>>>> The tail end of the thread was about the definition of TYPE_MODE:
>>>>
>>>> #define TYPE_MODE(NODE) \
>>>>  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
>>>>   ? vector_type_mode (NODE) : (NODE)->type.mode)
>>>>
>>>> with this outcome:
>>>>
>>>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html
>>>>
>>>> To summarise my take on it:
>>>>
>>>> - The current definition of TYPE_MODE isn't sufficient even for vector
>>>>  modes and vector_mode_supported_p, because non-vector types can have
>>>>  vector modes.
>>>>
>>>> - We should no longer treat types as having one mode everywhere.
>>>>  We should instead replace TYPE_MODE with a function that takes
>>>>  a context.  Tests of things like vector_mode_supported_p would
>>>>  move from layout_type to this new function.
>>>>
>>>> I think this patch fits within that scheme.  array_mode_supported_p
>>>> would be treated in the same way as vector_mode_supported_p.
>>>>
>>>> I realise the ideal would be to get rid of TYPE_MODE first.
>>>> But that's going to be a longer-term thing.  Now that there's
>>>> at least a plan, I'd like to press ahead with the array stuff
>>>> on the basis that
>>>>
>>>> (a) although the new hook won't work with the "target" attribute,
>>>>    our current mode handling doesn't work in just the same way.
>>>>
>>>> (b) the new hook doesn't interfere with the plan.
>>>>
>>>> (c) getting good code from the intrinsics (and support for these
>>>>    instructions in the vectoriser) is going to be much more important
>>>>    to most ARM users than the ability to turn Neon on and off for
>>>>    individual functions in a TU.
>>>>
>>>> To give an example of the difference, the Neon code posted here:
>>>>
>>>>    http://hilbert-space.de/?p=22
>>>>
>>>> produces this inner loop before the patch (but with
>>>> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):
>>>>
>>>> .L3:
>>>>        vld3.8  {d16-d18}, [r1]!
>>>>        vstmia  ip, {d16-d18}
>>>>        fldd    d19, [sp, #24]
>>>>        adr     r5, .L6
>>>>        ldmia   r5, {r4-r5}
>>>>        fldd    d16, [sp, #32]
>>>>        vmov    d18, r4, r5  @ v8qi
>>>>        vmull.u8        q9, d19, d18
>>>>        adr     r5, .L6+8
>>>>        ldmia   r5, {r4-r5}
>>>>        vmov    d17, r4, r5  @ v8qi
>>>>        vstmia  sp, {d18-d19}
>>>>        vmlal.u8        q9, d16, d17
>>>>        fldd    d16, [sp, #40]
>>>>        adr     r5, .L6+16
>>>>        ldmia   r5, {r4-r5}
>>>>        vmov    d17, r4, r5  @ v8qi
>>>>        vmlal.u8        q9, d16, d17
>>>>        add     r3, r3, #1
>>>>        vshrn.i16       d16, q9, #8
>>>>        cmp     r3, r2
>>>>        vst1.8  {d16}, [r0]!
>>>>        bne     .L3
>>>>
>>>> With both patches applied, the inner loop is:
>>>>
>>>> .L3:
>>>>        vld3.8  {d18-d20}, [r1]!
>>>>        vmull.u8        q8, d18, d21
>>>>        vmlal.u8        q8, d19, d22
>>>>        vmlal.u8        q8, d20, d23
>>>>        add     r3, r3, #1
>>>>        vshrn.i16       d16, q8, #8
>>>>        cmp     r3, r2
>>>>        vst1.8  {d16}, [r0]!
>>>>        bne     .L3
>>>>
>>>> Tested on arm-linux-gnueabi.  OK to install?
>>>
>>> It looks reasonable given the past discussion, but - can you move forward
>>> with the Neon stuff a bit to see if it really fits?  Or is this all
>>> that is needed
>>> for the load/store lane support as well (apart from vectorizer changes of
>>> course).
>>
>> Yeah, I have a prototype that hacks up some C support for generating the
>> (otherwise internal-only) load/store built-in functions that the vectoriser
>> is suppsoed to generate.  This patch is all that seems to be needed for the
>> types and optabs generation to work in the natural way.
>>
>> I'm happy to leave it until the vectoriser stuff is in a more
>> submittable state though.
>
> The vectorisation stuff has now been approved and uses this hook to
> detect whether interleaved loads & stores are supported.  Also...
>
>> Especially given:
>>
>>> Can you check the code generated by for example
>>>
>>> float foo(char *p)
>>> {
>>>   float a[2];
>>>   int i;
>>>   ((char *)a)[0] = p[0];
>>>   ((char *)a)[1] = p[1];
>>>   ((char *)a)[2] = p[2];
>>>   ((char *)a)[3] = p[3];
>>>   ((char *)a)[4] = p[4];
>>>   ((char *)a)[5] = p[5];
>>>   ((char *)a)[6] = p[6];
>>>   ((char *)a)[7] = p[7];
>>>   return a[0] + a[1];
>>> }
>>>
>>> for an array a that would get such a larger mode?  Thus, check what
>>> happens with partial defs of different types (just to avoid ICEs like the
>>> ones Jakub was fixing yesterday).
>>
>> OK, I tried:
>>
>> #include "arm_neon.h"
>>
>> uint32x2_t foo(char *p)
>> {
>>   uint32x2_t a[2];
>>   int i;
>>   ((char *)a)[0] = p[0];
>>   ((char *)a)[1] = p[1];
>>   ((char *)a)[2] = p[2];
>>   ((char *)a)[3] = p[3];
>>   ((char *)a)[4] = p[4];
>>   ((char *)a)[5] = p[5];
>>   ((char *)a)[6] = p[6];
>>   ((char *)a)[7] = p[7];
>>   ((char *)a)[8] = p[8];
>>   ((char *)a)[9] = p[9];
>>   ((char *)a)[10] = p[10];
>>   ((char *)a)[11] = p[11];
>>   ((char *)a)[12] = p[12];
>>   ((char *)a)[13] = p[13];
>>   ((char *)a)[14] = p[14];
>>   ((char *)a)[15] = p[15];
>>   return vadd_u32 (a[0], a[1]);
>> }
>>
>> uint32x4_t bar(char *p, uint32x4_t *b)
>> {
>>   uint32x4_t a[2];
>>   int i;
>>   ((char *)a)[0] = p[0];
>>   ((char *)a)[1] = p[1];
>>   ((char *)a)[2] = p[2];
>>   ((char *)a)[3] = p[3];
>>   ((char *)a)[4] = p[4];
>>   ((char *)a)[5] = p[5];
>>   ((char *)a)[6] = p[6];
>>   ((char *)a)[7] = p[7];
>>   ((char *)a)[8] = p[8];
>>   ((char *)a)[9] = p[9];
>>   ((char *)a)[10] = p[10];
>>   ((char *)a)[11] = p[11];
>>   ((char *)a)[12] = p[12];
>>   ((char *)a)[13] = p[13];
>>   ((char *)a)[14] = p[14];
>>   ((char *)a)[15] = p[15];
>>   ((char *)a)[16 + 0] = p[16 + 0];
>>   ((char *)a)[16 + 1] = p[16 + 1];
>>   ((char *)a)[16 + 2] = p[16 + 2];
>>   ((char *)a)[16 + 3] = p[16 + 3];
>>   ((char *)a)[16 + 4] = p[16 + 4];
>>   ((char *)a)[16 + 5] = p[16 + 5];
>>   ((char *)a)[16 + 6] = p[16 + 6];
>>   ((char *)a)[16 + 7] = p[16 + 7];
>>   ((char *)a)[16 + 8] = p[16 + 8];
>>   ((char *)a)[16 + 9] = p[16 + 9];
>>   ((char *)a)[16 + 10] = p[16 + 10];
>>   ((char *)a)[16 + 11] = p[16 + 11];
>>   ((char *)a)[16 + 12] = p[16 + 12];
>>   ((char *)a)[16 + 13] = p[16 + 13];
>>   ((char *)a)[16 + 14] = p[16 + 14];
>>   ((char *)a)[16 + 15] = p[16 + 15];
>>   return vaddq_u32 (a[0], a[1]);
>> }
>>
>> It seemed to avoid the problem Jakub was seeing, but the second function
>> hit the known const_int reload failure for these modes:
>>
>>     http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46329
>
> ...I've just committed the fix for this PR.  Thanks to everyone for
> all the reviews.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  Do the
> target-independent bits look OK?  How about the ARM bits?

The middle-end pieces look OK.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
>        * hooks.h (hook_bool_mode_uhwi_false): Declare.
>        * hooks.c (hook_bool_mode_uhwi_false): New function.
>        * target.def (array_mode_supported_p): New hook.
>        * doc/tm.texi.in (TARGET_ARRAY_MODE_SUPPORTED_P): Add @hook.
>        * doc/tm.texi: Regenerate.
>        * stor-layout.c (mode_for_array): New function.
>        (layout_type): Use it.
>        * config/arm/arm.c (arm_array_mode_supported_p): New function.
>        (TARGET_ARRAY_MODE_SUPPORTED_P): Define.
>
> Index: gcc/hooks.h
> ===================================================================
> --- gcc/hooks.h 2011-04-21 10:47:30.000000000 +0100
> +++ gcc/hooks.h 2011-04-21 10:47:48.000000000 +0100
> @@ -36,6 +36,8 @@ extern bool hook_bool_mode_const_rtx_fal
>  extern bool hook_bool_mode_const_rtx_true (enum machine_mode, const_rtx);
>  extern bool hook_bool_mode_rtx_false (enum machine_mode, rtx);
>  extern bool hook_bool_mode_rtx_true (enum machine_mode, rtx);
> +extern bool hook_bool_mode_uhwi_false (enum machine_mode,
> +                                      unsigned HOST_WIDE_INT);
>  extern bool hook_bool_tree_false (tree);
>  extern bool hook_bool_const_tree_false (const_tree);
>  extern bool hook_bool_tree_true (tree);
> Index: gcc/hooks.c
> ===================================================================
> --- gcc/hooks.c 2011-04-21 10:47:30.000000000 +0100
> +++ gcc/hooks.c 2011-04-21 10:47:48.000000000 +0100
> @@ -117,6 +117,15 @@ hook_bool_mode_rtx_true (enum machine_mo
>   return true;
>  }
>
> +/* Generic hook that takes (enum machine_mode, unsigned HOST_WIDE_INT)
> +   and returns false.  */
> +bool
> +hook_bool_mode_uhwi_false (enum machine_mode mode ATTRIBUTE_UNUSED,
> +                          unsigned HOST_WIDE_INT value ATTRIBUTE_UNUSED)
> +{
> +  return false;
> +}
> +
>  /* Generic hook that takes (FILE *, const char *) and does nothing.  */
>  void
>  hook_void_FILEptr_constcharptr (FILE *a ATTRIBUTE_UNUSED, const char *b ATTRIBUTE_UNUSED)
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def      2011-04-21 10:47:30.000000000 +0100
> +++ gcc/target.def      2011-04-21 10:47:48.000000000 +0100
> @@ -1565,6 +1565,38 @@ DEFHOOK
>  bool, (enum machine_mode mode),
>  hook_bool_mode_false)
>
> +/* True if we should try to use a scalar mode to represent an array,
> +   overriding the usual MAX_FIXED_MODE limit.  */
> +DEFHOOK
> +(array_mode_supported_p,
> + "Return true if GCC should try to use a scalar mode to store an array\n\
> +of @var{nelems} elements, given that each element has mode @var{mode}.\n\
> +Returning true here overrides the usual @code{MAX_FIXED_MODE} limit\n\
> +and allows GCC to use any defined integer mode.\n\
> +\n\
> +One use of this hook is to support vector load and store operations\n\
> +that operate on several homogeneous vectors.  For example, ARM NEON\n\
> +has operations like:\n\
> +\n\
> +@smallexample\n\
> +int8x8x3_t vld3_s8 (const int8_t *)\n\
> +@end smallexample\n\
> +\n\
> +where the return type is defined as:\n\
> +\n\
> +@smallexample\n\
> +typedef struct int8x8x3_t\n\
> +@{\n\
> +  int8x8_t val[3];\n\
> +@} int8x8x3_t;\n\
> +@end smallexample\n\
> +\n\
> +If this hook allows @code{val} to have a scalar mode, then\n\
> +@code{int8x8x3_t} can have the same mode.  GCC can then store\n\
> +@code{int8x8x3_t}s in registers rather than forcing them onto the stack.",
> + bool, (enum machine_mode mode, unsigned HOST_WIDE_INT nelems),
> + hook_bool_mode_uhwi_false)
> +
>  /* Compute cost of moving data from a register of class FROM to one of
>    TO, using MODE.  */
>  DEFHOOK
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in  2011-04-21 10:47:30.000000000 +0100
> +++ gcc/doc/tm.texi.in  2011-04-21 10:47:48.000000000 +0100
> @@ -4263,6 +4263,8 @@ insns involving vector mode @var{mode}.
>  must have move patterns for this mode.
>  @end deftypefn
>
> +@hook TARGET_ARRAY_MODE_SUPPORTED_P
> +
>  @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
>  Define this to return nonzero for machine modes for which the port has
>  small register classes.  If this target hook returns nonzero for a given
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi     2011-04-21 10:47:30.000000000 +0100
> +++ gcc/doc/tm.texi     2011-04-21 10:47:48.000000000 +0100
> @@ -4277,6 +4277,34 @@ insns involving vector mode @var{mode}.
>  must have move patterns for this mode.
>  @end deftypefn
>
> +@deftypefn {Target Hook} bool TARGET_ARRAY_MODE_SUPPORTED_P (enum machine_mode @var{mode}, unsigned HOST_WIDE_INT @var{nelems})
> +Return true if GCC should try to use a scalar mode to store an array
> +of @var{nelems} elements, given that each element has mode @var{mode}.
> +Returning true here overrides the usual @code{MAX_FIXED_MODE} limit
> +and allows GCC to use any defined integer mode.
> +
> +One use of this hook is to support vector load and store operations
> +that operate on several homogeneous vectors.  For example, ARM NEON
> +has operations like:
> +
> +@smallexample
> +int8x8x3_t vld3_s8 (const int8_t *)
> +@end smallexample
> +
> +where the return type is defined as:
> +
> +@smallexample
> +typedef struct int8x8x3_t
> +@{
> +  int8x8_t val[3];
> +@} int8x8x3_t;
> +@end smallexample
> +
> +If this hook allows @code{val} to have a scalar mode, then
> +@code{int8x8x3_t} can have the same mode.  GCC can then store
> +@code{int8x8x3_t}s in registers rather than forcing them onto the stack.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} bool TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P (enum machine_mode @var{mode})
>  Define this to return nonzero for machine modes for which the port has
>  small register classes.  If this target hook returns nonzero for a given
> Index: gcc/stor-layout.c
> ===================================================================
> --- gcc/stor-layout.c   2011-04-21 10:47:30.000000000 +0100
> +++ gcc/stor-layout.c   2011-04-21 10:47:48.000000000 +0100
> @@ -546,6 +546,34 @@ get_mode_alignment (enum machine_mode mo
>   return MIN (BIGGEST_ALIGNMENT, MAX (1, mode_base_align[mode]*BITS_PER_UNIT));
>  }
>
> +/* Return the natural mode of an array, given that it is SIZE bytes in
> +   total and has elements of type ELEM_TYPE.  */
> +
> +static enum machine_mode
> +mode_for_array (tree elem_type, tree size)
> +{
> +  tree elem_size;
> +  unsigned HOST_WIDE_INT int_size, int_elem_size;
> +  bool limit_p;
> +
> +  /* One-element arrays get the component type's mode.  */
> +  elem_size = TYPE_SIZE (elem_type);
> +  if (simple_cst_equal (size, elem_size))
> +    return TYPE_MODE (elem_type);
> +
> +  limit_p = true;
> +  if (host_integerp (size, 1) && host_integerp (elem_size, 1))
> +    {
> +      int_size = tree_low_cst (size, 1);
> +      int_elem_size = tree_low_cst (elem_size, 1);
> +      if (int_elem_size > 0
> +         && int_size % int_elem_size == 0
> +         && targetm.array_mode_supported_p (TYPE_MODE (elem_type),
> +                                            int_size / int_elem_size))
> +       limit_p = false;
> +    }
> +  return mode_for_size_tree (size, MODE_INT, limit_p);
> +}
>
>  /* Subroutine of layout_decl: Force alignment required for the data type.
>    But if the decl itself wants greater alignment, don't override that.  */
> @@ -2040,14 +2068,8 @@ layout_type (tree type)
>            && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
>                || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
>          {
> -           /* One-element arrays get the component type's mode.  */
> -           if (simple_cst_equal (TYPE_SIZE (type),
> -                                 TYPE_SIZE (TREE_TYPE (type))))
> -             SET_TYPE_MODE (type, TYPE_MODE (TREE_TYPE (type)));
> -           else
> -             SET_TYPE_MODE (type, mode_for_size_tree (TYPE_SIZE (type),
> -                                                      MODE_INT, 1));
> -
> +           SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> +                                                TYPE_SIZE (type)));
>            if (TYPE_MODE (type) != BLKmode
>                && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
>                && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c        2011-04-21 10:47:30.000000000 +0100
> +++ gcc/config/arm/arm.c        2011-04-21 10:47:48.000000000 +0100
> @@ -243,6 +243,8 @@ static rtx arm_pic_static_addr (rtx orig
>  static bool cortex_a9_sched_adjust_cost (rtx, rtx, rtx, int *);
>  static bool xscale_sched_adjust_cost (rtx, rtx, rtx, int *);
>  static bool fa726te_sched_adjust_cost (rtx, rtx, rtx, int *);
> +static bool arm_array_mode_supported_p (enum machine_mode,
> +                                       unsigned HOST_WIDE_INT);
>  static enum machine_mode arm_preferred_simd_mode (enum machine_mode);
>  static bool arm_class_likely_spilled_p (reg_class_t);
>  static bool arm_vector_alignment_reachable (const_tree type, bool is_packed);
> @@ -399,6 +401,8 @@ #define TARGET_ADDRESS_COST arm_address_
>  #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask
>  #undef TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
> +#undef TARGET_ARRAY_MODE_SUPPORTED_P
> +#define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
>  #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> @@ -22514,6 +22518,20 @@ arm_vector_mode_supported_p (enum machin
>   return false;
>  }
>
> +/* Implements target hook array_mode_supported_p.  */
> +
> +static bool
> +arm_array_mode_supported_p (enum machine_mode mode,
> +                           unsigned HOST_WIDE_INT nelems)
> +{
> +  if (TARGET_NEON
> +      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
> +      && (nelems >= 2 && nelems <= 4))
> +    return true;
> +
> +  return false;
> +}
> +
>  /* Use the option -mvectorize-with-neon-quad to override the use of doubleword
>    registers when autovectorizing for Neon, at least until multiple vector
>    widths are supported properly by the middle-end.  */
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Add an array_mode_supported_p target hook
  2011-04-21 10:43     ` Richard Sandiford
  2011-04-21 11:28       ` Richard Guenther
@ 2011-05-06 10:29       ` Richard Earnshaw
  2011-05-06 10:44         ` Richard Sandiford
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Earnshaw @ 2011-05-06 10:29 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Richard Guenther, gcc-patches, nickc, paul, ramana.radhakrishnan


On Thu, 2011-04-21 at 10:50 +0100, Richard Sandiford wrote:
> To get back to this...
> 
> Richard Sandiford <richard.sandiford@linaro.org> writes:
> > Richard Guenther <richard.guenther@gmail.com> writes:
> >> On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
> >> <richard.sandiford@linaro.org> wrote:
> >>> This patch adds an array_mode_supported_p hook, which says whether
> >>> MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
> >>> It follows on from the discussion here:
> >>>
> >>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
> >>>
> >>> The intended use of the hook is to allow small arrays of vectors
> >>> to have a non-BLK mode, and hence to be stored in rtl registers.
> >>> These arrays are used both in the ARM arm_neon.h API and in the
> >>> optabs proposed in:
> >>>
> >>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
> >>>
> >>> The tail end of the thread was about the definition of TYPE_MODE:
> >>>
> >>> #define TYPE_MODE(NODE) \
> >>>  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
> >>>   ? vector_type_mode (NODE) : (NODE)->type.mode)
> >>>
> >>> with this outcome:
> >>>
> >>>    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html
> >>>
> >>> To summarise my take on it:
> >>>
> >>> - The current definition of TYPE_MODE isn't sufficient even for vector
> >>>  modes and vector_mode_supported_p, because non-vector types can have
> >>>  vector modes.
> >>>
> >>> - We should no longer treat types as having one mode everywhere.
> >>>  We should instead replace TYPE_MODE with a function that takes
> >>>  a context.  Tests of things like vector_mode_supported_p would
> >>>  move from layout_type to this new function.
> >>>
> >>> I think this patch fits within that scheme.  array_mode_supported_p
> >>> would be treated in the same way as vector_mode_supported_p.
> >>>
> >>> I realise the ideal would be to get rid of TYPE_MODE first.
> >>> But that's going to be a longer-term thing.  Now that there's
> >>> at least a plan, I'd like to press ahead with the array stuff
> >>> on the basis that
> >>>
> >>> (a) although the new hook won't work with the "target" attribute,
> >>>    our current mode handling doesn't work in just the same way.
> >>>
> >>> (b) the new hook doesn't interfere with the plan.
> >>>
> >>> (c) getting good code from the intrinsics (and support for these
> >>>    instructions in the vectoriser) is going to be much more important
> >>>    to most ARM users than the ability to turn Neon on and off for
> >>>    individual functions in a TU.
> >>>
> >>> To give an example of the difference, the Neon code posted here:
> >>>
> >>>    http://hilbert-space.de/?p=22
> >>>
> >>> produces this inner loop before the patch (but with
> >>> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):
> >>>
> >>> .L3:
> >>>        vld3.8  {d16-d18}, [r1]!
> >>>        vstmia  ip, {d16-d18}
> >>>        fldd    d19, [sp, #24]
> >>>        adr     r5, .L6
> >>>        ldmia   r5, {r4-r5}
> >>>        fldd    d16, [sp, #32]
> >>>        vmov    d18, r4, r5  @ v8qi
> >>>        vmull.u8        q9, d19, d18
> >>>        adr     r5, .L6+8
> >>>        ldmia   r5, {r4-r5}
> >>>        vmov    d17, r4, r5  @ v8qi
> >>>        vstmia  sp, {d18-d19}
> >>>        vmlal.u8        q9, d16, d17
> >>>        fldd    d16, [sp, #40]
> >>>        adr     r5, .L6+16
> >>>        ldmia   r5, {r4-r5}
> >>>        vmov    d17, r4, r5  @ v8qi
> >>>        vmlal.u8        q9, d16, d17
> >>>        add     r3, r3, #1
> >>>        vshrn.i16       d16, q9, #8
> >>>        cmp     r3, r2
> >>>        vst1.8  {d16}, [r0]!
> >>>        bne     .L3
> >>>
> >>> With both patches applied, the inner loop is:
> >>>
> >>> .L3:
> >>>        vld3.8  {d18-d20}, [r1]!
> >>>        vmull.u8        q8, d18, d21
> >>>        vmlal.u8        q8, d19, d22
> >>>        vmlal.u8        q8, d20, d23
> >>>        add     r3, r3, #1
> >>>        vshrn.i16       d16, q8, #8
> >>>        cmp     r3, r2
> >>>        vst1.8  {d16}, [r0]!
> >>>        bne     .L3
> >>>
> >>> Tested on arm-linux-gnueabi.  OK to install?
> >>
> >> It looks reasonable given the past discussion, but - can you move forward
> >> with the Neon stuff a bit to see if it really fits?  Or is this all
> >> that is needed
> >> for the load/store lane support as well (apart from vectorizer changes of
> >> course).
> >
> > Yeah, I have a prototype that hacks up some C support for generating the
> > (otherwise internal-only) load/store built-in functions that the vectoriser
> > is suppsoed to generate.  This patch is all that seems to be needed for the
> > types and optabs generation to work in the natural way.
> >
> > I'm happy to leave it until the vectoriser stuff is in a more
> > submittable state though.
> 
> The vectorisation stuff has now been approved and uses this hook to
> detect whether interleaved loads & stores are supported.  Also...
> 
> > Especially given:
> >
> >> Can you check the code generated by for example
> >>
> >> float foo(char *p)
> >> {
> >>   float a[2];
> >>   int i;
> >>   ((char *)a)[0] = p[0];
> >>   ((char *)a)[1] = p[1];
> >>   ((char *)a)[2] = p[2];
> >>   ((char *)a)[3] = p[3];
> >>   ((char *)a)[4] = p[4];
> >>   ((char *)a)[5] = p[5];
> >>   ((char *)a)[6] = p[6];
> >>   ((char *)a)[7] = p[7];
> >>   return a[0] + a[1];
> >> }
> >>
> >> for an array a that would get such a larger mode?  Thus, check what
> >> happens with partial defs of different types (just to avoid ICEs like the
> >> ones Jakub was fixing yesterday).
> >
> > OK, I tried:
> >
> > #include "arm_neon.h"
> >
> > uint32x2_t foo(char *p)
> > {
> >   uint32x2_t a[2];
> >   int i;
> >   ((char *)a)[0] = p[0];
> >   ((char *)a)[1] = p[1];
> >   ((char *)a)[2] = p[2];
> >   ((char *)a)[3] = p[3];
> >   ((char *)a)[4] = p[4];
> >   ((char *)a)[5] = p[5];
> >   ((char *)a)[6] = p[6];
> >   ((char *)a)[7] = p[7];
> >   ((char *)a)[8] = p[8];
> >   ((char *)a)[9] = p[9];
> >   ((char *)a)[10] = p[10];
> >   ((char *)a)[11] = p[11];
> >   ((char *)a)[12] = p[12];
> >   ((char *)a)[13] = p[13];
> >   ((char *)a)[14] = p[14];
> >   ((char *)a)[15] = p[15];
> >   return vadd_u32 (a[0], a[1]);
> > }
> >
> > uint32x4_t bar(char *p, uint32x4_t *b)
> > {
> >   uint32x4_t a[2];
> >   int i;
> >   ((char *)a)[0] = p[0];
> >   ((char *)a)[1] = p[1];
> >   ((char *)a)[2] = p[2];
> >   ((char *)a)[3] = p[3];
> >   ((char *)a)[4] = p[4];
> >   ((char *)a)[5] = p[5];
> >   ((char *)a)[6] = p[6];
> >   ((char *)a)[7] = p[7];
> >   ((char *)a)[8] = p[8];
> >   ((char *)a)[9] = p[9];
> >   ((char *)a)[10] = p[10];
> >   ((char *)a)[11] = p[11];
> >   ((char *)a)[12] = p[12];
> >   ((char *)a)[13] = p[13];
> >   ((char *)a)[14] = p[14];
> >   ((char *)a)[15] = p[15];
> >   ((char *)a)[16 + 0] = p[16 + 0];
> >   ((char *)a)[16 + 1] = p[16 + 1];
> >   ((char *)a)[16 + 2] = p[16 + 2];
> >   ((char *)a)[16 + 3] = p[16 + 3];
> >   ((char *)a)[16 + 4] = p[16 + 4];
> >   ((char *)a)[16 + 5] = p[16 + 5];
> >   ((char *)a)[16 + 6] = p[16 + 6];
> >   ((char *)a)[16 + 7] = p[16 + 7];
> >   ((char *)a)[16 + 8] = p[16 + 8];
> >   ((char *)a)[16 + 9] = p[16 + 9];
> >   ((char *)a)[16 + 10] = p[16 + 10];
> >   ((char *)a)[16 + 11] = p[16 + 11];
> >   ((char *)a)[16 + 12] = p[16 + 12];
> >   ((char *)a)[16 + 13] = p[16 + 13];
> >   ((char *)a)[16 + 14] = p[16 + 14];
> >   ((char *)a)[16 + 15] = p[16 + 15];
> >   return vaddq_u32 (a[0], a[1]);
> > }
> >
> > It seemed to avoid the problem Jakub was seeing, but the second function
> > hit the known const_int reload failure for these modes:
> >
> >     http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46329
> 
> ...I've just committed the fix for this PR.  Thanks to everyone for
> all the reviews.
> 
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  Do the
> target-independent bits look OK?  How about the ARM bits?
> 
> Thanks,
> Richard
> 
> 
> gcc/
> 	* hooks.h (hook_bool_mode_uhwi_false): Declare.
> 	* hooks.c (hook_bool_mode_uhwi_false): New function.
> 	* target.def (array_mode_supported_p): New hook.
> 	* doc/tm.texi.in (TARGET_ARRAY_MODE_SUPPORTED_P): Add @hook.
> 	* doc/tm.texi: Regenerate.
> 	* stor-layout.c (mode_for_array): New function.
> 	(layout_type): Use it.
> 	* config/arm/arm.c (arm_array_mode_supported_p): New function.
> 	(TARGET_ARRAY_MODE_SUPPORTED_P): Define.

> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c	2011-04-21 10:47:30.000000000 +0100
> +++ gcc/config/arm/arm.c	2011-04-21 10:47:48.000000000 +0100
> @@ -243,6 +243,8 @@ static rtx arm_pic_static_addr (rtx orig
>  static bool cortex_a9_sched_adjust_cost (rtx, rtx, rtx, int *);
>  static bool xscale_sched_adjust_cost (rtx, rtx, rtx, int *);
>  static bool fa726te_sched_adjust_cost (rtx, rtx, rtx, int *);
> +static bool arm_array_mode_supported_p (enum machine_mode,
> +					unsigned HOST_WIDE_INT);
>  static enum machine_mode arm_preferred_simd_mode (enum machine_mode);
>  static bool arm_class_likely_spilled_p (reg_class_t);
>  static bool arm_vector_alignment_reachable (const_tree type, bool is_packed);
> @@ -399,6 +401,8 @@ #define TARGET_ADDRESS_COST arm_address_
>  #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask
>  #undef TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
> +#undef TARGET_ARRAY_MODE_SUPPORTED_P
> +#define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
>  #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> @@ -22514,6 +22518,20 @@ arm_vector_mode_supported_p (enum machin
>    return false;
>  }
>  
> +/* Implements target hook array_mode_supported_p.  */
> +
> +static bool
> +arm_array_mode_supported_p (enum machine_mode mode,
> +			    unsigned HOST_WIDE_INT nelems)
> +{
> +  if (TARGET_NEON
> +      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
> +      && (nelems >= 2 && nelems <= 4))
> +    return true;
> +
> +  return false;
> +}

I'm not sure I understand why this is limited to 4 or fewer elements. A
Q reg of chars would surely be 16 elements.

R.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Add an array_mode_supported_p target hook
  2011-05-06 10:29       ` Richard Earnshaw
@ 2011-05-06 10:44         ` Richard Sandiford
  2011-05-06 10:51           ` Richard Earnshaw
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2011-05-06 10:44 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Guenther, gcc-patches, nickc, paul, ramana.radhakrishnan

Richard Earnshaw <rearnsha@arm.com> writes:
>> +/* Implements target hook array_mode_supported_p.  */
>> +
>> +static bool
>> +arm_array_mode_supported_p (enum machine_mode mode,
>> +			    unsigned HOST_WIDE_INT nelems)
>> +{
>> +  if (TARGET_NEON
>> +      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
>> +      && (nelems >= 2 && nelems <= 4))
>> +    return true;
>> +
>> +  return false;
>> +}
>
> I'm not sure I understand why this is limited to 4 or fewer elements. A
> Q reg of chars would surely be 16 elements.

The mode here is the mode of the array element, which for the cases
we're interested in would be something like V4HI (D) or V4SI (Q).
nelems says how many of those (in our case, vector) elements there
are in the array.

The element range we want is 1-4 because that matches the number
of vectors that can be loaded by the vld1-vld4 instructions.
We don't include 1 because arrays of one element are already
treated as having the same mode as their element.

Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Add an array_mode_supported_p target hook
  2011-05-06 10:44         ` Richard Sandiford
@ 2011-05-06 10:51           ` Richard Earnshaw
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Earnshaw @ 2011-05-06 10:51 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Richard Guenther, gcc-patches, nickc, paul, ramana.radhakrishnan


On Fri, 2011-05-06 at 11:35 +0100, Richard Sandiford wrote:
> Richard Earnshaw <rearnsha@arm.com> writes:
> >> +/* Implements target hook array_mode_supported_p.  */
> >> +
> >> +static bool
> >> +arm_array_mode_supported_p (enum machine_mode mode,
> >> +			    unsigned HOST_WIDE_INT nelems)
> >> +{
> >> +  if (TARGET_NEON
> >> +      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
> >> +      && (nelems >= 2 && nelems <= 4))
> >> +    return true;
> >> +
> >> +  return false;
> >> +}
> >
> > I'm not sure I understand why this is limited to 4 or fewer elements. A
> > Q reg of chars would surely be 16 elements.
> 
> The mode here is the mode of the array element, which for the cases
> we're interested in would be something like V4HI (D) or V4SI (Q).
> nelems says how many of those (in our case, vector) elements there
> are in the array.
> 
> The element range we want is 1-4 because that matches the number
> of vectors that can be loaded by the vld1-vld4 instructions.
> We don't include 1 because arrays of one element are already
> treated as having the same mode as their element.
> 
> Richard

I understand now...

Ok.

R.
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-05-06 10:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-31 13:43 Add an array_mode_supported_p target hook Richard Sandiford
2011-03-31 13:56 ` Richard Guenther
2011-03-31 14:56   ` Richard Sandiford
2011-04-21 10:43     ` Richard Sandiford
2011-04-21 11:28       ` Richard Guenther
2011-05-06 10:29       ` Richard Earnshaw
2011-05-06 10:44         ` Richard Sandiford
2011-05-06 10:51           ` Richard Earnshaw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).