[PATCH] Vectorize conversions directly

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] Vectorize conversions directly
@ 2010-11-24 16:09 Dmitry Plotnikov
  2010-11-24 16:35 ` Dmitry Plotnikov
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Dmitry Plotnikov @ 2010-11-24 16:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: rearnsha, IRAR, dm

Hi,

This patch enables vector conversions for the ARM NEON architecture.  In
its current state the vectorizer can't handle type conversions in the
hottest loop of libmp3lame on NEON, since the ARM backend doesn't have
appropriate builtins for type conversion.  For the x86_64 and rs6000
architectures, which can already vectorize conversions, the default
behavior is retained.  We have rewritten the condition in
vectorizable_conversion() in tree-vect-stmts.c for the case of the NONE
modifier.  It now first looks in convert_optab for a suitable operation
and then in the builtins.  It's hard to make such a fix in the arm
backend, because neon builtins are not saved and enumerated as is done
for x86_64 and rs6000.  Bootstrapped and regtested on x86_64 without any
regressions.
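
As a rough illustration of the lookup order described above, here is a
minimal, self-contained C sketch.  It is not the code from the patch: the
enums, the pattern table, and names such as lookup_conversion and
toy_builtin_hook are hypothetical stand-ins for the optab query and for
the target's builtin hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are real GCC types.  */
enum conv_code { CONV_FLOAT, CONV_FIX_TRUNC };
enum vec_mode { MODE_V2SI, MODE_V2SF, MODE_V4SI, MODE_V4SF };
enum how { HOW_NONE, HOW_DIRECT, HOW_BUILTIN };

struct pattern { enum conv_code code; enum vec_mode to, from; };

/* Conversions the toy "backend" provides as named patterns.  */
static const struct pattern direct_patterns[] = {
  { CONV_FLOAT, MODE_V2SF, MODE_V2SI },
  { CONV_FIX_TRUNC, MODE_V2SI, MODE_V2SF },
};

/* Optional hook modelled on a target builtin lookup; may be NULL.  */
typedef bool (*builtin_hook) (enum conv_code, enum vec_mode, enum vec_mode);

static bool
has_direct_pattern (enum conv_code code, enum vec_mode to, enum vec_mode from)
{
  size_t n = sizeof direct_patterns / sizeof direct_patterns[0];
  for (size_t i = 0; i < n; i++)
    if (direct_patterns[i].code == code
        && direct_patterns[i].to == to
        && direct_patterns[i].from == from)
      return true;
  return false;
}

/* Try the direct pattern first, then fall back to the builtin hook.  */
static enum how
lookup_conversion (enum conv_code code, enum vec_mode to, enum vec_mode from,
                   builtin_hook hook)
{
  /* Only the two conversion codes are meaningful in this sketch.  */
  assert (code == CONV_FLOAT || code == CONV_FIX_TRUNC);

  if (has_direct_pattern (code, to, from))
    return HOW_DIRECT;
  if (hook && hook (code, to, from))
    return HOW_BUILTIN;
  return HOW_NONE;
}

/* Hypothetical hook that only knows the V4SI -> V4SF conversion.  */
static bool
toy_builtin_hook (enum conv_code code, enum vec_mode to, enum vec_mode from)
{
  return code == CONV_FLOAT && to == MODE_V4SF && from == MODE_V4SI;
}
```

In this toy model a target with named patterns never reaches the hook,
and a target with neither a pattern nor a hook simply reports that the
conversion is unsupported.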

Ok for trunk? 4.7?


* Re: [PATCH] Vectorize conversions directly
  2010-11-24 16:09 [PATCH] Vectorize conversions directly Dmitry Plotnikov
@ 2010-11-24 16:35 ` Dmitry Plotnikov
  2010-11-27  4:12   ` Richard Henderson
  2010-11-24 17:28 ` [PATCH] " Richard Guenther
  2010-11-25 18:25 ` Ramana Radhakrishnan
  2 siblings, 1 reply; 21+ messages in thread
From: Dmitry Plotnikov @ 2010-11-24 16:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: IRAR, rearnsha, dm

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

On 11/24/2010 06:23 PM, Dmitry Plotnikov wrote:
> Hi,
>
> This patch enables vector conversions for ARM NEON architecture.  In 
> its current state vectorizer can't handle type conversions in the 
> hottest loop of libmp3lame on NEON since its backend doesn't have 
> appropriate builtins for type conversion.  For x86_64 and rs6000 
> architectures that also can vectorize conversions the default behavior 
> is retained.  We have rewritten condition in vectorizable_conversion() 
> in tree-vect-stmts.c for the case of NONE modifier.  Now It first 
> looks in convert_optab for suitable operation and then in builtins.  
> It's hard to make such fix in arm backend, because neon builtins are 
> not saved and enumerated as it's done for x86_64 and rs6000.  
> Bootstrapped and regtested on x86_64 without any regressions.
>
> Ok for trunk? 4.7?
>
>
Sorry, I forgot to attach the patch.

[-- Attachment #2: vect-conv.patch --]
[-- Type: text/x-patch, Size: 12404 bytes --]

2010-11-24  Dmitry Plotnikov  <dplotnikov@ispras.ru>

gcc/
	* tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
	* tree-vect-stmts.c (supportable_convert_operation): New function.
	  (vectorizable_conversion): Call it.  Change condition and behavior 
	  for NONE modifier case.
	* tree-vectorizer.h (supportable_convert_operation): New prototype.
	* tree.h (VECTOR_INTEGER_TYPE_P): New macro.

gcc/config/arm/
	* neon.md (floatv2siv2sf2): New.
	  (floatunsv2siv2sf2): New.
	  (fix_truncv2sfv2si2): New.
	  (fixuns_truncv2sfv2si2): New.
	  (floatv4siv4sf2): New.
	  (floatunsv4siv4sf2): New.
	  (fix_truncv4sfv4si2): New.
	  (fixuns_truncv4sfv4si2): New.
	
gcc/testsuite/
	* gcc.target/arm/neon/vect-vcvt.c: New test.
	* gcc.target/arm/neon/vect-vcvtq.c: New test.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 06bbc52..3eeb5a5 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -143,7 +143,9 @@
    (UNSPEC_VZIP2               204)
    (UNSPEC_MISALIGNED_ACCESS   205)
    (UNSPEC_VCLE                        206)
-   (UNSPEC_VCLT                        207)])
+   (UNSPEC_VCLT                        207)
+   (UNSPEC_FIXU                 208)
+   (UNSPEC_FLOATU               209)])
 
 
 ;; Attribute used to permit string comparisons against <VQH_mnem> in
@@ -3053,6 +3055,66 @@
   [(set_attr "neon_type" "neon_bp_simple")]
 )
 
+(define_insn "floatv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+       (fix:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.f32.s32\t%P0, %P1"
+)
+
+(define_insn "floatunsv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+       (unspec:V2SF [(match_operand:V2SI 1 "s_register_operand" "w")] 
+                    UNSPEC_FLOATU))]
+  "TARGET_NEON"
+  "vcvt.f32.u32\t%P0, %P1"
+)
+
+(define_insn "fix_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+        (fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%P0, %P1"
+)
+
+(define_insn "fixuns_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+        (unspec:V2SI [(match_operand:V2SF 1 "s_register_operand" "w")]
+                     UNSPEC_FIXU))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%P0, %P1"
+)
+
+(define_insn "floatv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+       (fix:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.f32.s32\t%q0, %q1"
+)
+
+(define_insn "floatunsv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+       (unspec:V4SF [(match_operand:V4SI 1 "s_register_operand" "w")]
+                    UNSPEC_FLOATU))]
+  "TARGET_NEON"
+  "vcvt.f32.u32\t%q0, %q1"
+)
+
+(define_insn "fix_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+        (fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%q0, %q1"
+)
+
+(define_insn "fixuns_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+        (unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")]
+                     UNSPEC_FIXU))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%q0, %q1"
+)
+
 (define_insn "neon_vcvt<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
        (unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index bffa679..bf151eb 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3278,7 +3278,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FLOAT_EXPR:
       {
-       if (!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+       if ((!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+           && (!VECTOR_INTEGER_TYPE_P (rhs1_type)
+               || !VECTOR_FLOAT_TYPE_P (lhs_type)))
          {
            error ("invalid types in conversion to floating point");
            debug_generic_expr (lhs_type);
@@ -3291,7 +3293,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FIX_TRUNC_EXPR:
       {
-       if (!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+       if ((!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+           && (!VECTOR_INTEGER_TYPE_P (lhs_type)
+               || !VECTOR_FLOAT_TYPE_P (rhs1_type)))
          {
            error ("invalid types in conversion to integer");
            debug_generic_expr (lhs_type);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3617ec3..94fbd11 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1640,6 +1640,59 @@ vect_gen_widened_results_half (enum tree_code code,
   return new_stmt;
 }
 
+/* Function supportable_convert_operation
+
+   Check whether an operation represented by the code CODE is a
+   convert operation that is supported by the target platform in
+   vector form (i.e., when operating on arguments of type VECTYPE_IN
+   producing a result of type VECTYPE_OUT).
+   
+   Convert operations we currently support directly are FIX_TRUNC and FLOAT.
+   This function checks if these operations are supported
+   by the target platform either directly (via vector tree-codes), or via
+   target builtins.
+   
+   Output:
+   - CODE1 is the code of the vector operation to be used when
+   vectorizing the operation, if available.
+   - DECL is the decl of the target builtin function to be used
+   when vectorizing the operation, if available.  In this case,
+   CODE1 is CALL_EXPR.  */
+
+bool
+supportable_convert_operation (enum tree_code code,
+                                   tree vectype_out, tree vectype_in,
+                                   tree *decl, enum tree_code *code1)
+{
+  enum machine_mode m1, m2;
+  convert_optab optab1 = NULL;
+
+  /* First check if we can do the conversion directly.  */
+  if (code == FIX_TRUNC_EXPR)
+    optab1 = (TYPE_UNSIGNED (vectype_out)) ? ufixtrunc_optab : sfixtrunc_optab;
+  else if (code == FLOAT_EXPR)
+    optab1 = (TYPE_UNSIGNED (vectype_in)) ? ufloat_optab : sfloat_optab;
+  
+  m1 = TYPE_MODE (vectype_in);
+  m2 = TYPE_MODE (vectype_out);
+
+  if (convert_optab_handler (optab1, m2, m1) != CODE_FOR_nothing)
+    {
+      *code1 = code;
+      return true;
+    }
+  
+  /* Now check for builtin.  */
+  if (targetm.vectorize.builtin_conversion
+      && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+    {
+      *code1 = CALL_EXPR;
+      *decl = targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in);
+      return true;
+    }
+  return false;
+}
+
 
 /* Check if STMT performs a conversion operation, that can be vectorized.
    If VEC_STMT is also passed, vectorize the STMT: create a vectorized
@@ -1669,7 +1722,6 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
   tree vectype_out, vectype_in;
   int ncopies, j;
   tree rhs_type;
-  tree builtin_decl;
   enum { NARROW, NONE, WIDEN } modifier;
   int i;
   VEC(tree,heap) *vec_oprnds0 = NULL;
@@ -1758,7 +1810,7 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 
   /* Supportable by target?  */
   if ((modifier == NONE
-       && !targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+       && !supportable_convert_operation (code, vectype_out, vectype_in, &decl1, &code1))
       || (modifier == WIDEN
          && !supportable_widening_operation (code, stmt,
                                              vectype_out, vectype_in,
@@ -1808,19 +1860,28 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
          else
            vect_get_vec_defs_for_stmt_copy (dt, &vec_oprnds0, NULL);
 
-         builtin_decl =
-           targetm.vectorize.builtin_conversion (code,
-                                                 vectype_out, vectype_in);
-         FOR_EACH_VEC_ELT (tree, vec_oprnds0, i, vop0)
-           {
-             /* Arguments are ready. create the new vector stmt.  */
-             new_stmt = gimple_build_call (builtin_decl, 1, vop0);
-             new_temp = make_ssa_name (vec_dest, new_stmt);
-             gimple_call_set_lhs (new_stmt, new_temp);
-             vect_finish_stmt_generation (stmt, new_stmt, gsi);
-             if (slp_node)
-               VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
-           }
+         FOR_EACH_VEC_ELT (tree, vec_oprnds0, i, vop0)
+         {
+           /* Arguments are ready, create the new vector stmt.  */
+            if (code1 == CALL_EXPR)
+             {
+               new_stmt = gimple_build_call (decl1, 1, vop0);
+               new_temp = make_ssa_name (vec_dest, new_stmt);
+               gimple_call_set_lhs (new_stmt, new_temp);
+             }
+           else
+              {
+                gcc_assert (TREE_CODE_LENGTH (code) == unary_op);
+                new_stmt = gimple_build_assign_with_ops (code, vec_dest, vop0,
+                                                        NULL);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_assign_set_lhs (new_stmt, new_temp);
+             }
+
+           vect_finish_stmt_generation (stmt, new_stmt, gsi);
+            if (slp_node)
+              VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
+         }
 
          if (j == 0)
            STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f2a5889..c016963 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -815,6 +815,9 @@ extern bool vect_transform_stmt (gimple, gimple_stmt_iterator *,
                                  bool *, slp_tree, slp_instance);
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
+extern bool supportable_convert_operation (enum tree_code, tree, tree,
+                                          tree *, enum tree_code *);
+
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
                                     tree, int);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
diff --git a/gcc/tree.h b/gcc/tree.h
index 3877ae5..e4b4501 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1047,6 +1047,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
   (TREE_CODE (TYPE) == COMPLEX_TYPE    \
    && TREE_CODE (TREE_TYPE (TYPE)) == REAL_TYPE)
 
+/* Nonzero if TYPE represents a vector integer type.  */
+                
+#define VECTOR_INTEGER_TYPE_P(TYPE)                   \
+             (TREE_CODE (TYPE) == VECTOR_TYPE      \
+                 && TREE_CODE (TREE_TYPE (TYPE)) == INTEGER_TYPE)
+
+
 /* Nonzero if TYPE represents a vector floating-point type.  */
 
 #define VECTOR_FLOAT_TYPE_P(TYPE)      \
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
new file mode 100644
index 0000000..f33206c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -fdump-tree-vect-details" } */
+
+#include <stdarg.h>
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
new file mode 100644
index 0000000..3412cf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -fdump-tree-vect-details -mvectorize-with-neon-quad" } */
+
+#include <stdarg.h>
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */



* Re: [PATCH] Vectorize conversions directly
  2010-11-24 16:09 [PATCH] Vectorize conversions directly Dmitry Plotnikov
  2010-11-24 16:35 ` Dmitry Plotnikov
@ 2010-11-24 17:28 ` Richard Guenther
  2010-11-25 18:25 ` Ramana Radhakrishnan
  2 siblings, 0 replies; 21+ messages in thread
From: Richard Guenther @ 2010-11-24 17:28 UTC (permalink / raw)
  To: Dmitry Plotnikov; +Cc: gcc-patches, rearnsha, IRAR, dm

On Wed, Nov 24, 2010 at 4:23 PM, Dmitry Plotnikov <dplotnikov@ispras.ru> wrote:
> Hi,
>
> This patch enables vector conversions for ARM NEON architecture.  In its
> current state vectorizer can't handle type conversions in the hottest loop
> of libmp3lame on NEON since its backend doesn't have appropriate builtins
> for type conversion.  For x86_64 and rs6000 architectures that also can
> vectorize conversions the default behavior is retained.  We have rewritten
> condition in vectorizable_conversion() in tree-vect-stmts.c for the case of
> NONE modifier.  Now It first looks in convert_optab for suitable operation
> and then in builtins.  It's hard to make such fix in arm backend, because
> neon builtins are not saved and enumerated as it's done for x86_64 and
> rs6000.  Bootstrapped and regtested on x86_64 without any regressions.
>
> Ok for trunk? 4.7?

Hm.  For proper LTO support you need to be able to index builtins
anyway, so I'd prefer if you fix the arm backend accordingly.

In general your patch requires adjusting the documentation of
FIX_TRUNC_EXPR and FLOAT_EXPR so that they also accept
vectors.  I also expect various fallout in folders or optimization
passes if you allow that.  So I don't think this patch is appropriate
for trunk at this stage.
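
To make the point about accepted types concrete, here is a small C
sketch of the kind of check a verifier would need once FLOAT_EXPR also
takes vectors.  It is only a toy model: struct type, enum kind, and
float_conversion_ok are hypothetical names, not GCC's tree API, and the
lane-count comparison is an extra assumption that the posted patch does
not contain.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy type model; these are not GCC's tree types.  */
enum kind { KIND_INT, KIND_FLOAT, KIND_VECTOR };

struct type
{
  enum kind kind;     /* scalar kind, or KIND_VECTOR */
  enum kind element;  /* element kind when kind == KIND_VECTOR */
  unsigned lanes;     /* number of elements when kind == KIND_VECTOR */
};

static bool
is_vector_of (const struct type *t, enum kind element)
{
  return t->kind == KIND_VECTOR && t->element == element;
}

/* Accept int -> float for scalars, or lane-wise for vectors.  */
static bool
float_conversion_ok (const struct type *lhs, const struct type *rhs)
{
  assert (lhs && rhs);

  bool scalar_ok = rhs->kind == KIND_INT && lhs->kind == KIND_FLOAT;
  bool vector_ok = is_vector_of (rhs, KIND_INT)
                   && is_vector_of (lhs, KIND_FLOAT)
                   /* Extra check assumed here, not in the posted patch.  */
                   && lhs->lanes == rhs->lanes;
  return scalar_ok || vector_ok;
}
```

Mixed scalar/vector operands are rejected in this model, which is one of
the cases folders and other passes would have to be checked against.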

I don't see a reason to not do this change for 4.7 though, so please
ping the patch when stage1 opens again.  And still consider fixing
the arm backend ;)

Thanks,
Richard.

>


* Re: [PATCH] Vectorize conversions directly
  2010-11-24 16:09 [PATCH] Vectorize conversions directly Dmitry Plotnikov
  2010-11-24 16:35 ` Dmitry Plotnikov
  2010-11-24 17:28 ` [PATCH] " Richard Guenther
@ 2010-11-25 18:25 ` Ramana Radhakrishnan
  2 siblings, 0 replies; 21+ messages in thread
From: Ramana Radhakrishnan @ 2010-11-25 18:25 UTC (permalink / raw)
  To: Dmitry Plotnikov; +Cc: gcc-patches, rearnsha, IRAR, dm


On Wed, 2010-11-24 at 18:23 +0300, Dmitry Plotnikov wrote:
> Hi,
>  It's hard to make such fix in arm backend, because neon builtins are not saved and 
> enumerated as it's done for x86_64 and rs6000.  Bootstrapped and 
> regtested on x86_64 without any regressions.


IIRC there was this patch / set of patches from Jie that did implement
TARGET_BUILTIN_DECL if that's what was needed.

http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00851.html
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00853.html
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00852.html



I haven't looked at your patch in great detail but ... 

> +   (UNSPEC_FIXU                 208)
> +   (UNSPEC_FLOATU               209)])

These don't seem to get used anywhere else.  Are they really needed, or
are some other portions of your patch missing?


Cheers
Ramana






* Re: [PATCH] Vectorize conversions directly
  2010-11-24 16:35 ` Dmitry Plotnikov
@ 2010-11-27  4:12   ` Richard Henderson
  2010-12-09 14:06     ` Dmitry Plotnikov
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2010-11-27  4:12 UTC (permalink / raw)
  To: Dmitry Plotnikov; +Cc: gcc-patches, IRAR, rearnsha, dm

I agree with Richi that this isn't appropriate for gcc 4.6, but we 
do want this for gcc 4.7.

In my opinion, the vectorizer uses builtins for too many things;
conversions included.  We should provide direct gimple mechanisms
for this, such as extending these conversion codes to vectors.

That said, 

> +(define_insn "floatv2siv2sf2"
> +  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
> +       (fix:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))]

Wrong rtl code here; s/fix/float/.

I'm surprised that this actually works, given that fix_truncv4sfv4si2
should be matching this pattern and generating the wrong insn.  At a
minimum this suggests that your testing is incomplete.

> +(define_insn "floatunsv2siv2sf2"
> +  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
> +       (unspec:V2SF [(match_operand:V2SI 1 "s_register_operand" "w")] 
> +                    UNSPEC_FLOATU))]

Why are you not using the unsigned_float rtl code?

> +(define_insn "fixuns_truncv2sfv2si2"
> +  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
> +        (unspec:V2SI [(match_operand:V2SF 1 "s_register_operand" "w")]
> +                     UNSPEC_FIXU))]

Similarly, the unsigned_fix code.
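
For readers less familiar with the distinction, the following scalar C
sketch shows why the four conversions are not interchangeable.  The
helper names are hypothetical and chosen for this example only; each one
mirrors, per element, what the float, floatuns, and fix_trunc named
patterns are expected to compute.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* int -> float, signed: the per-element job of a "float" pattern.  */
static float
float_from_signed (int32_t x)
{
  return (float) x;
}

/* int -> float, unsigned: the per-element job of a "floatuns" pattern.  */
static float
float_from_unsigned (uint32_t x)
{
  return (float) x;
}

/* float -> int, truncating toward zero, as a "fix_trunc" pattern does.
   Out-of-range inputs are undefined in C, so this sketch rejects them.  */
static int32_t
fix_to_signed (float x)
{
  assert (x > -2147483648.0f && x < 2147483648.0f);
  return (int32_t) x;
}

/* Reinterpret the same 32 bits as signed, without a value conversion.  */
static int32_t
same_bits_as_signed (uint32_t bits)
{
  int32_t out;
  memcpy (&out, &bits, sizeof out);
  return out;
}
```

The same register contents give very different results depending on the
signedness chosen: the all-ones bit pattern converts to -1.0 as a signed
value, and to roughly 4.29e9 as an unsigned one.  Likewise, truncation
moves 2.7 to 2 and -2.7 to -2.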

r~


* Re: [PATCH] Vectorize conversions directly
  2010-11-27  4:12   ` Richard Henderson
@ 2010-12-09 14:06     ` Dmitry Plotnikov
  2010-12-10 16:05       ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry Plotnikov @ 2010-12-09 14:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, IRAR, rearnsha, dm

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

Thank you for the comments!  A new patch is attached.

On 11/27/2010 12:10 AM, Richard Henderson wrote:
>> +(define_insn "floatv2siv2sf2"
>> +  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
>> +       (fix:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))]
> Wrong rtl code here; s/fix/float/.
Fixed.
>> +(define_insn "floatunsv2siv2sf2"
>> +  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
>> +       (unspec:V2SF [(match_operand:V2SI 1 "s_register_operand" "w")]
>> +                    UNSPEC_FLOATU))]
> Why are you not using the unsigned_float rtl code?
>
Unspecs replaced with unsigned_float and unsigned_fix.

[-- Attachment #2: vect-conv.patch --]
[-- Type: text/x-patch, Size: 11601 bytes --]

2010-12-09  Dmitry Plotnikov  <dplotnikov@ispras.ru>

gcc/
	* tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
	* tree-vect-stmts.c (supportable_convert_operation): New function.
	  (vectorizable_conversion): Call it.  Change condition and behavior 
	  for NONE modifier case.
	* tree-vectorizer.h (supportable_convert_operation): New prototype.
	* tree.h (VECTOR_INTEGER_TYPE_P): New macro.

gcc/config/arm/
	* neon.md (floatv2siv2sf2): New.
	  (floatunsv2siv2sf2): New.
	  (fix_truncv2sfv2si2): New.
	  (fixuns_truncv2sfv2si2): New.
	  (floatv4siv4sf2): New.
	  (floatunsv4siv4sf2): New.
	  (fix_truncv4sfv4si2): New.
	  (fixuns_truncv4sfv4si2): New.
	
gcc/testsuite/
	* gcc.target/arm/neon/vect-vcvt.c: New test.
	* gcc.target/arm/neon/vect-vcvtq.c: New test.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 06bbc52..d484060 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3053,6 +3053,62 @@
   [(set_attr "neon_type" "neon_bp_simple")]
 )
 
+(define_insn "floatv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+       (float:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.f32.s32\t%P0, %P1"
+)
+
+(define_insn "floatunsv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+       (unsigned_float:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))] 
+  "TARGET_NEON"
+  "vcvt.f32.u32\t%P0, %P1"
+)
+
+(define_insn "fix_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+        (fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%P0, %P1"
+)
+
+(define_insn "fixuns_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+        (unsigned_fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%P0, %P1"
+)
+
+(define_insn "floatv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+       (float:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.f32.s32\t%q0, %q1"
+)
+
+(define_insn "floatunsv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+       (unsigned_float:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.f32.u32\t%q0, %q1"
+)
+
+(define_insn "fix_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+        (fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%q0, %q1"
+)
+
+(define_insn "fixuns_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+        (unsigned_fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%q0, %q1"
+)
+
 (define_insn "neon_vcvt<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index e3ab9d9..6b1fb4f 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3277,7 +3277,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FLOAT_EXPR:
       {
-	if (!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+	if ((!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+	    && (!VECTOR_INTEGER_TYPE_P (rhs1_type)
+	        || !VECTOR_FLOAT_TYPE_P (lhs_type)))
 	  {
 	    error ("invalid types in conversion to floating point");
 	    debug_generic_expr (lhs_type);
@@ -3290,7 +3292,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FIX_TRUNC_EXPR:
       {
-	if (!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+        if ((!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+            && (!VECTOR_INTEGER_TYPE_P (lhs_type)
+                || !VECTOR_FLOAT_TYPE_P (rhs1_type)))
 	  {
 	    error ("invalid types in conversion to integer");
 	    debug_generic_expr (lhs_type);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index e5bfcbe..bc05c55 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1638,6 +1638,59 @@ vect_gen_widened_results_half (enum tree_code code,
   return new_stmt;
 }
 
+/* Function supportable_convert_operation
+
+   Check whether an operation represented by the code CODE is a
+   convert operation that is supported by the target platform in
+   vector form (i.e., when operating on arguments of type VECTYPE_IN
+   producing a result of type VECTYPE_OUT).
+   
+   Convert operations we currently support directly are FIX_TRUNC and FLOAT.
+   This function checks if these operations are supported
+   by the target platform either directly (via vector tree-codes), or via
+   target builtins.
+   
+   Output:
+   - CODE1 is the code of the vector operation to be used when
+   vectorizing the operation, if available.
+   - DECL is the decl of the target builtin function to be used
+   when vectorizing the operation, if available.  In this case,
+   CODE1 is CALL_EXPR.  */
+
+bool
+supportable_convert_operation (enum tree_code code,
+                                   tree vectype_out, tree vectype_in,
+                                   tree *decl, enum tree_code *code1)
+{
+  enum machine_mode m1, m2;
+  convert_optab optab1 = NULL;
+
+  /* First check if we can do the conversion directly.  */
+  if (code == FIX_TRUNC_EXPR)
+    optab1 = (TYPE_UNSIGNED (vectype_out)) ? ufixtrunc_optab : sfixtrunc_optab;
+  else if (code == FLOAT_EXPR)
+    optab1 = (TYPE_UNSIGNED (vectype_in)) ? ufloat_optab : sfloat_optab;
+  
+  m1 = TYPE_MODE (vectype_in);
+  m2 = TYPE_MODE (vectype_out);
+
+  if (convert_optab_handler (optab1, m2, m1) != CODE_FOR_nothing)
+    {
+      *code1 = code;
+      return true;
+    }
+  
+  /* Now check for builtin.  */
+  if (targetm.vectorize.builtin_conversion
+      && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+    {
+      *code1 = CALL_EXPR;
+      *decl = targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in);
+      return true;
+    }
+  return false;
+}
+
 
 /* Check if STMT performs a conversion operation, that can be vectorized.
    If VEC_STMT is also passed, vectorize the STMT: create a vectorized
@@ -1667,7 +1720,6 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
   tree vectype_out, vectype_in;
   int ncopies, j;
   tree rhs_type;
-  tree builtin_decl;
   enum { NARROW, NONE, WIDEN } modifier;
   int i;
   VEC(tree,heap) *vec_oprnds0 = NULL;
@@ -1756,7 +1808,7 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 
   /* Supportable by target?  */
   if ((modifier == NONE
-       && !targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+       && !supportable_convert_operation (code, vectype_out, vectype_in, &decl1, &code1))
       || (modifier == WIDEN
 	  && !supportable_widening_operation (code, stmt,
 					      vectype_out, vectype_in,
@@ -1806,19 +1858,28 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 	  else
 	    vect_get_vec_defs_for_stmt_copy (dt, &vec_oprnds0, NULL);
 
-	  builtin_decl =
-	    targetm.vectorize.builtin_conversion (code,
-						  vectype_out, vectype_in);
 	  FOR_EACH_VEC_ELT (tree, vec_oprnds0, i, vop0)
-	    {
-	      /* Arguments are ready. create the new vector stmt.  */
-	      new_stmt = gimple_build_call (builtin_decl, 1, vop0);
-	      new_temp = make_ssa_name (vec_dest, new_stmt);
-	      gimple_call_set_lhs (new_stmt, new_temp);
-	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
-	      if (slp_node)
-		VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
-	    }
+          {
+            /* Arguments are ready, create the new vector stmt.  */
+            if (code1 == CALL_EXPR)
+              {
+                new_stmt = gimple_build_call (decl1, 1, vop0);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_call_set_lhs (new_stmt, new_temp);
+              }
+            else
+              {
+                gcc_assert (TREE_CODE_LENGTH (code) == unary_op);
+                new_stmt = gimple_build_assign_with_ops (code, vec_dest, vop0,
+                                                        NULL);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_assign_set_lhs (new_stmt, new_temp);
+              }
+
+            vect_finish_stmt_generation (stmt, new_stmt, gsi);
+            if (slp_node)
+              VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
+          }
 
 	  if (j == 0)
 	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index b2cc2d1..8d61608 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -799,6 +799,9 @@ extern bool vect_transform_stmt (gimple, gimple_stmt_iterator *,
                                  bool *, slp_tree, slp_instance);
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
+extern bool supportable_convert_operation (enum tree_code, tree, tree,
+                                          tree *, enum tree_code *);
+
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
                                     tree, int);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
diff --git a/gcc/tree.h b/gcc/tree.h
index 8ba2044..34d3bbc 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1047,6 +1047,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
   (TREE_CODE (TYPE) == COMPLEX_TYPE	\
    && TREE_CODE (TREE_TYPE (TYPE)) == REAL_TYPE)
 
+/* Nonzero if TYPE represents a vector integer type.  */
+                
+#define VECTOR_INTEGER_TYPE_P(TYPE)                   \
+             (TREE_CODE (TYPE) == VECTOR_TYPE      \
+                 && TREE_CODE (TREE_TYPE (TYPE)) == INTEGER_TYPE)
+
+
 /* Nonzero if TYPE represents a vector floating-point type.  */
 
 #define VECTOR_FLOAT_TYPE_P(TYPE)	\
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
new file mode 100644
index 0000000..f33206c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -fdump-tree-vect-details" } */
+
+#include <stdarg.h>
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
new file mode 100644
index 0000000..3412cf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -fdump-tree-vect-details -mvectorize-with-neon-quad" } */
+
+#include <stdarg.h>
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] Vectorize conversions directly
  2010-12-09 14:06     ` Dmitry Plotnikov
@ 2010-12-10 16:05       ` Richard Henderson
       [not found]         ` <4EA04B20.1090009@ispras.ru>
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2010-12-10 16:05 UTC (permalink / raw)
  To: Dmitry Plotnikov; +Cc: gcc-patches, IRAR, rearnsha, dm

On 12/09/2010 04:56 AM, Dmitry Plotnikov wrote:
> 2010-12-09  Dmitry Plotnikov  <dplotnikov@ispras.ru>
> 
> gcc/
> 	* tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
> 	* tree-vect-stmts.c (supportable_convert_operation): New function.
> 	  (vectorizable_conversion): Call it.  Change condition and behavior 
> 	  for NONE modifier case.
> 	* tree-vectorizer.h (supportable_convert_operation): New prototype.
> 	* tree.h (VECTOR_INTEGER_TYPE_P): New macro.
> 
> gcc/config/arm/
> 	* neon.md (floatv2siv2sf2): New.
> 	  (floatunsv2siv2sf2): New.
> 	  (fix_truncv2sfv2si2): New.
> 	  (fix_truncunsv2sfv2si2): New.
> 	  (floatv4siv4sf2): New.
> 	  (floatunsv4siv4sf2): New.
> 	  (fix_truncv4sfv4si2): New.
> 	  (fix_truncunsv4sfv4si2): New.
> 	
> gcc/testsuite/
> 	* gcc.target/arm/vect-vcvt.c: New test.
> 	* gcc.target/arm/vect-vcvtq.c: New test.

Patch looks generally ok; I'll let another Richard approve the ARM bits.

> +  /* First check if we can done conversion directly.  */

 ... if we can do the conversion ...

> +++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -fdump-tree-vect-details" } */

Missing dg-final bits for this test.

Neither test requires stdarg.h; remove it.


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][PING] Vectorize conversions directly
       [not found]         ` <4EA04B20.1090009@ispras.ru>
@ 2011-10-20 17:46           ` Richard Henderson
  2011-10-21 12:23             ` Ramana Radhakrishnan
  2011-10-24  9:25           ` Dmitry Plotnikov
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2011-10-20 17:46 UTC (permalink / raw)
  To: Dmitry Plotnikov; +Cc: gcc-patches, IRAR, rearnsha, dm

On 10/20/2011 09:24 AM, Dmitry Plotnikov wrote:
> gcc/
>     * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
>     * tree-vect-stmts.c (supportable_convert_operation): New function.
>       (vectorizable_conversion): Call it.  Change condition and behavior
>       for NONE modifier case.
>     * tree-vectorizer.h (supportable_convert_operation): New prototype.
>     * tree.h (VECTOR_INTEGER_TYPE_P): New macro.
> 
> gcc/config/arm/
>     * neon.md (floatv2siv2sf2): New.
>       (floatunsv2siv2sf2): New.
>       (fix_truncv2sfv2si2): New.
>       (fix_truncunsv2sfv2si2): New.
>       (floatv4siv4sf2): New.
>       (floatunsv4siv4sf2): New.
>       (fix_truncv4sfv4si2): New.
>       (fix_truncunsv4sfv4si2): New.
>    
> gcc/testsuite/
>     * gcc.target/arm/vect-vcvt.c: New test.
>     * gcc.target/arm/vect-vcvtq.c: New test.
> 
> gcc/testsuite/lib/
>     * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
>       for ARM NEON.
>       (check_effective_target_vect_uintfloat_cvt): Likewise.
>       (check_effective_target_vect_intfloat_cvt): Likewise.
>       (check_effective_target_vect_floatuint_cvt): Likewise.
>       (check_effective_target_vect_floatint_cvt): Likewise.
>       (check_effective_target_vect_extract_even_odd): Likewise.

Please move supportable_convert_operation to optabs.c; eventually
we ought to use can_fix_p/can_float_p.

> +  if (code == FIX_TRUNC_EXPR)
> +    optab1 = (TYPE_UNSIGNED (vectype_out)) ? ufixtrunc_optab : sfixtrunc_optab;
> +  else if (code == FLOAT_EXPR)
> +    optab1 = (TYPE_UNSIGNED (vectype_in)) ? ufloat_optab : sfloat_optab;
> +  
> +  m1 = TYPE_MODE (vectype_in);

Looks like a missing 

	else
	  gcc_unreachable()

there, since there's no check for optab1 != NULL later.
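
A minimal standalone sketch of the shape I mean, assuming made-up enums
in place of the real tree codes and optabs (conv_code, conv_optab and
pick_conv_optab are hypothetical names, and abort () stands in for
gcc_unreachable ()):

```c
#include <stdlib.h>

/* Hypothetical stand-ins for GCC's tree codes and conversion optabs.  */
enum conv_code { CONV_FIX_TRUNC, CONV_FLOAT, CONV_OTHER };
enum conv_optab { OPTAB_SFIXTRUNC, OPTAB_UFIXTRUNC, OPTAB_SFLOAT, OPTAB_UFLOAT };

/* Pick the conversion optab for CODE.  The signedness that matters is
   that of the integer side: the output type for FIX_TRUNC, the input
   type for FLOAT.  Any other code is a caller bug, so fail loudly
   instead of continuing with an unset optab.  */
enum conv_optab
pick_conv_optab (enum conv_code code, int out_unsigned, int in_unsigned)
{
  if (code == CONV_FIX_TRUNC)
    return out_unsigned ? OPTAB_UFIXTRUNC : OPTAB_SFIXTRUNC;
  else if (code == CONV_FLOAT)
    return in_unsigned ? OPTAB_UFLOAT : OPTAB_SFLOAT;
  else
    abort ();  /* plays the role of gcc_unreachable ()  */
}
```

With the explicit else branch, no path can reach the handler lookup
with a NULL optab.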

Otherwise the generic parts of the patch look good.
Please get separate approval for the arm portions of the patch.

After the generic parts of the patch go in I will endeavour to adjust the i386
and rs6000 backends to similarly populate the optabs, so that we can remove the
builtin path here.


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][PING] Vectorize conversions directly
  2011-10-20 17:46           ` [PATCH][PING] " Richard Henderson
@ 2011-10-21 12:23             ` Ramana Radhakrishnan
  0 siblings, 0 replies; 21+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-21 12:23 UTC (permalink / raw)
  To: Richard Henderson, Dmitry Melnik
  Cc: Dmitry Plotnikov, gcc-patches, IRAR, rearnsha

> Otherwise the generic parts of the patch look good.
> Please get separate approval for the arm portions of the patch.

Is it just me, or has no one else seen this patch in the archives at
gcc-patches@?


Ramana

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH][PING] Vectorize conversions directly
       [not found]         ` <4EA04B20.1090009@ispras.ru>
  2011-10-20 17:46           ` [PATCH][PING] " Richard Henderson
@ 2011-10-24  9:25           ` Dmitry Plotnikov
  2011-10-24 14:39             ` Joseph S. Myers
  1 sibling, 1 reply; 21+ messages in thread
From: Dmitry Plotnikov @ 2011-10-24  9:25 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2245 bytes --]

Original discussion here: 
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00751.html

This patch enables vector conversions for the ARM NEON architecture.  In its
current state the vectorizer can't handle type conversions in the hottest
loop of libmp3lame on NEON, since the ARM backend doesn't have appropriate
builtins for type conversion.  For the x86_64 and rs6000 architectures, which
can also vectorize conversions, the default behavior is retained.  We
have rewritten the condition in vectorizable_conversion() in
tree-vect-stmts.c for the case of the NONE modifier.  Now it first looks in
convert_optab for a suitable operation and then in builtins.

Regtested with arm-qemu ok.
Initially a few tests failed (gcc.dg/vect/slp-10.c, gcc.dg/vect/slp-11c.c,
gcc.dg/vect/slp-33.c, gcc.dg/vect/fast-math-pr35982.c) because the vectorizer
now vectorizes more loops than they expected.  We adjusted
target-supports.exp so that vectorizable conversions and even/odd extractions
are now reported as supported for NEON.
Ok for trunk?

2011-10-20  Dmitry Plotnikov <dplotnikov@ispras.ru>

gcc/
     * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
     * tree-vect-stmts.c (supportable_convert_operation): New function.
       (vectorizable_conversion): Call it.  Change condition and behavior
       for NONE modifier case.
     * tree-vectorizer.h (supportable_convert_operation): New prototype.
     * tree.h (VECTOR_INTEGER_TYPE_P): New macro.

gcc/config/arm/
     * neon.md (floatv2siv2sf2): New.
       (floatunsv2siv2sf2): New.
       (fix_truncv2sfv2si2): New.
       (fixuns_truncv2sfv2si2): New.
       (floatv4siv4sf2): New.
       (floatunsv4siv4sf2): New.
       (fix_truncv4sfv4si2): New.
       (fixuns_truncv4sfv4si2): New.

gcc/testsuite/
     * gcc.target/arm/neon/vect-vcvt.c: New test.
     * gcc.target/arm/neon/vect-vcvtq.c: New test.

gcc/testsuite/lib/
     * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
       for ARM NEON.
       (check_effective_target_vect_uintfloat_cvt): Likewise.
       (check_effective_target_vect_intfloat_cvt): Likewise.
       (check_effective_target_vect_floatuint_cvt): Likewise.
       (check_effective_target_vect_floatint_cvt): Likewise.
       (check_effective_target_vect_extract_even_odd): Likewise.



[-- Attachment #2: vect-conv.patch --]
[-- Type: text/x-patch, Size: 13359 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index ea09da2..0dd13a6 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2945,6 +2945,62 @@
                    (const_string "neon_fp_vadd_qqq_vabs_qq")))]
 )
 
+(define_insn "floatv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+       (float:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.f32.s32\t%P0, %P1"
+)
+
+(define_insn "floatunsv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+       (unsigned_float:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))] 
+  "TARGET_NEON"
+  "vcvt.f32.u32\t%P0, %P1"
+)
+
+(define_insn "fix_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+        (fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%P0, %P1"
+)
+
+(define_insn "fixuns_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+        (unsigned_fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%P0, %P1"
+)
+
+(define_insn "floatv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+       (float:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.f32.s32\t%q0, %q1"
+)
+
+(define_insn "floatunsv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+       (unsigned_float:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.f32.u32\t%q0, %q1"
+)
+
+(define_insn "fix_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+        (fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%q0, %q1"
+)
+
+(define_insn "fixuns_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+        (unsigned_fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%q0, %q1"
+)
+
 (define_insn "neon_vcvt<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a3b5311..c785b0c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1806,7 +1806,9 @@ proc check_effective_target_vect_intfloat_cvt { } {
         if { [istarget i?86-*-*]
               || ([istarget powerpc*-*-*]
                    && ![istarget powerpc-*-linux*paired*])
-              || [istarget x86_64-*-*] } {
+              || [istarget x86_64-*-*] 
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_intfloat_cvt_saved 1
         }
     }
@@ -1842,7 +1844,9 @@ proc check_effective_target_vect_uintfloat_cvt { } {
         if { [istarget i?86-*-*]
 	      || ([istarget powerpc*-*-*]
 		  && ![istarget powerpc-*-linux*paired*])
-	      || [istarget x86_64-*-*] } {
+	      || [istarget x86_64-*-*] 
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_uintfloat_cvt_saved 1
         }
     }
@@ -1865,7 +1869,9 @@ proc check_effective_target_vect_floatint_cvt { } {
         if { [istarget i?86-*-*]
               || ([istarget powerpc*-*-*]
                    && ![istarget powerpc-*-linux*paired*])
-              || [istarget x86_64-*-*] } {
+              || [istarget x86_64-*-*]
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_floatint_cvt_saved 1
         }
     }
@@ -1885,7 +1891,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
     } else {
         set et_vect_floatuint_cvt_saved 0
         if { ([istarget powerpc*-*-*]
-	      && ![istarget powerpc-*-linux*paired*]) } {
+	      && ![istarget powerpc-*-linux*paired*]) 
+             || ([istarget arm*-*-*]
+                 && [check_effective_target_arm_neon_ok])} {
            set et_vect_floatuint_cvt_saved 1
         }
     }
@@ -3335,7 +3343,9 @@ proc check_effective_target_vect_extract_even_odd { } {
              || [istarget i?86-*-*]
              || [istarget x86_64-*-*]
              || [istarget ia64-*-*]
-             || [istarget spu-*-*] } {
+             || [istarget spu-*-*] 
+             || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_extract_even_odd_saved 1
         }
     }
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index bcf71b9..1f3f10a 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3342,7 +3342,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FLOAT_EXPR:
       {
-	if (!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+	if ((!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+	    && (!VECTOR_INTEGER_TYPE_P (rhs1_type)
+	        || !VECTOR_FLOAT_TYPE_P(lhs_type)))
 	  {
 	    error ("invalid types in conversion to floating point");
 	    debug_generic_expr (lhs_type);
@@ -3355,7 +3357,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FIX_TRUNC_EXPR:
       {
-	if (!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+        if ((!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+            && (!VECTOR_INTEGER_TYPE_P (lhs_type)
+                || !VECTOR_FLOAT_TYPE_P(rhs1_type)))
 	  {
 	    error ("invalid types in conversion to integer");
 	    debug_generic_expr (lhs_type);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f2ac8c7..ecf6a91 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1821,6 +1821,59 @@ vect_gen_widened_results_half (enum tree_code code,
   return new_stmt;
 }
 
+/* Function supportable_convert_operation
+
+   Check whether an operation represented by the code CODE is a
+   convert operation that is supported by the target platform in
+   vector form (i.e., when operating on arguments of type VECTYPE_IN
+   producing a result of type VECTYPE_OUT).
+   
+   Convert operations we currently support directly are FIX_TRUNC and FLOAT.
+   This function checks if these operations are supported
+   by the target platform either directly (via vector tree-codes), or via
+   target builtins.
+   
+   Output:
+   - CODE1 is code of vector operation to be used when
+   vectorizing the operation, if available.
+   - DECL is decl of target builtin functions to be used
+   when vectorizing the operation, if available.  In this case,
+   CODE1 is CALL_EXPR.  */
+
+bool
+supportable_convert_operation (enum tree_code code,
+                                   tree vectype_out, tree vectype_in,
+                                   tree *decl, enum tree_code *code1)
+{
+  enum machine_mode m1,m2;
+  convert_optab optab1 = NULL;
+
+  /* First check if we can do the conversion directly.  */
+  if (code == FIX_TRUNC_EXPR)
+    optab1 = (TYPE_UNSIGNED (vectype_out)) ? ufixtrunc_optab : sfixtrunc_optab;
+  else if (code == FLOAT_EXPR)
+    optab1 = (TYPE_UNSIGNED (vectype_in)) ? ufloat_optab : sfloat_optab;
+  
+  m1 = TYPE_MODE (vectype_in);
+  m2 = TYPE_MODE (vectype_out);
+
+  if (convert_optab_handler (optab1, m2, m1) != CODE_FOR_nothing)
+    {
+      *code1 = code;
+      return true;
+    }
+  
+  /* Now check for builtin.  */
+  if (targetm.vectorize.builtin_conversion
+      && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+    {
+      *code1 = CALL_EXPR;
+      *decl = targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in);
+      return true;
+    }
+  return false;
+}
+
 
 /* Check if STMT performs a conversion operation, that can be vectorized.
    If VEC_STMT is also passed, vectorize the STMT: create a vectorized
@@ -1850,7 +1903,6 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
   tree vectype_out, vectype_in;
   int ncopies, j;
   tree rhs_type;
-  tree builtin_decl;
   enum { NARROW, NONE, WIDEN } modifier;
   int i;
   VEC(tree,heap) *vec_oprnds0 = NULL;
@@ -1939,7 +1991,7 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 
   /* Supportable by target?  */
   if ((modifier == NONE
-       && !targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+       && !supportable_convert_operation (code, vectype_out, vectype_in, &decl1, &code1))
       || (modifier == WIDEN
 	  && !supportable_widening_operation (code, stmt,
 					      vectype_out, vectype_in,
@@ -1989,19 +2041,28 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 	  else
 	    vect_get_vec_defs_for_stmt_copy (dt, &vec_oprnds0, NULL);
 
-	  builtin_decl =
-	    targetm.vectorize.builtin_conversion (code,
-						  vectype_out, vectype_in);
 	  FOR_EACH_VEC_ELT (tree, vec_oprnds0, i, vop0)
-	    {
-	      /* Arguments are ready. create the new vector stmt.  */
-	      new_stmt = gimple_build_call (builtin_decl, 1, vop0);
-	      new_temp = make_ssa_name (vec_dest, new_stmt);
-	      gimple_call_set_lhs (new_stmt, new_temp);
-	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
-	      if (slp_node)
-		VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
-	    }
+          {
+            /* Arguments are ready, create the new vector stmt.  */
+            if (code1 == CALL_EXPR)
+              {
+                new_stmt = gimple_build_call (decl1, 1, vop0);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_call_set_lhs (new_stmt, new_temp);
+              }
+            else
+              {
+                gcc_assert (TREE_CODE_LENGTH (code) == unary_op);
+                new_stmt = gimple_build_assign_with_ops (code, vec_dest, vop0,
+                                                        NULL);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_assign_set_lhs (new_stmt, new_temp);
+              }
+
+            vect_finish_stmt_generation (stmt, new_stmt, gsi);
+            if (slp_node)
+              VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
+          }
 
 	  if (j == 0)
 	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f22add6..d1d1835 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -818,6 +818,9 @@ extern bool vect_transform_stmt (gimple, gimple_stmt_iterator *,
                                  bool *, slp_tree, slp_instance);
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
+extern bool supportable_convert_operation (enum tree_code, tree, tree,
+                                          tree *, enum tree_code *);
+
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
                                     tree, int);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
diff --git a/gcc/tree.h b/gcc/tree.h
index 18fdd07..537e54b 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1120,6 +1120,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
   (TREE_CODE (TYPE) == COMPLEX_TYPE	\
    && TREE_CODE (TREE_TYPE (TYPE)) == REAL_TYPE)
 
+/* Nonzero if TYPE represents a vector integer type.  */
+                
+#define VECTOR_INTEGER_TYPE_P(TYPE)                   \
+             (TREE_CODE (TYPE) == VECTOR_TYPE      \
+                 && TREE_CODE (TREE_TYPE (TYPE)) == INTEGER_TYPE)
+
+
 /* Nonzero if TYPE represents a vector floating-point type.  */
 
 #define VECTOR_FLOAT_TYPE_P(TYPE)	\
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
new file mode 100644
index 0000000..f33206c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -fdump-tree-vect-details" } */
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
new file mode 100644
index 0000000..3412cf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -fdump-tree-vect-details -mvectorize-with-neon-quad" } */
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][PING] Vectorize conversions directly
  2011-10-24  9:25           ` Dmitry Plotnikov
@ 2011-10-24 14:39             ` Joseph S. Myers
  2011-10-24 15:48               ` Ramana Radhakrishnan
  0 siblings, 1 reply; 21+ messages in thread
From: Joseph S. Myers @ 2011-10-24 14:39 UTC (permalink / raw)
  To: Dmitry Plotnikov; +Cc: gcc-patches

On Mon, 24 Oct 2011, Dmitry Plotnikov wrote:

>     * neon.md (floatv2siv2sf2): New.
>       (floatunsv2siv2sf2): New.

>       (floatv4siv4sf2): New.
>       (floatunsv4siv4sf2): New.

My understanding is that the NEON conversions of integer vectors to 
floating point always round to nearest - so do these patterns need to be 
conditioned on !flag_rounding_math?

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][PING] Vectorize conversions directly
  2011-10-24 14:39             ` Joseph S. Myers
@ 2011-10-24 15:48               ` Ramana Radhakrishnan
  2011-10-24 17:11                 ` Joseph S. Myers
  0 siblings, 1 reply; 21+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-24 15:48 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Dmitry Plotnikov, gcc-patches

On 24 October 2011 15:02, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Mon, 24 Oct 2011, Dmitry Plotnikov wrote:
>
>>     * neon.md (floatv2siv2sf2): New.
>>       (floatunsv2siv2sf2): New.
>
>>       (floatv4siv4sf2): New.
>>       (floatunsv4siv4sf2): New.
>
> My understanding is that the NEON conversions of integer vectors to
> floating point always round to nearest - so do these patterns need to be
> conditioned on !flag_rounding_math?

That is correct - they round towards nearest if converting from
integer to floating point and round towards zero if converting in the
reverse direction. !flag_rounding_math should be the case at the very
least. I'm not yet convinced that you can get away without a check for
flag_unsafe_math_optimizations because at the very least input
denormals are flushed to zero and hence the inexact bits won't be set.
So are we completely compliant if we allow this by default?

Dmitry :

The testcases shouldn't be adding -mfpu=neon etc. by hand.

> +/* { dg-options "-O2 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -fdump-tree-vect-details" } */

Instead you should be doing -

/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-O2 -ftree-vectorize" } */
/* { dg-add-options arm_neon } */
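
Put together, the double-word test might look like the sketch below.
This is only an illustration of the directives above applied to the
posted testcase; keeping -fdump-tree-vect-details in dg-options is my
assumption, since the scan-tree-dump-times check needs the vect dump
to exist.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
/* { dg-add-options arm_neon } */

#define N 32

int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
             0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
float fa[N];
int ia[N];

int convert (void)
{
  int i;

  /* int -> float */
  for (i = 0; i < N; i++)
    fa[i] = (float) ib[i];

  /* float -> int */
  for (i = 0; i < N; i++)
    ia[i] = (int) fa[i];

  return 0;
}

/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
```

The quad-word variant would presumably differ only by adding
-mvectorize-with-neon-quad to dg-options.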


cheers
Ramana

>
> --
> Joseph S. Myers
> joseph@codesourcery.com
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][PING] Vectorize conversions directly
  2011-10-24 15:48               ` Ramana Radhakrishnan
@ 2011-10-24 17:11                 ` Joseph S. Myers
  2011-10-28  8:51                   ` Dmitry Plotnikov
  0 siblings, 1 reply; 21+ messages in thread
From: Joseph S. Myers @ 2011-10-24 17:11 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Dmitry Plotnikov, gcc-patches

On Mon, 24 Oct 2011, Ramana Radhakrishnan wrote:

> That is correct - they round towards nearest if converting from
> integer to floating point and round towards zero if converting in the
> reverse direction. !flag_rounding_math should be the case at the very
> least. I'm not yet convinced that you can get away without a check for
> flag_unsafe_math_optimizations because at the very least input
> denormals are flushed to zero and hence the inexact bits won't be set.
> So are we completely compliant if we allow this by default?

I only commented on the conversion from integers to floating point, which 
is supposed to follow the current rounding mode.  Conversions from 
floating point to integer always round towards zero in C, and I believe 
the standard RTL patterns do that as well.  It's left unspecified in C99 
and C1X Annex F whether "inexact" is raised for values where the integer 
part is within the range of the integer type but the conversion is 
inexact, which should cover flushing denormals to zero - so you may not 
need to check any flags on the conversions to integer if that's the only 
issue.
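
To make the two directions concrete, here is a small sketch in plain C
(scalar code, not NEON), assuming IEEE single precision and the default
round-to-nearest mode; the helper names are made up for illustration:

```c
/* int -> float: the result depends on the current rounding mode.
   33554435 is 2^25 + 3, and floats near 2^25 are spaced 4 apart, so
   round-to-nearest gives 33554436 where round-toward-zero would give
   33554432.  This is why the float patterns care about
   -frounding-math.  */
float int_to_float (int x)
{
  return (float) x;
}

/* float -> int: C always discards the fractional part, i.e. rounds
   toward zero, whatever the current rounding mode is.  */
int float_to_int (float x)
{
  return (int) x;
}
```

So int_to_float (33554435) yields 33554436.0f in the default mode,
while float_to_int (2.9f) is 2 and float_to_int (-2.9f) is -2 in any
mode.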

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][PING] Vectorize conversions directly
  2011-10-24 17:11                 ` Joseph S. Myers
@ 2011-10-28  8:51                   ` Dmitry Plotnikov
  2011-10-28 15:25                     ` Richard Henderson
                                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Dmitry Plotnikov @ 2011-10-28  8:51 UTC (permalink / raw)
  To: gcc-patches; +Cc: Joseph S. Myers, Ramana Radhakrishnan, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1308 bytes --]

Here is the patch updated according to recent comments.

2011-10-28  Dmitry Plotnikov <dplotnikov@ispras.ru>

gcc/
     * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
     * optabs.c (supportable_convert_operation): New function.
     * optabs.h (supportable_convert_operation): New prototype.
     * tree-vect-stmts.c (vectorizable_conversion): Change condition and
       behavior for NONE modifier case.
     * tree.h (VECTOR_INTEGER_TYPE_P): New macro.

gcc/config/arm/
     * neon.md (floatv2siv2sf2): New.
       (floatunsv2siv2sf2): New.
       (fix_truncv2sfv2si2): New.
       (fixuns_truncv2sfv2si2): New.
       (floatv4siv4sf2): New.
       (floatunsv4siv4sf2): New.
       (fix_truncv4sfv4si2): New.
       (fixuns_truncv4sfv4si2): New.

gcc/testsuite/
     * gcc.target/arm/neon/vect-vcvt.c: New test.
     * gcc.target/arm/neon/vect-vcvtq.c: New test.

gcc/testsuite/lib/
     * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
       for ARM NEON.
       (check_effective_target_vect_uintfloat_cvt): Likewise.
       (check_effective_target_vect_intfloat_cvt): Likewise.
       (check_effective_target_vect_floatuint_cvt): Likewise.
       (check_effective_target_vect_floatint_cvt): Likewise.
       (check_effective_target_vect_extract_even_odd): Likewise.

[-- Attachment #2: vect-conv.patch --]
[-- Type: text/x-patch, Size: 13919 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index ea09da2..0dd13a6 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2945,6 +2945,62 @@
                    (const_string "neon_fp_vadd_qqq_vabs_qq")))]
 )
 
+(define_insn "floatv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+       (float:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.s32\t%P0, %P1"
+)
+
+(define_insn "floatunsv2siv2sf2"
+  [(set (match_operand:V2SF 0 "s_register_operand" "=w")
+       (unsigned_float:V2SF (match_operand:V2SI 1 "s_register_operand" "w")))] 
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.u32\t%P0, %P1"
+)
+
+(define_insn "fix_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+        (fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%P0, %P1"
+)
+
+(define_insn "fixuns_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "s_register_operand" "=w")
+        (unsigned_fix:V2SI (match_operand:V2SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%P0, %P1"
+)
+
+(define_insn "floatv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+       (float:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.s32\t%q0, %q1"
+)
+
+(define_insn "floatunsv4siv4sf2"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+       (unsigned_float:V4SF (match_operand:V4SI 1 "s_register_operand" "w")))]
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.u32\t%q0, %q1"
+)
+
+(define_insn "fix_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+        (fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%q0, %q1"
+)
+
+(define_insn "fixuns_truncv4sfv4si2"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+        (unsigned_fix:V4SI (match_operand:V4SF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%q0, %q1"
+)
+
 (define_insn "neon_vcvt<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 0ba1333..920d756 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -4727,6 +4727,60 @@ can_float_p (enum machine_mode fltmode, enum machine_mode fixmode,
   tab = unsignedp ? ufloat_optab : sfloat_optab;
   return convert_optab_handler (tab, fltmode, fixmode);
 }
+
+/* Function supportable_convert_operation
+
+   Check whether an operation represented by the code CODE is a
+   convert operation that is supported by the target platform in
+   vector form (i.e., when operating on arguments of type VECTYPE_IN
+   producing a result of type VECTYPE_OUT).
+   
+   Convert operations we currently support directly are FIX_TRUNC and FLOAT.
+   This function checks if these operations are supported
+   by the target platform either directly (via vector tree-codes), or via
+   target builtins.
+   
+   Output:
+   - CODE1 is code of vector operation to be used when
+   vectorizing the operation, if available.
+   - DECL is decl of target builtin functions to be used
+   when vectorizing the operation, if available.  In this case,
+   CODE1 is CALL_EXPR.  */
+
+bool
+supportable_convert_operation (enum tree_code code,
+                                    tree vectype_out, tree vectype_in,
+                                    tree *decl, enum tree_code *code1)
+{
+  enum machine_mode m1,m2;
+  int truncp;
+
+  m1 = TYPE_MODE (vectype_out);
+  m2 = TYPE_MODE (vectype_in);
+
+  /* First check if we can do the conversion directly.  */
+  if ((code == FIX_TRUNC_EXPR 
+       && can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp) 
+          != CODE_FOR_nothing)
+      || (code == FLOAT_EXPR
+          && can_float_p (m1,m2,TYPE_UNSIGNED (vectype_in))
+	     != CODE_FOR_nothing))
+    {
+      *code1 = code;
+      return true;
+    }
+
+  /* Now check for builtin.  */
+  if (targetm.vectorize.builtin_conversion
+      && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+    {
+      *code1 = CALL_EXPR;
+      *decl = targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in);
+      return true;
+    }
+  return false;
+}
+
 \f
 /* Generate code to convert FROM to floating point
    and store in TO.  FROM must be fixed point and not VOIDmode.
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 41ae7eb..ce3605b 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -871,6 +871,12 @@ extern void expand_float (rtx, rtx, int);
 /* Return the insn_code for a FLOAT_EXPR.  */
 enum insn_code can_float_p (enum machine_mode, enum machine_mode, int);
 
+/* Check whether an operation represented by the code CODE is a
+   convert operation that is supported by the target platform in
+   vector form.  */
+bool supportable_convert_operation (enum tree_code, tree, tree, tree *, 
+                                    enum tree_code *);
+
 /* Generate code for a FIX_EXPR.  */
 extern void expand_fix (rtx, rtx, int);
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a3b5311..c785b0c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1806,7 +1806,9 @@ proc check_effective_target_vect_intfloat_cvt { } {
         if { [istarget i?86-*-*]
               || ([istarget powerpc*-*-*]
                    && ![istarget powerpc-*-linux*paired*])
-              || [istarget x86_64-*-*] } {
+              || [istarget x86_64-*-*] 
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_intfloat_cvt_saved 1
         }
     }
@@ -1842,7 +1844,9 @@ proc check_effective_target_vect_uintfloat_cvt { } {
         if { [istarget i?86-*-*]
 	      || ([istarget powerpc*-*-*]
 		  && ![istarget powerpc-*-linux*paired*])
-	      || [istarget x86_64-*-*] } {
+	      || [istarget x86_64-*-*] 
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_uintfloat_cvt_saved 1
         }
     }
@@ -1865,7 +1869,9 @@ proc check_effective_target_vect_floatint_cvt { } {
         if { [istarget i?86-*-*]
               || ([istarget powerpc*-*-*]
                    && ![istarget powerpc-*-linux*paired*])
-              || [istarget x86_64-*-*] } {
+              || [istarget x86_64-*-*]
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_floatint_cvt_saved 1
         }
     }
@@ -1885,7 +1891,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
     } else {
         set et_vect_floatuint_cvt_saved 0
         if { ([istarget powerpc*-*-*]
-	      && ![istarget powerpc-*-linux*paired*]) } {
+	      && ![istarget powerpc-*-linux*paired*]) 
+             || ([istarget arm*-*-*]
+                 && [check_effective_target_arm_neon_ok])} {
            set et_vect_floatuint_cvt_saved 1
         }
     }
@@ -3335,7 +3343,9 @@ proc check_effective_target_vect_extract_even_odd { } {
              || [istarget i?86-*-*]
              || [istarget x86_64-*-*]
              || [istarget ia64-*-*]
-             || [istarget spu-*-*] } {
+             || [istarget spu-*-*] 
+             || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_extract_even_odd_saved 1
         }
     }
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index bcf71b9..1f3f10a 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3342,7 +3342,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FLOAT_EXPR:
       {
-	if (!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+	if ((!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+	    && (!VECTOR_INTEGER_TYPE_P (rhs1_type)
+	        || !VECTOR_FLOAT_TYPE_P(lhs_type)))
 	  {
 	    error ("invalid types in conversion to floating point");
 	    debug_generic_expr (lhs_type);
@@ -3355,7 +3357,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FIX_TRUNC_EXPR:
       {
-	if (!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+        if ((!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+            && (!VECTOR_INTEGER_TYPE_P (lhs_type)
+                || !VECTOR_FLOAT_TYPE_P(rhs1_type)))
 	  {
 	    error ("invalid types in conversion to integer");
 	    debug_generic_expr (lhs_type);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f2ac8c7..106f292 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1821,7 +1821,6 @@ vect_gen_widened_results_half (enum tree_code code,
   return new_stmt;
 }
 
-
 /* Check if STMT performs a conversion operation, that can be vectorized.
    If VEC_STMT is also passed, vectorize the STMT: create a vectorized
    stmt to replace it, put it in VEC_STMT, and insert it at BSI.
@@ -1850,7 +1849,6 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
   tree vectype_out, vectype_in;
   int ncopies, j;
   tree rhs_type;
-  tree builtin_decl;
   enum { NARROW, NONE, WIDEN } modifier;
   int i;
   VEC(tree,heap) *vec_oprnds0 = NULL;
@@ -1939,7 +1937,7 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 
   /* Supportable by target?  */
   if ((modifier == NONE
-       && !targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+       && !supportable_convert_operation (code, vectype_out, vectype_in, &decl1, &code1))
       || (modifier == WIDEN
 	  && !supportable_widening_operation (code, stmt,
 					      vectype_out, vectype_in,
@@ -1989,19 +1987,28 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 	  else
 	    vect_get_vec_defs_for_stmt_copy (dt, &vec_oprnds0, NULL);
 
-	  builtin_decl =
-	    targetm.vectorize.builtin_conversion (code,
-						  vectype_out, vectype_in);
 	  FOR_EACH_VEC_ELT (tree, vec_oprnds0, i, vop0)
-	    {
-	      /* Arguments are ready. create the new vector stmt.  */
-	      new_stmt = gimple_build_call (builtin_decl, 1, vop0);
-	      new_temp = make_ssa_name (vec_dest, new_stmt);
-	      gimple_call_set_lhs (new_stmt, new_temp);
-	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
-	      if (slp_node)
-		VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
-	    }
+          {
+            /* Arguments are ready, create the new vector stmt.  */
+            if (code1 == CALL_EXPR)
+              {
+                new_stmt = gimple_build_call (decl1, 1, vop0);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_call_set_lhs (new_stmt, new_temp);
+              }
+            else
+              {
+                gcc_assert (TREE_CODE_LENGTH (code) == unary_op);
+                new_stmt = gimple_build_assign_with_ops (code, vec_dest, vop0,
+                                                        NULL);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_assign_set_lhs (new_stmt, new_temp);
+              }
+
+            vect_finish_stmt_generation (stmt, new_stmt, gsi);
+            if (slp_node)
+              VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
+          }
 
 	  if (j == 0)
 	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
diff --git a/gcc/tree.h b/gcc/tree.h
index 18fdd07..537e54b 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1120,6 +1120,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
   (TREE_CODE (TYPE) == COMPLEX_TYPE	\
    && TREE_CODE (TREE_TYPE (TYPE)) == REAL_TYPE)
 
+/* Nonzero if TYPE represents a vector integer type.  */
+                
+#define VECTOR_INTEGER_TYPE_P(TYPE)                   \
+             (TREE_CODE (TYPE) == VECTOR_TYPE      \
+                 && TREE_CODE (TREE_TYPE (TYPE)) == INTEGER_TYPE)
+
+
 /* Nonzero if TYPE represents a vector floating-point type.  */
 
 #define VECTOR_FLOAT_TYPE_P(TYPE)	\
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
new file mode 100644
index 0000000..f33206c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-add-options arm_neon } */
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
new file mode 100644
index 0000000..3412cf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -mvectorize-with-neon-quad" } */
+/* { dg-add-options arm_neon } */
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
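
The lookup order that supportable_convert_operation implements in the patch
above (direct conversion instruction first, target builtin second) can be
sketched outside GCC roughly as follows.  This is a minimal model, not the
GCC code: conv_code, target_caps, sketch_float_only_hook and
sketch_supportable_convert are hypothetical names, the two boolean fields
stand in for can_fix_p and can_float_p returning something other than
CODE_FOR_nothing, and a string stands in for the builtin's decl.  Unlike
the patch, the sketch queries the hook once and reuses the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the tree codes involved; not GCC's enum.  */
enum conv_code { CONV_FIX_TRUNC, CONV_FLOAT, CONV_CALL };

/* Hypothetical description of what a target offers.  */
struct target_caps
{
  bool has_direct_fix_trunc;  /* Models can_fix_p != CODE_FOR_nothing.  */
  bool has_direct_float;      /* Models can_float_p != CODE_FOR_nothing.  */
  /* Models targetm.vectorize.builtin_conversion; may be NULL.  */
  const char *(*builtin_conversion) (enum conv_code);
};

/* Example hook: offers a builtin for int->float only.  */
const char *
sketch_float_only_hook (enum conv_code code)
{
  return code == CONV_FLOAT ? "hypothetical_vec_float_builtin" : NULL;
}

/* Return true if CODE can be vectorized on target T.  On success *CODE1
   is CODE itself (direct instruction) or CONV_CALL, in which case *DECL
   names the builtin to call.  */
bool
sketch_supportable_convert (const struct target_caps *t, enum conv_code code,
                            const char **decl, enum conv_code *code1)
{
  assert (t != NULL && decl != NULL && code1 != NULL);

  /* Step 1: prefer a direct conversion instruction.  */
  if ((code == CONV_FIX_TRUNC && t->has_direct_fix_trunc)
      || (code == CONV_FLOAT && t->has_direct_float))
    {
      *code1 = code;
      return true;
    }

  /* Step 2: fall back to a target builtin, if the hook provides one.  */
  if (t->builtin_conversion != NULL)
    {
      const char *d = t->builtin_conversion (code);
      if (d != NULL)
        {
          *code1 = CONV_CALL;
          *decl = d;
          return true;
        }
    }

  return false;
}
```

With both boolean fields set and no hook, which is how a target with the
new conversion patterns would look in this model, the function reports the
original code and never touches *decl.  A target that only provides the
hook takes the CONV_CALL path instead.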



* Re: [PATCH][PING] Vectorize conversions directly
  2011-10-28  8:51                   ` Dmitry Plotnikov
@ 2011-10-28 15:25                     ` Richard Henderson
  2011-11-08  9:16                     ` [PATCH][PING^2] " Dmitry Plotnikov
  2011-11-22 14:04                     ` [PATCH][PING] " Ramana Radhakrishnan
  2 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2011-10-28 15:25 UTC (permalink / raw)
  To: Dmitry Plotnikov; +Cc: gcc-patches, Joseph S. Myers, Ramana Radhakrishnan

On 10/28/2011 01:22 AM, Dmitry Plotnikov wrote:
> gcc/
>     * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
>     * optabs.c (supportable_convert_operation): New function.
>     * optabs.h (supportable_convert_operation): New prototype.
>     * tree-vect-stmts.c (vectorizable_conversion): Change condition and behavior
>       for NONE modifier case.
>     * tree.h (VECTOR_INTEGER_TYPE_P): New macro.
...
> gcc/testsuite/
>     * gcc.target/arm/vect-vcvt.c: New test.
>     * gcc.target/arm/vect-vcvtq.c: New test.
> 
> gcc/testsuite/lib/
>     * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
>       for ARM NEON.
>       (check_effective_target_vect_uintfloat_cvt): Likewise.
>       (check_effective_target_vect_intfloat_cvt): Likewise.
>       (check_effective_target_vect_floatuint_cvt): Likewise.
>       (check_effective_target_vect_floatint_cvt): Likewise.
>       (check_effective_target_vect_extract_even_odd): Likewise. 



Ok.


r~


* Re: [PATCH][PING^2] Vectorize conversions directly
  2011-10-28  8:51                   ` Dmitry Plotnikov
  2011-10-28 15:25                     ` Richard Henderson
@ 2011-11-08  9:16                     ` Dmitry Plotnikov
  2011-11-22 13:40                       ` [PATCH][PING^3] " Dmitry Plotnikov
  2011-11-22 14:04                     ` [PATCH][PING] " Ramana Radhakrishnan
  2 siblings, 1 reply; 21+ messages in thread
From: Dmitry Plotnikov @ 2011-11-08  9:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Joseph S. Myers, Ramana Radhakrishnan, Richard Henderson

Ping.

On 10/28/2011 12:22 PM, Dmitry Plotnikov wrote:
> Here is the patch updated according to recent comments.
>
> 2011-10-28 Dmitry Plotnikov <dplotnikov@ispras.ru>
>
> gcc/
> * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
> * optabs.c (supportable_convert_operation): New function.
> * optabs.h (supportable_convert_operation): New prototype.
> * tree-vect-stmts.c (vectorizable_conversion): Change condition and
> behavior
> for NONE modifier case.
> * tree.h (VECTOR_INTEGER_TYPE_P): New macro.
>
> gcc/config/arm/
> * neon.md (floatv2siv2sf2): New.
> (floatunsv2siv2sf2): New.
> (fix_truncv2sfv2si2): New.
> (fix_truncunsv2sfv2si2): New.
> (floatv4siv4sf2): New.
> (floatunsv4siv4sf2): New.
> (fix_truncv4sfv4si2): New.
> (fix_truncunsv4sfv4si2): New.
>
> gcc/testsuite/
> * gcc.target/arm/vect-vcvt.c: New test.
> * gcc.target/arm/vect-vcvtq.c: New test.
>
> gcc/testsuite/lib/
> * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
> for ARM NEON.
> (check_effective_target_vect_uintfloat_cvt): Likewise.
> (check_effective_target_vect_intfloat_cvt): Likewise.
> (check_effective_target_vect_floatuint_cvt): Likewise.
> (check_effective_target_vect_floatint_cvt): Likewise.
> (check_effective_target_vect_extract_even_odd): Likewise.


* Re: [PATCH][PING^3] Vectorize conversions directly
  2011-11-08  9:16                     ` [PATCH][PING^2] " Dmitry Plotnikov
@ 2011-11-22 13:40                       ` Dmitry Plotnikov
  0 siblings, 0 replies; 21+ messages in thread
From: Dmitry Plotnikov @ 2011-11-22 13:40 UTC (permalink / raw)
  To: gcc-patches
  Cc: Joseph S. Myers, Ramana Radhakrishnan, Richard Henderson,
	Richard Earnshaw

Ping.  The ARM portion of this patch is still awaiting approval.

On 11/08/2011 12:35 PM, Dmitry Plotnikov wrote:
> Ping.
>
> On 10/28/2011 12:22 PM, Dmitry Plotnikov wrote:
>> Here is the patch updated according to recent comments.
>>
>> 2011-10-28 Dmitry Plotnikov <dplotnikov@ispras.ru>
>>
>> gcc/
>> * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
>> * optabs.c (supportable_convert_operation): New function.
>> * optabs.h (supportable_convert_operation): New prototype.
>> * tree-vect-stmts.c (vectorizable_conversion): Change condition and
>> behavior
>> for NONE modifier case.
>> * tree.h (VECTOR_INTEGER_TYPE_P): New macro.
>>
>> gcc/config/arm/
>> * neon.md (floatv2siv2sf2): New.
>> (floatunsv2siv2sf2): New.
>> (fix_truncv2sfv2si2): New.
>> (fix_truncunsv2sfv2si2): New.
>> (floatv4siv4sf2): New.
>> (floatunsv4siv4sf2): New.
>> (fix_truncv4sfv4si2): New.
>> (fix_truncunsv4sfv4si2): New.
>>
>> gcc/testsuite/
>> * gcc.target/arm/vect-vcvt.c: New test.
>> * gcc.target/arm/vect-vcvtq.c: New test.
>>
>> gcc/testsuite/lib/
>> * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
>> for ARM NEON.
>> (check_effective_target_vect_uintfloat_cvt): Likewise.
>> (check_effective_target_vect_intfloat_cvt): Likewise.
>> (check_effective_target_vect_floatuint_cvt): Likewise.
>> (check_effective_target_vect_floatint_cvt): Likewise.
>> (check_effective_target_vect_extract_even_odd): Likewise.
>


* Re: [PATCH][PING] Vectorize conversions directly
  2011-10-28  8:51                   ` Dmitry Plotnikov
  2011-10-28 15:25                     ` Richard Henderson
  2011-11-08  9:16                     ` [PATCH][PING^2] " Dmitry Plotnikov
@ 2011-11-22 14:04                     ` Ramana Radhakrishnan
  2011-11-26 15:15                       ` Ira Rosen
  2 siblings, 1 reply; 21+ messages in thread
From: Ramana Radhakrishnan @ 2011-11-22 14:04 UTC (permalink / raw)
  To: Dmitry Plotnikov; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

Sorry, it's taken me a while to get to this.

On 28 October 2011 09:22, Dmitry Plotnikov <dplotnikov@ispras.ru> wrote:
>
> gcc/config/arm/
>    * neon.md (floatv2siv2sf2): New.
>      (floatunsv2siv2sf2): New.
>      (fix_truncv2sfv2si2): New.
>      (fix_truncunsv2sfv2si2): New.
>      (floatv4siv4sf2): New.
>      (floatunsv4siv4sf2): New.
>      (fix_truncv4sfv4si2): New.
>      (fix_truncunsv4sfv4si2): New.

It would be better to write these using iterators, as is the style
throughout neon.md.  Also, the neon_type attributes are missing, so
these would be treated as standard ALU instructions rather than Neon
instructions and would flow through the ALU pipeline description.  In
the V2SI / V2SF case these instructions should have a neon_type of
neon_fp_vadd_ddd_vabs_dd; in the V4SI / V4SF case, a neon_type of
neon_fp_vadd_qqq_vabs_qq.

For bonus points, integrate this with the patterns already defined for
the Neon intrinsics expansion and so remove the UNSPECs from the
neon_vcvt<mode> patterns.  That essentially means converting your
define_insn patterns to define_expands and massaging the whole thing
through.

>
> gcc/testsuite/
>    * gcc.target/arm/vect-vcvt.c: New test.
>    * gcc.target/arm/vect-vcvtq.c: New test.

There's no need for -mvectorize-with-neon-quad in the tests. That is
the default these days on trunk.

>
> gcc/testsuite/lib/
>    * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
>      for ARM NEON.
>      (check_effective_target_vect_uintfloat_cvt): Likewise.
>      (check_effective_target_vect_intfloat_cvt): Likewise.
>      (check_effective_target_vect_floatuint_cvt): Likewise.
>      (check_effective_target_vect_floatint_cvt): Likewise.
>      (check_effective_target_vect_extract_even_odd): Likewise.

I'm not sure about enabling the vect_extract_even_odd case.  If this
assumes the presence of an extract-even-odd operation on registers,
then the Neon port doesn't really support the vec_extract_even /
vec_extract_odd forms.  You do get them in a single instruction when
loading from or storing to memory, which is the vld2 / vst2
instruction, while the register form, vuzp, which reads and writes
both source operands, is not really supported directly by the backend.
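
For readers less familiar with the operation under discussion, here is a
scalar sketch of what even/odd extraction computes.  It is only a model of
the data movement; sketch_deinterleave is a hypothetical name and this is
neither GCC nor Neon code.  A stride-2 load such as the vld2 mentioned
above produces this split while loading from memory, which is a different
capability from performing it on values already held in registers.

```c
#include <assert.h>
#include <stddef.h>

/* Split IN, which holds N elements with N even, into its even-indexed
   elements (EVEN) and its odd-indexed elements (ODD).  EVEN and ODD must
   each have room for N / 2 elements.  */
void
sketch_deinterleave (const int *in, size_t n, int *even, int *odd)
{
  assert (n % 2 == 0);

  for (size_t i = 0; i < n / 2; i++)
    {
      even[i] = in[2 * i];      /* Elements 0, 2, 4, ...  */
      odd[i] = in[2 * i + 1];   /* Elements 1, 3, 5, ...  */
    }
}
```

Calling it on an eight-element array fills EVEN with elements 0, 2, 4, 6
and ODD with elements 1, 3, 5, 7 of the input.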

The other testsuite changes look OK to me.

cheers
Ramana

>


* Re: [PATCH][PING] Vectorize conversions directly
  2011-11-22 14:04                     ` [PATCH][PING] " Ramana Radhakrishnan
@ 2011-11-26 15:15                       ` Ira Rosen
  2011-12-22 13:27                         ` Dmitry Plotnikov
  0 siblings, 1 reply; 21+ messages in thread
From: Ira Rosen @ 2011-11-26 15:15 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Dmitry Plotnikov, gcc-patches, Joseph S. Myers, Richard Henderson



gcc-patches-owner@gcc.gnu.org wrote on 22/11/2011 03:31:22 PM:

> From: Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
> > gcc/testsuite/lib/
> >    * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
> >      for ARM NEON.
> >      (check_effective_target_vect_uintfloat_cvt): Likewise.
> >      (check_effective_target_vect_intfloat_cvt): Likewise.
> >      (check_effective_target_vect_floatuint_cvt): Likewise.
> >      (check_effective_target_vect_floatint_cvt): Likewise.
> >      (check_effective_target_vect_extract_even_odd): Likewise.
>
> I'm not sure about enabling the vect_extract_even_odd case. If this
> assumes the presence of an extract-even-odd from registers type
> operation, then the Neon port doesn't really support vec_extract_even
> / vec_extract_odd forms -  You do have them in one single instruction
> if you tried to load them from / or store them to memory which is the
> vld2 / vst2 instruction while the register form of vuzp which reads
> and writes to both source operands is not really supported directly
> from the backend.

Right.
Dmitry, you can do this instead:

Index: fast-math-pr35982.c
===================================================================
--- fast-math-pr35982.c (revision 181150)
+++ fast-math-pr35982.c (working copy)
@@ -20,7 +20,7 @@
   return avg;
 }

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd  } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd  } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_extract_even_odd || vect_strided2 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_extract_even_odd  || vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */

Ira

>
> The other testsuite changes look OK to me.
>
> cheers
> Ramana
>
> >
>


* Re: [PATCH][PING] Vectorize conversions directly
  2011-11-26 15:15                       ` Ira Rosen
@ 2011-12-22 13:27                         ` Dmitry Plotnikov
  2011-12-22 13:48                           ` Richard Earnshaw
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry Plotnikov @ 2011-12-22 13:27 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ira Rosen, Ramana Radhakrishnan, Joseph S. Myers, Richard Henderson, dm

[-- Attachment #1: Type: text/plain, Size: 1487 bytes --]

Here is the patch with iterators for instructions and neon_type 
attributes.  Also fast-math-pr35982.c is changed according to Ira's 
comment.  I will look at integration with patterns for neon intrinsics 
later.

2011-12-22  Dmitry Plotnikov <dplotnikov@ispras.ru>

gcc/
     * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
     * optabs.c (supportable_convert_operation): New function.
     * optabs.h (supportable_convert_operation): New prototype.
     * tree-vect-stmts.c (vectorizable_conversion): Change condition
       and behavior for NONE modifier case.
     * tree.h (VECTOR_INTEGER_TYPE_P): New macro.

gcc/config/arm/
     * neon.md (float<mode><V_CVTTOF>2): New.
       (floatuns<mode><V_CVTTOF>2): New.
       (fix_trunc<mode><V_CVTTOI>2): New.
       (fixuns_trunc<mode><V_CVTTOI>2): New.
     * iterators.md (V_CVTTOF): New mode attribute.
       (V_CVTTOI): New mode attribute.

gcc/testsuite/
     * gcc.target/arm/vect-vcvt.c: New test.
     * gcc.target/arm/vect-vcvtq.c: New test.

gcc/testsuite/gcc.dg/vect/
     * fast-math-pr35982.c: Added vect_strided2 alternative in final
       check.

gcc/testsuite/lib/
     * target-supports.exp (check_effective_target_vect_intfloat_cvt):
       True for ARM NEON.
       (check_effective_target_vect_uintfloat_cvt): Likewise.
       (check_effective_target_vect_floatuint_cvt): Likewise.
       (check_effective_target_vect_floatint_cvt): Likewise.

[-- Attachment #2: vect-conv1.patch --]
[-- Type: text/x-patch, Size: 15568 bytes --]

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 85dd641..de4340c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -197,6 +197,10 @@
 (define_mode_attr V_CVTTO [(V2SI "V2SF") (V2SF "V2SI")
                (V4SI "V4SF") (V4SF "V4SI")])
 
+(define_mode_attr V_CVTTOF [(V2SI "v2sf") (V4SI "v4sf")])
+
+(define_mode_attr V_CVTTOI [(V2SF "v2si") (V4SF "v4si")])
+
 ;; Define element mode for each vector mode.
 (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
               (V4HI "HI") (V8HI "HI")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index ea09da2..dc715ed 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2932,11 +2932,55 @@
   DONE;
 })
 
+(define_insn "float<mode><V_CVTTOF>2"
+  [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
+       (float:<V_CVTTO> (match_operand:VCVTI 1 "s_register_operand" "w")))]
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.s32\t%<V_reg>0, %<V_reg>1"
+  [(set (attr "neon_type")
+     (if_then_else (match_test "<Is_d_reg>")
+                   (const_string "neon_fp_vadd_ddd_vabs_dd")
+                   (const_string "neon_fp_vadd_qqq_vabs_qq")))]
+)
+
+(define_insn "floatuns<mode><V_CVTTOF>2"
+  [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
+       (unsigned_float:<V_CVTTO> (match_operand:VCVTI 1 "s_register_operand" "w")))] 
+  "TARGET_NEON && !flag_rounding_math"
+  "vcvt.f32.u32\t%<V_reg>0, %<V_reg>1"
+  [(set (attr "neon_type")
+     (if_then_else (match_test "<Is_d_reg>")
+                   (const_string "neon_fp_vadd_ddd_vabs_dd")
+                   (const_string "neon_fp_vadd_qqq_vabs_qq")))]
+)
+
+(define_insn "fix_trunc<mode><V_CVTTOI>2"
+  [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
+        (fix:<V_CVTTO> (match_operand:VCVTF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.s32.f32\t%<V_reg>0, %<V_reg>1"
+  [(set (attr "neon_type")
+     (if_then_else (match_test "<Is_d_reg>")
+                   (const_string "neon_fp_vadd_ddd_vabs_dd")
+                   (const_string "neon_fp_vadd_qqq_vabs_qq")))]
+)
+
+(define_insn "fixuns_trunc<mode><V_CVTTOI>2"
+  [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
+        (unsigned_fix:<V_CVTTO> (match_operand:VCVTF 1 "s_register_operand" "w")))]
+  "TARGET_NEON"
+  "vcvt.u32.f32\t%<V_reg>0, %<V_reg>1"
+  [(set (attr "neon_type")
+     (if_then_else (match_test "<Is_d_reg>")
+                   (const_string "neon_fp_vadd_ddd_vabs_dd")
+                   (const_string "neon_fp_vadd_qqq_vabs_qq")))]
+)
+
 (define_insn "neon_vcvt<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
diff --git a/gcc/optabs.c b/gcc/optabs.c
index a373d7a..e504284 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -4792,6 +4792,60 @@ can_float_p (enum machine_mode fltmode, enum machine_mode fixmode,
   tab = unsignedp ? ufloat_optab : sfloat_optab;
   return convert_optab_handler (tab, fltmode, fixmode);
 }
+
+/* Function supportable_convert_operation
+
+   Check whether an operation represented by the code CODE is a
+   convert operation that is supported by the target platform in
+   vector form (i.e., when operating on arguments of type VECTYPE_IN
+   producing a result of type VECTYPE_OUT).
+   
+   Convert operations we currently support directly are FIX_TRUNC and FLOAT.
+   This function checks if these operations are supported
+   by the target platform either directly (via vector tree-codes), or via
+   target builtins.
+   
+   Output:
+   - CODE1 is code of vector operation to be used when
+   vectorizing the operation, if available.
+   - DECL is decl of target builtin functions to be used
+   when vectorizing the operation, if available.  In this case,
+   CODE1 is CALL_EXPR.  */
+
+bool
+supportable_convert_operation (enum tree_code code,
+                                    tree vectype_out, tree vectype_in,
+                                    tree *decl, enum tree_code *code1)
+{
+  enum machine_mode m1,m2;
+  int truncp;
+
+  m1 = TYPE_MODE (vectype_out);
+  m2 = TYPE_MODE (vectype_in);
+
+  /* First check if we can do the conversion directly.  */
+  if ((code == FIX_TRUNC_EXPR 
+       && can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp) 
+          != CODE_FOR_nothing)
+      || (code == FLOAT_EXPR
+          && can_float_p (m1,m2,TYPE_UNSIGNED (vectype_in))
+	     != CODE_FOR_nothing))
+    {
+      *code1 = code;
+      return true;
+    }
+
+  /* Now check for builtin.  */
+  if (targetm.vectorize.builtin_conversion
+      && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+    {
+      *code1 = CALL_EXPR;
+      *decl = targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in);
+      return true;
+    }
+  return false;
+}
+
 \f
 /* Generate code to convert FROM to floating point
    and store in TO.  FROM must be fixed point and not VOIDmode.
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 926d21f..4747ab6 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -873,6 +873,12 @@ extern void expand_float (rtx, rtx, int);
 /* Return the insn_code for a FLOAT_EXPR.  */
 enum insn_code can_float_p (enum machine_mode, enum machine_mode, int);
 
+/* Check whether an operation represented by the code CODE is a
+   convert operation that is supported by the target platform in
+   vector form.  */
+bool supportable_convert_operation (enum tree_code, tree, tree, tree *, 
+                                    enum tree_code *);
+
 /* Generate code for a FIX_EXPR.  */
 extern void expand_fix (rtx, rtx, int);
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 77f8a00..d6c9180 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1806,7 +1806,9 @@ proc check_effective_target_vect_intfloat_cvt { } {
         if { [istarget i?86-*-*]
               || ([istarget powerpc*-*-*]
                    && ![istarget powerpc-*-linux*paired*])
-              || [istarget x86_64-*-*] } {
+              || [istarget x86_64-*-*] 
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_intfloat_cvt_saved 1
         }
     }
@@ -1842,7 +1844,9 @@ proc check_effective_target_vect_uintfloat_cvt { } {
         if { [istarget i?86-*-*]
 	      || ([istarget powerpc*-*-*]
 		  && ![istarget powerpc-*-linux*paired*])
-	      || [istarget x86_64-*-*] } {
+	      || [istarget x86_64-*-*] 
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_uintfloat_cvt_saved 1
         }
     }
@@ -1865,7 +1869,9 @@ proc check_effective_target_vect_floatint_cvt { } {
         if { [istarget i?86-*-*]
               || ([istarget powerpc*-*-*]
                    && ![istarget powerpc-*-linux*paired*])
-              || [istarget x86_64-*-*] } {
+              || [istarget x86_64-*-*]
+              || ([istarget arm*-*-*]
+                  && [check_effective_target_arm_neon_ok])} {
            set et_vect_floatint_cvt_saved 1
         }
     }
@@ -1885,7 +1891,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
     } else {
         set et_vect_floatuint_cvt_saved 0
         if { ([istarget powerpc*-*-*]
-	      && ![istarget powerpc-*-linux*paired*]) } {
+	      && ![istarget powerpc-*-linux*paired*]) 
+             || ([istarget arm*-*-*]
+                 && [check_effective_target_arm_neon_ok])} {
            set et_vect_floatuint_cvt_saved 1
         }
     }
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index bcf71b9..1f3f10a 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3342,7 +3342,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FLOAT_EXPR:
       {
-	if (!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+	if ((!INTEGRAL_TYPE_P (rhs1_type) || !SCALAR_FLOAT_TYPE_P (lhs_type))
+	    && (!VECTOR_INTEGER_TYPE_P (rhs1_type)
+	        || !VECTOR_FLOAT_TYPE_P(lhs_type)))
 	  {
 	    error ("invalid types in conversion to floating point");
 	    debug_generic_expr (lhs_type);
@@ -3355,7 +3357,9 @@ verify_gimple_assign_unary (gimple stmt)
 
     case FIX_TRUNC_EXPR:
       {
-	if (!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+        if ((!INTEGRAL_TYPE_P (lhs_type) || !SCALAR_FLOAT_TYPE_P (rhs1_type))
+            && (!VECTOR_INTEGER_TYPE_P (lhs_type)
+                || !VECTOR_FLOAT_TYPE_P(rhs1_type)))
 	  {
 	    error ("invalid types in conversion to integer");
 	    debug_generic_expr (lhs_type);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d986ff8..2dbae9a 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1821,7 +1821,6 @@ vect_gen_widened_results_half (enum tree_code code,
   return new_stmt;
 }
 
-
 /* Check if STMT performs a conversion operation, that can be vectorized.
    If VEC_STMT is also passed, vectorize the STMT: create a vectorized
    stmt to replace it, put it in VEC_STMT, and insert it at BSI.
@@ -1850,7 +1849,6 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
   tree vectype_out, vectype_in;
   int ncopies, j;
   tree rhs_type;
-  tree builtin_decl;
   enum { NARROW, NONE, WIDEN } modifier;
   int i;
   VEC(tree,heap) *vec_oprnds0 = NULL;
@@ -1939,7 +1937,7 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 
   /* Supportable by target?  */
   if ((modifier == NONE
-       && !targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
+       && !supportable_convert_operation (code, vectype_out, vectype_in, &decl1, &code1))
       || (modifier == WIDEN
 	  && !supportable_widening_operation (code, stmt,
 					      vectype_out, vectype_in,
@@ -1989,19 +1987,28 @@ vectorizable_conversion (gimple stmt, gimple_stmt_iterator *gsi,
 	  else
 	    vect_get_vec_defs_for_stmt_copy (dt, &vec_oprnds0, NULL);
 
-	  builtin_decl =
-	    targetm.vectorize.builtin_conversion (code,
-						  vectype_out, vectype_in);
 	  FOR_EACH_VEC_ELT (tree, vec_oprnds0, i, vop0)
-	    {
-	      /* Arguments are ready. create the new vector stmt.  */
-	      new_stmt = gimple_build_call (builtin_decl, 1, vop0);
-	      new_temp = make_ssa_name (vec_dest, new_stmt);
-	      gimple_call_set_lhs (new_stmt, new_temp);
-	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
-	      if (slp_node)
-		VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
-	    }
+          {
+            /* Arguments are ready, create the new vector stmt.  */
+            if (code1 == CALL_EXPR)
+              {
+                new_stmt = gimple_build_call (decl1, 1, vop0);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_call_set_lhs (new_stmt, new_temp);
+              }
+            else
+              {
+                gcc_assert (TREE_CODE_LENGTH (code) == unary_op);
+                new_stmt = gimple_build_assign_with_ops (code, vec_dest, vop0,
+                                                        NULL);
+                new_temp = make_ssa_name (vec_dest, new_stmt);
+                gimple_assign_set_lhs (new_stmt, new_temp);
+              }
+
+            vect_finish_stmt_generation (stmt, new_stmt, gsi);
+            if (slp_node)
+              VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
+          }
 
 	  if (j == 0)
 	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f22add6..d1d1835 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -818,6 +818,9 @@ extern bool vect_transform_stmt (gimple, gimple_stmt_iterator *,
                                  bool *, slp_tree, slp_instance);
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
+extern bool supportable_convert_operation (enum tree_code, tree, tree,
+                                          tree *, enum tree_code *);
+
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
                                     tree, int);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
diff --git a/gcc/tree.h b/gcc/tree.h
index 18fdd07..537e54b 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1120,6 +1120,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
   (TREE_CODE (TYPE) == COMPLEX_TYPE	\
    && TREE_CODE (TREE_TYPE (TYPE)) == REAL_TYPE)
 
+/* Nonzero if TYPE represents a vector integer type.  */
+                
+#define VECTOR_INTEGER_TYPE_P(TYPE)                   \
+             (TREE_CODE (TYPE) == VECTOR_TYPE      \
+                 && TREE_CODE (TREE_TYPE (TYPE)) == INTEGER_TYPE)
+
+
 /* Nonzero if TYPE represents a vector floating-point type.  */
 
 #define VECTOR_FLOAT_TYPE_P(TYPE)	\
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c b/gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c
index d839406..0d4c43a 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c
@@ -20,7 +20,7 @@ float method2_int16 (struct mem *mem)
   return avg;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd  } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd  } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_extract_even_odd || vect_strided2 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_extract_even_odd || vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
new file mode 100644
index 0000000..f33206c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -mvectorize-with-neon-double" } */
+/* { dg-add-options arm_neon } */
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
new file mode 100644
index 0000000..3412cf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-add-options arm_neon } */
+
+#define N 32
+
+int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+float fa[N];
+int ia[N];
+
+int convert()
+{
+  int i;
+
+  /* int -> float */
+  for (i = 0; i < N; i++)
+    fa[i] = (float) ib[i];
+
+  /* float -> int */
+  for (i = 0; i < N; i++)
+    ia[i] = (int) fa[i];
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

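The lookup order the patch introduces for the NONE-modifier case can be modelled in a few lines of plain C. This is only a toy sketch, not GCC code: the `toy_` names, the two-entry target table, and the string standing in for a builtin decl are all invented for illustration. Only the idea is taken from the patch: try the direct operation first, fall back to a builtin call, and let `code1 == CALL_EXPR` select the call path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's tree codes; not the real enum tree_code.  */
enum toy_code { TOY_FLOAT_EXPR, TOY_FIX_TRUNC_EXPR, TOY_CALL_EXPR };

/* Hypothetical target description: which conversions have a direct
   optab entry and which are only reachable through a builtin.  Both
   arrays are indexed by TOY_FLOAT_EXPR / TOY_FIX_TRUNC_EXPR.  */
struct toy_target
{
  bool has_optab[2];
  const char *builtin[2];	/* Builtin name, or NULL if none.  */
};

/* Model of the lookup order: try the optab first, then fall back to a
   target builtin.  On success *CODE1 is either CODE itself (emit a
   plain assignment) or TOY_CALL_EXPR (emit a call to *DECL).  */
static bool
toy_supportable_convert (const struct toy_target *t, enum toy_code code,
			 const char **decl, enum toy_code *code1)
{
  assert (t != NULL && decl != NULL && code1 != NULL);

  if (code != TOY_FLOAT_EXPR && code != TOY_FIX_TRUNC_EXPR)
    return false;

  if (t->has_optab[code])
    {
      *decl = NULL;
      *code1 = code;
      return true;
    }

  if (t->builtin[code] != NULL)
    {
      *decl = t->builtin[code];
      *code1 = TOY_CALL_EXPR;
      return true;
    }

  return false;
}
```

A caller would then branch on `*code1` in the same way the rewritten loop in vectorizable_conversion does: build a call when it is the call code, otherwise build a unary assignment with the original code.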

* Re: [PATCH][PING] Vectorize conversions directly
  2011-12-22 13:27                         ` Dmitry Plotnikov
@ 2011-12-22 13:48                           ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2011-12-22 13:48 UTC (permalink / raw)
  To: Dmitry Plotnikov
  Cc: gcc-patches, Ira Rosen, Ramana Radhakrishnan, Joseph S. Myers,
	Richard Henderson, dm

On 22/12/11 13:14, Dmitry Plotnikov wrote:

> gcc/config/arm/
>      * neon.md (float<mode><V_CVTTOF>2): New.
>        (floatuns<mode><V_CVTTOF>2): New.
>        (fix_trunc<mode><V_CVTTOI>2): New.
>        (fix_truncuns<mode><V_CVTTOI>2): New.
>      * iterators.md (V_CVTTOF): New iterator.
>        (V_CVTTOI): New iterator.
> 
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 85dd641..de4340c 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -197,6 +197,10 @@
>  (define_mode_attr V_CVTTO [(V2SI "V2SF") (V2SF "V2SI")
>                 (V4SI "V4SF") (V4SF "V4SI")])
>  
> +(define_mode_attr V_CVTTOF [(V2SI "v2sf") (V4SI "v4sf")])
> +
> +(define_mode_attr V_CVTTOI [(V2SF "v2si") (V4SF "v4si")])
> +

An attribute can cover any superset of the iterator's modes, so you don't
need two separate attributes here.

;; As above but in lower case.
(define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
                           (V4SI "v4sf") (V4SF "v4si")])

is perfectly adequate and matches other attributes in the ARM back-end.

The ARM bits are OK with that change.

R.
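
To illustrate the superset point outside the machine description, here is a rough C model of how a single lower-case attribute table can serve both the integer-mode and the float-mode patterns. It is a sketch under assumptions: `lookup_cvtto` and `expand_name` are hypothetical helpers written for this note, and the real expansion is done by GCC's md reader, not by code like this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One lower-case attribute table covering all four modes, mirroring
   the suggested V_cvtto: each mode maps to its conversion partner.  */
struct mode_attr { const char *mode; const char *cvtto; };

static const struct mode_attr v_cvtto[] = {
  { "v2si", "v2sf" }, { "v2sf", "v2si" },
  { "v4si", "v4sf" }, { "v4sf", "v4si" },
};

/* Look MODE up in the shared table; return NULL if it has no entry.  */
static const char *
lookup_cvtto (const char *mode)
{
  for (size_t i = 0; i < sizeof v_cvtto / sizeof v_cvtto[0]; i++)
    if (strcmp (v_cvtto[i].mode, mode) == 0)
      return v_cvtto[i].cvtto;
  return NULL;
}

/* Expand "<prefix><mode><V_cvtto>2" for one mode of an iterator.
   Return what snprintf returns, or -1 if MODE is not in the table.  */
static int
expand_name (char *buf, size_t len, const char *prefix, const char *mode)
{
  assert (buf != NULL && len > 0);

  const char *to = lookup_cvtto (mode);
  if (to == NULL)
    return -1;
  return snprintf (buf, len, "%s%s%s2", prefix, mode, to);
}
```

An iterator over the integer modes only ever consults the v2si and v4si entries, and one over the float modes only the v2sf and v4sf entries, so the unused half of the table is harmless in either pattern.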

Thread overview: 21+ messages
2010-11-24 16:09 [PATCH] Vectorize conversions directly Dmitry Plotnikov
2010-11-24 16:35 ` Dmitry Plotnikov
2010-11-27  4:12   ` Richard Henderson
2010-12-09 14:06     ` Dmitry Plotnikov
2010-12-10 16:05       ` Richard Henderson
     [not found]         ` <4EA04B20.1090009@ispras.ru>
2011-10-20 17:46           ` [PATCH][PING] " Richard Henderson
2011-10-21 12:23             ` Ramana Radhakrishnan
2011-10-24  9:25           ` Dmitry Plotnikov
2011-10-24 14:39             ` Joseph S. Myers
2011-10-24 15:48               ` Ramana Radhakrishnan
2011-10-24 17:11                 ` Joseph S. Myers
2011-10-28  8:51                   ` Dmitry Plotnikov
2011-10-28 15:25                     ` Richard Henderson
2011-11-08  9:16                     ` [PATCH][PING^2] " Dmitry Plotnikov
2011-11-22 13:40                       ` [PATCH][PING^3] " Dmitry Plotnikov
2011-11-22 14:04                     ` [PATCH][PING] " Ramana Radhakrishnan
2011-11-26 15:15                       ` Ira Rosen
2011-12-22 13:27                         ` Dmitry Plotnikov
2011-12-22 13:48                           ` Richard Earnshaw
2010-11-24 17:28 ` [PATCH] " Richard Guenther
2010-11-25 18:25 ` Ramana Radhakrishnan