public inbox for gcc-patches@gcc.gnu.org
* [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
@ 2012-05-29  4:13 Matt Turner
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 3/5] built in define and expand Matt Turner
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Matt Turner @ 2012-05-29  4:13 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, Nick Clifton, Paul Brook,
	Xinyu Qi


This series was written by Marvell and sent by Xinyu Qi <xyqi@marvell.com>
a number of times in the last year.

We (One Laptop per Child) need these patches for reasonable iWMMXt support
and performance. Without them, the logical and shift intrinsics cause ICEs;
see PR 35294 and its duplicates 36798 and 36966.

The software compositing library pixman uses MMX intrinsics to optimize
various compositing routines. The following are the minimum execution times
of cairo-perf-trace graphics workloads without and with iWMMXt-optimized
pixman for the image and image16 backends (32-bpp and 16-bpp respectively).

                             image               image16
           evolution   33.492 ->  29.590    30.334 ->  24.751
firefox-planet-gnome  191.465 -> 173.835   211.297 -> 187.570
gnome-system-monitor   51.956 ->  44.549    52.272 ->  40.525
  gnome-terminal-vim   53.625 ->  54.554    47.593 ->  47.341
      grads-heat-map    4.439 ->   4.165     4.548 ->   4.624
       midori-zoomed   38.033 ->  28.500    38.576 ->  26.937
             poppler   41.096 ->  31.949    41.230 ->  31.749
  swfdec-giant-steps   20.062 ->  16.912    28.294 ->  17.286
      swfdec-youtube   42.281 ->  37.335    52.848 ->  47.053
   xfce4-terminal-a1   64.311 ->  51.011    62.592 ->  51.191

Since the patches were last posted in December, we have cleaned up some
whitespace issues and fixed a small bug in patch 4/5 (added tandc, textrc,
torc, torvsc to the "wtype" attribute).

Please commit them for 4.8.

For 4.7 and 4.6, please consider committing my patch
"[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)",
which fixes only the logical and shift intrinsics.

Thanks,

Matt Turner


* [PATCH ARM iWMMXt 3/5] built in define and expand
  2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
@ 2012-05-29  4:14 ` Matt Turner
  2012-06-06 11:55   ` Ramana Radhakrishnan
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 5/5] pipeline description Matt Turner
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Matt Turner @ 2012-05-29  4:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, Nick Clifton, Paul Brook,
	Xinyu Qi

From: Xinyu Qi <xyqi@marvell.com>

	gcc/
	* config/arm/arm.c (enum arm_builtins): Revise built-in fcode.
	(IWMMXT2_BUILTIN): New define.
	(IWMMXT2_BUILTIN2): Likewise.
	(iwmmx2_mbuiltin): Likewise.
	(builtin_description bdesc_2arg): Revise built-in declaration.
	(builtin_description bdesc_1arg): Likewise.
	(arm_init_iwmmxt_builtins): Revise built-in initialization.
	(arm_expand_builtin): Revise built-in expansion.
---
 gcc/config/arm/arm.c |  620 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 559 insertions(+), 61 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b0680ab..51eed40 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19637,8 +19637,15 @@ static neon_builtin_datum neon_builtin_data[] =
    FIXME?  */
 enum arm_builtins
 {
-  ARM_BUILTIN_GETWCX,
-  ARM_BUILTIN_SETWCX,
+  ARM_BUILTIN_GETWCGR0,
+  ARM_BUILTIN_GETWCGR1,
+  ARM_BUILTIN_GETWCGR2,
+  ARM_BUILTIN_GETWCGR3,
+
+  ARM_BUILTIN_SETWCGR0,
+  ARM_BUILTIN_SETWCGR1,
+  ARM_BUILTIN_SETWCGR2,
+  ARM_BUILTIN_SETWCGR3,
 
   ARM_BUILTIN_WZERO,
 
@@ -19661,7 +19668,11 @@ enum arm_builtins
   ARM_BUILTIN_WSADH,
   ARM_BUILTIN_WSADHZ,
 
-  ARM_BUILTIN_WALIGN,
+  ARM_BUILTIN_WALIGNI,
+  ARM_BUILTIN_WALIGNR0,
+  ARM_BUILTIN_WALIGNR1,
+  ARM_BUILTIN_WALIGNR2,
+  ARM_BUILTIN_WALIGNR3,
 
   ARM_BUILTIN_TMIA,
   ARM_BUILTIN_TMIAPH,
@@ -19797,6 +19808,81 @@ enum arm_builtins
   ARM_BUILTIN_WUNPCKELUH,
   ARM_BUILTIN_WUNPCKELUW,
 
+  ARM_BUILTIN_WABSB,
+  ARM_BUILTIN_WABSH,
+  ARM_BUILTIN_WABSW,
+
+  ARM_BUILTIN_WADDSUBHX,
+  ARM_BUILTIN_WSUBADDHX,
+
+  ARM_BUILTIN_WABSDIFFB,
+  ARM_BUILTIN_WABSDIFFH,
+  ARM_BUILTIN_WABSDIFFW,
+
+  ARM_BUILTIN_WADDCH,
+  ARM_BUILTIN_WADDCW,
+
+  ARM_BUILTIN_WAVG4,
+  ARM_BUILTIN_WAVG4R,
+
+  ARM_BUILTIN_WMADDSX,
+  ARM_BUILTIN_WMADDUX,
+
+  ARM_BUILTIN_WMADDSN,
+  ARM_BUILTIN_WMADDUN,
+
+  ARM_BUILTIN_WMULWSM,
+  ARM_BUILTIN_WMULWUM,
+
+  ARM_BUILTIN_WMULWSMR,
+  ARM_BUILTIN_WMULWUMR,
+
+  ARM_BUILTIN_WMULWL,
+
+  ARM_BUILTIN_WMULSMR,
+  ARM_BUILTIN_WMULUMR,
+
+  ARM_BUILTIN_WQMULM,
+  ARM_BUILTIN_WQMULMR,
+
+  ARM_BUILTIN_WQMULWM,
+  ARM_BUILTIN_WQMULWMR,
+
+  ARM_BUILTIN_WADDBHUSM,
+  ARM_BUILTIN_WADDBHUSL,
+
+  ARM_BUILTIN_WQMIABB,
+  ARM_BUILTIN_WQMIABT,
+  ARM_BUILTIN_WQMIATB,
+  ARM_BUILTIN_WQMIATT,
+
+  ARM_BUILTIN_WQMIABBN,
+  ARM_BUILTIN_WQMIABTN,
+  ARM_BUILTIN_WQMIATBN,
+  ARM_BUILTIN_WQMIATTN,
+
+  ARM_BUILTIN_WMIABB,
+  ARM_BUILTIN_WMIABT,
+  ARM_BUILTIN_WMIATB,
+  ARM_BUILTIN_WMIATT,
+
+  ARM_BUILTIN_WMIABBN,
+  ARM_BUILTIN_WMIABTN,
+  ARM_BUILTIN_WMIATBN,
+  ARM_BUILTIN_WMIATTN,
+
+  ARM_BUILTIN_WMIAWBB,
+  ARM_BUILTIN_WMIAWBT,
+  ARM_BUILTIN_WMIAWTB,
+  ARM_BUILTIN_WMIAWTT,
+
+  ARM_BUILTIN_WMIAWBBN,
+  ARM_BUILTIN_WMIAWBTN,
+  ARM_BUILTIN_WMIAWTBN,
+  ARM_BUILTIN_WMIAWTTN,
+
+  ARM_BUILTIN_WMERGE,
+
   ARM_BUILTIN_THREAD_POINTER,
 
   ARM_BUILTIN_NEON_BASE,
@@ -20329,6 +20415,10 @@ static const struct builtin_description bdesc_2arg[] =
   { FL_IWMMXT, CODE_FOR_##code, "__builtin_arm_" string, \
     ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
+#define IWMMXT2_BUILTIN(code, string, builtin) \
+  { FL_IWMMXT2, CODE_FOR_##code, "__builtin_arm_" string, \
+    ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+
   IWMMXT_BUILTIN (addv8qi3, "waddb", WADDB)
   IWMMXT_BUILTIN (addv4hi3, "waddh", WADDH)
   IWMMXT_BUILTIN (addv2si3, "waddw", WADDW)
@@ -20385,44 +20475,45 @@ static const struct builtin_description bdesc_2arg[] =
   IWMMXT_BUILTIN (iwmmxt_wunpckihb, "wunpckihb", WUNPCKIHB)
   IWMMXT_BUILTIN (iwmmxt_wunpckihh, "wunpckihh", WUNPCKIHH)
   IWMMXT_BUILTIN (iwmmxt_wunpckihw, "wunpckihw", WUNPCKIHW)
-  IWMMXT_BUILTIN (iwmmxt_wmadds, "wmadds", WMADDS)
-  IWMMXT_BUILTIN (iwmmxt_wmaddu, "wmaddu", WMADDU)
+  IWMMXT2_BUILTIN (iwmmxt_waddsubhx, "waddsubhx", WADDSUBHX)
+  IWMMXT2_BUILTIN (iwmmxt_wsubaddhx, "wsubaddhx", WSUBADDHX)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffb, "wabsdiffb", WABSDIFFB)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffh, "wabsdiffh", WABSDIFFH)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffw, "wabsdiffw", WABSDIFFW)
+  IWMMXT2_BUILTIN (iwmmxt_avg4, "wavg4", WAVG4)
+  IWMMXT2_BUILTIN (iwmmxt_avg4r, "wavg4r", WAVG4R)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwsm, "wmulwsm", WMULWSM)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwum, "wmulwum", WMULWUM)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwsmr, "wmulwsmr", WMULWSMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwumr, "wmulwumr", WMULWUMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwl, "wmulwl", WMULWL)
+  IWMMXT2_BUILTIN (iwmmxt_wmulsmr, "wmulsmr", WMULSMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulumr, "wmulumr", WMULUMR)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulm, "wqmulm", WQMULM)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulmr, "wqmulmr", WQMULMR)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulwm, "wqmulwm", WQMULWM)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulwmr, "wqmulwmr", WQMULWMR)
+  IWMMXT_BUILTIN (iwmmxt_walignr0, "walignr0", WALIGNR0)
+  IWMMXT_BUILTIN (iwmmxt_walignr1, "walignr1", WALIGNR1)
+  IWMMXT_BUILTIN (iwmmxt_walignr2, "walignr2", WALIGNR2)
+  IWMMXT_BUILTIN (iwmmxt_walignr3, "walignr3", WALIGNR3)
 
 #define IWMMXT_BUILTIN2(code, builtin) \
   { FL_IWMMXT, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
+#define IWMMXT2_BUILTIN2(code, builtin) \
+  { FL_IWMMXT2, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+
+  IWMMXT2_BUILTIN2 (iwmmxt_waddbhusm, WADDBHUSM)
+  IWMMXT2_BUILTIN2 (iwmmxt_waddbhusl, WADDBHUSL)
   IWMMXT_BUILTIN2 (iwmmxt_wpackhss, WPACKHSS)
   IWMMXT_BUILTIN2 (iwmmxt_wpackwss, WPACKWSS)
   IWMMXT_BUILTIN2 (iwmmxt_wpackdss, WPACKDSS)
   IWMMXT_BUILTIN2 (iwmmxt_wpackhus, WPACKHUS)
   IWMMXT_BUILTIN2 (iwmmxt_wpackwus, WPACKWUS)
   IWMMXT_BUILTIN2 (iwmmxt_wpackdus, WPACKDUS)
-  IWMMXT_BUILTIN2 (ashlv4hi3_di,    WSLLH)
-  IWMMXT_BUILTIN2 (ashlv4hi3_iwmmxt, WSLLHI)
-  IWMMXT_BUILTIN2 (ashlv2si3_di,    WSLLW)
-  IWMMXT_BUILTIN2 (ashlv2si3_iwmmxt, WSLLWI)
-  IWMMXT_BUILTIN2 (ashldi3_di,      WSLLD)
-  IWMMXT_BUILTIN2 (ashldi3_iwmmxt,  WSLLDI)
-  IWMMXT_BUILTIN2 (lshrv4hi3_di,    WSRLH)
-  IWMMXT_BUILTIN2 (lshrv4hi3_iwmmxt, WSRLHI)
-  IWMMXT_BUILTIN2 (lshrv2si3_di,    WSRLW)
-  IWMMXT_BUILTIN2 (lshrv2si3_iwmmxt, WSRLWI)
-  IWMMXT_BUILTIN2 (lshrdi3_di,      WSRLD)
-  IWMMXT_BUILTIN2 (lshrdi3_iwmmxt,  WSRLDI)
-  IWMMXT_BUILTIN2 (ashrv4hi3_di,    WSRAH)
-  IWMMXT_BUILTIN2 (ashrv4hi3_iwmmxt, WSRAHI)
-  IWMMXT_BUILTIN2 (ashrv2si3_di,    WSRAW)
-  IWMMXT_BUILTIN2 (ashrv2si3_iwmmxt, WSRAWI)
-  IWMMXT_BUILTIN2 (ashrdi3_di,      WSRAD)
-  IWMMXT_BUILTIN2 (ashrdi3_iwmmxt,  WSRADI)
-  IWMMXT_BUILTIN2 (rorv4hi3_di,     WRORH)
-  IWMMXT_BUILTIN2 (rorv4hi3,        WRORHI)
-  IWMMXT_BUILTIN2 (rorv2si3_di,     WRORW)
-  IWMMXT_BUILTIN2 (rorv2si3,        WRORWI)
-  IWMMXT_BUILTIN2 (rordi3_di,       WRORD)
-  IWMMXT_BUILTIN2 (rordi3,          WRORDI)
-  IWMMXT_BUILTIN2 (iwmmxt_wmacuz,   WMACUZ)
-  IWMMXT_BUILTIN2 (iwmmxt_wmacsz,   WMACSZ)
+  IWMMXT_BUILTIN2 (iwmmxt_wmacuz, WMACUZ)
+  IWMMXT_BUILTIN2 (iwmmxt_wmacsz, WMACSZ)
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -20445,6 +20536,12 @@ static const struct builtin_description bdesc_1arg[] =
   IWMMXT_BUILTIN (iwmmxt_wunpckelsb, "wunpckelsb", WUNPCKELSB)
   IWMMXT_BUILTIN (iwmmxt_wunpckelsh, "wunpckelsh", WUNPCKELSH)
   IWMMXT_BUILTIN (iwmmxt_wunpckelsw, "wunpckelsw", WUNPCKELSW)
+  IWMMXT2_BUILTIN (iwmmxt_wabsv8qi3, "wabsb", WABSB)
+  IWMMXT2_BUILTIN (iwmmxt_wabsv4hi3, "wabsh", WABSH)
+  IWMMXT2_BUILTIN (iwmmxt_wabsv2si3, "wabsw", WABSW)
+  IWMMXT_BUILTIN (tbcstv8qi, "tbcstb", TBCSTB)
+  IWMMXT_BUILTIN (tbcstv4hi, "tbcsth", TBCSTH)
+  IWMMXT_BUILTIN (tbcstv2si, "tbcstw", TBCSTW)
 };
 
 /* Set up all the iWMMXt builtins.  This is not called if
@@ -20460,9 +20557,6 @@ arm_init_iwmmxt_builtins (void)
   tree V4HI_type_node = build_vector_type_for_mode (intHI_type_node, V4HImode);
   tree V8QI_type_node = build_vector_type_for_mode (intQI_type_node, V8QImode);
 
-  tree int_ftype_int
-    = build_function_type_list (integer_type_node,
-				integer_type_node, NULL_TREE);
   tree v8qi_ftype_v8qi_v8qi_int
     = build_function_type_list (V8QI_type_node,
 				V8QI_type_node, V8QI_type_node,
@@ -20524,6 +20618,9 @@ arm_init_iwmmxt_builtins (void)
   tree v4hi_ftype_v2si_v2si
     = build_function_type_list (V4HI_type_node,
 				V2SI_type_node, V2SI_type_node, NULL_TREE);
+  tree v8qi_ftype_v4hi_v8qi
+    = build_function_type_list (V8QI_type_node,
+	                        V4HI_type_node, V8QI_type_node, NULL_TREE);
   tree v2si_ftype_v4hi_v4hi
     = build_function_type_list (V2SI_type_node,
 				V4HI_type_node, V4HI_type_node, NULL_TREE);
@@ -20538,12 +20635,10 @@ arm_init_iwmmxt_builtins (void)
     = build_function_type_list (V2SI_type_node,
 				V2SI_type_node, long_long_integer_type_node,
 				NULL_TREE);
-  tree void_ftype_int_int
-    = build_function_type_list (void_type_node,
-				integer_type_node, integer_type_node,
-				NULL_TREE);
   tree di_ftype_void
     = build_function_type_list (long_long_unsigned_type_node, NULL_TREE);
+  tree int_ftype_void
+    = build_function_type_list (integer_type_node, NULL_TREE);
   tree di_ftype_v8qi
     = build_function_type_list (long_long_integer_type_node,
 				V8QI_type_node, NULL_TREE);
@@ -20559,6 +20654,15 @@ arm_init_iwmmxt_builtins (void)
   tree v4hi_ftype_v8qi
     = build_function_type_list (V4HI_type_node,
 				V8QI_type_node, NULL_TREE);
+  tree v8qi_ftype_v8qi
+    = build_function_type_list (V8QI_type_node,
+	                        V8QI_type_node, NULL_TREE);
+  tree v4hi_ftype_v4hi
+    = build_function_type_list (V4HI_type_node,
+	                        V4HI_type_node, NULL_TREE);
+  tree v2si_ftype_v2si
+    = build_function_type_list (V2SI_type_node,
+	                        V2SI_type_node, NULL_TREE);
 
   tree di_ftype_di_v4hi_v4hi
     = build_function_type_list (long_long_unsigned_type_node,
@@ -20571,6 +20675,48 @@ arm_init_iwmmxt_builtins (void)
 				V4HI_type_node,V4HI_type_node,
 				NULL_TREE);
 
+  tree v2si_ftype_v2si_v4hi_v4hi
+    = build_function_type_list (V2SI_type_node,
+                                V2SI_type_node, V4HI_type_node,
+                                V4HI_type_node, NULL_TREE);
+
+  tree v2si_ftype_v2si_v8qi_v8qi
+    = build_function_type_list (V2SI_type_node,
+                                V2SI_type_node, V8QI_type_node,
+                                V8QI_type_node, NULL_TREE);
+
+  tree di_ftype_di_v2si_v2si
+     = build_function_type_list (long_long_unsigned_type_node,
+                                 long_long_unsigned_type_node,
+                                 V2SI_type_node, V2SI_type_node,
+                                 NULL_TREE);
+
+   tree di_ftype_di_di_int
+     = build_function_type_list (long_long_unsigned_type_node,
+                                 long_long_unsigned_type_node,
+                                 long_long_unsigned_type_node,
+                                 integer_type_node, NULL_TREE);
+
+   tree void_ftype_void
+     = build_function_type_list (void_type_node,
+                                 NULL_TREE);
+
+   tree void_ftype_int
+     = build_function_type_list (void_type_node,
+                                 integer_type_node, NULL_TREE);
+
+   tree v8qi_ftype_char
+     = build_function_type_list (V8QI_type_node,
+                                 signed_char_type_node, NULL_TREE);
+
+   tree v4hi_ftype_short
+     = build_function_type_list (V4HI_type_node,
+                                 short_integer_type_node, NULL_TREE);
+
+   tree v2si_ftype_int
+     = build_function_type_list (V2SI_type_node,
+                                 integer_type_node, NULL_TREE);
+
   /* Normal vector binops.  */
   tree v8qi_ftype_v8qi_v8qi
     = build_function_type_list (V8QI_type_node,
@@ -20628,9 +20774,19 @@ arm_init_iwmmxt_builtins (void)
   def_mbuiltin (FL_IWMMXT, "__builtin_arm_" NAME, (TYPE),	\
 		ARM_BUILTIN_ ## CODE)
 
+#define iwmmx2_mbuiltin(NAME, TYPE, CODE)                      \
+  def_mbuiltin (FL_IWMMXT2, "__builtin_arm_" NAME, (TYPE),     \
+               ARM_BUILTIN_ ## CODE)
+
   iwmmx_mbuiltin ("wzero", di_ftype_void, WZERO);
-  iwmmx_mbuiltin ("setwcx", void_ftype_int_int, SETWCX);
-  iwmmx_mbuiltin ("getwcx", int_ftype_int, GETWCX);
+  iwmmx_mbuiltin ("setwcgr0", void_ftype_int, SETWCGR0);
+  iwmmx_mbuiltin ("setwcgr1", void_ftype_int, SETWCGR1);
+  iwmmx_mbuiltin ("setwcgr2", void_ftype_int, SETWCGR2);
+  iwmmx_mbuiltin ("setwcgr3", void_ftype_int, SETWCGR3);
+  iwmmx_mbuiltin ("getwcgr0", int_ftype_void, GETWCGR0);
+  iwmmx_mbuiltin ("getwcgr1", int_ftype_void, GETWCGR1);
+  iwmmx_mbuiltin ("getwcgr2", int_ftype_void, GETWCGR2);
+  iwmmx_mbuiltin ("getwcgr3", int_ftype_void, GETWCGR3);
 
   iwmmx_mbuiltin ("wsllh", v4hi_ftype_v4hi_di, WSLLH);
   iwmmx_mbuiltin ("wsllw", v2si_ftype_v2si_di, WSLLW);
@@ -20662,8 +20818,14 @@ arm_init_iwmmxt_builtins (void)
 
   iwmmx_mbuiltin ("wshufh", v4hi_ftype_v4hi_int, WSHUFH);
 
-  iwmmx_mbuiltin ("wsadb", v2si_ftype_v8qi_v8qi, WSADB);
-  iwmmx_mbuiltin ("wsadh", v2si_ftype_v4hi_v4hi, WSADH);
+  iwmmx_mbuiltin ("wsadb", v2si_ftype_v2si_v8qi_v8qi, WSADB);
+  iwmmx_mbuiltin ("wsadh", v2si_ftype_v2si_v4hi_v4hi, WSADH);
+  iwmmx_mbuiltin ("wmadds", v2si_ftype_v4hi_v4hi, WMADDS);
+  iwmmx2_mbuiltin ("wmaddsx", v2si_ftype_v4hi_v4hi, WMADDSX);
+  iwmmx2_mbuiltin ("wmaddsn", v2si_ftype_v4hi_v4hi, WMADDSN);
+  iwmmx_mbuiltin ("wmaddu", v2si_ftype_v4hi_v4hi, WMADDU);
+  iwmmx2_mbuiltin ("wmaddux", v2si_ftype_v4hi_v4hi, WMADDUX);
+  iwmmx2_mbuiltin ("wmaddun", v2si_ftype_v4hi_v4hi, WMADDUN);
   iwmmx_mbuiltin ("wsadbz", v2si_ftype_v8qi_v8qi, WSADBZ);
   iwmmx_mbuiltin ("wsadhz", v2si_ftype_v4hi_v4hi, WSADHZ);
 
@@ -20685,6 +20847,9 @@ arm_init_iwmmxt_builtins (void)
   iwmmx_mbuiltin ("tmovmskh", int_ftype_v4hi, TMOVMSKH);
   iwmmx_mbuiltin ("tmovmskw", int_ftype_v2si, TMOVMSKW);
 
+  iwmmx2_mbuiltin ("waddbhusm", v8qi_ftype_v4hi_v8qi, WADDBHUSM);
+  iwmmx2_mbuiltin ("waddbhusl", v8qi_ftype_v4hi_v8qi, WADDBHUSL);
+
   iwmmx_mbuiltin ("wpackhss", v8qi_ftype_v4hi_v4hi, WPACKHSS);
   iwmmx_mbuiltin ("wpackhus", v8qi_ftype_v4hi_v4hi, WPACKHUS);
   iwmmx_mbuiltin ("wpackwus", v4hi_ftype_v2si_v2si, WPACKWUS);
@@ -20710,7 +20875,7 @@ arm_init_iwmmxt_builtins (void)
   iwmmx_mbuiltin ("wmacu", di_ftype_di_v4hi_v4hi, WMACU);
   iwmmx_mbuiltin ("wmacuz", di_ftype_v4hi_v4hi, WMACUZ);
 
-  iwmmx_mbuiltin ("walign", v8qi_ftype_v8qi_v8qi_int, WALIGN);
+  iwmmx_mbuiltin ("walign", v8qi_ftype_v8qi_v8qi_int, WALIGNI);
   iwmmx_mbuiltin ("tmia", di_ftype_di_int_int, TMIA);
   iwmmx_mbuiltin ("tmiaph", di_ftype_di_int_int, TMIAPH);
   iwmmx_mbuiltin ("tmiabb", di_ftype_di_int_int, TMIABB);
@@ -20718,7 +20883,48 @@ arm_init_iwmmxt_builtins (void)
   iwmmx_mbuiltin ("tmiatb", di_ftype_di_int_int, TMIATB);
   iwmmx_mbuiltin ("tmiatt", di_ftype_di_int_int, TMIATT);
 
+  iwmmx2_mbuiltin ("wabsb", v8qi_ftype_v8qi, WABSB);
+  iwmmx2_mbuiltin ("wabsh", v4hi_ftype_v4hi, WABSH);
+  iwmmx2_mbuiltin ("wabsw", v2si_ftype_v2si, WABSW);
+
+  iwmmx2_mbuiltin ("wqmiabb", v2si_ftype_v2si_v4hi_v4hi, WQMIABB);
+  iwmmx2_mbuiltin ("wqmiabt", v2si_ftype_v2si_v4hi_v4hi, WQMIABT);
+  iwmmx2_mbuiltin ("wqmiatb", v2si_ftype_v2si_v4hi_v4hi, WQMIATB);
+  iwmmx2_mbuiltin ("wqmiatt", v2si_ftype_v2si_v4hi_v4hi, WQMIATT);
+
+  iwmmx2_mbuiltin ("wqmiabbn", v2si_ftype_v2si_v4hi_v4hi, WQMIABBN);
+  iwmmx2_mbuiltin ("wqmiabtn", v2si_ftype_v2si_v4hi_v4hi, WQMIABTN);
+  iwmmx2_mbuiltin ("wqmiatbn", v2si_ftype_v2si_v4hi_v4hi, WQMIATBN);
+  iwmmx2_mbuiltin ("wqmiattn", v2si_ftype_v2si_v4hi_v4hi, WQMIATTN);
+
+  iwmmx2_mbuiltin ("wmiabb", di_ftype_di_v4hi_v4hi, WMIABB);
+  iwmmx2_mbuiltin ("wmiabt", di_ftype_di_v4hi_v4hi, WMIABT);
+  iwmmx2_mbuiltin ("wmiatb", di_ftype_di_v4hi_v4hi, WMIATB);
+  iwmmx2_mbuiltin ("wmiatt", di_ftype_di_v4hi_v4hi, WMIATT);
+
+  iwmmx2_mbuiltin ("wmiabbn", di_ftype_di_v4hi_v4hi, WMIABBN);
+  iwmmx2_mbuiltin ("wmiabtn", di_ftype_di_v4hi_v4hi, WMIABTN);
+  iwmmx2_mbuiltin ("wmiatbn", di_ftype_di_v4hi_v4hi, WMIATBN);
+  iwmmx2_mbuiltin ("wmiattn", di_ftype_di_v4hi_v4hi, WMIATTN);
+
+  iwmmx2_mbuiltin ("wmiawbb", di_ftype_di_v2si_v2si, WMIAWBB);
+  iwmmx2_mbuiltin ("wmiawbt", di_ftype_di_v2si_v2si, WMIAWBT);
+  iwmmx2_mbuiltin ("wmiawtb", di_ftype_di_v2si_v2si, WMIAWTB);
+  iwmmx2_mbuiltin ("wmiawtt", di_ftype_di_v2si_v2si, WMIAWTT);
+
+  iwmmx2_mbuiltin ("wmiawbbn", di_ftype_di_v2si_v2si, WMIAWBBN);
+  iwmmx2_mbuiltin ("wmiawbtn", di_ftype_di_v2si_v2si, WMIAWBTN);
+  iwmmx2_mbuiltin ("wmiawtbn", di_ftype_di_v2si_v2si, WMIAWTBN);
+  iwmmx2_mbuiltin ("wmiawttn", di_ftype_di_v2si_v2si, WMIAWTTN);
+
+  iwmmx2_mbuiltin ("wmerge", di_ftype_di_di_int, WMERGE);
+
+  iwmmx_mbuiltin ("tbcstb", v8qi_ftype_char, TBCSTB);
+  iwmmx_mbuiltin ("tbcsth", v4hi_ftype_short, TBCSTH);
+  iwmmx_mbuiltin ("tbcstw", v2si_ftype_int, TBCSTW);
+
 #undef iwmmx_mbuiltin
+#undef iwmmx2_mbuiltin
 }
 
 static void
@@ -21375,6 +21581,10 @@ arm_expand_builtin (tree exp,
   enum machine_mode mode0;
   enum machine_mode mode1;
   enum machine_mode mode2;
+  int opint;
+  int selector;
+  int mask;
+  int imm;
 
   if (fcode >= ARM_BUILTIN_NEON_BASE)
     return arm_expand_neon_builtin (fcode, exp, target);
@@ -21409,6 +21619,24 @@ arm_expand_builtin (tree exp,
 	  error ("selector must be an immediate");
 	  return gen_reg_rtx (tmode);
 	}
+
+      opint = INTVAL (op1);
+      if (fcode == ARM_BUILTIN_TEXTRMSB || fcode == ARM_BUILTIN_TEXTRMUB)
+	{
+	  if (opint > 7 || opint < 0)
+	    error ("the range of selector should be in 0 to 7");
+	}
+      else if (fcode == ARM_BUILTIN_TEXTRMSH || fcode == ARM_BUILTIN_TEXTRMUH)
+	{
+	  if (opint > 3 || opint < 0)
+	    error ("the range of selector should be in 0 to 3");
+	}
+      else /* ARM_BUILTIN_TEXTRMSW || ARM_BUILTIN_TEXTRMUW.  */
+	{
+	  if (opint > 1 || opint < 0)
+	    error ("the range of selector should be in 0 to 1");
+	}
+
       if (target == 0
 	  || GET_MODE (target) != tmode
 	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
@@ -21419,11 +21647,61 @@ arm_expand_builtin (tree exp,
       emit_insn (pat);
       return target;
 
+    case ARM_BUILTIN_WALIGNI:
+      /* If op2 is immediate, call walighi, else call walighr.  */
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      arg1 = CALL_EXPR_ARG (exp, 1);
+      arg2 = CALL_EXPR_ARG (exp, 2);
+      op0 = expand_normal (arg0);
+      op1 = expand_normal (arg1);
+      op2 = expand_normal (arg2);
+      if (GET_CODE (op2) == CONST_INT)
+        {
+	  icode = CODE_FOR_iwmmxt_waligni;
+          tmode = insn_data[icode].operand[0].mode;
+	  mode0 = insn_data[icode].operand[1].mode;
+	  mode1 = insn_data[icode].operand[2].mode;
+	  mode2 = insn_data[icode].operand[3].mode;
+          if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
+	    op0 = copy_to_mode_reg (mode0, op0);
+          if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
+	    op1 = copy_to_mode_reg (mode1, op1);
+          gcc_assert ((*insn_data[icode].operand[3].predicate) (op2, mode2));
+	  selector = INTVAL (op2);
+	  if (selector > 7 || selector < 0)
+	    error ("the range of selector should be in 0 to 7");
+	}
+      else
+        {
+	  icode = CODE_FOR_iwmmxt_walignr;
+          tmode = insn_data[icode].operand[0].mode;
+	  mode0 = insn_data[icode].operand[1].mode;
+	  mode1 = insn_data[icode].operand[2].mode;
+	  mode2 = insn_data[icode].operand[3].mode;
+          if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
+	    op0 = copy_to_mode_reg (mode0, op0);
+          if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
+	    op1 = copy_to_mode_reg (mode1, op1);
+          if (!(*insn_data[icode].operand[3].predicate) (op2, mode2))
+	    op2 = copy_to_mode_reg (mode2, op2);
+	}
+      if (target == 0
+	  || GET_MODE (target) != tmode
+	  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+	target = gen_reg_rtx (tmode);
+      pat = GEN_FCN (icode) (target, op0, op1, op2);
+      if (!pat)
+	return 0;
+      emit_insn (pat);
+      return target;
+
     case ARM_BUILTIN_TINSRB:
     case ARM_BUILTIN_TINSRH:
     case ARM_BUILTIN_TINSRW:
+    case ARM_BUILTIN_WMERGE:
       icode = (fcode == ARM_BUILTIN_TINSRB ? CODE_FOR_iwmmxt_tinsrb
 	       : fcode == ARM_BUILTIN_TINSRH ? CODE_FOR_iwmmxt_tinsrh
+	       : fcode == ARM_BUILTIN_WMERGE ? CODE_FOR_iwmmxt_wmerge
 	       : CODE_FOR_iwmmxt_tinsrw);
       arg0 = CALL_EXPR_ARG (exp, 0);
       arg1 = CALL_EXPR_ARG (exp, 1);
@@ -21442,10 +21720,30 @@ arm_expand_builtin (tree exp,
 	op1 = copy_to_mode_reg (mode1, op1);
       if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
 	{
-	  /* @@@ better error message */
 	  error ("selector must be an immediate");
 	  return const0_rtx;
 	}
+      if (icode == CODE_FOR_iwmmxt_wmerge)
+	{
+	  selector = INTVAL (op2);
+	  if (selector > 7 || selector < 0)
+	    error ("the range of selector should be in 0 to 7");
+	}
+      if ((icode == CODE_FOR_iwmmxt_tinsrb)
+	  || (icode == CODE_FOR_iwmmxt_tinsrh)
+	  || (icode == CODE_FOR_iwmmxt_tinsrw))
+        {
+	  mask = 0x01;
+	  selector= INTVAL (op2);
+	  if (icode == CODE_FOR_iwmmxt_tinsrb && (selector < 0 || selector > 7))
+	    error ("the range of selector should be in 0 to 7");
+	  else if (icode == CODE_FOR_iwmmxt_tinsrh && (selector < 0 ||selector > 3))
+	    error ("the range of selector should be in 0 to 3");
+	  else if (icode == CODE_FOR_iwmmxt_tinsrw && (selector < 0 ||selector > 1))
+	    error ("the range of selector should be in 0 to 1");
+	  mask <<= selector;
+	  op2 = gen_rtx_CONST_INT (SImode, mask);
+	}
       if (target == 0
 	  || GET_MODE (target) != tmode
 	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
@@ -21456,19 +21754,42 @@ arm_expand_builtin (tree exp,
       emit_insn (pat);
       return target;
 
-    case ARM_BUILTIN_SETWCX:
+    case ARM_BUILTIN_SETWCGR0:
+    case ARM_BUILTIN_SETWCGR1:
+    case ARM_BUILTIN_SETWCGR2:
+    case ARM_BUILTIN_SETWCGR3:
+      icode = (fcode == ARM_BUILTIN_SETWCGR0 ? CODE_FOR_iwmmxt_setwcgr0
+	       : fcode == ARM_BUILTIN_SETWCGR1 ? CODE_FOR_iwmmxt_setwcgr1
+	       : fcode == ARM_BUILTIN_SETWCGR2 ? CODE_FOR_iwmmxt_setwcgr2
+	       : CODE_FOR_iwmmxt_setwcgr3);
       arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = force_reg (SImode, expand_normal (arg0));
-      op1 = expand_normal (arg1);
-      emit_insn (gen_iwmmxt_tmcr (op1, op0));
+      op0 = expand_normal (arg0);
+      mode0 = insn_data[icode].operand[0].mode;
+      if (!(*insn_data[icode].operand[0].predicate) (op0, mode0))
+        op0 = copy_to_mode_reg (mode0, op0);
+      pat = GEN_FCN (icode) (op0);
+      if (!pat)
+	return 0;
+      emit_insn (pat);
       return 0;
 
-    case ARM_BUILTIN_GETWCX:
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      op0 = expand_normal (arg0);
-      target = gen_reg_rtx (SImode);
-      emit_insn (gen_iwmmxt_tmrc (target, op0));
+    case ARM_BUILTIN_GETWCGR0:
+    case ARM_BUILTIN_GETWCGR1:
+    case ARM_BUILTIN_GETWCGR2:
+    case ARM_BUILTIN_GETWCGR3:
+      icode = (fcode == ARM_BUILTIN_GETWCGR0 ? CODE_FOR_iwmmxt_getwcgr0
+	       : fcode == ARM_BUILTIN_GETWCGR1 ? CODE_FOR_iwmmxt_getwcgr1
+	       : fcode == ARM_BUILTIN_GETWCGR2 ? CODE_FOR_iwmmxt_getwcgr2
+	       : CODE_FOR_iwmmxt_getwcgr3);
+      tmode = insn_data[icode].operand[0].mode;
+      if (target == 0
+	  || GET_MODE (target) != tmode
+	  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+        target = gen_reg_rtx (tmode);
+      pat = GEN_FCN (icode) (target);
+      if (!pat)
+        return 0;
+      emit_insn (pat);
       return target;
 
     case ARM_BUILTIN_WSHUFH:
@@ -21485,10 +21806,12 @@ arm_expand_builtin (tree exp,
 	op0 = copy_to_mode_reg (mode1, op0);
       if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
 	{
-	  /* @@@ better error message */
 	  error ("mask must be an immediate");
 	  return const0_rtx;
 	}
+      selector = INTVAL (op1);
+      if (selector < 0 || selector > 255)
+	error ("the range of mask should be in 0 to 255");
       if (target == 0
 	  || GET_MODE (target) != tmode
 	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
@@ -21499,10 +21822,18 @@ arm_expand_builtin (tree exp,
       emit_insn (pat);
       return target;
 
-    case ARM_BUILTIN_WSADB:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadb, exp, target);
-    case ARM_BUILTIN_WSADH:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadh, exp, target);
+    case ARM_BUILTIN_WMADDS:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmadds, exp, target);
+    case ARM_BUILTIN_WMADDSX:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddsx, exp, target);
+    case ARM_BUILTIN_WMADDSN:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddsn, exp, target);
+    case ARM_BUILTIN_WMADDU:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddu, exp, target);
+    case ARM_BUILTIN_WMADDUX:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddux, exp, target);
+    case ARM_BUILTIN_WMADDUN:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddun, exp, target);
     case ARM_BUILTIN_WSADBZ:
       return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadbz, exp, target);
     case ARM_BUILTIN_WSADHZ:
@@ -21511,13 +21842,38 @@ arm_expand_builtin (tree exp,
       /* Several three-argument builtins.  */
     case ARM_BUILTIN_WMACS:
     case ARM_BUILTIN_WMACU:
-    case ARM_BUILTIN_WALIGN:
     case ARM_BUILTIN_TMIA:
     case ARM_BUILTIN_TMIAPH:
     case ARM_BUILTIN_TMIATT:
     case ARM_BUILTIN_TMIATB:
     case ARM_BUILTIN_TMIABT:
     case ARM_BUILTIN_TMIABB:
+    case ARM_BUILTIN_WQMIABB:
+    case ARM_BUILTIN_WQMIABT:
+    case ARM_BUILTIN_WQMIATB:
+    case ARM_BUILTIN_WQMIATT:
+    case ARM_BUILTIN_WQMIABBN:
+    case ARM_BUILTIN_WQMIABTN:
+    case ARM_BUILTIN_WQMIATBN:
+    case ARM_BUILTIN_WQMIATTN:
+    case ARM_BUILTIN_WMIABB:
+    case ARM_BUILTIN_WMIABT:
+    case ARM_BUILTIN_WMIATB:
+    case ARM_BUILTIN_WMIATT:
+    case ARM_BUILTIN_WMIABBN:
+    case ARM_BUILTIN_WMIABTN:
+    case ARM_BUILTIN_WMIATBN:
+    case ARM_BUILTIN_WMIATTN:
+    case ARM_BUILTIN_WMIAWBB:
+    case ARM_BUILTIN_WMIAWBT:
+    case ARM_BUILTIN_WMIAWTB:
+    case ARM_BUILTIN_WMIAWTT:
+    case ARM_BUILTIN_WMIAWBBN:
+    case ARM_BUILTIN_WMIAWBTN:
+    case ARM_BUILTIN_WMIAWTBN:
+    case ARM_BUILTIN_WMIAWTTN:
+    case ARM_BUILTIN_WSADB:
+    case ARM_BUILTIN_WSADH:
       icode = (fcode == ARM_BUILTIN_WMACS ? CODE_FOR_iwmmxt_wmacs
 	       : fcode == ARM_BUILTIN_WMACU ? CODE_FOR_iwmmxt_wmacu
 	       : fcode == ARM_BUILTIN_TMIA ? CODE_FOR_iwmmxt_tmia
@@ -21526,7 +21882,32 @@ arm_expand_builtin (tree exp,
 	       : fcode == ARM_BUILTIN_TMIABT ? CODE_FOR_iwmmxt_tmiabt
 	       : fcode == ARM_BUILTIN_TMIATB ? CODE_FOR_iwmmxt_tmiatb
 	       : fcode == ARM_BUILTIN_TMIATT ? CODE_FOR_iwmmxt_tmiatt
-	       : CODE_FOR_iwmmxt_walign);
+	       : fcode == ARM_BUILTIN_WQMIABB ? CODE_FOR_iwmmxt_wqmiabb
+	       : fcode == ARM_BUILTIN_WQMIABT ? CODE_FOR_iwmmxt_wqmiabt
+	       : fcode == ARM_BUILTIN_WQMIATB ? CODE_FOR_iwmmxt_wqmiatb
+	       : fcode == ARM_BUILTIN_WQMIATT ? CODE_FOR_iwmmxt_wqmiatt
+	       : fcode == ARM_BUILTIN_WQMIABBN ? CODE_FOR_iwmmxt_wqmiabbn
+	       : fcode == ARM_BUILTIN_WQMIABTN ? CODE_FOR_iwmmxt_wqmiabtn
+	       : fcode == ARM_BUILTIN_WQMIATBN ? CODE_FOR_iwmmxt_wqmiatbn
+	       : fcode == ARM_BUILTIN_WQMIATTN ? CODE_FOR_iwmmxt_wqmiattn
+	       : fcode == ARM_BUILTIN_WMIABB ? CODE_FOR_iwmmxt_wmiabb
+	       : fcode == ARM_BUILTIN_WMIABT ? CODE_FOR_iwmmxt_wmiabt
+	       : fcode == ARM_BUILTIN_WMIATB ? CODE_FOR_iwmmxt_wmiatb
+	       : fcode == ARM_BUILTIN_WMIATT ? CODE_FOR_iwmmxt_wmiatt
+	       : fcode == ARM_BUILTIN_WMIABBN ? CODE_FOR_iwmmxt_wmiabbn
+	       : fcode == ARM_BUILTIN_WMIABTN ? CODE_FOR_iwmmxt_wmiabtn
+	       : fcode == ARM_BUILTIN_WMIATBN ? CODE_FOR_iwmmxt_wmiatbn
+	       : fcode == ARM_BUILTIN_WMIATTN ? CODE_FOR_iwmmxt_wmiattn
+	       : fcode == ARM_BUILTIN_WMIAWBB ? CODE_FOR_iwmmxt_wmiawbb
+	       : fcode == ARM_BUILTIN_WMIAWBT ? CODE_FOR_iwmmxt_wmiawbt
+	       : fcode == ARM_BUILTIN_WMIAWTB ? CODE_FOR_iwmmxt_wmiawtb
+	       : fcode == ARM_BUILTIN_WMIAWTT ? CODE_FOR_iwmmxt_wmiawtt
+	       : fcode == ARM_BUILTIN_WMIAWBBN ? CODE_FOR_iwmmxt_wmiawbbn
+	       : fcode == ARM_BUILTIN_WMIAWBTN ? CODE_FOR_iwmmxt_wmiawbtn
+	       : fcode == ARM_BUILTIN_WMIAWTBN ? CODE_FOR_iwmmxt_wmiawtbn
+	       : fcode == ARM_BUILTIN_WMIAWTTN ? CODE_FOR_iwmmxt_wmiawttn
+	       : fcode == ARM_BUILTIN_WSADB ? CODE_FOR_iwmmxt_wsadb
+	       : CODE_FOR_iwmmxt_wsadh);
       arg0 = CALL_EXPR_ARG (exp, 0);
       arg1 = CALL_EXPR_ARG (exp, 1);
       arg2 = CALL_EXPR_ARG (exp, 2);
@@ -21559,6 +21940,123 @@ arm_expand_builtin (tree exp,
       emit_insn (gen_iwmmxt_clrdi (target));
       return target;
 
+    case ARM_BUILTIN_WSRLHI:
+    case ARM_BUILTIN_WSRLWI:
+    case ARM_BUILTIN_WSRLDI:
+    case ARM_BUILTIN_WSLLHI:
+    case ARM_BUILTIN_WSLLWI:
+    case ARM_BUILTIN_WSLLDI:
+    case ARM_BUILTIN_WSRAHI:
+    case ARM_BUILTIN_WSRAWI:
+    case ARM_BUILTIN_WSRADI:
+    case ARM_BUILTIN_WRORHI:
+    case ARM_BUILTIN_WRORWI:
+    case ARM_BUILTIN_WRORDI:
+    case ARM_BUILTIN_WSRLH:
+    case ARM_BUILTIN_WSRLW:
+    case ARM_BUILTIN_WSRLD:
+    case ARM_BUILTIN_WSLLH:
+    case ARM_BUILTIN_WSLLW:
+    case ARM_BUILTIN_WSLLD:
+    case ARM_BUILTIN_WSRAH:
+    case ARM_BUILTIN_WSRAW:
+    case ARM_BUILTIN_WSRAD:
+    case ARM_BUILTIN_WRORH:
+    case ARM_BUILTIN_WRORW:
+    case ARM_BUILTIN_WRORD:
+      icode = (fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
+	       : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
+	       : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
+	       : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+	       : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+	       : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+	       : fcode == ARM_BUILTIN_WSLLH  ? CODE_FOR_ashlv4hi3_di
+	       : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+	       : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+	       : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+	       : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+	       : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+	       : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
+	       : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
+	       : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
+	       : CODE_FOR_nothing);
+      arg1 = CALL_EXPR_ARG (exp, 1);
+      op1 = expand_normal (arg1);
+      if (GET_MODE (op1) == VOIDmode)
+	{
+	  imm = INTVAL (op1);
+	  if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORWI
+	       || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
+	      && (imm < 0 || imm > 32))
+	    {
+	      if (fcode == ARM_BUILTIN_WRORHI)
+		error ("the count should be in the range 0 to 32; please check the intrinsic _mm_rori_pi16 in your code");
+	      else if (fcode == ARM_BUILTIN_WRORWI)
+		error ("the count should be in the range 0 to 32; please check the intrinsic _mm_rori_pi32 in your code");
+	      else if (fcode == ARM_BUILTIN_WRORH)
+		error ("the count should be in the range 0 to 32; please check the intrinsic _mm_ror_pi16 in your code");
+	      else
+		error ("the count should be in the range 0 to 32; please check the intrinsic _mm_ror_pi32 in your code");
+	    }
+	  else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
+		   && (imm < 0 || imm > 64))
+	    {
+	      if (fcode == ARM_BUILTIN_WRORDI)
+		error ("the count should be in the range 0 to 64; please check the intrinsic _mm_rori_si64 in your code");
+	      else
+		error ("the count should be in the range 0 to 64; please check the intrinsic _mm_ror_si64 in your code");
+	    }
+	  else if (imm < 0)
+	    {
+	      if (fcode == ARM_BUILTIN_WSRLHI)
+		error ("the count should be non-negative; please check the intrinsic _mm_srli_pi16 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRLWI)
+		error ("the count should be non-negative; please check the intrinsic _mm_srli_pi32 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRLDI)
+		error ("the count should be non-negative; please check the intrinsic _mm_srli_si64 in your code");
+	      else if (fcode == ARM_BUILTIN_WSLLHI)
+		error ("the count should be non-negative; please check the intrinsic _mm_slli_pi16 in your code");
+	      else if (fcode == ARM_BUILTIN_WSLLWI)
+		error ("the count should be non-negative; please check the intrinsic _mm_slli_pi32 in your code");
+	      else if (fcode == ARM_BUILTIN_WSLLDI)
+		error ("the count should be non-negative; please check the intrinsic _mm_slli_si64 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRAHI)
+		error ("the count should be non-negative; please check the intrinsic _mm_srai_pi16 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRAWI)
+		error ("the count should be non-negative; please check the intrinsic _mm_srai_pi32 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRADI)
+		error ("the count should be non-negative; please check the intrinsic _mm_srai_si64 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRLH)
+		error ("the count should be non-negative; please check the intrinsic _mm_srl_pi16 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRLW)
+		error ("the count should be non-negative; please check the intrinsic _mm_srl_pi32 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRLD)
+		error ("the count should be non-negative; please check the intrinsic _mm_srl_si64 in your code");
+	      else if (fcode == ARM_BUILTIN_WSLLH)
+		error ("the count should be non-negative; please check the intrinsic _mm_sll_pi16 in your code");
+	      else if (fcode == ARM_BUILTIN_WSLLW)
+		error ("the count should be non-negative; please check the intrinsic _mm_sll_pi32 in your code");
+	      else if (fcode == ARM_BUILTIN_WSLLD)
+		error ("the count should be non-negative; please check the intrinsic _mm_sll_si64 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRAH)
+		error ("the count should be non-negative; please check the intrinsic _mm_sra_pi16 in your code");
+	      else if (fcode == ARM_BUILTIN_WSRAW)
+		error ("the count should be non-negative; please check the intrinsic _mm_sra_pi32 in your code");
+	      else
+		error ("the count should be non-negative; please check the intrinsic _mm_sra_si64 in your code");
+	    }
+	}
+      return arm_expand_binop_builtin (icode, exp, target);
+
     case ARM_BUILTIN_THREAD_POINTER:
       return arm_load_tp (target);
 
-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 33+ messages in thread
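[Editor's note: the range checks in this patch bound the count argument of the iWMMXt rotate intrinsics. Below is a minimal, portable C sketch of the per-lane semantics that _mm_rori_pi16 (WRORH) is expected to compute, for readers unfamiliar with the instruction. It is not GCC code: the function name ror_pi16 is an invention of this sketch, and it models only in-range counts as a plain modular rotate; the hardware's treatment of counts above the lane width is handled separately in patch 4/5.]

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of WRORH / _mm_rori_pi16: rotate each 16-bit lane of
   a 64-bit vector right by COUNT bits.  Counts here are restricted to
   0..15 (one lane width); the builtin itself accepts 0..32, with the
   out-of-range handling done elsewhere in the patch series.  */
static uint64_t
ror_pi16 (uint64_t v, unsigned count)
{
  assert (count <= 15);
  uint64_t out = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      uint16_t x = (uint16_t) (v >> (lane * 16));
      uint16_t y = count ? (uint16_t) ((x >> count) | (x << (16 - count)))
			 : x;
      out |= (uint64_t) y << (lane * 16);
    }
  return out;
}
```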

* [PATCH ARM iWMMXt 5/5] pipeline description
  2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 3/5] built in define and expand Matt Turner
@ 2012-05-29  4:14 ` Matt Turner
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 1/5] ARM code generic change Matt Turner
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Matt Turner @ 2012-05-29  4:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, Nick Clifton, Paul Brook,
	Xinyu Qi

From: Xinyu Qi <xyqi@marvell.com>

	gcc/
	* config/arm/t-arm (MD_INCLUDES): Add marvell-f-iwmmxt.md.
	* config/arm/marvell-f-iwmmxt.md: New file.
	* config/arm/arm.md (marvell-f-iwmmxt.md): Include.
---
 gcc/config/arm/arm.md              |    1 +
 gcc/config/arm/marvell-f-iwmmxt.md |  179 ++++++++++++++++++++++++++++++++++++
 gcc/config/arm/t-arm               |    1 +
 3 files changed, 181 insertions(+), 0 deletions(-)
 create mode 100644 gcc/config/arm/marvell-f-iwmmxt.md

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b0333c2..baa3b7c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -546,6 +546,7 @@
 	  (const_string "yes")
 	  (const_string "no"))))
 
+(include "marvell-f-iwmmxt.md")
 (include "arm-generic.md")
 (include "arm926ejs.md")
 (include "arm1020e.md")
diff --git a/gcc/config/arm/marvell-f-iwmmxt.md b/gcc/config/arm/marvell-f-iwmmxt.md
new file mode 100644
index 0000000..fe8e455
--- /dev/null
+++ b/gcc/config/arm/marvell-f-iwmmxt.md
@@ -0,0 +1,179 @@
+;; Marvell WMMX2 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;; Written by Marvell, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+(define_automaton "marvell_f_iwmmxt")
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Pipelines
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+;; This is a 7-stage pipeline:
+;;
+;;    MD | MI | ME1 | ME2 | ME3 | ME4 | MW
+;;
+;; There are various bypasses modelled to a greater or lesser extent.
+;;
+;; Latencies in this file correspond to the number of cycles after
+;; the issue stage that it takes for the result of the instruction to
+;; be computed, or for its side-effects to occur.
+
+(define_cpu_unit "mf_iwmmxt_MD" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_MI" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME1" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME2" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME3" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME4" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_MW" "marvell_f_iwmmxt")
+
+(define_reservation "mf_iwmmxt_ME"
+      "mf_iwmmxt_ME1,mf_iwmmxt_ME2,mf_iwmmxt_ME3,mf_iwmmxt_ME4"
+)
+
+(define_reservation "mf_iwmmxt_pipeline"
+      "mf_iwmmxt_MD, mf_iwmmxt_MI, mf_iwmmxt_ME, mf_iwmmxt_MW"
+)
+
+;; An attribute to indicate whether our reservations are applicable.
+(define_attr "marvell_f_iwmmxt" "yes,no"
+  (const (if_then_else (symbol_ref "arm_arch_iwmmxt")
+                       (const_string "yes") (const_string "no"))))
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; instruction classes
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+;; Attributes appended to instructions for classification.
+
+(define_attr "wmmxt_shift" "yes,no"
+  (if_then_else (eq_attr "wtype" "wror, wsll, wsra, wsrl")
+		(const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_pack" "yes,no"
+  (if_then_else (eq_attr "wtype" "waligni, walignr, wmerge, wpack, wshufh, wunpckeh, wunpckih, wunpckel, wunpckil")
+		(const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_mult_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "wmac, wmadd, wmiaxy, wmiawxy, wmulw, wqmiaxy, wqmulwm")
+		(const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_mult_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "wmul, wqmulm")
+		(const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "wabs, wabsdiff, wand, wandn, wmov, wor, wxor")
+	        (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "wacc, wadd, waddsubhx, wavg2, wavg4, wcmpeq, wcmpgt, wmax, wmin, wsub, waddbhus, wsubaddhx")
+		(const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c3" "yes,no"
+  (if_then_else (eq_attr "wtype" "wsad")
+	        (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_transfer_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "tbcst, tinsr, tmcr, tmcrr")
+                (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_transfer_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "textrm, tmovmsk, tmrc, tmrrc")
+	        (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_transfer_c3" "yes,no"
+  (if_then_else (eq_attr "wtype" "tmia, tmiaph, tmiaxy")
+	        (const_string "yes") (const_string "no"))
+)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Main description
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn_reservation "marvell_f_iwmmxt_alu_c1" 1
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_alu_c1" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_pack" 1
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_pack" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_shift" 1
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_shift" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_transfer_c1" 1
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_transfer_c1" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_transfer_c2" 5
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_transfer_c2" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_alu_c2" 2
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_alu_c2" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_alu_c3" 3
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_alu_c3" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_transfer_c3" 4
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_transfer_c3" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_mult_c1" 4
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_mult_c1" "yes"))
+  "mf_iwmmxt_pipeline")
+
+;; There is a forwarding path from the ME3 stage.
+(define_insn_reservation "marvell_f_iwmmxt_mult_c2" 3
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wmmxt_mult_c2" "yes"))
+  "mf_iwmmxt_pipeline")
+
+(define_insn_reservation "marvell_f_iwmmxt_wstr" 0
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wtype" "wstr"))
+  "mf_iwmmxt_pipeline")
+
+;; There is a forwarding path from the MW stage.
+(define_insn_reservation "marvell_f_iwmmxt_wldr" 5
+  (and (eq_attr "marvell_f_iwmmxt" "yes")
+       (eq_attr "wtype" "wldr"))
+  "mf_iwmmxt_pipeline")
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 83c18f7..30687e1 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -51,6 +51,7 @@ MD_INCLUDES=	$(srcdir)/config/arm/arm1020e.md \
 		$(srcdir)/config/arm/iwmmxt.md \
 		$(srcdir)/config/arm/iwmmxt2.md \
 		$(srcdir)/config/arm/ldmstm.md \
+		$(srcdir)/config/arm/marvell-f-iwmmxt.md \
 		$(srcdir)/config/arm/neon.md \
 		$(srcdir)/config/arm/predicates.md \
 		$(srcdir)/config/arm/sync.md \
-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 33+ messages in thread
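[Editor's note: the issue-to-result latencies declared by the define_insn_reservation forms above can be summarized in one table. The following is a hypothetical C lookup table mirroring those values, useful only for cross-checking marvell-f-iwmmxt.md by eye; the class-name strings and the helper function are inventions of this sketch, not GCC code.]

```c
#include <assert.h>
#include <string.h>

/* Latency per reservation class, copied from the .md file above.  */
struct wmmx_latency { const char *klass; int cycles; };

static const struct wmmx_latency wmmx_latencies[] = {
  { "alu_c1",      1 },  /* wand, wandn, wor, wxor, wmov, wabs, ...   */
  { "pack",        1 },  /* waligni, walignr, wpack, wshufh, ...      */
  { "shift",       1 },  /* wror, wsll, wsra, wsrl                    */
  { "transfer_c1", 1 },  /* tbcst, tinsr, tmcr, tmcrr                 */
  { "alu_c2",      2 },  /* wadd, wsub, wcmpeq, wmax, wmin, ...       */
  { "mult_c2",     3 },  /* wmul, wqmulm (forwarding from ME3)        */
  { "alu_c3",      3 },  /* wsad                                      */
  { "transfer_c3", 4 },  /* tmia, tmiaph, tmiaxy                      */
  { "mult_c1",     4 },  /* wmac, wmadd, wmiaxy, wmulw, ...           */
  { "transfer_c2", 5 },  /* textrm, tmovmsk, tmrc, tmrrc              */
  { "wldr",        5 },  /* loads (forwarding from MW)                */
  { "wstr",        0 },  /* stores                                    */
};

static int
wmmx_latency_of (const char *klass)
{
  for (size_t i = 0; i < sizeof wmmx_latencies / sizeof *wmmx_latencies; i++)
    if (strcmp (wmmx_latencies[i].klass, klass) == 0)
      return wmmx_latencies[i].cycles;
  return -1;  /* unknown class */
}
```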

* [PATCH ARM iWMMXt 1/5] ARM code generic change
  2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 3/5] built in define and expand Matt Turner
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 5/5] pipeline description Matt Turner
@ 2012-05-29  4:14 ` Matt Turner
  2012-06-06 11:53   ` Ramana Radhakrishnan
  2012-05-29  4:15 ` [PATCH ARM iWMMXt 2/5] intrinsic head file change Matt Turner
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Matt Turner @ 2012-05-29  4:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, Nick Clifton, Paul Brook,
	Xinyu Qi

From: Xinyu Qi <xyqi@marvell.com>

	gcc/
	* config/arm/arm.c (FL_IWMMXT2): New define.
	(arm_arch_iwmmxt2): New variable.
	(arm_option_override): Enable use of iWMMXt with VFP.
	Disable use of iWMMXt with NEON. Disable use of iWMMXt under
	Thumb mode. Set arm_arch_iwmmxt2.
	(arm_expand_binop_builtin): Accept VOIDmode op.
	* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Define __IWMMXT2__.
	(TARGET_IWMMXT2): New define.
	(TARGET_REALLY_IWMMXT2): Likewise.
	(arm_arch_iwmmxt2): Declare.
	* config/arm/arm-cores.def (iwmmxt2): Add FL_IWMMXT2.
	* config/arm/arm-arches.def (iwmmxt2): Likewise.
	* config/arm/arm.md (arch): Add "iwmmxt2".
	(arch_enabled): Handle "iwmmxt2".
---
 gcc/config/arm/arm-arches.def |    2 +-
 gcc/config/arm/arm-cores.def  |    2 +-
 gcc/config/arm/arm.c          |   25 +++++++++++++++++--------
 gcc/config/arm/arm.h          |    7 +++++++
 gcc/config/arm/arm.md         |    6 +++++-
 5 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 3123426..f4dd6cc 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,4 +57,4 @@ ARM_ARCH("armv7-m", cortexm3,	7M,  FL_CO_PROC |	      FL_FOR_ARCH7M)
 ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |	      FL_FOR_ARCH7EM)
 ARM_ARCH("ep9312",  ep9312,     4T,  FL_LDSCHED | FL_CIRRUS | FL_FOR_ARCH4)
 ARM_ARCH("iwmmxt",  iwmmxt,     5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
-ARM_ARCH("iwmmxt2", iwmmxt2,    5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
+ARM_ARCH("iwmmxt2", iwmmxt2,    5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index d82b10b..c82eada 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -105,7 +105,7 @@ ARM_CORE("arm1020e",      arm1020e,	5TE,				 FL_LDSCHED, fastmul)
 ARM_CORE("arm1022e",      arm1022e,	5TE,				 FL_LDSCHED, fastmul)
 ARM_CORE("xscale",        xscale,	5TE,	                         FL_LDSCHED | FL_STRONG | FL_XSCALE, xscale)
 ARM_CORE("iwmmxt",        iwmmxt,	5TE,	                         FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT, xscale)
-ARM_CORE("iwmmxt2",       iwmmxt2,	5TE,	                         FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT, xscale)
+ARM_CORE("iwmmxt2",       iwmmxt2,	5TE,	                         FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2, xscale)
 ARM_CORE("fa606te",       fa606te,      5TE,                             FL_LDSCHED, 9e)
 ARM_CORE("fa626te",       fa626te,      5TE,                             FL_LDSCHED, 9e)
 ARM_CORE("fmp626",        fmp626,       5TE,                             FL_LDSCHED, 9e)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7a98197..b0680ab 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -685,6 +685,7 @@ static int thumb_call_reg_needed;
 #define FL_ARM_DIV    (1 << 23)	      /* Hardware divide (ARM mode).  */
 
 #define FL_IWMMXT     (1 << 29)	      /* XScale v2 or "Intel Wireless MMX technology".  */
+#define FL_IWMMXT2    (1 << 30)       /* "Intel Wireless MMX2 technology".  */
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
@@ -766,6 +767,9 @@ int arm_arch_cirrus = 0;
 /* Nonzero if this chip supports Intel Wireless MMX technology.  */
 int arm_arch_iwmmxt = 0;
 
+/* Nonzero if this chip supports Intel Wireless MMX2 technology.  */
+int arm_arch_iwmmxt2 = 0;
+
 /* Nonzero if this chip is an XScale.  */
 int arm_arch_xscale = 0;
 
@@ -1717,6 +1721,7 @@ arm_option_override (void)
   arm_tune_wbuf = (tune_flags & FL_WBUF) != 0;
   arm_tune_xscale = (tune_flags & FL_XSCALE) != 0;
   arm_arch_iwmmxt = (insn_flags & FL_IWMMXT) != 0;
+  arm_arch_iwmmxt2 = (insn_flags & FL_IWMMXT2) != 0;
   arm_arch_thumb_hwdiv = (insn_flags & FL_THUMB_DIV) != 0;
   arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
   arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0;
@@ -1817,14 +1822,17 @@ arm_option_override (void)
     }
 
   /* FPA and iWMMXt are incompatible because the insn encodings overlap.
-     VFP and iWMMXt can theoretically coexist, but it's unlikely such silicon
-     will ever exist.  GCC makes no attempt to support this combination.  */
-  if (TARGET_IWMMXT && !TARGET_SOFT_FLOAT)
-    sorry ("iWMMXt and hardware floating point");
+     VFP and iWMMXt however can coexist.  */
+  if (TARGET_IWMMXT && TARGET_HARD_FLOAT && !TARGET_VFP)
+    error ("iWMMXt and non-VFP floating point unit are incompatible");
+
+  /* iWMMXt and NEON are incompatible.  */
+  if (TARGET_IWMMXT && TARGET_NEON)
+    error ("iWMMXt and NEON are incompatible");
 
-  /* ??? iWMMXt insn patterns need auditing for Thumb-2.  */
-  if (TARGET_THUMB2 && TARGET_IWMMXT)
-    sorry ("Thumb-2 iWMMXt");
+  /* iWMMXt is not supported in Thumb mode.  */
+  if (TARGET_THUMB && TARGET_IWMMXT)
+    error ("iWMMXt unsupported under Thumb mode");
 
   /* __fp16 support currently assumes the core has ldrh.  */
   if (!arm_arch4 && arm_fp16_format != ARM_FP16_FORMAT_NONE)
@@ -20867,7 +20875,8 @@ arm_expand_binop_builtin (enum insn_code icode,
       || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
     target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+	      && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
     op0 = copy_to_mode_reg (mode0, op0);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f4204e4..c51bce9 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -97,6 +97,8 @@ extern char arm_arch_name[];
 	  builtin_define ("__XSCALE__");		\
 	if (arm_arch_iwmmxt)				\
 	  builtin_define ("__IWMMXT__");		\
+	if (arm_arch_iwmmxt2)				\
+	  builtin_define ("__IWMMXT2__");		\
 	if (TARGET_AAPCS_BASED)				\
 	  {						\
 	    if (arm_pcs_default == ARM_PCS_AAPCS_VFP)	\
@@ -194,7 +196,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_MAVERICK		(arm_fpu_desc->model == ARM_FP_MODEL_MAVERICK)
 #define TARGET_VFP		(arm_fpu_desc->model == ARM_FP_MODEL_VFP)
 #define TARGET_IWMMXT			(arm_arch_iwmmxt)
+#define TARGET_IWMMXT2			(arm_arch_iwmmxt2)
 #define TARGET_REALLY_IWMMXT		(TARGET_IWMMXT && TARGET_32BIT)
+#define TARGET_REALLY_IWMMXT2		(TARGET_IWMMXT2 && TARGET_32BIT)
 #define TARGET_IWMMXT_ABI (TARGET_32BIT && arm_abi == ARM_ABI_IWMMXT)
 #define TARGET_ARM                      (! TARGET_THUMB)
 #define TARGET_EITHER			1 /* (TARGET_ARM | TARGET_THUMB) */
@@ -410,6 +414,9 @@ extern int arm_arch_cirrus;
 /* Nonzero if this chip supports Intel XScale with Wireless MMX technology.  */
 extern int arm_arch_iwmmxt;
 
+/* Nonzero if this chip supports Intel Wireless MMX2 technology.  */
+extern int arm_arch_iwmmxt2;
+
 /* Nonzero if this chip is an XScale.  */
 extern int arm_arch_xscale;
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index bbf6380..ad9d948 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -197,7 +197,7 @@
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
 ; arm_arch6.  This attribute is used to compute attribute "enabled",
 ; use type "any" to enable an alternative in all cases.
-(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8"
+(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8,iwmmxt,iwmmxt2"
   (const_string "any"))
 
 (define_attr "arch_enabled" "no,yes"
@@ -248,6 +248,10 @@
 	 (and (eq_attr "arch" "neon_nota8")
 	      (not (eq_attr "tune" "cortexa8"))
 	      (match_test "TARGET_NEON"))
+	 (const_string "yes")
+
+	 (and (eq_attr "arch" "iwmmxt2")
+	      (match_test "TARGET_REALLY_IWMMXT2"))
 	 (const_string "yes")]
 	(const_string "no")))
 
-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 33+ messages in thread
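[Editor's note: user code can key off the __IWMMXT2__ predefine this patch adds to TARGET_CPU_CPP_BUILTINS. A minimal sketch of compile-time dispatch follows; only the __IWMMXT__/__IWMMXT2__ macros come from GCC, while the helper name and the returned strings are illustrative.]

```c
#include <assert.h>

/* Report which iWMMXt level the compiler was targeting.  __IWMMXT2__
   is newly defined by this patch; __IWMMXT__ already existed.  On a
   non-iWMMXt target (or host compiler) this returns "none".  */
static const char *
simd_flavor (void)
{
#if defined (__IWMMXT2__)
  return "iWMMXt2";
#elif defined (__IWMMXT__)
  return "iWMMXt";
#else
  return "none";
#endif
}
```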

* [PATCH ARM iWMMXt 4/5] WMMX machine description
  2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
                   ` (3 preceding siblings ...)
  2012-05-29  4:15 ` [PATCH ARM iWMMXt 2/5] intrinsic head file change Matt Turner
@ 2012-05-29  4:15 ` Matt Turner
  2012-06-06 11:59 ` [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Ramana Radhakrishnan
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Matt Turner @ 2012-05-29  4:15 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, Nick Clifton, Paul Brook,
	Xinyu Qi

From: Xinyu Qi <xyqi@marvell.com>

	gcc/
	* config/arm/arm.c (arm_output_iwmmxt_shift_immediate): New function.
	(arm_output_iwmmxt_tinsr): Likewise.
	* config/arm/arm-protos.h (arm_output_iwmmxt_shift_immediate): Declare.
	(arm_output_iwmmxt_tinsr): Likewise.
	* config/arm/iwmmxt.md (WCGR0, WCGR1, WCGR2, WCGR3): New constant.
	(iwmmxt_psadbw, iwmmxt_walign, iwmmxt_tmrc, iwmmxt_tmcr): Delete.
	(rorv4hi3, rorv2si3, rordi3): Likewise.
	(rorv4hi3_di, rorv2si3_di, rordi3_di): Likewise.
	(ashrv4hi3_di, ashrv2si3_di, ashrdi3_di): Likewise.
	(lshrv4hi3_di, lshrv2si3_di, lshrdi3_di): Likewise.
	(ashlv4hi3_di, ashlv2si3_di, ashldi3_di): Likewise.
	(iwmmxt_tbcstqi, iwmmxt_tbcsthi, iwmmxt_tbcstsi): Likewise.
	(*iwmmxt_clrv8qi, *iwmmxt_clrv4hi, *iwmmxt_clrv2si): Likewise.
	(tbcstv8qi, tbcstv4hi, tbcstv2si): New pattern.
	(iwmmxt_clrv8qi, iwmmxt_clrv4hi, iwmmxt_clrv2si): Likewise.
	(*and<mode>3_iwmmxt, *ior<mode>3_iwmmxt, *xor<mode>3_iwmmxt): Likewise.
	(ror<mode>3, ror<mode>3_di): Likewise.
	(ashr<mode>3_di, lshr<mode>3_di, ashl<mode>3_di): Likewise.
	(ashli<mode>3_iwmmxt, iwmmxt_waligni, iwmmxt_walignr): Likewise.
	(iwmmxt_walignr0, iwmmxt_walignr1): Likewise.
	(iwmmxt_walignr2, iwmmxt_walignr3): Likewise.
	(iwmmxt_setwcgr0, iwmmxt_setwcgr1): Likewise.
	(iwmmxt_setwcgr2, iwmmxt_setwcgr3): Likewise.
	(iwmmxt_getwcgr0, iwmmxt_getwcgr1): Likewise.
	(iwmmxt_getwcgr2, iwmmxt_getwcgr3): Likewise.
	(All instruction patterns): Add wtype attribute.
	(*iwmmxt_arm_movdi, *iwmmxt_movsi_insn): Let iWMMXt coexist with VFP.
	(iwmmxt_uavgrndv8qi3, iwmmxt_uavgrndv4hi3): Revise the pattern.
	(iwmmxt_uavgv8qi3, iwmmxt_uavgv4hi3): Likewise.
	(ashr<mode>3_iwmmxt, ashl<mode>3_iwmmxt, lshr<mode>3_iwmmxt): Likewise.
	(iwmmxt_tinsrb, iwmmxt_tinsrh, iwmmxt_tinsrw): Likewise.
	(eqv8qi3, eqv4hi3, eqv2si3, gtuv8qi3): Likewise.
	(gtuv4hi3, gtuv2si3, gtv8qi3, gtv4hi3, gtv2si3): Likewise.
	(iwmmxt_wunpckihh, iwmmxt_wunpckihw, iwmmxt_wunpckilh): Likewise.
	(iwmmxt_wunpckilw, iwmmxt_wunpckehub, iwmmxt_wunpckehuh): Likewise.
	(iwmmxt_wunpckehuw, iwmmxt_wunpckehsb, iwmmxt_wunpckehsh): Likewise.
	(iwmmxt_wunpckehsw, iwmmxt_wunpckelub, iwmmxt_wunpckeluh): Likewise.
	(iwmmxt_wunpckeluw, iwmmxt_wunpckelsb, iwmmxt_wunpckelsh): Likewise.
	(iwmmxt_wunpckelsw, iwmmxt_wmadds, iwmmxt_wmaddu): Likewise.
	(iwmmxt_wsadb, iwmmxt_wsadh, iwmmxt_wsadbz, iwmmxt_wsadhz): Likewise.
	(iwmmxt2.md): Include.
	* config/arm/iwmmxt2.md: New file.
	* config/arm/iterators.md (VMMX2): New mode_iterator.
	* config/arm/arm.md (wtype): New attribute.
	(UNSPEC_WMADDS, UNSPEC_WMADDU): Delete.
	(UNSPEC_WALIGNI): New unspec.
	* config/arm/t-arm (MD_INCLUDES): Add iwmmxt2.md.
	* config/arm/predicates.md (imm_or_reg_operand): New predicate.
---
 gcc/config/arm/arm-protos.h  |    2 +
 gcc/config/arm/arm.c         |   89 +++
 gcc/config/arm/arm.md        |    8 +-
 gcc/config/arm/iterators.md  |    2 +
 gcc/config/arm/iwmmxt.md     | 1753 ++++++++++++++++++++++++++----------------
 gcc/config/arm/iwmmxt2.md    |  918 ++++++++++++++++++++++
 gcc/config/arm/predicates.md |    5 +
 gcc/config/arm/t-arm         |    1 +
 8 files changed, 2122 insertions(+), 656 deletions(-)
 create mode 100644 gcc/config/arm/iwmmxt2.md

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 4e6d7bb..955f324 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -159,6 +159,8 @@ extern const char *vfp_output_fstmd (rtx *);
 extern void arm_set_return_address (rtx, rtx);
 extern int arm_eliminable_register (rtx);
 extern const char *arm_output_shift(rtx *, int);
+extern const char *arm_output_iwmmxt_shift_immediate (const char *, rtx *, bool);
+extern const char *arm_output_iwmmxt_tinsr (rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
 extern void arm_expand_compare_and_swap (rtx op[]);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 51eed40..a709f2f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25149,6 +25149,95 @@ arm_output_shift(rtx * operands, int set_flags)
   return "";
 }
 
+/* Output assembly for a WMMX immediate shift instruction.  */
+const char *
+arm_output_iwmmxt_shift_immediate (const char *insn_name, rtx *operands, bool wror_or_wsra)
+{
+  int shift = INTVAL (operands[2]);
+  char templ[50];
+  enum machine_mode opmode = GET_MODE (operands[0]);
+
+  gcc_assert (shift >= 0);
+
+  /* Handle an out-of-range shift value: greater than 63 for the D
+     qualifier, 31 for the W qualifier or 15 for the H qualifier.  */
+  if (((opmode == V4HImode) && (shift > 15))
+	|| ((opmode == V2SImode) && (shift > 31))
+	|| ((opmode == DImode) && (shift > 63)))
+  {
+    if (wror_or_wsra)
+      {
+        sprintf (templ, "%s\t%%0, %%1, #%d", insn_name, 32);
+        output_asm_insn (templ, operands);
+        if (opmode == DImode)
+          {
+	    sprintf (templ, "%s\t%%0, %%0, #%d", insn_name, 32);
+	    output_asm_insn (templ, operands);
+          }
+      }
+    else
+      {
+        /* The destination register will contain all zeros.  */
+        sprintf (templ, "wzero\t%%0");
+        output_asm_insn (templ, operands);
+      }
+    return "";
+  }
+
+  if ((opmode == DImode) && (shift > 32))
+    {
+      sprintf (templ, "%s\t%%0, %%1, #%d", insn_name, 32);
+      output_asm_insn (templ, operands);
+      sprintf (templ, "%s\t%%0, %%0, #%d", insn_name, shift - 32);
+      output_asm_insn (templ, operands);
+    }
+  else
+    {
+      sprintf (templ, "%s\t%%0, %%1, #%d", insn_name, shift);
+      output_asm_insn (templ, operands);
+    }
+  return "";
+}
+
+/* Output assembly for a WMMX tinsr instruction.  */
+const char *
+arm_output_iwmmxt_tinsr (rtx *operands)
+{
+  int mask = INTVAL (operands[3]);
+  int i;
+  char templ[50];
+  int units = mode_nunits[GET_MODE (operands[0])];
+  gcc_assert ((mask & (mask - 1)) == 0);
+  for (i = 0; i < units; ++i)
+    {
+      if ((mask & 0x01) == 1)
+        {
+          break;
+        }
+      mask >>= 1;
+    }
+  gcc_assert (i < units);
+  {
+    switch (GET_MODE (operands[0]))
+      {
+      case V8QImode:
+	sprintf (templ, "tinsrb%%?\t%%0, %%2, #%d", i);
+	break;
+      case V4HImode:
+	sprintf (templ, "tinsrh%%?\t%%0, %%2, #%d", i);
+	break;
+      case V2SImode:
+	sprintf (templ, "tinsrw%%?\t%%0, %%2, #%d", i);
+	break;
+      default:
+	gcc_unreachable ();
+	break;
+      }
+    output_asm_insn (templ, operands);
+  }
+  return "";
+}
+
 /* Output a Thumb-1 casesi dispatch sequence.  */
 const char *
 thumb1_output_casesi (rtx *operands)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ad9d948..b0333c2 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -62,6 +62,7 @@
 ;; UNSPEC Usage:
 ;; Note: sin and cos are no-longer used.
 ;; Unspec enumerators for Neon are defined in neon.md.
+;; Unspec enumerators for iwmmxt2 are defined in iwmmxt2.md.
 
 (define_c_enum "unspec" [
   UNSPEC_SIN            ; `sin' operation (MODE_FLOAT):
@@ -98,8 +99,7 @@
   UNSPEC_WMACSZ         ; Used by the intrinsic form of the iWMMXt WMACSZ instruction.
   UNSPEC_WMACUZ         ; Used by the intrinsic form of the iWMMXt WMACUZ instruction.
   UNSPEC_CLRDI          ; Used by the intrinsic form of the iWMMXt CLRDI instruction.
-  UNSPEC_WMADDS         ; Used by the intrinsic form of the iWMMXt WMADDS instruction.
-  UNSPEC_WMADDU         ; Used by the intrinsic form of the iWMMXt WMADDU instruction.
+  UNSPEC_WALIGNI        ; Used by the intrinsic form of the iWMMXt WALIGN instruction.
   UNSPEC_TLS            ; A symbol that has been treated properly for TLS usage.
   UNSPEC_PIC_LABEL      ; A label used for PIC access that does not appear in the
                         ; instruction stream.
@@ -366,6 +366,10 @@
 	       (const_string "yes")
 	       (const_string "no")))
 
+; wtype for WMMX insn scheduling purposes.
+(define_attr "wtype"
+        "none,wor,wxor,wand,wandn,wmov,tmcrr,tmrrc,wldr,wstr,tmcr,tmrc,wadd,wsub,wmul,wmac,wavg2,tinsr,textrm,wshufh,wcmpeq,wcmpgt,wmax,wmin,wpack,wunpckih,wunpckil,wunpckeh,wunpckel,wror,wsra,wsrl,wsll,wmadd,tmia,tmiaph,tmiaxy,tbcst,tmovmsk,wacc,waligni,walignr,tandc,textrc,torc,torvsc,wsad,wabs,wabsdiff,waddsubhx,wsubaddhx,wavg4,wmulw,wqmulm,wqmulwm,waddbhus,wqmiaxy,wmiaxy,wmiawxy,wmerge" (const_string "none"))
+
 ; Load scheduling, set from the arm_ld_sched variable
 ; initialized by arm_option_override()
 (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 1567264..916444c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -45,6 +45,8 @@
 ;; Integer element sizes implemented by IWMMXT.
 (define_mode_iterator VMMX [V2SI V4HI V8QI])
 
+(define_mode_iterator VMMX2 [V4HI V2SI])
+
 ;; Integer element sizes for shifts.
 (define_mode_iterator VSHFT [V4HI V2SI DI])
 
diff --git a/gcc/config/arm/iwmmxt.md b/gcc/config/arm/iwmmxt.md
index bc0b80d..12f4179 100644
--- a/gcc/config/arm/iwmmxt.md
+++ b/gcc/config/arm/iwmmxt.md
@@ -1,4 +1,3 @@
-;; ??? This file needs auditing for thumb2
 ;; Patterns for the Intel Wireless MMX technology architecture.
 ;; Copyright (C) 2003, 2004, 2005, 2007, 2008, 2010
 ;; Free Software Foundation, Inc.
@@ -20,6 +19,41 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
+;; Register numbers
+(define_constants
+  [(WCGR0           43)
+   (WCGR1           44)
+   (WCGR2           45)
+   (WCGR3           46)
+  ]
+)
+
+(define_insn "tbcstv8qi"
+  [(set (match_operand:V8QI                   0 "register_operand" "=y")
+        (vec_duplicate:V8QI (match_operand:QI 1 "s_register_operand" "r")))]
+  "TARGET_REALLY_IWMMXT"
+  "tbcstb%?\\t%0, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tbcst")]
+)
+
+(define_insn "tbcstv4hi"
+  [(set (match_operand:V4HI                   0 "register_operand" "=y")
+        (vec_duplicate:V4HI (match_operand:HI 1 "s_register_operand" "r")))]
+  "TARGET_REALLY_IWMMXT"
+  "tbcsth%?\\t%0, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tbcst")]
+)
+
+(define_insn "tbcstv2si"
+  [(set (match_operand:V2SI                   0 "register_operand" "=y")
+        (vec_duplicate:V2SI (match_operand:SI 1 "s_register_operand" "r")))]
+  "TARGET_REALLY_IWMMXT"
+  "tbcstw%?\\t%0, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tbcst")]
+)
 
 (define_insn "iwmmxt_iordi3"
   [(set (match_operand:DI         0 "register_operand" "=y,?&r,?&r")
@@ -31,7 +65,9 @@
    #
    #"
   [(set_attr "predicable" "yes")
-   (set_attr "length" "4,8,8")])
+   (set_attr "length" "4,8,8")
+   (set_attr "wtype" "wor,none,none")]
+)
 
 (define_insn "iwmmxt_xordi3"
   [(set (match_operand:DI         0 "register_operand" "=y,?&r,?&r")
@@ -43,7 +79,9 @@
    #
    #"
   [(set_attr "predicable" "yes")
-   (set_attr "length" "4,8,8")])
+   (set_attr "length" "4,8,8")
+   (set_attr "wtype" "wxor,none,none")]
+)
 
 (define_insn "iwmmxt_anddi3"
   [(set (match_operand:DI         0 "register_operand" "=y,?&r,?&r")
@@ -55,7 +93,9 @@
    #
    #"
   [(set_attr "predicable" "yes")
-   (set_attr "length" "4,8,8")])
+   (set_attr "length" "4,8,8")
+   (set_attr "wtype" "wand,none,none")]
+)
 
 (define_insn "iwmmxt_nanddi3"
   [(set (match_operand:DI                 0 "register_operand" "=y")
@@ -63,64 +103,96 @@
 		(not:DI (match_operand:DI 2 "register_operand"  "y"))))]
   "TARGET_REALLY_IWMMXT"
   "wandn%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wandn")]
+)
 
 (define_insn "*iwmmxt_arm_movdi"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r, r, m,y,y,yr,y,yrUy")
-	(match_operand:DI 1 "di_operand"              "rIK,mi,r,y,yr,y,yrUy,y"))]
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r, r, r, r, m,y,y,yr,y,yrUy,*w, r,*w,*w, *Uv")
+        (match_operand:DI 1 "di_operand"              "rDa,Db,Dc,mi,r,y,yr,y,yrUy,y, r,*w,*w,*Uvi,*w"))]
   "TARGET_REALLY_IWMMXT
    && (   register_operand (operands[0], DImode)
        || register_operand (operands[1], DImode))"
   "*
-{
   switch (which_alternative)
     {
-    default:
-      return output_move_double (operands, true, NULL);
     case 0:
+    case 1:
+    case 2:
       return \"#\";
-    case 3:
+    case 3: case 4:
+      return output_move_double (operands, true, NULL);
+    case 5:
       return \"wmov%?\\t%0,%1\";
-    case 4:
+    case 6:
       return \"tmcrr%?\\t%0,%Q1,%R1\";
-    case 5:
+    case 7:
       return \"tmrrc%?\\t%Q0,%R0,%1\";
-    case 6:
+    case 8:
       return \"wldrd%?\\t%0,%1\";
-    case 7:
+    case 9:
       return \"wstrd%?\\t%1,%0\";
+    case 10:
+      return \"fmdrr%?\\t%P0, %Q1, %R1\\t%@ int\";
+    case 11:
+      return \"fmrrd%?\\t%Q0, %R0, %P1\\t%@ int\";
+    case 12:
+      if (TARGET_VFP_SINGLE)
+	return \"fcpys%?\\t%0, %1\\t%@ int\;fcpys%?\\t%p0, %p1\\t%@ int\";
+      else
+	return \"fcpyd%?\\t%P0, %P1\\t%@ int\";
+    case 13: case 14:
+      return output_move_vfp (operands);
+    default:
+      gcc_unreachable ();
     }
-}"
-  [(set_attr "length"         "8,8,8,4,4,4,4,4")
-   (set_attr "type"           "*,load1,store2,*,*,*,*,*")
-   (set_attr "pool_range"     "*,1020,*,*,*,*,*,*")
-   (set_attr "neg_pool_range" "*,1012,*,*,*,*,*,*")]
+  "
+  [(set (attr "length") (cond [(eq_attr "alternative" "0,3,4") (const_int 8)
+                              (eq_attr "alternative" "1") (const_int 12)
+                              (eq_attr "alternative" "2") (const_int 16)
+                              (eq_attr "alternative" "12")
+                               (if_then_else
+                                 (eq (symbol_ref "TARGET_VFP_SINGLE") (const_int 1))
+                                 (const_int 8)
+                                 (const_int 4))]
+                              (const_int 4)))
+   (set_attr "type" "*,*,*,load2,store2,*,*,*,*,*,r_2_f,f_2_r,ffarithd,f_loadd,f_stored")
+   (set_attr "arm_pool_range" "*,*,*,1020,*,*,*,*,*,*,*,*,*,1020,*")
+   (set_attr "arm_neg_pool_range" "*,*,*,1008,*,*,*,*,*,*,*,*,*,1008,*")
+   (set_attr "wtype" "*,*,*,*,*,wmov,tmcrr,tmrrc,wldr,wstr,*,*,*,*,*")]
 )
 
 (define_insn "*iwmmxt_movsi_insn"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,rk, m,z,r,?z,Uy,z")
-	(match_operand:SI 1 "general_operand"      "rk, I,K,mi,rk,r,z,Uy,z, z"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk, m,z,r,?z,?Uy,*t, r,*t,*t  ,*Uv")
+	(match_operand:SI 1 "general_operand"      " rk,I,K,j,mi,rk,r,z,Uy,  z, r,*t,*t,*Uvi, *t"))]
   "TARGET_REALLY_IWMMXT
    && (   register_operand (operands[0], SImode)
        || register_operand (operands[1], SImode))"
   "*
    switch (which_alternative)
-   {
-   case 0: return \"mov\\t%0, %1\";
-   case 1: return \"mov\\t%0, %1\";
-   case 2: return \"mvn\\t%0, #%B1\";
-   case 3: return \"ldr\\t%0, %1\";
-   case 4: return \"str\\t%1, %0\";
-   case 5: return \"tmcr\\t%0, %1\";
-   case 6: return \"tmrc\\t%0, %1\";
-   case 7: return arm_output_load_gr (operands);
-   case 8: return \"wstrw\\t%1, %0\";
-   default:return \"wstrw\\t%1, [sp, #-4]!\;wldrw\\t%0, [sp], #4\\t@move CG reg\";
-  }"
-  [(set_attr "type"           "*,*,*,load1,store1,*,*,load1,store1,*")
-   (set_attr "length"         "*,*,*,*,        *,*,*,  16,     *,8")
-   (set_attr "pool_range"     "*,*,*,4096,     *,*,*,1024,     *,*")
-   (set_attr "neg_pool_range" "*,*,*,4084,     *,*,*,   *,  1012,*")
+     {
+     case 0: return \"mov\\t%0, %1\";
+     case 1: return \"mov\\t%0, %1\";
+     case 2: return \"mvn\\t%0, #%B1\";
+     case 3: return \"movw\\t%0, %1\";
+     case 4: return \"ldr\\t%0, %1\";
+     case 5: return \"str\\t%1, %0\";
+     case 6: return \"tmcr\\t%0, %1\";
+     case 7: return \"tmrc\\t%0, %1\";
+     case 8: return arm_output_load_gr (operands);
+     case 9: return \"wstrw\\t%1, %0\";
+     case 10:return \"fmsr\\t%0, %1\";
+     case 11:return \"fmrs\\t%0, %1\";
+     case 12:return \"fcpys\\t%0, %1\\t%@ int\";
+     case 13: case 14:
+       return output_move_vfp (operands);
+     default:
+       gcc_unreachable ();
+     }"
+  [(set_attr "type"           "*,*,*,*,load1,store1,*,*,*,*,r_2_f,f_2_r,fcpys,f_loads,f_stores")
+   (set_attr "length"         "*,*,*,*,*,        *,*,*,  16,     *,*,*,*,*,*")
+   (set_attr "pool_range"     "*,*,*,*,4096,     *,*,*,1024,     *,*,*,*,1020,*")
+   (set_attr "neg_pool_range" "*,*,*,*,4084,     *,*,*,   *,  1012,*,*,*,1008,*")
    ;; Note - the "predicable" attribute is not allowed to have alternatives.
    ;; Since the wSTRw wCx instruction is not predicable, we cannot support
    ;; predicating any of the alternatives in this template.  Instead,
@@ -129,7 +201,8 @@
    ;; Also - we have to pretend that these insns clobber the condition code
    ;; bits as otherwise arm_final_prescan_insn() will try to conditionalize
    ;; them.
-   (set_attr "conds" "clob")]
+   (set_attr "conds" "clob")
+   (set_attr "wtype" "*,*,*,*,*,*,tmcr,tmrc,wldr,wstr,*,*,*,*,*")]
 )
 
 ;; Because iwmmxt_movsi_insn is not predicable, we provide the
@@ -177,19 +250,110 @@
    }"
   [(set_attr "predicable" "yes")
    (set_attr "length"         "4,     4,   4,4,4,8,   8,8")
-   (set_attr "type"           "*,store1,load1,*,*,*,load1,store1")
+   (set_attr "type"           "*,*,*,*,*,*,load1,store1")
    (set_attr "pool_range"     "*,     *, 256,*,*,*, 256,*")
-   (set_attr "neg_pool_range" "*,     *, 244,*,*,*, 244,*")])
+   (set_attr "neg_pool_range" "*,     *, 244,*,*,*, 244,*")
+   (set_attr "wtype"          "wmov,wstr,wldr,tmrrc,tmcrr,*,*,*")]
+)
+
+(define_expand "iwmmxt_setwcgr0"
+  [(set (reg:SI WCGR0)
+	(match_operand:SI 0 "register_operand"  ""))]
+  "TARGET_REALLY_IWMMXT"
+  {}
+)
+
+(define_expand "iwmmxt_setwcgr1"
+  [(set (reg:SI WCGR1)
+	(match_operand:SI 0 "register_operand"  ""))]
+  "TARGET_REALLY_IWMMXT"
+  {}
+)
+
+(define_expand "iwmmxt_setwcgr2"
+  [(set (reg:SI WCGR2)
+	(match_operand:SI 0 "register_operand"  ""))]
+  "TARGET_REALLY_IWMMXT"
+  {}
+)
+
+(define_expand "iwmmxt_setwcgr3"
+  [(set (reg:SI WCGR3)
+	(match_operand:SI 0 "register_operand"  ""))]
+  "TARGET_REALLY_IWMMXT"
+  {}
+)
+
+(define_expand "iwmmxt_getwcgr0"
+  [(set (match_operand:SI 0 "register_operand"  "")
+        (reg:SI WCGR0))]
+  "TARGET_REALLY_IWMMXT"
+  {}
+)
+
+(define_expand "iwmmxt_getwcgr1"
+  [(set (match_operand:SI 0 "register_operand"  "")
+        (reg:SI WCGR1))]
+  "TARGET_REALLY_IWMMXT"
+  {}
+)
+
+(define_expand "iwmmxt_getwcgr2"
+  [(set (match_operand:SI 0 "register_operand"  "")
+        (reg:SI WCGR2))]
+  "TARGET_REALLY_IWMMXT"
+  {}
+)
+
+(define_expand "iwmmxt_getwcgr3"
+  [(set (match_operand:SI 0 "register_operand"  "")
+        (reg:SI WCGR3))]
+  "TARGET_REALLY_IWMMXT"
+  {}
+)
+
+(define_insn "*and<mode>3_iwmmxt"
+  [(set (match_operand:VMMX           0 "register_operand" "=y")
+        (and:VMMX (match_operand:VMMX 1 "register_operand"  "y")
+	          (match_operand:VMMX 2 "register_operand"  "y")))]
+  "TARGET_REALLY_IWMMXT"
+  "wand\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wand")]
+)
+
+(define_insn "*ior<mode>3_iwmmxt"
+  [(set (match_operand:VMMX           0 "register_operand" "=y")
+        (ior:VMMX (match_operand:VMMX 1 "register_operand"  "y")
+	          (match_operand:VMMX 2 "register_operand"  "y")))]
+  "TARGET_REALLY_IWMMXT"
+  "wor\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wor")]
+)
+
+(define_insn "*xor<mode>3_iwmmxt"
+  [(set (match_operand:VMMX           0 "register_operand" "=y")
+        (xor:VMMX (match_operand:VMMX 1 "register_operand"  "y")
+	          (match_operand:VMMX 2 "register_operand"  "y")))]
+  "TARGET_REALLY_IWMMXT"
+  "wxor\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wxor")]
+)
+
 
 ;; Vector add/subtract
 
 (define_insn "*add<mode>3_iwmmxt"
   [(set (match_operand:VMMX            0 "register_operand" "=y")
-        (plus:VMMX (match_operand:VMMX 1 "register_operand"  "y")
-	           (match_operand:VMMX 2 "register_operand"  "y")))]
+        (plus:VMMX (match_operand:VMMX 1 "register_operand" "y")
+	           (match_operand:VMMX 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wadd<MMX_char>%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wadd")]
+)
 
 (define_insn "ssaddv8qi3"
   [(set (match_operand:V8QI               0 "register_operand" "=y")
@@ -197,7 +361,9 @@
 		      (match_operand:V8QI 2 "register_operand"  "y")))]
   "TARGET_REALLY_IWMMXT"
   "waddbss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wadd")]
+)
 
 (define_insn "ssaddv4hi3"
   [(set (match_operand:V4HI               0 "register_operand" "=y")
@@ -205,7 +371,9 @@
 		      (match_operand:V4HI 2 "register_operand"  "y")))]
   "TARGET_REALLY_IWMMXT"
   "waddhss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wadd")]
+)
 
 (define_insn "ssaddv2si3"
   [(set (match_operand:V2SI               0 "register_operand" "=y")
@@ -213,7 +381,9 @@
 		      (match_operand:V2SI 2 "register_operand"  "y")))]
   "TARGET_REALLY_IWMMXT"
   "waddwss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wadd")]
+)
 
 (define_insn "usaddv8qi3"
   [(set (match_operand:V8QI               0 "register_operand" "=y")
@@ -221,7 +391,9 @@
 		      (match_operand:V8QI 2 "register_operand"  "y")))]
   "TARGET_REALLY_IWMMXT"
   "waddbus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wadd")]
+)
 
 (define_insn "usaddv4hi3"
   [(set (match_operand:V4HI               0 "register_operand" "=y")
@@ -229,7 +401,9 @@
 		      (match_operand:V4HI 2 "register_operand"  "y")))]
   "TARGET_REALLY_IWMMXT"
   "waddhus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wadd")]
+)
 
 (define_insn "usaddv2si3"
   [(set (match_operand:V2SI               0 "register_operand" "=y")
@@ -237,7 +411,9 @@
 		      (match_operand:V2SI 2 "register_operand"  "y")))]
   "TARGET_REALLY_IWMMXT"
   "waddwus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wadd")]
+)
 
 (define_insn "*sub<mode>3_iwmmxt"
   [(set (match_operand:VMMX             0 "register_operand" "=y")
@@ -245,7 +421,9 @@
 		    (match_operand:VMMX 2 "register_operand"  "y")))]
   "TARGET_REALLY_IWMMXT"
   "wsub<MMX_char>%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsub")]
+)
 
 (define_insn "sssubv8qi3"
   [(set (match_operand:V8QI                0 "register_operand" "=y")
@@ -253,7 +431,9 @@
 		       (match_operand:V8QI 2 "register_operand"  "y")))]
   "TARGET_REALLY_IWMMXT"
   "wsubbss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsub")]
+)
 
 (define_insn "sssubv4hi3"
   [(set (match_operand:V4HI                0 "register_operand" "=y")
@@ -261,7 +441,9 @@
 		       (match_operand:V4HI 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wsubhss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsub")]
+)
 
 (define_insn "sssubv2si3"
   [(set (match_operand:V2SI                0 "register_operand" "=y")
@@ -269,7 +451,9 @@
 		       (match_operand:V2SI 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wsubwss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsub")]
+)
 
 (define_insn "ussubv8qi3"
   [(set (match_operand:V8QI                0 "register_operand" "=y")
@@ -277,7 +461,9 @@
 		       (match_operand:V8QI 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wsubbus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsub")]
+)
 
 (define_insn "ussubv4hi3"
   [(set (match_operand:V4HI                0 "register_operand" "=y")
@@ -285,7 +471,9 @@
 		       (match_operand:V4HI 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wsubhus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsub")]
+)
 
 (define_insn "ussubv2si3"
   [(set (match_operand:V2SI                0 "register_operand" "=y")
@@ -293,7 +481,9 @@
 		       (match_operand:V2SI 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wsubwus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsub")]
+)
 
 (define_insn "*mulv4hi3_iwmmxt"
   [(set (match_operand:V4HI            0 "register_operand" "=y")
@@ -301,63 +491,77 @@
 		   (match_operand:V4HI 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wmulul%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmul")]
+)
 
 (define_insn "smulv4hi3_highpart"
-  [(set (match_operand:V4HI                                0 "register_operand" "=y")
-	(truncate:V4HI
-	 (lshiftrt:V4SI
-	  (mult:V4SI (sign_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
-		     (sign_extend:V4SI (match_operand:V4HI 2 "register_operand" "y")))
-	  (const_int 16))))]
+  [(set (match_operand:V4HI 0 "register_operand" "=y")
+	  (truncate:V4HI
+	    (lshiftrt:V4SI
+	      (mult:V4SI (sign_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                 (sign_extend:V4SI (match_operand:V4HI 2 "register_operand" "y")))
+	      (const_int 16))))]
   "TARGET_REALLY_IWMMXT"
   "wmulsm%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmul")]
+)
 
 (define_insn "umulv4hi3_highpart"
-  [(set (match_operand:V4HI                                0 "register_operand" "=y")
-	(truncate:V4HI
-	 (lshiftrt:V4SI
-	  (mult:V4SI (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
-		     (zero_extend:V4SI (match_operand:V4HI 2 "register_operand" "y")))
-	  (const_int 16))))]
+  [(set (match_operand:V4HI 0 "register_operand" "=y")
+	  (truncate:V4HI
+	    (lshiftrt:V4SI
+	      (mult:V4SI (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                 (zero_extend:V4SI (match_operand:V4HI 2 "register_operand" "y")))
+	      (const_int 16))))]
   "TARGET_REALLY_IWMMXT"
   "wmulum%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmul")]
+)
 
 (define_insn "iwmmxt_wmacs"
   [(set (match_operand:DI               0 "register_operand" "=y")
 	(unspec:DI [(match_operand:DI   1 "register_operand" "0")
-		    (match_operand:V4HI 2 "register_operand" "y")
-		    (match_operand:V4HI 3 "register_operand" "y")] UNSPEC_WMACS))]
+	            (match_operand:V4HI 2 "register_operand" "y")
+	            (match_operand:V4HI 3 "register_operand" "y")] UNSPEC_WMACS))]
   "TARGET_REALLY_IWMMXT"
   "wmacs%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmac")]
+)
 
 (define_insn "iwmmxt_wmacsz"
   [(set (match_operand:DI               0 "register_operand" "=y")
 	(unspec:DI [(match_operand:V4HI 1 "register_operand" "y")
-		    (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WMACSZ))]
+	            (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WMACSZ))]
   "TARGET_REALLY_IWMMXT"
   "wmacsz%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmac")]
+)
 
 (define_insn "iwmmxt_wmacu"
   [(set (match_operand:DI               0 "register_operand" "=y")
 	(unspec:DI [(match_operand:DI   1 "register_operand" "0")
-		    (match_operand:V4HI 2 "register_operand" "y")
-		    (match_operand:V4HI 3 "register_operand" "y")] UNSPEC_WMACU))]
+	            (match_operand:V4HI 2 "register_operand" "y")
+	            (match_operand:V4HI 3 "register_operand" "y")] UNSPEC_WMACU))]
   "TARGET_REALLY_IWMMXT"
   "wmacu%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmac")]
+)
 
 (define_insn "iwmmxt_wmacuz"
   [(set (match_operand:DI               0 "register_operand" "=y")
 	(unspec:DI [(match_operand:V4HI 1 "register_operand" "y")
-		    (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WMACUZ))]
+	            (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WMACUZ))]
   "TARGET_REALLY_IWMMXT"
   "wmacuz%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmac")]
+)
 
 ;; Same as xordi3, but don't show input operands so that we don't think
 ;; they are live.
@@ -366,168 +570,207 @@
         (unspec:DI [(const_int 0)] UNSPEC_CLRDI))]
   "TARGET_REALLY_IWMMXT"
   "wxor%?\\t%0, %0, %0"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wxor")]
+)
 
 ;; Seems like cse likes to generate these, so we have to support them.
 
-(define_insn "*iwmmxt_clrv8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+(define_insn "iwmmxt_clrv8qi"
+  [(set (match_operand:V8QI 0 "s_register_operand" "=y")
         (const_vector:V8QI [(const_int 0) (const_int 0)
 			    (const_int 0) (const_int 0)
 			    (const_int 0) (const_int 0)
 			    (const_int 0) (const_int 0)]))]
   "TARGET_REALLY_IWMMXT"
   "wxor%?\\t%0, %0, %0"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wxor")]
+)
 
-(define_insn "*iwmmxt_clrv4hi"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+(define_insn "iwmmxt_clrv4hi"
+  [(set (match_operand:V4HI 0 "s_register_operand" "=y")
         (const_vector:V4HI [(const_int 0) (const_int 0)
 			    (const_int 0) (const_int 0)]))]
   "TARGET_REALLY_IWMMXT"
   "wxor%?\\t%0, %0, %0"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wxor")]
+)
 
-(define_insn "*iwmmxt_clrv2si"
+(define_insn "iwmmxt_clrv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y")
         (const_vector:V2SI [(const_int 0) (const_int 0)]))]
   "TARGET_REALLY_IWMMXT"
   "wxor%?\\t%0, %0, %0"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wxor")]
+)
 
 ;; Unsigned averages/sum of absolute differences
 
 (define_insn "iwmmxt_uavgrndv8qi3"
-  [(set (match_operand:V8QI              0 "register_operand" "=y")
-        (ashiftrt:V8QI
-	 (plus:V8QI (plus:V8QI
-		     (match_operand:V8QI 1 "register_operand" "y")
-		     (match_operand:V8QI 2 "register_operand" "y"))
-		    (const_vector:V8QI [(const_int 1)
-					(const_int 1)
-					(const_int 1)
-					(const_int 1)
-					(const_int 1)
-					(const_int 1)
-					(const_int 1)
-					(const_int 1)]))
-	 (const_int 1)))]
+  [(set (match_operand:V8QI                                    0 "register_operand" "=y")
+        (truncate:V8QI
+	  (lshiftrt:V8HI
+	    (plus:V8HI
+	      (plus:V8HI (zero_extend:V8HI (match_operand:V8QI 1 "register_operand" "y"))
+	                 (zero_extend:V8HI (match_operand:V8QI 2 "register_operand" "y")))
+	      (const_vector:V8HI [(const_int 1)
+	                          (const_int 1)
+	                          (const_int 1)
+	                          (const_int 1)
+	                          (const_int 1)
+	                          (const_int 1)
+	                          (const_int 1)
+	                          (const_int 1)]))
+	    (const_int 1))))]
   "TARGET_REALLY_IWMMXT"
   "wavg2br%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wavg2")]
+)
 
 (define_insn "iwmmxt_uavgrndv4hi3"
-  [(set (match_operand:V4HI              0 "register_operand" "=y")
-        (ashiftrt:V4HI
-	 (plus:V4HI (plus:V4HI
-		     (match_operand:V4HI 1 "register_operand" "y")
-		     (match_operand:V4HI 2 "register_operand" "y"))
-		    (const_vector:V4HI [(const_int 1)
-					(const_int 1)
-					(const_int 1)
-					(const_int 1)]))
-	 (const_int 1)))]
+  [(set (match_operand:V4HI                                    0 "register_operand" "=y")
+        (truncate:V4HI
+	  (lshiftrt:V4SI
+            (plus:V4SI
+	      (plus:V4SI (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                 (zero_extend:V4SI (match_operand:V4HI 2 "register_operand" "y")))
+	      (const_vector:V4SI [(const_int 1)
+	                          (const_int 1)
+	                          (const_int 1)
+	                          (const_int 1)]))
+	    (const_int 1))))]
   "TARGET_REALLY_IWMMXT"
   "wavg2hr%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wavg2")]
+)
 
 (define_insn "iwmmxt_uavgv8qi3"
-  [(set (match_operand:V8QI                 0 "register_operand" "=y")
-        (ashiftrt:V8QI (plus:V8QI
-			(match_operand:V8QI 1 "register_operand" "y")
-			(match_operand:V8QI 2 "register_operand" "y"))
-		       (const_int 1)))]
+  [(set (match_operand:V8QI                                  0 "register_operand" "=y")
+        (truncate:V8QI
+	  (lshiftrt:V8HI
+	    (plus:V8HI (zero_extend:V8HI (match_operand:V8QI 1 "register_operand" "y"))
+	               (zero_extend:V8HI (match_operand:V8QI 2 "register_operand" "y")))
+	    (const_int 1))))]
   "TARGET_REALLY_IWMMXT"
   "wavg2b%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wavg2")]
+)
 
 (define_insn "iwmmxt_uavgv4hi3"
-  [(set (match_operand:V4HI                 0 "register_operand" "=y")
-        (ashiftrt:V4HI (plus:V4HI
-			(match_operand:V4HI 1 "register_operand" "y")
-			(match_operand:V4HI 2 "register_operand" "y"))
-		       (const_int 1)))]
+  [(set (match_operand:V4HI                                  0 "register_operand" "=y")
+        (truncate:V4HI
+	  (lshiftrt:V4SI
+	    (plus:V4SI (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	               (zero_extend:V4SI (match_operand:V4HI 2 "register_operand" "y")))
+	    (const_int 1))))]
   "TARGET_REALLY_IWMMXT"
   "wavg2h%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "iwmmxt_psadbw"
-  [(set (match_operand:V8QI                       0 "register_operand" "=y")
-        (abs:V8QI (minus:V8QI (match_operand:V8QI 1 "register_operand" "y")
-			      (match_operand:V8QI 2 "register_operand" "y"))))]
-  "TARGET_REALLY_IWMMXT"
-  "psadbw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wavg2")]
+)
 
 ;; Insert/extract/shuffle
 
 (define_insn "iwmmxt_tinsrb"
-  [(set (match_operand:V8QI                             0 "register_operand"    "=y")
-        (vec_merge:V8QI (match_operand:V8QI             1 "register_operand"     "0")
-			(vec_duplicate:V8QI
-			 (truncate:QI (match_operand:SI 2 "nonimmediate_operand" "r")))
-			(match_operand:SI               3 "immediate_operand"    "i")))]
+  [(set (match_operand:V8QI                0 "register_operand" "=y")
+        (vec_merge:V8QI
+	  (vec_duplicate:V8QI
+	    (truncate:QI (match_operand:SI 2 "nonimmediate_operand" "r")))
+	  (match_operand:V8QI              1 "register_operand"     "0")
+	  (match_operand:SI                3 "immediate_operand"    "i")))]
   "TARGET_REALLY_IWMMXT"
-  "tinsrb%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  "*
+   {
+     return arm_output_iwmmxt_tinsr (operands);
+   }
+   "
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tinsr")]
+)
 
 (define_insn "iwmmxt_tinsrh"
-  [(set (match_operand:V4HI                             0 "register_operand"    "=y")
-        (vec_merge:V4HI (match_operand:V4HI             1 "register_operand"     "0")
-			(vec_duplicate:V4HI
-			 (truncate:HI (match_operand:SI 2 "nonimmediate_operand" "r")))
-			(match_operand:SI               3 "immediate_operand"    "i")))]
+  [(set (match_operand:V4HI                0 "register_operand"    "=y")
+        (vec_merge:V4HI
+          (vec_duplicate:V4HI
+            (truncate:HI (match_operand:SI 2 "nonimmediate_operand" "r")))
+	  (match_operand:V4HI              1 "register_operand"     "0")
+	  (match_operand:SI                3 "immediate_operand"    "i")))]
   "TARGET_REALLY_IWMMXT"
-  "tinsrh%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  "*
+   {
+     return arm_output_iwmmxt_tinsr (operands);
+   }
+   "
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tinsr")]
+)
 
 (define_insn "iwmmxt_tinsrw"
-  [(set (match_operand:V2SI                 0 "register_operand"    "=y")
-        (vec_merge:V2SI (match_operand:V2SI 1 "register_operand"     "0")
-			(vec_duplicate:V2SI
-			 (match_operand:SI  2 "nonimmediate_operand" "r"))
-			(match_operand:SI   3 "immediate_operand"    "i")))]
+  [(set (match_operand:V2SI   0 "register_operand"    "=y")
+        (vec_merge:V2SI
+          (vec_duplicate:V2SI
+            (match_operand:SI 2 "nonimmediate_operand" "r"))
+          (match_operand:V2SI 1 "register_operand"     "0")
+          (match_operand:SI   3 "immediate_operand"    "i")))]
   "TARGET_REALLY_IWMMXT"
-  "tinsrw%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  "*
+   {
+     return arm_output_iwmmxt_tinsr (operands);
+   }
+   "
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tinsr")]
+)
 
 (define_insn "iwmmxt_textrmub"
-  [(set (match_operand:SI                                  0 "register_operand" "=r")
-        (zero_extend:SI (vec_select:QI (match_operand:V8QI 1 "register_operand" "y")
-				       (parallel
-					[(match_operand:SI 2 "immediate_operand" "i")]))))]
+  [(set (match_operand:SI                                   0 "register_operand" "=r")
+        (zero_extend:SI (vec_select:QI (match_operand:V8QI  1 "register_operand" "y")
+		                       (parallel
+				         [(match_operand:SI 2 "immediate_operand" "i")]))))]
   "TARGET_REALLY_IWMMXT"
   "textrmub%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "textrm")]
+)
 
 (define_insn "iwmmxt_textrmsb"
-  [(set (match_operand:SI                                  0 "register_operand" "=r")
-        (sign_extend:SI (vec_select:QI (match_operand:V8QI 1 "register_operand" "y")
+  [(set (match_operand:SI                                   0 "register_operand" "=r")
+        (sign_extend:SI (vec_select:QI (match_operand:V8QI  1 "register_operand" "y")
 				       (parallel
-					[(match_operand:SI 2 "immediate_operand" "i")]))))]
+				         [(match_operand:SI 2 "immediate_operand" "i")]))))]
   "TARGET_REALLY_IWMMXT"
   "textrmsb%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "textrm")]
+)
 
 (define_insn "iwmmxt_textrmuh"
-  [(set (match_operand:SI                                  0 "register_operand" "=r")
-        (zero_extend:SI (vec_select:HI (match_operand:V4HI 1 "register_operand" "y")
+  [(set (match_operand:SI                                   0 "register_operand" "=r")
+        (zero_extend:SI (vec_select:HI (match_operand:V4HI  1 "register_operand" "y")
 				       (parallel
-					[(match_operand:SI 2 "immediate_operand" "i")]))))]
+				         [(match_operand:SI 2 "immediate_operand" "i")]))))]
   "TARGET_REALLY_IWMMXT"
   "textrmuh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "textrm")]
+)
 
 (define_insn "iwmmxt_textrmsh"
-  [(set (match_operand:SI                                  0 "register_operand" "=r")
-        (sign_extend:SI (vec_select:HI (match_operand:V4HI 1 "register_operand" "y")
+  [(set (match_operand:SI                                   0 "register_operand" "=r")
+        (sign_extend:SI (vec_select:HI (match_operand:V4HI  1 "register_operand" "y")
 				       (parallel
-					[(match_operand:SI 2 "immediate_operand" "i")]))))]
+				         [(match_operand:SI 2 "immediate_operand" "i")]))))]
   "TARGET_REALLY_IWMMXT"
   "textrmsh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "textrm")]
+)
 
 ;; There are signed/unsigned variants of this instruction, but they are
 ;; pointless.
@@ -537,7 +780,9 @@
 		       (parallel [(match_operand:SI 2 "immediate_operand" "i")])))]
   "TARGET_REALLY_IWMMXT"
   "textrmsw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "textrm")]
+)
 
 (define_insn "iwmmxt_wshufh"
   [(set (match_operand:V4HI               0 "register_operand" "=y")
@@ -545,7 +790,9 @@
 		      (match_operand:SI   2 "immediate_operand" "i")] UNSPEC_WSHUFH))]
   "TARGET_REALLY_IWMMXT"
   "wshufh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wshufh")]
+)
 
 ;; Mask-generating comparisons
 ;;
@@ -557,92 +804,106 @@
 ;; into the entire destination vector, (with the '1' going into the least
 ;; significant element of the vector).  This is not how these instructions
 ;; behave.
-;;
-;; Unfortunately the current patterns are illegal.  They are SET insns
-;; without a SET in them.  They work in most cases for ordinary code
-;; generation, but there are circumstances where they can cause gcc to fail.
-;; XXX - FIXME.
 
 (define_insn "eqv8qi3"
-  [(unspec_volatile [(match_operand:V8QI 0 "register_operand" "=y")
-		     (match_operand:V8QI 1 "register_operand"  "y")
-		     (match_operand:V8QI 2 "register_operand"  "y")]
-		    VUNSPEC_WCMP_EQ)]
+  [(set (match_operand:V8QI                        0 "register_operand" "=y")
+	(unspec_volatile:V8QI [(match_operand:V8QI 1 "register_operand"  "y")
+	                       (match_operand:V8QI 2 "register_operand"  "y")]
+	                      VUNSPEC_WCMP_EQ))]
   "TARGET_REALLY_IWMMXT"
   "wcmpeqb%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpeq")]
+)
 
 (define_insn "eqv4hi3"
-  [(unspec_volatile [(match_operand:V4HI 0 "register_operand" "=y")
-		     (match_operand:V4HI 1 "register_operand"  "y")
-		     (match_operand:V4HI 2 "register_operand"  "y")]
-		    VUNSPEC_WCMP_EQ)]
+  [(set (match_operand:V4HI                        0 "register_operand" "=y")
+	(unspec_volatile:V4HI [(match_operand:V4HI 1 "register_operand"  "y")
+		               (match_operand:V4HI 2 "register_operand"  "y")]
+	                       VUNSPEC_WCMP_EQ))]
   "TARGET_REALLY_IWMMXT"
   "wcmpeqh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpeq")]
+)
 
 (define_insn "eqv2si3"
-  [(unspec_volatile:V2SI [(match_operand:V2SI 0 "register_operand" "=y")
-			  (match_operand:V2SI 1 "register_operand"  "y")
-			  (match_operand:V2SI 2 "register_operand"  "y")]
-			 VUNSPEC_WCMP_EQ)]
+  [(set (match_operand:V2SI    0 "register_operand" "=y")
+	(unspec_volatile:V2SI
+	  [(match_operand:V2SI 1 "register_operand"  "y")
+	   (match_operand:V2SI 2 "register_operand"  "y")]
+           VUNSPEC_WCMP_EQ))]
   "TARGET_REALLY_IWMMXT"
   "wcmpeqw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpeq")]
+)
 
 (define_insn "gtuv8qi3"
-  [(unspec_volatile [(match_operand:V8QI 0 "register_operand" "=y")
-		     (match_operand:V8QI 1 "register_operand"  "y")
-		     (match_operand:V8QI 2 "register_operand"  "y")]
-		    VUNSPEC_WCMP_GTU)]
+  [(set (match_operand:V8QI                        0 "register_operand" "=y")
+	(unspec_volatile:V8QI [(match_operand:V8QI 1 "register_operand"  "y")
+	                       (match_operand:V8QI 2 "register_operand"  "y")]
+	                       VUNSPEC_WCMP_GTU))]
   "TARGET_REALLY_IWMMXT"
   "wcmpgtub%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpgt")]
+)
 
 (define_insn "gtuv4hi3"
-  [(unspec_volatile [(match_operand:V4HI 0 "register_operand" "=y")
-		     (match_operand:V4HI 1 "register_operand"  "y")
-		     (match_operand:V4HI 2 "register_operand"  "y")]
-		    VUNSPEC_WCMP_GTU)]
+  [(set (match_operand:V4HI                        0 "register_operand" "=y")
+        (unspec_volatile:V4HI [(match_operand:V4HI 1 "register_operand"  "y")
+                               (match_operand:V4HI 2 "register_operand"  "y")]
+                               VUNSPEC_WCMP_GTU))]
   "TARGET_REALLY_IWMMXT"
   "wcmpgtuh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpgt")]
+)
 
 (define_insn "gtuv2si3"
-  [(unspec_volatile [(match_operand:V2SI 0 "register_operand" "=y")
-		     (match_operand:V2SI 1 "register_operand"  "y")
-		     (match_operand:V2SI 2 "register_operand"  "y")]
-		    VUNSPEC_WCMP_GTU)]
+  [(set (match_operand:V2SI                        0 "register_operand" "=y")
+	(unspec_volatile:V2SI [(match_operand:V2SI 1 "register_operand"  "y")
+	                       (match_operand:V2SI 2 "register_operand"  "y")]
+	                       VUNSPEC_WCMP_GTU))]
   "TARGET_REALLY_IWMMXT"
   "wcmpgtuw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpgt")]
+)
 
 (define_insn "gtv8qi3"
-  [(unspec_volatile [(match_operand:V8QI 0 "register_operand" "=y")
-		     (match_operand:V8QI 1 "register_operand"  "y")
-		     (match_operand:V8QI 2 "register_operand"  "y")]
-		    VUNSPEC_WCMP_GT)]
+  [(set (match_operand:V8QI                        0 "register_operand" "=y")
+	(unspec_volatile:V8QI [(match_operand:V8QI 1 "register_operand"  "y")
+	                       (match_operand:V8QI 2 "register_operand"  "y")]
+	                       VUNSPEC_WCMP_GT))]
   "TARGET_REALLY_IWMMXT"
   "wcmpgtsb%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpgt")]
+)
 
 (define_insn "gtv4hi3"
-  [(unspec_volatile [(match_operand:V4HI 0 "register_operand" "=y")
-		     (match_operand:V4HI 1 "register_operand"  "y")
-		     (match_operand:V4HI 2 "register_operand"  "y")]
-		    VUNSPEC_WCMP_GT)]
+  [(set (match_operand:V4HI                        0 "register_operand" "=y")
+	(unspec_volatile:V4HI [(match_operand:V4HI 1 "register_operand"  "y")
+	                       (match_operand:V4HI 2 "register_operand"  "y")]
+	                       VUNSPEC_WCMP_GT))]
   "TARGET_REALLY_IWMMXT"
   "wcmpgtsh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpgt")]
+)
 
 (define_insn "gtv2si3"
-  [(unspec_volatile [(match_operand:V2SI 0 "register_operand" "=y")
-		     (match_operand:V2SI 1 "register_operand"  "y")
-		     (match_operand:V2SI 2 "register_operand"  "y")]
-		    VUNSPEC_WCMP_GT)]
+  [(set (match_operand:V2SI                        0 "register_operand" "=y")
+	(unspec_volatile:V2SI [(match_operand:V2SI 1 "register_operand"  "y")
+	                       (match_operand:V2SI 2 "register_operand"  "y")]
+	                       VUNSPEC_WCMP_GT))]
   "TARGET_REALLY_IWMMXT"
   "wcmpgtsw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wcmpgt")]
+)
 
 ;; Max/min insns
 
@@ -652,7 +913,9 @@
 		   (match_operand:VMMX 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wmaxs<MMX_char>%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmax")]
+)
 
 (define_insn "*umax<mode>3_iwmmxt"
   [(set (match_operand:VMMX            0 "register_operand" "=y")
@@ -660,7 +923,9 @@
 		   (match_operand:VMMX 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wmaxu<MMX_char>%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmax")]
+)
 
 (define_insn "*smin<mode>3_iwmmxt"
   [(set (match_operand:VMMX            0 "register_operand" "=y")
@@ -668,7 +933,9 @@
 		   (match_operand:VMMX 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wmins<MMX_char>%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmin")]
+)
 
 (define_insn "*umin<mode>3_iwmmxt"
   [(set (match_operand:VMMX            0 "register_operand" "=y")
@@ -676,657 +943,835 @@
 		   (match_operand:VMMX 2 "register_operand" "y")))]
   "TARGET_REALLY_IWMMXT"
   "wminu<MMX_char>%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmin")]
+)
 
 ;; Pack/unpack insns.
 
 (define_insn "iwmmxt_wpackhss"
-  [(set (match_operand:V8QI                    0 "register_operand" "=y")
+  [(set (match_operand:V8QI                     0 "register_operand" "=y")
 	(vec_concat:V8QI
-	 (ss_truncate:V4QI (match_operand:V4HI 1 "register_operand" "y"))
-	 (ss_truncate:V4QI (match_operand:V4HI 2 "register_operand" "y"))))]
+	  (ss_truncate:V4QI (match_operand:V4HI 1 "register_operand" "y"))
+	  (ss_truncate:V4QI (match_operand:V4HI 2 "register_operand" "y"))))]
   "TARGET_REALLY_IWMMXT"
   "wpackhss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wpack")]
+)
 
 (define_insn "iwmmxt_wpackwss"
-  [(set (match_operand:V4HI                    0 "register_operand" "=y")
-	(vec_concat:V4HI
-	 (ss_truncate:V2HI (match_operand:V2SI 1 "register_operand" "y"))
-	 (ss_truncate:V2HI (match_operand:V2SI 2 "register_operand" "y"))))]
+  [(set (match_operand:V4HI                     0 "register_operand" "=y")
+        (vec_concat:V4HI
+	  (ss_truncate:V2HI (match_operand:V2SI 1 "register_operand" "y"))
+	  (ss_truncate:V2HI (match_operand:V2SI 2 "register_operand" "y"))))]
   "TARGET_REALLY_IWMMXT"
   "wpackwss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wpack")]
+)
 
 (define_insn "iwmmxt_wpackdss"
-  [(set (match_operand:V2SI                0 "register_operand" "=y")
+  [(set (match_operand:V2SI                 0 "register_operand" "=y")
 	(vec_concat:V2SI
-	 (ss_truncate:SI (match_operand:DI 1 "register_operand" "y"))
-	 (ss_truncate:SI (match_operand:DI 2 "register_operand" "y"))))]
+	  (ss_truncate:SI (match_operand:DI 1 "register_operand" "y"))
+	  (ss_truncate:SI (match_operand:DI 2 "register_operand" "y"))))]
   "TARGET_REALLY_IWMMXT"
   "wpackdss%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wpack")]
+)
 
 (define_insn "iwmmxt_wpackhus"
-  [(set (match_operand:V8QI                    0 "register_operand" "=y")
+  [(set (match_operand:V8QI                     0 "register_operand" "=y")
 	(vec_concat:V8QI
-	 (us_truncate:V4QI (match_operand:V4HI 1 "register_operand" "y"))
-	 (us_truncate:V4QI (match_operand:V4HI 2 "register_operand" "y"))))]
+	  (us_truncate:V4QI (match_operand:V4HI 1 "register_operand" "y"))
+	  (us_truncate:V4QI (match_operand:V4HI 2 "register_operand" "y"))))]
   "TARGET_REALLY_IWMMXT"
   "wpackhus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wpack")]
+)
 
 (define_insn "iwmmxt_wpackwus"
-  [(set (match_operand:V4HI                    0 "register_operand" "=y")
+  [(set (match_operand:V4HI                     0 "register_operand" "=y")
 	(vec_concat:V4HI
-	 (us_truncate:V2HI (match_operand:V2SI 1 "register_operand" "y"))
-	 (us_truncate:V2HI (match_operand:V2SI 2 "register_operand" "y"))))]
+	  (us_truncate:V2HI (match_operand:V2SI 1 "register_operand" "y"))
+	  (us_truncate:V2HI (match_operand:V2SI 2 "register_operand" "y"))))]
   "TARGET_REALLY_IWMMXT"
   "wpackwus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wpack")]
+)
 
 (define_insn "iwmmxt_wpackdus"
-  [(set (match_operand:V2SI                0 "register_operand" "=y")
+  [(set (match_operand:V2SI                 0 "register_operand" "=y")
 	(vec_concat:V2SI
-	 (us_truncate:SI (match_operand:DI 1 "register_operand" "y"))
-	 (us_truncate:SI (match_operand:DI 2 "register_operand" "y"))))]
+	  (us_truncate:SI (match_operand:DI 1 "register_operand" "y"))
+	  (us_truncate:SI (match_operand:DI 2 "register_operand" "y"))))]
   "TARGET_REALLY_IWMMXT"
   "wpackdus%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wpack")]
+)
 
 (define_insn "iwmmxt_wunpckihb"
-  [(set (match_operand:V8QI                   0 "register_operand" "=y")
+  [(set (match_operand:V8QI                                      0 "register_operand" "=y")
 	(vec_merge:V8QI
-	 (vec_select:V8QI (match_operand:V8QI 1 "register_operand" "y")
-			  (parallel [(const_int 4)
-				     (const_int 0)
-				     (const_int 5)
-				     (const_int 1)
-				     (const_int 6)
-				     (const_int 2)
-				     (const_int 7)
-				     (const_int 3)]))
-	 (vec_select:V8QI (match_operand:V8QI 2 "register_operand" "y")
-			  (parallel [(const_int 0)
-				     (const_int 4)
-				     (const_int 1)
-				     (const_int 5)
-				     (const_int 2)
-				     (const_int 6)
-				     (const_int 3)
-				     (const_int 7)]))
-	 (const_int 85)))]
+	  (vec_select:V8QI (match_operand:V8QI 1 "register_operand" "y")
+		           (parallel [(const_int 4)
+			              (const_int 0)
+			              (const_int 5)
+			              (const_int 1)
+			              (const_int 6)
+			              (const_int 2)
+			              (const_int 7)
+			              (const_int 3)]))
+          (vec_select:V8QI (match_operand:V8QI 2 "register_operand" "y")
+			   (parallel [(const_int 0)
+			              (const_int 4)
+			              (const_int 1)
+			              (const_int 5)
+			              (const_int 2)
+			              (const_int 6)
+			              (const_int 3)
+			              (const_int 7)]))
+          (const_int 85)))]
   "TARGET_REALLY_IWMMXT"
   "wunpckihb%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckih")]
+)
 
 (define_insn "iwmmxt_wunpckihh"
-  [(set (match_operand:V4HI                   0 "register_operand" "=y")
+  [(set (match_operand:V4HI                                      0 "register_operand" "=y")
 	(vec_merge:V4HI
-	 (vec_select:V4HI (match_operand:V4HI 1 "register_operand" "y")
-			  (parallel [(const_int 0)
-				     (const_int 2)
-				     (const_int 1)
-				     (const_int 3)]))
-	 (vec_select:V4HI (match_operand:V4HI 2 "register_operand" "y")
-			  (parallel [(const_int 2)
-				     (const_int 0)
-				     (const_int 3)
-				     (const_int 1)]))
-	 (const_int 5)))]
+	  (vec_select:V4HI (match_operand:V4HI 1 "register_operand" "y")
+		           (parallel [(const_int 2)
+			              (const_int 0)
+			              (const_int 3)
+			              (const_int 1)]))
+	  (vec_select:V4HI (match_operand:V4HI 2 "register_operand" "y")
+		           (parallel [(const_int 0)
+			              (const_int 2)
+			              (const_int 1)
+			              (const_int 3)]))
+          (const_int 5)))]
   "TARGET_REALLY_IWMMXT"
   "wunpckihh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckih")]
+)
 
 (define_insn "iwmmxt_wunpckihw"
-  [(set (match_operand:V2SI                   0 "register_operand" "=y")
+  [(set (match_operand:V2SI                    0 "register_operand" "=y")
 	(vec_merge:V2SI
-	 (vec_select:V2SI (match_operand:V2SI 1 "register_operand" "y")
-			  (parallel [(const_int 0)
-				     (const_int 1)]))
-	 (vec_select:V2SI (match_operand:V2SI 2 "register_operand" "y")
-			  (parallel [(const_int 1)
-				     (const_int 0)]))
-	 (const_int 1)))]
+	  (vec_select:V2SI (match_operand:V2SI 1 "register_operand" "y")
+		           (parallel [(const_int 1)
+		                      (const_int 0)]))
+          (vec_select:V2SI (match_operand:V2SI 2 "register_operand" "y")
+		           (parallel [(const_int 0)
+			              (const_int 1)]))
+          (const_int 1)))]
   "TARGET_REALLY_IWMMXT"
   "wunpckihw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckih")]
+)
 
 (define_insn "iwmmxt_wunpckilb"
-  [(set (match_operand:V8QI                   0 "register_operand" "=y")
+  [(set (match_operand:V8QI                                      0 "register_operand" "=y")
 	(vec_merge:V8QI
-	 (vec_select:V8QI (match_operand:V8QI 1 "register_operand" "y")
-			  (parallel [(const_int 0)
-				     (const_int 4)
-				     (const_int 1)
-				     (const_int 5)
-				     (const_int 2)
-				     (const_int 6)
-				     (const_int 3)
-				     (const_int 7)]))
-	 (vec_select:V8QI (match_operand:V8QI 2 "register_operand" "y")
-			  (parallel [(const_int 4)
-				     (const_int 0)
-				     (const_int 5)
-				     (const_int 1)
-				     (const_int 6)
-				     (const_int 2)
-				     (const_int 7)
-				     (const_int 3)]))
-	 (const_int 85)))]
+	  (vec_select:V8QI (match_operand:V8QI 1 "register_operand" "y")
+		           (parallel [(const_int 0)
+			              (const_int 4)
+			              (const_int 1)
+			              (const_int 5)
+		                      (const_int 2)
+				      (const_int 6)
+				      (const_int 3)
+				      (const_int 7)]))
+	  (vec_select:V8QI (match_operand:V8QI 2 "register_operand" "y")
+		           (parallel [(const_int 4)
+			              (const_int 0)
+			              (const_int 5)
+			              (const_int 1)
+			              (const_int 6)
+			              (const_int 2)
+			              (const_int 7)
+			              (const_int 3)]))
+	  (const_int 85)))]
   "TARGET_REALLY_IWMMXT"
   "wunpckilb%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckil")]
+)
 
 (define_insn "iwmmxt_wunpckilh"
-  [(set (match_operand:V4HI                   0 "register_operand" "=y")
+  [(set (match_operand:V4HI                                      0 "register_operand" "=y")
 	(vec_merge:V4HI
-	 (vec_select:V4HI (match_operand:V4HI 1 "register_operand" "y")
-			  (parallel [(const_int 2)
-				     (const_int 0)
-				     (const_int 3)
-				     (const_int 1)]))
-	 (vec_select:V4HI (match_operand:V4HI 2 "register_operand" "y")
-			  (parallel [(const_int 0)
-				     (const_int 2)
-				     (const_int 1)
-				     (const_int 3)]))
-	 (const_int 5)))]
+	  (vec_select:V4HI (match_operand:V4HI 1 "register_operand" "y")
+		           (parallel [(const_int 0)
+			              (const_int 2)
+			              (const_int 1)
+			              (const_int 3)]))
+	  (vec_select:V4HI (match_operand:V4HI 2 "register_operand" "y")
+			   (parallel [(const_int 2)
+			              (const_int 0)
+			              (const_int 3)
+			              (const_int 1)]))
+	  (const_int 5)))]
   "TARGET_REALLY_IWMMXT"
   "wunpckilh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckil")]
+)
 
 (define_insn "iwmmxt_wunpckilw"
-  [(set (match_operand:V2SI                   0 "register_operand" "=y")
+  [(set (match_operand:V2SI                    0 "register_operand" "=y")
 	(vec_merge:V2SI
-	 (vec_select:V2SI (match_operand:V2SI 1 "register_operand" "y")
-			   (parallel [(const_int 1)
-				      (const_int 0)]))
-	 (vec_select:V2SI (match_operand:V2SI 2 "register_operand" "y")
-			  (parallel [(const_int 0)
-				     (const_int 1)]))
-	 (const_int 1)))]
+	  (vec_select:V2SI (match_operand:V2SI 1 "register_operand" "y")
+		           (parallel [(const_int 0)
+				      (const_int 1)]))
+	  (vec_select:V2SI (match_operand:V2SI 2 "register_operand" "y")
+		           (parallel [(const_int 1)
+			              (const_int 0)]))
+	  (const_int 1)))]
   "TARGET_REALLY_IWMMXT"
   "wunpckilw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckil")]
+)
 
 (define_insn "iwmmxt_wunpckehub"
-  [(set (match_operand:V4HI                   0 "register_operand" "=y")
-	(zero_extend:V4HI
-	 (vec_select:V4QI (match_operand:V8QI 1 "register_operand" "y")
-			  (parallel [(const_int 4) (const_int 5)
-				     (const_int 6) (const_int 7)]))))]
+  [(set (match_operand:V4HI                     0 "register_operand" "=y")
+	(vec_select:V4HI
+	  (zero_extend:V8HI (match_operand:V8QI 1 "register_operand" "y"))
+	  (parallel [(const_int 4) (const_int 5)
+	             (const_int 6) (const_int 7)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckehub%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckeh")]
+)
 
 (define_insn "iwmmxt_wunpckehuh"
-  [(set (match_operand:V2SI                   0 "register_operand" "=y")
-	(zero_extend:V2SI
-	 (vec_select:V2HI (match_operand:V4HI 1 "register_operand" "y")
-			  (parallel [(const_int 2) (const_int 3)]))))]
+  [(set (match_operand:V2SI                     0 "register_operand" "=y")
+	(vec_select:V2SI
+	  (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	  (parallel [(const_int 2) (const_int 3)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckehuh%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckeh")]
+)
 
 (define_insn "iwmmxt_wunpckehuw"
-  [(set (match_operand:DI                   0 "register_operand" "=y")
-	(zero_extend:DI
-	 (vec_select:SI (match_operand:V2SI 1 "register_operand" "y")
-			(parallel [(const_int 1)]))))]
+  [(set (match_operand:DI                       0 "register_operand" "=y")
+	(vec_select:DI
+	  (zero_extend:V2DI (match_operand:V2SI 1 "register_operand" "y"))
+	  (parallel [(const_int 1)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckehuw%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckeh")]
+)
 
 (define_insn "iwmmxt_wunpckehsb"
-  [(set (match_operand:V4HI                   0 "register_operand" "=y")
-	(sign_extend:V4HI
-	 (vec_select:V4QI (match_operand:V8QI 1 "register_operand" "y")
-			  (parallel [(const_int 4) (const_int 5)
-				     (const_int 6) (const_int 7)]))))]
+  [(set (match_operand:V4HI                     0 "register_operand" "=y")
+        (vec_select:V4HI
+	  (sign_extend:V8HI (match_operand:V8QI 1 "register_operand" "y"))
+	  (parallel [(const_int 4) (const_int 5)
+	             (const_int 6) (const_int 7)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckehsb%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckeh")]
+)
 
 (define_insn "iwmmxt_wunpckehsh"
-  [(set (match_operand:V2SI                   0 "register_operand" "=y")
-	(sign_extend:V2SI
-	 (vec_select:V2HI (match_operand:V4HI 1 "register_operand" "y")
-			  (parallel [(const_int 2) (const_int 3)]))))]
+  [(set (match_operand:V2SI                     0 "register_operand" "=y")
+	(vec_select:V2SI
+	  (sign_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	  (parallel [(const_int 2) (const_int 3)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckehsh%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckeh")]
+)
 
 (define_insn "iwmmxt_wunpckehsw"
-  [(set (match_operand:DI                   0 "register_operand" "=y")
-	(sign_extend:DI
-	 (vec_select:SI (match_operand:V2SI 1 "register_operand" "y")
-			(parallel [(const_int 1)]))))]
+  [(set (match_operand:DI                       0 "register_operand" "=y")
+	(vec_select:DI
+	  (sign_extend:V2DI (match_operand:V2SI 1 "register_operand" "y"))
+	  (parallel [(const_int 1)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckehsw%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckeh")]
+)
 
 (define_insn "iwmmxt_wunpckelub"
-  [(set (match_operand:V4HI                   0 "register_operand" "=y")
-	(zero_extend:V4HI
-	 (vec_select:V4QI (match_operand:V8QI 1 "register_operand" "y")
-			  (parallel [(const_int 0) (const_int 1)
-				     (const_int 2) (const_int 3)]))))]
+  [(set (match_operand:V4HI                     0 "register_operand" "=y")
+	(vec_select:V4HI
+	  (zero_extend:V8HI (match_operand:V8QI 1 "register_operand" "y"))
+	  (parallel [(const_int 0) (const_int 1)
+		     (const_int 2) (const_int 3)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckelub%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckel")]
+)
 
 (define_insn "iwmmxt_wunpckeluh"
-  [(set (match_operand:V2SI                   0 "register_operand" "=y")
-	(zero_extend:V2SI
-	 (vec_select:V2HI (match_operand:V4HI 1 "register_operand" "y")
-			  (parallel [(const_int 0) (const_int 1)]))))]
+  [(set (match_operand:V2SI                     0 "register_operand" "=y")
+	(vec_select:V2SI
+	  (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckeluh%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckel")]
+)
 
 (define_insn "iwmmxt_wunpckeluw"
-  [(set (match_operand:DI                   0 "register_operand" "=y")
-	(zero_extend:DI
-	 (vec_select:SI (match_operand:V2SI 1 "register_operand" "y")
-			(parallel [(const_int 0)]))))]
+  [(set (match_operand:DI                       0 "register_operand" "=y")
+	(vec_select:DI
+	  (zero_extend:V2DI (match_operand:V2SI 1 "register_operand" "y"))
+	  (parallel [(const_int 0)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckeluw%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckel")]
+)
 
 (define_insn "iwmmxt_wunpckelsb"
-  [(set (match_operand:V4HI                   0 "register_operand" "=y")
-	(sign_extend:V4HI
-	 (vec_select:V4QI (match_operand:V8QI 1 "register_operand" "y")
-			  (parallel [(const_int 0) (const_int 1)
-				     (const_int 2) (const_int 3)]))))]
+  [(set (match_operand:V4HI                     0 "register_operand" "=y")
+	(vec_select:V4HI
+	  (sign_extend:V8HI (match_operand:V8QI 1 "register_operand" "y"))
+	  (parallel [(const_int 0) (const_int 1)
+		     (const_int 2) (const_int 3)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckelsb%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckel")]
+)
 
 (define_insn "iwmmxt_wunpckelsh"
-  [(set (match_operand:V2SI                   0 "register_operand" "=y")
-	(sign_extend:V2SI
-	 (vec_select:V2HI (match_operand:V4HI 1 "register_operand" "y")
-			  (parallel [(const_int 0) (const_int 1)]))))]
+  [(set (match_operand:V2SI                     0 "register_operand" "=y")
+	(vec_select:V2SI
+	  (sign_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckelsh%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckel")]
+)
 
 (define_insn "iwmmxt_wunpckelsw"
-  [(set (match_operand:DI                   0 "register_operand" "=y")
-	(sign_extend:DI
-	 (vec_select:SI (match_operand:V2SI 1 "register_operand" "y")
-			(parallel [(const_int 0)]))))]
+  [(set (match_operand:DI                       0 "register_operand" "=y")
+        (vec_select:DI
+	  (sign_extend:V2DI (match_operand:V2SI 1 "register_operand" "y"))
+	  (parallel [(const_int 0)])))]
   "TARGET_REALLY_IWMMXT"
   "wunpckelsw%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wunpckel")]
+)
 
 ;; Shifts
 
-(define_insn "rorv4hi3"
-  [(set (match_operand:V4HI                0 "register_operand" "=y")
-        (rotatert:V4HI (match_operand:V4HI 1 "register_operand" "y")
-		       (match_operand:SI   2 "register_operand" "z")))]
-  "TARGET_REALLY_IWMMXT"
-  "wrorhg%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "rorv2si3"
-  [(set (match_operand:V2SI                0 "register_operand" "=y")
-        (rotatert:V2SI (match_operand:V2SI 1 "register_operand" "y")
-		       (match_operand:SI   2 "register_operand" "z")))]
-  "TARGET_REALLY_IWMMXT"
-  "wrorwg%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "rordi3"
-  [(set (match_operand:DI              0 "register_operand" "=y")
-	(rotatert:DI (match_operand:DI 1 "register_operand" "y")
-		   (match_operand:SI   2 "register_operand" "z")))]
+(define_insn "ror<mode>3"
+  [(set (match_operand:VSHFT                 0 "register_operand" "=y,y")
+        (rotatert:VSHFT (match_operand:VSHFT 1 "register_operand" "y,y")
+		        (match_operand:SI    2 "imm_or_reg_operand" "z,i")))]
   "TARGET_REALLY_IWMMXT"
-  "wrordg%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "*
+  switch (which_alternative)
+    {
+    case 0:
+      return \"wror<MMX_char>g%?\\t%0, %1, %2\";
+    case 1:
+      return arm_output_iwmmxt_shift_immediate (\"wror<MMX_char>\", operands, true);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "predicable" "yes")
+   (set_attr "arch" "*, iwmmxt2")
+   (set_attr "wtype" "wror, wror")]
+)
 
 (define_insn "ashr<mode>3_iwmmxt"
-  [(set (match_operand:VSHFT                 0 "register_operand" "=y")
-        (ashiftrt:VSHFT (match_operand:VSHFT 1 "register_operand" "y")
-			(match_operand:SI    2 "register_operand" "z")))]
+  [(set (match_operand:VSHFT                 0 "register_operand" "=y,y")
+        (ashiftrt:VSHFT (match_operand:VSHFT 1 "register_operand" "y,y")
+			(match_operand:SI    2 "imm_or_reg_operand" "z,i")))]
   "TARGET_REALLY_IWMMXT"
-  "wsra<MMX_char>g%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "*
+  switch (which_alternative)
+    {
+    case 0:
+      return \"wsra<MMX_char>g%?\\t%0, %1, %2\";
+    case 1:
+      return arm_output_iwmmxt_shift_immediate (\"wsra<MMX_char>\", operands, true);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "predicable" "yes")
+   (set_attr "arch" "*, iwmmxt2")
+   (set_attr "wtype" "wsra, wsra")]
+)
 
 (define_insn "lshr<mode>3_iwmmxt"
-  [(set (match_operand:VSHFT                 0 "register_operand" "=y")
-        (lshiftrt:VSHFT (match_operand:VSHFT 1 "register_operand" "y")
-			(match_operand:SI    2 "register_operand" "z")))]
+  [(set (match_operand:VSHFT                 0 "register_operand" "=y,y")
+        (lshiftrt:VSHFT (match_operand:VSHFT 1 "register_operand" "y,y")
+			(match_operand:SI    2 "imm_or_reg_operand" "z,i")))]
   "TARGET_REALLY_IWMMXT"
-  "wsrl<MMX_char>g%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "*
+  switch (which_alternative)
+    {
+    case 0:
+      return \"wsrl<MMX_char>g%?\\t%0, %1, %2\";
+    case 1:
+      return arm_output_iwmmxt_shift_immediate (\"wsrl<MMX_char>\", operands, false);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "predicable" "yes")
+   (set_attr "arch" "*, iwmmxt2")
+   (set_attr "wtype" "wsrl, wsrl")]
+)
 
 (define_insn "ashl<mode>3_iwmmxt"
-  [(set (match_operand:VSHFT               0 "register_operand" "=y")
-        (ashift:VSHFT (match_operand:VSHFT 1 "register_operand" "y")
-		      (match_operand:SI    2 "register_operand" "z")))]
-  "TARGET_REALLY_IWMMXT"
-  "wsll<MMX_char>g%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "rorv4hi3_di"
-  [(set (match_operand:V4HI                0 "register_operand" "=y")
-        (rotatert:V4HI (match_operand:V4HI 1 "register_operand" "y")
-		       (match_operand:DI   2 "register_operand" "y")))]
-  "TARGET_REALLY_IWMMXT"
-  "wrorh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "rorv2si3_di"
-  [(set (match_operand:V2SI                0 "register_operand" "=y")
-        (rotatert:V2SI (match_operand:V2SI 1 "register_operand" "y")
-		       (match_operand:DI   2 "register_operand" "y")))]
+  [(set (match_operand:VSHFT               0 "register_operand" "=y,y")
+        (ashift:VSHFT (match_operand:VSHFT 1 "register_operand" "y,y")
+		      (match_operand:SI    2 "imm_or_reg_operand" "z,i")))]
   "TARGET_REALLY_IWMMXT"
-  "wrorw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "rordi3_di"
-  [(set (match_operand:DI              0 "register_operand" "=y")
-	(rotatert:DI (match_operand:DI 1 "register_operand" "y")
-		   (match_operand:DI   2 "register_operand" "y")))]
-  "TARGET_REALLY_IWMMXT"
-  "wrord%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "ashrv4hi3_di"
-  [(set (match_operand:V4HI                0 "register_operand" "=y")
-        (ashiftrt:V4HI (match_operand:V4HI 1 "register_operand" "y")
-		       (match_operand:DI   2 "register_operand" "y")))]
-  "TARGET_REALLY_IWMMXT"
-  "wsrah%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "ashrv2si3_di"
-  [(set (match_operand:V2SI                0 "register_operand" "=y")
-        (ashiftrt:V2SI (match_operand:V2SI 1 "register_operand" "y")
-		       (match_operand:DI   2 "register_operand" "y")))]
-  "TARGET_REALLY_IWMMXT"
-  "wsraw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "ashrdi3_di"
-  [(set (match_operand:DI              0 "register_operand" "=y")
-	(ashiftrt:DI (match_operand:DI 1 "register_operand" "y")
-		   (match_operand:DI   2 "register_operand" "y")))]
-  "TARGET_REALLY_IWMMXT"
-  "wsrad%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "lshrv4hi3_di"
-  [(set (match_operand:V4HI                0 "register_operand" "=y")
-        (lshiftrt:V4HI (match_operand:V4HI 1 "register_operand" "y")
-		       (match_operand:DI   2 "register_operand" "y")))]
-  "TARGET_REALLY_IWMMXT"
-  "wsrlh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "lshrv2si3_di"
-  [(set (match_operand:V2SI                0 "register_operand" "=y")
-        (lshiftrt:V2SI (match_operand:V2SI 1 "register_operand" "y")
-		       (match_operand:DI   2 "register_operand" "y")))]
-  "TARGET_REALLY_IWMMXT"
-  "wsrlw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "*
+  switch (which_alternative)
+    {
+    case 0:
+      return \"wsll<MMX_char>g%?\\t%0, %1, %2\";
+    case 1:
+      return arm_output_iwmmxt_shift_immediate (\"wsll<MMX_char>\", operands, false);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "predicable" "yes")
+   (set_attr "arch" "*, iwmmxt2")
+   (set_attr "wtype" "wsll, wsll")]
+)
 
-(define_insn "lshrdi3_di"
-  [(set (match_operand:DI              0 "register_operand" "=y")
-	(lshiftrt:DI (match_operand:DI 1 "register_operand" "y")
-		     (match_operand:DI 2 "register_operand" "y")))]
+(define_insn "ror<mode>3_di"
+  [(set (match_operand:VSHFT                 0 "register_operand" "=y,y")
+        (rotatert:VSHFT (match_operand:VSHFT 1 "register_operand" "y,y")
+		        (match_operand:DI    2 "imm_or_reg_operand" "y,i")))]
   "TARGET_REALLY_IWMMXT"
-  "wsrld%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "*
+  switch (which_alternative)
+    {
+    case 0:
+      return \"wror<MMX_char>%?\\t%0, %1, %2\";
+    case 1:
+      return arm_output_iwmmxt_shift_immediate (\"wror<MMX_char>\", operands, true);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "predicable" "yes")
+   (set_attr "arch" "*, iwmmxt2")
+   (set_attr "wtype" "wror, wror")]
+)
 
-(define_insn "ashlv4hi3_di"
-  [(set (match_operand:V4HI              0 "register_operand" "=y")
-        (ashift:V4HI (match_operand:V4HI 1 "register_operand" "y")
-		     (match_operand:DI   2 "register_operand" "y")))]
+(define_insn "ashr<mode>3_di"
+  [(set (match_operand:VSHFT                 0 "register_operand" "=y,y")
+        (ashiftrt:VSHFT (match_operand:VSHFT 1 "register_operand" "y,y")
+		        (match_operand:DI    2 "imm_or_reg_operand" "y,i")))]
   "TARGET_REALLY_IWMMXT"
-  "wsllh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "*
+  switch (which_alternative)
+    {
+    case 0:
+      return \"wsra<MMX_char>%?\\t%0, %1, %2\";
+    case 1:
+      return arm_output_iwmmxt_shift_immediate (\"wsra<MMX_char>\", operands, true);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "predicable" "yes")
+   (set_attr "arch" "*, iwmmxt2")
+   (set_attr "wtype" "wsra, wsra")]
+)
 
-(define_insn "ashlv2si3_di"
-  [(set (match_operand:V2SI              0 "register_operand" "=y")
-        (ashift:V2SI (match_operand:V2SI 1 "register_operand" "y")
-		       (match_operand:DI 2 "register_operand" "y")))]
+(define_insn "lshr<mode>3_di"
+  [(set (match_operand:VSHFT                 0 "register_operand" "=y,y")
+        (lshiftrt:VSHFT (match_operand:VSHFT 1 "register_operand" "y,y")
+		        (match_operand:DI    2 "imm_or_reg_operand" "y,i")))]
   "TARGET_REALLY_IWMMXT"
-  "wsllw%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "*
+  switch (which_alternative)
+    {
+    case 0:
+      return \"wsrl<MMX_char>%?\\t%0, %1, %2\";
+    case 1:
+      return arm_output_iwmmxt_shift_immediate (\"wsrl<MMX_char>\", operands, false);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "predicable" "yes")
+   (set_attr "arch" "*, iwmmxt2")
+   (set_attr "wtype" "wsrl, wsrl")]
+)
 
-(define_insn "ashldi3_di"
-  [(set (match_operand:DI            0 "register_operand" "=y")
-	(ashift:DI (match_operand:DI 1 "register_operand" "y")
-		   (match_operand:DI 2 "register_operand" "y")))]
+(define_insn "ashl<mode>3_di"
+  [(set (match_operand:VSHFT               0 "register_operand" "=y,y")
+        (ashift:VSHFT (match_operand:VSHFT 1 "register_operand" "y,y")
+		      (match_operand:DI    2 "imm_or_reg_operand" "y,i")))]
   "TARGET_REALLY_IWMMXT"
-  "wslld%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "*
+  switch (which_alternative)
+    {
+    case 0:
+      return \"wsll<MMX_char>%?\\t%0, %1, %2\";
+    case 1:
+      return arm_output_iwmmxt_shift_immediate (\"wsll<MMX_char>\", operands, false);
+    default:
+      gcc_unreachable ();
+    }
+  "
+  [(set_attr "predicable" "yes")
+   (set_attr "arch" "*, iwmmxt2")
+   (set_attr "wtype" "wsll, wsll")]
+)
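[Editor's note, not part of the patch: each of the shift patterns above applies one scalar count uniformly to every vector lane; a minimal C sketch of the halfword left-shift case, assuming the count is less than the lane width (the hypothetical helper `wsllh_ref` is for illustration only):]

```c
#include <stdint.h>

/* Illustrative reference model of the per-lane semantics the wsllh
   alternatives implement: a single scalar count shifts every 16-bit
   lane.  Counts >= 16 are not modelled here.  */
void wsllh_ref(const uint16_t a[4], unsigned count, uint16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(a[i] << count);
}
```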
 
 (define_insn "iwmmxt_wmadds"
-  [(set (match_operand:V4HI               0 "register_operand" "=y")
-        (unspec:V4HI [(match_operand:V4HI 1 "register_operand" "y")
-		      (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WMADDS))]
+  [(set (match_operand:V2SI                                        0 "register_operand" "=y")
+	(plus:V2SI
+	  (mult:V2SI
+	    (vec_select:V2SI (sign_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                     (parallel [(const_int 1) (const_int 3)]))
+	    (vec_select:V2SI (sign_extend:V4SI (match_operand:V4HI 2 "register_operand" "y"))
+	                     (parallel [(const_int 1) (const_int 3)])))
+	  (mult:V2SI
+	    (vec_select:V2SI (sign_extend:V4SI (match_dup 1))
+	                     (parallel [(const_int 0) (const_int 2)]))
+	    (vec_select:V2SI (sign_extend:V4SI (match_dup 2))
+	                     (parallel [(const_int 0) (const_int 2)])))))]
   "TARGET_REALLY_IWMMXT"
   "wmadds%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmadd")]
+)
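[Editor's note, not part of the patch: the rewritten wmadds pattern above now spells out the semantics in RTL instead of an unspec — each 32-bit result lane is the sum of products of the two corresponding sign-extended 16-bit lane pairs. A C reference model (the helper name `wmadds_ref` is illustrative only):]

```c
#include <stdint.h>

/* Reference model of the wmadds RTL: out[i] is the sum of products of
   the sign-extended halfword pairs (a[2i], b[2i]) and (a[2i+1], b[2i+1]).  */
void wmadds_ref(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (int32_t)a[2*i] * b[2*i] + (int32_t)a[2*i+1] * b[2*i+1];
}
```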
 
 (define_insn "iwmmxt_wmaddu"
-  [(set (match_operand:V4HI               0 "register_operand" "=y")
-        (unspec:V4HI [(match_operand:V4HI 1 "register_operand" "y")
-		      (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WMADDU))]
+  [(set (match_operand:V2SI               0 "register_operand" "=y")
+	(plus:V2SI
+	  (mult:V2SI
+	    (vec_select:V2SI (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                     (parallel [(const_int 1) (const_int 3)]))
+	    (vec_select:V2SI (zero_extend:V4SI (match_operand:V4HI 2 "register_operand" "y"))
+	                     (parallel [(const_int 1) (const_int 3)])))
+	  (mult:V2SI
+	    (vec_select:V2SI (zero_extend:V4SI (match_dup 1))
+	                     (parallel [(const_int 0) (const_int 2)]))
+	    (vec_select:V2SI (zero_extend:V4SI (match_dup 2))
+	                     (parallel [(const_int 0) (const_int 2)])))))]
   "TARGET_REALLY_IWMMXT"
   "wmaddu%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmadd")]
+)
 
 (define_insn "iwmmxt_tmia"
-  [(set (match_operand:DI                    0 "register_operand" "=y")
-	(plus:DI (match_operand:DI           1 "register_operand" "0")
+  [(set (match_operand:DI                     0 "register_operand" "=y")
+	(plus:DI (match_operand:DI            1 "register_operand" "0")
 		 (mult:DI (sign_extend:DI
-			   (match_operand:SI 2 "register_operand" "r"))
+			    (match_operand:SI 2 "register_operand" "r"))
 			  (sign_extend:DI
-			   (match_operand:SI 3 "register_operand" "r")))))]
+			    (match_operand:SI 3 "register_operand" "r")))))]
   "TARGET_REALLY_IWMMXT"
   "tmia%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmia")]
+)
 
 (define_insn "iwmmxt_tmiaph"
-  [(set (match_operand:DI          0 "register_operand" "=y")
-	(plus:DI (match_operand:DI 1 "register_operand" "0")
+  [(set (match_operand:DI                                    0 "register_operand" "=y")
+	(plus:DI (match_operand:DI                           1 "register_operand" "0")
 		 (plus:DI
-		  (mult:DI (sign_extend:DI
-			    (truncate:HI (match_operand:SI 2 "register_operand" "r")))
-			   (sign_extend:DI
-			    (truncate:HI (match_operand:SI 3 "register_operand" "r"))))
-		  (mult:DI (sign_extend:DI
-			    (truncate:HI (ashiftrt:SI (match_dup 2) (const_int 16))))
-			   (sign_extend:DI
-			    (truncate:HI (ashiftrt:SI (match_dup 3) (const_int 16))))))))]
+		   (mult:DI (sign_extend:DI
+			      (truncate:HI (match_operand:SI 2 "register_operand" "r")))
+			    (sign_extend:DI
+			      (truncate:HI (match_operand:SI 3 "register_operand" "r"))))
+		   (mult:DI (sign_extend:DI
+			      (truncate:HI (ashiftrt:SI (match_dup 2) (const_int 16))))
+			    (sign_extend:DI
+			      (truncate:HI (ashiftrt:SI (match_dup 3) (const_int 16))))))))]
   "TARGET_REALLY_IWMMXT"
   "tmiaph%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmiaph")]
+)
 
 (define_insn "iwmmxt_tmiabb"
-  [(set (match_operand:DI          0 "register_operand" "=y")
-	(plus:DI (match_operand:DI 1 "register_operand" "0")
+  [(set (match_operand:DI                                  0 "register_operand" "=y")
+	(plus:DI (match_operand:DI                         1 "register_operand" "0")
 		 (mult:DI (sign_extend:DI
-			   (truncate:HI (match_operand:SI 2 "register_operand" "r")))
+			    (truncate:HI (match_operand:SI 2 "register_operand" "r")))
 			  (sign_extend:DI
-			   (truncate:HI (match_operand:SI 3 "register_operand" "r"))))))]
+			    (truncate:HI (match_operand:SI 3 "register_operand" "r"))))))]
   "TARGET_REALLY_IWMMXT"
   "tmiabb%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmiaxy")]
+)
 
 (define_insn "iwmmxt_tmiatb"
-  [(set (match_operand:DI          0 "register_operand" "=y")
-	(plus:DI (match_operand:DI 1 "register_operand" "0")
+  [(set (match_operand:DI                         0 "register_operand" "=y")
+	(plus:DI (match_operand:DI                1 "register_operand" "0")
 		 (mult:DI (sign_extend:DI
-			   (truncate:HI (ashiftrt:SI
-					 (match_operand:SI 2 "register_operand" "r")
-					 (const_int 16))))
+			    (truncate:HI
+			      (ashiftrt:SI
+				(match_operand:SI 2 "register_operand" "r")
+				(const_int 16))))
 			  (sign_extend:DI
-			   (truncate:HI (match_operand:SI 3 "register_operand" "r"))))))]
+			    (truncate:HI
+			      (match_operand:SI   3 "register_operand" "r"))))))]
   "TARGET_REALLY_IWMMXT"
   "tmiatb%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmiaxy")]
+)
 
 (define_insn "iwmmxt_tmiabt"
-  [(set (match_operand:DI          0 "register_operand" "=y")
-	(plus:DI (match_operand:DI 1 "register_operand" "0")
+  [(set (match_operand:DI                         0 "register_operand" "=y")
+	(plus:DI (match_operand:DI                1 "register_operand" "0")
 		 (mult:DI (sign_extend:DI
-			   (truncate:HI (match_operand:SI 2 "register_operand" "r")))
+			    (truncate:HI
+			      (match_operand:SI   2 "register_operand" "r")))
 			  (sign_extend:DI
-			   (truncate:HI (ashiftrt:SI
-					 (match_operand:SI 3 "register_operand" "r")
-					 (const_int 16)))))))]
+			    (truncate:HI
+			      (ashiftrt:SI
+				(match_operand:SI 3 "register_operand" "r")
+				(const_int 16)))))))]
   "TARGET_REALLY_IWMMXT"
   "tmiabt%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmiaxy")]
+)
 
 (define_insn "iwmmxt_tmiatt"
   [(set (match_operand:DI          0 "register_operand" "=y")
 	(plus:DI (match_operand:DI 1 "register_operand" "0")
 		 (mult:DI (sign_extend:DI
-			   (truncate:HI (ashiftrt:SI
-					 (match_operand:SI 2 "register_operand" "r")
-					 (const_int 16))))
+			    (truncate:HI
+			      (ashiftrt:SI
+				(match_operand:SI 2 "register_operand" "r")
+				(const_int 16))))
 			  (sign_extend:DI
-			   (truncate:HI (ashiftrt:SI
-					 (match_operand:SI 3 "register_operand" "r")
-					 (const_int 16)))))))]
+			    (truncate:HI
+			      (ashiftrt:SI
+				(match_operand:SI 3 "register_operand" "r")
+				(const_int 16)))))))]
   "TARGET_REALLY_IWMMXT"
   "tmiatt%?\\t%0, %2, %3"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "iwmmxt_tbcstqi"
-  [(set (match_operand:V8QI                   0 "register_operand" "=y")
-	(vec_duplicate:V8QI (match_operand:QI 1 "register_operand" "r")))]
-  "TARGET_REALLY_IWMMXT"
-  "tbcstb%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "iwmmxt_tbcsthi"
-  [(set (match_operand:V4HI                   0 "register_operand" "=y")
-	(vec_duplicate:V4HI (match_operand:HI 1 "register_operand" "r")))]
-  "TARGET_REALLY_IWMMXT"
-  "tbcsth%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
-
-(define_insn "iwmmxt_tbcstsi"
-  [(set (match_operand:V2SI                   0 "register_operand" "=y")
-	(vec_duplicate:V2SI (match_operand:SI 1 "register_operand" "r")))]
-  "TARGET_REALLY_IWMMXT"
-  "tbcstw%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmiaxy")]
+)
 
 (define_insn "iwmmxt_tmovmskb"
   [(set (match_operand:SI               0 "register_operand" "=r")
 	(unspec:SI [(match_operand:V8QI 1 "register_operand" "y")] UNSPEC_TMOVMSK))]
   "TARGET_REALLY_IWMMXT"
   "tmovmskb%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmovmsk")]
+)
 
 (define_insn "iwmmxt_tmovmskh"
   [(set (match_operand:SI               0 "register_operand" "=r")
 	(unspec:SI [(match_operand:V4HI 1 "register_operand" "y")] UNSPEC_TMOVMSK))]
   "TARGET_REALLY_IWMMXT"
   "tmovmskh%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmovmsk")]
+)
 
 (define_insn "iwmmxt_tmovmskw"
   [(set (match_operand:SI               0 "register_operand" "=r")
 	(unspec:SI [(match_operand:V2SI 1 "register_operand" "y")] UNSPEC_TMOVMSK))]
   "TARGET_REALLY_IWMMXT"
   "tmovmskw%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tmovmsk")]
+)
 
 (define_insn "iwmmxt_waccb"
   [(set (match_operand:DI               0 "register_operand" "=y")
 	(unspec:DI [(match_operand:V8QI 1 "register_operand" "y")] UNSPEC_WACC))]
   "TARGET_REALLY_IWMMXT"
   "waccb%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wacc")]
+)
 
 (define_insn "iwmmxt_wacch"
   [(set (match_operand:DI               0 "register_operand" "=y")
 	(unspec:DI [(match_operand:V4HI 1 "register_operand" "y")] UNSPEC_WACC))]
   "TARGET_REALLY_IWMMXT"
   "wacch%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wacc")]
+)
 
 (define_insn "iwmmxt_waccw"
   [(set (match_operand:DI               0 "register_operand" "=y")
 	(unspec:DI [(match_operand:V2SI 1 "register_operand" "y")] UNSPEC_WACC))]
   "TARGET_REALLY_IWMMXT"
   "waccw%?\\t%0, %1"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wacc")]
+)
 
-(define_insn "iwmmxt_walign"
-  [(set (match_operand:V8QI                           0 "register_operand" "=y,y")
+;; Use unspec here to prevent "8 * imm" from being optimized away by CSE.
+(define_insn "iwmmxt_waligni"
+  [(set (match_operand:V8QI                                0 "register_operand" "=y")
+	(unspec:V8QI [(subreg:V8QI
+		        (ashiftrt:TI
+		          (subreg:TI (vec_concat:V16QI
+				       (match_operand:V8QI 1 "register_operand" "y")
+				       (match_operand:V8QI 2 "register_operand" "y")) 0)
+		          (mult:SI
+		            (match_operand:SI              3 "immediate_operand" "i")
+		            (const_int 8))) 0)] UNSPEC_WALIGNI))]
+  "TARGET_REALLY_IWMMXT"
+  "waligni%?\\t%0, %1, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "waligni")]
+)
+
+(define_insn "iwmmxt_walignr"
+  [(set (match_operand:V8QI                           0 "register_operand" "=y")
 	(subreg:V8QI (ashiftrt:TI
-		      (subreg:TI (vec_concat:V16QI
-				  (match_operand:V8QI 1 "register_operand" "y,y")
-				  (match_operand:V8QI 2 "register_operand" "y,y")) 0)
-		      (mult:SI
-		       (match_operand:SI              3 "nonmemory_operand" "i,z")
-		       (const_int 8))) 0))]
-  "TARGET_REALLY_IWMMXT"
-  "@
-   waligni%?\\t%0, %1, %2, %3
-   walignr%U3%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+		       (subreg:TI (vec_concat:V16QI
+				    (match_operand:V8QI 1 "register_operand" "y")
+				    (match_operand:V8QI 2 "register_operand" "y")) 0)
+		       (mult:SI
+		         (zero_extract:SI (match_operand:SI 3 "register_operand" "z") (const_int 3) (const_int 0))
+		         (const_int 8))) 0))]
+  "TARGET_REALLY_IWMMXT"
+  "walignr%U3%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "walignr")]
+)
 
-(define_insn "iwmmxt_tmrc"
-  [(set (match_operand:SI                      0 "register_operand" "=r")
-	(unspec_volatile:SI [(match_operand:SI 1 "immediate_operand" "i")]
-			    VUNSPEC_TMRC))]
-  "TARGET_REALLY_IWMMXT"
-  "tmrc%?\\t%0, %w1"
-  [(set_attr "predicable" "yes")])
+(define_insn "iwmmxt_walignr0"
+  [(set (match_operand:V8QI                           0 "register_operand" "=y")
+	(subreg:V8QI (ashiftrt:TI
+		       (subreg:TI (vec_concat:V16QI
+				    (match_operand:V8QI 1 "register_operand" "y")
+				    (match_operand:V8QI 2 "register_operand" "y")) 0)
+		       (mult:SI
+		         (zero_extract:SI (reg:SI WCGR0) (const_int 3) (const_int 0))
+		         (const_int 8))) 0))]
+  "TARGET_REALLY_IWMMXT"
+  "walignr0%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "walignr")]
+)
 
-(define_insn "iwmmxt_tmcr"
-  [(unspec_volatile:SI [(match_operand:SI 0 "immediate_operand" "i")
-			(match_operand:SI 1 "register_operand"  "r")]
-		       VUNSPEC_TMCR)]
-  "TARGET_REALLY_IWMMXT"
-  "tmcr%?\\t%w0, %1"
-  [(set_attr "predicable" "yes")])
+(define_insn "iwmmxt_walignr1"
+  [(set (match_operand:V8QI                           0 "register_operand" "=y")
+	(subreg:V8QI (ashiftrt:TI
+		       (subreg:TI (vec_concat:V16QI
+				    (match_operand:V8QI 1 "register_operand" "y")
+				    (match_operand:V8QI 2 "register_operand" "y")) 0)
+		       (mult:SI
+		         (zero_extract:SI (reg:SI WCGR1) (const_int 3) (const_int 0))
+		         (const_int 8))) 0))]
+  "TARGET_REALLY_IWMMXT"
+  "walignr1%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "walignr")]
+)
+
+(define_insn "iwmmxt_walignr2"
+  [(set (match_operand:V8QI                           0 "register_operand" "=y")
+	(subreg:V8QI (ashiftrt:TI
+		       (subreg:TI (vec_concat:V16QI
+				    (match_operand:V8QI 1 "register_operand" "y")
+				    (match_operand:V8QI 2 "register_operand" "y")) 0)
+		       (mult:SI
+		         (zero_extract:SI (reg:SI WCGR2) (const_int 3) (const_int 0))
+		         (const_int 8))) 0))]
+  "TARGET_REALLY_IWMMXT"
+  "walignr2%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "walignr")]
+)
+
+(define_insn "iwmmxt_walignr3"
+  [(set (match_operand:V8QI                           0 "register_operand" "=y")
+	(subreg:V8QI (ashiftrt:TI
+		       (subreg:TI (vec_concat:V16QI
+				    (match_operand:V8QI 1 "register_operand" "y")
+				    (match_operand:V8QI 2 "register_operand" "y")) 0)
+		       (mult:SI
+		         (zero_extract:SI (reg:SI WCGR3) (const_int 3) (const_int 0))
+		         (const_int 8))) 0))]
+  "TARGET_REALLY_IWMMXT"
+  "walignr3%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "walignr")]
+)
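[Editor's note, not part of the patch: all of the align patterns above select an 8-byte window from the 16-byte concatenation of the two source registers, at a byte offset given by an immediate (waligni) or by the bottom three bits of a wCGRn register (walignr0-3). A C sketch of that extraction (the helper name `waligni_ref` is illustrative only):]

```c
#include <stdint.h>

/* Reference model of the waligni/walignr RTL: concatenate the two
   8-byte sources, shift right by 8*n bits, keep the low 8 bytes.  */
void waligni_ref(const uint8_t a[8], const uint8_t b[8], int n, uint8_t out[8])
{
    uint8_t cat[16];
    for (int i = 0; i < 8; i++) {
        cat[i]     = a[i];      /* low half of the concatenation  */
        cat[8 + i] = b[i];      /* high half                      */
    }
    for (int i = 0; i < 8; i++)
        out[i] = cat[i + (n & 7)];   /* only the bottom 3 bits of n count */
}
```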
 
 (define_insn "iwmmxt_wsadb"
-  [(set (match_operand:V8QI               0 "register_operand" "=y")
-        (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "y")
-		      (match_operand:V8QI 2 "register_operand" "y")] UNSPEC_WSAD))]
+  [(set (match_operand:V2SI               0 "register_operand" "=y")
+        (unspec:V2SI [
+		      (match_operand:V2SI 1 "register_operand" "0")
+		      (match_operand:V8QI 2 "register_operand" "y")
+		      (match_operand:V8QI 3 "register_operand" "y")] UNSPEC_WSAD))]
   "TARGET_REALLY_IWMMXT"
-  "wsadb%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "wsadb%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsad")]
+)
 
 (define_insn "iwmmxt_wsadh"
-  [(set (match_operand:V4HI               0 "register_operand" "=y")
-        (unspec:V4HI [(match_operand:V4HI 1 "register_operand" "y")
-		      (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WSAD))]
+  [(set (match_operand:V2SI               0 "register_operand" "=y")
+        (unspec:V2SI [
+		      (match_operand:V2SI 1 "register_operand" "0")
+		      (match_operand:V4HI 2 "register_operand" "y")
+		      (match_operand:V4HI 3 "register_operand" "y")] UNSPEC_WSAD))]
   "TARGET_REALLY_IWMMXT"
-  "wsadh%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  "wsadh%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsad")]
+)
 
 (define_insn "iwmmxt_wsadbz"
-  [(set (match_operand:V8QI               0 "register_operand" "=y")
-        (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "y")
+  [(set (match_operand:V2SI               0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V8QI 1 "register_operand" "y")
 		      (match_operand:V8QI 2 "register_operand" "y")] UNSPEC_WSADZ))]
   "TARGET_REALLY_IWMMXT"
   "wsadbz%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsad")]
+)
 
 (define_insn "iwmmxt_wsadhz"
-  [(set (match_operand:V4HI               0 "register_operand" "=y")
-        (unspec:V4HI [(match_operand:V4HI 1 "register_operand" "y")
+  [(set (match_operand:V2SI               0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V4HI 1 "register_operand" "y")
 		      (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WSADZ))]
   "TARGET_REALLY_IWMMXT"
   "wsadhz%?\\t%0, %1, %2"
-  [(set_attr "predicable" "yes")])
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsad")]
+)
 
+(include "iwmmxt2.md")
diff --git a/gcc/config/arm/iwmmxt2.md b/gcc/config/arm/iwmmxt2.md
new file mode 100644
index 0000000..78fcb7f
--- /dev/null
+++ b/gcc/config/arm/iwmmxt2.md
@@ -0,0 +1,918 @@
+;; Patterns for the Intel Wireless MMX technology architecture.
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;; Written by Marvell, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_c_enum "unspec" [
+  UNSPEC_WADDC		; Used by the intrinsic form of the iWMMXt WADDC instruction.
+  UNSPEC_WABS		; Used by the intrinsic form of the iWMMXt WABS instruction.
+  UNSPEC_WQMULWMR	; Used by the intrinsic form of the iWMMXt WQMULWMR instruction.
+  UNSPEC_WQMULMR	; Used by the intrinsic form of the iWMMXt WQMULMR instruction.
+  UNSPEC_WQMULWM	; Used by the intrinsic form of the iWMMXt WQMULWM instruction.
+  UNSPEC_WQMULM		; Used by the intrinsic form of the iWMMXt WQMULM instruction.
+  UNSPEC_WQMIAxyn	; Used by the intrinsic form of the iWMMXt WQMIAxyn instruction.
+  UNSPEC_WQMIAxy	; Used by the intrinsic form of the iWMMXt WQMIAxy instruction.
+  UNSPEC_TANDC		; Used by the intrinsic form of the iWMMXt TANDC instruction.
+  UNSPEC_TORC		; Used by the intrinsic form of the iWMMXt TORC instruction.
+  UNSPEC_TORVSC		; Used by the intrinsic form of the iWMMXt TORVSC instruction.
+  UNSPEC_TEXTRC		; Used by the intrinsic form of the iWMMXt TEXTRC instruction.
+])
+
+(define_insn "iwmmxt_wabs<mode>3"
+  [(set (match_operand:VMMX               0 "register_operand" "=y")
+        (unspec:VMMX [(match_operand:VMMX 1 "register_operand"  "y")] UNSPEC_WABS))]
+  "TARGET_REALLY_IWMMXT"
+  "wabs<MMX_char>%?\\t%0, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wabs")]
+)
+
+(define_insn "iwmmxt_wabsdiffb"
+  [(set (match_operand:V8QI                          0 "register_operand" "=y")
+	(truncate:V8QI
+	  (abs:V8HI
+	    (minus:V8HI
+	      (zero_extend:V8HI (match_operand:V8QI  1 "register_operand"  "y"))
+	      (zero_extend:V8HI (match_operand:V8QI  2 "register_operand"  "y"))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wabsdiffb%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wabsdiff")]
+)
+
+(define_insn "iwmmxt_wabsdiffh"
+  [(set (match_operand:V4HI                          0 "register_operand" "=y")
+        (truncate:V4HI
+          (abs:V4SI
+            (minus:V4SI
+              (zero_extend:V4SI (match_operand:V4HI  1 "register_operand"  "y"))
+	      (zero_extend:V4SI (match_operand:V4HI  2 "register_operand"  "y"))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wabsdiffh%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wabsdiff")]
+)
+
+(define_insn "iwmmxt_wabsdiffw"
+  [(set (match_operand:V2SI                          0 "register_operand" "=y")
+        (truncate:V2SI
+	  (abs:V2DI
+	    (minus:V2DI
+	      (zero_extend:V2DI (match_operand:V2SI  1 "register_operand"  "y"))
+	      (zero_extend:V2DI (match_operand:V2SI  2 "register_operand"  "y"))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wabsdiffw%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wabsdiff")]
+)
+
+(define_insn "iwmmxt_waddsubhx"
+  [(set (match_operand:V4HI                                        0 "register_operand" "=y")
+	(vec_merge:V4HI
+	  (ss_minus:V4HI
+	    (match_operand:V4HI                                    1 "register_operand" "y")
+	    (vec_select:V4HI (match_operand:V4HI 2 "register_operand" "y")
+	                     (parallel [(const_int 1) (const_int 0) (const_int 3) (const_int 2)])))
+	  (ss_plus:V4HI
+	    (match_dup 1)
+	    (vec_select:V4HI (match_dup 2)
+	                     (parallel [(const_int 1) (const_int 0) (const_int 3) (const_int 2)])))
+	  (const_int 10)))]
+  "TARGET_REALLY_IWMMXT"
+  "waddsubhx%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "waddsubhx")]
+)
+
+(define_insn "iwmmxt_wsubaddhx"
+  [(set (match_operand:V4HI                                        0 "register_operand" "=y")
+	(vec_merge:V4HI
+	  (ss_plus:V4HI
+	    (match_operand:V4HI                                    1 "register_operand" "y")
+	    (vec_select:V4HI (match_operand:V4HI 2 "register_operand" "y")
+	                     (parallel [(const_int 1) (const_int 0) (const_int 3) (const_int 2)])))
+	  (ss_minus:V4HI
+	    (match_dup 1)
+	    (vec_select:V4HI (match_dup 2)
+	                     (parallel [(const_int 1) (const_int 0) (const_int 3) (const_int 2)])))
+	  (const_int 10)))]
+  "TARGET_REALLY_IWMMXT"
+  "wsubaddhx%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wsubaddhx")]
+)
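[Editor's note, not part of the patch: in the two patterns above, the vec_merge mask 10 (binary 1010) takes the odd lanes from the first arm and the even lanes from the second, with op2's halfword pairs exchanged by the vec_select. For waddsubhx that gives saturating adds in the even lanes and saturating subtracts in the odd ones; wsubaddhx swaps the two operations. A C sketch of waddsubhx (helper names illustrative only):]

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range.  */
static int16_t ssat16(int32_t v)
{
    return v > 32767 ? 32767 : v < -32768 ? -32768 : (int16_t)v;
}

/* Reference model of the waddsubhx RTL: even lanes do a saturating add
   with the exchanged neighbour of b, odd lanes a saturating subtract.  */
void waddsubhx_ref(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    out[0] = ssat16((int32_t)a[0] + b[1]);
    out[1] = ssat16((int32_t)a[1] - b[0]);
    out[2] = ssat16((int32_t)a[2] + b[3]);
    out[3] = ssat16((int32_t)a[3] - b[2]);
}
```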
+
+(define_insn "addc<mode>3"
+  [(set (match_operand:VMMX2      0 "register_operand" "=y")
+	(unspec:VMMX2
+          [(plus:VMMX2
+             (match_operand:VMMX2 1 "register_operand"  "y")
+	     (match_operand:VMMX2 2 "register_operand"  "y"))] UNSPEC_WADDC))]
+  "TARGET_REALLY_IWMMXT"
+  "wadd<MMX_char>c%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wadd")]
+)
+
+(define_insn "iwmmxt_avg4"
+[(set (match_operand:V8QI                                 0 "register_operand" "=y")
+      (truncate:V8QI
+        (vec_select:V8HI
+	  (vec_merge:V8HI
+	    (lshiftrt:V8HI
+	      (plus:V8HI
+	        (plus:V8HI
+		  (plus:V8HI
+	            (plus:V8HI
+		      (zero_extend:V8HI (match_operand:V8QI 1 "register_operand" "y"))
+		      (zero_extend:V8HI (match_operand:V8QI 2 "register_operand" "y")))
+		    (vec_select:V8HI (zero_extend:V8HI (match_dup 1))
+		                     (parallel [(const_int 7) (const_int 0) (const_int 1) (const_int 2)
+				                (const_int 3) (const_int 4) (const_int 5) (const_int 6)])))
+		  (vec_select:V8HI (zero_extend:V8HI (match_dup 2))
+		                   (parallel [(const_int 7) (const_int 0) (const_int 1) (const_int 2)
+				              (const_int 3) (const_int 4) (const_int 5) (const_int 6)])))
+	        (const_vector:V8HI [(const_int 1) (const_int 1) (const_int 1) (const_int 1)
+	                            (const_int 1) (const_int 1) (const_int 1) (const_int 1)]))
+	      (const_int 2))
+	    (const_vector:V8HI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)
+	                        (const_int 0) (const_int 0) (const_int 0) (const_int 0)])
+	    (const_int 254))
+	  (parallel [(const_int 1) (const_int 2) (const_int 3) (const_int 4)
+	             (const_int 5) (const_int 6) (const_int 7) (const_int 0)]))))]
+  "TARGET_REALLY_IWMMXT"
+  "wavg4%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wavg4")]
+)
+
+(define_insn "iwmmxt_avg4r"
+  [(set (match_operand:V8QI                                   0 "register_operand" "=y")
+	(truncate:V8QI
+	  (vec_select:V8HI
+	    (vec_merge:V8HI
+	      (lshiftrt:V8HI
+	        (plus:V8HI
+		  (plus:V8HI
+		    (plus:V8HI
+		      (plus:V8HI
+		        (zero_extend:V8HI (match_operand:V8QI 1 "register_operand" "y"))
+		        (zero_extend:V8HI (match_operand:V8QI 2 "register_operand" "y")))
+		      (vec_select:V8HI (zero_extend:V8HI (match_dup 1))
+		                       (parallel [(const_int 7) (const_int 0) (const_int 1) (const_int 2)
+				                  (const_int 3) (const_int 4) (const_int 5) (const_int 6)])))
+		    (vec_select:V8HI (zero_extend:V8HI (match_dup 2))
+		                     (parallel [(const_int 7) (const_int 0) (const_int 1) (const_int 2)
+				                (const_int 3) (const_int 4) (const_int 5) (const_int 6)])))
+		  (const_vector:V8HI [(const_int 2) (const_int 2) (const_int 2) (const_int 2)
+		                      (const_int 2) (const_int 2) (const_int 2) (const_int 2)]))
+	        (const_int 2))
+	      (const_vector:V8HI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)
+	                          (const_int 0) (const_int 0) (const_int 0) (const_int 0)])
+	      (const_int 254))
+	    (parallel [(const_int 1) (const_int 2) (const_int 3) (const_int 4)
+	               (const_int 5) (const_int 6) (const_int 7) (const_int 0)]))))]
+  "TARGET_REALLY_IWMMXT"
+  "wavg4r%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wavg4")]
+)
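[Editor's note, not part of the patch: unwinding the RTL of the two patterns above, each result byte i (for i < 7) is the 4-point average (a[i] + a[i+1] + b[i] + b[i+1] + bias) >> 2, with bias 1 for wavg4 and 2 for the rounded wavg4r, and the top lane zeroed by the vec_merge with mask 254. A C sketch of that reading (helper name illustrative only):]

```c
#include <stdint.h>

/* Reference model of the wavg4/wavg4r RTL: 4-point byte average of
   adjacent lane pairs; `round` selects the wavg4r bias of 2.  */
void wavg4_ref(const uint8_t a[8], const uint8_t b[8], int round, uint8_t out[8])
{
    int bias = round ? 2 : 1;
    for (int i = 0; i < 7; i++)
        out[i] = (uint8_t)((a[i] + a[i + 1] + b[i] + b[i + 1] + bias) >> 2);
    out[7] = 0;   /* top lane comes from the zero vector (mask 254) */
}
```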
+
+(define_insn "iwmmxt_wmaddsx"
+  [(set (match_operand:V2SI                                        0 "register_operand" "=y")
+	(plus:V2SI
+	  (mult:V2SI
+	    (vec_select:V2SI (sign_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                     (parallel [(const_int 1) (const_int 3)]))
+	    (vec_select:V2SI (sign_extend:V4SI (match_operand:V4HI 2 "register_operand" "y"))
+	                     (parallel [(const_int 0) (const_int 2)])))
+	  (mult:V2SI
+	    (vec_select:V2SI (sign_extend:V4SI (match_dup 1))
+	                     (parallel [(const_int 0) (const_int 2)]))
+	    (vec_select:V2SI (sign_extend:V4SI (match_dup 2))
+	                     (parallel [(const_int 1) (const_int 3)])))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmaddsx%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmadd")]
+)
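For review purposes, the cross-lane multiply-add that wmaddsx encodes — odd halfword of one operand times the even halfword of the other, summed per 32-bit lane — can be sketched in plain C. This is my illustration of the RTL, not patch code:

```c
#include <stdint.h>
#include <assert.h>

/* Scalar model of wmaddsx: for each 32-bit result lane i,
   out[i] = a[2i+1] * b[2i] + a[2i] * b[2i+1], with signed 16-bit inputs
   widened before multiplying, matching the sign_extend:V4SI in the RTL.  */
static void
wmaddsx_model (const int16_t a[4], const int16_t b[4], int32_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (int32_t) a[2 * i + 1] * b[2 * i]       /* odd of a, even of b */
             + (int32_t) a[2 * i] * b[2 * i + 1];    /* even of a, odd of b */
}
```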
+
+(define_insn "iwmmxt_wmaddux"
+  [(set (match_operand:V2SI                                        0 "register_operand" "=y")
+	(plus:V2SI
+	  (mult:V2SI
+	    (vec_select:V2SI (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                     (parallel [(const_int 1) (const_int 3)]))
+	    (vec_select:V2SI (zero_extend:V4SI (match_operand:V4HI 2 "register_operand" "y"))
+	                     (parallel [(const_int 0) (const_int 2)])))
+	  (mult:V2SI
+	    (vec_select:V2SI (zero_extend:V4SI (match_dup 1))
+	                     (parallel [(const_int 0) (const_int 2)]))
+	    (vec_select:V2SI (zero_extend:V4SI (match_dup 2))
+	                     (parallel [(const_int 1) (const_int 3)])))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmaddux%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmadd")]
+)
+
+(define_insn "iwmmxt_wmaddsn"
+  [(set (match_operand:V2SI                                        0 "register_operand" "=y")
+	(minus:V2SI
+	  (mult:V2SI
+	    (vec_select:V2SI (sign_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                     (parallel [(const_int 0) (const_int 2)]))
+	    (vec_select:V2SI (sign_extend:V4SI (match_operand:V4HI 2 "register_operand" "y"))
+	                     (parallel [(const_int 0) (const_int 2)])))
+	  (mult:V2SI
+	    (vec_select:V2SI (sign_extend:V4SI (match_dup 1))
+	                     (parallel [(const_int 1) (const_int 3)]))
+	    (vec_select:V2SI (sign_extend:V4SI (match_dup 2))
+	                     (parallel [(const_int 1) (const_int 3)])))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmaddsn%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmadd")]
+)
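The "n" (negate) variant subtracts the odd-lane product from the even-lane product per 32-bit lane. A scalar C model of the wmaddsn RTL, offered as a reviewer sketch rather than patch code:

```c
#include <stdint.h>
#include <assert.h>

/* Scalar model of wmaddsn: for each 32-bit result lane i,
   out[i] = a[2i] * b[2i] - a[2i+1] * b[2i+1], signed 16-bit inputs widened
   first, matching the minus:V2SI of two mult:V2SI in the pattern.  */
static void
wmaddsn_model (const int16_t a[4], const int16_t b[4], int32_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (int32_t) a[2 * i] * b[2 * i]
             - (int32_t) a[2 * i + 1] * b[2 * i + 1];
}
```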
+
+(define_insn "iwmmxt_wmaddun"
+  [(set (match_operand:V2SI                                        0 "register_operand" "=y")
+	(minus:V2SI
+	  (mult:V2SI
+	    (vec_select:V2SI (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+	                     (parallel [(const_int 0) (const_int 2)]))
+	    (vec_select:V2SI (zero_extend:V4SI (match_operand:V4HI 2 "register_operand" "y"))
+	                     (parallel [(const_int 0) (const_int 2)])))
+	  (mult:V2SI
+	    (vec_select:V2SI (zero_extend:V4SI (match_dup 1))
+	                     (parallel [(const_int 1) (const_int 3)]))
+	    (vec_select:V2SI (zero_extend:V4SI (match_dup 2))
+	                     (parallel [(const_int 1) (const_int 3)])))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmaddun%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmadd")]
+)
+
+(define_insn "iwmmxt_wmulwsm"
+  [(set (match_operand:V2SI                         0 "register_operand" "=y")
+	(truncate:V2SI
+	  (ashiftrt:V2DI
+	    (mult:V2DI
+	      (sign_extend:V2DI (match_operand:V2SI 1 "register_operand" "y"))
+	      (sign_extend:V2DI (match_operand:V2SI 2 "register_operand" "y")))
+	    (const_int 32))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmulwsm%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmulw")]
+)
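A quick way to read this pattern: wmulwsm keeps the high 32 bits of each signed 32x32 lane product (widen, multiply, arithmetic shift right by 32, truncate). One lane in plain C — my sketch for sanity-checking, assuming arithmetic right shift of signed values, which GCC provides on its supported targets:

```c
#include <stdint.h>
#include <assert.h>

/* Scalar model of one wmulwsm lane: the high half of the signed
   64-bit product, as encoded by ashiftrt:V2DI ... (const_int 32).  */
static int32_t
wmulwsm_lane (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * b) >> 32);  /* arithmetic shift keeps sign */
}
```

The wmulwum twin is identical except for zero extension and a logical shift.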
+
+(define_insn "iwmmxt_wmulwum"
+  [(set (match_operand:V2SI                         0 "register_operand" "=y")
+	(truncate:V2SI
+          (lshiftrt:V2DI
+	    (mult:V2DI
+	      (zero_extend:V2DI (match_operand:V2SI 1 "register_operand" "y"))
+	      (zero_extend:V2DI (match_operand:V2SI 2 "register_operand" "y")))
+	    (const_int 32))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmulwum%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmulw")]
+)
+
+(define_insn "iwmmxt_wmulsmr"
+  [(set (match_operand:V4HI                           0 "register_operand" "=y")
+	(truncate:V4HI
+	  (ashiftrt:V4SI
+	    (plus:V4SI
+	      (mult:V4SI
+	        (sign_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+		(sign_extend:V4SI (match_operand:V4HI 2 "register_operand" "y")))
+	      (const_vector:V4SI [(const_int 32768)
+				  (const_int 32768)
+				  (const_int 32768)
+				  (const_int 32768)]))
+	    (const_int 16))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmulsmr%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmul")]
+)
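The const_vector of 32768 here is the rounding bias that distinguishes the "r" variants: add 0x8000 before taking the high halfword. One lane of wmulsmr in plain C — an illustrative sketch, not part of the patch:

```c
#include <stdint.h>
#include <assert.h>

/* Scalar model of one wmulsmr lane: signed 16x16 multiply, add the 0x8000
   rounding bias from the const_vector, keep the high 16 bits.  */
static int16_t
wmulsmr_lane (int16_t a, int16_t b)
{
  return (int16_t) (((int32_t) a * b + 0x8000) >> 16);
}
```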
+
+(define_insn "iwmmxt_wmulumr"
+  [(set (match_operand:V4HI                           0 "register_operand" "=y")
+	(truncate:V4HI
+	  (lshiftrt:V4SI
+	    (plus:V4SI
+	      (mult:V4SI
+	        (zero_extend:V4SI (match_operand:V4HI 1 "register_operand" "y"))
+		(zero_extend:V4SI (match_operand:V4HI 2 "register_operand" "y")))
+	      (const_vector:V4SI [(const_int 32768)
+				  (const_int 32768)
+				  (const_int 32768)
+				  (const_int 32768)]))
+	    (const_int 16))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmulumr%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmul")]
+)
+
+(define_insn "iwmmxt_wmulwsmr"
+  [(set (match_operand:V2SI                           0 "register_operand" "=y")
+	(truncate:V2SI
+	  (ashiftrt:V2DI
+	    (plus:V2DI
+	      (mult:V2DI
+	        (sign_extend:V2DI (match_operand:V2SI 1 "register_operand" "y"))
+		(sign_extend:V2DI (match_operand:V2SI 2 "register_operand" "y")))
+	      (const_vector:V2DI [(const_int 2147483648)
+				  (const_int 2147483648)]))
+	    (const_int 32))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmulwsmr%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmul")]
+)
+
+(define_insn "iwmmxt_wmulwumr"
+  [(set (match_operand:V2SI                           0 "register_operand" "=y")
+	(truncate:V2SI
+	  (lshiftrt:V2DI
+	    (plus:V2DI
+	      (mult:V2DI
+	        (zero_extend:V2DI (match_operand:V2SI 1 "register_operand" "y"))
+		(zero_extend:V2DI (match_operand:V2SI 2 "register_operand" "y")))
+	      (const_vector:V2DI [(const_int 2147483648)
+			          (const_int 2147483648)]))
+	    (const_int 32))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmulwumr%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmulw")]
+)
+
+(define_insn "iwmmxt_wmulwl"
+  [(set (match_operand:V2SI   0 "register_operand" "=y")
+        (mult:V2SI
+          (match_operand:V2SI 1 "register_operand" "y")
+	  (match_operand:V2SI 2 "register_operand" "y")))]
+  "TARGET_REALLY_IWMMXT"
+  "wmulwl%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmulw")]
+)
+
+(define_insn "iwmmxt_wqmulm"
+  [(set (match_operand:V4HI            0 "register_operand" "=y")
+        (unspec:V4HI [(match_operand:V4HI 1 "register_operand" "y")
+		      (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WQMULM))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmulm%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmulm")]
+)
+
+(define_insn "iwmmxt_wqmulwm"
+  [(set (match_operand:V2SI               0 "register_operand" "=y")
+	(unspec:V2SI [(match_operand:V2SI 1 "register_operand" "y")
+		      (match_operand:V2SI 2 "register_operand" "y")] UNSPEC_WQMULWM))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmulwm%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmulwm")]
+)
+
+(define_insn "iwmmxt_wqmulmr"
+  [(set (match_operand:V4HI               0 "register_operand" "=y")
+	(unspec:V4HI [(match_operand:V4HI 1 "register_operand" "y")
+		      (match_operand:V4HI 2 "register_operand" "y")] UNSPEC_WQMULMR))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmulmr%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmulm")]
+)
+
+(define_insn "iwmmxt_wqmulwmr"
+  [(set (match_operand:V2SI            0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V2SI 1 "register_operand" "y")
+		      (match_operand:V2SI 2 "register_operand" "y")] UNSPEC_WQMULWMR))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmulwmr%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmulwm")]
+)
+
+(define_insn "iwmmxt_waddbhusm"
+  [(set (match_operand:V8QI                          0 "register_operand" "=y")
+	(vec_concat:V8QI
+	  (const_vector:V4QI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)])
+	  (us_truncate:V4QI
+	    (ss_plus:V4HI
+	      (match_operand:V4HI                    1 "register_operand" "y")
+	      (zero_extend:V4HI
+	        (vec_select:V4QI (match_operand:V8QI 2 "register_operand" "y")
+	                         (parallel [(const_int 4) (const_int 5) (const_int 6) (const_int 7)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "waddbhusm%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "waddbhus")]
+)
+
+(define_insn "iwmmxt_waddbhusl"
+  [(set (match_operand:V8QI                          0 "register_operand" "=y")
+	(vec_concat:V8QI
+	  (us_truncate:V4QI
+	    (ss_plus:V4HI
+	      (match_operand:V4HI                    1 "register_operand" "y")
+	      (zero_extend:V4HI
+		(vec_select:V4QI (match_operand:V8QI 2 "register_operand" "y")
+		                 (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])))))
+	  (const_vector:V4QI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)])))]
+  "TARGET_REALLY_IWMMXT"
+  "waddbhusl%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "waddbhus")]
+)
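Reading the waddbhusl pattern: a signed-saturating halfword add (ss_plus) followed by an unsigned-saturating truncate (us_truncate), with the unused half of the result cleared. Since the second addend is a zero-extended byte, this collapses to clamping the plain sum to [0, 255]. A scalar C model — my sketch, not patch code:

```c
#include <stdint.h>
#include <assert.h>

static uint8_t
sat_u8 (int v)
{
  if (v < 0)
    return 0;
  if (v > 255)
    return 255;
  return (uint8_t) v;
}

/* Scalar model of waddbhusl: low four result bytes are the unsigned-
   saturated sums of op1's signed halfwords with op2's low bytes; the high
   four bytes are zeroed.  waddbhusm mirrors this with op2's high bytes.  */
static void
waddbhusl_model (const int16_t a[4], const uint8_t b[8], uint8_t out[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = sat_u8 (a[i] + b[i]);
  for (int i = 4; i < 8; i++)
    out[i] = 0;
}
```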
+
+(define_insn "iwmmxt_wqmiabb"
+  [(set (match_operand:V2SI	                             0 "register_operand" "=y")
+	(unspec:V2SI [(match_operand:V2SI                    1 "register_operand" "0")
+		      (zero_extract:V4HI (match_operand:V4HI 2 "register_operand" "y") (const_int 16) (const_int 0))
+		      (zero_extract:V4HI (match_dup 2) (const_int 16) (const_int 32))
+		      (zero_extract:V4HI (match_operand:V4HI 3 "register_operand" "y") (const_int 16) (const_int 0))
+		      (zero_extract:V4HI (match_dup 3) (const_int 16) (const_int 32))] UNSPEC_WQMIAxy))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmiabb%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmiaxy")]
+)
+
+(define_insn "iwmmxt_wqmiabt"
+  [(set (match_operand:V2SI	                             0 "register_operand" "=y")
+	(unspec:V2SI [(match_operand:V2SI                    1 "register_operand" "0")
+	              (zero_extract:V4HI (match_operand:V4HI 2 "register_operand" "y") (const_int 16) (const_int 0))
+		      (zero_extract:V4HI (match_dup 2) (const_int 16) (const_int 32))
+		      (zero_extract:V4HI (match_operand:V4HI 3 "register_operand" "y") (const_int 16) (const_int 16))
+		      (zero_extract:V4HI (match_dup 3) (const_int 16) (const_int 48))] UNSPEC_WQMIAxy))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmiabt%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmiaxy")]
+)
+
+(define_insn "iwmmxt_wqmiatb"
+  [(set (match_operand:V2SI                                  0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V2SI                    1 "register_operand" "0")
+	              (zero_extract:V4HI (match_operand:V4HI 2 "register_operand" "y") (const_int 16) (const_int 16))
+	              (zero_extract:V4HI (match_dup 2) (const_int 16) (const_int 48))
+	              (zero_extract:V4HI (match_operand:V4HI 3 "register_operand" "y") (const_int 16) (const_int 0))
+	              (zero_extract:V4HI (match_dup 3) (const_int 16) (const_int 32))] UNSPEC_WQMIAxy))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmiatb%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmiaxy")]
+)
+
+(define_insn "iwmmxt_wqmiatt"
+  [(set (match_operand:V2SI                                  0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V2SI                    1 "register_operand" "0")
+	              (zero_extract:V4HI (match_operand:V4HI 2 "register_operand" "y") (const_int 16) (const_int 16))
+	              (zero_extract:V4HI (match_dup 2) (const_int 16) (const_int 48))
+	              (zero_extract:V4HI (match_operand:V4HI 3 "register_operand" "y") (const_int 16) (const_int 16))
+	              (zero_extract:V4HI (match_dup 3) (const_int 16) (const_int 48))] UNSPEC_WQMIAxy))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmiatt%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmiaxy")]
+)
+
+(define_insn "iwmmxt_wqmiabbn"
+  [(set (match_operand:V2SI                                  0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V2SI                    1 "register_operand" "0")
+                      (zero_extract:V4HI (match_operand:V4HI 2 "register_operand" "y") (const_int 16) (const_int 0))
+	              (zero_extract:V4HI (match_dup 2) (const_int 16) (const_int 32))
+	              (zero_extract:V4HI (match_operand:V4HI 3 "register_operand" "y") (const_int 16) (const_int 0))
+	              (zero_extract:V4HI (match_dup 3) (const_int 16) (const_int 32))] UNSPEC_WQMIAxyn))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmiabbn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmiaxy")]
+)
+
+(define_insn "iwmmxt_wqmiabtn"
+  [(set (match_operand:V2SI                                  0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V2SI                    1 "register_operand" "0")
+                      (zero_extract:V4HI (match_operand:V4HI 2 "register_operand" "y") (const_int 16) (const_int 0))
+	              (zero_extract:V4HI (match_dup 2) (const_int 16) (const_int 32))
+	              (zero_extract:V4HI (match_operand:V4HI 3 "register_operand" "y") (const_int 16) (const_int 16))
+	              (zero_extract:V4HI (match_dup 3) (const_int 16) (const_int 48))] UNSPEC_WQMIAxyn))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmiabtn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmiaxy")]
+)
+
+(define_insn "iwmmxt_wqmiatbn"
+  [(set (match_operand:V2SI                                  0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V2SI                    1 "register_operand" "0")
+                      (zero_extract:V4HI (match_operand:V4HI 2 "register_operand" "y") (const_int 16) (const_int 16))
+	              (zero_extract:V4HI (match_dup 2) (const_int 16) (const_int 48))
+	              (zero_extract:V4HI (match_operand:V4HI 3 "register_operand" "y") (const_int 16) (const_int 0))
+	              (zero_extract:V4HI (match_dup 3) (const_int 16) (const_int 32))] UNSPEC_WQMIAxyn))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmiatbn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmiaxy")]
+)
+
+(define_insn "iwmmxt_wqmiattn"
+  [(set (match_operand:V2SI                                  0 "register_operand" "=y")
+        (unspec:V2SI [(match_operand:V2SI                    1 "register_operand" "0")
+                      (zero_extract:V4HI (match_operand:V4HI 2 "register_operand" "y") (const_int 16) (const_int 16))
+	              (zero_extract:V4HI (match_dup 2) (const_int 16) (const_int 48))
+	              (zero_extract:V4HI (match_operand:V4HI 3 "register_operand" "y") (const_int 16) (const_int 16))
+	              (zero_extract:V4HI (match_dup 3) (const_int 16) (const_int 48))] UNSPEC_WQMIAxyn))]
+  "TARGET_REALLY_IWMMXT"
+  "wqmiattn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wqmiaxy")]
+)
+
+(define_insn "iwmmxt_wmiabb"
+  [(set	(match_operand:DI	                          0 "register_operand" "=y")
+	(plus:DI (match_operand:DI	                  1 "register_operand" "0")
+		 (plus:DI
+		   (mult:DI
+		     (sign_extend:DI
+		       (vec_select:HI (match_operand:V4HI 2 "register_operand" "y")
+				      (parallel [(const_int 0)])))
+		     (sign_extend:DI
+		       (vec_select:HI (match_operand:V4HI 3 "register_operand" "y")
+				      (parallel [(const_int 0)]))))
+		   (mult:DI
+		     (sign_extend:DI
+		       (vec_select:HI (match_dup 2)
+			              (parallel [(const_int 2)])))
+		     (sign_extend:DI
+		       (vec_select:HI (match_dup 3)
+				      (parallel [(const_int 2)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiabb%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiaxy")]
+)
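For the wmia family the bb/bt/tb/tt suffix names which halfword of each 32-bit pair feeds the multiply-accumulate. A scalar model of wmiabb (the "bottom, bottom" case above), offered as a reviewer sketch rather than patch code:

```c
#include <stdint.h>
#include <assert.h>

/* Scalar model of wmiabb: accumulate into the 64-bit destination the
   products of the even (bottom) halfwords of each 32-bit pair, i.e.
   acc + a[0]*b[0] + a[2]*b[2], with signed widening as in the RTL.  */
static int64_t
wmiabb_model (int64_t acc, const int16_t a[4], const int16_t b[4])
{
  return acc + (int64_t) a[0] * b[0] + (int64_t) a[2] * b[2];
}
```

The "n" variants below subtract the same product sum instead of adding it.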
+
+(define_insn "iwmmxt_wmiabt"
+  [(set	(match_operand:DI	                          0 "register_operand" "=y")
+	(plus:DI (match_operand:DI	                  1 "register_operand" "0")
+		 (plus:DI
+		   (mult:DI
+		     (sign_extend:DI
+		       (vec_select:HI (match_operand:V4HI 2 "register_operand" "y")
+				      (parallel [(const_int 0)])))
+		     (sign_extend:DI
+		       (vec_select:HI (match_operand:V4HI 3 "register_operand" "y")
+				      (parallel [(const_int 1)]))))
+		   (mult:DI
+		     (sign_extend:DI
+		       (vec_select:HI (match_dup 2)
+				      (parallel [(const_int 2)])))
+		     (sign_extend:DI
+		       (vec_select:HI (match_dup 3)
+				      (parallel [(const_int 3)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiabt%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiaxy")]
+)
+
+(define_insn "iwmmxt_wmiatb"
+  [(set	(match_operand:DI	                          0 "register_operand" "=y")
+	(plus:DI (match_operand:DI	                  1 "register_operand" "0")
+		 (plus:DI
+		   (mult:DI
+		     (sign_extend:DI
+		       (vec_select:HI (match_operand:V4HI 2 "register_operand" "y")
+				      (parallel [(const_int 1)])))
+		     (sign_extend:DI
+		       (vec_select:HI (match_operand:V4HI 3 "register_operand" "y")
+				      (parallel [(const_int 0)]))))
+		   (mult:DI
+		     (sign_extend:DI
+		       (vec_select:HI (match_dup 2)
+				      (parallel [(const_int 3)])))
+		     (sign_extend:DI
+		       (vec_select:HI (match_dup 3)
+				      (parallel [(const_int 2)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiatb%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiaxy")]
+)
+
+(define_insn "iwmmxt_wmiatt"
+  [(set	(match_operand:DI	                   0 "register_operand" "=y")
+        (plus:DI (match_operand:DI	           1 "register_operand" "0")
+          (plus:DI
+            (mult:DI
+              (sign_extend:DI
+                (vec_select:HI (match_operand:V4HI 2 "register_operand" "y")
+	                       (parallel [(const_int 1)])))
+	      (sign_extend:DI
+	        (vec_select:HI (match_operand:V4HI 3 "register_operand" "y")
+	                       (parallel [(const_int 1)]))))
+            (mult:DI
+	      (sign_extend:DI
+                (vec_select:HI (match_dup 2)
+	                       (parallel [(const_int 3)])))
+              (sign_extend:DI
+                (vec_select:HI (match_dup 3)
+	                       (parallel [(const_int 3)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiatt%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiaxy")]
+)
+
+(define_insn "iwmmxt_wmiabbn"
+  [(set	(match_operand:DI	                           0 "register_operand" "=y")
+	(minus:DI (match_operand:DI	                   1 "register_operand" "0")
+		  (plus:DI
+		    (mult:DI
+		      (sign_extend:DI
+			(vec_select:HI (match_operand:V4HI 2 "register_operand" "y")
+				       (parallel [(const_int 0)])))
+		      (sign_extend:DI
+		        (vec_select:HI (match_operand:V4HI 3 "register_operand" "y")
+				       (parallel [(const_int 0)]))))
+		    (mult:DI
+		      (sign_extend:DI
+			(vec_select:HI (match_dup 2)
+				       (parallel [(const_int 2)])))
+		      (sign_extend:DI
+		        (vec_select:HI (match_dup 3)
+				       (parallel [(const_int 2)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiabbn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiaxy")]
+)
+
+(define_insn "iwmmxt_wmiabtn"
+  [(set	(match_operand:DI	                           0 "register_operand" "=y")
+	(minus:DI (match_operand:DI	                   1 "register_operand" "0")
+		  (plus:DI
+		    (mult:DI
+		      (sign_extend:DI
+			(vec_select:HI (match_operand:V4HI 2 "register_operand" "y")
+				       (parallel [(const_int 0)])))
+		      (sign_extend:DI
+		        (vec_select:HI (match_operand:V4HI 3 "register_operand" "y")
+				       (parallel [(const_int 1)]))))
+		    (mult:DI
+		      (sign_extend:DI
+		        (vec_select:HI (match_dup 2)
+				       (parallel [(const_int 2)])))
+		      (sign_extend:DI
+			(vec_select:HI (match_dup 3)
+				       (parallel [(const_int 3)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiabtn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiaxy")]
+)
+
+(define_insn "iwmmxt_wmiatbn"
+  [(set (match_operand:DI	                           0 "register_operand" "=y")
+	(minus:DI (match_operand:DI	                   1 "register_operand" "0")
+		  (plus:DI
+		    (mult:DI
+		      (sign_extend:DI
+			(vec_select:HI (match_operand:V4HI 2 "register_operand" "y")
+				       (parallel [(const_int 1)])))
+		      (sign_extend:DI
+		        (vec_select:HI (match_operand:V4HI 3 "register_operand" "y")
+				       (parallel [(const_int 0)]))))
+		    (mult:DI
+		      (sign_extend:DI
+		        (vec_select:HI (match_dup 2)
+				       (parallel [(const_int 3)])))
+		      (sign_extend:DI
+			(vec_select:HI (match_dup 3)
+				       (parallel [(const_int 2)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiatbn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiaxy")]
+)
+
+(define_insn "iwmmxt_wmiattn"
+  [(set (match_operand:DI	                           0 "register_operand" "=y")
+	(minus:DI (match_operand:DI	                   1 "register_operand" "0")
+		  (plus:DI
+		    (mult:DI
+		      (sign_extend:DI
+			(vec_select:HI (match_operand:V4HI 2 "register_operand" "y")
+				       (parallel [(const_int 1)])))
+		      (sign_extend:DI
+			(vec_select:HI (match_operand:V4HI 3 "register_operand" "y")
+				       (parallel [(const_int 1)]))))
+		    (mult:DI
+		      (sign_extend:DI
+			(vec_select:HI (match_dup 2)
+				       (parallel [(const_int 3)])))
+		      (sign_extend:DI
+			(vec_select:HI (match_dup 3)
+				       (parallel [(const_int 3)])))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiattn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiaxy")]
+)
+
+(define_insn "iwmmxt_wmiawbb"
+  [(set (match_operand:DI	0 "register_operand" "=y")
+	(plus:DI
+	  (match_operand:DI      1 "register_operand" "0")
+	  (mult:DI
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 2 "register_operand" "y") (parallel [(const_int 0)])))
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 3 "register_operand" "y") (parallel [(const_int 0)]))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiawbb%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiawxy")]
+)
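The wmiaw forms are the word-sized analogue: a single 32x32 signed multiply accumulated into the 64-bit destination. A scalar model of wmiawbb (my illustration, not patch code):

```c
#include <stdint.h>
#include <assert.h>

/* Scalar model of wmiawbb: acc + a[0]*b[0] with the bottom 32-bit word of
   each source sign-extended to 64 bits before multiplying.  */
static int64_t
wmiawbb_model (int64_t acc, const int32_t a[2], const int32_t b[2])
{
  return acc + (int64_t) a[0] * b[0];
}
```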
+
+(define_insn "iwmmxt_wmiawbt"
+  [(set (match_operand:DI	                               0 "register_operand" "=y")
+	(plus:DI
+	  (match_operand:DI                                    1 "register_operand" "0")
+	  (mult:DI
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 2 "register_operand" "y") (parallel [(const_int 0)])))
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 3 "register_operand" "y") (parallel [(const_int 1)]))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiawbt%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiawxy")]
+)
+
+(define_insn "iwmmxt_wmiawtb"
+  [(set (match_operand:DI	                               0 "register_operand" "=y")
+	(plus:DI
+	  (match_operand:DI                                    1 "register_operand" "0")
+	  (mult:DI
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 2 "register_operand" "y") (parallel [(const_int 1)])))
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 3 "register_operand" "y") (parallel [(const_int 0)]))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiawtb%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiawxy")]
+)
+
+(define_insn "iwmmxt_wmiawtt"
+  [(set (match_operand:DI	                               0 "register_operand" "=y")
+	(plus:DI
+	  (match_operand:DI                                    1 "register_operand" "0")
+	  (mult:DI
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 2 "register_operand" "y") (parallel [(const_int 1)])))
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 3 "register_operand" "y") (parallel [(const_int 1)]))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiawtt%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiawxy")]
+)
+
+(define_insn "iwmmxt_wmiawbbn"
+  [(set (match_operand:DI	                               0 "register_operand" "=y")
+	(minus:DI
+	  (match_operand:DI                                    1 "register_operand" "0")
+	  (mult:DI
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 2 "register_operand" "y") (parallel [(const_int 0)])))
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 3 "register_operand" "y") (parallel [(const_int 0)]))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiawbbn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiawxy")]
+)
+
+(define_insn "iwmmxt_wmiawbtn"
+  [(set (match_operand:DI	                               0 "register_operand" "=y")
+	(minus:DI
+	  (match_operand:DI                                    1 "register_operand" "0")
+	  (mult:DI
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 2 "register_operand" "y") (parallel [(const_int 0)])))
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 3 "register_operand" "y") (parallel [(const_int 1)]))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiawbtn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiawxy")]
+)
+
+(define_insn "iwmmxt_wmiawtbn"
+  [(set (match_operand:DI	                               0 "register_operand" "=y")
+	(minus:DI
+	  (match_operand:DI                                    1 "register_operand" "0")
+	  (mult:DI
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 2 "register_operand" "y") (parallel [(const_int 1)])))
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 3 "register_operand" "y") (parallel [(const_int 0)]))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiawtbn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiawxy")]
+)
+
+(define_insn "iwmmxt_wmiawttn"
+  [(set (match_operand:DI	                               0 "register_operand" "=y")
+	(minus:DI
+	  (match_operand:DI                                    1 "register_operand" "0")
+	  (mult:DI
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 2 "register_operand" "y") (parallel [(const_int 1)])))
+	    (sign_extend:DI (vec_select:SI (match_operand:V2SI 3 "register_operand" "y") (parallel [(const_int 1)]))))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmiawttn%?\\t%0, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmiawxy")]
+)
+
+(define_insn "iwmmxt_wmerge"
+  [(set (match_operand:DI         0 "register_operand" "=y")
+	(ior:DI
+	  (ashift:DI
+	    (match_operand:DI     2 "register_operand" "y")
+	    (minus:SI
+	      (const_int 64)
+	      (mult:SI
+	        (match_operand:SI 3 "immediate_operand" "i")
+		(const_int 8))))
+	  (lshiftrt:DI
+	    (ashift:DI
+	      (match_operand:DI   1 "register_operand" "y")
+	      (mult:SI
+	        (match_dup 3)
+		(const_int 8)))
+	    (mult:SI
+	      (match_dup 3)
+	      (const_int 8)))))]
+  "TARGET_REALLY_IWMMXT"
+  "wmerge%?\\t%0, %1, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "wmerge")]
+)
+
+(define_insn "iwmmxt_tandc<mode>3"
+  [(set (reg:CC CC_REGNUM)
+	(subreg:CC (unspec:VMMX [(const_int 0)] UNSPEC_TANDC) 0))
+   (unspec:CC [(reg:SI 15)] UNSPEC_TANDC)]
+  "TARGET_REALLY_IWMMXT"
+  "tandc<MMX_char>%?\\t r15"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "tandc")]
+)
+
+(define_insn "iwmmxt_torc<mode>3"
+  [(set (reg:CC CC_REGNUM)
+	(subreg:CC (unspec:VMMX [(const_int 0)] UNSPEC_TORC) 0))
+   (unspec:CC [(reg:SI 15)] UNSPEC_TORC)]
+  "TARGET_REALLY_IWMMXT"
+  "torc<MMX_char>%?\\t r15"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "torc")]
+)
+
+(define_insn "iwmmxt_torvsc<mode>3"
+  [(set (reg:CC CC_REGNUM)
+	(subreg:CC (unspec:VMMX [(const_int 0)] UNSPEC_TORVSC) 0))
+   (unspec:CC [(reg:SI 15)] UNSPEC_TORVSC)]
+  "TARGET_REALLY_IWMMXT"
+  "torvsc<MMX_char>%?\\t r15"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "torvsc")]
+)
+
+(define_insn "iwmmxt_textrc<mode>3"
+  [(set (reg:CC CC_REGNUM)
+	(subreg:CC (unspec:VMMX [(const_int 0)
+		                 (match_operand:SI 0 "immediate_operand" "i")] UNSPEC_TEXTRC) 0))
+   (unspec:CC [(reg:SI 15)] UNSPEC_TEXTRC)]
+  "TARGET_REALLY_IWMMXT"
+  "textrc<MMX_char>%?\\t r15, %0"
+  [(set_attr "predicable" "yes")
+   (set_attr "wtype" "textrc")]
+)
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index fa2027c..8334b2b 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -493,6 +493,11 @@
   (and (match_code "const_int")
        (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) < 64")))
 
+;; iWMMXt predicates
+
+(define_predicate "imm_or_reg_operand"
+  (ior (match_operand 0 "immediate_operand")
+       (match_operand 0 "register_operand")))
 
 ;; Neon predicates
 
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 1128d19..83c18f7 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -49,6 +49,7 @@ MD_INCLUDES=	$(srcdir)/config/arm/arm1020e.md \
 		$(srcdir)/config/arm/fpa.md \
 		$(srcdir)/config/arm/iterators.md \
 		$(srcdir)/config/arm/iwmmxt.md \
+		$(srcdir)/config/arm/iwmmxt2.md \
 		$(srcdir)/config/arm/ldmstm.md \
 		$(srcdir)/config/arm/neon.md \
 		$(srcdir)/config/arm/predicates.md \
-- 
1.7.3.4


* [PATCH ARM iWMMXt 2/5] intrinsic head file change
  2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
                   ` (2 preceding siblings ...)
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 1/5] ARM code generic change Matt Turner
@ 2012-05-29  4:15 ` Matt Turner
  2012-06-06 12:22   ` Ramana Radhakrishnan
  2012-05-29  4:15 ` [PATCH ARM iWMMXt 4/5] WMMX machine description Matt Turner
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Matt Turner @ 2012-05-29  4:15 UTC (permalink / raw)
  To: gcc-patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, Nick Clifton, Paul Brook,
	Xinyu Qi

From: Xinyu Qi <xyqi@marvell.com>

	gcc/
	* config/arm/mmintrin.h: Use __IWMMXT__ to enable iWMMXt intrinsics.
	Use __IWMMXT2__ to enable iWMMXt2 intrinsics.
	Use C name-mangling for intrinsics.
	(__v8qi): Redefine.
	(_mm_cvtsi32_si64, _mm_andnot_si64, _mm_sad_pu8): Revise.
	(_mm_sad_pu16, _mm_align_si64, _mm_setwcx, _mm_getwcx): Likewise.
	(_m_from_int): Likewise.
	(_mm_sada_pu8, _mm_sada_pu16): New intrinsic.
	(_mm_alignr0_si64, _mm_alignr1_si64, _mm_alignr2_si64): Likewise.
	(_mm_alignr3_si64, _mm_tandcb, _mm_tandch, _mm_tandcw): Likewise.
	(_mm_textrcb, _mm_textrch, _mm_textrcw, _mm_torcb): Likewise.
	(_mm_torch, _mm_torcw, _mm_tbcst_pi8, _mm_tbcst_pi16): Likewise.
	(_mm_tbcst_pi32): Likewise.
	(_mm_abs_pi8, _mm_abs_pi16, _mm_abs_pi32): New iWMMXt2 intrinsic.
	(_mm_addsubhx_pi16, _mm_absdiff_pu8, _mm_absdiff_pu16): Likewise.
	(_mm_absdiff_pu32, _mm_addc_pu16, _mm_addc_pu32): Likewise.
	(_mm_avg4_pu8, _mm_avg4r_pu8, _mm_maddx_pi16, _mm_maddx_pu16): Likewise.
	(_mm_msub_pi16, _mm_msub_pu16, _mm_mulhi_pi32): Likewise.
	(_mm_mulhi_pu32, _mm_mulhir_pi16, _mm_mulhir_pi32): Likewise.
	(_mm_mulhir_pu16, _mm_mulhir_pu32, _mm_mullo_pi32): Likewise.
	(_mm_qmulm_pi16, _mm_qmulm_pi32, _mm_qmulmr_pi16): Likewise.
	(_mm_qmulmr_pi32, _mm_subaddhx_pi16, _mm_addbhusl_pu8): Likewise.
	(_mm_addbhusm_pu8, _mm_qmiabb_pi32, _mm_qmiabbn_pi32): Likewise.
	(_mm_qmiabt_pi32, _mm_qmiabtn_pi32, _mm_qmiatb_pi32): Likewise.
	(_mm_qmiatbn_pi32, _mm_qmiatt_pi32, _mm_qmiattn_pi32): Likewise.
	(_mm_wmiabb_si64, _mm_wmiabbn_si64, _mm_wmiabt_si64): Likewise.
	(_mm_wmiabtn_si64, _mm_wmiatb_si64, _mm_wmiatbn_si64): Likewise.
	(_mm_wmiatt_si64, _mm_wmiattn_si64, _mm_wmiawbb_si64): Likewise.
	(_mm_wmiawbbn_si64, _mm_wmiawbt_si64, _mm_wmiawbtn_si64): Likewise.
	(_mm_wmiawtb_si64, _mm_wmiawtbn_si64, _mm_wmiawtt_si64): Likewise.
	(_mm_wmiawttn_si64, _mm_merge_si64): Likewise.
	(_mm_torvscb, _mm_torvsch, _mm_torvscw): Likewise.
	(_m_to_int): New define.
---
 gcc/config/arm/mmintrin.h |  649 ++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 614 insertions(+), 35 deletions(-)

diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
index 2cc500d..0fe551d 100644
--- a/gcc/config/arm/mmintrin.h
+++ b/gcc/config/arm/mmintrin.h
@@ -24,16 +24,30 @@
 #ifndef _MMINTRIN_H_INCLUDED
 #define _MMINTRIN_H_INCLUDED
 
+#ifndef __IWMMXT__
+#error You must enable WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2) to use iWMMXt/iWMMXt2 intrinsics
+#else
+
+#ifndef __IWMMXT2__
+#warning Only iWMMXt intrinsics are enabled. Extended iWMMXt2 intrinsics are available only if WMMX2 instructions are enabled (e.g. -march=iwmmxt2)
+#endif
+
+
+#if defined __cplusplus
+extern "C" { /* Begin "C" */
+/* Intrinsics use C name-mangling.  */
+#endif /* __cplusplus */
+
 /* The data type intended for user use.  */
 typedef unsigned long long __m64, __int64;
 
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((vector_size (8)));
 typedef short __v4hi __attribute__ ((vector_size (8)));
-typedef char __v8qi __attribute__ ((vector_size (8)));
+typedef signed char __v8qi __attribute__ ((vector_size (8)));
 
 /* "Convert" __m64 and __int64 into each other.  */
-static __inline __m64 
+static __inline __m64
 _mm_cvtsi64_m64 (__int64 __i)
 {
   return __i;
@@ -54,7 +68,7 @@ _mm_cvtsi64_si32 (__int64 __i)
 static __inline __int64
 _mm_cvtsi32_si64 (int __i)
 {
-  return __i;
+  return (__i & 0xffffffff);
 }
 
 /* Pack the four 16-bit values from M1 into the lower four 8-bit values of
@@ -603,7 +617,7 @@ _mm_and_si64 (__m64 __m1, __m64 __m2)
 static __inline __m64
 _mm_andnot_si64 (__m64 __m1, __m64 __m2)
 {
-  return __builtin_arm_wandn (__m1, __m2);
+  return __builtin_arm_wandn (__m2, __m1);
 }
 
 /* Bit-wise inclusive OR the 64-bit values in M1 and M2.  */
@@ -935,7 +949,13 @@ _mm_avg2_pu16 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu8 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadb ((__v8qi)__A, (__v8qi)__B);
+  return (__m64) __builtin_arm_wsadbz ((__v8qi)__A, (__v8qi)__B);
+}
+
+static __inline __m64
+_mm_sada_pu8 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadb ((__v2si)__A, (__v8qi)__B, (__v8qi)__C);
 }
 
 /* Compute the sum of the absolute differences of the unsigned 16-bit
@@ -944,9 +964,16 @@ _mm_sad_pu8 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu16 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadh ((__v4hi)__A, (__v4hi)__B);
+  return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
+static __inline __m64
+_mm_sada_pu16 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadh ((__v2si)__A, (__v4hi)__B, (__v4hi)__C);
+}
+
+
 /* Compute the sum of the absolute differences of the unsigned 8-bit
    values in A and B.  Return the value in the lower 16-bit word; the
    upper words are cleared.  */
@@ -965,11 +992,8 @@ _mm_sadz_pu16 (__m64 __A, __m64 __B)
   return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
-static __inline __m64
-_mm_align_si64 (__m64 __A, __m64 __B, int __C)
-{
-  return (__m64) __builtin_arm_walign ((__v8qi)__A, (__v8qi)__B, __C);
-}
+#define _mm_align_si64(__A,__B, N) \
+  (__m64) __builtin_arm_walign ((__v8qi) (__A),(__v8qi) (__B), (N))
 
 /* Creates a 64-bit zero.  */
 static __inline __m64
@@ -987,42 +1011,76 @@ _mm_setwcx (const int __value, const int __regno)
 {
   switch (__regno)
     {
-    case 0:  __builtin_arm_setwcx (__value, 0); break;
-    case 1:  __builtin_arm_setwcx (__value, 1); break;
-    case 2:  __builtin_arm_setwcx (__value, 2); break;
-    case 3:  __builtin_arm_setwcx (__value, 3); break;
-    case 8:  __builtin_arm_setwcx (__value, 8); break;
-    case 9:  __builtin_arm_setwcx (__value, 9); break;
-    case 10: __builtin_arm_setwcx (__value, 10); break;
-    case 11: __builtin_arm_setwcx (__value, 11); break;
-    default: break;
+    case 0:
+      __asm __volatile ("tmcr wcid, %0" :: "r"(__value));
+      break;
+    case 1:
+      __asm __volatile ("tmcr wcon, %0" :: "r"(__value));
+      break;
+    case 2:
+      __asm __volatile ("tmcr wcssf, %0" :: "r"(__value));
+      break;
+    case 3:
+      __asm __volatile ("tmcr wcasf, %0" :: "r"(__value));
+      break;
+    case 8:
+      __builtin_arm_setwcgr0 (__value);
+      break;
+    case 9:
+      __builtin_arm_setwcgr1 (__value);
+      break;
+    case 10:
+      __builtin_arm_setwcgr2 (__value);
+      break;
+    case 11:
+      __builtin_arm_setwcgr3 (__value);
+      break;
+    default:
+      break;
     }
 }
 
 static __inline int
 _mm_getwcx (const int __regno)
 {
+  int __value;
   switch (__regno)
     {
-    case 0:  return __builtin_arm_getwcx (0);
-    case 1:  return __builtin_arm_getwcx (1);
-    case 2:  return __builtin_arm_getwcx (2);
-    case 3:  return __builtin_arm_getwcx (3);
-    case 8:  return __builtin_arm_getwcx (8);
-    case 9:  return __builtin_arm_getwcx (9);
-    case 10: return __builtin_arm_getwcx (10);
-    case 11: return __builtin_arm_getwcx (11);
-    default: return 0;
+    case 0:
+      __asm __volatile ("tmrc %0, wcid" : "=r"(__value));
+      break;
+    case 1:
+      __asm __volatile ("tmrc %0, wcon" : "=r"(__value));
+      break;
+    case 2:
+      __asm __volatile ("tmrc %0, wcssf" : "=r"(__value));
+      break;
+    case 3:
+      __asm __volatile ("tmrc %0, wcasf" : "=r"(__value));
+      break;
+    case 8:
+      return __builtin_arm_getwcgr0 ();
+    case 9:
+      return __builtin_arm_getwcgr1 ();
+    case 10:
+      return __builtin_arm_getwcgr2 ();
+    case 11:
+      return __builtin_arm_getwcgr3 ();
+    default:
+      break;
     }
+  return __value;
 }
 
 /* Creates a vector of two 32-bit values; I0 is least significant.  */
 static __inline __m64
 _mm_set_pi32 (int __i1, int __i0)
 {
-  union {
+  union
+  {
     __m64 __q;
-    struct {
+    struct
+    {
       unsigned int __i0;
       unsigned int __i1;
     } __s;
@@ -1041,7 +1099,7 @@ _mm_set_pi16 (short __w3, short __w2, short __w1, short __w0)
   unsigned int __i1 = (unsigned short)__w3 << 16 | (unsigned short)__w2;
   unsigned int __i0 = (unsigned short)__w1 << 16 | (unsigned short)__w0;
   return _mm_set_pi32 (__i1, __i0);
-		       
+
 }
 
 /* Creates a vector of eight 8-bit values; B0 is least significant.  */
@@ -1108,11 +1166,526 @@ _mm_set1_pi8 (char __b)
   return _mm_set1_pi32 (__i);
 }
 
-/* Convert an integer to a __m64 object.  */
+#ifdef __IWMMXT2__
+static __inline __m64
+_mm_abs_pi8 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsb ((__v8qi)m1);
+}
+
+static __inline __m64
+_mm_abs_pi16 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsh ((__v4hi)m1);
+
+}
+
+static __inline __m64
+_mm_abs_pi32 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsw ((__v2si)m1);
+
+}
+
+static __inline __m64
+_mm_addsubhx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddsubhx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffb ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffh ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffw ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_addc_pu16 (__m64 a, __m64 b)
+{
+  __m64 result;
+  __asm__ __volatile__ ("waddhc	%0, %1, %2" : "=y" (result) : "y" (a),  "y" (b));
+  return result;
+}
+
+static __inline __m64
+_mm_addc_pu32 (__m64 a, __m64 b)
+{
+  __m64 result;
+  __asm__ __volatile__ ("waddwc	%0, %1, %2" : "=y" (result) : "y" (a),  "y" (b));
+  return result;
+}
+
+static __inline __m64
+_mm_avg4_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wavg4 ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_avg4r_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wavg4r ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_maddx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddsx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_maddx_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddux ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_msub_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddsn ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_msub_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddun ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhi_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwsm ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhi_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwum ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhir_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulsmr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhir_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwsmr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhir_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulumr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhir_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwumr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mullo_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwl ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_qmulm_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulm ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_qmulm_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulwm ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_qmulmr_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulmr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_qmulmr_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulwmr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_subaddhx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wsubaddhx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_addbhusl_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddbhusl ((__v4hi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_addbhusm_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddbhusm ((__v4hi)a, (__v8qi)b);
+}
+
+#define _mm_qmiabb_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabbn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabt_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabtn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc=acc;\
+   __m64 _m1=m1;\
+   __m64 _m2=m2;\
+   _acc = (__m64) __builtin_arm_wqmiabtn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatb_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatbn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatt_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiattn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiattn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabtn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabtn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiattn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiattn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbb (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbbn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbt (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbtn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbtn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtb (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtbn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtt (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawttn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawttn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+/* The third argument should be an immediate.  */
+#define _mm_merge_si64(a, b, n) \
+  ({\
+   __m64 result;\
+   result = (__m64) __builtin_arm_wmerge ((__m64) (a), (__m64) (b), (n));\
+   result;\
+   })
+#endif  /* __IWMMXT2__ */
+
+static __inline __m64
+_mm_alignr0_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr0 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr1_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr1 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr2_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr2 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr3_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr3 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline void
+_mm_tandcb ()
+{
+  __asm __volatile ("tandcb r15");
+}
+
+static __inline void
+_mm_tandch ()
+{
+  __asm __volatile ("tandch r15");
+}
+
+static __inline void
+_mm_tandcw ()
+{
+  __asm __volatile ("tandcw r15");
+}
+
+#define _mm_textrcb(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrcb r15, %0" : : "i" (n));\
+   })
+
+#define _mm_textrch(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrch r15, %0" : : "i" (n));\
+   })
+
+#define _mm_textrcw(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrcw r15, %0" : : "i" (n));\
+   })
+
+static __inline void
+_mm_torcb ()
+{
+  __asm __volatile ("torcb r15");
+}
+
+static __inline void
+_mm_torch ()
+{
+  __asm __volatile ("torch r15");
+}
+
+static __inline void
+_mm_torcw ()
+{
+  __asm __volatile ("torcw r15");
+}
+
+#ifdef __IWMMXT2__
+static __inline void
+_mm_torvscb ()
+{
+  __asm __volatile ("torvscb r15");
+}
+
+static __inline void
+_mm_torvsch ()
+{
+  __asm __volatile ("torvsch r15");
+}
+
+static __inline void
+_mm_torvscw ()
+{
+  __asm __volatile ("torvscw r15");
+}
+#endif
+
+static __inline __m64
+_mm_tbcst_pi8 (int value)
+{
+  return (__m64) __builtin_arm_tbcstb ((signed char) value);
+}
+
+static __inline __m64
+_mm_tbcst_pi16 (int value)
+{
+  return (__m64) __builtin_arm_tbcsth ((short) value);
+}
+
 static __inline __m64
-_m_from_int (int __a)
+_mm_tbcst_pi32 (int value)
 {
-  return (__m64)__a;
+  return (__m64) __builtin_arm_tbcstw (value);
 }
 
 #define _m_packsswb _mm_packs_pi16
@@ -1250,5 +1823,11 @@ _m_from_int (int __a)
 #define _m_paligniq _mm_align_si64
 #define _m_cvt_si2pi _mm_cvtsi64_m64
 #define _m_cvt_pi2si _mm_cvtm64_si64
+#define _m_from_int _mm_cvtsi32_si64
+#define _m_to_int _mm_cvtsi64_si32
 
+#if defined __cplusplus
+}; /* End "C" */
+#endif /* __cplusplus */
+#endif /* __IWMMXT__ */
 #endif /* _MMINTRIN_H_INCLUDED */
-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH ARM iWMMXt 1/5] ARM code generic change
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 1/5] ARM code generic change Matt Turner
@ 2012-06-06 11:53   ` Ramana Radhakrishnan
  2012-12-27  2:31     ` [PATCH, ARM, iWMMXT] Fix define_constants for WCGR Xinyu Qi
  2013-01-22  9:22     ` [PING][PATCH, " Xinyu Qi
  0 siblings, 2 replies; 33+ messages in thread
From: Ramana Radhakrishnan @ 2012-06-06 11:53 UTC (permalink / raw)
  To: Matt Turner
  Cc: gcc-patches, Richard Earnshaw, Nick Clifton, Paul Brook, Xinyu Qi

On 29 May 2012 05:13, Matt Turner <mattst88@gmail.com> wrote:
> From: Xinyu Qi <xyqi@marvell.com>
>
>        gcc/
>        * config/arm/arm.c (FL_IWMMXT2): New define.
>        (arm_arch_iwmmxt2): New variable.
>        (arm_option_override): Enable use of iWMMXt with VFP.
>        Disable use of iWMMXt with NEON. Disable use of iWMMXt under
>        Thumb mode. Set arm_arch_iwmmxt2.
>        (arm_expand_binop_builtin): Accept VOIDmode op.
>        * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Define __IWMMXT2__.
>        (TARGET_IWMMXT2): New define.
>        (TARGET_REALLY_IWMMXT2): Likewise.
>        (arm_arch_iwmmxt2): Declare.
>        * config/arm/arm-cores.def (iwmmxt2): Add FL_IWMMXT2.
>        * config/arm/arm-arches.def (iwmmxt2): Likewise.
>        * config/arm/arm.md (arch): Add "iwmmxt2".
>        (arch_enabled): Handle "iwmmxt2".
> ---
>  gcc/config/arm/arm-arches.def |    2 +-
>  gcc/config/arm/arm-cores.def  |    2 +-
>  gcc/config/arm/arm.c          |   25 +++++++++++++++++--------
>  gcc/config/arm/arm.h          |    7 +++++++
>  gcc/config/arm/arm.md         |    6 +++++-
>  5 files changed, 31 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
> index 3123426..f4dd6cc 100644
> --- a/gcc/config/arm/arm-arches.def
> +++ b/gcc/config/arm/arm-arches.def
> @@ -57,4 +57,4 @@ ARM_ARCH("armv7-m", cortexm3, 7M,  FL_CO_PROC |             FL_FOR_ARCH7M)
>  ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |            FL_FOR_ARCH7EM)
>  ARM_ARCH("ep9312",  ep9312,     4T,  FL_LDSCHED | FL_CIRRUS | FL_FOR_ARCH4)
>  ARM_ARCH("iwmmxt",  iwmmxt,     5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
> -ARM_ARCH("iwmmxt2", iwmmxt2,    5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
> +ARM_ARCH("iwmmxt2", iwmmxt2,    5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
> diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
> index d82b10b..c82eada 100644
> --- a/gcc/config/arm/arm-cores.def
> +++ b/gcc/config/arm/arm-cores.def
> @@ -105,7 +105,7 @@ ARM_CORE("arm1020e",      arm1020e, 5TE,                             FL_LDSCHED, fastmul)
>  ARM_CORE("arm1022e",      arm1022e,    5TE,                             FL_LDSCHED, fastmul)
>  ARM_CORE("xscale",        xscale,      5TE,                             FL_LDSCHED | FL_STRONG | FL_XSCALE, xscale)
>  ARM_CORE("iwmmxt",        iwmmxt,      5TE,                             FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT, xscale)
> -ARM_CORE("iwmmxt2",       iwmmxt2,     5TE,                             FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT, xscale)
> +ARM_CORE("iwmmxt2",       iwmmxt2,     5TE,                             FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2, xscale)
>  ARM_CORE("fa606te",       fa606te,      5TE,                             FL_LDSCHED, 9e)
>  ARM_CORE("fa626te",       fa626te,      5TE,                             FL_LDSCHED, 9e)
>  ARM_CORE("fmp626",        fmp626,       5TE,                             FL_LDSCHED, 9e)
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 7a98197..b0680ab 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -685,6 +685,7 @@ static int thumb_call_reg_needed;
>  #define FL_ARM_DIV    (1 << 23)              /* Hardware divide (ARM mode).  */
>
>  #define FL_IWMMXT     (1 << 29)              /* XScale v2 or "Intel Wireless MMX technology".  */
> +#define FL_IWMMXT2    (1 << 30)       /* "Intel Wireless MMX2 technology".  */
>
>  /* Flags that only effect tuning, not available instructions.  */
>  #define FL_TUNE                (FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
> @@ -766,6 +767,9 @@ int arm_arch_cirrus = 0;
>  /* Nonzero if this chip supports Intel Wireless MMX technology.  */
>  int arm_arch_iwmmxt = 0;
>
> +/* Nonzero if this chip supports Intel Wireless MMX2 technology.  */
> +int arm_arch_iwmmxt2 = 0;
> +
>  /* Nonzero if this chip is an XScale.  */
>  int arm_arch_xscale = 0;
>
> @@ -1717,6 +1721,7 @@ arm_option_override (void)
>   arm_tune_wbuf = (tune_flags & FL_WBUF) != 0;
>   arm_tune_xscale = (tune_flags & FL_XSCALE) != 0;
>   arm_arch_iwmmxt = (insn_flags & FL_IWMMXT) != 0;
> +  arm_arch_iwmmxt2 = (insn_flags & FL_IWMMXT2) != 0;
>   arm_arch_thumb_hwdiv = (insn_flags & FL_THUMB_DIV) != 0;
>   arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
>   arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0;
> @@ -1817,14 +1822,17 @@ arm_option_override (void)
>     }
>
>   /* FPA and iWMMXt are incompatible because the insn encodings overlap.
> -     VFP and iWMMXt can theoretically coexist, but it's unlikely such silicon
> -     will ever exist.  GCC makes no attempt to support this combination.  */
> -  if (TARGET_IWMMXT && !TARGET_SOFT_FLOAT)
> -    sorry ("iWMMXt and hardware floating point");
> +     VFP and iWMMXt however can coexist.  */
> +  if (TARGET_IWMMXT && TARGET_HARD_FLOAT && !TARGET_VFP)
> +    error ("iWMMXt and non-VFP floating point unit are incompatible");
> +
> +  /* iWMMXt and NEON are incompatible.  */
> +  if (TARGET_IWMMXT && TARGET_NEON)
> +    error ("iWMMXt and NEON are incompatible");
>
> -  /* ??? iWMMXt insn patterns need auditing for Thumb-2.  */
> -  if (TARGET_THUMB2 && TARGET_IWMMXT)
> -    sorry ("Thumb-2 iWMMXt");
> +  /* iWMMXt unsupported under Thumb mode.  */
> +  if (TARGET_THUMB && TARGET_IWMMXT)
> +    error ("iWMMXt unsupported under Thumb mode");
>
>   /* __fp16 support currently assumes the core has ldrh.  */
>   if (!arm_arch4 && arm_fp16_format != ARM_FP16_FORMAT_NONE)
> @@ -20867,7 +20875,8 @@ arm_expand_binop_builtin (enum insn_code icode,
>       || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
>     target = gen_reg_rtx (tmode);
>
> -  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
> +  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
> +             && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
>
>   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
>     op0 = copy_to_mode_reg (mode0, op0);
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index f4204e4..c51bce9 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -97,6 +97,8 @@ extern char arm_arch_name[];
>          builtin_define ("__XSCALE__");                \
>        if (arm_arch_iwmmxt)                            \
>          builtin_define ("__IWMMXT__");                \
> +       if (arm_arch_iwmmxt2)                           \
> +         builtin_define ("__IWMMXT2__");               \
>        if (TARGET_AAPCS_BASED)                         \
>          {                                             \
>            if (arm_pcs_default == ARM_PCS_AAPCS_VFP)   \
> @@ -194,7 +196,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
>  #define TARGET_MAVERICK                (arm_fpu_desc->model == ARM_FP_MODEL_MAVERICK)
>  #define TARGET_VFP             (arm_fpu_desc->model == ARM_FP_MODEL_VFP)
>  #define TARGET_IWMMXT                  (arm_arch_iwmmxt)
> +#define TARGET_IWMMXT2                 (arm_arch_iwmmxt2)
>  #define TARGET_REALLY_IWMMXT           (TARGET_IWMMXT && TARGET_32BIT)
> +#define TARGET_REALLY_IWMMXT2          (TARGET_IWMMXT2 && TARGET_32BIT)
>  #define TARGET_IWMMXT_ABI (TARGET_32BIT && arm_abi == ARM_ABI_IWMMXT)
>  #define TARGET_ARM                      (! TARGET_THUMB)
>  #define TARGET_EITHER                  1 /* (TARGET_ARM | TARGET_THUMB) */
> @@ -410,6 +414,9 @@ extern int arm_arch_cirrus;
>  /* Nonzero if this chip supports Intel XScale with Wireless MMX technology.  */
>  extern int arm_arch_iwmmxt;
>
> +/* Nonzero if this chip supports Intel Wireless MMX2 technology.  */
> +extern int arm_arch_iwmmxt2;
> +
>  /* Nonzero if this chip is an XScale.  */
>  extern int arm_arch_xscale;
>
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index bbf6380..ad9d948 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -197,7 +197,7 @@
>  ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
>  ; arm_arch6.  This attribute is used to compute attribute "enabled",
>  ; use type "any" to enable an alternative in all cases.
> -(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8"
> +(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8,iwmmxt,iwmmxt2"
>   (const_string "any"))
>
>  (define_attr "arch_enabled" "no,yes"
> @@ -248,6 +248,10 @@
>         (and (eq_attr "arch" "neon_nota8")
>              (not (eq_attr "tune" "cortexa8"))
>              (match_test "TARGET_NEON"))
> +        (const_string "yes")
> +

Unnecessary newline here.

> +        (and (eq_attr "arch" "iwmmxt2")
> +             (match_test "TARGET_REALLY_IWMMXT2"))
>         (const_string "yes")]
>        (const_string "no")))


Given that we already have iwmmxt2 as a CPU, this isn't really changing
behaviour. OK with that change.


regards,
Ramana

>
> --
> 1.7.3.4
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH ARM iWMMXt 3/5] built in define and expand
  2012-05-29  4:14 ` [PATCH ARM iWMMXt 3/5] built in define and expand Matt Turner
@ 2012-06-06 11:55   ` Ramana Radhakrishnan
  0 siblings, 0 replies; 33+ messages in thread
From: Ramana Radhakrishnan @ 2012-06-06 11:55 UTC (permalink / raw)
  To: Matt Turner
  Cc: gcc-patches, Ramana Radhakrishnan, Richard Earnshaw,
	Nick Clifton, Paul Brook, Xinyu Qi

On 29 May 2012 05:13, Matt Turner <mattst88@gmail.com> wrote:
> From: Xinyu Qi <xyqi@marvell.com>
>
>        gcc/
>        * config/arm/arm.c (enum arm_builtins): Revise built-in fcode.
>        (IWMMXT2_BUILTIN): New define.
>        (IWMMXT2_BUILTIN2): Likewise.
>        (iwmmx2_mbuiltin): Likewise.
>        (builtin_description bdesc_2arg): Revise built in declaration.
>        (builtin_description bdesc_1arg): Likewise.
>        (arm_init_iwmmxt_builtins): Revise built in initialization.
>        (arm_expand_builtin): Revise built in expansion.
> ---
>  gcc/config/arm/arm.c |  620 +++++++++++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 559 insertions(+), 61 deletions(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index b0680ab..51eed40 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -19637,8 +19637,15 @@ static neon_builtin_datum neon_builtin_data[] =
>    FIXME?  */
>  enum arm_builtins
>  {
> -  ARM_BUILTIN_GETWCX,
> -  ARM_BUILTIN_SETWCX,
> +  ARM_BUILTIN_GETWCGR0,
> +  ARM_BUILTIN_GETWCGR1,
> +  ARM_BUILTIN_GETWCGR2,
> +  ARM_BUILTIN_GETWCGR3,
> +
> +  ARM_BUILTIN_SETWCGR0,
> +  ARM_BUILTIN_SETWCGR1,
> +  ARM_BUILTIN_SETWCGR2,
> +  ARM_BUILTIN_SETWCGR3,
>
>   ARM_BUILTIN_WZERO,
>
> @@ -19661,7 +19668,11 @@ enum arm_builtins
>   ARM_BUILTIN_WSADH,
>   ARM_BUILTIN_WSADHZ,
>
> -  ARM_BUILTIN_WALIGN,
> +  ARM_BUILTIN_WALIGNI,
> +  ARM_BUILTIN_WALIGNR0,
> +  ARM_BUILTIN_WALIGNR1,
> +  ARM_BUILTIN_WALIGNR2,
> +  ARM_BUILTIN_WALIGNR3,
>
>   ARM_BUILTIN_TMIA,
>   ARM_BUILTIN_TMIAPH,
> @@ -19797,6 +19808,81 @@ enum arm_builtins
>   ARM_BUILTIN_WUNPCKELUH,
>   ARM_BUILTIN_WUNPCKELUW,
>
> +  ARM_BUILTIN_WABSB,
> +  ARM_BUILTIN_WABSH,
> +  ARM_BUILTIN_WABSW,
> +
> +  ARM_BUILTIN_WADDSUBHX,
> +  ARM_BUILTIN_WSUBADDHX,
> +
> +  ARM_BUILTIN_WABSDIFFB,
> +  ARM_BUILTIN_WABSDIFFH,
> +  ARM_BUILTIN_WABSDIFFW,
> +
> +  ARM_BUILTIN_WADDCH,
> +  ARM_BUILTIN_WADDCW,
> +
> +  ARM_BUILTIN_WAVG4,
> +  ARM_BUILTIN_WAVG4R,
> +
> +  ARM_BUILTIN_WMADDSX,
> +  ARM_BUILTIN_WMADDUX,
> +
> +  ARM_BUILTIN_WMADDSN,
> +  ARM_BUILTIN_WMADDUN,
> +
> +  ARM_BUILTIN_WMULWSM,
> +  ARM_BUILTIN_WMULWUM,
> +
> +  ARM_BUILTIN_WMULWSMR,
> +  ARM_BUILTIN_WMULWUMR,
> +
> +  ARM_BUILTIN_WMULWL,
> +
> +  ARM_BUILTIN_WMULSMR,
> +  ARM_BUILTIN_WMULUMR,
> +
> +  ARM_BUILTIN_WQMULM,
> +  ARM_BUILTIN_WQMULMR,
> +
> +  ARM_BUILTIN_WQMULWM,
> +  ARM_BUILTIN_WQMULWMR,
> +
> +  ARM_BUILTIN_WADDBHUSM,
> +  ARM_BUILTIN_WADDBHUSL,
> +
> +  ARM_BUILTIN_WQMIABB,
> +  ARM_BUILTIN_WQMIABT,
> +  ARM_BUILTIN_WQMIATB,
> +  ARM_BUILTIN_WQMIATT,
> +
> +  ARM_BUILTIN_WQMIABBN,
> +  ARM_BUILTIN_WQMIABTN,
> +  ARM_BUILTIN_WQMIATBN,
> +  ARM_BUILTIN_WQMIATTN,
> +
> +  ARM_BUILTIN_WMIABB,
> +  ARM_BUILTIN_WMIABT,
> +  ARM_BUILTIN_WMIATB,
> +  ARM_BUILTIN_WMIATT,
> +
> +  ARM_BUILTIN_WMIABBN,
> +  ARM_BUILTIN_WMIABTN,
> +  ARM_BUILTIN_WMIATBN,
> +  ARM_BUILTIN_WMIATTN,
> +
> +  ARM_BUILTIN_WMIAWBB,
> +  ARM_BUILTIN_WMIAWBT,
> +  ARM_BUILTIN_WMIAWTB,
> +  ARM_BUILTIN_WMIAWTT,
> +
> +  ARM_BUILTIN_WMIAWBBN,
> +  ARM_BUILTIN_WMIAWBTN,
> +  ARM_BUILTIN_WMIAWTBN,
> +  ARM_BUILTIN_WMIAWTTN,
> +
> +  ARM_BUILTIN_WMERGE,
> +
>   ARM_BUILTIN_THREAD_POINTER,
>
>   ARM_BUILTIN_NEON_BASE,
> @@ -20329,6 +20415,10 @@ static const struct builtin_description bdesc_2arg[] =
>   { FL_IWMMXT, CODE_FOR_##code, "__builtin_arm_" string, \
>     ARM_BUILTIN_##builtin, UNKNOWN, 0 },
>
> +#define IWMMXT2_BUILTIN(code, string, builtin) \
> +  { FL_IWMMXT2, CODE_FOR_##code, "__builtin_arm_" string, \
> +    ARM_BUILTIN_##builtin, UNKNOWN, 0 },
> +
>   IWMMXT_BUILTIN (addv8qi3, "waddb", WADDB)
>   IWMMXT_BUILTIN (addv4hi3, "waddh", WADDH)
>   IWMMXT_BUILTIN (addv2si3, "waddw", WADDW)
> @@ -20385,44 +20475,45 @@ static const struct builtin_description bdesc_2arg[] =
>   IWMMXT_BUILTIN (iwmmxt_wunpckihb, "wunpckihb", WUNPCKIHB)
>   IWMMXT_BUILTIN (iwmmxt_wunpckihh, "wunpckihh", WUNPCKIHH)
>   IWMMXT_BUILTIN (iwmmxt_wunpckihw, "wunpckihw", WUNPCKIHW)
> -  IWMMXT_BUILTIN (iwmmxt_wmadds, "wmadds", WMADDS)
> -  IWMMXT_BUILTIN (iwmmxt_wmaddu, "wmaddu", WMADDU)
> +  IWMMXT2_BUILTIN (iwmmxt_waddsubhx, "waddsubhx", WADDSUBHX)
> +  IWMMXT2_BUILTIN (iwmmxt_wsubaddhx, "wsubaddhx", WSUBADDHX)
> +  IWMMXT2_BUILTIN (iwmmxt_wabsdiffb, "wabsdiffb", WABSDIFFB)
> +  IWMMXT2_BUILTIN (iwmmxt_wabsdiffh, "wabsdiffh", WABSDIFFH)
> +  IWMMXT2_BUILTIN (iwmmxt_wabsdiffw, "wabsdiffw", WABSDIFFW)
> +  IWMMXT2_BUILTIN (iwmmxt_avg4, "wavg4", WAVG4)
> +  IWMMXT2_BUILTIN (iwmmxt_avg4r, "wavg4r", WAVG4R)
> +  IWMMXT2_BUILTIN (iwmmxt_wmulwsm, "wmulwsm", WMULWSM)
> +  IWMMXT2_BUILTIN (iwmmxt_wmulwum, "wmulwum", WMULWUM)
> +  IWMMXT2_BUILTIN (iwmmxt_wmulwsmr, "wmulwsmr", WMULWSMR)
> +  IWMMXT2_BUILTIN (iwmmxt_wmulwumr, "wmulwumr", WMULWUMR)
> +  IWMMXT2_BUILTIN (iwmmxt_wmulwl, "wmulwl", WMULWL)
> +  IWMMXT2_BUILTIN (iwmmxt_wmulsmr, "wmulsmr", WMULSMR)
> +  IWMMXT2_BUILTIN (iwmmxt_wmulumr, "wmulumr", WMULUMR)
> +  IWMMXT2_BUILTIN (iwmmxt_wqmulm, "wqmulm", WQMULM)
> +  IWMMXT2_BUILTIN (iwmmxt_wqmulmr, "wqmulmr", WQMULMR)
> +  IWMMXT2_BUILTIN (iwmmxt_wqmulwm, "wqmulwm", WQMULWM)
> +  IWMMXT2_BUILTIN (iwmmxt_wqmulwmr, "wqmulwmr", WQMULWMR)
> +  IWMMXT_BUILTIN (iwmmxt_walignr0, "walignr0", WALIGNR0)
> +  IWMMXT_BUILTIN (iwmmxt_walignr1, "walignr1", WALIGNR1)
> +  IWMMXT_BUILTIN (iwmmxt_walignr2, "walignr2", WALIGNR2)
> +  IWMMXT_BUILTIN (iwmmxt_walignr3, "walignr3", WALIGNR3)
>
>  #define IWMMXT_BUILTIN2(code, builtin) \
>   { FL_IWMMXT, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
>
> +#define IWMMXT2_BUILTIN2(code, builtin) \
> +  { FL_IWMMXT2, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
> +
> +  IWMMXT2_BUILTIN2 (iwmmxt_waddbhusm, WADDBHUSM)
> +  IWMMXT2_BUILTIN2 (iwmmxt_waddbhusl, WADDBHUSL)
>   IWMMXT_BUILTIN2 (iwmmxt_wpackhss, WPACKHSS)
>   IWMMXT_BUILTIN2 (iwmmxt_wpackwss, WPACKWSS)
>   IWMMXT_BUILTIN2 (iwmmxt_wpackdss, WPACKDSS)
>   IWMMXT_BUILTIN2 (iwmmxt_wpackhus, WPACKHUS)
>   IWMMXT_BUILTIN2 (iwmmxt_wpackwus, WPACKWUS)
>   IWMMXT_BUILTIN2 (iwmmxt_wpackdus, WPACKDUS)
> -  IWMMXT_BUILTIN2 (ashlv4hi3_di,    WSLLH)
> -  IWMMXT_BUILTIN2 (ashlv4hi3_iwmmxt, WSLLHI)
> -  IWMMXT_BUILTIN2 (ashlv2si3_di,    WSLLW)
> -  IWMMXT_BUILTIN2 (ashlv2si3_iwmmxt, WSLLWI)
> -  IWMMXT_BUILTIN2 (ashldi3_di,      WSLLD)
> -  IWMMXT_BUILTIN2 (ashldi3_iwmmxt,  WSLLDI)
> -  IWMMXT_BUILTIN2 (lshrv4hi3_di,    WSRLH)
> -  IWMMXT_BUILTIN2 (lshrv4hi3_iwmmxt, WSRLHI)
> -  IWMMXT_BUILTIN2 (lshrv2si3_di,    WSRLW)
> -  IWMMXT_BUILTIN2 (lshrv2si3_iwmmxt, WSRLWI)
> -  IWMMXT_BUILTIN2 (lshrdi3_di,      WSRLD)
> -  IWMMXT_BUILTIN2 (lshrdi3_iwmmxt,  WSRLDI)
> -  IWMMXT_BUILTIN2 (ashrv4hi3_di,    WSRAH)
> -  IWMMXT_BUILTIN2 (ashrv4hi3_iwmmxt, WSRAHI)
> -  IWMMXT_BUILTIN2 (ashrv2si3_di,    WSRAW)
> -  IWMMXT_BUILTIN2 (ashrv2si3_iwmmxt, WSRAWI)
> -  IWMMXT_BUILTIN2 (ashrdi3_di,      WSRAD)
> -  IWMMXT_BUILTIN2 (ashrdi3_iwmmxt,  WSRADI)
> -  IWMMXT_BUILTIN2 (rorv4hi3_di,     WRORH)
> -  IWMMXT_BUILTIN2 (rorv4hi3,        WRORHI)
> -  IWMMXT_BUILTIN2 (rorv2si3_di,     WRORW)
> -  IWMMXT_BUILTIN2 (rorv2si3,        WRORWI)
> -  IWMMXT_BUILTIN2 (rordi3_di,       WRORD)
> -  IWMMXT_BUILTIN2 (rordi3,          WRORDI)
> -  IWMMXT_BUILTIN2 (iwmmxt_wmacuz,   WMACUZ)
> -  IWMMXT_BUILTIN2 (iwmmxt_wmacsz,   WMACSZ)
> +  IWMMXT_BUILTIN2 (iwmmxt_wmacuz, WMACUZ)
> +  IWMMXT_BUILTIN2 (iwmmxt_wmacsz, WMACSZ)
>  };
>
>  static const struct builtin_description bdesc_1arg[] =
> @@ -20445,6 +20536,12 @@ static const struct builtin_description bdesc_1arg[] =
>   IWMMXT_BUILTIN (iwmmxt_wunpckelsb, "wunpckelsb", WUNPCKELSB)
>   IWMMXT_BUILTIN (iwmmxt_wunpckelsh, "wunpckelsh", WUNPCKELSH)
>   IWMMXT_BUILTIN (iwmmxt_wunpckelsw, "wunpckelsw", WUNPCKELSW)
> +  IWMMXT2_BUILTIN (iwmmxt_wabsv8qi3, "wabsb", WABSB)
> +  IWMMXT2_BUILTIN (iwmmxt_wabsv4hi3, "wabsh", WABSH)
> +  IWMMXT2_BUILTIN (iwmmxt_wabsv2si3, "wabsw", WABSW)
> +  IWMMXT_BUILTIN (tbcstv8qi, "tbcstb", TBCSTB)
> +  IWMMXT_BUILTIN (tbcstv4hi, "tbcsth", TBCSTH)
> +  IWMMXT_BUILTIN (tbcstv2si, "tbcstw", TBCSTW)
>  };
>
>  /* Set up all the iWMMXt builtins.  This is not called if
> @@ -20460,9 +20557,6 @@ arm_init_iwmmxt_builtins (void)
>   tree V4HI_type_node = build_vector_type_for_mode (intHI_type_node, V4HImode);
>   tree V8QI_type_node = build_vector_type_for_mode (intQI_type_node, V8QImode);
>
> -  tree int_ftype_int
> -    = build_function_type_list (integer_type_node,
> -                               integer_type_node, NULL_TREE);
>   tree v8qi_ftype_v8qi_v8qi_int
>     = build_function_type_list (V8QI_type_node,
>                                V8QI_type_node, V8QI_type_node,
> @@ -20524,6 +20618,9 @@ arm_init_iwmmxt_builtins (void)
>   tree v4hi_ftype_v2si_v2si
>     = build_function_type_list (V4HI_type_node,
>                                V2SI_type_node, V2SI_type_node, NULL_TREE);
> +  tree v8qi_ftype_v4hi_v8qi
> +    = build_function_type_list (V8QI_type_node,
> +                               V4HI_type_node, V8QI_type_node, NULL_TREE);
>   tree v2si_ftype_v4hi_v4hi
>     = build_function_type_list (V2SI_type_node,
>                                V4HI_type_node, V4HI_type_node, NULL_TREE);
> @@ -20538,12 +20635,10 @@ arm_init_iwmmxt_builtins (void)
>     = build_function_type_list (V2SI_type_node,
>                                V2SI_type_node, long_long_integer_type_node,
>                                NULL_TREE);
> -  tree void_ftype_int_int
> -    = build_function_type_list (void_type_node,
> -                               integer_type_node, integer_type_node,
> -                               NULL_TREE);
>   tree di_ftype_void
>     = build_function_type_list (long_long_unsigned_type_node, NULL_TREE);
> +  tree int_ftype_void
> +    = build_function_type_list (integer_type_node, NULL_TREE);
>   tree di_ftype_v8qi
>     = build_function_type_list (long_long_integer_type_node,
>                                V8QI_type_node, NULL_TREE);
> @@ -20559,6 +20654,15 @@ arm_init_iwmmxt_builtins (void)
>   tree v4hi_ftype_v8qi
>     = build_function_type_list (V4HI_type_node,
>                                V8QI_type_node, NULL_TREE);
> +  tree v8qi_ftype_v8qi
> +    = build_function_type_list (V8QI_type_node,
> +                               V8QI_type_node, NULL_TREE);
> +  tree v4hi_ftype_v4hi
> +    = build_function_type_list (V4HI_type_node,
> +                               V4HI_type_node, NULL_TREE);
> +  tree v2si_ftype_v2si
> +    = build_function_type_list (V2SI_type_node,
> +                               V2SI_type_node, NULL_TREE);
>
>   tree di_ftype_di_v4hi_v4hi
>     = build_function_type_list (long_long_unsigned_type_node,
> @@ -20571,6 +20675,48 @@ arm_init_iwmmxt_builtins (void)
>                                V4HI_type_node,V4HI_type_node,
>                                NULL_TREE);
>
> +  tree v2si_ftype_v2si_v4hi_v4hi
> +    = build_function_type_list (V2SI_type_node,
> +                                V2SI_type_node, V4HI_type_node,
> +                                V4HI_type_node, NULL_TREE);
> +
> +  tree v2si_ftype_v2si_v8qi_v8qi
> +    = build_function_type_list (V2SI_type_node,
> +                                V2SI_type_node, V8QI_type_node,
> +                                V8QI_type_node, NULL_TREE);
> +
> +  tree di_ftype_di_v2si_v2si
> +     = build_function_type_list (long_long_unsigned_type_node,
> +                                 long_long_unsigned_type_node,
> +                                 V2SI_type_node, V2SI_type_node,
> +                                 NULL_TREE);
> +
> +   tree di_ftype_di_di_int
> +     = build_function_type_list (long_long_unsigned_type_node,
> +                                 long_long_unsigned_type_node,
> +                                 long_long_unsigned_type_node,
> +                                 integer_type_node, NULL_TREE);
> +
> +   tree void_ftype_void
> +     = build_function_type_list (void_type_node,
> +                                 NULL_TREE);
> +
> +   tree void_ftype_int
> +     = build_function_type_list (void_type_node,
> +                                 integer_type_node, NULL_TREE);
> +
> +   tree v8qi_ftype_char
> +     = build_function_type_list (V8QI_type_node,
> +                                 signed_char_type_node, NULL_TREE);
> +
> +   tree v4hi_ftype_short
> +     = build_function_type_list (V4HI_type_node,
> +                                 short_integer_type_node, NULL_TREE);
> +
> +   tree v2si_ftype_int
> +     = build_function_type_list (V2SI_type_node,
> +                                 integer_type_node, NULL_TREE);
> +
>   /* Normal vector binops.  */
>   tree v8qi_ftype_v8qi_v8qi
>     = build_function_type_list (V8QI_type_node,
> @@ -20628,9 +20774,19 @@ arm_init_iwmmxt_builtins (void)
>   def_mbuiltin (FL_IWMMXT, "__builtin_arm_" NAME, (TYPE),      \
>                ARM_BUILTIN_ ## CODE)
>
> +#define iwmmx2_mbuiltin(NAME, TYPE, CODE)                      \
> +  def_mbuiltin (FL_IWMMXT2, "__builtin_arm_" NAME, (TYPE),     \
> +               ARM_BUILTIN_ ## CODE)
> +
>   iwmmx_mbuiltin ("wzero", di_ftype_void, WZERO);
> -  iwmmx_mbuiltin ("setwcx", void_ftype_int_int, SETWCX);
> -  iwmmx_mbuiltin ("getwcx", int_ftype_int, GETWCX);
> +  iwmmx_mbuiltin ("setwcgr0", void_ftype_int, SETWCGR0);
> +  iwmmx_mbuiltin ("setwcgr1", void_ftype_int, SETWCGR1);
> +  iwmmx_mbuiltin ("setwcgr2", void_ftype_int, SETWCGR2);
> +  iwmmx_mbuiltin ("setwcgr3", void_ftype_int, SETWCGR3);
> +  iwmmx_mbuiltin ("getwcgr0", int_ftype_void, GETWCGR0);
> +  iwmmx_mbuiltin ("getwcgr1", int_ftype_void, GETWCGR1);
> +  iwmmx_mbuiltin ("getwcgr2", int_ftype_void, GETWCGR2);
> +  iwmmx_mbuiltin ("getwcgr3", int_ftype_void, GETWCGR3);
>
>   iwmmx_mbuiltin ("wsllh", v4hi_ftype_v4hi_di, WSLLH);
>   iwmmx_mbuiltin ("wsllw", v2si_ftype_v2si_di, WSLLW);
> @@ -20662,8 +20818,14 @@ arm_init_iwmmxt_builtins (void)
>
>   iwmmx_mbuiltin ("wshufh", v4hi_ftype_v4hi_int, WSHUFH);
>
> -  iwmmx_mbuiltin ("wsadb", v2si_ftype_v8qi_v8qi, WSADB);
> -  iwmmx_mbuiltin ("wsadh", v2si_ftype_v4hi_v4hi, WSADH);
> +  iwmmx_mbuiltin ("wsadb", v2si_ftype_v2si_v8qi_v8qi, WSADB);
> +  iwmmx_mbuiltin ("wsadh", v2si_ftype_v2si_v4hi_v4hi, WSADH);
> +  iwmmx_mbuiltin ("wmadds", v2si_ftype_v4hi_v4hi, WMADDS);
> +  iwmmx2_mbuiltin ("wmaddsx", v2si_ftype_v4hi_v4hi, WMADDSX);
> +  iwmmx2_mbuiltin ("wmaddsn", v2si_ftype_v4hi_v4hi, WMADDSN);
> +  iwmmx_mbuiltin ("wmaddu", v2si_ftype_v4hi_v4hi, WMADDU);
> +  iwmmx2_mbuiltin ("wmaddux", v2si_ftype_v4hi_v4hi, WMADDUX);
> +  iwmmx2_mbuiltin ("wmaddun", v2si_ftype_v4hi_v4hi, WMADDUN);
>   iwmmx_mbuiltin ("wsadbz", v2si_ftype_v8qi_v8qi, WSADBZ);
>   iwmmx_mbuiltin ("wsadhz", v2si_ftype_v4hi_v4hi, WSADHZ);
>
> @@ -20685,6 +20847,9 @@ arm_init_iwmmxt_builtins (void)
>   iwmmx_mbuiltin ("tmovmskh", int_ftype_v4hi, TMOVMSKH);
>   iwmmx_mbuiltin ("tmovmskw", int_ftype_v2si, TMOVMSKW);
>
> +  iwmmx2_mbuiltin ("waddbhusm", v8qi_ftype_v4hi_v8qi, WADDBHUSM);
> +  iwmmx2_mbuiltin ("waddbhusl", v8qi_ftype_v4hi_v8qi, WADDBHUSL);
> +
>   iwmmx_mbuiltin ("wpackhss", v8qi_ftype_v4hi_v4hi, WPACKHSS);
>   iwmmx_mbuiltin ("wpackhus", v8qi_ftype_v4hi_v4hi, WPACKHUS);
>   iwmmx_mbuiltin ("wpackwus", v4hi_ftype_v2si_v2si, WPACKWUS);
> @@ -20710,7 +20875,7 @@ arm_init_iwmmxt_builtins (void)
>   iwmmx_mbuiltin ("wmacu", di_ftype_di_v4hi_v4hi, WMACU);
>   iwmmx_mbuiltin ("wmacuz", di_ftype_v4hi_v4hi, WMACUZ);
>
> -  iwmmx_mbuiltin ("walign", v8qi_ftype_v8qi_v8qi_int, WALIGN);
> +  iwmmx_mbuiltin ("walign", v8qi_ftype_v8qi_v8qi_int, WALIGNI);
>   iwmmx_mbuiltin ("tmia", di_ftype_di_int_int, TMIA);
>   iwmmx_mbuiltin ("tmiaph", di_ftype_di_int_int, TMIAPH);
>   iwmmx_mbuiltin ("tmiabb", di_ftype_di_int_int, TMIABB);
> @@ -20718,7 +20883,48 @@ arm_init_iwmmxt_builtins (void)
>   iwmmx_mbuiltin ("tmiatb", di_ftype_di_int_int, TMIATB);
>   iwmmx_mbuiltin ("tmiatt", di_ftype_di_int_int, TMIATT);
>
> +  iwmmx2_mbuiltin ("wabsb", v8qi_ftype_v8qi, WABSB);
> +  iwmmx2_mbuiltin ("wabsh", v4hi_ftype_v4hi, WABSH);
> +  iwmmx2_mbuiltin ("wabsw", v2si_ftype_v2si, WABSW);
> +
> +  iwmmx2_mbuiltin ("wqmiabb", v2si_ftype_v2si_v4hi_v4hi, WQMIABB);
> +  iwmmx2_mbuiltin ("wqmiabt", v2si_ftype_v2si_v4hi_v4hi, WQMIABT);
> +  iwmmx2_mbuiltin ("wqmiatb", v2si_ftype_v2si_v4hi_v4hi, WQMIATB);
> +  iwmmx2_mbuiltin ("wqmiatt", v2si_ftype_v2si_v4hi_v4hi, WQMIATT);
> +
> +  iwmmx2_mbuiltin ("wqmiabbn", v2si_ftype_v2si_v4hi_v4hi, WQMIABBN);
> +  iwmmx2_mbuiltin ("wqmiabtn", v2si_ftype_v2si_v4hi_v4hi, WQMIABTN);
> +  iwmmx2_mbuiltin ("wqmiatbn", v2si_ftype_v2si_v4hi_v4hi, WQMIATBN);
> +  iwmmx2_mbuiltin ("wqmiattn", v2si_ftype_v2si_v4hi_v4hi, WQMIATTN);
> +
> +  iwmmx2_mbuiltin ("wmiabb", di_ftype_di_v4hi_v4hi, WMIABB);
> +  iwmmx2_mbuiltin ("wmiabt", di_ftype_di_v4hi_v4hi, WMIABT);
> +  iwmmx2_mbuiltin ("wmiatb", di_ftype_di_v4hi_v4hi, WMIATB);
> +  iwmmx2_mbuiltin ("wmiatt", di_ftype_di_v4hi_v4hi, WMIATT);
> +
> +  iwmmx2_mbuiltin ("wmiabbn", di_ftype_di_v4hi_v4hi, WMIABBN);
> +  iwmmx2_mbuiltin ("wmiabtn", di_ftype_di_v4hi_v4hi, WMIABTN);
> +  iwmmx2_mbuiltin ("wmiatbn", di_ftype_di_v4hi_v4hi, WMIATBN);
> +  iwmmx2_mbuiltin ("wmiattn", di_ftype_di_v4hi_v4hi, WMIATTN);
> +
> +  iwmmx2_mbuiltin ("wmiawbb", di_ftype_di_v2si_v2si, WMIAWBB);
> +  iwmmx2_mbuiltin ("wmiawbt", di_ftype_di_v2si_v2si, WMIAWBT);
> +  iwmmx2_mbuiltin ("wmiawtb", di_ftype_di_v2si_v2si, WMIAWTB);
> +  iwmmx2_mbuiltin ("wmiawtt", di_ftype_di_v2si_v2si, WMIAWTT);
> +
> +  iwmmx2_mbuiltin ("wmiawbbn", di_ftype_di_v2si_v2si, WMIAWBBN);
> +  iwmmx2_mbuiltin ("wmiawbtn", di_ftype_di_v2si_v2si, WMIAWBTN);
> +  iwmmx2_mbuiltin ("wmiawtbn", di_ftype_di_v2si_v2si, WMIAWTBN);
> +  iwmmx2_mbuiltin ("wmiawttn", di_ftype_di_v2si_v2si, WMIAWTTN);
> +
> +  iwmmx2_mbuiltin ("wmerge", di_ftype_di_di_int, WMERGE);
> +
> +  iwmmx_mbuiltin ("tbcstb", v8qi_ftype_char, TBCSTB);
> +  iwmmx_mbuiltin ("tbcsth", v4hi_ftype_short, TBCSTH);
> +  iwmmx_mbuiltin ("tbcstw", v2si_ftype_int, TBCSTW);
> +
>  #undef iwmmx_mbuiltin
> +#undef iwmmx2_mbuiltin
>  }
>
>  static void
> @@ -21375,6 +21581,10 @@ arm_expand_builtin (tree exp,
>   enum machine_mode mode0;
>   enum machine_mode mode1;
>   enum machine_mode mode2;
> +  int opint;
> +  int selector;
> +  int mask;
> +  int imm;
>
>   if (fcode >= ARM_BUILTIN_NEON_BASE)
>     return arm_expand_neon_builtin (fcode, exp, target);
> @@ -21409,6 +21619,24 @@ arm_expand_builtin (tree exp,
>          error ("selector must be an immediate");
>          return gen_reg_rtx (tmode);
>        }
> +
> +      opint = INTVAL (op1);
> +      if (fcode == ARM_BUILTIN_TEXTRMSB || fcode == ARM_BUILTIN_TEXTRMUB)
> +       {
> +         if (opint > 7 || opint < 0)
> +           error ("the range of selector should be in 0 to 7");
> +       }
> +      else if (fcode == ARM_BUILTIN_TEXTRMSH || fcode == ARM_BUILTIN_TEXTRMUH)
> +       {
> +         if (opint > 3 || opint < 0)
> +           error ("the range of selector should be in 0 to 3");
> +       }
> +      else /* ARM_BUILTIN_TEXTRMSW || ARM_BUILTIN_TEXTRMUW.  */
> +       {
> +         if (opint > 1 || opint < 0)
> +           error ("the range of selector should be in 0 to 1");
> +       }
> +
>       if (target == 0
>          || GET_MODE (target) != tmode
>          || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
> @@ -21419,11 +21647,61 @@ arm_expand_builtin (tree exp,
>       emit_insn (pat);
>       return target;
>
> +    case ARM_BUILTIN_WALIGNI:
> +      /* If op2 is immediate, call waligni, else call walignr.  */
> +      arg0 = CALL_EXPR_ARG (exp, 0);
> +      arg1 = CALL_EXPR_ARG (exp, 1);
> +      arg2 = CALL_EXPR_ARG (exp, 2);
> +      op0 = expand_normal (arg0);
> +      op1 = expand_normal (arg1);
> +      op2 = expand_normal (arg2);
> +      if (GET_CODE (op2) == CONST_INT)

Replace this with CONST_INT_P everywhere in your patches.

> +        {
> +         icode = CODE_FOR_iwmmxt_waligni;
> +          tmode = insn_data[icode].operand[0].mode;
> +         mode0 = insn_data[icode].operand[1].mode;
> +         mode1 = insn_data[icode].operand[2].mode;
> +         mode2 = insn_data[icode].operand[3].mode;
> +          if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
> +           op0 = copy_to_mode_reg (mode0, op0);
> +          if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
> +           op1 = copy_to_mode_reg (mode1, op1);
> +          gcc_assert ((*insn_data[icode].operand[3].predicate) (op2, mode2));
> +         selector = INTVAL (op2);
> +         if (selector > 7 || selector < 0)
> +           error ("the range of selector should be in 0 to 7");
> +       }
> +      else
> +        {
> +         icode = CODE_FOR_iwmmxt_walignr;
> +          tmode = insn_data[icode].operand[0].mode;
> +         mode0 = insn_data[icode].operand[1].mode;
> +         mode1 = insn_data[icode].operand[2].mode;
> +         mode2 = insn_data[icode].operand[3].mode;
> +          if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
> +           op0 = copy_to_mode_reg (mode0, op0);
> +          if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
> +           op1 = copy_to_mode_reg (mode1, op1);
> +          if (!(*insn_data[icode].operand[3].predicate) (op2, mode2))
> +           op2 = copy_to_mode_reg (mode2, op2);
> +       }
> +      if (target == 0
> +         || GET_MODE (target) != tmode
> +         || !(*insn_data[icode].operand[0].predicate) (target, tmode))
> +       target = gen_reg_rtx (tmode);
> +      pat = GEN_FCN (icode) (target, op0, op1, op2);
> +      if (!pat)
> +       return 0;
> +      emit_insn (pat);
> +      return target;
> +
>     case ARM_BUILTIN_TINSRB:
>     case ARM_BUILTIN_TINSRH:
>     case ARM_BUILTIN_TINSRW:
> +    case ARM_BUILTIN_WMERGE:
>       icode = (fcode == ARM_BUILTIN_TINSRB ? CODE_FOR_iwmmxt_tinsrb
>               : fcode == ARM_BUILTIN_TINSRH ? CODE_FOR_iwmmxt_tinsrh
> +              : fcode == ARM_BUILTIN_WMERGE ? CODE_FOR_iwmmxt_wmerge
>               : CODE_FOR_iwmmxt_tinsrw);
>       arg0 = CALL_EXPR_ARG (exp, 0);
>       arg1 = CALL_EXPR_ARG (exp, 1);
> @@ -21442,10 +21720,30 @@ arm_expand_builtin (tree exp,
>        op1 = copy_to_mode_reg (mode1, op1);
>       if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
>        {
> -         /* @@@ better error message */
>          error ("selector must be an immediate");
>          return const0_rtx;
>        }
> +      if (icode == CODE_FOR_iwmmxt_wmerge)
> +       {
> +         selector = INTVAL (op2);
> +         if (selector > 7 || selector < 0)
> +           error ("the range of selector should be in 0 to 7");
> +       }
> +      if ((icode == CODE_FOR_iwmmxt_tinsrb)
> +         || (icode == CODE_FOR_iwmmxt_tinsrh)
> +         || (icode == CODE_FOR_iwmmxt_tinsrw))
> +        {
> +         mask = 0x01;
> +         selector = INTVAL (op2);
> +         if (icode == CODE_FOR_iwmmxt_tinsrb && (selector < 0 || selector > 7))
> +           error ("the range of selector should be in 0 to 7");
> +         else if (icode == CODE_FOR_iwmmxt_tinsrh && (selector < 0 || selector > 3))
> +           error ("the range of selector should be in 0 to 3");
> +         else if (icode == CODE_FOR_iwmmxt_tinsrw && (selector < 0 || selector > 1))
> +           error ("the range of selector should be in 0 to 1");
> +         mask <<= selector;
> +         op2 = gen_rtx_CONST_INT (SImode, mask);
> +       }
>       if (target == 0
>          || GET_MODE (target) != tmode
>          || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
> @@ -21456,19 +21754,42 @@ arm_expand_builtin (tree exp,
>       emit_insn (pat);
>       return target;
>
> -    case ARM_BUILTIN_SETWCX:
> +    case ARM_BUILTIN_SETWCGR0:
> +    case ARM_BUILTIN_SETWCGR1:
> +    case ARM_BUILTIN_SETWCGR2:
> +    case ARM_BUILTIN_SETWCGR3:
> +      icode = (fcode == ARM_BUILTIN_SETWCGR0 ? CODE_FOR_iwmmxt_setwcgr0
> +              : fcode == ARM_BUILTIN_SETWCGR1 ? CODE_FOR_iwmmxt_setwcgr1
> +              : fcode == ARM_BUILTIN_SETWCGR2 ? CODE_FOR_iwmmxt_setwcgr2
> +              : CODE_FOR_iwmmxt_setwcgr3);
>       arg0 = CALL_EXPR_ARG (exp, 0);
> -      arg1 = CALL_EXPR_ARG (exp, 1);
> -      op0 = force_reg (SImode, expand_normal (arg0));
> -      op1 = expand_normal (arg1);
> -      emit_insn (gen_iwmmxt_tmcr (op1, op0));
> +      op0 = expand_normal (arg0);
> +      mode0 = insn_data[icode].operand[0].mode;
> +      if (!(*insn_data[icode].operand[0].predicate) (op0, mode0))
> +        op0 = copy_to_mode_reg (mode0, op0);
> +      pat = GEN_FCN (icode) (op0);
> +      if (!pat)
> +       return 0;
> +      emit_insn (pat);
>       return 0;
>
> -    case ARM_BUILTIN_GETWCX:
> -      arg0 = CALL_EXPR_ARG (exp, 0);
> -      op0 = expand_normal (arg0);
> -      target = gen_reg_rtx (SImode);
> -      emit_insn (gen_iwmmxt_tmrc (target, op0));
> +    case ARM_BUILTIN_GETWCGR0:
> +    case ARM_BUILTIN_GETWCGR1:
> +    case ARM_BUILTIN_GETWCGR2:
> +    case ARM_BUILTIN_GETWCGR3:
> +      icode = (fcode == ARM_BUILTIN_GETWCGR0 ? CODE_FOR_iwmmxt_getwcgr0
> +              : fcode == ARM_BUILTIN_GETWCGR1 ? CODE_FOR_iwmmxt_getwcgr1
> +              : fcode == ARM_BUILTIN_GETWCGR2 ? CODE_FOR_iwmmxt_getwcgr2
> +              : CODE_FOR_iwmmxt_getwcgr3);
> +      tmode = insn_data[icode].operand[0].mode;
> +      if (target == 0
> +         || GET_MODE (target) != tmode
> +         || !(*insn_data[icode].operand[0].predicate) (target, tmode))
> +        target = gen_reg_rtx (tmode);
> +      pat = GEN_FCN (icode) (target);
> +      if (!pat)
> +        return 0;
> +      emit_insn (pat);
>       return target;
>
>     case ARM_BUILTIN_WSHUFH:
> @@ -21485,10 +21806,12 @@ arm_expand_builtin (tree exp,
>        op0 = copy_to_mode_reg (mode1, op0);
>       if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
>        {
> -         /* @@@ better error message */
>          error ("mask must be an immediate");
>          return const0_rtx;
>        }
> +      selector = INTVAL (op1);
> +      if (selector < 0 || selector > 255)
> +       error ("the range of mask should be in 0 to 255");
>       if (target == 0
>          || GET_MODE (target) != tmode
>          || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
> @@ -21499,10 +21822,18 @@ arm_expand_builtin (tree exp,
>       emit_insn (pat);
>       return target;
>
> -    case ARM_BUILTIN_WSADB:
> -      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadb, exp, target);
> -    case ARM_BUILTIN_WSADH:
> -      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadh, exp, target);
> +    case ARM_BUILTIN_WMADDS:
> +      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmadds, exp, target);
> +    case ARM_BUILTIN_WMADDSX:
> +      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddsx, exp, target);
> +    case ARM_BUILTIN_WMADDSN:
> +      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddsn, exp, target);
> +    case ARM_BUILTIN_WMADDU:
> +      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddu, exp, target);
> +    case ARM_BUILTIN_WMADDUX:
> +      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddux, exp, target);
> +    case ARM_BUILTIN_WMADDUN:
> +      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddun, exp, target);
>     case ARM_BUILTIN_WSADBZ:
>       return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadbz, exp, target);
>     case ARM_BUILTIN_WSADHZ:
> @@ -21511,13 +21842,38 @@ arm_expand_builtin (tree exp,
>       /* Several three-argument builtins.  */
>     case ARM_BUILTIN_WMACS:
>     case ARM_BUILTIN_WMACU:
> -    case ARM_BUILTIN_WALIGN:
>     case ARM_BUILTIN_TMIA:
>     case ARM_BUILTIN_TMIAPH:
>     case ARM_BUILTIN_TMIATT:
>     case ARM_BUILTIN_TMIATB:
>     case ARM_BUILTIN_TMIABT:
>     case ARM_BUILTIN_TMIABB:
> +    case ARM_BUILTIN_WQMIABB:
> +    case ARM_BUILTIN_WQMIABT:
> +    case ARM_BUILTIN_WQMIATB:
> +    case ARM_BUILTIN_WQMIATT:
> +    case ARM_BUILTIN_WQMIABBN:
> +    case ARM_BUILTIN_WQMIABTN:
> +    case ARM_BUILTIN_WQMIATBN:
> +    case ARM_BUILTIN_WQMIATTN:
> +    case ARM_BUILTIN_WMIABB:
> +    case ARM_BUILTIN_WMIABT:
> +    case ARM_BUILTIN_WMIATB:
> +    case ARM_BUILTIN_WMIATT:
> +    case ARM_BUILTIN_WMIABBN:
> +    case ARM_BUILTIN_WMIABTN:
> +    case ARM_BUILTIN_WMIATBN:
> +    case ARM_BUILTIN_WMIATTN:
> +    case ARM_BUILTIN_WMIAWBB:
> +    case ARM_BUILTIN_WMIAWBT:
> +    case ARM_BUILTIN_WMIAWTB:
> +    case ARM_BUILTIN_WMIAWTT:
> +    case ARM_BUILTIN_WMIAWBBN:
> +    case ARM_BUILTIN_WMIAWBTN:
> +    case ARM_BUILTIN_WMIAWTBN:
> +    case ARM_BUILTIN_WMIAWTTN:
> +    case ARM_BUILTIN_WSADB:
> +    case ARM_BUILTIN_WSADH:
>       icode = (fcode == ARM_BUILTIN_WMACS ? CODE_FOR_iwmmxt_wmacs
>               : fcode == ARM_BUILTIN_WMACU ? CODE_FOR_iwmmxt_wmacu
>               : fcode == ARM_BUILTIN_TMIA ? CODE_FOR_iwmmxt_tmia
> @@ -21526,7 +21882,32 @@ arm_expand_builtin (tree exp,
>               : fcode == ARM_BUILTIN_TMIABT ? CODE_FOR_iwmmxt_tmiabt
>               : fcode == ARM_BUILTIN_TMIATB ? CODE_FOR_iwmmxt_tmiatb
>               : fcode == ARM_BUILTIN_TMIATT ? CODE_FOR_iwmmxt_tmiatt
> -              : CODE_FOR_iwmmxt_walign);
> +              : fcode == ARM_BUILTIN_WQMIABB ? CODE_FOR_iwmmxt_wqmiabb
> +              : fcode == ARM_BUILTIN_WQMIABT ? CODE_FOR_iwmmxt_wqmiabt
> +              : fcode == ARM_BUILTIN_WQMIATB ? CODE_FOR_iwmmxt_wqmiatb
> +              : fcode == ARM_BUILTIN_WQMIATT ? CODE_FOR_iwmmxt_wqmiatt
> +              : fcode == ARM_BUILTIN_WQMIABBN ? CODE_FOR_iwmmxt_wqmiabbn
> +              : fcode == ARM_BUILTIN_WQMIABTN ? CODE_FOR_iwmmxt_wqmiabtn
> +              : fcode == ARM_BUILTIN_WQMIATBN ? CODE_FOR_iwmmxt_wqmiatbn
> +              : fcode == ARM_BUILTIN_WQMIATTN ? CODE_FOR_iwmmxt_wqmiattn
> +              : fcode == ARM_BUILTIN_WMIABB ? CODE_FOR_iwmmxt_wmiabb
> +              : fcode == ARM_BUILTIN_WMIABT ? CODE_FOR_iwmmxt_wmiabt
> +              : fcode == ARM_BUILTIN_WMIATB ? CODE_FOR_iwmmxt_wmiatb
> +              : fcode == ARM_BUILTIN_WMIATT ? CODE_FOR_iwmmxt_wmiatt
> +              : fcode == ARM_BUILTIN_WMIABBN ? CODE_FOR_iwmmxt_wmiabbn
> +              : fcode == ARM_BUILTIN_WMIABTN ? CODE_FOR_iwmmxt_wmiabtn
> +              : fcode == ARM_BUILTIN_WMIATBN ? CODE_FOR_iwmmxt_wmiatbn
> +              : fcode == ARM_BUILTIN_WMIATTN ? CODE_FOR_iwmmxt_wmiattn
> +              : fcode == ARM_BUILTIN_WMIAWBB ? CODE_FOR_iwmmxt_wmiawbb
> +              : fcode == ARM_BUILTIN_WMIAWBT ? CODE_FOR_iwmmxt_wmiawbt
> +              : fcode == ARM_BUILTIN_WMIAWTB ? CODE_FOR_iwmmxt_wmiawtb
> +              : fcode == ARM_BUILTIN_WMIAWTT ? CODE_FOR_iwmmxt_wmiawtt
> +              : fcode == ARM_BUILTIN_WMIAWBBN ? CODE_FOR_iwmmxt_wmiawbbn
> +              : fcode == ARM_BUILTIN_WMIAWBTN ? CODE_FOR_iwmmxt_wmiawbtn
> +              : fcode == ARM_BUILTIN_WMIAWTBN ? CODE_FOR_iwmmxt_wmiawtbn
> +              : fcode == ARM_BUILTIN_WMIAWTTN ? CODE_FOR_iwmmxt_wmiawttn
> +              : fcode == ARM_BUILTIN_WSADB ? CODE_FOR_iwmmxt_wsadb
> +              : CODE_FOR_iwmmxt_wsadh);

Can this chunk be extracted into a table? Having such a long nested sequence of
ternary operators is just too gross.

>       arg0 = CALL_EXPR_ARG (exp, 0);
>       arg1 = CALL_EXPR_ARG (exp, 1);
>       arg2 = CALL_EXPR_ARG (exp, 2);
> @@ -21559,6 +21940,123 @@ arm_expand_builtin (tree exp,
>       emit_insn (gen_iwmmxt_clrdi (target));
>       return target;
>
> +    case ARM_BUILTIN_WSRLHI:
> +    case ARM_BUILTIN_WSRLWI:
> +    case ARM_BUILTIN_WSRLDI:
> +    case ARM_BUILTIN_WSLLHI:
> +    case ARM_BUILTIN_WSLLWI:
> +    case ARM_BUILTIN_WSLLDI:
> +    case ARM_BUILTIN_WSRAHI:
> +    case ARM_BUILTIN_WSRAWI:
> +    case ARM_BUILTIN_WSRADI:
> +    case ARM_BUILTIN_WRORHI:
> +    case ARM_BUILTIN_WRORWI:
> +    case ARM_BUILTIN_WRORDI:
> +    case ARM_BUILTIN_WSRLH:
> +    case ARM_BUILTIN_WSRLW:
> +    case ARM_BUILTIN_WSRLD:
> +    case ARM_BUILTIN_WSLLH:
> +    case ARM_BUILTIN_WSLLW:
> +    case ARM_BUILTIN_WSLLD:
> +    case ARM_BUILTIN_WSRAH:
> +    case ARM_BUILTIN_WSRAW:
> +    case ARM_BUILTIN_WSRAD:
> +    case ARM_BUILTIN_WRORH:
> +    case ARM_BUILTIN_WRORW:
> +    case ARM_BUILTIN_WRORD:
> +      icode = (fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
> +              : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
> +              : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
> +              : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
> +              : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
> +              : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
> +              : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
> +              : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
> +              : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
> +              : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
> +              : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
> +              : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
> +              : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
> +              : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
> +              : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
> +              : fcode == ARM_BUILTIN_WSLLH  ? CODE_FOR_ashlv4hi3_di
> +              : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
> +              : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
> +              : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
> +              : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
> +              : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
> +              : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
> +              : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
> +              : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
> +              : CODE_FOR_nothing);
> +      arg1 = CALL_EXPR_ARG (exp, 1);
> +      op1 = expand_normal (arg1);
> +      if (GET_MODE (op1) == VOIDmode)
> +       {
> +         imm = INTVAL (op1);
> +         if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORWI
> +              || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
> +             && (imm < 0 || imm > 32))
> +           {
> +             if (fcode == ARM_BUILTIN_WRORHI)
> +               error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_rori_pi16 in code.");
> +             else if (fcode == ARM_BUILTIN_WRORWI)
> +               error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_rori_pi32 in code.");
> +             else if (fcode == ARM_BUILTIN_WRORH)
> +               error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_ror_pi16 in code.");
> +             else
> +               error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_ror_pi32 in code.");
> +           }
> +         else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
> +                  && (imm < 0 || imm > 64))
> +           {
> +             if (fcode == ARM_BUILTIN_WRORDI)
> +               error ("the range of count should be in 0 to 64.  please check the intrinsic _mm_rori_si64 in code.");
> +             else
> +               error ("the range of count should be in 0 to 64.  please check the intrinsic _mm_ror_si64 in code.");
> +           }
> +         else if (imm < 0)
> +           {
> +             if (fcode == ARM_BUILTIN_WSRLHI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srli_pi16 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRLWI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srli_pi32 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRLDI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srli_si64 in code.");
> +             else if (fcode == ARM_BUILTIN_WSLLHI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_slli_pi16 in code.");
> +             else if (fcode == ARM_BUILTIN_WSLLWI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_slli_pi32 in code.");
> +             else if (fcode == ARM_BUILTIN_WSLLDI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_slli_si64 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRAHI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srai_pi16 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRAWI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srai_pi32 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRADI)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srai_si64 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRLH)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srl_pi16 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRLW)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srl_pi32 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRLD)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_srl_si64 in code.");
> +             else if (fcode == ARM_BUILTIN_WSLLH)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_sll_pi16 in code.");
> +             else if (fcode == ARM_BUILTIN_WSLLW)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_sll_pi32 in code.");
> +             else if (fcode == ARM_BUILTIN_WSLLD)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_sll_si64 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRAH)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_sra_pi16 in code.");
> +             else if (fcode == ARM_BUILTIN_WSRAW)
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_sra_pi32 in code.");
> +             else
> +               error ("the count should be no less than 0.  please check the intrinsic _mm_sra_si64 in code.");

Ugh. I'd really rather have a nicer way of doing this - wouldn't it
make more sense to extract this information from a table rather than
have such a sequence of nested ifs?
Is there a way we can get to the location of the expansion and give
better diagnostics using error_at? Can you try to organize the table
above to have this information as well and just index the error string
from there?

Also it would be nice to have some execute tests for some of these
intrinsics in the testsuite.

regards,
Ramana


> +           }
> +       }
> +      return arm_expand_binop_builtin (icode, exp, target);
> +
>     case ARM_BUILTIN_THREAD_POINTER:
>       return arm_load_tp (target);
>
> --
> 1.7.3.4
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
  2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
                   ` (4 preceding siblings ...)
  2012-05-29  4:15 ` [PATCH ARM iWMMXt 4/5] WMMX machine description Matt Turner
@ 2012-06-06 11:59 ` Ramana Radhakrishnan
  2012-06-11  9:24 ` nick clifton
  2012-06-13  7:36 ` nick clifton
  7 siblings, 0 replies; 33+ messages in thread
From: Ramana Radhakrishnan @ 2012-06-06 11:59 UTC (permalink / raw)
  To: Matt Turner
  Cc: gcc-patches, Richard Earnshaw, Nick Clifton, Paul Brook, Xinyu Qi

On 29 May 2012 05:13, Matt Turner <mattst88@gmail.com> wrote:
>
> This series was written by Marvell and sent by Xinyu Qi <xyqi@marvell.com>
> a number of times in the last year.
>
> We (One Laptop per Child) need these patches for reasonable iWMMXt support
> and performance. Without them, logical and shift intrinsics cause ICEs,
> see PR 35294 and its duplicates 36798 and 36966.
>
> The software compositing library pixman uses MMX intrinsics to optimize
> various compositing routines. The following are the minimum execution times
> of cairo-perf-trace graphics work loads without and with iWMMXt-optimized
> pixman for the image and image16 backends (32-bpp and 16-bpp respectively).
>
>                             image               image16
>           evolution   33.492 ->  29.590    30.334 ->  24.751
> firefox-planet-gnome  191.465 -> 173.835   211.297 -> 187.570
> gnome-system-monitor   51.956 ->  44.549    52.272 ->  40.525
>  gnome-terminal-vim   53.625 ->  54.554    47.593 ->  47.341
>      grads-heat-map    4.439 ->   4.165     4.548 ->   4.624
>       midori-zoomed   38.033 ->  28.500    38.576 ->  26.937
>             poppler   41.096 ->  31.949    41.230 ->  31.749
>  swfdec-giant-steps   20.062 ->  16.912    28.294 ->  17.286
>      swfdec-youtube   42.281 ->  37.335    52.848 ->  47.053
>   xfce4-terminal-a1   64.311 ->  51.011    62.592 ->  51.191
>
> We have cleaned up some white-space issues with the patches and fixed a
> small bug in patch 4/5 since the last time they were posted in December
> (added tandc,textrc,torc,torvsc to the "wtype" attribute)
>
> Please commit them for 4.8.

You do not mention how these patches have been tested with trunk after
you rebased them. I understand that you are using them in your port,
but can you specify how these were tested and what the results looked
like?

Ramana


* Re: [PATCH ARM iWMMXt 2/5] intrinsic head file change
  2012-05-29  4:15 ` [PATCH ARM iWMMXt 2/5] intrinsic head file change Matt Turner
@ 2012-06-06 12:22   ` Ramana Radhakrishnan
  0 siblings, 0 replies; 33+ messages in thread
From: Ramana Radhakrishnan @ 2012-06-06 12:22 UTC (permalink / raw)
  To: Matt Turner
  Cc: gcc-patches, Ramana Radhakrishnan, Richard Earnshaw,
	Nick Clifton, Paul Brook, Xinyu Qi

I've only had a brief look at this, so I will point out certain
stylistic issues that I noticed; I would like another set of eyes on
this and the next patch.


On 29 May 2012 05:13, Matt Turner <mattst88@gmail.com> wrote:
> From: Xinyu Qi <xyqi@marvell.com>
>
>        gcc/
>        * config/arm/mmintrin.h: Use __IWMMXT__ to enable iWMMXt intrinsics.
>        Use __IWMMXT2__ to enable iWMMXt2 intrinsics.
>        Use C name-mangling for intrinsics.
>        (__v8qi): Redefine.
>        (_mm_cvtsi32_si64, _mm_andnot_si64, _mm_sad_pu8): Revise.
>        (_mm_sad_pu16, _mm_align_si64, _mm_setwcx, _mm_getwcx): Likewise.
>        (_m_from_int): Likewise.
>        (_mm_sada_pu8, _mm_sada_pu16): New intrinsic.
>        (_mm_alignr0_si64, _mm_alignr1_si64, _mm_alignr2_si64): Likewise.
>        (_mm_alignr3_si64, _mm_tandcb, _mm_tandch, _mm_tandcw): Likewise.
>        (_mm_textrcb, _mm_textrch, _mm_textrcw, _mm_torcb): Likewise.
>        (_mm_torch, _mm_torcw, _mm_tbcst_pi8, _mm_tbcst_pi16): Likewise.
>        (_mm_tbcst_pi32): Likewise.
>        (_mm_abs_pi8, _mm_abs_pi16, _mm_abs_pi32): New iWMMXt2 intrinsic.
>        (_mm_addsubhx_pi16, _mm_absdiff_pu8, _mm_absdiff_pu16): Likewise.
>        (_mm_absdiff_pu32, _mm_addc_pu16, _mm_addc_pu32): Likewise.
>        (_mm_avg4_pu8, _mm_avg4r_pu8, _mm_maddx_pi16, _mm_maddx_pu16): Likewise.
>        (_mm_msub_pi16, _mm_msub_pu16, _mm_mulhi_pi32): Likewise.
>        (_mm_mulhi_pu32, _mm_mulhir_pi16, _mm_mulhir_pi32): Likewise.
>        (_mm_mulhir_pu16, _mm_mulhir_pu32, _mm_mullo_pi32): Likewise.
>        (_mm_qmulm_pi16, _mm_qmulm_pi32, _mm_qmulmr_pi16): Likewise.
>        (_mm_qmulmr_pi32, _mm_subaddhx_pi16, _mm_addbhusl_pu8): Likewise.
>        (_mm_addbhusm_pu8, _mm_qmiabb_pi32, _mm_qmiabbn_pi32): Likewise.
>        (_mm_qmiabt_pi32, _mm_qmiabtn_pi32, _mm_qmiatb_pi32): Likewise.
>        (_mm_qmiatbn_pi32, _mm_qmiatt_pi32, _mm_qmiattn_pi32): Likewise.
>        (_mm_wmiabb_si64, _mm_wmiabbn_si64, _mm_wmiabt_si64): Likewise.
>        (_mm_wmiabtn_si64, _mm_wmiatb_si64, _mm_wmiatbn_si64): Likewise.
>        (_mm_wmiatt_si64, _mm_wmiattn_si64, _mm_wmiawbb_si64): Likewise.
>        (_mm_wmiawbbn_si64, _mm_wmiawbt_si64, _mm_wmiawbtn_si64): Likewise.
>        (_mm_wmiawtb_si64, _mm_wmiawtbn_si64, _mm_wmiawtt_si64): Likewise.
>        (_mm_wmiawttn_si64, _mm_merge_si64): Likewise.
>        (_mm_torvscb, _mm_torvsch, _mm_torvscw): Likewise.
>        (_m_to_int): New define.
> ---
>  gcc/config/arm/mmintrin.h |  649 ++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 614 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
> index 2cc500d..0fe551d 100644
> --- a/gcc/config/arm/mmintrin.h
> +++ b/gcc/config/arm/mmintrin.h
> @@ -24,16 +24,30 @@
>  #ifndef _MMINTRIN_H_INCLUDED
>  #define _MMINTRIN_H_INCLUDED
>
> +#ifndef __IWMMXT__
> +#error You must enable WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2) to use iWMMXt/iWMMXt2 intrinsics
> +#else
> +
> +#ifndef __IWMMXT2__
> +#warning You only enable iWMMXt intrinsics. Extended iWMMXt2 intrinsics available only if WMMX2 instructions enabled (e.g. -march=iwmmxt2)
> +#endif
> +

Extra newline.

> +
> +#if defined __cplusplus
> +extern "C" { /* Begin "C" */
> +/* Intrinsics use C name-mangling.  */
> +#endif /* __cplusplus */
> +
>  /* The data type intended for user use.  */
>  typedef unsigned long long __m64, __int64;
>
>  /* Internal data types for implementing the intrinsics.  */
>  typedef int __v2si __attribute__ ((vector_size (8)));
>  typedef short __v4hi __attribute__ ((vector_size (8)));
> -typedef char __v8qi __attribute__ ((vector_size (8)));
> +typedef signed char __v8qi __attribute__ ((vector_size (8)));
>
>  /* "Convert" __m64 and __int64 into each other.  */
> -static __inline __m64
> +static __inline __m64
>  _mm_cvtsi64_m64 (__int64 __i)
>  {
>   return __i;
> @@ -54,7 +68,7 @@ _mm_cvtsi64_si32 (__int64 __i)
>  static __inline __int64
>  _mm_cvtsi32_si64 (int __i)
>  {
> -  return __i;
> +  return (__i & 0xffffffff);
>  }
>
>  /* Pack the four 16-bit values from M1 into the lower four 8-bit values of
> @@ -603,7 +617,7 @@ _mm_and_si64 (__m64 __m1, __m64 __m2)
>  static __inline __m64
>  _mm_andnot_si64 (__m64 __m1, __m64 __m2)
>  {
> -  return __builtin_arm_wandn (__m1, __m2);
> +  return __builtin_arm_wandn (__m2, __m1);
>  }
>
>  /* Bit-wise inclusive OR the 64-bit values in M1 and M2.  */
> @@ -935,7 +949,13 @@ _mm_avg2_pu16 (__m64 __A, __m64 __B)
>  static __inline __m64
>  _mm_sad_pu8 (__m64 __A, __m64 __B)
>  {
> -  return (__m64) __builtin_arm_wsadb ((__v8qi)__A, (__v8qi)__B);
> +  return (__m64) __builtin_arm_wsadbz ((__v8qi)__A, (__v8qi)__B);
> +}
> +
> +static __inline __m64
> +_mm_sada_pu8 (__m64 __A, __m64 __B, __m64 __C)
> +{
> +  return (__m64) __builtin_arm_wsadb ((__v2si)__A, (__v8qi)__B, (__v8qi)__C);
>  }
>
>  /* Compute the sum of the absolute differences of the unsigned 16-bit
> @@ -944,9 +964,16 @@ _mm_sad_pu8 (__m64 __A, __m64 __B)
>  static __inline __m64
>  _mm_sad_pu16 (__m64 __A, __m64 __B)
>  {
> -  return (__m64) __builtin_arm_wsadh ((__v4hi)__A, (__v4hi)__B);
> +  return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
>  }
>
> +static __inline __m64
> +_mm_sada_pu16 (__m64 __A, __m64 __B, __m64 __C)
> +{
> +  return (__m64) __builtin_arm_wsadh ((__v2si)__A, (__v4hi)__B, (__v4hi)__C);
> +}
> +
> +
>  /* Compute the sum of the absolute differences of the unsigned 8-bit
>    values in A and B.  Return the value in the lower 16-bit word; the
>    upper words are cleared.  */
> @@ -965,11 +992,8 @@ _mm_sadz_pu16 (__m64 __A, __m64 __B)
>   return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
>  }
>
> -static __inline __m64
> -_mm_align_si64 (__m64 __A, __m64 __B, int __C)
> -{
> -  return (__m64) __builtin_arm_walign ((__v8qi)__A, (__v8qi)__B, __C);
> -}
> +#define _mm_align_si64(__A,__B, N) \
> +  (__m64) __builtin_arm_walign ((__v8qi) (__A),(__v8qi) (__B), (N))
>
>  /* Creates a 64-bit zero.  */
>  static __inline __m64
> @@ -987,42 +1011,76 @@ _mm_setwcx (const int __value, const int __regno)
>  {
>   switch (__regno)
>     {
> -    case 0:  __builtin_arm_setwcx (__value, 0); break;
> -    case 1:  __builtin_arm_setwcx (__value, 1); break;
> -    case 2:  __builtin_arm_setwcx (__value, 2); break;
> -    case 3:  __builtin_arm_setwcx (__value, 3); break;
> -    case 8:  __builtin_arm_setwcx (__value, 8); break;
> -    case 9:  __builtin_arm_setwcx (__value, 9); break;
> -    case 10: __builtin_arm_setwcx (__value, 10); break;
> -    case 11: __builtin_arm_setwcx (__value, 11); break;
> -    default: break;
> +    case 0:
> +      __asm __volatile ("tmcr wcid, %0" :: "r"(__value));
> +      break;
> +    case 1:
> +      __asm __volatile ("tmcr wcon, %0" :: "r"(__value));
> +      break;
> +    case 2:
> +      __asm __volatile ("tmcr wcssf, %0" :: "r"(__value));
> +      break;
> +    case 3:
> +      __asm __volatile ("tmcr wcasf, %0" :: "r"(__value));
> +      break;
> +    case 8:
> +      __builtin_arm_setwcgr0 (__value);
> +      break;
> +    case 9:
> +      __builtin_arm_setwcgr1 (__value);
> +      break;
> +    case 10:
> +      __builtin_arm_setwcgr2 (__value);
> +      break;
> +    case 11:
> +      __builtin_arm_setwcgr3 (__value);
> +      break;
> +    default:
> +      break;
>     }
>  }
>
>  static __inline int
>  _mm_getwcx (const int __regno)
>  {
> +  int __value;
>   switch (__regno)
>     {
> -    case 0:  return __builtin_arm_getwcx (0);
> -    case 1:  return __builtin_arm_getwcx (1);
> -    case 2:  return __builtin_arm_getwcx (2);
> -    case 3:  return __builtin_arm_getwcx (3);
> -    case 8:  return __builtin_arm_getwcx (8);
> -    case 9:  return __builtin_arm_getwcx (9);
> -    case 10: return __builtin_arm_getwcx (10);
> -    case 11: return __builtin_arm_getwcx (11);
> -    default: return 0;
> +    case 0:
> +      __asm __volatile ("tmrc %0, wcid" : "=r"(__value));
> +      break;
> +    case 1:
> +      __asm __volatile ("tmrc %0, wcon" : "=r"(__value));
> +      break;
> +    case 2:
> +      __asm __volatile ("tmrc %0, wcssf" : "=r"(__value));
> +      break;
> +    case 3:
> +      __asm __volatile ("tmrc %0, wcasf" : "=r"(__value));
> +      break;
> +    case 8:
> +      return __builtin_arm_getwcgr0 ();
> +    case 9:
> +      return __builtin_arm_getwcgr1 ();
> +    case 10:
> +      return __builtin_arm_getwcgr2 ();
> +    case 11:
> +      return __builtin_arm_getwcgr3 ();
> +    default:
> +      break;
>     }
> +  return __value;
>  }
>
>  /* Creates a vector of two 32-bit values; I0 is least significant.  */
>  static __inline __m64
>  _mm_set_pi32 (int __i1, int __i0)
>  {
> -  union {
> +  union
> +  {
>     __m64 __q;
> -    struct {
> +    struct
> +    {
>       unsigned int __i0;
>       unsigned int __i1;
>     } __s;
> @@ -1041,7 +1099,7 @@ _mm_set_pi16 (short __w3, short __w2, short __w1, short __w0)
>   unsigned int __i1 = (unsigned short)__w3 << 16 | (unsigned short)__w2;
>   unsigned int __i0 = (unsigned short)__w1 << 16 | (unsigned short)__w0;
>   return _mm_set_pi32 (__i1, __i0);
> -
> +
Extra newline again here.
>  }
>
>  /* Creates a vector of eight 8-bit values; B0 is least significant.  */
> @@ -1108,11 +1166,526 @@ _mm_set1_pi8 (char __b)
>   return _mm_set1_pi32 (__i);
>  }
>
> -/* Convert an integer to a __m64 object.  */
> +#ifdef __IWMMXT2__
> +static __inline __m64
> +_mm_abs_pi8 (__m64 m1)
> +{
> +  return (__m64) __builtin_arm_wabsb ((__v8qi)m1);
> +}
> +
> +static __inline __m64
> +_mm_abs_pi16 (__m64 m1)
> +{
> +  return (__m64) __builtin_arm_wabsh ((__v4hi)m1);
> +

And here.

> +}
> +
> +static __inline __m64
> +_mm_abs_pi32 (__m64 m1)
> +{
> +  return (__m64) __builtin_arm_wabsw ((__v2si)m1);
> +
and here.

<large part snipped.>

> +
> +#define _mm_qmiabb_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiabb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiabbn_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiabbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiabt_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiabt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiabtn_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc=acc;\
> +   __m64 _m1=m1;\
> +   __m64 _m2=m2;\
> +   _acc = (__m64) __builtin_arm_wqmiabtn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiatb_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiatb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiatbn_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiatbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiatt_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiatt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiattn_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiattn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiabb_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiabb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiabbn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiabbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiabt_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiabt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiabtn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiabtn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiatb_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiatb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiatbn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiatbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiatt_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiatt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiattn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiattn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawbb_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawbb (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawbbn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawbbn (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawbt_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawbt (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawbtn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawbtn (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawtb_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawtb (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawtbn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawtbn (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawtt_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawtt (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawttn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawttn (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })

I assume someone knows why these are macros and not inline functions
like the others ?


> +
> +/* The third arguments should be an immediate.  */

s/arguments/argument

> +#define _mm_merge_si64(a, b, n) \
> +  ({\
> +   __m64 result;\
> +   result = (__m64) __builtin_arm_wmerge ((__m64) (a), (__m64) (b), (n));\
> +   result;\
> +   })
> +#endif  /* __IWMMXT2__ */
> +


* Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
  2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
                   ` (5 preceding siblings ...)
  2012-06-06 11:59 ` [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Ramana Radhakrishnan
@ 2012-06-11  9:24 ` nick clifton
  2012-06-13  7:36 ` nick clifton
  7 siblings, 0 replies; 33+ messages in thread
From: nick clifton @ 2012-06-11  9:24 UTC (permalink / raw)
  To: Matt Turner
  Cc: gcc-patches, Ramana Radhakrishnan, Richard Earnshaw, Paul Brook,
	Xinyu Qi

Hi Matt,

   This is just to let you know that I am currently reviewing these
patches.  I do have a problem, however: with the patches applied I am
seeing very bad results from the gcc testsuite when run with
-mcpu=iwmmxt (bad as in the testsuite takes days to run and most tests
fail).  I am currently looking into this, but it may take me some time
to track the problem down.

Cheers
   Nick


* Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
  2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
                   ` (6 preceding siblings ...)
  2012-06-11  9:24 ` nick clifton
@ 2012-06-13  7:36 ` nick clifton
  2012-06-13 15:31   ` Matt Turner
  7 siblings, 1 reply; 33+ messages in thread
From: nick clifton @ 2012-06-13  7:36 UTC (permalink / raw)
  To: Matt Turner, Xinyu Qi
  Cc: gcc-patches, Ramana Radhakrishnan, Richard Earnshaw, Paul Brook

Hi Matt, Hi Xinyu,

> This series was written by Marvell and sent by Xinyu Qi<xyqi@marvell.com>
> a number of times in the last year.

Sorry for the long delay in reviewing these patches.  Overall they were 
fine, with only a few, very minor, formatting issues.  I have committed 
the entire series of patches to the mainline.

> For 4.7 and 4.6 please consider committing my patch
> "[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)."
> which only fixes the logical and shift intrinsics.

I will look at this and post separately about it.

Cheers
   Nick


* Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
  2012-06-13  7:36 ` nick clifton
@ 2012-06-13 15:31   ` Matt Turner
  2012-06-26 15:20     ` nick clifton
  0 siblings, 1 reply; 33+ messages in thread
From: Matt Turner @ 2012-06-13 15:31 UTC (permalink / raw)
  To: nick clifton
  Cc: Xinyu Qi, gcc-patches, Ramana Radhakrishnan, Richard Earnshaw,
	Paul Brook

On Wed, Jun 13, 2012 at 3:26 AM, nick clifton <nickc@redhat.com> wrote:
> Hi Matt, Hi Xinyu,
>
>
>> This series was written by Marvell and sent by Xinyu Qi<xyqi@marvell.com>
>> a number of times in the last year.
>
>
> Sorry for the long delay in reviewing these patches.  Overall they were
> fine, with only a few, very minor, formatting issues.  I have committed the
> entire series of patches to the mainline.

Great! Thank you so much! Thanks to Ramana for the reviews!

>> For 4.7 and 4.6 please consider committing my patch
>> "[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)."
>> which only fixes the logical and shift intrinsics.

Sounds good.

There's also a trivial documentation fix:

[PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation

and a test to exercise the intrinsics:

[PATCH 2/2] arm: add iwMMXt mmx-2.c test

Thanks a lot!

Matt


* Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
  2012-06-13 15:31   ` Matt Turner
@ 2012-06-26 15:20     ` nick clifton
  2012-06-27 19:15       ` Matt Turner
  2013-01-28  3:49       ` Matt Turner
  0 siblings, 2 replies; 33+ messages in thread
From: nick clifton @ 2012-06-26 15:20 UTC (permalink / raw)
  To: Matt Turner
  Cc: Xinyu Qi, gcc-patches, Ramana Radhakrishnan, Richard Earnshaw,
	Paul Brook

Hi Matt,

> There's also a trivial documentation fix:
>
> [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation
>
> and a test to exercise the intrinsics:
>
> [PATCH 2/2] arm: add iwMMXt mmx-2.c test

These have both been checked in.

It turns out that both needed minor updates as some of the builtins have 
changed since these patches were written.  I have taken care of this 
however.

Cheers
   Nick


* Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
  2012-06-26 15:20     ` nick clifton
@ 2012-06-27 19:15       ` Matt Turner
  2013-01-28  3:49       ` Matt Turner
  1 sibling, 0 replies; 33+ messages in thread
From: Matt Turner @ 2012-06-27 19:15 UTC (permalink / raw)
  To: nick clifton
  Cc: Xinyu Qi, gcc-patches, Ramana Radhakrishnan, Richard Earnshaw,
	Paul Brook

On Tue, Jun 26, 2012 at 10:56 AM, nick clifton <nickc@redhat.com> wrote:
> Hi Matt,
>
>
>> There's also a trivial documentation fix:
>>
>> [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation
>>
>> and a test to exercise the intrinsics:
>>
>> [PATCH 2/2] arm: add iwMMXt mmx-2.c test
>
>
> These have both been checked in.
>
> It turns out that both needed minor updates as some of the builtins have
> changed since these patches were written.  I have taken care of this
> however.
>
> Cheers
>  Nick

Thanks a lot, Nick!


* [PATCH, ARM, iWMMXT] Fix define_constants for WCGR
  2012-06-06 11:53   ` Ramana Radhakrishnan
@ 2012-12-27  2:31     ` Xinyu Qi
  2013-01-22  9:22     ` [PING][PATCH, " Xinyu Qi
  1 sibling, 0 replies; 33+ messages in thread
From: Xinyu Qi @ 2012-12-27  2:31 UTC (permalink / raw)
  To: gcc-patches

Hi,

  It is necessary to keep the constants WCGR0 to WCGR3 in iwmmxt.md
in sync with FIRST_IWMMXT_GR_REGNUM in arm.h.

ChangeLog
	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
	* config/arm/iwmmxt.md (WCGR0, WCGR1): Update.
	* config/arm/iwmmxt.md (WCGR2, WCGR3): Likewise.

Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 194603)
+++ config/arm/arm.h	(working copy)
@@ -947,6 +947,8 @@
 
 #define FIRST_IWMMXT_REGNUM	(LAST_HI_VFP_REGNUM + 1)
 #define LAST_IWMMXT_REGNUM	(FIRST_IWMMXT_REGNUM + 15)
+
+/* Need to sync with WCGR in iwmmxt.md.  */
 #define FIRST_IWMMXT_GR_REGNUM	(LAST_IWMMXT_REGNUM + 1)
 #define LAST_IWMMXT_GR_REGNUM	(FIRST_IWMMXT_GR_REGNUM + 3)
 
Index: config/arm/iwmmxt.md
===================================================================
--- config/arm/iwmmxt.md	(revision 194603)
+++ config/arm/iwmmxt.md	(working copy)
@@ -19,12 +19,12 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-;; Register numbers
+;; Register numbers. Need to sync with FIRST_IWMMXT_GR_REGNUM in arm.h
 (define_constants
-  [(WCGR0           43)
-   (WCGR1           44)
-   (WCGR2           45)
-   (WCGR3           46)
+  [(WCGR0           96)
+   (WCGR1           97)
+   (WCGR2           98)
+   (WCGR3           99)
   ]
 )


OK?

Thanks,
Xinyu


* RE: [PING][PATCH, ARM, iWMMXT] Fix define_constants for WCGR
  2012-06-06 11:53   ` Ramana Radhakrishnan
  2012-12-27  2:31     ` [PATCH, ARM, iWMMXT] Fix define_constants for WCGR Xinyu Qi
@ 2013-01-22  9:22     ` Xinyu Qi
  2013-01-22 11:59       ` Ramana Radhakrishnan
  1 sibling, 1 reply; 33+ messages in thread
From: Xinyu Qi @ 2013-01-22  9:22 UTC (permalink / raw)
  To: gcc-patches

Ping,

Fix ChangeLog
	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
	* config/arm/iwmmxt.md (WCGR0): Update.
	 (WCGR1, WCGR2, WCGR3): Likewise.

> Hi,
> 
>   It is necessary to sync the constants WCGR0 to WCGR3 in iwmmxt.md with
> the IWMMXT_GR_REGNUM in arm.h.
> 
> ChangeLog
> 	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
> 	* config/arm/iwmmxt.md (WCGR0, WCGR1): Update.
> 	* config/arm/iwmmxt.md (WCGR2, WCGR3): Likewise.
> 
> Index: config/arm/arm.h
> ================================================================
> ===
> --- config/arm/arm.h	(revision 194603)
> +++ config/arm/arm.h	(working copy)
> @@ -947,6 +947,8 @@
> 
>  #define FIRST_IWMMXT_REGNUM	(LAST_HI_VFP_REGNUM + 1)
>  #define LAST_IWMMXT_REGNUM	(FIRST_IWMMXT_REGNUM + 15)
> +
> +/* Need to sync with WCGR in iwmmxt.md.  */
>  #define FIRST_IWMMXT_GR_REGNUM	(LAST_IWMMXT_REGNUM + 1)
>  #define LAST_IWMMXT_GR_REGNUM	(FIRST_IWMMXT_GR_REGNUM +
> 3)
> 
> Index: config/arm/iwmmxt.md
> ================================================================
> ===
> --- config/arm/iwmmxt.md	(revision 194603)
> +++ config/arm/iwmmxt.md	(working copy)
> @@ -19,12 +19,12 @@
>  ;; along with GCC; see the file COPYING3.  If not see  ;;
> <http://www.gnu.org/licenses/>.
> 
> -;; Register numbers
> +;; Register numbers. Need to sync with FIRST_IWMMXT_GR_REGNUM in
> arm.h
>  (define_constants
> -  [(WCGR0           43)
> -   (WCGR1           44)
> -   (WCGR2           45)
> -   (WCGR3           46)
> +  [(WCGR0           96)
> +   (WCGR1           97)
> +   (WCGR2           98)
> +   (WCGR3           99)
>    ]
>  )
> 
> 
> OK?
> 
> Thanks,
> Xinyu


* Re: [PING][PATCH, ARM, iWMMXT] Fix define_constants for WCGR
  2013-01-22  9:22     ` [PING][PATCH, " Xinyu Qi
@ 2013-01-22 11:59       ` Ramana Radhakrishnan
  2013-01-22 13:34         ` Andreas Schwab
                           ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Ramana Radhakrishnan @ 2013-01-22 11:59 UTC (permalink / raw)
  To: Xinyu Qi; +Cc: gcc-patches

On 01/22/13 09:21, Xinyu Qi wrote:
> Ping,
>
> Fix ChangeLog

The ChangeLog format includes:

<date>  <Author's name>  <a.b@c.com>

If you want a patch accepted in the future, please help by creating the 
ChangeLog entry in the correct format, i.e. fill in the author's name 
and email address as shown below. I've created an entry for you. Please 
remember to do so for every patch you submit - thanks.

<DATE>  Xinyu Qi  <xyqi@marvell.com>

	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
	* config/arm/iwmmxt.md (WCGR0): Update.
	(WCGR1, WCGR2, WCGR3): Likewise.

The patch by itself is OK but surprisingly I never saw this earlier. 
Your ping has removed the date from the original post so I couldn't 
track it down.

Anyway, please apply.


regards,
Ramana



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PING][PATCH, ARM, iWMMXT] Fix define_constants for WCGR
  2013-01-22 11:59       ` Ramana Radhakrishnan
@ 2013-01-22 13:34         ` Andreas Schwab
  2013-01-23  6:08         ` Xinyu Qi
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Andreas Schwab @ 2013-01-22 13:34 UTC (permalink / raw)
  To: ramrad01; +Cc: Xinyu Qi, gcc-patches

Ramana Radhakrishnan <ramrad01@arm.com> writes:

> The patch by itself is OK but surprisingly I never saw this earlier. Your
> ping has removed the date from the original post so I couldn't track it
> down.

You can follow the references and look up the message-id via
http://mid.gmane.org/<msg-id>.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PING][PATCH, ARM, iWMMXT] Fix define_constants for WCGR
  2013-01-22 11:59       ` Ramana Radhakrishnan
  2013-01-22 13:34         ` Andreas Schwab
@ 2013-01-23  6:08         ` Xinyu Qi
  2013-01-31  8:49         ` [PATCH, " Xinyu Qi
  2013-03-20  2:43         ` Xinyu Qi
  3 siblings, 0 replies; 33+ messages in thread
From: Xinyu Qi @ 2013-01-23  6:08 UTC (permalink / raw)
  To: ramrad01; +Cc: gcc-patches

At 2013-01-22 19:58:43,"Ramana Radhakrishnan" <ramrad01@arm.com> wrote:
> On 01/22/13 09:21, Xinyu Qi wrote:
> > Ping,
> >
> > Fix ChangeLog
> 
> The ChangeLog format includes .
> 
> <date>  <Author's name>  <a.b@c.com>
> 
> If you want a patch accepted in the future, please help by creating the
> Changelog entry in the correct format, i.e. fill in the author's name as well as
> email address as below. I've created an entry as below. Please remember to do
> so for every patch you submit - thanks.
> 
> <DATE>  Xinyu Qi  <xyqi@marvell.com>
> 
> 	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
> 	* config/arm/iwmmxt.md (WCGR0): Update.
> 	(WCGR1, WCGR2, WCGR3): Likewise.
> 
> The patch by itself is OK but surprisingly I never saw this earlier.
> Your ping has removed the date from the original post so I couldn't track it
> down.

Hi Ramana,

Thanks for reviewing.
I forgot to keep the date; the original post was on Wed, 26 Dec 2012.
You can find it at
http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01418.html
I will remember to use the correct ChangeLog entry format next time.

> 
> Anyway, please apply.

BTW, since I have no write access, would you mind helping to check in this patch?

Thanks!
Xinyu

> 
> 
> regards,
> Ramana
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
  2012-06-26 15:20     ` nick clifton
  2012-06-27 19:15       ` Matt Turner
@ 2013-01-28  3:49       ` Matt Turner
  2013-01-28 15:11         ` nick clifton
  1 sibling, 1 reply; 33+ messages in thread
From: Matt Turner @ 2013-01-28  3:49 UTC (permalink / raw)
  To: nick clifton
  Cc: Xinyu Qi, gcc-patches, Ramana Radhakrishnan, Richard Earnshaw,
	Paul Brook

On Tue, Jun 26, 2012 at 7:56 AM, nick clifton <nickc@redhat.com> wrote:
> Hi Matt,
>
>
>> There's also a trivial documentation fix:
>>
>> [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation
>>
>> and a test to exercise the intrinsics:
>>
>> [PATCH 2/2] arm: add iwMMXt mmx-2.c test
>
>
> These have both been checked in.
>
> It turns out that both needed minor updates as some of the builtins have
> changed since these patches were written.  I have taken care of this
> however.
>
> Cheers
>   Nick

Hi Nick,

Could this patch, or perhaps the much smaller one I attached to bug
35294 be committed to the 4.7 branch?

Also, could you close its duplicates, bugs 36798 and 36966?

Thanks,
Matt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
  2013-01-28  3:49       ` Matt Turner
@ 2013-01-28 15:11         ` nick clifton
  2013-02-21  2:35           ` closing PR's (was Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support) Hans-Peter Nilsson
  0 siblings, 1 reply; 33+ messages in thread
From: nick clifton @ 2013-01-28 15:11 UTC (permalink / raw)
  To: Matt Turner
  Cc: Xinyu Qi, gcc-patches, Ramana Radhakrishnan, Richard Earnshaw,
	Paul Brook

Hi Matt,

> Could this patch, or perhaps the much smaller one I attached to bug
> 35294 be committed to the 4.7 branch?

Yes.  Done.

> Also, could you close its duplicates, bugs 36798 and 36966?

Sorry no.  I do not actually own these PRs, so I cannot close them. :-(

Cheers
   Nick


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH, ARM, iWMMXT] Fix define_constants for WCGR
  2013-01-22 11:59       ` Ramana Radhakrishnan
  2013-01-22 13:34         ` Andreas Schwab
  2013-01-23  6:08         ` Xinyu Qi
@ 2013-01-31  8:49         ` Xinyu Qi
  2013-03-20  2:43         ` Xinyu Qi
  3 siblings, 0 replies; 33+ messages in thread
From: Xinyu Qi @ 2013-01-31  8:49 UTC (permalink / raw)
  To: nick clifton; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]

At 2013-01-22 19:58:43,"Ramana Radhakrishnan" <ramrad01@arm.com> wrote:> 
> On 01/22/13 09:21, Xinyu Qi wrote:
> > Ping,
> >
> > Fix ChangeLog
> 
> The ChangeLog format includes .
> 
> <date>  <Author's name>  <a.b@c.com>
> 
> If you want a patch accepted in the future, please help by creating the
> Changelog entry in the correct format, i.e. fill in the author's name as well as
> email address as below. I've created an entry as below. Please remember to do
> so for every patch you submit - thanks.
> 
> <DATE>  Xinyu Qi  <xyqi@marvell.com>
> 
> 	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
> 	* config/arm/iwmmxt.md (WCGR0): Update.
> 	(WCGR1, WCGR2, WCGR3): Likewise.
> 
> The patch by itself is OK but surprisingly I never saw this earlier.
> Your ping has removed the date from the original post so I couldn't track it
> down.
> 
> Anyway, please apply.
> 
> 
> regards,
> Ramana
> 
> 

Hi Nick,

Since I have no write access, would you mind helping to check in this patch, which has already been approved?
The patch is attached.

ChangeLog
2013-01-31  Xinyu Qi  <xyqi@marvell.com>

	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
	* config/arm/iwmmxt.md (WCGR0): Update.
	(WCGR1, WCGR2, WCGR3): Likewise.

Thanks,
Xinyu

[-- Attachment #2: WCGR.diff --]
[-- Type: application/octet-stream, Size: 1102 bytes --]

Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h	(revision 195599)
+++ gcc/config/arm/arm.h	(working copy)
@@ -945,6 +945,8 @@
 
 #define FIRST_IWMMXT_REGNUM	(LAST_HI_VFP_REGNUM + 1)
 #define LAST_IWMMXT_REGNUM	(FIRST_IWMMXT_REGNUM + 15)
+
+/* Need to sync with WCGR in iwmmxt.md.  */
 #define FIRST_IWMMXT_GR_REGNUM	(LAST_IWMMXT_REGNUM + 1)
 #define LAST_IWMMXT_GR_REGNUM	(FIRST_IWMMXT_GR_REGNUM + 3)
 
Index: gcc/config/arm/iwmmxt.md
===================================================================
--- gcc/config/arm/iwmmxt.md	(revision 195599)
+++ gcc/config/arm/iwmmxt.md	(working copy)
@@ -18,12 +18,12 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-;; Register numbers
+;; Register numbers. Need to sync with FIRST_IWMMXT_GR_REGNUM in arm.h
 (define_constants
-  [(WCGR0           43)
-   (WCGR1           44)
-   (WCGR2           45)
-   (WCGR3           46)
+  [(WCGR0           96)
+   (WCGR1           97)
+   (WCGR2           98)
+   (WCGR3           99)
   ]
 )
 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* closing PR's (was Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support)
  2013-01-28 15:11         ` nick clifton
@ 2013-02-21  2:35           ` Hans-Peter Nilsson
  2013-02-22 12:42             ` nick clifton
  0 siblings, 1 reply; 33+ messages in thread
From: Hans-Peter Nilsson @ 2013-02-21  2:35 UTC (permalink / raw)
  To: nick clifton; +Cc: gcc-patches

On Mon, 28 Jan 2013, nick clifton wrote:
> > Also, could you close its duplicates, bugs 36798 and 36966?
>
> Sorry no.  I do not actually own these PRs, so I cannot close them. :-(

Sorry if I misinterpret, but it seems a reminder is in order:
magic powers are attached to whomever@gcc.gnu.org accounts in
bugzilla, so when people use them instead of their
whomever@employer.example.com, they are able to close PR's they
haven't created.

brgds, H-P

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: closing PR's (was Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support)
  2013-02-21  2:35           ` closing PR's (was Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support) Hans-Peter Nilsson
@ 2013-02-22 12:42             ` nick clifton
  0 siblings, 0 replies; 33+ messages in thread
From: nick clifton @ 2013-02-22 12:42 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc-patches

Hi Hans-Peter,
> Sorry if I misinterpret, but it seems a reminder is in order:
> magic powers are attached to whomever@gcc.gnu.org accounts in
> bugzilla, so when people use them instead of their
> whomever@employer.example.com, they are able to close PR's they
> haven't created.

Ah - thank you, I did not know that.  I have now logged in using that 
address and closed the requested PR.

Cheers
   Nick

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH, ARM, iWMMXT] Fix define_constants for WCGR
  2013-01-22 11:59       ` Ramana Radhakrishnan
                           ` (2 preceding siblings ...)
  2013-01-31  8:49         ` [PATCH, " Xinyu Qi
@ 2013-03-20  2:43         ` Xinyu Qi
  2013-03-26 14:01           ` Ramana Radhakrishnan
  3 siblings, 1 reply; 33+ messages in thread
From: Xinyu Qi @ 2013-03-20  2:43 UTC (permalink / raw)
  To: ramrad01; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]

>At 2013-01-22 19:58:43,"Ramana Radhakrishnan" <ramrad01@arm.com> wrote:>
> > On 01/22/13 09:21, Xinyu Qi wrote:
> > > Ping,
> > >
> > > Fix ChangeLog
> >
> > The ChangeLog format includes .
> >
> > <date>  <Author's name>  <a.b@c.com>
> >
> > If you want a patch accepted in the future, please help by creating
> > the Changelog entry in the correct format, i.e. fill in the author's
> > name as well as email address as below. I've created an entry as
> > below. Please remember to do so for every patch you submit - thanks.
> >
> > <DATE>  Xinyu Qi  <xyqi@marvell.com>
> >
> > 	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
> > 	* config/arm/iwmmxt.md (WCGR0): Update.
> > 	(WCGR1, WCGR2, WCGR3): Likewise.
> >
> > The patch by itself is OK but surprisingly I never saw this earlier.
> > Your ping has removed the date from the original post so I couldn't
> > track it down.
> >
> > Anyway, please apply.
> >
> >
> > regards,
> > Ramana
> >
> >
> 
Hi Ramana,

Since I have no write access, would you mind helping to check in this patch?
The patch is attached.

ChangeLog
2013-01-31  Xinyu Qi  <xyqi@marvell.com>

	* config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
	* config/arm/iwmmxt.md (WCGR0): Update.
	(WCGR1, WCGR2, WCGR3): Likewise.

Thanks,
Xinyu

[-- Attachment #2: WCGR.DIFF --]
[-- Type: application/octet-stream, Size: 1102 bytes --]

Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h	(revision 195599)
+++ gcc/config/arm/arm.h	(working copy)
@@ -945,6 +945,8 @@
 
 #define FIRST_IWMMXT_REGNUM	(LAST_HI_VFP_REGNUM + 1)
 #define LAST_IWMMXT_REGNUM	(FIRST_IWMMXT_REGNUM + 15)
+
+/* Need to sync with WCGR in iwmmxt.md.  */
 #define FIRST_IWMMXT_GR_REGNUM	(LAST_IWMMXT_REGNUM + 1)
 #define LAST_IWMMXT_GR_REGNUM	(FIRST_IWMMXT_GR_REGNUM + 3)
 
Index: gcc/config/arm/iwmmxt.md
===================================================================
--- gcc/config/arm/iwmmxt.md	(revision 195599)
+++ gcc/config/arm/iwmmxt.md	(working copy)
@@ -18,12 +18,12 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-;; Register numbers
+;; Register numbers. Need to sync with FIRST_IWMMXT_GR_REGNUM in arm.h
 (define_constants
-  [(WCGR0           43)
-   (WCGR1           44)
-   (WCGR2           45)
-   (WCGR3           46)
+  [(WCGR0           96)
+   (WCGR1           97)
+   (WCGR2           98)
+   (WCGR3           99)
   ]
 )
 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH, ARM, iWMMXT] Fix define_constants for WCGR
  2013-03-20  2:43         ` Xinyu Qi
@ 2013-03-26 14:01           ` Ramana Radhakrishnan
  2013-04-02  9:55             ` [PATCH, ARM, iWMMXT] PR target/54338 - Include IWMMXT_GR_REGS in ALL_REGS Xinyu Qi
  0 siblings, 1 reply; 33+ messages in thread
From: Ramana Radhakrishnan @ 2013-03-26 14:01 UTC (permalink / raw)
  To: Xinyu Qi; +Cc: gcc-patches

On Wed, Mar 20, 2013 at 2:43 AM, Xinyu Qi <xyqi@marvell.com> wrote:
>>At 2013-01-22 19:58:43,"Ramana Radhakrishnan" <ramrad01@arm.com> wrote:>
>> > On 01/22/13 09:21, Xinyu Qi wrote:
>> > > Ping,
>> > >
>> > > Fix ChangeLog
>> >
>> > The ChangeLog format includes .
>> >
>> > <date>  <Author's name>  <a.b@c.com>
>> >
>> > If you want a patch accepted in the future, please help by creating
>> > the Changelog entry in the correct format, i.e. fill in the author's
>> > name as well as email address as below. I've created an entry as
>> > below. Please remember to do so for every patch you submit - thanks.
>> >
>> > <DATE>  Xinyu Qi  <xyqi@marvell.com>
>> >
>> >     * config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
>> >     * config/arm/iwmmxt.md (WCGR0): Update.
>> >     (WCGR1, WCGR2, WCGR3): Likewise.
>> >
>> > The patch by itself is OK but surprisingly I never saw this earlier.
>> > Your ping has removed the date from the original post so I couldn't
>> > track it down.
>> >
>> > Anyway, please apply.
>> >
>> >
>> > regards,
>> > Ramana
>> >
>> >
>>
> Hi Ramana,
>
> Since I have no write access, would you mind to help to check in this patch?
> The patch is attached.
>
> ChangeLog
> 2013-01-31  Xinyu Qi  <xyqi@marvell.com>
>
>         * config/arm/arm.h (FIRST_IWMMXT_GR_REGNUM): Add comment.
>         * config/arm/iwmmxt.md (WCGR0): Update.
>         (WCGR1, WCGR2, WCGR3): Likewise.
>

Now applied to trunk. Sorry about the delay.

Ramana

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH, ARM, iWMMXT] PR target/54338 - Include IWMMXT_GR_REGS in ALL_REGS
  2013-03-26 14:01           ` Ramana Radhakrishnan
@ 2013-04-02  9:55             ` Xinyu Qi
  2013-04-02 10:03               ` Ramana Radhakrishnan
  0 siblings, 1 reply; 33+ messages in thread
From: Xinyu Qi @ 2013-04-02  9:55 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 693 bytes --]

Hi,
  According to Vladimir Makarov's analysis, the root cause of PR target/54338 is that ALL_REGS doesn't contain IWMMXT_GR_REGS in REG_CLASS_CONTENTS.
  There seems to be no reason to exclude IWMMXT_GR_REGS from ALL_REGS, since the IWMMXT_GR_REGS are real registers.
  This patch simply makes ALL_REGS include IWMMXT_GR_REGS to fix this PR.
  Since the test case gcc.target/arm/mmx-2.c fails for the same reason and passes with this fix, no extra test case needs to be added.
  Passes the arm.exp tests. Patch attached.
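The effect of the one-line mask change can be modelled with plain bitmask arithmetic. The masks below are taken from the attached diff; the helper is a hypothetical illustration of the REG_CLASS_CONTENTS lookup, not GCC's actual macros:

```c
#include <stdint.h>

/* The fourth 32-bit word of each REG_CLASS_CONTENTS initializer
   covers hard registers 96..127; wCGR0..wCGR3 are regnos 96..99.  */
static const uint32_t IWMMXT_GR_WORD3    = 0x0000000F;
static const uint32_t ALL_REGS_WORD3_OLD = 0x00000000;  /* before the fix */
static const uint32_t ALL_REGS_WORD3_NEW = 0x0000000F;  /* after the fix  */

/* Test whether a class's fourth mask word contains a register
   in the 96..127 range.  */
static int class_contains(uint32_t word3, int regno)
{
    return (word3 >> (regno - 96)) & 1;
}
```

Before the patch, class_contains(ALL_REGS_WORD3_OLD, 96) is 0, i.e. ALL_REGS claimed not to contain wCGR0, which is what confused the register allocator in this PR.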

ChangeLog

2013-04-02  Xinyu Qi  <xyqi@marvell.com>

	* config/arm/arm.h (REG_CLASS_CONTENTS): Include IWMMXT_GR_REGS in ALL_REGS.


OK?

Thanks,
Xinyu

[-- Attachment #2: IWMMXT_GR_REGS.diff --]
[-- Type: application/octet-stream, Size: 1359 bytes --]

Index: config/arm/arm.h
===================================================================
*** config/arm/arm.h	(revision 197340)
--- config/arm/arm.h	(working copy)
***************
*** 1203,1213 ****
    { 0x00000000, 0x00000000, 0x00000000, 0x0000000F }, /* IWMMXT_GR_REGS */ \
    { 0x00000000, 0x00000000, 0x00000000, 0x00000010 }, /* CC_REG */	\
    { 0x00000000, 0x00000000, 0x00000000, 0x00000020 }, /* VFPCC_REG */	\
    { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */	\
    { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */	\
!   { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000000 }  /* ALL_REGS */	\
  }
  
  /* Any of the VFP register classes.  */
  #define IS_VFP_CLASS(X) \
    ((X) == VFP_D0_D7_REGS || (X) == VFP_LO_REGS \
--- 1203,1213 ----
    { 0x00000000, 0x00000000, 0x00000000, 0x0000000F }, /* IWMMXT_GR_REGS */ \
    { 0x00000000, 0x00000000, 0x00000000, 0x00000010 }, /* CC_REG */	\
    { 0x00000000, 0x00000000, 0x00000000, 0x00000020 }, /* VFPCC_REG */	\
    { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */	\
    { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */	\
!   { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS */	\
  }
  
  /* Any of the VFP register classes.  */
  #define IS_VFP_CLASS(X) \
    ((X) == VFP_D0_D7_REGS || (X) == VFP_LO_REGS \

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH, ARM, iWMMXT] PR target/54338 - Include IWMMXT_GR_REGS in ALL_REGS
  2013-04-02  9:55             ` [PATCH, ARM, iWMMXT] PR target/54338 - Include IWMMXT_GR_REGS in ALL_REGS Xinyu Qi
@ 2013-04-02 10:03               ` Ramana Radhakrishnan
  0 siblings, 0 replies; 33+ messages in thread
From: Ramana Radhakrishnan @ 2013-04-02 10:03 UTC (permalink / raw)
  To: gcc-patches

On 04/02/13 10:40, Xinyu Qi wrote:
> Hi,
>    According to Vladimir Makarov's analysis, the root cause of PR target/54338 is that ALL_REGS doesn't contain IWMMXT_GR_REGS in REG_CLASS_CONTENTS.
>    It seems there is no reason to exclude the IWMMXT_GR_REGS from ALL_REGS as IWMMXT_GR_REGS are the real registers.
>    This patch simply makes ALL_REGS include IWMMXT_GR_REGS to fix this PR.
>    Since the test case gcc.target/arm/mmx-2.c would fail for the same reason and become pass with this fix, no extra test case need to be add.
>    Pass arm.exp test. Patch attached.

Testing just with arm.exp is not enough.

Ok if there are no regressions when running the entire regression
testsuite for C and C++ for arm*-*-*eabi with an iwmmxt configuration.

Thanks
Ramana


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH, ARM, iWMMXt][2/5]: intrinsic head file change
  2011-08-18  2:35 ` Ramana Radhakrishnan
@ 2011-08-24  9:07   ` Xinyu Qi
  0 siblings, 0 replies; 33+ messages in thread
From: Xinyu Qi @ 2011-08-24  9:07 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3310 bytes --]

At 2011-08-18 09:33:27,"Ramana Radhakrishnan" <ramana.radhakrishnan@linaro.org> wrote:
> On 6 July 2011 11:11, Xinyu Qi <xyqi@marvell.com> wrote:
> > Hi,
> >
> > It is the second part of iWMMXt maintenance.
> >
> > *config/arm/mmintrin.h:
> >  Revise the iWMMXt intrinsics head file. Fix some intrinsics and add some
> new intrinsics
> 
> Is there a document somewhere that lists these intrinsics and what
> each of these are supposed to be doing ? Missing details again . We
> seem to be changing quite a few things.

Hi,
The intrinsic_doc.txt is attached. It is the portion of the iWMMXt intrinsic documentation excerpted from "Intel Wireless MMX Technology Intrinsic Support", with some modifications.

> > +
> > +/*  We will treat __int64 as a long long type
> > +    and __m64 as an unsigned long long type to conform to VSC++.  */
> > +typedef unsigned long long __m64;
> > +typedef long long __int64;
> 
> Interesting this sort of a change with these cases where you are
> changing the type to conform to VSC++ ? This just means old code that
> uses this is pretty much broken. Not that I have much hope of that
> happening by default - -flax-conversions appears to be needed even
> with a trunk compiler.

I couldn't find any material showing why __int64 needs to be redefined, and all the tests pass without this change, so I decided to discard it.

> 
> > @@ -54,7 +63,7 @@ _mm_cvtsi64_si32 (__int64 __i)
> >  static __inline __int64
> >  _mm_cvtsi32_si64 (int __i)
> >  {
> > -  return __i;
> > +  return (__i & 0xffffffff);
> >  }
> 
> Eh ? why the & 0xffffffff before promotion rules.  Is this set of
> intrinsics documented some place ?  What is missing and could be the
> subject of a follow-up patch is a set of tests for the wMMX intrinsics
> ....

See the intrinsics doc. It says the description of _mm_cvtsi32_si64 is "The integer value is zero-extended to 64 bits.
If r = _mm_cvtsi32_si64(i), then the action is
r [0:31] = i;
r[32:63] = 0;"
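In plain C, the documented zero-extension behaviour amounts to the following (a model of the intrinsic's semantics, not the builtin itself):

```c
#include <stdint.h>

/* Model of _mm_cvtsi32_si64: the 32-bit input is zero-extended, not
   sign-extended, into the 64-bit result (r[0:31] = i, r[32:63] = 0).  */
static uint64_t cvtsi32_si64_model(int32_t i)
{
    return (uint64_t)(uint32_t)i;
}
```

The patch's `__i & 0xffffffff` expresses the same thing: the mask forces a conversion to unsigned before the widening to 64 bits, so no sign bits are propagated.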

> 
> What's the behaviour of wandn supposed to be ? Does wandn x, y, z
> imply x = y & ~z or x = ~y & z ? If the former then your intrinsic
> expansion is wrong unless the meaning of this has changed ? Whats the
> behaviour of the intrinsic __mm_and_not_si64 . ?

The description of _mm_andnot_si64 is "Performs a logical NOT on the 64-bit value in m1 and use the result in a bitwise AND with the 64-bit value in m2."
And, "wandn wRd, wRn, wRm" means "wRd = wRn & ~wRm"
I think __builtin_arm_wandn had better directly match the behavior of wandn.
Therefore, match _mm_andnot_si64 (m1, m2) to __builtin_arm_wandn (m2, m1).
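The operand swap described above can be sketched in C (a behavioural model, not the actual builtin expansion):

```c
#include <stdint.h>

/* Model of the wandn instruction: wRd = wRn & ~wRm.  */
static uint64_t wandn_model(uint64_t wrn, uint64_t wrm)
{
    return wrn & ~wrm;
}

/* _mm_andnot_si64 (m1, m2) must compute (~m1) & m2, so mapping it
   onto wandn requires swapping the operands: wandn (m2, m1).  */
static uint64_t andnot_si64_model(uint64_t m1, uint64_t m2)
{
    return wandn_model(m2, m1);
}
```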



> @@ -985,44 +1004,83 @@ _mm_setzero_si64 (void)
>  static __inline void
>  _mm_setwcx (const int __value, const int __regno)
>  {
> > +  /*Since gcc has the imformation of all wcgr regs
> > +    in arm backend, use builtin to access them instead
> > +    of throw asm directly.  Thus, gcc could do some
> > +    optimization on them.  */
> > +
> 
> Also this comment is contradictory to what follows in the patch .
> You've prima-facie replaced them with bits of inline assembler. I'm
> not sure this comment makes a lot of sense on its own. 

Sorry. This comment should be removed.

The modified diff is attached.

Thanks,
Xinyu



[-- Attachment #2: intrinsic_doc.txt --]
[-- Type: text/plain, Size: 74561 bytes --]

20.9.1 	_mm_abs_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_abs_pi8 (__m64 m1)
Description 
Changes the eight 8-bit values in m1 to their absolute values and returns the result.
This function uses the assembler instruction WABSB.
 

20.9.2 	_mm_abs_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_abs_pi16 (__m64 m1)
Description 
Changes the four 16-bit values in m1 to their absolute values and returns the result.
This function uses the assembler instruction WABSH.
 
 
20.9.3 	_mm_abs_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_abs_pi32 (__m64 m1)
Description 
Changes the two 32-bit values in m1 to their absolute values and returns the result.
This function uses the assembler instruction WABSW.
 
 
20.9.4 	_mm_absdiff_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_absdiff_pu8 (__m64 a, __m64 b)
Description 
Subtracts the unsigned eight 8-bit values of a from their counterparts in b and returns the absolute values of the results.
This function uses the assembler instruction WABSDIFFB.
 
 
20.9.5 	_mm_absdiff_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_absdiff_pu16 (__m64 a, __m64 b)
Description 
Subtracts the four unsigned 16-bit values of a from their counterparts in b and returns the absolute values of the results.
This function uses the assembler instruction WABSDIFFH.
 
 
20.9.6 	_mm_absdiff_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_absdiff_pu32 (__m64 a, __m64 b)
Description 
Subtracts the two unsigned 32-bit values of a from their counterparts in b and returns the absolute values of the results.
This function uses the assembler instruction WABSDIFFW.
 
 
20.9.7 	_mm_acc_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_acc_pu8 (__m64 m1)
Description 
Unsigned accumulate across eight 8-bit values in m1. 
This function uses the assembler instruction WACCB.
 

20.9.8 	_mm_acc_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_acc_pu16 (__m64 m1)
Description 
Unsigned accumulate across four 16-bit values in m1.
This function uses the assembler instruction WACCH.
 

20.9.9 	_mm_acc_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_acc_pu32 (__m64 m1)
Description 
Unsigned accumulate across two 32-bit values in m1.
This function uses the assembler instruction WACCW.
 

20.9.10 	_mm_add_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_add_pi8 (__m64 m1, __m64 m2)
Description 
Adds the eight 8-bit values in m1 to the eight 8-bit values in m2.
This function uses the assembler instruction WADDB.
 

20.9.11 	_mm_add_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_add_pi16 (__m64 m1, __m64 m2)
Description 
Adds the four 16-bit values in m1 to the four 16-bit values in m2.
This function uses the assembler instruction WADDH.
 

20.9.12 	_mm_add_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_add_pi32 (__m64 m1, __m64 m2)
Description 
Adds the two 32-bit values in m1 to the two 32-bit values in m2.
This function uses the assembler instruction WADDW.
 

20.9.13 	_mm_addc_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_addc_pu16 (__m64 m1, __m64 m2)
Description 
Adds the four unsigned 16-bit values in m1 to the four unsigned 16-bit values in m2 using carry flags from the wCASF register as the Carry-in to the addition operation.
This function uses the assembler instruction WADDHC.
 

20.9.14 	_mm_addc_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_addc_pu32 (__m64 m1, __m64 m2)
Description 
Adds the two unsigned 32-bit values in m1 to the two unsigned 32-bit values in m2 using carry flags from the wCASF register as the Carry-in to the addition operation.
This function uses the assembler instruction WADDWC.
 

20.9.15 	_mm_addbhusl_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_addbhusl_pu8 (__m64 a, __m64 b)
Description 
Performs a vector mixed mode addition of four 16-bit values of parameter a and four 8-bit zero-extended values from the lower half of parameter b and returns the result in the lower half of the return value.
This function uses the assembler instruction WADDBHUSL.
 
 
20.9.16 	_mm_addbhusm_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_addbhusm_pu8 (__m64 a, __m64 b)
Description 
Performs a vector mixed mode addition of four 16-bit values of parameter a and four 8-bit zero-extended values from the upper half of parameter b and returns the result in the upper half of the return value.
This function uses the assembler instruction WADDBHUSM.
 
 
20.9.17 	_mm_adds_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_adds_pi8 (__m64 m1, __m64 m2)
Description 
Adds the eight signed 8-bit values in m1 to the eight signed 8-bit values in m2 using saturating arithmetic.
This function uses the assembler instruction WADDBSS.
 

20.9.18 	_mm_adds_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_adds_pi16 (__m64 m1, __m64 m2)
Description 
Adds the four signed 16-bit values in m1 to the four signed 16-bit values in m2 using saturating arithmetic.
This function uses the assembler instruction WADDHSS.
 

20.9.19 	_mm_adds_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_adds_pi32 (__m64 m1, __m64 m2)
Description 
Adds the two signed 32-bit values in m1 to the two signed 32-bit values in m2 using saturating arithmetic.
This function uses the assembler instruction WADDWSS.
 

20.9.20 	_mm_adds_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_adds_pu8 (__m64 m1, __m64 m2)
Description 
Adds the eight unsigned 8-bit values in m1 to the eight unsigned 8-bit values in m2 using saturating arithmetic.
This function uses the assembler instruction WADDBUS.
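The saturating behaviour can be modelled in C (illustrative only; the real intrinsic maps to the WADDBUS instruction):

```c
#include <stdint.h>

/* Model of _mm_adds_pu8: per-byte unsigned add, clamped at 255.  */
static uint64_t adds_pu8_model(uint64_t m1, uint64_t m2)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned a = (uint8_t)(m1 >> (i * 8));
        unsigned b = (uint8_t)(m2 >> (i * 8));
        unsigned s = a + b;
        if (s > 255)
            s = 255;                    /* saturate instead of wrapping */
        r |= (uint64_t)s << (i * 8);
    }
    return r;
}
```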
 

20.9.21 	_mm_adds_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_adds_pu16 (__m64 m1, __m64 m2)
Description 
Adds the four unsigned 16-bit values in m1 to the four unsigned 16-bit values in m2 using saturating arithmetic.
This function uses the assembler instruction WADDHUS.
 

20.9.22 	_mm_adds_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_adds_pu32 (__m64 m1, __m64 m2)
Description 
Adds the two unsigned 32-bit values in m1 to the two unsigned 32-bit values in m2 using saturating arithmetic.
This function uses the assembler instruction WADDWUS.
 

20.9.23 	_mm_addsubhx_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_addsubhx_pi16 (__m64 a, __m64 b)
Description 
Performs complex vector addition/subtraction of its parameters a and b for vectors of 16-bit data. The four operands from each of the parameters are alternately added and subtracted using a cross selection in each of the parallel operations. The result of the operation is saturated to the signed limits and returned.
This function uses the assembler instruction WADDSUBHX.
 
 
20.9.24 	_mm_align_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_align_si64(__m64 m1, __m64 m2, int count)
Description 
Extracts a 64-bit value from the two 64-bit input values m1, m2 with count byte offset.
If r = _mm_align_si64(m1, m2, count), the action is
r = Low_DB_word((m1, m2) >> (count * 8));
 This function uses the assembler instruction WALIGNI.
Note:  	The parameter count has to be a numeric value or expression that can be evaluated at compile-time; it cannot be a variable. The range of count is 0 to 7.
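The quoted formula can be modelled in C, assuming m1 supplies the low doubleword of the (m1, m2) pair, as the Low_DB_word notation suggests (an illustrative sketch, not the builtin):

```c
#include <stdint.h>

/* Model of _mm_align_si64: view (m1, m2) as a 128-bit value with m1
   in the low doubleword, shift right by count bytes, and keep the low
   64 bits.  count must be in 0..7.  */
static uint64_t align_si64_model(uint64_t m1, uint64_t m2, int count)
{
    if (count == 0)
        return m1;                      /* avoid an undefined 64-bit shift */
    return (m1 >> (count * 8)) | (m2 << (64 - count * 8));
}
```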
 

20.9.25 	_mm_alignr0_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_alignr0_si64(__m64 m1, __m64 m2)
Description 
Extracts a 64-bit value from the two 64-bit input values m1, m2 with 3-bit offset stored in the specified general-purpose register 0 (wCGR0).
This function uses the assembler instruction WALIGNR0.
 

20.9.26 	_mm_alignr1_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_alignr1_si64(__m64 m1, __m64 m2)
Description 
Extracts a 64-bit value from the two 64-bit input values m1, m2 with 3-bit offset stored in the specified general-purpose register 1 (wCGR1).
This function uses the assembler instruction WALIGNR1.
 

20.9.27 	_mm_alignr2_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_alignr2_si64(__m64 m1, __m64 m2)
Description 
Extracts a 64-bit value from the two 64-bit input values m1, m2 with 3-bit offset stored in the specified general-purpose register 2 (wCGR2).
This function uses the assembler instruction WALIGNR2. 
 

20.9.28 	_mm_alignr3_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_alignr3_si64(__m64 m1, __m64 m2)
Description 
Extracts a 64-bit value from the two 64-bit input values m1, m2 with 3-bit offset stored in the specified general-purpose register 3 (wCGR3).
This function uses the assembler instruction WALIGNR3.
 

20.9.29 	_mm_and_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_and_si64 (__m64 m1, __m64 m2)
Description 
Performs a bitwise AND of the 64-bit value in m1 with the 64-bit value in m2.
This function uses the assembler instruction WAND.
 

20.9.30 	_mm_andnot_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_andnot_si64 (__m64 m1, __m64 m2)
Description 
Performs a logical NOT on the 64-bit value in m1 and use the result in a bitwise AND with the 64-bit value in m2.
This function uses the assembler instruction WANDN.
 

20.9.31 	_mm_avg_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_avg_pu8(__m64 a, __m64 b)
Description 
Computes the (rounded) averages of the unsigned bytes in a and b.
If r = _mm_avg_pu8(a, b), the action is
t = (unsigned short)a0 + (unsigned short)b0;
r0 = (t >> 1) | (t & 0x01);
...
t = (unsigned short)a7 + (unsigned short)b7;
r7 = (unsigned char)((t >> 1) | (t & 0x01));
This function uses the assembler instruction WAVG2BR.
 

20.9.32 	_mm_avg_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_avg_pu16(__m64 a, __m64 b)
Description 
Computes the (rounded) averages of the unsigned words in a and b.
If r = _mm_avg_pu16(a, b), the action is 
t = (unsigned int)a0 + (unsigned int)b0;
r0 = (t >> 1) | (t & 0x01);
...
t = (unsigned int)a3 + (unsigned int)b3;
r3 = (unsigned short)((t >> 1) | (t & 0x01));
This function uses the assembler instruction WAVG2HR.
 

20.9.33 	_mm_avg2_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_avg2_pu8(__m64 a, __m64 b)
Description 
Computes the averages, without rounding, of the unsigned bytes in a and b.
If r = _mm_avg2_pu8(a, b), the action is
t = (unsigned byte)a0 + (unsigned byte)b0;
r0 = (t >> 1);
...
t = (unsigned byte)a7 + (unsigned byte)b7;
r7 = (unsigned char)(t >> 1);
This function uses the assembler instruction WAVG2B.
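The per-byte action above can be sketched as a scalar C model. This is not the intrinsic itself, only one lane of the documented behavior; the helper name avg2_u8 is ours:

```c
#include <stdint.h>

/* Scalar model of one lane of _mm_avg2_pu8: truncating (non-rounded)
   average of two unsigned bytes.  A sketch, not the WAVG2B instruction. */
static uint8_t avg2_u8(uint8_t a, uint8_t b)
{
    uint16_t t = (uint16_t)a + (uint16_t)b;   /* widen so the sum cannot overflow */
    return (uint8_t)(t >> 1);                 /* truncate: no rounding bit added */
}
```

Note that avg2_u8(1, 2) yields 1, whereas the rounded variant (_mm_avg_pu8) would not truncate the half.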
 

20.9.34 	_mm_avg2_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_avg2_pu16(__m64 a, __m64 b)
Description 
Computes the averages, without rounding, of the unsigned half words in a and b.
If r = _mm_avg2_pu16(a, b), the action is
t = (unsigned int)a0 + (unsigned int)b0;
r0 = (unsigned short)(t >> 1);
...
t = (unsigned int)a3 + (unsigned int)b3;
r3 = (unsigned short)(t >> 1);
This function uses the assembler instruction WAVG2H.
 

20.9.35 	_mm_avg4_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_avg4_pu8 (__m64 a, __m64 b)
Description 
Performs seven 4-pixel averages of unsigned 8-bit operands obtained from the bytes of the parameters a and b, and returns the result.
This function uses the assembler instruction WAVG4.
 
 
20.9.36 	_mm_avg4r_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_avg4r_pu8 (__m64 a, __m64 b)
Description 
Performs seven 4-pixel averages of unsigned 8-bit operands obtained from the bytes of the parameters a and b, and returns the result. Biased rounding is performed by adding +2 or +1 to the intermediate result before the divide-by-2.
This function uses the assembler instruction WAVG4R.
 
 
20.9.37 	_mm_cmpeq_pi8
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpeq_pi8 (__m64 m1, __m64 m2)
Description 
If the respective 8-bit values in m1 are equal to the respective 8-bit values in m2, the function sets the respective 8-bit resulting values to all ones, otherwise it sets them to all zeros.
This function uses the assembler instruction WCMPEQB.
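The all-ones/all-zeros mask behavior can be modeled in scalar C over a packed 64-bit value. This is a sketch of the documented semantics, not the WCMPEQB instruction; the helper name is ours:

```c
#include <stdint.h>

/* Scalar model of _mm_cmpeq_pi8 on a packed 64-bit value: each byte of
   the result is 0xFF where the corresponding bytes of m1 and m2 are
   equal, else 0x00. */
static uint64_t cmpeq_pi8_model(uint64_t m1, uint64_t m2)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(m1 >> (8 * i));
        uint8_t b = (uint8_t)(m2 >> (8 * i));
        if (a == b)
            r |= (uint64_t)0xFF << (8 * i);   /* all ones in this byte lane */
    }
    return r;
}
```

The 16- and 32-bit variants below differ only in lane width.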
 

20.9.38 	_mm_cmpeq_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpeq_pi16 (__m64 m1, __m64 m2)
Description 
If the respective 16-bit values in m1 are equal to the respective 16-bit values in m2, the function sets the respective 16-bit resulting values to all ones, otherwise it sets them to all zeros.
This function uses the assembler instruction WCMPEQH.
 

20.9.39 	_mm_cmpeq_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpeq_pi32 (__m64 m1, __m64 m2)
Description 
If the respective 32-bit values in m1 are equal to the respective 32-bit values in m2, the function sets the respective 32-bit resulting values to all ones, otherwise it sets them to all zeros.
This function uses the assembler instruction WCMPEQW.
 

20.9.40 	_mm_cmpgt_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpgt_pi8 (__m64 m1, __m64 m2)
Description 
If the respective 8-bit values in m1 are greater than the respective 8-bit values in m2, the function sets the respective 8-bit resulting values to all ones, otherwise it sets them to all zeros.
This function uses the assembler instruction WCMPGTSB.
 

20.9.41 	_mm_cmpgt_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpgt_pi16 (__m64 m1, __m64 m2)
Description 
If the respective 16-bit values in m1 are greater than the respective 16-bit values in m2, the function sets the respective 16-bit resulting values to all ones, otherwise it sets them to all zeros.
This function uses the assembler instruction WCMPGTSH.
 

20.9.42 	_mm_cmpgt_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpgt_pi32 (__m64 m1, __m64 m2)
Description 
If the respective 32-bit values in m1 are greater than the respective 32-bit values in m2, the function sets the respective 32-bit resulting values to all ones, otherwise it sets them all to zeros. 
This function uses the assembler instruction WCMPGTSW.
 

20.9.43 	_mm_cmpgt_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpgt_pu8 (__m64 m1, __m64 m2)
Description 
If the respective 8-bit values in m1 are unsigned greater than the respective 8-bit values in m2, the function sets the respective 8-bit resulting values to all ones, otherwise it sets them to all zeros.
This function uses the assembler instruction WCMPGTUB.
 

20.9.44 	_mm_cmpgt_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpgt_pu16 (__m64 m1, __m64 m2)
Description 
If the respective 16-bit values in m1 are unsigned greater than the respective 16-bit values in m2, the function sets the respective 16-bit resulting values to all ones, otherwise it sets them to all zeros.
This function uses the assembler instruction WCMPGTUH.
 

20.9.45 	_mm_cmpgt_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cmpgt_pu32 (__m64 m1, __m64 m2)
Description 
If the respective 32-bit values in m1 are unsigned greater than the respective 32-bit values in m2, the function sets the respective 32-bit resulting values to all ones, otherwise it sets them all to zeros.
This function uses the assembler instruction WCMPGTUW.
 

20.9.46 	_mm_cvtm64_si64
 
Syntax 
#include <mmintrin.h>
__int64
_mm_cvtm64_si64 (__m64 m)
Description 
Converts the 64-bit __m64 object m to a 64-bit __int64 integer.
If r = _mm_cvtm64_si64(m), then the action is
r[31:0] = m[31:0]; (lower word)
r[63:32] = m[63:32]; (upper word)
 
20.9.47 	_mm_cvtsi32_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cvtsi32_si64 (int i)
Description 
Converts the integer object i to a 64-bit __m64 object. The integer value is zero-extended to 64 bits.
If r = _mm_cvtsi32_si64(i), then the action is
r[31:0] = i;
r[63:32] = 0;
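The zero extension above matters for negative inputs: the high word is cleared rather than sign-filled. A scalar sketch of the documented action (not the intrinsic; the helper name is ours):

```c
#include <stdint.h>

/* Scalar model of _mm_cvtsi32_si64: the 32-bit value lands in the low
   word and the high word is cleared, even when i is negative. */
static uint64_t cvtsi32_si64_model(int32_t i)
{
    return (uint64_t)(uint32_t)i;   /* cast through uint32_t to avoid sign extension */
}
```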
 
20.9.48 	_mm_cvtsi64_m64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_cvtsi64_m64 (__int64 i)
Description 
Converts the __int64 integer i to a 64-bit __m64 object.
If r = _mm_cvtsi64_m64(i), then the action is
r[31:0] = i[31:0]; (lower word)
r[63:32] = i[63:32]; (upper word)
 
20.9.49 	_mm_cvtsi64_si32
 
Syntax 
#include <mmintrin.h>
int
_mm_cvtsi64_si32 (__m64 m)
Description 
Converts the lower 32 bits of the __m64 object m to an integer.
If i = _mm_cvtsi64_si32(m), then the action is
i = m[31:0]; (lower word)
 
20.9.50 	_mm_extract_pi8
 
Syntax 
#include <mmintrin.h>
int
_mm_extract_pi8(__m64 a, const int n)
Description 
Extracts one of the eight bytes of a. The selector n must be an immediate and its range must be 0 to 7. The n variable selects the byte that should be extracted.
If r = _mm_extract_pi8(a, n), the action is 
r[7:0] = a[Byte n[2:0]];
r[31:8] = SignReplicate(a[Byte n[2:0]], 24);
This function uses the assembler instruction TEXTRMSB.
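The SignReplicate step above is ordinary sign extension of the selected byte. A scalar sketch of the documented action (not the TEXTRMSB instruction; the helper name is ours):

```c
#include <stdint.h>

/* Scalar model of _mm_extract_pi8: pull byte n out of a packed 64-bit
   value and sign-extend it to 32 bits. */
static int32_t extract_pi8_model(uint64_t a, int n)   /* n must be 0..7 */
{
    int8_t byte = (int8_t)(a >> (8 * n));
    return (int32_t)byte;   /* the int8_t cast performs the sign replication */
}
```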
 

20.9.51 	_mm_extract_pi16
 
Syntax 
#include <mmintrin.h>
int
_mm_extract_pi16(__m64 a, const int n)
Description 
Extracts one of the four half words of a. The selector n must be an immediate and its range must be 0 to 3.
If r = _mm_extract_pi16(a, n), the action is 
r[15:0] = a[Halfword n[1:0]];
r[31:16] = SignReplicate(a[Halfword n[1:0]], 16);
This function uses the assembler instruction TEXTRMSH.
 

20.9.52 	_mm_extract_pi32
 
Syntax 
#include <mmintrin.h>
int
_mm_extract_pi32(__m64 a, const int n)
Description 
Extracts one of the two words of a. The selector n must be an immediate and its range must be 0 to 1.
If r = _mm_extract_pi32(a, n), the action is 
r[31:0] = a[Word n[0]];
This function uses the assembler instruction TEXTRMSW.
 

20.9.53 	_mm_extract_pu8
 
Syntax 
#include <mmintrin.h>
int
_mm_extract_pu8(__m64 a, const int n)
Description 
Extracts one of the eight bytes of a. The selector n must be an immediate and its range must be 0 to 7.
If r = _mm_extract_pu8(a, n), the action is 
r[7:0] = a[Byte n[2:0]];
r[31:8] = 0;
This function uses the assembler instruction TEXTRMUB.
 

20.9.54 	_mm_extract_pu16
 
Syntax 
#include <mmintrin.h>
int
_mm_extract_pu16(__m64 a, const int n)
Description 
Extracts one of the four half words of a. The selector n must be an immediate and its range must be 0 to 3.
If r = _mm_extract_pu16(a, n), the action is 
r[15:0] = a[Halfword n[1:0]];
r[31:16] = 0;
This function uses the assembler instruction TEXTRMUH.
 

20.9.55 	_mm_extract_pu32
 
Syntax 
#include <mmintrin.h>
int
_mm_extract_pu32(__m64 a, const int n)
Description 
This provides the same functionality as _mm_extract_pi32.
 
20.9.56 	_mm_getwcx
 
Syntax 
#include <mmintrin.h>
int
_mm_getwcx(int number)
Description 
Returns the contents of the Intel Wireless MMX technology control register specified by number, where number is the coprocessor register number.
This function uses the assembler pseudo-instruction TMRC.
Note:  	The valid range for parameter number is [0, 3] and [8, 11]. The valid control registers are: wCID(0), wCon(1), wCSSF(2), wCASF(3), wCGR0(8), wCGR1(9), wCGR2(10), wCGR3(11).
 

20.9.57 	_mm_insert_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_insert_pi8(__m64 a, int d, int n)
Description 
Inserts byte d into one of eight bytes of a. The selector n must be an immediate and its range must be 0 to 7.
If r = _mm_insert_pi8(a, d, n), the action is
r0 = (n==0) ? d[7:0] : a0;
r1 = (n==1) ? d[7:0] : a1;
r2 = (n==2) ? d[7:0] : a2;
r3 = (n==3) ? d[7:0] : a3;
r4 = (n==4) ? d[7:0] : a4;
r5 = (n==5) ? d[7:0] : a5;
r6 = (n==6) ? d[7:0] : a6;
r7 = (n==7) ? d[7:0] : a7;
This function uses the assembler instruction TINSRB.
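The selection table above amounts to replacing one byte lane and leaving the others untouched. A scalar sketch of that action (not the TINSRB instruction; the helper name is ours):

```c
#include <stdint.h>

/* Scalar model of _mm_insert_pi8: replace byte n of the packed value a
   with the low byte of d, preserving the other seven bytes. */
static uint64_t insert_pi8_model(uint64_t a, int32_t d, int n)   /* n must be 0..7 */
{
    uint64_t mask = (uint64_t)0xFF << (8 * n);          /* the lane being replaced */
    return (a & ~mask) | ((uint64_t)(uint8_t)d << (8 * n));
}
```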
 

20.9.58 	_mm_insert_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_insert_pi16(__m64 a, int d, int n)
Description 
Inserts half word d into one of four half words of a. The selector n must be an immediate and its range must be 0 to 3.
If r = _mm_insert_pi16(a, d, n), the action is
r0 = (n==0) ? d[15:0] : a0;
r1 = (n==1) ? d[15:0] : a1;
r2 = (n==2) ? d[15:0] : a2;
r3 = (n==3) ? d[15:0] : a3;
This function uses the assembler instruction TINSRH.
 

20.9.59 	_mm_insert_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_insert_pi32(__m64 a, int d, int n)
Description 
Inserts word d into one of two words of a. The selector n must be an immediate and its range must be 0 to 1.
If r = _mm_insert_pi32(a, d, n), the action is
r0 = (n==0) ? d[31:0] : a0;
r1 = (n==1) ? d[31:0] : a1;
This function uses the assembler instruction TINSRW.
 

20.9.60 	_mm_mac_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mac_pi16 (__m64 m1, __m64 m2, __m64 m3)
Description 
Multiplies four signed 16-bit values in m2 by four signed 16-bit values in m3 and accumulates the result with the value in m1.
This function uses the assembler instruction WMACS.
 

20.9.61 	_mm_mac_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mac_pu16 (__m64 m1, __m64 m2, __m64 m3)
Description 
Multiplies four unsigned 16-bit values in m2 by four unsigned 16-bit values in m3 and accumulates the result with the value in m1.
This function uses the assembler instruction WMACU.
 

20.9.62 	_mm_macz_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_macz_pi16 (__m64 m1, __m64 m2)
Description 
Multiplies four signed 16-bit values in m1 by four signed 16-bit values in m2 and accumulates the result with zero.
This function uses the assembler instruction WMACSZ.
 

20.9.63 	_mm_macz_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_macz_pu16 (__m64 m1, __m64 m2)
Description 
Multiplies four unsigned 16-bit values in m1 by four unsigned 16-bit values in m2 and accumulates the result with zero.
This function uses the assembler instruction WMACUZ.
 

20.9.64 	_mm_madd_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_madd_pi16 (__m64 m1, __m64 m2)
Description 
Multiplies four 16-bit values in m1 by four 16-bit values in m2, producing four 32-bit intermediate results, which are then summed: the sum of the lower two products yields the lower word and the sum of the upper two products yields the upper word of the result.
This function uses the assembler instruction WMADDS.
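The pairwise sum-of-products can be sketched in scalar C over the four signed 16-bit lanes. This models the documented behavior, not the WMADDS instruction; the helper name is ours:

```c
#include <stdint.h>

/* Scalar model of _mm_madd_pi16: the lower two signed 16x16 products
   sum into the low 32-bit word, the upper two into the high word. */
static uint64_t madd_pi16_model(const int16_t m1[4], const int16_t m2[4])
{
    int32_t lo = (int32_t)m1[0] * m2[0] + (int32_t)m1[1] * m2[1];
    int32_t hi = (int32_t)m1[2] * m2[2] + (int32_t)m1[3] * m2[3];
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}
```

For example, lanes {1,2,3,4} and {5,6,7,8} give a low word of 1*5 + 2*6 = 17 and a high word of 3*7 + 4*8 = 53.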
 

20.9.65 	_mm_madd_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_madd_pu16 (__m64 m1, __m64 m2)
Description 
Multiplies four unsigned 16-bit values in m1 by four unsigned 16-bit values in m2, producing four 32-bit intermediate results; the lower two products are then summed into the lower word of the result and the upper two products into the upper word.
This function uses the assembler instruction WMADDU.
 

20.9.66 	_mm_maddx_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_maddx_pi16 (__m64 m1, __m64 m2)
Description 
Cross multiplies four signed 16-bit values in m1 by four signed 16-bit values in m2, producing four 32-bit intermediate results; the lower two products are then summed into the lower word of the result and the upper two products into the upper word.
This function uses the assembler instruction WMADDSX.
 

20.9.67 	_mm_maddx_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_maddx_pu16 (__m64 m1, __m64 m2)
Description 
Cross multiplies four unsigned 16-bit values in m1 by four unsigned 16-bit values in m2, producing four 32-bit intermediate results; the lower two products are then summed into the lower word of the result and the upper two products into the upper word.
This function uses the assembler instruction WMADDUX.
 

20.9.68 	_mm_msub_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_msub_pi16 (__m64 m1, __m64 m2)
Description 
Multiplies four signed 16-bit values in m1 by four signed 16-bit values in m2, producing four 32-bit intermediate results, and then subtracts: the difference of the lower two products yields the lower word and the difference of the upper two products yields the upper word of the result.
This function uses the assembler instruction WMADDSN.
 

20.9.69 	_mm_msub_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_msub_pu16 (__m64 m1, __m64 m2)
Description 
Multiplies four unsigned 16-bit values in m1 by four unsigned 16-bit values in m2, producing four 32-bit intermediate results, and then subtracts: the difference of the lower two products yields the lower word and the difference of the upper two products yields the upper word of the result.
This function uses the assembler instruction WMADDUN.
 

20.9.70 	_mm_max_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_max_pi8(__m64 a, __m64 b)
Description 
Computes the element-wise maximum of the bytes in a and b.
If r = _mm_max_pi8(a, b), the action is
r0 = max(a0, b0);
r1 = max(a1, b1);
...
r7 = max(a7, b7);
This function uses the assembler instruction WMAXSB.
 

20.9.71 	_mm_max_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_max_pi16(__m64 a, __m64 b)
Description 
Computes the element-wise maximum of the half words in a and b.
If r = _mm_max_pi16(a, b), the action is
r0 = max(a0, b0);
r1 = max(a1, b1);
r2 = max(a2, b2);
r3 = max(a3, b3);
This function uses the assembler instruction WMAXSH.
 

20.9.72 	_mm_max_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_max_pi32(__m64 a, __m64 b)
Description 
Computes the element-wise maximum of the words in a and b.
If r = _mm_max_pi32(a, b), the action is
r0 = max(a0, b0);
r1 = max(a1, b1);
This function uses the assembler instruction WMAXSW.
 

20.9.73 	_mm_max_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_max_pu8(__m64 a, __m64 b)
Description 
Computes the element-wise maximum of the unsigned bytes in a and b.
If r = _mm_max_pu8(a, b), the action is
r0 = max(a0, b0);
r1 = max(a1, b1);
...
r7 = max(a7, b7);
This function uses the assembler instruction WMAXUB.
 

20.9.74 	_mm_max_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_max_pu16(__m64 a, __m64 b)
Description 
Computes the element-wise maximum of the unsigned half words in a and b.
If r = _mm_max_pu16(a, b), the action is
r0 = max(a0, b0);
r1 = max(a1, b1);
r2 = max(a2, b2);
r3 = max(a3, b3);
This function uses the assembler instruction WMAXUH.
 

20.9.75 	_mm_max_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_max_pu32(__m64 a, __m64 b)
Description 
Computes the element-wise maximum of the unsigned words in a and b.
If r = _mm_max_pu32(a, b), the action is
r0 = max(a0, b0);
r1 = max(a1, b1);
This function uses the assembler instruction WMAXUW.
 

20.9.76 	_mm_merge_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_merge_si64 (__m64 a, __m64 b, const int n)
Description 
Extracts a 64-bit value that contains elements from the parameters a and b, and returns the merged 64-bit result. The number of bytes to be taken from b is given by the constant parameter n.
This function uses the assembler instruction WMERGE.
 

20.9.77 	_mm_mia_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mia_si64 (__m64 m1, int a, int b)
Description 
Multiplies the two signed 32-bit values a and b and accumulates the result with the 64-bit value in m1.
This function uses the assembler instruction TMIA.
 

20.9.78 	_mm_miabb_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_miabb_si64(__m64 m1, int a, int b)
Description 
Multiplies the bottom halves (signed 16-bit values) of a and b and accumulates the result with the 64-bit value in m1.
Result = sign_extend(a[15:0] * b[15:0]) + m1;
This function uses the assembler instruction TMIABB.
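The Result formula above can be sketched in scalar C; picking the bottom half must preserve its sign before the multiply. This models the documented behavior, not the TMIABB instruction; the helper name is ours:

```c
#include <stdint.h>

/* Scalar model of _mm_miabb_si64:
   Result = sign_extend(a[15:0] * b[15:0]) + m1 */
static int64_t miabb_si64_model(int64_t m1, int32_t a, int32_t b)
{
    int16_t ab = (int16_t)(a & 0xFFFF);   /* bottom half of a, sign preserved */
    int16_t bb = (int16_t)(b & 0xFFFF);   /* bottom half of b, sign preserved */
    return m1 + (int32_t)ab * bb;         /* 16x16 -> 32-bit product, accumulated */
}
```

The bt/tb/tt variants below differ only in which 16-bit halves are selected.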
 

20.9.79 	_mm_miabt_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_miabt_si64(__m64 m1, int a, int b)
Description 
Multiplies the bottom half (signed 16-bit value) of a by the top half of b and accumulates the result with the 64-bit value in m1.
Result = sign_extend(a[15:0] * b[31:16]) + m1;
This function uses the assembler instruction TMIABT.
 

20.9.80 	_mm_miaph_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_miaph_si64 (__m64 m1, int a, int b)
Description 
Multiplies the pairs of signed 16-bit halves of a and b, sums the two products, and accumulates the result with the 64-bit value in m1.
Result = sign_extend((a[31:16] * b[31:16]) + (a[15:0] * b[15:0])) + m1;
This function uses the assembler instruction TMIAPH.
 

20.9.81 	_mm_miatb_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_miatb_si64(__m64 m1, int a, int b)
Description 
Multiplies the top half (signed 16-bit value) of a by the bottom half of b and accumulates the result with the 64-bit value in m1.
Result = sign_extend(a[31:16] * b[15:0]) + m1;
This function uses the assembler instruction TMIATB.
 

20.9.82 	_mm_miatt_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_miatt_si64(__m64 m1, int a, int b)
Description 
Multiplies the top half (signed 16-bit value) of a by the top half of b and accumulates the result with the 64-bit value in m1.
Result = sign_extend(a[31:16] * b[31:16]) + m1;
This function uses the assembler instruction TMIATT.
 

20.9.83 	_mm_min_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_min_pi8(__m64 a, __m64 b)
Description 
Computes the element-wise minimum of the bytes in a and b.
If r = _mm_min_pi8(a, b), the action is
r0 = min(a0, b0);
r1 = min(a1, b1);
...
r7 = min(a7, b7);
This function uses the assembler instruction WMINSB.
 

20.9.84 	_mm_min_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_min_pi16(__m64 a, __m64 b)
Description 
Computes the element-wise minimum of the half words in a and b.
If r = _mm_min_pi16(a, b), the action is
r0 = min(a0, b0);
r1 = min(a1, b1);
r2 = min(a2, b2);
r3 = min(a3, b3);
This function uses the assembler instruction WMINSH.
 

20.9.85 	_mm_min_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_min_pi32(__m64 a, __m64 b)
Description 
Computes the element-wise minimum of the words in a and b.
If r = _mm_min_pi32(a, b), the action is
r0 = min(a0, b0);
r1 = min(a1, b1);
This function uses the assembler instruction WMINSW.
 

20.9.86 	_mm_min_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_min_pu8(__m64 a, __m64 b)
Description 
Computes the element-wise minimum of the unsigned bytes in a and b.
If r = _mm_min_pu8(a, b), the action is
r0 = min(a0, b0);
r1 = min(a1, b1);
...
r7 = min(a7, b7);
This function uses the assembler instruction WMINUB.
 

20.9.87 	_mm_min_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_min_pu16(__m64 a, __m64 b)
Description 
Computes the element-wise minimum of the unsigned half words in a and b.
If r = _mm_min_pu16(a, b), the action is
r0 = min(a0, b0);
r1 = min(a1, b1);
r2 = min(a2, b2);
r3 = min(a3, b3);
This function uses the assembler instruction WMINUH.
 

20.9.88 	_mm_min_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_min_pu32(__m64 a, __m64 b)
Description 
Computes the element-wise minimum of the unsigned words in a and b.
If r = _mm_min_pu32(a, b), the action is
r0 = min(a0, b0);
r1 = min(a1, b1);
This function uses the assembler instruction WMINUW.
 

20.9.89 	_mm_movemask_pi8
 
Syntax 
#include <mmintrin.h>
int
_mm_movemask_pi8(__m64 a)
Description 
Creates an 8-bit mask from the most significant bits of the bytes in a.
If r = _mm_movemask_pi8(a), the action is
r = 0;
r = sign(a7)<<7 | sign(a6)<<6 |... | sign(a0);
This function uses the assembler instruction TMOVMSKB.
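The sign-gathering step above can be sketched in scalar C over a packed 64-bit value. This models the documented behavior, not the TMOVMSKB instruction; the helper name is ours:

```c
#include <stdint.h>

/* Scalar model of _mm_movemask_pi8: bit i of the result is the most
   significant (sign) bit of byte i of a. */
static int movemask_pi8_model(uint64_t a)
{
    int r = 0;
    for (int i = 0; i < 8; i++)
        r |= (int)((a >> (8 * i + 7)) & 1) << i;   /* sign bit of byte i */
    return r;
}
```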
 

20.9.90 	_mm_movemask_pi16
 
Syntax 
#include <mmintrin.h>
int
_mm_movemask_pi16(__m64 a)
Description 
Creates a 4-bit mask from the most significant bits of the half words in a.
If r = _mm_movemask_pi16(a), the action is
r = 0;
r = sign(a3)<<3 | sign(a2)<<2 |... | sign(a0);
This function uses the assembler instruction TMOVMSKH.
 

20.9.91 	_mm_movemask_pi32
 
Syntax 
#include <mmintrin.h>
int
_mm_movemask_pi32(__m64 a)
Description 
Creates a 2-bit mask from the most significant bits of the words in a.
If r = _mm_movemask_pi32(a), the action is
r = 0;
r = sign(a1)<<1 | sign(a0);
This function uses the assembler instruction TMOVMSKW.
 

20.9.92 	_mm_mulhi_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mulhi_pi16(__m64 a, __m64 b)
Description 
Multiplies four signed 16-bit values in a by four signed 16-bit values in b and produces the upper 16 bits of the four results.
If r = _mm_mulhi_pi16(a, b), the action is
r0 = hiword(a0 * b0);
r1 = hiword(a1 * b1);
r2 = hiword(a2 * b2);
r3 = hiword(a3 * b3);
This function uses the assembler instruction WMULSM.
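The hiword() step above means the signed 16x16 product is formed at full 32-bit width and only its top half is kept. A scalar sketch of one lane (not the WMULSM instruction; the helper name is ours):

```c
#include <stdint.h>

/* Scalar model of one lane of _mm_mulhi_pi16: signed 16x16 multiply,
   keep the upper 16 bits of the 32-bit product. */
static int16_t mulhi_i16(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* full 32-bit product */
    return (int16_t)(product >> 16);             /* hiword of the product */
}
```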
 

20.9.93 	_mm_mulhi_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mulhi_pi32 (__m64 a, __m64 b)
Description 
Performs a signed vector multiplication on the 32-bit words of parameters a and b, to produce 64-bit intermediate results. Only the higher 32 bits of the results are returned.
This function uses the assembler instruction WMULWSM.
 
 
20.9.94 	_mm_mulhi_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mulhi_pu16(__m64 a, __m64 b)
Description 
Multiplies four unsigned 16-bit values in a by four unsigned 16-bit values in b and produces the upper 16 bits of the four results.
If r = _mm_mulhi_pu16(a, b), the action is
r0 = hiword(a0 * b0);
r1 = hiword(a1 * b1);
r2 = hiword(a2 * b2);
r3 = hiword(a3 * b3);
This function uses the assembler instruction WMULUM.
 

20.9.95 	_mm_mulhi_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mulhi_pu32 (__m64 a, __m64 b)
Description 
Performs an unsigned vector multiplication on the 32-bit words of parameters a and b, to produce 64-bit intermediate results. Only the higher 32 bits of the results are returned.
This function uses the assembler instruction WMULWUM.
 
 
20.9.96 	_mm_mulhir_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mulhir_pi16 (__m64 a, __m64 b)
Description 
Performs a signed vector multiplication on the 16-bit words of parameters a and b, to produce 32-bit intermediate results. Then the function rounds the least significant 16 bits into the most significant 16 bits. Only the higher 16 bits of the results are returned.
This function uses the assembler instruction WMULSMR.
 
 
20.9.97 	_mm_mulhir_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mulhir_pu16 (__m64 a, __m64 b)
Description 
Performs an unsigned vector multiplication on the 16-bit words of parameters a and b, to produce 32-bit intermediate results. Then the function rounds the least significant 16 bits into the most significant 16 bits. Only the higher 16 bits of the results are returned.
This function uses the assembler instruction WMULUMR.
 
 
20.9.98 	_mm_mulhir_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mulhir_pi32 (__m64 a, __m64 b)
Description 
Performs a signed vector multiplication on the 32-bit words of parameters a and b, to produce 64-bit intermediate results. Then the function rounds the least significant 32 bits into the most significant 32 bits. Only the higher 32 bits of the results are returned.
This function uses the assembler instruction WMULWSMR.
 
 
20.9.99 	_mm_mulhir_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mulhir_pu32 (__m64 a, __m64 b)
Description 
Performs an unsigned vector multiplication on the 32-bit words of parameters a and b, to produce 64-bit intermediate results. Then the function rounds the least significant 32 bits into the most significant 32 bits. Only the higher 32 bits of the results are returned.
This function uses the assembler instruction WMULWUMR.
 
 
20.9.100 	_mm_mullo_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mullo_pi16 (__m64 m1, __m64 m2)
Description 
Multiplies four 16-bit values in m1 by four 16-bit values in m2 and produces the lower 16 bits of the four results.
If r = _mm_mullo_pi16(a, b), the action is
r0 = lowword(a0 * b0);
r1 = lowword(a1 * b1);
r2 = lowword(a2 * b2);
r3 = lowword(a3 * b3);
This function uses the assembler instruction WMULUL.
 

20.9.101 	_mm_mullo_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_mullo_pi32 (__m64 a, __m64 b)
Description 
Performs a vector multiplication on the 32-bit words of parameters a and b, to produce 64-bit intermediate results. Only the lower 32 bits of the results are returned.
This function uses the assembler instruction WMULWL.
 
 
20.9.102 	_mm_or_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_or_si64 (__m64 m1, __m64 m2)
Description 
Performs a bitwise OR of the 64-bit value in m1 with the 64-bit value in m2.
This function uses the assembler instruction WOR.
 

20.9.103 	_mm_packs_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_packs_pi16 (__m64 m1, __m64 m2)
Description 
Packs the four 16-bit values from m1 into the lower four 8-bit values of the result with signed saturation, and packs the four 16-bit values from m2 into the upper four 8-bit values of the result with signed saturation.
This function uses the assembler instruction WPACKHSS.
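The signed-saturation rule used when packing can be sketched per element in scalar C. This models only the clamping step of the documented behavior, not the WPACKHSS instruction; the helper name is ours:

```c
#include <stdint.h>

/* Scalar model of the signed-saturation step of _mm_packs_pi16:
   clamp a 16-bit value into the signed 8-bit range before packing. */
static int8_t saturate_s16_to_s8(int16_t v)
{
    if (v > 127)  return 127;    /* clamp to INT8_MAX */
    if (v < -128) return -128;   /* clamp to INT8_MIN */
    return (int8_t)v;
}
```

The pu variants below use the analogous unsigned clamp (0 to 255).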
 

20.9.104 	_mm_packs_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_packs_pi32 (__m64 m1, __m64 m2)
Description 
Packs the two 32-bit values from m1 into the lower two 16-bit values of the result with signed saturation, and packs the two 32-bit values from m2 into the upper two 16-bit values of the result with signed saturation.
This function uses the assembler instruction WPACKWSS.
 

20.9.105 	_mm_packs_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_packs_pu16 (__m64 m1, __m64 m2)
Description 
Packs the four 16-bit values from m1 into the lower four 8-bit values of the result with unsigned saturation, and packs the four 16-bit values from m2 into the upper four 8-bit values of the result with unsigned saturation.
This function uses the assembler instruction WPACKHUS.
 

20.9.106 	_mm_packs_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_packs_pu32 (__m64 m1, __m64 m2)
Description 
Packs the two 32-bit values from m1 into the lower two 16-bit values of the result with unsigned saturation, and packs the two 32-bit values from m2 into the upper two 16-bit values of the result with unsigned saturation.
This function uses the assembler instruction WPACKWUS.
 

20.9.107 	_mm_packs_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_packs_si64 (__m64 m1, __m64 m2)
Description 
Packs the 64-bit value from m1 into the lower 32-bit value of the result with signed saturation, and packs the 64-bit value from m2 into the upper 32-bit value of the result with signed saturation.
This function uses the assembler instruction WPACKDSS.
 

20.9.108 	_mm_packs_su64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_packs_su64 (__m64 m1, __m64 m2)
Description 
Packs the 64-bit value from m1 into the lower 32-bit value of the result with unsigned saturation, and packs the 64-bit value from m2 into the upper 32-bit value of the result with unsigned saturation.
This function uses the assembler instruction WPACKDUS.
 

20.9.109 	_mm_qmiabb_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmiabb_pi32 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit multiplication with the lower 16 bits (halfwords) of each of the two words of the parameters m1 and m2, then adds the two 32-bit results to the two words of the parameter acc and returns the result of this addition.
This function uses the assembler instruction WQMIABB.
 
 
20.9.110 	_mm_qmiabbn_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmiabbn_pi32 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit multiplication with the lower 16 bits (halfwords) of each of the two words of the parameters m1 and m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WQMIABBN.
 
 
20.9.111 	_mm_qmiabt_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmiabt_pi32 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit multiplication with the lower 16 bits (halfwords) of each of the two words of parameter m1 and the higher 16 bits (halfwords) of each of the two words of the parameter m2, then adds the two 32-bit results to the two words of the parameter acc and returns the result of this addition.
This function uses the assembler instruction WQMIABT.
 
 
20.9.112 	_mm_qmiabtn_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmiabtn_pi32 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit multiplication with the lower 16 bits (halfwords) of each of the two words of parameter m1 and the higher 16 bits (halfwords) of each of the two words of the parameter m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WQMIABTN.
 
 
20.9.113 	_mm_qmiatb_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmiatb_pi32 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit multiplication with the higher 16 bits (halfwords) of each of the two words of parameter m1 and the lower 16 bits (halfwords) of each of the two words of the parameter m2, then adds the two 32-bit results to the two words of the parameter acc and returns the result of this addition.
This function uses the assembler instruction WQMIATB.
 
 
20.9.114 	_mm_qmiatbn_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmiatbn_pi32 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit multiplication with the higher 16 bits (halfwords) of each of the two words of parameter m1 and the lower 16 bits (halfwords) of each of the two words of the parameter m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WQMIATBN.
 
 
20.9.115 	_mm_qmiatt_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmiatt_pi32 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit multiplication with the higher 16 bits (halfwords) of each of the two words of the parameters m1 and m2, then adds the two 32-bit results to the two words of the parameter acc and returns the result of this addition.
This function uses the assembler instruction WQMIATT.
 
 
20.9.116 	_mm_qmiattn_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmiattn_pi32 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit multiplication with the higher 16 bits (halfwords) of each of the two words of the parameters m1 and m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WQMIATTN.
 
 
20.9.117 	_mm_qmulm_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmulm_pi16 (__m64 a, __m64 b)
Description 
Performs parallel vector multiplication on the four 16-bit halfwords of the parameters a and b. The higher order 16 bits of the four 32-bit intermediate results are returned.
This function uses the assembler instruction WQMULM.
 
 
20.9.118 	_mm_qmulm_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmulm_pi32 (__m64 a, __m64 b)
Description 
Performs parallel vector multiplication on the two 32-bit words of the parameters a and b. The higher order 32 bits of the two 64-bit results are returned.
This function uses the assembler instruction WQMULWM.
 
 
20.9.119 	_mm_qmulmr_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmulmr_pi16 (__m64 a, __m64 b)
Description 
Performs parallel vector multiplication on the four 16-bit halfwords of the parameters a and b. The higher order 16 bits of the four 32-bit intermediate results are returned, with the least significant 16 bits rounded into the most significant 16 bits.
This function uses the assembler instruction WQMULMR.
 
 
20.9.120 	_mm_qmulmr_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_qmulmr_pi32 (__m64 a, __m64 b)
Description 
Performs parallel vector multiplication on the two 32-bit words of the parameters a and b. The higher order 32 bits of the two 64-bit results are returned, with the least significant 32 bits rounded into the most significant 32 bits.
This function uses the assembler instruction WQMULWMR.
 
 
20.9.121 	_mm_ror_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_ror_pi16 (__m64 m, __m64 count)
Description 
Rotates four 16-bit values in m right the amount specified by count.
 
20.9.122 	_mm_ror_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_ror_pi32 (__m64 m, __m64 count)
Description 
Rotates two 32-bit values in m right the amount specified by count.
 
20.9.123 	_mm_ror_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_ror_si64 (__m64 m, __m64 count)
Description 
Rotates 64-bit value in m right the amount specified by count.
 
20.9.124 	_mm_rori_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_rori_pi16 (__m64 m, int count)
Description 
Rotates four 16-bit values in m right the amount specified by count. The count must be in the range 0 to 32.
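As a sketch of the documented per-lane rotation, the semantics can be modeled in portable C. The helper name `ror_pi16_ref` is hypothetical; this is a scalar reference for the behavior described above, not the intrinsic itself:

```c
#include <stdint.h>

/* Scalar model of _mm_rori_pi16: rotate each 16-bit lane of m right by
   count bits.  A rotation by 16 (or 32) is the identity per lane, so the
   count is reduced modulo 16.  */
static uint64_t
ror_pi16_ref (uint64_t m, unsigned count)
{
  unsigned c = count % 16;
  uint64_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      uint32_t h = (m >> (16 * i)) & 0xffff;
      uint32_t rot = ((h >> c) | (h << (16 - c))) & 0xffff;
      r |= (uint64_t) rot << (16 * i);
    }
  return r;
}
```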
 
20.9.125 	_mm_rori_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_rori_pi32 (__m64 m, int count)
Description 
Rotates two 32-bit values in m right the amount specified by count. The count must be in the range 0 to 32.
 
20.9.126 	_mm_rori_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_rori_si64 (__m64 m, int count)
Description 
Rotates the 64-bit value in m right the amount specified by count. The count must be in the range 0 to 64.
 
 
20.9.127 	_mm_sad_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sad_pu8(__m64 a, __m64 b)
Description 
Computes the sum of the absolute differences of the unsigned bytes in a and b, returning the value in the lower word. The upper word of the result is cleared.
If r = _mm_sad_pu8(a, b), the action is
r[0:31] = abs(a0-b0) + ... + abs(a7-b7);
r[32:63] = 0;
This function uses the assembler instruction WSADBZ.
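The action above can be modeled as a scalar reference in portable C. The function name `sad_pu8_ref` is hypothetical; it illustrates the documented semantics rather than the WSADBZ instruction:

```c
#include <stdint.h>

/* Scalar model of _mm_sad_pu8 (WSADBZ): sum of absolute differences of
   the eight unsigned bytes; result in the lower word, upper word zero.  */
static uint64_t
sad_pu8_ref (uint64_t a, uint64_t b)
{
  uint32_t sum = 0;
  for (int i = 0; i < 8; i++)
    {
      uint8_t ai = (a >> (8 * i)) & 0xff;
      uint8_t bi = (b >> (8 * i)) & 0xff;
      sum += ai > bi ? ai - bi : bi - ai;
    }
  return sum;  /* r[32:63] = 0 */
}
```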
 

20.9.128 	_mm_sad_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sad_pu16(__m64 a, __m64 b)
Description 
Computes the sum of the absolute differences of the unsigned half words in a and b, returning the value in the lower word. The upper word of the result is cleared.
If r = _mm_sad_pu16(a, b), the action is
r[0:31] = abs(a0-b0) + ... + abs(a3-b3);
r[32:63] = 0;
This function uses the assembler instruction WSADHZ.
 

20.9.129 	_mm_sada_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sada_pu8(__m64 a, __m64 b, __m64 c)
Description 
Computes the sum of the absolute differences of the bytes in b and c, and accumulates the result with the lower word of a. The upper word of the result is cleared.
If r = _mm_sada_pu8(a, b, c), the action is
r[0:31] = a[0:31] + abs(b0-c0) + ... + abs(b7-c7);
r[32:63] = 0;
This function uses the assembler instruction WSADB.
 

20.9.130 	_mm_sada_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sada_pu16(__m64 a, __m64 b, __m64 c)
Description 
Computes the sum of the absolute differences of the half words in b and c, and accumulates the result with the lower word of a. The upper word of the result is cleared.
If r = _mm_sada_pu16(a, b, c), the action is
r[0:31] = a[0:31] + abs(b0-c0) + ... + abs(b3-c3);
r[32:63] = 0;
This function uses the assembler instruction WSADH.
 

20.9.131 	_mm_set_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_set_pi16 (short w3, short w2, short w1, short w0)
Description 
Sets the 4 signed 16-bit integer values.
If r = _mm_set_pi16 (w3, w2, w1, w0), the action is
r0 = w0;
r1 = w1;
r2 = w2;
r3 = w3;
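The lane ordering described by the action above can be shown with a scalar sketch; `set_pi16_ref` is a hypothetical reference model, assuming the usual little-endian lane numbering (r0 is the least significant halfword):

```c
#include <stdint.h>

/* Scalar model of _mm_set_pi16: w0 lands in the least significant lane.
   Note _mm_setr_pi16 (w0, w1, w2, w3) would produce the same value.  */
static uint64_t
set_pi16_ref (uint16_t w3, uint16_t w2, uint16_t w1, uint16_t w0)
{
  return (uint64_t) w0
         | ((uint64_t) w1 << 16)
         | ((uint64_t) w2 << 32)
         | ((uint64_t) w3 << 48);
}
```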
 
20.9.132 	_mm_set_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_set_pi32 (int i1, int i0)
Description 
Sets the 2 signed 32-bit integer values.
If r = _mm_set_pi32(i1, i0), the action is
r0 = i0;
r1 = i1;
 
20.9.133 	_mm_set_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_set_pi8 (char b7, char b6,
             char b5, char b4,
             char b3, char b2,
             char b1, char b0)
Description 
Sets the 8 signed 8-bit integer values.
If r = _mm_set_pi8 (b7, b6, b5, b4, b3, b2, b1, b0), the action is
r0 = b0;
r1 = b1;
...
r7 = b7;
 
20.9.134 	_mm_set1_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_set1_pi16 (short w)
Description 
Sets the 4 signed 16-bit integer values to w.
If r = _mm_set1_pi16 (w), action is
r0 = w;
r1 = w;
r2 = w;
r3 = w;
 
20.9.135 	_mm_set1_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_set1_pi32 (int i)
Description 
Sets the 2 signed 32-bit integer values to i.
If r = _mm_set1_pi32 (i), the action is
r0 = i;
r1 = i;
 
20.9.136 	_mm_set1_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_set1_pi8 (char b)
Description 
Sets the 8 signed 8-bit integer values to b.
If r = _mm_set1_pi8 (b), the action is
r0 = b;
r1 = b;
...
r7 = b;
 
20.9.137 	_mm_setr_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_setr_pi16 (short w0, short w1, short w2, short w3)
Description 
Sets the 4 signed 16-bit integer values in reverse order.
If r = _mm_setr_pi16 (w0, w1, w2, w3), the action is
r0 = w0;
r1 = w1;
r2 = w2;
r3 = w3;
 
20.9.138 	_mm_setr_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_setr_pi32 (int i0, int i1)
Description 
Sets the 2 signed 32-bit integer values in reverse order.
If r = _mm_setr_pi32 (i0, i1), the action is
r0 = i0;
r1 = i1;
 
20.9.139 	_mm_setr_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_setr_pi8 (char b0, char b1, char b2, char b3,
                    char b4, char b5,
                    char b6, char b7)
Description 
Sets the 8 signed 8-bit integer values in reverse order.
If r = _mm_setr_pi8 (b0, b1, b2, b3, b4, b5, b6, b7), the action is
r0 = b0;
r1 = b1;
...
r7 = b7;
 
20.9.140 	_mm_setwcx
 
Syntax 
#include <mmintrin.h>
void
_mm_setwcx(int value, int number)
Description 
Sets the Intel Wireless MMX technology control register specified by number to the contents of value.
This function uses the assembler pseudo-instruction TMCR.
Note:  	The valid range for parameter number is [0, 3] and [8, 11]. The valid control registers are: wCID(0), wCon(1), wCSSF(2), wCASF(3), wCGR0(8), wCGR1(9), wCGR2(10), wCGR3(11).

 

20.9.141 	_mm_setzero_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_setzero_si64 ()
Description 
Sets the 64-bit value to zero.
If r = _mm_setzero_si64(), the action is 
r = 0x0;
 
20.9.142 	_mm_shuffle_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_shuffle_pi16(__m64 a, int n)
Description 
Returns a combination of the four half words of a. The selector n must be an immediate value in the range 0 to 255.
If r = _mm_shuffle_pi16(a, n), the action is 
r0 = Half word (n&0x3) of a
r1 = Half word ((n>>2)&0x3) of a
r2 = Half word ((n>>4)&0x3) of a
r3 = Half word ((n>>6)&0x3) of a
This function uses the assembler instruction WSHUFH.
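The selector decoding above can be sketched as a scalar reference in C. The name `shuffle_pi16_ref` is hypothetical; it models the documented action, not the WSHUFH instruction:

```c
#include <stdint.h>

/* Scalar model of _mm_shuffle_pi16: each 2-bit field of the selector n
   picks the source halfword for one result lane.  */
static uint64_t
shuffle_pi16_ref (uint64_t a, int n)
{
  uint64_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      int sel = (n >> (2 * i)) & 0x3;
      uint64_t h = (a >> (16 * sel)) & 0xffff;
      r |= h << (16 * i);
    }
  return r;
}
```

For example, n = 0x1B (binary 00 01 10 11) reverses the four lanes, and n = 0 broadcasts halfword 0.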
 

20.9.143 	_mm_sll_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sll_pi16 (__m64 m, __m64 count)
Description 
Shifts four 16-bit values in m left the amount specified by count while shifting in zeros.
 
20.9.144 	_mm_sll_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sll_pi32 (__m64 m, __m64 count)
Description 
Shifts two 32-bit values in m left the amount specified by count while shifting in zeros.
 
20.9.145 	_mm_sll_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sll_si64 (__m64 m, __m64 count)
Description 
Shifts the 64-bit value in m left the amount specified by count while shifting in zeros.
 
20.9.146 	_mm_slli_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_slli_pi16 (__m64 m, int count)
Description 
Shifts four 16-bit values in m left the amount specified by count while shifting in zeros. The count must be no less than 0.
 
20.9.147 	_mm_slli_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_slli_pi32 (__m64 m, int count)
Description 
Shifts two 32-bit values in m left the amount specified by count while shifting in zeros. The count must be no less than 0.
 
20.9.148 	_mm_slli_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_slli_si64 (__m64 m, int count)
Description 
Shifts the 64-bit value in m left the amount specified by count while shifting in zeros. The count must be no less than 0.
 
20.9.149 	_mm_sra_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sra_pi16 (__m64 m, __m64 count)
Description 
Shifts four 16-bit values in m right the amount specified by count while shifting in the sign bit.
 
20.9.150 	_mm_sra_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sra_pi32 (__m64 m, __m64 count)
Description 
Shifts two 32-bit values in m right the amount specified by count while shifting in the sign bit.
 
20.9.151 	_mm_sra_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sra_si64 (__m64 m, __m64 count)
Description 
Shifts 64-bit value in m right the amount specified by count while shifting in the sign bit.
 
20.9.152 	_mm_srai_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srai_pi16 (__m64 m, int count)
Description 
Shifts four 16-bit values in m right the amount specified by count while shifting in the sign bit. The count must be no less than 0.
 
20.9.153 	_mm_srai_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srai_pi32 (__m64 m, int count)
Description 
Shifts two 32-bit values in m right the amount specified by count while shifting in the sign bit. The count must be no less than 0.
 
20.9.154 	_mm_srai_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srai_si64 (__m64 m, int count)
Description 
Shifts 64-bit value in m right the amount specified by count while shifting in the sign bit. The count must be no less than 0.
 
20.9.155 	_mm_srl_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srl_pi16 (__m64 m, __m64 count)
Description 
Shifts four 16-bit values in m right the amount specified by count while shifting in zeros.
 
20.9.156 	_mm_srl_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srl_pi32 (__m64 m, __m64 count)
Description 
Shifts two 32-bit values in m right the amount specified by count while shifting in zeros.
 
20.9.157 	_mm_srl_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srl_si64 (__m64 m, __m64 count)
Description 
Shifts the 64-bit value in m right the amount specified by count while shifting in zeros.
 
20.9.158 	_mm_srli_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srli_pi16 (__m64 m, int count)
Description 
Shifts four 16-bit values in m right the amount specified by count while shifting in zeros. The count must be no less than 0.
 
20.9.159 	_mm_srli_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srli_pi32 (__m64 m, int count)
Description 
Shifts two 32-bit values in m right the amount specified by count while shifting in zeros. The count must be no less than 0.
 
20.9.160 	_mm_srli_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_srli_si64 (__m64 m, int count)
Description 
Shifts the 64-bit value in m right the amount specified by count while shifting in zeros. The count must be no less than 0.
 
20.9.161 	_mm_sub_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sub_pi8 (__m64 m1, __m64 m2)
Description 
Subtracts the eight 8-bit values in m2 from the eight 8-bit values in m1.
This function uses the assembler instruction WSUBB.
 

20.9.162 	_mm_sub_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sub_pi16 (__m64 m1, __m64 m2)
Description 
Subtracts the four 16-bit values in m2 from the four 16-bit values in m1.
This function uses the assembler instruction WSUBH.
 

20.9.163 	_mm_sub_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_sub_pi32 (__m64 m1, __m64 m2)
Description 
Subtracts the two 32-bit values in m2 from the two 32-bit values in m1.
This function uses the assembler instruction WSUBW.
 

20.9.164 	_mm_subaddhx_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_subaddhx_pi16 (__m64 a, __m64 b)
Description 
The four halfwords of parameter b are alternately added to and subtracted from the halfwords of parameter a, using a cross selection in each of the parallel operations. The result of the operation is saturated to the signed limits and returned.
This function uses the assembler instruction WSUBADDHX.
 
 
20.9.165 	_mm_subs_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_subs_pi8 (__m64 m1, __m64 m2)
Description 
Subtracts the eight signed 8-bit values in m2 from the eight signed 8-bit values in m1 using saturating arithmetic.
This function uses the assembler instruction WSUBBSS.
 

20.9.166 	_mm_subs_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_subs_pi16 (__m64 m1, __m64 m2)
Description 
Subtracts the four signed 16-bit values in m2 from the four signed 16-bit values in m1 using saturating arithmetic.
This function uses the assembler instruction WSUBHSS.
 

20.9.167 	_mm_subs_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_subs_pi32 (__m64 m1, __m64 m2)
Description 
Subtracts the two signed 32-bit values in m2 from the two signed 32-bit values in m1 using saturating arithmetic.
This function uses the assembler instruction WSUBWSS.
 

20.9.168 	_mm_subs_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_subs_pu8 (__m64 m1, __m64 m2)
Description 
Subtracts the eight unsigned 8-bit values in m2 from the eight unsigned 8-bit values in m1 using saturating arithmetic.
This function uses the assembler instruction WSUBBUS.
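Unsigned saturating subtraction clamps each byte difference at zero instead of wrapping. As a sketch, `subs_pu8_ref` is a hypothetical scalar model of the semantics described above:

```c
#include <stdint.h>

/* Scalar model of _mm_subs_pu8 (WSUBBUS): per-byte unsigned subtraction
   saturating at 0.  */
static uint64_t
subs_pu8_ref (uint64_t m1, uint64_t m2)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      uint8_t a = (m1 >> (8 * i)) & 0xff;
      uint8_t b = (m2 >> (8 * i)) & 0xff;
      uint8_t d = a > b ? a - b : 0;  /* clamp instead of wrapping */
      r |= (uint64_t) d << (8 * i);
    }
  return r;
}
```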
 

20.9.169 	_mm_subs_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_subs_pu16 (__m64 m1, __m64 m2)
Description 
Subtracts the four unsigned 16-bit values in m2 from the four unsigned 16-bit values in m1 using saturating arithmetic.
This function uses the assembler instruction WSUBHUS.
 

20.9.170 	_mm_subs_pu32 
 
Syntax 
#include <mmintrin.h>
__m64
_mm_subs_pu32 (__m64 m1, __m64 m2)
Description 
Subtracts the two unsigned 32-bit values in m2 from the two unsigned 32-bit values in m1 using saturating arithmetic.
This function uses the assembler instruction WSUBWUS.
 

20.9.171 	_mm_tandcb 
 
Syntax 
#include <mmintrin.h>
void
_mm_tandcb ()
Description 
Performs "AND" across the fields of the SIMD processor status register (PSR) (wCASF) and sends the result to the ARM* CPSR; performed after a byte operation that sets the flags. 
This function uses the assembler instruction TANDCB.
 

20.9.172 	_mm_tandch 
 
Syntax 
#include <mmintrin.h>
void
_mm_tandch ()
Description 
Performs "AND" across the fields of the SIMD processor status register (PSR) (wCASF) and sends the result to the ARM* CPSR; performed after a half-word operation that sets the flags. 
This function uses the assembler instruction TANDCH.
 

20.9.173 	_mm_tandcw 
 
Syntax 
#include <mmintrin.h>
void
_mm_tandcw ()
Description 
Performs "AND" across the fields of the SIMD processor status register (PSR) (wCASF) and sends the result to the ARM* CPSR; performed after a word operation that sets the flags. 
This function uses the assembler instruction TANDCW.
 

20.9.174 	_mm_tbcst_pi8 
 
Syntax 
#include <mmintrin.h>
__m64
_mm_tbcst_pi8 ( int value)
Description 
Broadcasts a value from the ARM* source register, Rn, to every SIMD position in the Intel Wireless MMX 2 coprocessor destination register, wRd; operates on 8-bit data values.
This function uses the assembler instruction TBCSTB.
 

20.9.175 	_mm_tbcst_pi16 
 
Syntax 
#include <mmintrin.h>
__m64
_mm_tbcst_pi16 ( int value)
Description 
Broadcasts a value from the ARM* source register, Rn, to every SIMD position in the Intel Wireless MMX 2 coprocessor destination register, wRd; operates on 16-bit data values.
This function uses the assembler instruction TBCSTH.
 

20.9.176 	_mm_tbcst_pi32 
 
Syntax 
#include <mmintrin.h>
__m64
_mm_tbcst_pi32 ( int value)
Description 
Broadcasts a value from the ARM* source register, Rn, to every SIMD position in the Intel Wireless MMX 2 coprocessor destination register, wRd; operates on 32-bit data values.
This function uses the assembler instruction TBCSTW.
 

20.9.177 	_mm_textrcb 
 
Syntax 
#include <mmintrin.h>
void
_mm_textrcb(n)
Description 
Extracts 4-bit field specified by the 3-bit immediate n from the SIMD PSR (wCASF), and transfers to the ARM* CPSR. The range of n is 0 to 7.
This function uses the assembler instruction TEXTRCB.
 

20.9.178 	_mm_textrch 
 
Syntax 
#include <mmintrin.h>
void
_mm_textrch(n)
Description 
Extracts 8-bit field specified by the 3-bit immediate n from the SIMD PSR (wCASF), and transfers to the ARM* CPSR. The range of n is 0 to 3.
This function uses the assembler instruction TEXTRCH.
 

20.9.179 	_mm_textrcw 
 
Syntax 
#include <mmintrin.h>
void
_mm_textrcw(n)
Description 
Extracts 16-bit field specified by the 3-bit immediate n from the SIMD PSR (wCASF), and transfers to the ARM* CPSR. The range of n is 0 to 1.
This function uses the assembler instruction TEXTRCW.
 

20.9.180 	_mm_torcb 
 
Syntax 
#include <mmintrin.h>
void
_mm_torcb()
Description 
Performs "OR" across the fields of the SIMD PSR (wCASF) and sends the result to the ARM* CPSR; operation is performed after a byte operation that sets the flags.
This function uses the assembler instruction TORCB.
 

20.9.181 	_mm_torch 
 
Syntax 
#include <mmintrin.h>
void
_mm_torch()
Description 
Performs "OR" across the fields of the SIMD PSR (wCASF) and sends the result to the ARM* CPSR; operation is performed after a half-word operation that sets the flags.
This function uses the assembler instruction TORCH.
 

20.9.182 	_mm_torcw 
 
Syntax 
#include <mmintrin.h>
void
_mm_torcw()
Description 
Performs "OR" across the fields of the SIMD PSR (wCASF) and sends the result to the ARM* CPSR; operation is performed after a word operation that sets the flags.
This function uses the assembler instruction TORCW.
 

20.9.183 	_mm_torvscb 
 
Syntax 
#include <mmintrin.h>
void
_mm_torvscb()
Description 
Performs "OR" across the fields of the SIMD saturation flags (wCSSF) and sends the result to the ARM* CPSR overflow (V) flag; operation is performed after a byte operation that sets the flags.
This function uses the assembler instruction TORVSCB.
 

20.9.184 	_mm_torvsch 
 
Syntax 
#include <mmintrin.h>
void
_mm_torvsch()
Description 
Performs "OR" across the fields of the SIMD saturation flags (wCSSF) and sends the result to the ARM* CPSR overflow (V) flag; operation can be performed after a half-word operation that sets the flags.
This function uses the assembler instruction TORVSCH.
 

20.9.185 	_mm_torvscw 
 
Syntax 
#include <mmintrin.h>
void
_mm_torvscw()
Description 
Performs "OR" across the fields of the SIMD saturation flags (wCSSF) and sends the result to the ARM* CPSR overflow (V) flag; operation can be performed after a word operation that sets the flags.
This function uses the assembler instruction TORVSCW.
 

20.9.186 	_mm_unpackeh_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackeh_pi8 (__m64 m1)
Description 
Unpacks the four 8-bit values from the upper half of m1 and sign-extends each value.
This function uses the assembler instruction WUNPCKEHSB.
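Sign extension of the upper four bytes can be sketched in portable C; `unpackeh_pi8_ref` is a hypothetical scalar model of the semantics described above:

```c
#include <stdint.h>

/* Scalar model of _mm_unpackeh_pi8 (WUNPCKEHSB): each byte of the upper
   half of m1 becomes a sign-extended 16-bit lane.  */
static uint64_t
unpackeh_pi8_ref (uint64_t m1)
{
  uint64_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      int8_t b = (int8_t) ((m1 >> (8 * (i + 4))) & 0xff);
      r |= ((uint64_t) (uint16_t) (int16_t) b) << (16 * i);
    }
  return r;
}
```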
 

20.9.187 	_mm_unpackeh_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackeh_pi16 (__m64 m1)
Description 
Unpacks the two 16-bit values from the upper half of m1 and sign-extends each value.
This function uses the assembler instruction WUNPCKEHSH.
 

20.9.188 	_mm_unpackeh_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackeh_pi32 (__m64 m1)
Description 
Unpacks the 32-bit value from the upper half of m1 and sign-extends each value.
This function uses the assembler instruction WUNPCKEHSW.
 

20.9.189 	_mm_unpackeh_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackeh_pu8 (__m64 m1)
Description 
Unpacks the four 8-bit values from the upper half of m1 and zero-extends each value.
This function uses the assembler instruction WUNPCKEHUB.
 

20.9.190 	_mm_unpackeh_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackeh_pu16 (__m64 m1)
Description 
Unpacks the two 16-bit values from the upper half of m1 and zero-extends each value.
This function uses the assembler instruction WUNPCKEHUH.
 

20.9.191 	_mm_unpackeh_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackeh_pu32 (__m64 m1)
Description 
Unpacks the 32-bit value from the upper half of m1 and zero-extends each value.
This function uses the assembler instruction WUNPCKEHUW.
 

20.9.192 	_mm_unpackel_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackel_pi8 (__m64 m1)
Description 
Unpacks the four 8-bit values from the lower half of m1 and sign-extends each value.
This function uses the assembler instruction WUNPCKELSB.
 

20.9.193 	_mm_unpackel_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackel_pi16 (__m64 m1)
Description 
Unpacks the two 16-bit values from the lower half of m1 and sign-extends each value.
This function uses the assembler instruction WUNPCKELSH.
 

20.9.194 	_mm_unpackel_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackel_pi32 (__m64 m1)
Description 
Unpacks the 32-bit value from the lower half of m1 and sign-extends each value.
This function uses the assembler instruction WUNPCKELSW.
 

20.9.195 	_mm_unpackel_pu8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackel_pu8 (__m64 m1)
Description 
Unpacks the four 8-bit values from the lower half of m1 and zero-extends each value.
This function uses the assembler instruction WUNPCKELUB.
 

20.9.196 	_mm_unpackel_pu16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackel_pu16 (__m64 m1)
Description 
Unpacks the two 16-bit values from the lower half of m1 and zero-extends each value.
This function uses the assembler instruction WUNPCKELUH.
 

20.9.197 	_mm_unpackel_pu32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackel_pu32 (__m64 m1)
Description 
Unpacks the 32-bit value from the lower half of m1 and zero-extends each value.
This function uses the assembler instruction WUNPCKELUW.
 

 
20.9.198 	_mm_unpackhi_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackhi_pi8 (__m64 m1, __m64 m2)
Description 
Interleaves the four 8-bit values from the upper half of m1 with the four values from the upper half of m2. The interleaving begins with the data from m1.
This function uses the assembler instruction WUNPCKIHB.
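"The interleaving begins with the data from m1" means m1's byte takes the lower position of each result pair. As a sketch, `unpackhi_pi8_ref` is a hypothetical scalar model of that behavior:

```c
#include <stdint.h>

/* Scalar model of _mm_unpackhi_pi8 (WUNPCKIHB): interleave the upper
   four bytes of m1 and m2, with m1's byte in the lower position of each
   pair.  */
static uint64_t
unpackhi_pi8_ref (uint64_t m1, uint64_t m2)
{
  uint64_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      uint64_t a = (m1 >> (8 * (i + 4))) & 0xff;
      uint64_t b = (m2 >> (8 * (i + 4))) & 0xff;
      r |= a << (16 * i);
      r |= b << (16 * i + 8);
    }
  return r;
}
```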
 

20.9.199 	_mm_unpackhi_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackhi_pi16 (__m64 m1, __m64 m2)
Description 
Interleaves the two 16-bit values from the upper half of m1 with the two values from the upper half of m2. The interleaving begins with the data from m1.
This function uses the assembler instruction WUNPCKIHH.
 

20.9.200 	_mm_unpackhi_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpackhi_pi32 (__m64 m1, __m64 m2)
Description 
Interleaves the 32-bit value from the upper half of m1 with the 32-bit value from the upper half of m2. The interleaving begins with the data from m1.
This function uses the assembler instruction WUNPCKIHW.
 

20.9.201 	_mm_unpacklo_pi8
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpacklo_pi8 (__m64 m1, __m64 m2)
Description 
Interleaves the four 8-bit values from the lower half of m1 with the four values from the lower half of m2. The interleaving begins with the data from m1.
This function uses the assembler instruction WUNPCKILB.
 

20.9.202 	_mm_unpacklo_pi16
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpacklo_pi16 (__m64 m1, __m64 m2)
Description 
Interleaves the two 16-bit values from the lower half of m1 with the two values from the lower half of m2. The interleaving begins with the data from m1.
This function uses the assembler instruction WUNPCKILH.
 

20.9.203 	_mm_unpacklo_pi32
 
Syntax 
#include <mmintrin.h>
__m64
_mm_unpacklo_pi32 (__m64 m1, __m64 m2)
Description 
Interleaves the 32-bit value from the lower half of m1 with the 32-bit value from the lower half of m2. The interleaving begins with the data from m1. 
This function uses the assembler instruction WUNPCKILW.
 

20.9.204 	_mm_wmiabb_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiabb_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit parallel multiply-accumulate with the lower 16 bits (halfwords) of each of the two words of the parameters m1 and m2, then adds the result to the parameter acc and returns the result of this addition.
This function uses the assembler instruction WMIABB.
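The multiply-accumulate above can be sketched as a scalar reference. `wmiabb_ref` is hypothetical, and the exact widening behavior (two signed 16x16 products summed into the 64-bit accumulator) is an assumption based on the text above:

```c
#include <stdint.h>

/* Scalar model of _mm_wmiabb_si64 (WMIABB), under the assumption that
   the two signed 16x16 products (bottom halfword of each 32-bit word)
   are summed and added to the 64-bit accumulator.  */
static uint64_t
wmiabb_ref (uint64_t acc, uint64_t m1, uint64_t m2)
{
  int16_t a0 = (int16_t) (m1 & 0xffff);
  int16_t b0 = (int16_t) (m2 & 0xffff);
  int16_t a1 = (int16_t) ((m1 >> 32) & 0xffff);
  int16_t b1 = (int16_t) ((m2 >> 32) & 0xffff);
  int64_t prod = (int64_t) a0 * b0 + (int64_t) a1 * b1;
  return acc + (uint64_t) prod;  /* modular 64-bit add */
}
```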
 
 
20.9.205 	_mm_wmiabbn_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiabbn_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit parallel multiply-accumulate with the lower 16 bits (halfwords) of each of the two words of the parameters m1 and m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WMIABBN.
 
 
20.9.206 	_mm_wmiabt_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiabt_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit parallel multiply-accumulate with the lower 16 bits (halfwords) of each of the two words of the parameter m1 and the upper 16 bits (halfwords) of each of the two words of the parameter m2, then adds the result to the parameter acc and returns the result of this addition.
This function uses the assembler instruction WMIABT.
 
 
20.9.207 	_mm_wmiabtn_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiabtn_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit parallel multiply-accumulate with the lower 16 bits (halfwords) of each of the two words of the parameter m1 and the upper 16 bits (halfwords) of each of the two words of the parameter m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WMIABTN.
 
 
20.9.208 	_mm_wmiatb_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiatb_si64 (__m64 acc,__m64 m1, __m64 m2)
Description 
Performs a 16-bit parallel multiply-accumulate with the upper 16 bits (halfwords) of each of the two words of the parameter m1 and the lower 16 bits (halfwords) of each of the two words of the parameter m2, then adds the result to the parameter acc and returns the result of this addition.
This function uses the assembler instruction WMIATB.
 
 
20.9.209 	_mm_wmiatbn_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiatbn_si64 (__m64 acc,__m64 m1, __m64 m2)
Description 
Performs a 16-bit parallel multiply-accumulate with the upper 16 bits (halfwords) of each of the two words of the parameter m1 and the lower 16 bits (halfwords) of each of the two words of the parameter m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WMIATBN.
 
 
20.9.210 	_mm_wmiatt_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiatt_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit parallel multiply-accumulate with the upper 16 bits (halfwords) of each of the two words of the parameters m1 and m2, then adds the result to the parameter acc and returns the result of this addition.
This function uses the assembler instruction WMIATT.
 
 
20.9.211 	_mm_wmiattn_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiattn_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 16-bit parallel multiply-accumulate with the upper 16 bits (halfwords) of each of the two words of the parameters m1 and m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WMIATTN.
 
 
20.9.212 	_mm_wmiawbb_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiawbb_si64 (__m64 acc,__m64 m1, __m64 m2)
Description 
Performs a 32-bit parallel multiply-accumulate with the lower 32 bits (bottom word) of each of the parameters m1 and m2, then adds the result to the parameter acc and returns the result of this addition.
This function uses the assembler instruction WMIAWBB.
 
 
20.9.213 	_mm_wmiawbbn_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiawbbn_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 32-bit parallel multiply-accumulate with the lower 32 bits (bottom word) of each of the parameters m1 and m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WMIAWBBN.
 
 
20.9.214 	_mm_wmiawbt_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiawbt_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 32-bit parallel multiply-accumulate with the lower 32 bits (bottom word) of the parameter m1 and the upper 32 bits (top word) of the parameter m2, then adds the result to the parameter acc and returns the result of this addition.
This function uses the assembler instruction WMIAWBT.
 
 
20.9.215 	_mm_wmiawbtn_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiawbtn_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 32-bit parallel multiply-accumulate with the lower 32 bits (bottom word) of the parameter m1 and the upper 32 bits (top word) of the parameter m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WMIAWBTN.
 
 
20.9.216 	_mm_wmiawtb_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiawtb_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 32-bit parallel multiply-accumulate with the upper 32 bits (top word) of the parameter m1 and the lower 32 bits (bottom word) of the parameter m2, then adds the result to the parameter acc and returns the result of this addition.
This function uses the assembler instruction WMIAWTB.
 
 
20.9.217 	_mm_wmiawtbn_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiawtbn_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 32-bit parallel multiply-accumulate with the upper 32 bits (top word) of the parameter m1 and the lower 32 bits (bottom word) of the parameter m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WMIAWTBN.
 
 
20.9.218 	_mm_wmiawtt_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiawtt_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 32-bit parallel multiply-accumulate with the upper 32 bits (top word) of the parameters m1 and m2, then adds the result to the parameter acc and returns the result of this addition.
This function uses the assembler instruction WMIAWTT.
 
 
20.9.219 	_mm_wmiawttn_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_wmiawttn_si64 (__m64 acc, __m64 m1, __m64 m2)
Description 
Performs a 32-bit parallel multiply-accumulate with the upper 32 bits (top word) of the parameters m1 and m2, then subtracts the result from the parameter acc and returns the result of this subtraction.
This function uses the assembler instruction WMIAWTTN.
 
 
20.9.220 	_mm_xor_si64
 
Syntax 
#include <mmintrin.h>
__m64
_mm_xor_si64 (__m64 m1, __m64 m2)
Description 
Performs a bitwise XOR of the 64-bit value in m1 with the 64-bit value in m2.
This function uses the assembler instruction WXOR.
 


[-- Attachment #3: 2_mmintrin.diff --]
[-- Type: application/octet-stream, Size: 17503 bytes --]

Index: gcc/config/arm/mmintrin.h
===================================================================
--- gcc/config/arm/mmintrin.h	(revision 178025)
+++ gcc/config/arm/mmintrin.h	(working copy)
@@ -24,16 +24,21 @@
 #ifndef _MMINTRIN_H_INCLUDED
 #define _MMINTRIN_H_INCLUDED
 
+#if defined __cplusplus
+extern "C" { /* Begin "C" */
+/* Intrinsics use C name-mangling.  */
+#endif /* __cplusplus */
+
 /* The data type intended for user use.  */
 typedef unsigned long long __m64, __int64;
 
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((vector_size (8)));
 typedef short __v4hi __attribute__ ((vector_size (8)));
-typedef char __v8qi __attribute__ ((vector_size (8)));
+typedef signed char __v8qi __attribute__ ((vector_size (8)));
 
 /* "Convert" __m64 and __int64 into each other.  */
-static __inline __m64 
+static __inline __m64
 _mm_cvtsi64_m64 (__int64 __i)
 {
   return __i;
@@ -54,7 +59,7 @@ _mm_cvtsi64_si32 (__int64 __i)
 static __inline __int64
 _mm_cvtsi32_si64 (int __i)
 {
-  return __i;
+  return (__i & 0xffffffff);
 }
 
 /* Pack the four 16-bit values from M1 into the lower four 8-bit values of
@@ -603,7 +608,7 @@ _mm_and_si64 (__m64 __m1, __m64 __m2)
 static __inline __m64
 _mm_andnot_si64 (__m64 __m1, __m64 __m2)
 {
-  return __builtin_arm_wandn (__m1, __m2);
+  return __builtin_arm_wandn (__m2, __m1);
 }
 
 /* Bit-wise inclusive OR the 64-bit values in M1 and M2.  */
@@ -935,7 +940,13 @@ _mm_avg2_pu16 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu8 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadb ((__v8qi)__A, (__v8qi)__B);
+  return (__m64) __builtin_arm_wsadbz ((__v8qi)__A, (__v8qi)__B);
+}
+
+static __inline __m64
+_mm_sada_pu8 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadb ((__v2si)__A, (__v8qi)__B, (__v8qi)__C);
 }
 
 /* Compute the sum of the absolute differences of the unsigned 16-bit
@@ -944,9 +955,16 @@ _mm_sad_pu8 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu16 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadh ((__v4hi)__A, (__v4hi)__B);
+  return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
+static __inline __m64
+_mm_sada_pu16 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadh ((__v2si)__A, (__v4hi)__B, (__v4hi)__C);
+}
+
+
 /* Compute the sum of the absolute differences of the unsigned 8-bit
    values in A and B.  Return the value in the lower 16-bit word; the
    upper words are cleared.  */
@@ -965,11 +983,8 @@ _mm_sadz_pu16 (__m64 __A, __m64 __B)
   return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
-static __inline __m64
-_mm_align_si64 (__m64 __A, __m64 __B, int __C)
-{
-  return (__m64) __builtin_arm_walign ((__v8qi)__A, (__v8qi)__B, __C);
-}
+#define _mm_align_si64(__A,__B, N) \
+  (__m64) __builtin_arm_walign ((__v8qi) (__A),(__v8qi) (__B), (N))
 
 /* Creates a 64-bit zero.  */
 static __inline __m64
@@ -987,42 +1002,76 @@ _mm_setwcx (const int __value, const int
 {
   switch (__regno)
     {
-    case 0:  __builtin_arm_setwcx (__value, 0); break;
-    case 1:  __builtin_arm_setwcx (__value, 1); break;
-    case 2:  __builtin_arm_setwcx (__value, 2); break;
-    case 3:  __builtin_arm_setwcx (__value, 3); break;
-    case 8:  __builtin_arm_setwcx (__value, 8); break;
-    case 9:  __builtin_arm_setwcx (__value, 9); break;
-    case 10: __builtin_arm_setwcx (__value, 10); break;
-    case 11: __builtin_arm_setwcx (__value, 11); break;
-    default: break;
+    case 0:
+      __asm __volatile ("tmcr wcid, %0" :: "r"(__value));
+      break;
+    case 1:
+      __asm __volatile ("tmcr wcon, %0" :: "r"(__value));
+      break;
+    case 2:
+      __asm __volatile ("tmcr wcssf, %0" :: "r"(__value));
+      break;
+    case 3:
+      __asm __volatile ("tmcr wcasf, %0" :: "r"(__value));
+      break;
+    case 8:
+      __builtin_arm_setwcgr0 (__value);
+      break;
+    case 9:
+      __builtin_arm_setwcgr1 (__value);
+      break;
+    case 10:
+      __builtin_arm_setwcgr2 (__value);
+      break;
+    case 11:
+      __builtin_arm_setwcgr3 (__value);
+      break;
+    default:
+      break;
     }
 }
 
 static __inline int
 _mm_getwcx (const int __regno)
 {
+  int __value;
   switch (__regno)
     {
-    case 0:  return __builtin_arm_getwcx (0);
-    case 1:  return __builtin_arm_getwcx (1);
-    case 2:  return __builtin_arm_getwcx (2);
-    case 3:  return __builtin_arm_getwcx (3);
-    case 8:  return __builtin_arm_getwcx (8);
-    case 9:  return __builtin_arm_getwcx (9);
-    case 10: return __builtin_arm_getwcx (10);
-    case 11: return __builtin_arm_getwcx (11);
-    default: return 0;
+    case 0:
+      __asm __volatile ("tmrc %0, wcid" : "=r"(__value));
+      break;
+    case 1:
+      __asm __volatile ("tmrc %0, wcon" : "=r"(__value));
+      break;
+    case 2:
+      __asm __volatile ("tmrc %0, wcssf" : "=r"(__value));
+      break;
+    case 3:
+      __asm __volatile ("tmrc %0, wcasf" : "=r"(__value));
+      break;
+    case 8:
+      return __builtin_arm_getwcgr0 ();
+    case 9:
+      return __builtin_arm_getwcgr1 ();
+    case 10:
+      return __builtin_arm_getwcgr2 ();
+    case 11:
+      return __builtin_arm_getwcgr3 ();
+    default:
+      break;
     }
+  return __value;
 }
 
 /* Creates a vector of two 32-bit values; I0 is least significant.  */
 static __inline __m64
 _mm_set_pi32 (int __i1, int __i0)
 {
-  union {
+  union
+  {
     __m64 __q;
-    struct {
+    struct
+    {
       unsigned int __i0;
       unsigned int __i1;
     } __s;
@@ -1041,7 +1090,7 @@ _mm_set_pi16 (short __w3, short __w2, sh
   unsigned int __i1 = (unsigned short)__w3 << 16 | (unsigned short)__w2;
   unsigned int __i0 = (unsigned short)__w1 << 16 | (unsigned short)__w0;
   return _mm_set_pi32 (__i1, __i0);
-		       
+
 }
 
 /* Creates a vector of eight 8-bit values; B0 is least significant.  */
@@ -1110,9 +1159,521 @@ _mm_set1_pi8 (char __b)
 
 /* Convert an integer to a __m64 object.  */
 static __inline __m64
-_m_from_int (int __a)
+_mm_abs_pi8 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsb ((__v8qi)m1);
+}
+
+static __inline __m64
+_mm_abs_pi16 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsh ((__v4hi)m1);
+
+}
+
+static __inline __m64
+_mm_abs_pi32 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsw ((__v2si)m1);
+
+}
+
+static __inline __m64
+_mm_addsubhx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddsubhx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffb ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffh ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffw ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_addc_pu16 (__m64 a, __m64 b)
+{
+  __m64 result;
+  __asm__ __volatile__ ("waddhc	%0, %1, %2" : "=y" (result) : "y" (a),  "y" (b));
+  return result;
+}
+
+static __inline __m64
+_mm_addc_pu32 (__m64 a, __m64 b)
+{
+  __m64 result;
+  __asm__ __volatile__ ("waddwc	%0, %1, %2" : "=y" (result) : "y" (a),  "y" (b));
+  return result;
+}
+
+static __inline __m64
+_mm_avg4_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wavg4 ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_avg4r_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wavg4r ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_maddx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddsx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_maddx_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddux ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_msub_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddsn ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_msub_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddun ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhi_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwsm ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhi_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwum ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhir_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulsmr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhir_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwsmr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhir_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulumr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhir_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwumr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mullo_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwl ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_qmulm_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulm ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_qmulm_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulwm ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_qmulmr_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulmr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_qmulmr_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulwmr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_subaddhx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wsubaddhx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_addbhusl_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddbhusl ((__v4hi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_addbhusm_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddbhusm ((__v4hi)a, (__v8qi)b);
+}
+
+#define _mm_qmiabb_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabbn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabt_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabtn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabtn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatb_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatbn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatt_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiattn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiattn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabtn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabtn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiattn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiattn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbb (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbbn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbt (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbtn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbtn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtb (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtbn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtt (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawttn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawttn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+/* The third argument should be an immediate.  */
+#define _mm_merge_si64(a, b, n) \
+  ({\
+   __m64 result;\
+   result = (__m64) __builtin_arm_wmerge ((__m64) (a), (__m64) (b), (n));\
+   result;\
+   })
+
+static __inline __m64
+_mm_alignr0_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr0 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr1_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr1 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr2_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr2 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr3_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr3 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline void
+_mm_tandcb ()
+{
+  __asm __volatile ("tandcb r15");
+}
+
+static __inline void
+_mm_tandch ()
+{
+  __asm __volatile ("tandch r15");
+}
+
+static __inline void
+_mm_tandcw ()
+{
+  __asm __volatile ("tandcw r15");
+}
+
+#define _mm_textrcb(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrcb r15, %0" : : "i" (n));\
+   })
+
+#define _mm_textrch(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrch r15, %0" : : "i" (n));\
+   })
+
+#define _mm_textrcw(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrcw r15, %0" : : "i" (n));\
+   })
+
+static __inline void
+_mm_torcb ()
+{
+  __asm __volatile ("torcb r15");
+}
+
+static __inline void
+_mm_torch ()
+{
+  __asm __volatile ("torch r15");
+}
+
+static __inline void
+_mm_torcw ()
+{
+  __asm __volatile ("torcw r15");
+}
+
+static __inline void
+_mm_torvscb ()
+{
+  __asm __volatile ("torvscb r15");
+}
+
+static __inline void
+_mm_torvsch ()
+{
+  __asm __volatile ("torvsch r15");
+}
+
+static __inline void
+_mm_torvscw ()
+{
+  __asm __volatile ("torvscw r15");
+}
+
+static __inline __m64
+_mm_tbcst_pi8 (int value)
+{
+  return (__m64) __builtin_arm_tbcstb ((signed char) value);
+}
+
+static __inline __m64
+_mm_tbcst_pi16 (int value)
+{
+  return (__m64) __builtin_arm_tbcsth ((short) value);
+}
+
+static __inline __m64
+_mm_tbcst_pi32 (int value)
 {
-  return (__m64)__a;
+  return (__m64) __builtin_arm_tbcstw (value);
 }
 
 #define _m_packsswb _mm_packs_pi16
@@ -1250,5 +1811,10 @@ _m_from_int (int __a)
 #define _m_paligniq _mm_align_si64
 #define _m_cvt_si2pi _mm_cvtsi64_m64
 #define _m_cvt_pi2si _mm_cvtm64_si64
+#define _m_from_int _mm_cvtsi32_si64
+#define _m_to_int _mm_cvtsi64_si32
 
+#if defined __cplusplus
+}; /* End "C" */
+#endif /* __cplusplus */
 #endif /* _MMINTRIN_H_INCLUDED */

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH, ARM, iWMMXt][2/5]: intrinsic head file change
  2011-07-06 10:15 Xinyu Qi
@ 2011-08-18  2:35 ` Ramana Radhakrishnan
  2011-08-24  9:07   ` Xinyu Qi
  0 siblings, 1 reply; 33+ messages in thread
From: Ramana Radhakrishnan @ 2011-08-18  2:35 UTC (permalink / raw)
  To: Xinyu Qi; +Cc: gcc-patches

On 6 July 2011 11:11, Xinyu Qi <xyqi@marvell.com> wrote:
> Hi,
>
> It is the second part of iWMMXt maintenance.
>
> *config/arm/mmintrin.h:
>  Revise the iWMMXt intrinsics head file. Fix some intrinsics and add some new intrinsics

Is there a document somewhere that lists these intrinsics and what
each of them is supposed to do?  Missing details again.  We
seem to be changing quite a few things.


> +
> +/*  We will treat __int64 as a long long type
> +    and __m64 as an unsigned long long type to conform to VSC++.  */
> +typedef unsigned long long __m64;
> +typedef long long __int64;

Interesting, this sort of change where you are
changing the type to conform to VSC++?  This just means old code that
uses this is pretty much broken.  Not that I have much hope of it
working by default - -flax-conversions appears to be needed even
with a trunk compiler.

> @@ -54,7 +63,7 @@ _mm_cvtsi64_si32 (__int64 __i)
>  static __inline __int64
>  _mm_cvtsi32_si64 (int __i)
>  {
> -  return __i;
> +  return (__i & 0xffffffff);
>  }

Eh?  Why the & 0xffffffff, given the promotion rules?  Is this set of
intrinsics documented some place?  What is missing, and could be the
subject of a follow-up patch, is a set of tests for the wMMX
intrinsics.

What's the behaviour of wandn supposed to be?  Does wandn x, y, z
imply x = y & ~z or x = ~y & z?  If the former, then your intrinsic
expansion is wrong, unless the meaning of this has changed.  What's the
behaviour of the intrinsic _mm_andnot_si64?

@@ -985,44 +1004,83 @@ _mm_setzero_si64 (void)
 static __inline void
 _mm_setwcx (const int __value, const int __regno)
 {
> +  /* Since gcc has the information of all wcgr regs
> +    in the arm backend, use builtins to access them instead
> +    of emitting asm directly.  Thus, gcc can do some
> +    optimization on them.  */
> +

Also, this comment contradicts what follows in the patch.
You've prima facie replaced them with bits of inline assembler.  I'm
not sure this comment makes a lot of sense on its own.


Ramana


* RE: [PATCH, ARM, iWMMXt][2/5]: intrinsic head file change
@ 2011-07-14  7:39 Xinyu Qi
  0 siblings, 0 replies; 33+ messages in thread
From: Xinyu Qi @ 2011-07-14  7:39 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 118 bytes --]

> 
> Hi,
> 
> It is the second part of iWMMXt maintenance.


*config/arm/mmintrin.h: Revise.

Thanks,
Xinyu

[-- Attachment #2: 2_mmintrin.diff --]
[-- Type: application/octet-stream, Size: 17935 bytes --]

Index: gcc/config/arm/mmintrin.h
===================================================================
--- gcc/config/arm/mmintrin.h	(revision 175285)
+++ gcc/config/arm/mmintrin.h	(working copy)
@@ -24,16 +24,25 @@
 #ifndef _MMINTRIN_H_INCLUDED
 #define _MMINTRIN_H_INCLUDED
 
+#if defined __cplusplus
+extern "C" { /* Begin "C" */
+/* Intrinsics use C name-mangling.  */
+#endif /* __cplusplus */
+
 /* The data type intended for user use.  */
-typedef unsigned long long __m64, __int64;
+
+/*  We will treat __int64 as a long long type
+    and __m64 as an unsigned long long type to conform to VSC++.  */
+typedef unsigned long long __m64;
+typedef long long __int64;
 
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((vector_size (8)));
 typedef short __v4hi __attribute__ ((vector_size (8)));
-typedef char __v8qi __attribute__ ((vector_size (8)));
+typedef signed char __v8qi __attribute__ ((vector_size (8)));
 
 /* "Convert" __m64 and __int64 into each other.  */
-static __inline __m64 
+static __inline __m64
 _mm_cvtsi64_m64 (__int64 __i)
 {
   return __i;
@@ -54,7 +63,7 @@ _mm_cvtsi64_si32 (__int64 __i)
 static __inline __int64
 _mm_cvtsi32_si64 (int __i)
 {
-  return __i;
+  return (__i & 0xffffffff);
 }
 
 /* Pack the four 16-bit values from M1 into the lower four 8-bit values of
@@ -603,7 +612,7 @@ _mm_and_si64 (__m64 __m1, __m64 __m2)
 static __inline __m64
 _mm_andnot_si64 (__m64 __m1, __m64 __m2)
 {
-  return __builtin_arm_wandn (__m1, __m2);
+  return __builtin_arm_wandn (__m2, __m1);
 }
 
 /* Bit-wise inclusive OR the 64-bit values in M1 and M2.  */
@@ -935,7 +944,13 @@ _mm_avg2_pu16 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu8 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadb ((__v8qi)__A, (__v8qi)__B);
+  return (__m64) __builtin_arm_wsadbz ((__v8qi)__A, (__v8qi)__B);
+}
+
+static __inline __m64
+_mm_sada_pu8 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadb ((__v2si)__A, (__v8qi)__B, (__v8qi)__C);
 }
 
 /* Compute the sum of the absolute differences of the unsigned 16-bit
@@ -944,9 +959,16 @@ _mm_sad_pu8 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu16 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadh ((__v4hi)__A, (__v4hi)__B);
+  return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
+static __inline __m64
+_mm_sada_pu16 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadh ((__v2si)__A, (__v4hi)__B, (__v4hi)__C);
+}
+
+
 /* Compute the sum of the absolute differences of the unsigned 8-bit
    values in A and B.  Return the value in the lower 16-bit word; the
    upper words are cleared.  */
@@ -965,11 +987,8 @@ _mm_sadz_pu16 (__m64 __A, __m64 __B)
   return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
-static __inline __m64
-_mm_align_si64 (__m64 __A, __m64 __B, int __C)
-{
-  return (__m64) __builtin_arm_walign ((__v8qi)__A, (__v8qi)__B, __C);
-}
+#define _mm_align_si64(__A,__B, N) \
+  (__m64) __builtin_arm_walign ((__v8qi) (__A),(__v8qi) (__B), (N))
 
 /* Creates a 64-bit zero.  */
 static __inline __m64
@@ -985,44 +1004,83 @@ _mm_setzero_si64 (void)
 static __inline void
 _mm_setwcx (const int __value, const int __regno)
 {
+  /* Since gcc has the information of all wcgr regs
+    in the arm backend, use builtins to access them instead
+    of emitting asm directly.  Thus, gcc can do some
+    optimization on them.  */
+
   switch (__regno)
     {
-    case 0:  __builtin_arm_setwcx (__value, 0); break;
-    case 1:  __builtin_arm_setwcx (__value, 1); break;
-    case 2:  __builtin_arm_setwcx (__value, 2); break;
-    case 3:  __builtin_arm_setwcx (__value, 3); break;
-    case 8:  __builtin_arm_setwcx (__value, 8); break;
-    case 9:  __builtin_arm_setwcx (__value, 9); break;
-    case 10: __builtin_arm_setwcx (__value, 10); break;
-    case 11: __builtin_arm_setwcx (__value, 11); break;
-    default: break;
+    case 0:
+      __asm __volatile ("tmcr wcid, %0" :: "r"(__value));
+      break;
+    case 1:
+      __asm __volatile ("tmcr wcon, %0" :: "r"(__value));
+      break;
+    case 2:
+      __asm __volatile ("tmcr wcssf, %0" :: "r"(__value));
+      break;
+    case 3:
+      __asm __volatile ("tmcr wcasf, %0" :: "r"(__value));
+      break;
+    case 8:
+      __builtin_arm_setwcgr0 (__value);
+      break;
+    case 9:
+      __builtin_arm_setwcgr1 (__value);
+      break;
+    case 10:
+      __builtin_arm_setwcgr2 (__value);
+      break;
+    case 11:
+      __builtin_arm_setwcgr3 (__value);
+      break;
+    default:
+      break;
     }
 }
 
 static __inline int
 _mm_getwcx (const int __regno)
 {
+  int __value;
   switch (__regno)
     {
-    case 0:  return __builtin_arm_getwcx (0);
-    case 1:  return __builtin_arm_getwcx (1);
-    case 2:  return __builtin_arm_getwcx (2);
-    case 3:  return __builtin_arm_getwcx (3);
-    case 8:  return __builtin_arm_getwcx (8);
-    case 9:  return __builtin_arm_getwcx (9);
-    case 10: return __builtin_arm_getwcx (10);
-    case 11: return __builtin_arm_getwcx (11);
-    default: return 0;
+    case 0:
+      __asm __volatile ("tmrc %0, wcid" : "=r"(__value));
+      break;
+    case 1:
+      __asm __volatile ("tmrc %0, wcon" : "=r"(__value));
+      break;
+    case 2:
+      __asm __volatile ("tmrc %0, wcssf" : "=r"(__value));
+      break;
+    case 3:
+      __asm __volatile ("tmrc %0, wcasf" : "=r"(__value));
+      break;
+    case 8:
+      return __builtin_arm_getwcgr0 ();
+    case 9:
+      return __builtin_arm_getwcgr1 ();
+    case 10:
+      return __builtin_arm_getwcgr2 ();
+    case 11:
+      return __builtin_arm_getwcgr3 ();
+    default:
+      break;
     }
+  return __value;
 }
 
 /* Creates a vector of two 32-bit values; I0 is least significant.  */
 static __inline __m64
 _mm_set_pi32 (int __i1, int __i0)
 {
-  union {
+  union
+  {
     __m64 __q;
-    struct {
+    struct
+    {
       unsigned int __i0;
       unsigned int __i1;
     } __s;
@@ -1041,7 +1099,7 @@ _mm_set_pi16 (short __w3, short __w2, sh
   unsigned int __i1 = (unsigned short)__w3 << 16 | (unsigned short)__w2;
   unsigned int __i0 = (unsigned short)__w1 << 16 | (unsigned short)__w0;
   return _mm_set_pi32 (__i1, __i0);
-		       
+
 }
 
 /* Creates a vector of eight 8-bit values; B0 is least significant.  */
@@ -1110,9 +1168,521 @@ _mm_set1_pi8 (char __b)
 
 /* Convert an integer to a __m64 object.  */
 static __inline __m64
-_m_from_int (int __a)
+_mm_abs_pi8 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsb ((__v8qi)m1);
+}
+
+static __inline __m64
+_mm_abs_pi16 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsh ((__v4hi)m1);
+
+}
+
+static __inline __m64
+_mm_abs_pi32 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsw ((__v2si)m1);
+
+}
+
+static __inline __m64
+_mm_addsubhx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddsubhx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffb ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffh ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffw ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_addc_pu16 (__m64 a, __m64 b)
+{
+  __m64 result;
+  __asm__ __volatile__ ("waddhc	%0, %1, %2" : "=y" (result) : "y" (a),  "y" (b));
+  return result;
+}
+
+static __inline __m64
+_mm_addc_pu32 (__m64 a, __m64 b)
+{
+  __m64 result;
+  __asm__ __volatile__ ("waddwc	%0, %1, %2" : "=y" (result) : "y" (a),  "y" (b));
+  return result;
+}
+
+static __inline __m64
+_mm_avg4_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wavg4 ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_avg4r_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wavg4r ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_maddx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddsx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_maddx_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddux ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_msub_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddsn ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_msub_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddun ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhi_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwsm ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhi_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwum ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhir_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulsmr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhir_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwsmr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhir_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulumr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhir_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwumr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mullo_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwl ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_qmulm_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulm ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_qmulm_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulwm ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_qmulmr_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulmr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_qmulmr_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulwmr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_subaddhx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wsubaddhx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_addbhusl_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddbhusl ((__v4hi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_addbhusm_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddbhusm ((__v4hi)a, (__v8qi)b);
+}
+
+#define _mm_qmiabb_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabbn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabt_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabtn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabtn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatb_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatbn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatt_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiattn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiattn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabtn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabtn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiattn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiattn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbb (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbbn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbt (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbtn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbtn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtb (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtbn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtt (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawttn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawttn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+/* The third argument must be an immediate.  */
+#define _mm_merge_si64(a, b, n) \
+  ({\
+   __m64 result;\
+   result = (__m64) __builtin_arm_wmerge ((__m64) (a), (__m64) (b), (n));\
+   result;\
+   })
+
+static __inline __m64
+_mm_alignr0_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr0 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr1_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr1 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr2_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr2 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr3_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr3 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline void
+_mm_tandcb ()
+{
+  __asm __volatile ("tandcb r15");
+}
+
+static __inline void
+_mm_tandch ()
+{
+  __asm __volatile ("tandch r15");
+}
+
+static __inline void
+_mm_tandcw ()
+{
+  __asm __volatile ("tandcw r15");
+}
+
+#define _mm_textrcb(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrcb r15, %0" : : "i" (n));\
+   })
+
+#define _mm_textrch(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrch r15, %0" : : "i" (n));\
+   })
+
+#define _mm_textrcw(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrcw r15, %0" : : "i" (n));\
+   })
+
+static __inline void
+_mm_torcb ()
+{
+  __asm __volatile ("torcb r15");
+}
+
+static __inline void
+_mm_torch ()
+{
+  __asm __volatile ("torch r15");
+}
+
+static __inline void
+_mm_torcw ()
+{
+  __asm __volatile ("torcw r15");
+}
+
+static __inline void
+_mm_torvscb ()
+{
+  __asm __volatile ("torvscb r15");
+}
+
+static __inline void
+_mm_torvsch ()
+{
+  __asm __volatile ("torvsch r15");
+}
+
+static __inline void
+_mm_torvscw ()
+{
+  __asm __volatile ("torvscw r15");
+}
+
+static __inline __m64
+_mm_tbcst_pi8 (int value)
+{
+  return (__m64) __builtin_arm_tbcstb ((signed char) value);
+}
+
+static __inline __m64
+_mm_tbcst_pi16 (int value)
+{
+  return (__m64) __builtin_arm_tbcsth ((short) value);
+}
+
+static __inline __m64
+_mm_tbcst_pi32 (int value)
 {
-  return (__m64)__a;
+  return (__m64) __builtin_arm_tbcstw (value);
 }
 
 #define _m_packsswb _mm_packs_pi16
@@ -1250,5 +1820,10 @@ _m_from_int (int __a)
 #define _m_paligniq _mm_align_si64
 #define _m_cvt_si2pi _mm_cvtsi64_m64
 #define _m_cvt_pi2si _mm_cvtm64_si64
+#define _m_from_int _mm_cvtsi32_si64
+#define _m_to_int _mm_cvtsi64_si32
 
+#if defined __cplusplus
+}; /* End "C" */
+#endif /* __cplusplus */
 #endif /* _MMINTRIN_H_INCLUDED */


* [PATCH, ARM, iWMMXt][2/5]: intrinsic head file change
@ 2011-07-06 10:15 Xinyu Qi
  2011-08-18  2:35 ` Ramana Radhakrishnan
  0 siblings, 1 reply; 33+ messages in thread
From: Xinyu Qi @ 2011-07-06 10:15 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 189 bytes --]

Hi,

This is the second part of the iWMMXt maintenance series.

*config/arm/mmintrin.h:
 Revise the iWMMXt intrinsics header file: fix some intrinsics and add some new ones.

Thanks,
Xinyu

[-- Attachment #2: 2_mmintrin.diff --]
[-- Type: application/octet-stream, Size: 17935 bytes --]

Index: gcc/config/arm/mmintrin.h
===================================================================
--- gcc/config/arm/mmintrin.h	(revision 175285)
+++ gcc/config/arm/mmintrin.h	(working copy)
@@ -24,16 +24,25 @@
 #ifndef _MMINTRIN_H_INCLUDED
 #define _MMINTRIN_H_INCLUDED
 
+#if defined __cplusplus
+extern "C" { /* Begin "C" */
+/* Intrinsics use C name-mangling.  */
+#endif /* __cplusplus */
+
 /* The data type intended for user use.  */
-typedef unsigned long long __m64, __int64;
+
+/*  We will treat __int64 as a long long type
+    and __m64 as an unsigned long long type to conform to VSC++.  */
+typedef unsigned long long __m64;
+typedef long long __int64;
 
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((vector_size (8)));
 typedef short __v4hi __attribute__ ((vector_size (8)));
-typedef char __v8qi __attribute__ ((vector_size (8)));
+typedef signed char __v8qi __attribute__ ((vector_size (8)));
 
 /* "Convert" __m64 and __int64 into each other.  */
-static __inline __m64 
+static __inline __m64
 _mm_cvtsi64_m64 (__int64 __i)
 {
   return __i;
@@ -54,7 +63,7 @@ _mm_cvtsi64_si32 (__int64 __i)
 static __inline __int64
 _mm_cvtsi32_si64 (int __i)
 {
-  return __i;
+  return (__i & 0xffffffff);
 }
 
 /* Pack the four 16-bit values from M1 into the lower four 8-bit values of
@@ -603,7 +612,7 @@ _mm_and_si64 (__m64 __m1, __m64 __m2)
 static __inline __m64
 _mm_andnot_si64 (__m64 __m1, __m64 __m2)
 {
-  return __builtin_arm_wandn (__m1, __m2);
+  return __builtin_arm_wandn (__m2, __m1);
 }
 
 /* Bit-wise inclusive OR the 64-bit values in M1 and M2.  */
@@ -935,7 +944,13 @@ _mm_avg2_pu16 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu8 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadb ((__v8qi)__A, (__v8qi)__B);
+  return (__m64) __builtin_arm_wsadbz ((__v8qi)__A, (__v8qi)__B);
+}
+
+static __inline __m64
+_mm_sada_pu8 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadb ((__v2si)__A, (__v8qi)__B, (__v8qi)__C);
 }
 
 /* Compute the sum of the absolute differences of the unsigned 16-bit
@@ -944,9 +959,16 @@ _mm_sad_pu8 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu16 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadh ((__v4hi)__A, (__v4hi)__B);
+  return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
+static __inline __m64
+_mm_sada_pu16 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadh ((__v2si)__A, (__v4hi)__B, (__v4hi)__C);
+}
+
+
 /* Compute the sum of the absolute differences of the unsigned 8-bit
    values in A and B.  Return the value in the lower 16-bit word; the
    upper words are cleared.  */
@@ -965,11 +987,8 @@ _mm_sadz_pu16 (__m64 __A, __m64 __B)
   return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
-static __inline __m64
-_mm_align_si64 (__m64 __A, __m64 __B, int __C)
-{
-  return (__m64) __builtin_arm_walign ((__v8qi)__A, (__v8qi)__B, __C);
-}
+#define _mm_align_si64(__A,__B, N) \
+  (__m64) __builtin_arm_walign ((__v8qi) (__A),(__v8qi) (__B), (N))
 
 /* Creates a 64-bit zero.  */
 static __inline __m64
@@ -985,44 +1004,83 @@ _mm_setzero_si64 (void)
 static __inline void
 _mm_setwcx (const int __value, const int __regno)
 {
+  /* Since GCC has information about all the wCGR registers
+     in the ARM back end, use builtins to access them instead
+     of emitting asm directly.  That way, GCC can perform some
+     optimization on them.  */
+
   switch (__regno)
     {
-    case 0:  __builtin_arm_setwcx (__value, 0); break;
-    case 1:  __builtin_arm_setwcx (__value, 1); break;
-    case 2:  __builtin_arm_setwcx (__value, 2); break;
-    case 3:  __builtin_arm_setwcx (__value, 3); break;
-    case 8:  __builtin_arm_setwcx (__value, 8); break;
-    case 9:  __builtin_arm_setwcx (__value, 9); break;
-    case 10: __builtin_arm_setwcx (__value, 10); break;
-    case 11: __builtin_arm_setwcx (__value, 11); break;
-    default: break;
+    case 0:
+      __asm __volatile ("tmcr wcid, %0" :: "r"(__value));
+      break;
+    case 1:
+      __asm __volatile ("tmcr wcon, %0" :: "r"(__value));
+      break;
+    case 2:
+      __asm __volatile ("tmcr wcssf, %0" :: "r"(__value));
+      break;
+    case 3:
+      __asm __volatile ("tmcr wcasf, %0" :: "r"(__value));
+      break;
+    case 8:
+      __builtin_arm_setwcgr0 (__value);
+      break;
+    case 9:
+      __builtin_arm_setwcgr1 (__value);
+      break;
+    case 10:
+      __builtin_arm_setwcgr2 (__value);
+      break;
+    case 11:
+      __builtin_arm_setwcgr3 (__value);
+      break;
+    default:
+      break;
     }
 }
 
 static __inline int
 _mm_getwcx (const int __regno)
 {
+  int __value;
   switch (__regno)
     {
-    case 0:  return __builtin_arm_getwcx (0);
-    case 1:  return __builtin_arm_getwcx (1);
-    case 2:  return __builtin_arm_getwcx (2);
-    case 3:  return __builtin_arm_getwcx (3);
-    case 8:  return __builtin_arm_getwcx (8);
-    case 9:  return __builtin_arm_getwcx (9);
-    case 10: return __builtin_arm_getwcx (10);
-    case 11: return __builtin_arm_getwcx (11);
-    default: return 0;
+    case 0:
+      __asm __volatile ("tmrc %0, wcid" : "=r"(__value));
+      break;
+    case 1:
+      __asm __volatile ("tmrc %0, wcon" : "=r"(__value));
+      break;
+    case 2:
+      __asm __volatile ("tmrc %0, wcssf" : "=r"(__value));
+      break;
+    case 3:
+      __asm __volatile ("tmrc %0, wcasf" : "=r"(__value));
+      break;
+    case 8:
+      return __builtin_arm_getwcgr0 ();
+    case 9:
+      return __builtin_arm_getwcgr1 ();
+    case 10:
+      return __builtin_arm_getwcgr2 ();
+    case 11:
+      return __builtin_arm_getwcgr3 ();
+    default:
+      break;
     }
+  return __value;
 }
 
 /* Creates a vector of two 32-bit values; I0 is least significant.  */
 static __inline __m64
 _mm_set_pi32 (int __i1, int __i0)
 {
-  union {
+  union
+  {
     __m64 __q;
-    struct {
+    struct
+    {
       unsigned int __i0;
       unsigned int __i1;
     } __s;
@@ -1041,7 +1099,7 @@ _mm_set_pi16 (short __w3, short __w2, sh
   unsigned int __i1 = (unsigned short)__w3 << 16 | (unsigned short)__w2;
   unsigned int __i0 = (unsigned short)__w1 << 16 | (unsigned short)__w0;
   return _mm_set_pi32 (__i1, __i0);
-		       
+
 }
 
 /* Creates a vector of eight 8-bit values; B0 is least significant.  */
@@ -1110,9 +1168,521 @@ _mm_set1_pi8 (char __b)
 
 /* Convert an integer to a __m64 object.  */
 static __inline __m64
-_m_from_int (int __a)
+_mm_abs_pi8 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsb ((__v8qi)m1);
+}
+
+static __inline __m64
+_mm_abs_pi16 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsh ((__v4hi)m1);
+
+}
+
+static __inline __m64
+_mm_abs_pi32 (__m64 m1)
+{
+  return (__m64) __builtin_arm_wabsw ((__v2si)m1);
+
+}
+
+static __inline __m64
+_mm_addsubhx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddsubhx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffb ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffh ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_absdiff_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wabsdiffw ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_addc_pu16 (__m64 a, __m64 b)
+{
+  __m64 result;
+  __asm__ __volatile__ ("waddhc	%0, %1, %2" : "=y" (result) : "y" (a),  "y" (b));
+  return result;
+}
+
+static __inline __m64
+_mm_addc_pu32 (__m64 a, __m64 b)
+{
+  __m64 result;
+  __asm__ __volatile__ ("waddwc	%0, %1, %2" : "=y" (result) : "y" (a),  "y" (b));
+  return result;
+}
+
+static __inline __m64
+_mm_avg4_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wavg4 ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_avg4r_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wavg4r ((__v8qi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_maddx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddsx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_maddx_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddux ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_msub_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddsn ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_msub_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmaddun ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhi_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwsm ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhi_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwum ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhir_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulsmr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhir_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwsmr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mulhir_pu16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulumr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_mulhir_pu32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwumr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_mullo_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wmulwl ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_qmulm_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulm ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_qmulm_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulwm ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_qmulmr_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulmr ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_qmulmr_pi32 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wqmulwmr ((__v2si)a, (__v2si)b);
+}
+
+static __inline __m64
+_mm_subaddhx_pi16 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_wsubaddhx ((__v4hi)a, (__v4hi)b);
+}
+
+static __inline __m64
+_mm_addbhusl_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddbhusl ((__v4hi)a, (__v8qi)b);
+}
+
+static __inline __m64
+_mm_addbhusm_pu8 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_waddbhusm ((__v4hi)a, (__v8qi)b);
+}
+
+#define _mm_qmiabb_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabbn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabt_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiabtn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiabtn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatb_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatbn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiatt_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiatt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_qmiattn_pi32(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wqmiattn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiabtn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiabtn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiatt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiatt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiattn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiattn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbb (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbbn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbt (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawbtn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawbtn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtb_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtb (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtbn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtbn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawtt_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawtt (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+#define _mm_wmiawttn_si64(acc, m1, m2) \
+  ({\
+   __m64 _acc = acc;\
+   __m64 _m1 = m1;\
+   __m64 _m2 = m2;\
+   _acc = (__m64) __builtin_arm_wmiawttn (_acc, (__v2si)_m1, (__v2si)_m2);\
+   _acc;\
+   })
+
+/* The third argument must be an immediate.  */
+#define _mm_merge_si64(a, b, n) \
+  ({\
+   __m64 result;\
+   result = (__m64) __builtin_arm_wmerge ((__m64) (a), (__m64) (b), (n));\
+   result;\
+   })
+
+static __inline __m64
+_mm_alignr0_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr0 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr1_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr1 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr2_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr2 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline __m64
+_mm_alignr3_si64 (__m64 a, __m64 b)
+{
+  return (__m64) __builtin_arm_walignr3 ((__v8qi) a, (__v8qi) b);
+}
+
+static __inline void
+_mm_tandcb ()
+{
+  __asm __volatile ("tandcb r15");
+}
+
+static __inline void
+_mm_tandch ()
+{
+  __asm __volatile ("tandch r15");
+}
+
+static __inline void
+_mm_tandcw ()
+{
+  __asm __volatile ("tandcw r15");
+}
+
+#define _mm_textrcb(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrcb r15, %0" : : "i" (n));\
+   })
+
+#define _mm_textrch(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrch r15, %0" : : "i" (n));\
+   })
+
+#define _mm_textrcw(n) \
+  ({\
+   __asm__ __volatile__ (\
+     "textrcw r15, %0" : : "i" (n));\
+   })
+
+static __inline void
+_mm_torcb ()
+{
+  __asm __volatile ("torcb r15");
+}
+
+static __inline void
+_mm_torch ()
+{
+  __asm __volatile ("torch r15");
+}
+
+static __inline void
+_mm_torcw ()
+{
+  __asm __volatile ("torcw r15");
+}
+
+static __inline void
+_mm_torvscb ()
+{
+  __asm __volatile ("torvscb r15");
+}
+
+static __inline void
+_mm_torvsch ()
+{
+  __asm __volatile ("torvsch r15");
+}
+
+static __inline void
+_mm_torvscw ()
+{
+  __asm __volatile ("torvscw r15");
+}
+
+static __inline __m64
+_mm_tbcst_pi8 (int value)
+{
+  return (__m64) __builtin_arm_tbcstb ((signed char) value);
+}
+
+static __inline __m64
+_mm_tbcst_pi16 (int value)
+{
+  return (__m64) __builtin_arm_tbcsth ((short) value);
+}
+
+static __inline __m64
+_mm_tbcst_pi32 (int value)
 {
-  return (__m64)__a;
+  return (__m64) __builtin_arm_tbcstw (value);
 }
 
 #define _m_packsswb _mm_packs_pi16
@@ -1250,5 +1820,10 @@ _m_from_int (int __a)
 #define _m_paligniq _mm_align_si64
 #define _m_cvt_si2pi _mm_cvtsi64_m64
 #define _m_cvt_pi2si _mm_cvtm64_si64
+#define _m_from_int _mm_cvtsi32_si64
+#define _m_to_int _mm_cvtsi64_si32
 
+#if defined __cplusplus
+}; /* End "C" */
+#endif /* __cplusplus */
 #endif /* _MMINTRIN_H_INCLUDED */


end of thread, other threads:[~2013-04-02  9:50 UTC | newest]

Thread overview: 33+ messages
2012-05-29  4:13 [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Matt Turner
2012-05-29  4:14 ` [PATCH ARM iWMMXt 3/5] built in define and expand Matt Turner
2012-06-06 11:55   ` Ramana Radhakrishnan
2012-05-29  4:14 ` [PATCH ARM iWMMXt 5/5] pipeline description Matt Turner
2012-05-29  4:14 ` [PATCH ARM iWMMXt 1/5] ARM code generic change Matt Turner
2012-06-06 11:53   ` Ramana Radhakrishnan
2012-12-27  2:31     ` [PATCH, ARM, iWMMXT] Fix define_constants for WCGR Xinyu Qi
2013-01-22  9:22     ` [PING][PATCH, " Xinyu Qi
2013-01-22 11:59       ` Ramana Radhakrishnan
2013-01-22 13:34         ` Andreas Schwab
2013-01-23  6:08         ` Xinyu Qi
2013-01-31  8:49         ` [PATCH, " Xinyu Qi
2013-03-20  2:43         ` Xinyu Qi
2013-03-26 14:01           ` Ramana Radhakrishnan
2013-04-02  9:55             ` [PATCH, ARM, iWMMXT] PR target/54338 - Include IWMMXT_GR_REGS in ALL_REGS Xinyu Qi
2013-04-02 10:03               ` Ramana Radhakrishnan
2012-05-29  4:15 ` [PATCH ARM iWMMXt 2/5] intrinsic head file change Matt Turner
2012-06-06 12:22   ` Ramana Radhakrishnan
2012-05-29  4:15 ` [PATCH ARM iWMMXt 4/5] WMMX machine description Matt Turner
2012-06-06 11:59 ` [PATCH ARM iWMMXt 0/5] Improve iWMMXt support Ramana Radhakrishnan
2012-06-11  9:24 ` nick clifton
2012-06-13  7:36 ` nick clifton
2012-06-13 15:31   ` Matt Turner
2012-06-26 15:20     ` nick clifton
2012-06-27 19:15       ` Matt Turner
2013-01-28  3:49       ` Matt Turner
2013-01-28 15:11         ` nick clifton
2013-02-21  2:35           ` closing PR's (was Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support) Hans-Peter Nilsson
2013-02-22 12:42             ` nick clifton
  -- strict thread matches above, loose matches on Subject: below --
2011-07-14  7:39 [PATCH, ARM, iWMMXt][2/5]: intrinsic head file change Xinyu Qi
2011-07-06 10:15 Xinyu Qi
2011-08-18  2:35 ` Ramana Radhakrishnan
2011-08-24  9:07   ` Xinyu Qi
