public inbox for gcc-patches@gcc.gnu.org
* [MIPS] ST Loongson 2E/2F submission
@ 2008-05-22 17:29 Maxim Kuvyrkov
  2008-05-22 17:53 ` [MIPS][LS2][1/5] Generic support Maxim Kuvyrkov
                   ` (6 more replies)
  0 siblings, 7 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-22 17:29 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Zhang Le, Eric Fisher

Hello Richard,

In this thread I'll post patches that add support for ST Loongson 2E/2F
CPUs to the MIPS GCC port.

These patches were developed at CodeSourcery Inc. by

* Mark Shinwell
* Nathan Sidwell
* Daniel Jacobowitz
* Kazu Hirata
* me

I'm now testing the cumulative patch on mips64el-st-linux-gnu Loongson
2E and Loongson 2F boxes.


Thanks,

Maxim Kuvyrkov


* [MIPS][LS2][1/5] Generic support
  2008-05-22 17:29 [MIPS] ST Loongson 2E/2F submission Maxim Kuvyrkov
@ 2008-05-22 17:53 ` Maxim Kuvyrkov
  2008-05-22 19:27   ` Richard Sandiford
  2008-05-22 18:08 ` [MIPS][LS2][2/5] Vector intrinsics Maxim Kuvyrkov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-22 17:53 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Zhang Le, Eric Fisher

[-- Attachment #1: Type: text/plain, Size: 158 bytes --]

This patch adds generic support for ST Loongson 2E/2F CPUs, in
particular the loongson2e and loongson2f values for the -march= and
-mtune= options.
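
As a quick illustration (hypothetical file and function names; the
triplet is the one I'm testing with), the new values are used like any
other -march=/-mtune= setting:

  /* example.c -- build with, e.g.:
       mips64el-st-linux-gnu-gcc -O2 -march=loongson2f -mtune=loongson2f -c example.c
     or with -march=loongson2e for the 2E core.  */
  int
  scale (int x)
  {
    return 3 * x;
  }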

OK for trunk?

--
Maxim

[-- Attachment #2: fsf-ls2ef-1-generic.ChangeLog --]
[-- Type: text/plain, Size: 435 bytes --]

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>

	* config/mips/mips.c (mips_cpu_info_table): Add loongson2e
	and loongson2f entries.
	(mips_rtx_cost_data): Add entries for Loongson-2E/2F.
	* config/mips/mips.h (processor_type): Add Loongson-2E
	and Loongson-2F entries.
	(TARGET_LOONGSON_2E, TARGET_LOONGSON_2F, TARGET_LOONGSON_2EF): New.
	* doc/invoke.texi (MIPS Options): Document loongson2e
	and loongson2f processor names.

[-- Attachment #3: fsf-ls2ef-1-generic.patch --]
[-- Type: text/plain, Size: 2424 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 60)
+++ gcc/doc/invoke.texi	(working copy)
@@ -11925,7 +11925,8 @@ The processor names are:
 @samp{sb1},
 @samp{sr71000},
 @samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300},
-@samp{vr5000}, @samp{vr5400} and @samp{vr5500}.
+@samp{vr5000}, @samp{vr5400}, @samp{vr5500}, @samp{loongson2e} and
+@samp{loongson2f}.
 The special value @samp{from-abi} selects the
 most compatible architecture for the selected ABI (that is,
 @samp{mips1} for 32-bit ABIs and @samp{mips3} for 64-bit ABIs)@.
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 60)
+++ gcc/config/mips/mips.c	(working copy)
@@ -585,6 +585,9 @@ static const struct mips_cpu_info mips_c
   { "orion", PROCESSOR_R4600, 3, 0 },
   { "r4650", PROCESSOR_R4650, 3, 0 },
 
+  { "loongson2e", PROCESSOR_LOONGSON_2E, 3, PTF_AVOID_BRANCHLIKELY },
+  { "loongson2f", PROCESSOR_LOONGSON_2F, 3, PTF_AVOID_BRANCHLIKELY },
+
   /* MIPS IV processors. */
   { "r8000", PROCESSOR_R8000, 4, 0 },
   { "vr5000", PROCESSOR_R5000, 4, 0 },
@@ -1006,6 +1009,12 @@ static const struct mips_rtx_cost_data m
   { /* SR71000 */
     DEFAULT_COSTS
   },
+  { /* Loongson-2E */
+    DEFAULT_COSTS
+  },
+  { /* Loongson-2F */
+    DEFAULT_COSTS
+  },
 };
 \f
 /* This hash table keeps track of implicit "mips16" and "nomips16" attributes
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	(revision 60)
+++ gcc/config/mips/mips.h	(working copy)
@@ -67,6 +67,8 @@ enum processor_type {
   PROCESSOR_SB1,
   PROCESSOR_SB1A,
   PROCESSOR_SR71000,
+  PROCESSOR_LOONGSON_2E,
+  PROCESSOR_LOONGSON_2F,
   PROCESSOR_MAX
 };
 
@@ -237,6 +239,9 @@ enum mips_code_readable_setting {
 #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
 				     || mips_arch == PROCESSOR_SB1A)
 #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
+#define TARGET_LOONGSON_2E          (mips_arch == PROCESSOR_LOONGSON_2E)
+#define TARGET_LOONGSON_2F          (mips_arch == PROCESSOR_LOONGSON_2F)
+#define TARGET_LOONGSON_2EF         (TARGET_LOONGSON_2E || TARGET_LOONGSON_2F)
 
 /* Scheduling target defines.  */
 #define TUNE_MIPS3000               (mips_tune == PROCESSOR_R3000)


* [MIPS][LS2][2/5] Vector intrinsics
  2008-05-22 17:29 [MIPS] ST Loongson 2E/2F submission Maxim Kuvyrkov
  2008-05-22 17:53 ` [MIPS][LS2][1/5] Generic support Maxim Kuvyrkov
@ 2008-05-22 18:08 ` Maxim Kuvyrkov
  2008-05-22 19:35   ` Richard Sandiford
  2008-05-22 18:16 ` [MIPS][LS2][3/5] Miscellaneous instructions Maxim Kuvyrkov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-22 18:08 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Zhang Le, Eric Fisher

[-- Attachment #1: Type: text/plain, Size: 764 bytes --]

This patch adds support for the ST Loongson 2E/2F vector instructions;
in particular, it provides intrinsics for those instructions.

<Comment from Nathan Sidwell>:
The one quirk was that MIPS normally doesn't permit DImode in pairs of
32-bit float regs, presumably because you can't do much with them
there.  However, we need that here because DImode objects can be recast
to vector objects of the same width.  This new freedom causes the
existing *movdi_32bit insn to blow up: it allows DImode objects but has
no constraints for moving them through float regs, because normally
they are never allocated there.  Now they are, and postreload can blow
up.  Of course, by the time we get to emitting assembler, such patterns
have been split.
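
For reference, here is a minimal sketch of how the intrinsics are used
(not part of the patch; the function name is made up, but the types and
intrinsics are the ones documented in the extend.texi hunk below):

  #include "loongson.h"

  /* Element-wise (non-saturating) addition of two vectors of four
     signed 16-bit integers.  */
  int16x4_t
  add_halfwords (int16x4_t *a, int16x4_t *b)
  {
    int16x4_t s = vec_load_sh (a);
    int16x4_t t = vec_load_sh (b);
    return paddh_s (s, t);
  }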

OK for trunk?

--
Maxim

[-- Attachment #2: fsf-ls2ef-2-vector.ChangeLog --]
[-- Type: text/plain, Size: 1813 bytes --]

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>
	    Nathan Sidwell  <nathan@codesourcery.com>
	
	* config/mips/mips-modes.def: Add V8QI, V4HI and V2SI modes.
	* config/mips/mips-protos.h (mips_expand_vector_init): New.
	* config/mips/mips-ftypes.def: Add function types for Loongson-2E/2F
	builtins.
	* config/mips/mips.c (mips_output_move): Handle cases for
	zero-initializing Loongson vectors in floating-point
	registers and memory.
	(mips_hard_regno_mode_ok_p): Allow 64-bit vector modes for Loongson.
	(mips_vector_mode_supported_p): Add V2SImode, V4HImode and
	V8QImode cases.
	(LOONGSON_BUILTIN): New.
	(mips_loongson_2ef_bdesc): New.
	(mips_bdesc_arrays): Add mips_loongson_2ef_bdesc.
	(MIPS_ATYPE_UQI, MIPS_ATYPE_UDI, MIPS_ATYPE_V2SI, MIPS_ATYPE_UV2SI,
	MIPS_ATYPE_V4HI, MIPS_ATYPE_UV4HI, MIPS_ATYPE_V8QI, MIPS_ATYPE_UV8QI):
	New.
	(mips_init_builtins): Initialize Loongson builtins if
	appropriate.
	(mips_expand_vector_init): New.
	* config/mips/mips.h (HAVE_LOONGSON_VECTOR_MODES): New.
	(TARGET_CPU_CPP_BUILTINS): Define _MIPS_LOONGSON_VECTOR_MODES
	if appropriate.
	* config/mips/mips.md: Add unspec numbers for Loongson
	builtins.  Include loongson.md.
	(mode): Add V2SI, V4HI and V8QI values.
	(MOVE64): Move !TARGET_64BIT condition from split here.  Include
	Loongson vector modes.
	(SPLITF): Include Loongson vector modes.
	(HALFMODE): Handle Loongson vector modes.
	(*movdi_32bit): Add constraints for float<->float,int,mem moves.
	* config/mips/loongson.md: New.
	* config/mips/loongson.h: New.
	* config.gcc: Add loongson.h header for mips*-*-* targets.
	* doc/extend.texi (MIPS Loongson Built-in Functions): New.

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>

	* lib/target-supports.exp (check_effective_target_mips_loongson): New.
	* gcc.target/mips/loongson-simd.c: New.

[-- Attachment #3: fsf-ls2ef-2-vector.patch --]
[-- Type: text/plain, Size: 122335 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 62)
+++ gcc/doc/extend.texi	(working copy)
@@ -6757,6 +6757,7 @@ instructions, but allow the compiler to 
 * X86 Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
+* MIPS Loongson Built-in Functions::
 * PowerPC AltiVec Built-in Functions::
 * SPARC VIS Built-in Functions::
 * SPU Built-in Functions::
@@ -8636,6 +8637,150 @@ value is the upper one.  The opposite or
 For example, the code above will set the lower half of @code{a} to
 @code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
 
+@node MIPS Loongson Built-in Functions
+@subsection MIPS Loongson Built-in Functions
+
+GCC provides intrinsics to access the SIMD instructions provided by the
+ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
+available after inclusion of the @code{loongson.h} header file,
+operate on the following 64-bit vector types:
+
+@itemize
+@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
+@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
+@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
+@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
+@item @code{int16x4_t}, a vector of four signed 16-bit integers;
+@item @code{int32x2_t}, a vector of two signed 32-bit integers.
+@end itemize
+
+The intrinsics provided are listed below; each is named after the
+machine instruction to which it corresponds, with suffixes added as
+appropriate to distinguish intrinsics that expand to the same machine
+instruction yet have different argument types.  Refer to the architecture
+documentation for a description of the functionality of each
+instruction.
+
+@smallexample
+int16x4_t packsswh (int32x2_t s, int32x2_t t);
+int8x8_t packsshb (int16x4_t s, int16x4_t t);
+uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
+uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t paddw_s (int32x2_t s, int32x2_t t);
+int16x4_t paddh_s (int16x4_t s, int16x4_t t);
+int8x8_t paddb_s (int8x8_t s, int8x8_t t);
+uint64_t paddd_u (uint64_t s, uint64_t t);
+int64_t paddd_s (int64_t s, int64_t t);
+int16x4_t paddsh (int16x4_t s, int16x4_t t);
+int8x8_t paddsb (int8x8_t s, int8x8_t t);
+uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
+uint64_t pandn_ud (uint64_t s, uint64_t t);
+uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
+uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
+int64_t pandn_sd (int64_t s, int64_t t);
+int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
+int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
+int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
+uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
+uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
+uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
+uint16x4_t pextrh_u (uint16x4_t s, int field);
+int16x4_t pextrh_s (int16x4_t s, int field);
+uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
+int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
+int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
+int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
+uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
+int16x4_t pminsh (int16x4_t s, int16x4_t t);
+uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
+uint8x8_t pmovmskb_u (uint8x8_t s);
+int8x8_t pmovmskb_s (int8x8_t s);
+uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
+int16x4_t pmulhh (int16x4_t s, int16x4_t t);
+int16x4_t pmullh (int16x4_t s, int16x4_t t);
+int64_t pmuluw (uint32x2_t s, uint32x2_t t);
+uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
+uint16x4_t biadd (uint8x8_t s);
+uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
+uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
+int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
+uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psllh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psllw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrah_s (int16x4_t s, uint8_t amount);
+uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psraw_s (int32x2_t s, uint8_t amount);
+uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t psubw_s (int32x2_t s, int32x2_t t);
+int16x4_t psubh_s (int16x4_t s, int16x4_t t);
+int8x8_t psubb_s (int8x8_t s, int8x8_t t);
+uint64_t psubd_u (uint64_t s, uint64_t t);
+int64_t psubd_s (int64_t s, int64_t t);
+int16x4_t psubsh (int16x4_t s, int16x4_t t);
+int8x8_t psubsb (int8x8_t s, int8x8_t t);
+uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
+uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
+uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
+@end smallexample
+
+Also provided are helper functions for loading and storing values of the
+above 64-bit vector types to and from memory:
+
+@smallexample
+uint32x2_t vec_load_uw (uint32x2_t *src);
+uint16x4_t vec_load_uh (uint16x4_t *src);
+uint8x8_t vec_load_ub (uint8x8_t *src);
+int32x2_t vec_load_sw (int32x2_t *src);
+int16x4_t vec_load_sh (int16x4_t *src);
+int8x8_t vec_load_sb (int8x8_t *src);
+void vec_store_uw (uint32x2_t v, uint32x2_t *dest);
+void vec_store_uh (uint16x4_t v, uint16x4_t *dest);
+void vec_store_ub (uint8x8_t v, uint8x8_t *dest);
+void vec_store_sw (int32x2_t v, int32x2_t *dest);
+void vec_store_sh (int16x4_t v, int16x4_t *dest);
+void vec_store_sb (int8x8_t v, int8x8_t *dest);
+@end smallexample
+
 @menu
 * Paired-Single Arithmetic::
 * Paired-Single Built-in Functions::
Index: gcc/testsuite/gcc.target/mips/loongson-simd.c
===================================================================
--- gcc/testsuite/gcc.target/mips/loongson-simd.c	(revision 0)
+++ gcc/testsuite/gcc.target/mips/loongson-simd.c	(revision 0)
@@ -0,0 +1,2380 @@
+/* Test cases for ST Microelectronics Loongson-2E/2F SIMD intrinsics.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target mips_loongson } */
+
+#include "loongson.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+#include <limits.h>
+
+typedef union { int32x2_t v; int32_t a[2]; } int32x2_encap_t;
+typedef union { int16x4_t v; int16_t a[4]; } int16x4_encap_t;
+typedef union { int8x8_t v; int8_t a[8]; } int8x8_encap_t;
+typedef union { uint32x2_t v; uint32_t a[2]; } uint32x2_encap_t;
+typedef union { uint16x4_t v; uint16_t a[4]; } uint16x4_encap_t;
+typedef union { uint8x8_t v; uint8_t a[8]; } uint8x8_encap_t;
+
+#define UINT16x4_MAX USHRT_MAX
+#define UINT8x8_MAX UCHAR_MAX
+#define INT8x8_MAX SCHAR_MAX
+#define INT16x4_MAX SHRT_MAX
+#define INT32x2_MAX INT_MAX
+
+static void test_packsswh (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = INT16x4_MAX - 2;
+  s.a[1] = INT16x4_MAX - 1;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX + 1;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = packsswh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 2);
+  assert (r.a[1] == INT16x4_MAX - 1);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_packsshb (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = INT8x8_MAX - 6;
+  s.a[1] = INT8x8_MAX - 5;
+  s.a[2] = INT8x8_MAX - 4;
+  s.a[3] = INT8x8_MAX - 3;
+  t.a[0] = INT8x8_MAX - 2;
+  t.a[1] = INT8x8_MAX - 1;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX + 1;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = packsshb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_packushb (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = UINT8x8_MAX - 6;
+  s.a[1] = UINT8x8_MAX - 5;
+  s.a[2] = UINT8x8_MAX - 4;
+  s.a[3] = UINT8x8_MAX - 3;
+  t.a[0] = UINT8x8_MAX - 2;
+  t.a[1] = UINT8x8_MAX - 1;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX + 1;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = packushb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == UINT8x8_MAX - 6);
+  assert (r.a[1] == UINT8x8_MAX - 5);
+  assert (r.a[2] == UINT8x8_MAX - 4);
+  assert (r.a[3] == UINT8x8_MAX - 3);
+  assert (r.a[4] == UINT8x8_MAX - 2);
+  assert (r.a[5] == UINT8x8_MAX - 1);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_paddw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = paddw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 6);
+}
+
+static void test_paddw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = paddw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_paddh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = paddh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 6);
+  assert (r.a[1] == 8);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 12);
+}
+
+static void test_paddh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = paddh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+}
+
+static void test_paddb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 5;
+  s.a[5] = 6;
+  s.a[6] = 7;
+  s.a[7] = 8;
+  t.a[0] = 9;
+  t.a[1] = 10;
+  t.a[2] = 11;
+  t.a[3] = 12;
+  t.a[4] = 13;
+  t.a[5] = 14;
+  t.a[6] = 15;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = paddb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 12);
+  assert (r.a[2] == 14);
+  assert (r.a[3] == 16);
+  assert (r.a[4] == 18);
+  assert (r.a[5] == 20);
+  assert (r.a[6] == 22);
+  assert (r.a[7] == 24);
+}
+
+static void test_paddb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = paddb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+  assert (r.a[4] == -45);
+  assert (r.a[5] == -54);
+  assert (r.a[6] == -63);
+  assert (r.a[7] == -72);
+}
+
+static void test_paddd_u (void)
+{
+  uint64_t d = 123456;
+  uint64_t e = 789012;
+  uint64_t r;
+  r = paddd_u (d, e);
+  assert (r == 912468);
+}
+
+static void test_paddd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = paddd_s (d, e);
+  assert (r == -665556);
+}
+
+static void test_paddsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX;
+  t.a[2] = INT16x4_MAX;
+  t.a[3] = INT16x4_MAX;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = paddsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_paddsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = INT8x8_MAX;
+  t.a[1] = INT8x8_MAX;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX;
+  t.a[4] = INT8x8_MAX;
+  t.a[5] = INT8x8_MAX;
+  t.a[6] = INT8x8_MAX;
+  t.a[7] = INT8x8_MAX;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = paddsb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_paddush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  t.a[0] = UINT16x4_MAX;
+  t.a[1] = UINT16x4_MAX;
+  t.a[2] = UINT16x4_MAX;
+  t.a[3] = UINT16x4_MAX;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = paddush (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == UINT16x4_MAX);
+  assert (r.a[1] == UINT16x4_MAX);
+  assert (r.a[2] == UINT16x4_MAX);
+  assert (r.a[3] == UINT16x4_MAX);
+}
+
+static void test_paddusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  s.a[4] = 0;
+  s.a[5] = 1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = UINT8x8_MAX;
+  t.a[1] = UINT8x8_MAX;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX;
+  t.a[4] = UINT8x8_MAX;
+  t.a[5] = UINT8x8_MAX;
+  t.a[6] = UINT8x8_MAX;
+  t.a[7] = UINT8x8_MAX;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = paddusb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == UINT8x8_MAX);
+  assert (r.a[1] == UINT8x8_MAX);
+  assert (r.a[2] == UINT8x8_MAX);
+  assert (r.a[3] == UINT8x8_MAX);
+  assert (r.a[4] == UINT8x8_MAX);
+  assert (r.a[5] == UINT8x8_MAX);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_pandn_ud (void)
+{
+  uint64_t d1 = 0x0000ffff0000ffffull;
+  uint64_t d2 = 0x0000ffff0000ffffull;
+  uint64_t r;
+  r = pandn_ud (d1, d2);
+  assert (r == 0);
+}
+
+static void test_pandn_sd (void)
+{
+  int64_t d1 = (int64_t) 0x0000000000000000ull;
+  int64_t d2 = (int64_t) 0xfffffffffffffffeull;
+  int64_t r;
+  r = pandn_sd (d1, d2);
+  assert (r == -2);
+}
+
+static void test_pandn_uw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0x00000000;
+  t.a[1] = 0xffffffff;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pandn_uw (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pandn_sw (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0xffffffff;
+  t.a[1] = 0xfffffffe;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pandn_sw (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+}
+
+static void test_pandn_uh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0x0000;
+  t.a[1] = 0xffff;
+  t.a[2] = 0x0000;
+  t.a[3] = 0xffff;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pandn_uh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pandn_sh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0xffff;
+  t.a[1] = 0xfffe;
+  t.a[2] = 0xffff;
+  t.a[3] = 0xfffe;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pandn_sh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+}
+
+static void test_pandn_ub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0x00;
+  t.a[1] = 0xff;
+  t.a[2] = 0x00;
+  t.a[3] = 0xff;
+  t.a[4] = 0x00;
+  t.a[5] = 0xff;
+  t.a[6] = 0x00;
+  t.a[7] = 0xff;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pandn_ub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pandn_sb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0xff;
+  t.a[1] = 0xfe;
+  t.a[2] = 0xff;
+  t.a[3] = 0xfe;
+  t.a[4] = 0xff;
+  t.a[5] = 0xfe;
+  t.a[6] = 0xff;
+  t.a[7] = 0xfe;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pandn_sb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -2);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -2);
+}
+
+static void test_pavgh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pavgh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+}
+
+static void test_pavgb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 1;
+  s.a[5] = 2;
+  s.a[6] = 3;
+  s.a[7] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pavgb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+  assert (r.a[4] == 3);
+  assert (r.a[5] == 4);
+  assert (r.a[6] == 5);
+  assert (r.a[7] == 6);
+}
+
+static void test_pcmpeqw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pcmpeqw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpeqh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pcmpeqh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpeqb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 42;
+  s.a[5] = 43;
+  s.a[6] = 42;
+  s.a[7] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  t.a[4] = 43;
+  t.a[5] = 43;
+  t.a[6] = 43;
+  t.a[7] = 43;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pcmpeqb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpeqw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pcmpeqw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+}
+
+static void test_pcmpeqh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pcmpeqh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpeqb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = -42;
+  s.a[5] = -42;
+  s.a[6] = -42;
+  s.a[7] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = -42;
+  t.a[6] = 42;
+  t.a[7] = -42;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pcmpeqb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -1);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -1);
+}
+
+static void test_pcmpgtw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 42;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pcmpgtw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpgth_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 40;
+  t.a[1] = 41;
+  t.a[2] = 43;
+  t.a[3] = 42;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pcmpgth_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0x0000);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpgtb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 44;
+  s.a[5] = 45;
+  s.a[6] = 46;
+  s.a[7] = 47;
+  t.a[0] = 48;
+  t.a[1] = 47;
+  t.a[2] = 46;
+  t.a[3] = 45;
+  t.a[4] = 44;
+  t.a[5] = 43;
+  t.a[6] = 42;
+  t.a[7] = 41;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pcmpgtb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0x00);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0x00);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0xff);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpgtw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = -42;
+  t.a[0] = -42;
+  t.a[1] = -42;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pcmpgtw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 0);
+}
+
+static void test_pcmpgth_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = 43;
+  t.a[2] = 44;
+  t.a[3] = -43;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pcmpgth_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpgtb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = 42;
+  s.a[5] = 42;
+  s.a[6] = 42;
+  s.a[7] = 42;
+  t.a[0] = -45;
+  t.a[1] = -44;
+  t.a[2] = -43;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = 43;
+  t.a[6] = 41;
+  t.a[7] = 40;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pcmpgtb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == -1);
+  assert (r.a[7] == -1);
+}
+
+static void test_pextrh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s1 = vec_load_uh (&s.v);
+  r1 = pextrh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 41);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pextrh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -40;
+  s.a[1] = -41;
+  s.a[2] = -42;
+  s.a[3] = -43;
+  s1 = vec_load_sh (&s.v);
+  r1 = pextrh_s (s1, 2);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pinsrh_0123_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  s1 = vec_load_uh (&s.v);
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  t1 = vec_load_uh (&t.v);
+  r1 = pinsrh_0_u (t1, s1);
+  r1 = pinsrh_1_u (r1, s1);
+  r1 = pinsrh_2_u (r1, s1);
+  r1 = pinsrh_3_u (r1, s1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 42);
+  assert (r.a[1] == 42);
+  assert (r.a[2] == 42);
+  assert (r.a[3] == 42);
+}
+
+static void test_pinsrh_0123_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  t1 = vec_load_sh (&t.v);
+  r1 = pinsrh_0_s (t1, s1);
+  r1 = pinsrh_1_s (r1, s1);
+  r1 = pinsrh_2_s (r1, s1);
+  r1 = pinsrh_3_s (r1, s1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == -42);
+  assert (r.a[2] == -42);
+  assert (r.a[3] == -42);
+}
+
+static void test_pmaddhw (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -5;
+  s.a[1] = -4;
+  s.a[2] = -3;
+  s.a[3] = -2;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 10;
+  t.a[1] = 11;
+  t.a[2] = 12;
+  t.a[3] = 13;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmaddhw (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == (-5*10 + -4*11));
+  assert (r.a[1] == (-3*12 + -2*13));
+}
+
+static void test_pmaxsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmaxsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 20);
+  assert (r.a[1] == 40);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 50);
+}
+
+static void test_pmaxub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pmaxub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 80);
+  assert (r.a[1] == 70);
+  assert (r.a[2] == 60);
+  assert (r.a[3] == 50);
+  assert (r.a[4] == 50);
+  assert (r.a[5] == 60);
+  assert (r.a[6] == 70);
+  assert (r.a[7] == 80);
+}
+
+static void test_pminsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  t1 = vec_load_sh (&t.v);
+  r1 = pminsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -20);
+  assert (r.a[1] == -40);
+  assert (r.a[2] == -10);
+  assert (r.a[3] == -50);
+}
+
+static void test_pminub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pminub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 20);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 40);
+  assert (r.a[4] == 40);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 20);
+  assert (r.a[7] == 10);
+}
+
+static void test_pmovmskb_u (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_t s1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0xf0;
+  s.a[1] = 0x40;
+  s.a[2] = 0xf0;
+  s.a[3] = 0x40;
+  s.a[4] = 0xf0;
+  s.a[5] = 0x40;
+  s.a[6] = 0xf0;
+  s.a[7] = 0x40;
+  s1 = vec_load_ub (&s.v);
+  r1 = pmovmskb_u (s1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmovmskb_s (void)
+{
+  int8x8_encap_t s;
+  int8x8_t s1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 1;
+  s.a[2] = -1;
+  s.a[3] = 1;
+  s.a[4] = -1;
+  s.a[5] = 1;
+  s.a[6] = -1;
+  s.a[7] = 1;
+  s1 = vec_load_sb (&s.v);
+  r1 = pmovmskb_s (s1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmulhuh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xff00;
+  s.a[1] = 0xff00;
+  s.a[2] = 0xff00;
+  s.a[3] = 0xff00;
+  s1 = vec_load_uh (&s.v);
+  t.a[0] = 16;
+  t.a[1] = 16;
+  t.a[2] = 16;
+  t.a[3] = 16;
+  t1 = vec_load_uh (&t.v);
+  r1 = pmulhuh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x000f);
+  assert (r.a[1] == 0x000f);
+  assert (r.a[2] == 0x000f);
+  assert (r.a[3] == 0x000f);
+}
+
+static void test_pmulhh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmulhh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -16);
+  assert (r.a[1] == -16);
+  assert (r.a[2] == -16);
+  assert (r.a[3] == -16);
+}
+
+static void test_pmullh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmullh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 4096);
+  assert (r.a[1] == 4096);
+  assert (r.a[2] == 4096);
+  assert (r.a[3] == 4096);
+}
+
+static void test_pmuluw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint64_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xdeadbeef;
+  s.a[1] = 0;
+  t.a[0] = 0x0f00baaa;
+  t.a[1] = 0;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pmuluw (s1, t1);
+  assert (r1 == 0xd0cd08e1d1a70b6ull);
+}
+
+static void test_pasubub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pasubub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 70);
+  assert (r.a[1] == 50);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 10);
+  assert (r.a[4] == 10);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 50);
+  assert (r.a[7] == 70);
+}
+
+static void test_biadd (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  r1 = biadd (s1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 360);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psadbh (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = psadbh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0140);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pshufh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s1 = vec_load_uh (&s.v);
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r1 = vec_load_uh (&r.v);
+  r1 = pshufh_u (r1, s1, 0xe5);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_pshufh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 2;
+  s.a[2] = -3;
+  s.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r1 = vec_load_sh (&r.v);
+  r1 = pshufh_s (r1, s1, 0xe5);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+}
+
+static void test_psllh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0xffff;
+  s.a[2] = 0xffff;
+  s.a[3] = 0xffff;
+  s1 = vec_load_uh (&s.v);
+  r1 = psllh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0xfffe);
+  assert (r.a[1] == 0xfffe);
+  assert (r.a[2] == 0xfffe);
+  assert (r.a[3] == 0xfffe);
+}
+
+static void test_psllw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0xffffffff;
+  s1 = vec_load_uw (&s.v);
+  r1 = psllw_u (s1, 2);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0xfffffffc);
+  assert (r.a[1] == 0xfffffffc);
+}
+
+static void test_psllh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  s1 = vec_load_sh (&s.v);
+  r1 = psllh_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -2);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == -2);
+  assert (r.a[3] == -2);
+}
+
+static void test_psllw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s1 = vec_load_sw (&s.v);
+  r1 = psllw_s (s1, 2);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -4);
+  assert (r.a[1] == -4);
+}
+
+static void test_psrah_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  s1 = vec_load_uh (&s.v);
+  r1 = psrah_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0xfff7);
+  assert (r.a[1] == 0xfff7);
+  assert (r.a[2] == 0xfff7);
+  assert (r.a[3] == 0xfff7);
+}
+
+static void test_psraw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  s1 = vec_load_uw (&s.v);
+  r1 = psraw_u (s1, 1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0xfffffff7);
+  assert (r.a[1] == 0xfffffff7);
+}
+
+static void test_psrah_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s.a[2] = -2;
+  s.a[3] = -2;
+  s1 = vec_load_sh (&s.v);
+  r1 = psrah_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == -1);
+}
+
+static void test_psraw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s1 = vec_load_sw (&s.v);
+  r1 = psraw_s (s1, 1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+}
+
+static void test_psrlh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  s1 = vec_load_uh (&s.v);
+  r1 = psrlh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x7ff7);
+  assert (r.a[1] == 0x7ff7);
+  assert (r.a[2] == 0x7ff7);
+  assert (r.a[3] == 0x7ff7);
+}
+
+static void test_psrlw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  s1 = vec_load_uw (&s.v);
+  r1 = psrlw_u (s1, 1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x7ffffff7);
+  assert (r.a[1] == 0x7ffffff7);
+}
+
+static void test_psrlh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  s1 = vec_load_sh (&s.v);
+  r1 = psrlh_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psrlw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s1 = vec_load_sw (&s.v);
+  r1 = psrlw_s (s1, 1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == INT32x2_MAX);
+  assert (r.a[1] == INT32x2_MAX);
+}
+
+static void test_psubw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 3;
+  s.a[1] = 4;
+  t.a[0] = 2;
+  t.a[1] = 1;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = psubw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = -4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = psubw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 5;
+  s.a[1] = 6;
+  s.a[2] = 7;
+  s.a[3] = 8;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = psubh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 4);
+  assert (r.a[3] == 4);
+}
+
+static void test_psubh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = psubh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+}
+
+static void test_psubb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 11;
+  s.a[2] = 12;
+  s.a[3] = 13;
+  s.a[4] = 14;
+  s.a[5] = 15;
+  s.a[6] = 16;
+  s.a[7] = 17;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = psubb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 9);
+  assert (r.a[2] == 9);
+  assert (r.a[3] == 9);
+  assert (r.a[4] == 9);
+  assert (r.a[5] == 9);
+  assert (r.a[6] == 9);
+  assert (r.a[7] == 9);
+}
+
+static void test_psubb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = psubb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+  assert (r.a[4] == -55);
+  assert (r.a[5] == -66);
+  assert (r.a[6] == -77);
+  assert (r.a[7] == -88);
+}
+
+static void test_psubd_u (void)
+{
+  uint64_t d = 789012;
+  uint64_t e = 123456;
+  uint64_t r;
+  r = psubd_u (d, e);
+  assert (r == 665556);
+}
+
+static void test_psubd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = psubd_s (d, e);
+  assert (r == 912468);
+}
+
+static void test_psubsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = -INT16x4_MAX;
+  t.a[1] = -INT16x4_MAX;
+  t.a[2] = -INT16x4_MAX;
+  t.a[3] = -INT16x4_MAX;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = psubsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psubsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = -INT8x8_MAX;
+  t.a[1] = -INT8x8_MAX;
+  t.a[2] = -INT8x8_MAX;
+  t.a[3] = -INT8x8_MAX;
+  t.a[4] = -INT8x8_MAX;
+  t.a[5] = -INT8x8_MAX;
+  t.a[6] = -INT8x8_MAX;
+  t.a[7] = -INT8x8_MAX;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = psubsb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_psubush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = psubush (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psubusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  s.a[4] = 4;
+  s.a[5] = 5;
+  s.a[6] = 6;
+  s.a[7] = 7;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  t.a[4] = 5;
+  t.a[5] = 5;
+  t.a[6] = 7;
+  t.a[7] = 7;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = psubusb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_punpckhbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = punpckhbh_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == -11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == -13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == -15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = punpckhbh_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == 11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == 13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == 15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhhw_s (void)
+{ 
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = punpckhhw_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == -6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhhw_u (void)
+{ 
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = punpckhhw_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 5);
+  assert (r.a[1] == 6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhwd_s (void)
+{ 
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = -4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = punpckhwd_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == -4);
+}
+
+static void test_punpckhwd_u (void)
+{ 
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = punpckhwd_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+}
+
+static void test_punpcklbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = punpcklbh_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == -5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == -7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = punpcklbh_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == 5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == 7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklhw_s (void)
+{ 
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = punpcklhw_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklhw_u (void)
+{ 
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = punpcklhw_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklwd_s (void)
+{ 
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = punpcklwd_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == -2);
+}
+
+static void test_punpcklwd_u (void)
+{ 
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = punpcklwd_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+}
+
+int main (void)
+{
+  test_packsswh ();
+  test_packsshb ();
+  test_packushb ();
+  test_paddw_u ();
+  test_paddw_s ();
+  test_paddh_u ();
+  test_paddh_s ();
+  test_paddb_u ();
+  test_paddb_s ();
+  test_paddd_u ();
+  test_paddd_s ();
+  test_paddsh ();
+  test_paddsb ();
+  test_paddush ();
+  test_paddusb ();
+  test_pandn_ud ();
+  test_pandn_sd ();
+  test_pandn_uw ();
+  test_pandn_sw ();
+  test_pandn_uh ();
+  test_pandn_sh ();
+  test_pandn_ub ();
+  test_pandn_sb ();
+  test_pavgh ();
+  test_pavgb ();
+  test_pcmpeqw_u ();
+  test_pcmpeqh_u ();
+  test_pcmpeqb_u ();
+  test_pcmpeqw_s ();
+  test_pcmpeqh_s ();
+  test_pcmpeqb_s ();
+  test_pcmpgtw_u ();
+  test_pcmpgth_u ();
+  test_pcmpgtb_u ();
+  test_pcmpgtw_s ();
+  test_pcmpgth_s ();
+  test_pcmpgtb_s ();
+  test_pextrh_u ();
+  test_pextrh_s ();
+  test_pinsrh_0123_u ();
+  test_pinsrh_0123_s ();
+  test_pmaddhw ();
+  test_pmaxsh ();
+  test_pmaxub ();
+  test_pminsh ();
+  test_pminub ();
+  test_pmovmskb_u ();
+  test_pmovmskb_s ();
+  test_pmulhuh ();
+  test_pmulhh ();
+  test_pmullh ();
+  test_pmuluw ();
+  test_pasubub ();
+  test_biadd ();
+  test_psadbh ();
+  test_pshufh_u ();
+  test_pshufh_s ();
+  test_psllh_u ();
+  test_psllw_u ();
+  test_psllh_s ();
+  test_psllw_s ();
+  test_psrah_u ();
+  test_psraw_u ();
+  test_psrah_s ();
+  test_psraw_s ();
+  test_psrlh_u ();
+  test_psrlw_u ();
+  test_psrlh_s ();
+  test_psrlw_s ();
+  test_psubw_u ();
+  test_psubw_s ();
+  test_psubh_u ();
+  test_psubh_s ();
+  test_psubb_u ();
+  test_psubb_s ();
+  test_psubd_u ();
+  test_psubd_s ();
+  test_psubsh ();
+  test_psubsb ();
+  test_psubush ();
+  test_psubusb ();
+  test_punpckhbh_s ();
+  test_punpckhbh_u ();
+  test_punpckhhw_s ();
+  test_punpckhhw_u ();
+  test_punpckhwd_s ();
+  test_punpckhwd_u ();
+  test_punpcklbh_s ();
+  test_punpcklbh_u ();
+  test_punpcklhw_s ();
+  test_punpcklhw_u ();
+  test_punpcklwd_s ();
+  test_punpcklwd_u ();
+  return 0;
+}
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 62)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw 
     } "-mfpu=neon -mfloat-abi=softfp"]
 }
 
+# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
+# the Loongson vector modes.
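+# A test can require this support with, for example,
+# "dg-require-effective-target mips_loongson".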
+
+proc check_effective_target_mips_loongson { } {
+    return [check_no_compiler_messages loongson assembly {
+	#if !defined(_MIPS_LOONGSON_VECTOR_MODES)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 62)
+++ gcc/config.gcc	(working copy)
@@ -345,6 +345,7 @@ m68k-*-*)
 mips*-*-*)
 	cpu_type=mips
 	need_64bit_hwint=yes
+	extra_headers="loongson.h"
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
Index: gcc/config/mips/loongson.md
===================================================================
--- gcc/config/mips/loongson.md	(revision 0)
+++ gcc/config/mips/loongson.md	(revision 0)
@@ -0,0 +1,575 @@
+;; Machine description for ST Microelectronics Loongson-2E/2F.
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Mode iterators and attributes.
+
+;; 64-bit vectors of bytes.
+(define_mode_iterator VB [V8QI])
+
+;; 64-bit vectors of halfwords.
+(define_mode_iterator VH [V4HI])
+
+;; 64-bit vectors of words.
+(define_mode_iterator VW [V2SI])
+
+;; 64-bit vectors of halfwords and bytes.
+(define_mode_iterator VHB [V4HI V8QI])
+
+;; 64-bit vectors of words and halfwords.
+(define_mode_iterator VWH [V2SI V4HI])
+
+;; 64-bit vectors of words, halfwords and bytes.
+(define_mode_iterator VWHB [V2SI V4HI V8QI])
+
+;; 64-bit vectors of words, halfwords and bytes; and DImode.
+(define_mode_iterator VWHBDI [V2SI V4HI V8QI DI])
+
+;; The Loongson instruction suffixes corresponding to the modes in the
+;; VWHB iterator.
+(define_mode_attr V_suffix [(V2SI "w") (V4HI "h") (V8QI "b")])
+
+;; Given a vector type T, the mode of a vector half the size of T
+;; and with the same number of elements.
+(define_mode_attr V_squash [(V2SI "V2HI") (V4HI "V4QI")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with half as many elements.
+(define_mode_attr V_stretch_half [(V2SI "DI") (V4HI "V2SI") (V8QI "V4HI")])
+
+;; The Loongson instruction suffixes corresponding to the transformation
+;; expressed by V_stretch_half.
+(define_mode_attr V_stretch_half_suffix [(V2SI "wd") (V4HI "hw") (V8QI "bh")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with twice as many elements.
+(define_mode_attr V_squash_double [(V2SI "V4HI") (V4HI "V8QI")])
+
+;; The Loongson instruction suffixes corresponding to the conversions
+;; specified by V_squash_double.
+(define_mode_attr V_squash_double_suffix [(V2SI "wh") (V4HI "hb")])
+
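+;; For example, when the add<mode>3 pattern below is instantiated for
+;; V4HI, <V_suffix> expands to "h" and the pattern emits "paddh"; for
+;; V8QI it emits "paddb".
+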
+;; Move patterns.
+
+;; Expander to legitimize moves involving values of vector modes.
+(define_expand "mov<mode>"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand")
+	(match_operand:VWHB 1 "move_operand"))]
+  ""
+{
+  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
+    DONE;
+})
+
+;; Handle legitimized moves between values of vector modes.
+(define_insn "mov<mode>_internal"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,r,f,r,m,f")
+	(match_operand:VWHB 1 "move_operand" "f,m,f,f,r,r,YG,YG"))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+{
+  return mips_output_move (operands[0], operands[1]);
+}
+  [(set_attr "type" "fpstore,fpload,*,mfc,mtc,*,fpstore,mtc")
+   (set_attr "mode" "<MODE>")])
+
+;; Initialization of a vector.
+
+(define_expand "vec_init<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+	(match_operand 1 "" ""))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  {
+    mips_expand_vector_init (operands[0], operands[1]);
+    DONE;
+  }
+)
+
+;; Instruction patterns for SIMD instructions.
+
+;; Pack with signed saturation.
+(define_insn "vec_pack_ssat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	  (ss_truncate:<V_squash> (match_operand:VWH 1 "register_operand" "f"))
+          (ss_truncate:<V_squash> (match_operand:VWH 2 "register_operand" "f"))
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packss<V_squash_double_suffix>\t%0,%1,%2"
+)
+
+;; Pack with unsigned saturation.
+(define_insn "vec_pack_usat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	  (us_truncate:<V_squash> (match_operand:VH 1 "register_operand" "f"))
+          (us_truncate:<V_squash> (match_operand:VH 2 "register_operand" "f"))
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packus<V_squash_double_suffix>\t%0,%1,%2"
+)
+
+;; Addition, treating overflow by wraparound.
+(define_insn "add<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (plus:VWHB
+          (match_operand:VWHB 1 "register_operand" "f")
+          (match_operand:VWHB 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "padd<V_suffix>\t%0,%1,%2"
+)
+
+;; Addition of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "paddd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (plus:DI
+          (match_operand:DI 1 "register_operand" "f")
+          (match_operand:DI 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "paddd\t%0,%1,%2"
+)
+
+;; Addition, treating overflow by signed saturation.
+(define_insn "ssadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_plus:VHB
+          (match_operand:VHB 1 "register_operand" "f")
+          (match_operand:VHB 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "padds<V_suffix>\t%0,%1,%2"
+)
+
+;; Addition, treating overflow by unsigned saturation.
+(define_insn "usadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_plus:VHB
+          (match_operand:VHB 1 "register_operand" "f")
+          (match_operand:VHB 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "paddus<V_suffix>\t%0,%1,%2")
+
+;; Logical AND NOT.
+(define_insn "loongson_and_not_<mode>"
+  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
+        (and:VWHBDI
+          (not:VWHBDI (match_operand:VWHBDI 1 "register_operand" "f"))
+          (match_operand:VWHBDI 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pandn\t%0,%1,%2"
+)
+
+;; Average.
+(define_insn "loongson_average_<mode>"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (unspec:VHB
+          [(match_operand:VHB 1 "register_operand" "f")
+           (match_operand:VHB 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_AVERAGE
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pavg<V_suffix>\t%0,%1,%2"
+)
+
+;; Equality test.
+(define_insn "loongson_eq_<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB
+          [(match_operand:VWHB 1 "register_operand" "f")
+           (match_operand:VWHB 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_EQ
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pcmpeq<V_suffix>\t%0,%1,%2"
+)
+
+;; Greater-than test.
+(define_insn "loongson_gt_<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB
+          [(match_operand:VWHB 1 "register_operand" "f")
+           (match_operand:VWHB 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_GT
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pcmpgt<V_suffix>\t%0,%1,%2"
+)
+
+;; Extract halfword.
+(define_insn "loongson_extract_halfword"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:SI 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_EXTRACT_HALFWORD
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pextr<V_suffix>\t%0,%1,%2"
+)
+
+;; Insert halfword.
+(define_insn "loongson_insert_halfword_0"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:VH 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_INSERT_HALFWORD_0
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_0\t%0,%1,%2"
+)
+
+(define_insn "loongson_insert_halfword_1"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:VH 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_INSERT_HALFWORD_1
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_1\t%0,%1,%2"
+)
+
+(define_insn "loongson_insert_halfword_2"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:VH 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_INSERT_HALFWORD_2
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_2\t%0,%1,%2"
+)
+
+(define_insn "loongson_insert_halfword_3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:VH 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_INSERT_HALFWORD_3
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_3\t%0,%1,%2"
+)
+
+;; Multiply and add packed integers.
+(define_insn "loongson_mult_add"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half>
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:VH 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_MULT_ADD
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmadd<V_stretch_half_suffix>\t%0,%1,%2"
+)
+
+;; Maximum of signed halfwords.
+(define_insn "smax<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smax:VH
+          (match_operand:VH 1 "register_operand" "f")
+          (match_operand:VH 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmaxs<V_suffix>\t%0,%1,%2"
+)
+
+;; Maximum of unsigned bytes.
+(define_insn "umax<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umax:VB
+          (match_operand:VB 1 "register_operand" "f")
+          (match_operand:VB 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmaxu<V_suffix>\t%0,%1,%2"
+)
+
+;; Minimum of signed halfwords.
+(define_insn "smin<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smin:VH
+          (match_operand:VH 1 "register_operand" "f")
+          (match_operand:VH 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmins<V_suffix>\t%0,%1,%2"
+)
+
+;; Minimum of unsigned bytes.
+(define_insn "umin<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umin:VB
+          (match_operand:VB 1 "register_operand" "f")
+          (match_operand:VB 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pminu<V_suffix>\t%0,%1,%2"
+)
+
+;; Move byte mask.
+(define_insn "loongson_move_byte_mask"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB
+          [(match_operand:VB 1 "register_operand" "f")]
+	  UNSPEC_LOONGSON_MOVE_BYTE_MASK
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmovmsk<V_suffix>\t%0,%1"
+)
+
+;; Multiply unsigned integers and store high result.
+(define_insn "umul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:VH 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_UMUL_HIGHPART
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulhu<V_suffix>\t%0,%1,%2"
+)
+
+;; Multiply signed integers and store high result.
+(define_insn "smul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:VH 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_SMUL_HIGHPART
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulh<V_suffix>\t%0,%1,%2"
+)
+
+;; Multiply signed integers and store low result.
+(define_insn "loongson_smul_lowpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "f")
+           (match_operand:VH 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_SMUL_LOWPART
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmull<V_suffix>\t%0,%1,%2"
+)
+
+;; Multiply unsigned word integers.
+(define_insn "loongson_umul_word"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI
+          [(match_operand:VW 1 "register_operand" "f")
+           (match_operand:VW 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_UMUL_WORD
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulu<V_suffix>\t%0,%1,%2"
+)
+
+;; Absolute difference.
+(define_insn "loongson_pasubub"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB
+          [(match_operand:VB 1 "register_operand" "f")
+	   (match_operand:VB 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_PASUBUB
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pasubub\t%0,%1,%2"
+)
+
+;; Sum of unsigned byte integers.
+(define_insn "reduc_uplus_<mode>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half>
+          [(match_operand:VB 1 "register_operand" "f")]
+	  UNSPEC_LOONGSON_BIADD
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "biadd\t%0,%1"
+)
+
+;; Sum of absolute differences.
+(define_insn "loongson_psadbh"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half>
+          [(match_operand:VB 1 "register_operand" "f")
+           (match_operand:VB 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_PSADBH
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pasubub\t%0,%1,%2;biadd\t%0,%0"
+)
+
+;; Shuffle halfwords.
+(define_insn "loongson_pshufh"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH
+          [(match_operand:VH 1 "register_operand" "0")
+	   (match_operand:VH 2 "register_operand" "f")
+           (match_operand:SI 3 "register_operand" "f")]
+	  UNSPEC_LOONGSON_PSHUFH
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pshufh\t%0,%2,%3"
+)
+
+;; Shift left logical.
+(define_insn "loongson_psll<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashift:VWH
+          (match_operand:VWH 1 "register_operand" "f")
+	  (match_operand:SI 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psll<V_suffix>\t%0,%1,%2"
+)
+
+;; Shift right arithmetic.
+(define_insn "loongson_psra<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashiftrt:VWH
+          (match_operand:VWH 1 "register_operand" "f")
+	  (match_operand:SI 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psra<V_suffix>\t%0,%1,%2"
+)
+
+;; Shift right logical.
+(define_insn "loongson_psrl<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (lshiftrt:VWH
+          (match_operand:VWH 1 "register_operand" "f")
+	  (match_operand:SI 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psrl<V_suffix>\t%0,%1,%2"
+)
+
+;; Subtraction, treating overflow by wraparound.
+(define_insn "sub<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (minus:VWHB
+          (match_operand:VWHB 1 "register_operand" "f")
+          (match_operand:VWHB 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psub<V_suffix>\t%0,%1,%2"
+)
+
+;; Subtraction of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "psubd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (minus:DI
+          (match_operand:DI 1 "register_operand" "f")
+          (match_operand:DI 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubd\t%0,%1,%2"
+)
+
+;; Subtraction, treating overflow by signed saturation.
+(define_insn "sssub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_minus:VHB
+          (match_operand:VHB 1 "register_operand" "f")
+          (match_operand:VHB 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubs<V_suffix>\t%0,%1,%2"
+)
+
+;; Subtraction, treating overflow by unsigned saturation.
+(define_insn "ussub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_minus:VHB
+          (match_operand:VHB 1 "register_operand" "f")
+          (match_operand:VHB 2 "register_operand" "f")
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubus<V_suffix>\t%0,%1,%2"
+)
+
+;; Unpack high data.
+(define_insn "vec_interleave_high<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB
+          [(match_operand:VWHB 1 "register_operand" "f")
+           (match_operand:VWHB 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_UNPACK_HIGH
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "punpckh<V_stretch_half_suffix>\t%0,%1,%2"
+)
+
+;; Unpack low data.
+(define_insn "vec_interleave_low<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB
+          [(match_operand:VWHB 1 "register_operand" "f")
+           (match_operand:VWHB 2 "register_operand" "f")]
+	  UNSPEC_LOONGSON_UNPACK_LOW
+	)
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "punpckl<V_stretch_half_suffix>\t%0,%1,%2"
+)
Index: gcc/config/mips/mips-ftypes.def
===================================================================
--- gcc/config/mips/mips-ftypes.def	(revision 62)
+++ gcc/config/mips/mips-ftypes.def	(working copy)
@@ -66,6 +66,24 @@ DEF_MIPS_FTYPE (1, (SF, SF))
 DEF_MIPS_FTYPE (2, (SF, SF, SF))
 DEF_MIPS_FTYPE (1, (SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (UDI, UDI, UDI))
+DEF_MIPS_FTYPE (2, (UDI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UQI))
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV4HI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV8QI, UV8QI))
+
+DEF_MIPS_FTYPE (2, (UV8QI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV8QI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV8QI, UV8QI, UV8QI))
+
 DEF_MIPS_FTYPE (1, (V2HI, SI))
 DEF_MIPS_FTYPE (2, (V2HI, SI, SI))
 DEF_MIPS_FTYPE (3, (V2HI, SI, SI, SI))
@@ -81,12 +99,27 @@ DEF_MIPS_FTYPE (2, (V2SF, V2SF, V2SF))
 DEF_MIPS_FTYPE (3, (V2SF, V2SF, V2SF, INT))
 DEF_MIPS_FTYPE (4, (V2SF, V2SF, V2SF, V2SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, UQI))
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V2SI, V4HI, V4HI))
+
+DEF_MIPS_FTYPE (2, (V4HI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, USI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, V4HI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, USI))
+
 DEF_MIPS_FTYPE (1, (V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V2HI, V2HI))
 DEF_MIPS_FTYPE (1, (V4QI, V4QI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, V4QI))
 
+DEF_MIPS_FTYPE (2, (V8QI, V4HI, V4HI))
+DEF_MIPS_FTYPE (1, (V8QI, V8QI))
+DEF_MIPS_FTYPE (2, (V8QI, V8QI, V8QI))
+
 DEF_MIPS_FTYPE (2, (VOID, SI, SI))
 DEF_MIPS_FTYPE (2, (VOID, V2HI, V2HI))
 DEF_MIPS_FTYPE (2, (VOID, V4QI, V4QI))
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(revision 62)
+++ gcc/config/mips/mips.md	(working copy)
@@ -213,6 +213,28 @@
    (UNSPEC_DPAQX_SA_W_PH	446)
    (UNSPEC_DPSQX_S_W_PH		447)
    (UNSPEC_DPSQX_SA_W_PH	448)
+
+   ;; ST Microelectronics Loongson-2E/2F.
+   (UNSPEC_LOONGSON_AVERAGE		500)
+   (UNSPEC_LOONGSON_EQ			501)
+   (UNSPEC_LOONGSON_GT			502)
+   (UNSPEC_LOONGSON_EXTRACT_HALFWORD	503)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_0	504)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_1	505)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_2	506)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_3	507)
+   (UNSPEC_LOONGSON_MULT_ADD		508)
+   (UNSPEC_LOONGSON_MOVE_BYTE_MASK	509)
+   (UNSPEC_LOONGSON_UMUL_HIGHPART	510)
+   (UNSPEC_LOONGSON_SMUL_HIGHPART	511)
+   (UNSPEC_LOONGSON_SMUL_LOWPART	512)
+   (UNSPEC_LOONGSON_UMUL_WORD		513)
+   (UNSPEC_LOONGSON_PASUBUB             514)
+   (UNSPEC_LOONGSON_BIADD		515)
+   (UNSPEC_LOONGSON_PSADBH		516)
+   (UNSPEC_LOONGSON_PSHUFH		517)
+   (UNSPEC_LOONGSON_UNPACK_HIGH		518)
+   (UNSPEC_LOONGSON_UNPACK_LOW		519)
   ]
 )
 
@@ -304,7 +326,7 @@
 	(const_string "unknown")))
 
 ;; Main data type used by the insn
-(define_attr "mode" "unknown,none,QI,HI,SI,DI,SF,DF,FPSW"
+(define_attr "mode" "unknown,none,QI,HI,SI,DI,SF,DF,FPSW,V2SI,V4HI,V8QI"
   (const_string "unknown"))
 
 ;; Mode for conversion types (fcvt)
@@ -494,7 +516,10 @@
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
-  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
+  [(DI "!TARGET_64BIT") (DF "!TARGET_64BIT")
+   (V2SF "!TARGET_64BIT && TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "HAVE_LOONGSON_VECTOR_MODES") (V4HI "HAVE_LOONGSON_VECTOR_MODES")
+   (V8QI "HAVE_LOONGSON_VECTOR_MODES")])
 
 ;; This mode iterator allows the QI and HI extension patterns to be
 ;; defined from the same template.
@@ -516,9 +541,13 @@
 ;; A floating-point mode for which moves involving FPRs may need to be split.
 (define_mode_iterator SPLITF
   [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
-   (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
+   (DI "(!TARGET_64BIT && TARGET_DOUBLE_FLOAT)
+        || (HAVE_LOONGSON_VECTOR_MODES && !TARGET_FLOAT64)")
    (V2SF "!TARGET_64BIT && TARGET_PAIRED_SINGLE_FLOAT")
-   (TF "TARGET_64BIT && TARGET_FLOAT64")])
+   (TF "TARGET_64BIT && TARGET_FLOAT64")
+   (V2SI "HAVE_LOONGSON_VECTOR_MODES && !TARGET_FLOAT64")
+   (V4HI "HAVE_LOONGSON_VECTOR_MODES && !TARGET_FLOAT64")
+   (V8QI "HAVE_LOONGSON_VECTOR_MODES && !TARGET_FLOAT64")])
 
 ;; In GPR templates, a string like "<d>subu" will expand to "subu" in the
 ;; 32-bit version and "dsubu" in the 64-bit version.
@@ -570,7 +599,8 @@
 
 ;; This attribute gives the integer mode that has half the size of
 ;; the controlling mode.
-(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") (TF "DI")])
+(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") (TF "DI")
+                            (V2SI "SI") (V4HI "SI") (V8QI "SI")])
 
 ;; This attribute works around the early SB-1 rev2 core "F2" erratum:
 ;;
@@ -3454,15 +3484,15 @@
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*movdi_32bit"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=d,d,d,m,*a,*d,*B*C*D,*B*C*D,*d,*m")
-	(match_operand:DI 1 "move_operand" "d,i,m,d,*J*d,*a,*d,*m,*B*C*D,*B*C*D"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=d,d,d,m,*a,*d,*B*C*D,*B*C*D,*d,*m,*d,*f,*f,*f,*m")
+	(match_operand:DI 1 "move_operand" "d,i,m,d,*J*d,*a,*d,*m,*B*C*D,*B*C*D,*f,*d,*f,*m,*f"))]
   "!TARGET_64BIT && !TARGET_FLOAT64 && !TARGET_MIPS16
    && (register_operand (operands[0], DImode)
        || reg_or_0_operand (operands[1], DImode))"
   { return mips_output_move (operands[0], operands[1]); }
-  [(set_attr "type"	"multi,multi,load,store,mthilo,mfhilo,mtc,load,mfc,store")
+  [(set_attr "type" "multi,multi,load,store,mthilo,mfhilo,mtc,load,mfc,store,multi,multi,fmove,multi,multi")
    (set_attr "mode"	"DI")
-   (set_attr "length"   "8,16,*,*,8,8,8,*,8,*")])
+   (set_attr "length"   "8,16,*,*,8,8,8,*,8,*,8,8,*,8,8")])
 
 (define_insn "*movdi_gp32_fp64"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=d,d,d,m,*a,*d,*f,*f,*d,*m")
@@ -4090,8 +4120,7 @@
 (define_split
   [(set (match_operand:MOVE64 0 "nonimmediate_operand")
 	(match_operand:MOVE64 1 "move_operand"))]
-  "reload_completed && !TARGET_64BIT
-   && mips_split_64bit_move_p (operands[0], operands[1])"
+  "reload_completed && mips_split_64bit_move_p (operands[0], operands[1])"
   [(const_int 0)]
 {
   mips_split_doubleword_move (operands[0], operands[1]);
@@ -6381,3 +6410,6 @@
 
 ; MIPS fixed-point instructions.
 (include "mips-fixed.md")
+
; ST Microelectronics Loongson-2E/2F-specific patterns.
+(include "loongson.md")
Index: gcc/config/mips/mips-protos.h
===================================================================
--- gcc/config/mips/mips-protos.h	(revision 62)
+++ gcc/config/mips/mips-protos.h	(working copy)
@@ -302,4 +302,6 @@ union mips_gen_fn_ptrs
 extern void mips_expand_atomic_qihi (union mips_gen_fn_ptrs,
 				     rtx, rtx, rtx, rtx);
 
+extern void mips_expand_vector_init (rtx, rtx);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
Index: gcc/config/mips/loongson.h
===================================================================
--- gcc/config/mips/loongson.h	(revision 0)
+++ gcc/config/mips/loongson.h	(revision 0)
@@ -0,0 +1,769 @@
+/* Intrinsics for ST Microelectronics Loongson-2E/2F SIMD operations.
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 2, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to the
+   Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+#ifndef _GCC_LOONGSON_H
+#define _GCC_LOONGSON_H
+
+#if !defined(_MIPS_LOONGSON_VECTOR_MODES)
+# error "You must select -march=loongson2e or -march=loongson2f to use loongson.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Vectors of unsigned bytes, halfwords and words.  */
+typedef uint8_t uint8x8_t __attribute__ ((__vector_size__ (8)));
+typedef uint16_t uint16x4_t __attribute__ ((__vector_size__ (8)));
+typedef uint32_t uint32x2_t __attribute__ ((__vector_size__ (8)));
+
+/* Vectors of signed bytes, halfwords and words.  */
+typedef int8_t int8x8_t __attribute__ ((__vector_size__ (8)));
+typedef int16_t int16x4_t __attribute__ ((__vector_size__ (8)));
+typedef int32_t int32x2_t __attribute__ ((__vector_size__ (8)));
+
+/* Helpers for loading and storing vectors.  */
+
+/* Load from memory.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vec_load_uw (uint32x2_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vec_load_uh (uint16x4_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vec_load_ub (uint8x8_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vec_load_sw (int32x2_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vec_load_sh (int16x4_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vec_load_sb (int8x8_t *src)
+{
+  return *src;
+}
+
+/* Store to memory.  */
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_uw (uint32x2_t v, uint32x2_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_uh (uint16x4_t v, uint16x4_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_ub (uint8x8_t v, uint8x8_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sw (int32x2_t v, int32x2_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sh (int16x4_t v, int16x4_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sb (int8x8_t v, int8x8_t *dest)
+{
+  *dest = v;
+}
+
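+/* For example, a value V of type uint8x8_t can be copied to and from
+   a uint8x8_t object MEM with "vec_store_ub (V, &MEM)" and
+   "V = vec_load_ub (&MEM)".  */
+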
+/* SIMD intrinsics.
+   Unless otherwise noted, calls to the functions below will expand into
+   precisely one machine instruction, modulo any moves required to
+   satisfy register allocation constraints.  */
+
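+/* For example, a call such as "paddush (s, t)" on two uint16x4_t
+   values expands to a single "paddush" instruction, plus any moves
+   needed to place the operands in floating-point registers.  */
+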
+/* Pack with signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+packsswh (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_packsswh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+packsshb (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_packsshb (s, t);
+}
+
+/* Pack with unsigned saturation.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+packushb (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_packushb (s, t);
+}
+
+/* Vector addition, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+paddw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_paddw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+paddw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_paddw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddb_s (s, t);
+}
+
+/* Addition of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+paddd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_paddd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+paddd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_paddd_s (s, t);
+}
+
+/* Vector addition, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddsb (s, t);
+}
+
+/* Vector addition, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddusb (s, t);
+}
+
+/* Logical AND NOT.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+pandn_ud (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_pandn_ud (s, t);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pandn_uw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pandn_uw (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pandn_uh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pandn_uh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pandn_ub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pandn_ub (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pandn_sd (int64_t s, int64_t t)
+{
+  return __builtin_loongson_pandn_sd (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pandn_sw (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pandn_sw (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pandn_sh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pandn_sh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pandn_sb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pandn_sb (s, t);
+}
+
+/* Average.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pavgh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pavgh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pavgb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pavgb (s, t);
+}
+
+/* Equality test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_s (s, t);
+}
+
+/* Greater-than test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpgth_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpgth_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_s (s, t);
+}
+
+/* Extract halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pextrh_u (uint16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_u (s, field);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pextrh_s (int16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_s (s, field);
+}
+
+/* Insert halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_u (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_s (s, t);
+}
+
+/* Multiply and add.  */
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pmaddhw (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaddhw (s, t);
+}
+
+/* Maximum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmaxsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaxsh (s, t);
+}
+
+/* Maximum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmaxub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pmaxub (s, t);
+}
+
+/* Minimum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pminsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pminsh (s, t);
+}
+
+/* Minimum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pminub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pminub (s, t);
+}
+
+/* Move byte mask.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmovmskb_u (uint8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_u (s);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pmovmskb_s (int8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_s (s);
+}
+
+/* Multiply unsigned integers and store high result.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pmulhuh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pmulhuh (s, t);
+}
+
+/* Multiply signed integers and store high result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmulhh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmulhh (s, t);
+}
+
+/* Multiply signed integers and store low result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmullh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmullh (s, t);
+}
+
+/* Multiply unsigned word integers.  */
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pmuluw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pmuluw (s, t);
+}
+
+/* Absolute difference.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pasubub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pasubub (s, t);
+}
+
+/* Sum of unsigned byte integers.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+biadd (uint8x8_t s)
+{
+  return __builtin_loongson_biadd (s);
+}
+
+/* Sum of absolute differences.
+   Note that this intrinsic expands into two machine instructions:
+   PASUBUB followed by BIADD.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psadbh (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psadbh (s, t);
+}
+
+/* Shuffle halfwords.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_u (dest, s, order);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_s (dest, s, order);
+}
+
+/* Shift left logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psllh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psllh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psllw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psllw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_s (s, amount);
+}
+
+/* Shift right logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrlh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrlh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psrlw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psrlw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_s (s, amount);
+}
+
+/* Shift right arithmetic.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrah_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrah_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psraw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psraw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_s (s, amount);
+}
+
+/* Vector subtraction, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psubw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_psubw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psubw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_psubw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubb_s (s, t);
+}
+
+/* Subtraction of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+psubd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_psubd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+psubd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_psubd_s (s, t);
+}
+
+/* Vector subtraction, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubsb (s, t);
+}
+
+/* Vector subtraction, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubusb (s, t);
+}
+
+/* Unpack high data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpckhwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpckhhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpckhbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpckhwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpckhhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpckhbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_s (s, t);
+}
+
+/* Unpack low data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpcklwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpcklhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpcklbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpcklwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpcklhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpcklbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_s (s, t);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 62)
+++ gcc/config/mips/mips.c	(working copy)
@@ -3518,6 +3518,23 @@ mips_output_move (rtx dest, rtx src)
   if (dbl_p && mips_split_64bit_move_p (dest, src))
     return "#";
 
+  /* Handle cases where the source is a constant zero vector on
+     Loongson targets.  */
+  if (HAVE_LOONGSON_VECTOR_MODES && src_code == CONST_VECTOR)
+    {
+      if (dest_code == REG)
+	{
+	  /* Move constant zero vector to floating-point register.  */
+	  gcc_assert (FP_REG_P (REGNO (dest)));
+	  return "dmtc1\t$0,%0";
+	}
+      else if (dest_code == MEM)
+	/* Move constant zero vector to memory.  */
+	return "sd\t$0,%0";
+      else
+	gcc_unreachable ();
+    }
+
   if ((src_code == REG && GP_REG_P (REGNO (src)))
       || (!TARGET_MIPS16 && src == CONST0_RTX (mode)))
     {
@@ -8844,6 +8861,15 @@ mips_hard_regno_mode_ok_p (unsigned int 
       if (mode == TFmode && ISA_HAS_8CC)
 	return true;
 
+      /* Allow 64-bit vector modes for Loongson-2E/2F.  */
+      if (HAVE_LOONGSON_VECTOR_MODES
+	  && (mode == V2SImode
+	      || mode == V4HImode
+	      || mode == V8QImode
+	      /* In O32 mode, pairs of FP regs can hold DI mode.  */
+	      || mode == DImode))
+	return true;
+
       if (class == MODE_FLOAT
 	  || class == MODE_COMPLEX_FLOAT
 	  || class == MODE_VECTOR_FLOAT)
@@ -9190,6 +9216,11 @@ mips_vector_mode_supported_p (enum machi
     case V4UQQmode:
       return TARGET_DSP;
 
+    case V2SImode:
+    case V4HImode:
+    case V8QImode:
+      return HAVE_LOONGSON_VECTOR_MODES;
+
     default:
       return false;
     }
@@ -10310,6 +10341,213 @@ static const struct mips_builtin_descrip
   DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2)
 };
 
+/* Define a Loongson MIPS_BUILTIN_DIRECT function for instruction
+   CODE_FOR_mips_<INSN>.  FUNCTION_TYPE and TARGET_FLAGS are
+   builtin_description fields.  */
+#define LOONGSON_BUILTIN(FN_NAME, INSN, FUNCTION_TYPE)		\
+  { CODE_FOR_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
+    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, 0 }
+
+/* Builtin functions for ST Microelectronics Loongson-2E/2F cores.  */
+static const struct mips_builtin_description mips_loongson_2ef_bdesc [] =
+{
+  /* Pack with signed saturation.  */
+  LOONGSON_BUILTIN (packsswh, vec_pack_ssat_v2si,
+                    MIPS_V4HI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (packsshb, vec_pack_ssat_v4hi,
+                    MIPS_V8QI_FTYPE_V4HI_V4HI),
+  /* Pack with unsigned saturation.  */
+  LOONGSON_BUILTIN (packushb, vec_pack_usat_v4hi,
+                    MIPS_UV8QI_FTYPE_UV4HI_UV4HI),
+  /* Vector addition, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (paddw_u, addv2si3, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (paddh_u, addv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddb_u, addv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (paddw_s, addv2si3, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (paddh_s, addv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddb_s, addv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Addition of doubleword integers, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (paddd_u, paddd, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (paddd_s, paddd, MIPS_DI_FTYPE_DI_DI),
+  /* Vector addition, treating overflow by signed saturation.  */
+  LOONGSON_BUILTIN (paddsh, ssaddv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddsb, ssaddv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Vector addition, treating overflow by unsigned saturation.  */
+  LOONGSON_BUILTIN (paddush, usaddv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddusb, usaddv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Logical AND NOT.  */
+  LOONGSON_BUILTIN (pandn_ud, loongson_and_not_di, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (pandn_uw, loongson_and_not_v2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pandn_uh, loongson_and_not_v4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pandn_ub, loongson_and_not_v8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pandn_sd, loongson_and_not_di, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (pandn_sw, loongson_and_not_v2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pandn_sh, loongson_and_not_v4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pandn_sb, loongson_and_not_v8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Average.  */
+  LOONGSON_BUILTIN (pavgh, loongson_average_v4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pavgb, loongson_average_v8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Equality test.  */
+  LOONGSON_BUILTIN (pcmpeqw_u, loongson_eq_v2si, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pcmpeqh_u, loongson_eq_v4hi, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pcmpeqb_u, loongson_eq_v8qi, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pcmpeqw_s, loongson_eq_v2si, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pcmpeqh_s, loongson_eq_v4hi, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pcmpeqb_s, loongson_eq_v8qi, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Greater-than test.  */
+  LOONGSON_BUILTIN (pcmpgtw_u, loongson_gt_v2si, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pcmpgth_u, loongson_gt_v4hi, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pcmpgtb_u, loongson_gt_v8qi, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pcmpgtw_s, loongson_gt_v2si, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pcmpgth_s, loongson_gt_v4hi, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pcmpgtb_s, loongson_gt_v8qi, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Extract halfword.  */
+  LOONGSON_BUILTIN (pextrh_u, loongson_extract_halfword,
+  		    MIPS_UV4HI_FTYPE_UV4HI_USI),
+  LOONGSON_BUILTIN (pextrh_s, loongson_extract_halfword,
+  		    MIPS_V4HI_FTYPE_V4HI_USI),
+  /* Insert halfword.  */
+  LOONGSON_BUILTIN (pinsrh_0_u, loongson_insert_halfword_0,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_1_u, loongson_insert_halfword_1,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_2_u, loongson_insert_halfword_2,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_3_u, loongson_insert_halfword_3,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_0_s, loongson_insert_halfword_0,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_1_s, loongson_insert_halfword_1,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_2_s, loongson_insert_halfword_2,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_3_s, loongson_insert_halfword_3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply and add.  */
+  LOONGSON_BUILTIN (pmaddhw, loongson_mult_add,
+  		    MIPS_V2SI_FTYPE_V4HI_V4HI),
+  /* Maximum of signed halfwords.  */
+  LOONGSON_BUILTIN (pmaxsh, smaxv4hi3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Maximum of unsigned bytes.  */
+  LOONGSON_BUILTIN (pmaxub, umaxv8qi3,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Minimum of signed halfwords.  */
+  LOONGSON_BUILTIN (pminsh, sminv4hi3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Minimum of unsigned bytes.  */
+  LOONGSON_BUILTIN (pminub, uminv8qi3,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Move byte mask.  */
+  LOONGSON_BUILTIN (pmovmskb_u, loongson_move_byte_mask,
+  		    MIPS_UV8QI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN (pmovmskb_s, loongson_move_byte_mask,
+  		    MIPS_V8QI_FTYPE_V8QI),
+  /* Multiply unsigned integers and store high result.  */
+  LOONGSON_BUILTIN (pmulhuh, umulv4hi3_highpart,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  /* Multiply signed integers and store high result.  */
+  LOONGSON_BUILTIN (pmulhh, smulv4hi3_highpart,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply signed integers and store low result.  */
+  LOONGSON_BUILTIN (pmullh, loongson_smul_lowpart,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply unsigned word integers.  */
+  LOONGSON_BUILTIN (pmuluw, loongson_umul_word,
+  		    MIPS_UDI_FTYPE_UV2SI_UV2SI),
+  /* Absolute difference.  */
+  LOONGSON_BUILTIN (pasubub, loongson_pasubub,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Sum of unsigned byte integers.  */
+  LOONGSON_BUILTIN (biadd, reduc_uplus_v8qi,
+		    MIPS_UV4HI_FTYPE_UV8QI),
+  /* Sum of absolute differences.  */
+  LOONGSON_BUILTIN (psadbh, loongson_psadbh,
+  		    MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
+  /* Shuffle halfwords.  */
+  LOONGSON_BUILTIN (pshufh_u, loongson_pshufh,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
+  LOONGSON_BUILTIN (pshufh_s, loongson_pshufh,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
+  /* Shift left logical.  */
+  LOONGSON_BUILTIN (psllh_u, loongson_psllv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psllh_s, loongson_psllv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psllw_u, loongson_psllv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psllw_s, loongson_psllv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Shift right arithmetic.  */
+  LOONGSON_BUILTIN (psrah_u, loongson_psrav4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psrah_s, loongson_psrav4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psraw_u, loongson_psrav2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psraw_s, loongson_psrav2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Shift right logical.  */
+  LOONGSON_BUILTIN (psrlh_u, loongson_psrlv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psrlh_s, loongson_psrlv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psrlw_u, loongson_psrlv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psrlw_s, loongson_psrlv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Vector subtraction, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (psubw_u, subv2si3, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (psubh_u, subv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubb_u, subv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (psubw_s, subv2si3, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (psubh_s, subv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubb_s, subv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Subtraction of doubleword integers, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (psubd_u, psubd, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (psubd_s, psubd, MIPS_DI_FTYPE_DI_DI),
+  /* Vector subtraction, treating overflow by signed saturation.  */
+  LOONGSON_BUILTIN (psubsh, sssubv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubsb, sssubv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Vector subtraction, treating overflow by unsigned saturation.  */
+  LOONGSON_BUILTIN (psubush, ussubv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubusb, ussubv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Unpack high data.  */
+  LOONGSON_BUILTIN (punpckhbh_u, vec_interleave_highv8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (punpckhhw_u, vec_interleave_highv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (punpckhwd_u, vec_interleave_highv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (punpckhbh_s, vec_interleave_highv8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (punpckhhw_s, vec_interleave_highv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (punpckhwd_s, vec_interleave_highv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI),
+  /* Unpack low data.  */
+  LOONGSON_BUILTIN (punpcklbh_u, vec_interleave_lowv8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (punpcklhw_u, vec_interleave_lowv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (punpcklwd_u, vec_interleave_lowv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (punpcklbh_s, vec_interleave_lowv8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (punpcklhw_s, vec_interleave_lowv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (punpcklwd_s, vec_interleave_lowv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI)
+};
+
 /* This structure describes an array of mips_builtin_description entries.  */
 struct mips_bdesc_map {
   /* The array that this entry describes.  */
@@ -10333,7 +10571,9 @@ static const struct mips_bdesc_map mips_
   { mips_sb1_bdesc, ARRAY_SIZE (mips_sb1_bdesc), PROCESSOR_SB1, 0 },
   { mips_dsp_bdesc, ARRAY_SIZE (mips_dsp_bdesc), PROCESSOR_MAX, 0 },
   { mips_dsp_32only_bdesc, ARRAY_SIZE (mips_dsp_32only_bdesc),
-    PROCESSOR_MAX, MASK_64BIT }
+    PROCESSOR_MAX, MASK_64BIT },
+  { mips_loongson_2ef_bdesc, ARRAY_SIZE (mips_loongson_2ef_bdesc),
+    PROCESSOR_MAX, 0 }
 };
 
 /* MODE is a vector mode whose elements have type TYPE.  Return the type
@@ -10355,16 +10595,27 @@ mips_builtin_vector_type (tree type, enu
 #define MIPS_ATYPE_POINTER ptr_type_node
 
 /* Standard mode-based argument types.  */
+#define MIPS_ATYPE_UQI unsigned_intQI_type_node
 #define MIPS_ATYPE_SI intSI_type_node
 #define MIPS_ATYPE_USI unsigned_intSI_type_node
 #define MIPS_ATYPE_DI intDI_type_node
+#define MIPS_ATYPE_UDI unsigned_intDI_type_node
 #define MIPS_ATYPE_SF float_type_node
 #define MIPS_ATYPE_DF double_type_node
 
 /* Vector argument types.  */
 #define MIPS_ATYPE_V2SF mips_builtin_vector_type (float_type_node, V2SFmode)
 #define MIPS_ATYPE_V2HI mips_builtin_vector_type (intHI_type_node, V2HImode)
+#define MIPS_ATYPE_V2SI mips_builtin_vector_type (intSI_type_node, V2SImode)
+#define MIPS_ATYPE_UV2SI \
+  mips_builtin_vector_type (unsigned_intSI_type_node, V2SImode)
 #define MIPS_ATYPE_V4QI mips_builtin_vector_type (intQI_type_node, V4QImode)
+#define MIPS_ATYPE_V4HI mips_builtin_vector_type (intHI_type_node, V4HImode)
+#define MIPS_ATYPE_UV4HI \
+  mips_builtin_vector_type (unsigned_intHI_type_node, V4HImode)
+#define MIPS_ATYPE_V8QI mips_builtin_vector_type (intQI_type_node, V8QImode)
+#define MIPS_ATYPE_UV8QI \
+  mips_builtin_vector_type (unsigned_intQI_type_node, V8QImode)
 
 /* MIPS_FTYPE_ATYPESN takes N MIPS_FTYPES-like type codes and lists
    their associated MIPS_ATYPEs.  */
@@ -10422,10 +10673,14 @@ mips_init_builtins (void)
        m < &mips_bdesc_arrays[ARRAY_SIZE (mips_bdesc_arrays)];
        m++)
     {
+      bool loongson_p = (m->bdesc == mips_loongson_2ef_bdesc);
+
       if ((m->proc == PROCESSOR_MAX || m->proc == mips_arch)
-	  && (m->unsupported_target_flags & target_flags) == 0)
+ 	  && (m->unsupported_target_flags & target_flags) == 0
+ 	  && (!loongson_p || HAVE_LOONGSON_VECTOR_MODES))
 	for (d = m->bdesc; d < &m->bdesc[m->size]; d++)
-	  if ((d->target_flags & target_flags) == d->target_flags)
+ 	  if (((d->target_flags & target_flags) == d->target_flags)
+ 	      || loongson_p)
 	    add_builtin_function (d->name,
 				  mips_build_function_type (d->function_type),
 				  d - m->bdesc + offset,
@@ -12525,6 +12780,26 @@ mips_order_regs_for_local_alloc (void)
       reg_alloc_order[24] = 0;
     }
 }
+
+/* Initialize vector TARGET to VALS.  */
+
+void
+mips_expand_vector_init (rtx target, rtx vals)
+{
+  enum machine_mode mode = GET_MODE (target);
+  enum machine_mode inner = GET_MODE_INNER (mode);
+  unsigned int i, n_elts = GET_MODE_NUNITS (mode);
+  rtx mem;
+
+  gcc_assert (VECTOR_MODE_P (mode));
+
+  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
+  for (i = 0; i < n_elts; i++)
+    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
+                    XVECEXP (vals, 0, i));
+
+  emit_move_insn (target, mem);
+}
 \f
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	(revision 62)
+++ gcc/config/mips/mips.h	(working copy)
@@ -266,6 +266,11 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
 
+/* Whether vector modes and intrinsics for ST Microelectronics
+   Loongson-2E/2F processors should be enabled.  In o32 pairs of
+   floating-point registers provide 64-bit values.  */
+#define HAVE_LOONGSON_VECTOR_MODES TARGET_LOONGSON_2EF
+
 /* True if the pre-reload scheduler should try to create chains of
    multiply-add or multiply-subtract instructions.  For example,
    suppose we have:
@@ -496,6 +501,10 @@ enum mips_code_readable_setting {
 	  builtin_define_std ("MIPSEL");				\
 	  builtin_define ("_MIPSEL");					\
 	}								\
+                                                                        \
+      /* Whether Loongson vector modes are enabled.  */                 \
+      if (HAVE_LOONGSON_VECTOR_MODES)                                   \
+        builtin_define ("_MIPS_LOONGSON_VECTOR_MODES");                 \
 									\
       /* Macros dependent on the C dialect.  */				\
       if (preprocessing_asm_p ())					\
Index: gcc/config/mips/mips-modes.def
===================================================================
--- gcc/config/mips/mips-modes.def	(revision 62)
+++ gcc/config/mips/mips-modes.def	(working copy)
@@ -26,6 +26,7 @@ RESET_FLOAT_FORMAT (DF, mips_double_form
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
+VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
 VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [MIPS][LS2][3/5] Miscellaneous instructions
  2008-05-22 17:29 [MIPS] ST Loongson 2E/2F submission Maxim Kuvyrkov
  2008-05-22 17:53 ` [MIPS][LS2][1/5] Generic support Maxim Kuvyrkov
  2008-05-22 18:08 ` [MIPS][LS2][2/5] Vector intrinsics Maxim Kuvyrkov
@ 2008-05-22 18:16 ` Maxim Kuvyrkov
  2008-05-22 19:33   ` Richard Sandiford
  2008-05-22 18:22 ` [MIPS][LS2][4/5] Scheduling and tuning Maxim Kuvyrkov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-22 18:16 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Zhang Le, Eric Fisher

[-- Attachment #1: Type: text/plain, Size: 1958 bytes --]

This patch adds support for several non-MIPS3 instructions that ST Loongson 
2E/2F CPUs support.

Generally, Loongson is a MIPS3 core, but it also supports

* movn/movz for integer modes
* [n]madd/[n]msub instructions, which are analogues of the respective 
instructions from the MIPS4 ISA.
* A subset of the paired-single float instructions from the MIPS5 ISA.

Support for the first two items is pretty straightforward.
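
For illustration only (not part of the patch), here is a minimal C sketch 
of code that should be able to use these instructions when compiling for 
Loongson.  The option names here are just the ones I would expect to use; 
note also that the three-operand madd3/msub3/nmadd3/nmsub3 patterns added 
below are guarded by TARGET_FUSED_MADD and !HONOR_NANS, so something like 
-ffast-math may be needed for the fused forms:

/* Hypothetical example; compile with e.g. -march=loongson2f -mhard-float.  */
int
select_int (int c, int a, int b)
{
  return c ? a : b;   /* Candidate for movn/movz via ISA_HAS_CONDMOVE.  */
}

double
mul_add (double a, double b, double c)
{
  return a * b + c;   /* Candidate for madd.d via the madd3<mode> pattern.  */
}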

To add support for the Loongson paired-single float instructions, the 
following was done.

To select the subset of paired-single float instructions that Loongson 
supports, a new macro TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2 and a new 
mode_iterator ANYF_MIPS5_LS2 were declared.

MIPS5 .ps instructions are described with the help of the ANYF 
mode_iterator in mips.md:

;; This mode macro allows :ANYF to be used wherever a scalar or vector
;; floating-point mode is allowed.
(define_mode_iterator ANYF
  [(SF "TARGET_HARD_FLOAT")
   (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
   (V2SF "TARGET_PAIRED_SINGLE_FLOAT")])

To facilitate the declaration of .ps instructions for Loongson2 I 
introduced one more target mask, MASK_PAIRED_SINGLE_FLOAT_MIPS5_LS2, which 
is set whenever TARGET_PAIRED_SINGLE_FLOAT or TARGET_LOONGSON_2EF is set.

Then I used the ANYF_MIPS5_LS2 mode_iterator (see below) to describe 
instructions that both MIPS5 and Loongson support, and the ANYF 
mode_iterator to describe those which only the MIPS5 ISA has.

;; This mode iterator allows :ANYF_MIPS5_LS2 to be used wherever a scalar
;; or MIPS5 / Loongson2 vector floating-point mode is allowed.
(define_mode_iterator ANYF_MIPS5_LS2
  [(SF "TARGET_HARD_FLOAT")
   (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
   (V2SF "TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2")])

Hence the instructions from the MIPS5 ISA which happen to be supported by 
Loongson2 are declared with the ANYF_MIPS5_LS2 mode_iterator.  I don't 
like this, but I can't figure out an alternative way to name / describe 
these instructions.
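
To make the intent concrete, here is a small, purely illustrative C 
example (again not part of the patch) using GCC's generic vector 
extension; the typedef and flags are my assumptions, but additions and 
subtractions on such 2 x float vectors are exactly what the 
ANYF_MIPS5_LS2 variants of add<mode>3 / sub<mode>3 below are meant to 
cover via add.ps / sub.ps:

/* Hypothetical example; compile with e.g. -march=loongson2f -mhard-float.  */
typedef float v2sf __attribute__ ((vector_size (8)));

v2sf
add_pairs (v2sf a, v2sf b)
{
  return a + b;   /* Candidate for add.ps on Loongson 2E/2F.  */
}

v2sf
sub_pairs (v2sf a, v2sf b)
{
  return a - b;   /* Candidate for sub.ps.  */
}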


OK for trunk?


Thanks,

Maxim Kuvyrkov
CodeSourcery

[-- Attachment #2: fsf-ls2ef-3-insns.ChangeLog --]
[-- Type: text/plain, Size: 1844 bytes --]

2008-05-22  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/mips/mips.h (ISA_HAS_CONDMOVE): Split ISA_HAS_FP_CONDMOVE
	from it.
	(ISA_HAS_FP_CONDMOVE): New macro.
	(ISA_HAS_FP_MADD4_MSUB4, ISA_HAS_FP_MADD3_MSUB3): New macros.
	(ISA_HAS_NMADD_NMSUB): Rename to ISA_HAS_NMADD4_NMSUB4.
	(ISA_HAS_NMADD3_NMSUB3): New macro.
	* config/mips/mips.opt (mpaired-single-loongson2): New machine-specific
	option.
	* doc/invoke.texi: Document it.
	* config/mips/mips.c (mips_rtx_costs): Update.
	(override_options): Enable paired-single float instructions when
	compiling for ST Loongson 2E/2F.
	(mips_vector_mode_supported_p): Update.
	* config/mips/mips-ps-3d.md (mips_abs_ps, mips_c_cond_ps, s<cond>_ps):
	Use patterns when compiling for ST Loongson 2E/2F.
	* config/mips/mips.md (MOVECC): Don't use FP conditional moves when
	compiling for ST Loongson 2E/2F.
	(ANYF_MIPS5_LS2): New mode macro to use instead of ANYF when insn
	is eligible for either MIPS5 or ST Loongson 2E/2F.
	(add<mode>3, sub<mode>3): Use ANYF_MIPS5_LS2 instead of ANYF.
	(mulv2sf3): Use insn when compiling for ST Loongson 2E/2F.
	(madd<mode>): Rename to madd4<mode>.  Update.
	(madd3<mode>): New pattern.
	(msub<mode>): Rename to msub4<mode>.  Update.
	(msub3<mode>): New pattern.
	(nmadd<mode>): Rename to nmadd4<mode>.  Update.
	(nmadd3<mode>): New pattern.
	(nmadd<mode>_fastmath): Rename to nmadd4<mode>_fastmath.  Update.
	(nmadd3<mode>_fastmath): New pattern.
	(nmsub<mode>): Rename to nmsub4<mode>.  Update.
	(nmsub3<mode>): New pattern.
	(nmsub<mode>_fastmath): Rename to nmsub4<mode>_fastmath.  Update.
	(nmsub3<mode>_fastmath): New pattern.
	(abs<mode>2, neg<mode>2): Use ANYF_MIPS5_LS2 instead of ANYF.
	(movv2sf_hardfloat_64bit, movv2sf_hardfloat_32bit): Use insns when
	compiling for ST Loongson 2E/2F.
	(mov<SCALARF:mode>_on_<MOVECC:mode>, mov<mode>cc): Update.

[-- Attachment #3: fsf-ls2ef-3-insns.patch --]
[-- Type: text/plain, Size: 18394 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 64)
+++ gcc/doc/invoke.texi	(working copy)
@@ -12205,6 +12205,16 @@ Use (do not use) paired-single floating-
 @xref{MIPS Paired-Single Support}.  This option requires
 hardware floating-point support to be enabled.
 
+@item -mpaired-single-loongson2
+@itemx -mno-paired-single-loongson2
+@opindex mpaired-single-loongson2
+@opindex mno-paired-single-loongson2
+Use (do not use) paired-single floating-point instructions of
+ST Loongson 2E/2F CPUs.
+@xref{MIPS Paired-Single Support}.  This option can only be used
+when generating 64-bit code and requires hardware floating-point
+support to be enabled.  !!! This should be fixed by NathanS' patch.
+
 @item -mdmx
 @itemx -mno-mdmx
 @opindex mdmx
Index: gcc/config/mips/mips-ps-3d.md
===================================================================
--- gcc/config/mips/mips-ps-3d.md	(revision 64)
+++ gcc/config/mips/mips-ps-3d.md	(working copy)
@@ -295,7 +295,7 @@
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_ABS_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2"
   "abs.ps\t%0,%1"
   [(set_attr "type" "fabs")
    (set_attr "mode" "SF")])
@@ -389,7 +389,7 @@
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand 3 "const_int_operand" "")]
 		     UNSPEC_C))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2"
   "c.%Y3.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -416,7 +416,7 @@
 	   [(fcond (match_operand:V2SF 1 "register_operand" "f")
 		   (match_operand:V2SF 2 "register_operand" "f"))]
 	   UNSPEC_SCC))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2"
   "c.<fcond>.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -427,7 +427,7 @@
 	   [(swapped_fcond (match_operand:V2SF 1 "register_operand" "f")
 			   (match_operand:V2SF 2 "register_operand" "f"))]
 	   UNSPEC_SCC))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2"
   "c.<swapped_fcond>.ps\t%0,%2,%1"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(revision 64)
+++ gcc/config/mips/mips.md	(working copy)
@@ -512,7 +512,8 @@
 
 ;; This mode iterator allows :MOVECC to be used anywhere that a
 ;; conditional-move-type condition is needed.
-(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT") (CC "TARGET_HARD_FLOAT")])
+(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT")
+                              (CC "TARGET_HARD_FLOAT && !TARGET_LOONGSON_2EF")])
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
@@ -534,6 +535,13 @@
 			    (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
 			    (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
 
+;; This mode iterator allows :ANYF_MIPS5_LS2 to be used wherever a scalar
+;; or MIPS5 / Loongson2 vector floating-point mode is allowed.
+(define_mode_iterator ANYF_MIPS5_LS2
+  [(SF "TARGET_HARD_FLOAT")
+   (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
+   (V2SF "TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2")])
+
 ;; Like ANYF, but only applies to scalar modes.
 (define_mode_iterator SCALARF [(SF "TARGET_HARD_FLOAT")
 			       (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
@@ -831,9 +839,10 @@
 ;;
 
 (define_insn "add<mode>3"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
-	(plus:ANYF (match_operand:ANYF 1 "register_operand" "f")
-		   (match_operand:ANYF 2 "register_operand" "f")))]
+  [(set (match_operand:ANYF_MIPS5_LS2 0 "register_operand" "=f")
+	(plus:ANYF_MIPS5_LS2
+	 (match_operand:ANYF_MIPS5_LS2 1 "register_operand" "f")
+	 (match_operand:ANYF_MIPS5_LS2 2 "register_operand" "f")))]
   ""
   "add.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fadd")
@@ -1049,9 +1058,10 @@
 ;;
 
 (define_insn "sub<mode>3"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
-	(minus:ANYF (match_operand:ANYF 1 "register_operand" "f")
-		    (match_operand:ANYF 2 "register_operand" "f")))]
+  [(set (match_operand:ANYF_MIPS5_LS2 0 "register_operand" "=f")
+	(minus:ANYF_MIPS5_LS2
+	 (match_operand:ANYF_MIPS5_LS2 1 "register_operand" "f")
+	 (match_operand:ANYF_MIPS5_LS2 2 "register_operand" "f")))]
   ""
   "sub.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fadd")
@@ -1118,7 +1128,7 @@
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(mult:V2SF (match_operand:V2SF 1 "register_operand" "f")
 		   (match_operand:V2SF 2 "register_operand" "f")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2"
   "mul.ps\t%0,%1,%2"
   [(set_attr "type" "fmul")
    (set_attr "mode" "SF")])
@@ -1898,33 +1908,55 @@
 
 ;; Floating point multiply accumulate instructions.
 
-(define_insn "*madd<mode>"
+(define_insn "*madd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "madd.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*msub<mode>"
+(define_insn "*madd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD
+   && !HONOR_NANS (<MODE>mode)"
+  "madd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*msub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			       (match_operand:ANYF 2 "register_operand" "f"))
 		    (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "msub.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>"
+(define_insn "*msub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			       (match_operand:ANYF 2 "register_operand" "f"))
+		    (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD
+   && !HONOR_NANS (<MODE>mode)"
+  "msub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (plus:ANYF
 		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1932,13 +1964,27 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>_fastmath"
+(define_insn "*nmadd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (plus:ANYF
+		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
 		    (match_operand:ANYF 2 "register_operand" "f"))
 	 (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1946,13 +1992,27 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>"
+(define_insn "*nmadd3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
+		    (match_operand:ANYF 2 "register_operand" "f"))
+	 (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (minus:ANYF
 		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 			      (match_operand:ANYF 3 "register_operand" "f"))
 		   (match_operand:ANYF 1 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1960,19 +2020,47 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>_fastmath"
+(define_insn "*nmsub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (minus:ANYF
+		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+			      (match_operand:ANYF 3 "register_operand" "f"))
+		   (match_operand:ANYF 1 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (match_operand:ANYF 1 "register_operand" "f")
 	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 		    (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
   "nmsub.<fmt>\t%0,%1,%2,%3"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (match_operand:ANYF 1 "register_operand" "f")
+	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+		    (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
 \f
 ;;
 ;;  ....................
@@ -2145,8 +2233,9 @@
 ;; abs.fmt if the signs of NaNs matter.
 
 (define_insn "abs<mode>2"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
-	(abs:ANYF (match_operand:ANYF 1 "register_operand" "f")))]
+  [(set (match_operand:ANYF_MIPS5_LS2 0 "register_operand" "=f")
+	(abs:ANYF_MIPS5_LS2
+	 (match_operand:ANYF_MIPS5_LS2 1 "register_operand" "f")))]
   "!HONOR_NANS (<MODE>mode)"
   "abs.<fmt>\t%0,%1"
   [(set_attr "type" "fabs")
@@ -2201,8 +2290,9 @@
 ;; neg.fmt if the signs of NaNs matter.
 
 (define_insn "neg<mode>2"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
-	(neg:ANYF (match_operand:ANYF 1 "register_operand" "f")))]
+  [(set (match_operand:ANYF_MIPS5_LS2 0 "register_operand" "=f")
+	(neg:ANYF_MIPS5_LS2
+	 (match_operand:ANYF_MIPS5_LS2 1 "register_operand" "f")))]
   "!HONOR_NANS (<MODE>mode)"
   "neg.<fmt>\t%0,%1"
   [(set_attr "type" "fneg")
@@ -4147,7 +4237,7 @@
 (define_expand "movv2sf"
   [(set (match_operand:V2SF 0)
 	(match_operand:V2SF 1))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2"
 {
   if (mips_legitimize_move (V2SFmode, operands[0], operands[1]))
     DONE;
@@ -4157,7 +4247,7 @@
   [(set (match_operand:V2SF 0 "nonimmediate_operand" "=f,f,f,m,m,*f,*d,*d,*d,*m")
 	(match_operand:V2SF 1 "move_operand" "f,YG,m,f,YG,*d,*f,*d*YG,*m,*d"))]
   "TARGET_HARD_FLOAT
-   && TARGET_PAIRED_SINGLE_FLOAT
+   && TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2
    && TARGET_64BIT
    && (register_operand (operands[0], V2SFmode)
        || reg_or_0_operand (operands[1], V2SFmode))"
@@ -4170,7 +4260,7 @@
   [(set (match_operand:V2SF 0 "nonimmediate_operand" "=f,f,f,m,m,*f,*d,*d,*d,*m")
 	(match_operand:V2SF 1 "move_operand" "f,YG,m,f,YG,*d,*f,*d*YG,*m,*d"))]
   "TARGET_HARD_FLOAT
-   && TARGET_PAIRED_SINGLE_FLOAT
+   && TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2
    && !TARGET_64BIT
    && (register_operand (operands[0], V2SFmode)
        || reg_or_0_operand (operands[1], V2SFmode))"
@@ -6274,7 +6364,7 @@
 		 (const_int 0)])
 	 (match_operand:SCALARF 2 "register_operand" "f,0")
 	 (match_operand:SCALARF 3 "register_operand" "0,f")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
   "@
     mov%T4.<fmt>\t%0,%2,%1
     mov%t4.<fmt>\t%0,%3,%1"
@@ -6301,7 +6391,7 @@
 	(if_then_else:SCALARF (match_dup 5)
 			      (match_operand:SCALARF 2 "register_operand")
 			      (match_operand:SCALARF 3 "register_operand")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
 {
   mips_expand_conditional_move (operands);
   DONE;
Index: gcc/config/mips/mips.opt
===================================================================
--- gcc/config/mips/mips.opt	(revision 64)
+++ gcc/config/mips/mips.opt	(working copy)
@@ -232,6 +232,10 @@ mpaired-single
 Target Report Mask(PAIRED_SINGLE_FLOAT)
 Use paired-single floating-point instructions
 
+mpaired-single-loongson2
+Target Report Mask(PAIRED_SINGLE_FLOAT_MIPS5_LS2)
+Use paired-single floating-point instructions of ST Loongson 2E/2F CPUs
+
 mshared
 Target Report Var(TARGET_SHARED) Init(1)
 When generating -mabicalls code, make the code suitable for use in shared libraries
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 64)
+++ gcc/config/mips/mips.c	(working copy)
@@ -3229,7 +3229,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case MINUS:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && !HONOR_SIGNED_ZEROS (mode))
@@ -3280,7 +3280,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case NEG:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && HONOR_SIGNED_ZEROS (mode))
@@ -9204,7 +9204,7 @@ mips_vector_mode_supported_p (enum machi
   switch (mode)
     {
     case V2SFmode:
-      return TARGET_PAIRED_SINGLE_FLOAT;
+      return TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2;
 
     case V2HImode:
     case V4QImode:
@@ -12573,6 +12573,11 @@ mips_override_options (void)
   if (TARGET_MIPS3D)
     target_flags |= MASK_PAIRED_SINGLE_FLOAT;
 
+  /* If TARGET_PAIRED_SINGLE_FLOAT or HAVE_LOONGSON_VECTOR_MODES,
+     enable MIPS5 / Loongson2 *.ps instructions.  */
+  if (TARGET_PAIRED_SINGLE_FLOAT || HAVE_LOONGSON_VECTOR_MODES)
+    target_flags |= MASK_PAIRED_SINGLE_FLOAT_MIPS5_LS2;
+
   /* Make sure that when TARGET_PAIRED_SINGLE_FLOAT is true, TARGET_FLOAT64
      and TARGET_HARD_FLOAT_ABI are both true.  */
   if (TARGET_PAIRED_SINGLE_FLOAT && !(TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI))
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	(revision 64)
+++ gcc/config/mips/mips.h	(working copy)
@@ -741,14 +741,19 @@ enum mips_code_readable_setting {
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS16)
 
-/* ISA has the conditional move instructions introduced in mips4.  */
-#define ISA_HAS_CONDMOVE	((ISA_MIPS4				\
+/* ISA has the floating-point conditional move instructions introduced
+   in mips4.  */
+#define ISA_HAS_FP_CONDMOVE	((ISA_MIPS4				\
 				  || ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS5500			\
 				 && !TARGET_MIPS16)
 
+/* ISA has the integer conditional move instructions introduced in mips4 and
+   ST Loongson 2E/2F.  */
+#define ISA_HAS_CONDMOVE        (ISA_HAS_FP_CONDMOVE || TARGET_LOONGSON_2EF)
+
 /* ISA has LDC1 and SDC1.  */
 #define ISA_HAS_LDC1_SDC1	(!ISA_MIPS1 && !TARGET_MIPS16)
 
@@ -783,14 +788,28 @@ enum mips_code_readable_setting {
 /* Integer multiply-accumulate instructions should be generated.  */
 #define GENERATE_MADD_MSUB      (ISA_HAS_MADD_MSUB && !TUNE_74K)
 
-/* ISA has floating-point nmadd and nmsub instructions for mode MODE.  */
-#define ISA_HAS_NMADD_NMSUB(MODE) \
+/* ISA has floating-point madd and msub instructions 'd = a * b [+-] c'.  */
+#define ISA_HAS_FP_MADD4_MSUB4  ISA_HAS_FP4
+
+/* ISA has floating-point madd and msub instructions 'c [+-]= a * b'.  */
+#define ISA_HAS_FP_MADD3_MSUB3  (TARGET_LOONGSON_2EF		\
+				 && !ISA_HAS_FP_MADD4_MSUB4)
+
+/* ISA has floating-point nmadd and nmsub instructions
+   'd = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD4_NMSUB4(MODE)					\
 				((ISA_MIPS4				\
 				  || (ISA_MIPS32R2 && (MODE) == V2SFmode) \
 				  || ISA_MIPS64)			\
 				 && (!TARGET_MIPS5400 || TARGET_MAD)	\
 				 && !TARGET_MIPS16)
 
+/* ISA has floating-point nmadd and nmsub instructions
+   'c = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD3_NMSUB3(MODE)					\
+                                (TARGET_LOONGSON_2EF			\
+				 && !ISA_HAS_NMADD4_NMSUB4 (MODE))
+
 /* ISA has count leading zeroes/ones instruction (not implemented).  */
 #define ISA_HAS_CLZ_CLO		((ISA_MIPS32				\
 				  || ISA_MIPS32R2			\

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [MIPS][LS2][4/5] Scheduling and tuning
  2008-05-22 17:29 [MIPS] ST Loongson 2E/2F submission Maxim Kuvyrkov
                   ` (2 preceding siblings ...)
  2008-05-22 18:16 ` [MIPS][LS2][3/5] Miscellaneous instructions Maxim Kuvyrkov
@ 2008-05-22 18:22 ` Maxim Kuvyrkov
  2008-05-23  3:07   ` Zhang Le
  2008-05-25 11:57   ` Richard Sandiford
  2008-05-22 18:29 ` [MIPS][LS2][5/5] Support for native MIPS GCC Maxim Kuvyrkov
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-22 18:22 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Zhang Le, Eric Fisher

[-- Attachment #1: Type: text/plain, Size: 156 bytes --]

Hello,

This patch adds a pipeline model, scheduler hooks and tuning bits for ST 
Loongson 2E/2F CPUs.


OK for trunk?


Thanks,

Maxim Kuvyrkov
CodeSourcery

[-- Attachment #2: fsf-ls2ef-4-sched.ChangeLog --]
[-- Type: text/plain, Size: 2063 bytes --]

2008-05-22  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/mips/loongson2ef.md: New file.
	* config/mips/mips.md (UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN,
	UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN,
	UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN,
	UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN): New constants.
	(define_attr "cpu": "loongson_2e, loongson_2f"): New values.
	(loongson2ef.md): New include.
	* config/mips/loongson.md (mov<mode>_internal, vec_pack_ssat_<mode>,
	vec_pack_usat_<mode>, add<mode>3, paddd, ssadd<mode>3, usadd<mode>3,
	loongson_and_not_<mode>, loongson_average_<mode>, loongson_eq_<mode>,
	loongson_gt_<mode>, loongson_extract_halfword,
	loongson_insert_halfword_0, loongson_insert_halfword_2,
	loongson_insert_halfword_3, loongson_mult_add, smax<mode>3,
	umax<mode>3, smin<mode>3, umin<mode>3, loongson_move_byte_mask,
	umul<mode>3_highpart, smul<mode>3_highpart, loongson_smul_lowpart,
	loongson_umul_word, loongson_pasubub, reduc_uplus_<mode>,
	loongson_psadbh, loongson_pshufh, loongson_psll<mode>,
	loongson_psra<mode>, loongson_psrl<mode>, sub<mode>3, psubd,
	sssub<mode>3, ussub<mode>3, vec_interleave_high<mode>,
	vec_interleave_low<mode>): Define type attribute.
	
	* config/mips/mips.c (TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN,
	TARGET_SCHED_DFA_POST_ADVANCE_CYCLE): Override target hooks.
	(struct sched_ls2_def): New type.
	(struct machine_function: _sched_ls2): New field.
	(sched_ls2): New macro.
	(mips_sched_init): Initialize data for Loongson scheduling.
	(mips_ls2_variable_issue): New static function.
	(mips_variable_issue): Update to handle tuning for Loongson 2E/2F.
	(mips_issue_rate): Ditto.
	(mips_init_dfa_post_cycle_insn, sched_ls2_dfa_post_advance_cycle,
	mips_dfa_post_advance_cycle): Implement target scheduling hooks.
	(mips_multipass_dfa_lookahead): Update to handle tuning for
	Loongson 2E/2F.
	* config/mips/mips.h (TUNE_LOONGSON_2EF): New macros.
	(ISA_HAS_XFER_DELAY, ISA_HAS_FCMP_DELAY, ISA_HAS_HILO_INTERLOCKS):
	Handle ST Loongson 2E/2F cores.
	(CPU_UNITS_QUERY): Define macro to enable querying of DFA units.

[-- Attachment #3: fsf-ls2ef-4-sched.patch --]
[-- Type: text/plain, Size: 37299 bytes --]

Index: gcc/config/mips/loongson.md
===================================================================
--- gcc/config/mips/loongson.md	(revision 66)
+++ gcc/config/mips/loongson.md	(working copy)
@@ -85,7 +85,7 @@
 {
   return mips_output_move (operands[0], operands[1]);
 }
-  [(set_attr "type" "fpstore,fpload,*,mfc,mtc,*,fpstore,mtc")
+  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
    (set_attr "mode" "<MODE>")])
 
 ;; Initialization of a vector.
@@ -112,7 +112,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "packss<V_squash_double_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Pack with unsigned saturation.
 (define_insn "vec_pack_usat_<mode>"
@@ -124,6 +124,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "packus<V_squash_double_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")]
 )
 
 ;; Addition, treating overflow by wraparound.
@@ -136,7 +137,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "padd<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Addition of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
@@ -149,7 +150,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "paddd\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Addition, treating overflow by signed saturation.
 (define_insn "ssadd<mode>3"
@@ -161,7 +162,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "padds<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Addition, treating overflow by unsigned saturation.
 (define_insn "usadd<mode>3"
@@ -172,7 +173,8 @@
 	)
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
-  "paddus<V_suffix>\t%0,%1,%2")
+  "paddus<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Logical AND NOT.
 (define_insn "loongson_and_not_<mode>"
@@ -184,7 +186,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pandn\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Average.
 (define_insn "loongson_average_<mode>"
@@ -197,7 +199,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pavg<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Equality test.
 (define_insn "loongson_eq_<mode>"
@@ -210,7 +212,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pcmpeq<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Greater-than test.
 (define_insn "loongson_gt_<mode>"
@@ -223,7 +225,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pcmpgt<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Extract halfword.
 (define_insn "loongson_extract_halfword"
@@ -236,7 +238,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pextr<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Insert halfword.
 (define_insn "loongson_insert_halfword_0"
@@ -249,7 +251,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pinsr<V_suffix>_0\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_insert_halfword_1"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -260,7 +262,7 @@
   ]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pinsr<V_suffix>_1\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_insert_halfword_2"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -272,7 +274,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pinsr<V_suffix>_2\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_insert_halfword_3"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -284,7 +286,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pinsr<V_suffix>_3\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 ;; Multiply and add packed integers.
 (define_insn "loongson_mult_add"
@@ -297,7 +299,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmadd<V_stretch_half_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Maximum of signed halfwords.
 (define_insn "smax<mode>3"
@@ -309,7 +311,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmaxs<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Maximum of unsigned bytes.
 (define_insn "umax<mode>3"
@@ -321,7 +323,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmaxu<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Minimum of signed halfwords.
 (define_insn "smin<mode>3"
@@ -333,7 +335,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmins<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Minimum of unsigned bytes.
 (define_insn "umin<mode>3"
@@ -345,7 +347,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pminu<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Move byte mask.
 (define_insn "loongson_move_byte_mask"
@@ -357,7 +359,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmovmsk<V_suffix>\t%0,%1"
-)
+  [(set_attr "type" "fabs")])
 
 ;; Multiply unsigned integers and store high result.
 (define_insn "umul<mode>3_highpart"
@@ -370,7 +372,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmulhu<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Multiply signed integers and store high result.
 (define_insn "smul<mode>3_highpart"
@@ -383,7 +385,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmulh<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Multiply signed integers and store low result.
 (define_insn "loongson_smul_lowpart"
@@ -396,7 +398,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmull<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Multiply unsigned word integers.
 (define_insn "loongson_umul_word"
@@ -409,7 +411,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmulu<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Absolute difference.
 (define_insn "loongson_pasubub"
@@ -422,7 +424,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pasubub\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Sum of unsigned byte integers.
 (define_insn "reduc_uplus_<mode>"
@@ -434,7 +436,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "biadd\t%0,%1"
-)
+  [(set_attr "type" "fabs")])
 
 ;; Sum of absolute differences.
 (define_insn "loongson_psadbh"
@@ -447,7 +449,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pasubub\t%0,%1,%2;biadd\t%0,%0"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Shuffle halfwords.
 (define_insn "loongson_pshufh"
@@ -461,7 +463,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pshufh\t%0,%2,%3"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Shift left logical.
 (define_insn "loongson_psll<mode>"
@@ -473,7 +475,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psll<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Shift right arithmetic.
 (define_insn "loongson_psra<mode>"
@@ -485,7 +487,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psra<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 ;; Shift right logical.
 (define_insn "loongson_psrl<mode>"
@@ -497,7 +499,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psrl<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 ;; Subtraction, treating overflow by wraparound.
 (define_insn "sub<mode>3"
@@ -509,7 +511,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psub<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
@@ -522,7 +524,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psubd\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction, treating overflow by signed saturation.
 (define_insn "sssub<mode>3"
@@ -534,7 +536,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psubs<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction, treating overflow by unsigned saturation.
 (define_insn "ussub<mode>3"
@@ -546,7 +548,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psubus<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Unpack high data.
 (define_insn "vec_interleave_high<mode>"
@@ -559,7 +561,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "punpckh<V_stretch_half_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 ;; Unpack low data.
 (define_insn "vec_interleave_low<mode>"
@@ -572,4 +574,4 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "punpckl<V_stretch_half_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
Index: gcc/config/mips/loongson2ef.md
===================================================================
--- gcc/config/mips/loongson2ef.md	(revision 0)
+++ gcc/config/mips/loongson2ef.md	(revision 0)
@@ -0,0 +1,486 @@
+;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
+
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Automaton for integer instructions.
+(define_automaton "ls2_alu")
+
+;; ALU1 and ALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_alu1_core,ls2_alu2_core" "ls2_alu")
+
+;; Pseudo units to help modeling of ALU1/2 round-robin dispatch strategy.
+(define_cpu_unit "ls2_alu1_turn,ls2_alu2_turn" "ls2_alu")
+
+;; Pseudo units to enable/disable ls2_alu[12]_turn units.
+;; ls2_alu[12]_turn unit can be subscribed only after ls2_alu[12]_turn_enabled
+;; unit is subscribed.
+(define_cpu_unit "ls2_alu1_turn_enabled,ls2_alu2_turn_enabled" "ls2_alu")
+(presence_set "ls2_alu1_turn" "ls2_alu1_turn_enabled")
+(presence_set "ls2_alu2_turn" "ls2_alu2_turn_enabled")
+
+;; Reservations for ALU1 (ALU2) instructions.
+;; Instruction goes to ALU1 (ALU2) and makes next ALU1/2 instruction to
+;; be dispatched to ALU2 (ALU1).
+(define_reservation "ls2_alu1"
+  "(ls2_alu1_core+ls2_alu2_turn_enabled)|ls2_alu1_core")
+(define_reservation "ls2_alu2"
+  "(ls2_alu2_core+ls2_alu1_turn_enabled)|ls2_alu2_core")
+
+;; Reservation for ALU1/2 instructions.
+;; Instruction will go to ALU1 iff ls2_alu1_turn_enabled is subscribed and
+;; switch the turn to ALU2 by subscribing ls2_alu2_turn_enabled.
+;; Or to ALU2 otherwise.
+(define_reservation "ls2_alu"
+  "(ls2_alu1_core+ls2_alu1_turn+ls2_alu2_turn_enabled)
+   |(ls2_alu1_core+ls2_alu1_turn)
+   |(ls2_alu2_core+ls2_alu2_turn+ls2_alu1_turn_enabled)
+   |(ls2_alu2_core+ls2_alu2_turn)")
+
+;; Automaton for floating-point instructions.
+(define_automaton "ls2_falu")
+
+;; FALU1 and FALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_falu1_core,ls2_falu2_core" "ls2_falu")
+
+;; Pseudo units to help modeling of FALU1/2 round-robin dispatch strategy. 
+(define_cpu_unit "ls2_falu1_turn,ls2_falu2_turn" "ls2_falu")
+
+;; Pseudo units to enable/disable ls2_falu[12]_turn units.
+;; ls2_falu[12]_turn unit can be subscribed only after
+;; ls2_falu[12]_turn_enabled unit is subscribed.
+(define_cpu_unit "ls2_falu1_turn_enabled,ls2_falu2_turn_enabled"
+  "ls2_falu")
+(presence_set "ls2_falu1_turn" "ls2_falu1_turn_enabled")
+(presence_set "ls2_falu2_turn" "ls2_falu2_turn_enabled")
+
+;; Reservations for FALU1 (FALU2) instructions.
+;; Instruction goes to FALU1 (FALU2) and makes next FALU1/2 instruction to
+;; be dispatched to FALU2 (FALU1).
+(define_reservation "ls2_falu1"
+  "(ls2_falu1_core+ls2_falu2_turn_enabled)|ls2_falu1_core")
+(define_reservation "ls2_falu2"
+  "(ls2_falu2_core+ls2_falu1_turn_enabled)|ls2_falu2_core")
+
+;; Reservation for FALU1/2 instructions.
+;; Instruction will go to FALU1 iff ls2_falu1_turn_enabled is subscribed and
+;; switch the turn to FALU2 by subscribing ls2_falu2_turn_enabled.
+;; Or to FALU2 otherwise.
+(define_reservation "ls2_falu"
+  "(ls2_falu1+ls2_falu1_turn+ls2_falu2_turn_enabled)
+   |(ls2_falu1+ls2_falu1_turn)
+   |(ls2_falu2+ls2_falu2_turn+ls2_falu1_turn_enabled)
+   |(ls2_falu2+ls2_falu2_turn)")
+
+;; The following 4 instructions each subscribe one of
+;; ls2_[f]alu{1,2}_turn_enabled units according to this attribute.
+;; These instructions are used in mips.c: sched_ls2_dfa_post_advance_cycle.
+
+(define_attr "ls2_turn_type" "alu1,alu2,falu1,falu2,unknown"
+  (const_string "unknown"))
+
+;; Subscribe ls2_alu1_turn_enabled.
+(define_insn "ls2_alu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "alu1")])
+
+(define_insn_reservation "ls2_alu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu1")
+  "ls2_alu1_turn_enabled")
+
+;; Subscribe ls2_alu2_turn_enabled.
+(define_insn "ls2_alu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "alu2")])
+
+(define_insn_reservation "ls2_alu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu2")
+  "ls2_alu2_turn_enabled")
+
+;; Subscribe ls2_falu1_turn_enabled.
+(define_insn "ls2_falu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "falu1")])
+
+(define_insn_reservation "ls2_falu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu1")
+  "ls2_falu1_turn_enabled")
+
+;; Subscribe ls2_falu2_turn_enabled.
+(define_insn "ls2_falu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "falu2")])
+
+(define_insn_reservation "ls2_falu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu2")
+  "ls2_falu2_turn_enabled")
+
+;; Automaton for memory operations.
+(define_automaton "ls2_mem")
+
+;; Memory unit.
+(define_query_cpu_unit "ls2_mem" "ls2_mem")
+
+;; Reservation for integer instructions.
+(define_insn_reservation "ls2_alu" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "arith,condmove,const,logical,mfhilo,move,
+                        mthilo,nop,shift,signext,slt"))
+  "ls2_alu")
+
+;; Reservation for branch instructions.
+(define_insn_reservation "ls2_branch" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "branch,jump,call,trap"))
+  "ls2_alu1")
+
+;; Reservation for integer multiplication instructions.
+(define_insn_reservation "ls2_imult" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "imul,imul3"))
+  "ls2_alu2,ls2_alu2_core")
+
+;; Reservation for integer division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take 2-38 cycles.
+(define_insn_reservation "ls2_idiv" 20
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "idiv"))
+  "ls2_alu2,ls2_alu2_core*18")
+
+;; Reservation for memory load instructions.
+(define_insn_reservation "ls2_load" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "load,fpload,mfc,mtc"))
+  "ls2_mem")
+
+;; Reservation for memory store instructions.
+;; With stores we assume they don't alias with dependent loads.
+;; Therefore we set the latency to zero.
+(define_insn_reservation "ls2_store" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "store,fpstore"))
+  "ls2_mem")
+
+;; Reservation for floating-point instructions of latency 3.
+(define_insn_reservation "ls2_fp3" 3
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fabs,fneg,fcmp,fmove"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions of latency 5.
+(define_insn_reservation "ls2_fp5" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fcvt"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions that can go
+;; to either of FALU1/2 units.
+(define_insn_reservation "ls2_falu" 7
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fadd,fmul,fmadd"))
+  "ls2_falu")
+
+;; Reservation for floating-point division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take a variable amount
+;; of cycles:
+;; div.s takes 5-11 cycles
+;; div.d takes 5-18 cycles
+(define_insn_reservation "ls2_fdiv" 9
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fdiv"))
+  "ls2_falu2,ls2_falu2_core*7")
+
+;; Reservation for floating-point sqrt instructions.
+;; These instructions use the SRT algorithm and hence take a variable amount
+;; of cycles:
+;; sqrt.s takes 5-17 cycles
+;; sqrt.d takes 5-32 cycles
+(define_insn_reservation "ls2_fsqrt" 15
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fsqrt"))
+  "ls2_falu2,ls2_falu2_core*13")
+
+;; Two consecutive ALU instructions.
+(define_insn_reservation "ls2_multi" 4
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "multi"))
+  "(ls2_alu1,ls2_alu2_core)|(ls2_alu2,ls2_alu1_core)")
+;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
+
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Automaton for integer instructions.
+(define_automaton "ls2_alu")
+
+;; ALU1 and ALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_alu1_core,ls2_alu2_core" "ls2_alu")
+
+;; Pseudo units to help modeling of ALU1/2 round-robin dispatch strategy.
+(define_cpu_unit "ls2_alu1_turn,ls2_alu2_turn" "ls2_alu")
+
+;; Pseudo units to enable/disable ls2_alu[12]_turn units.
+;; ls2_alu[12]_turn unit can be subscribed only after ls2_alu[12]_turn_enabled
+;; unit is subscribed.
+(define_cpu_unit "ls2_alu1_turn_enabled,ls2_alu2_turn_enabled" "ls2_alu")
+(presence_set "ls2_alu1_turn" "ls2_alu1_turn_enabled")
+(presence_set "ls2_alu2_turn" "ls2_alu2_turn_enabled")
+
+;; Reservations for ALU1 (ALU2) instructions.
+;; Instruction goes to ALU1 (ALU2) and makes next ALU1/2 instruction to
+;; be dispatched to ALU2 (ALU1).
+(define_reservation "ls2_alu1"
+  "(ls2_alu1_core+ls2_alu2_turn_enabled)|ls2_alu1_core")
+(define_reservation "ls2_alu2"
+  "(ls2_alu2_core+ls2_alu1_turn_enabled)|ls2_alu2_core")
+
+;; Reservation for ALU1/2 instructions.
+;; Instruction will go to ALU1 iff ls2_alu1_turn_enabled is subscribed and
+;; switch the turn to ALU2 by subscribing ls2_alu2_turn_enabled.
+;; Or to ALU2 otherwise.
+(define_reservation "ls2_alu"
+  "(ls2_alu1_core+ls2_alu1_turn+ls2_alu2_turn_enabled)
+   |(ls2_alu1_core+ls2_alu1_turn)
+   |(ls2_alu2_core+ls2_alu2_turn+ls2_alu1_turn_enabled)
+   |(ls2_alu2_core+ls2_alu2_turn)")
+
+;; Automaton for floating-point instructions.
+(define_automaton "ls2_falu")
+
+;; FALU1 and FALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_falu1_core,ls2_falu2_core" "ls2_falu")
+
+;; Pseudo units to help modeling of FALU1/2 round-robin dispatch strategy. 
+(define_cpu_unit "ls2_falu1_turn,ls2_falu2_turn" "ls2_falu")
+
+;; Pseudo units to enable/disable ls2_falu[12]_turn units.
+;; ls2_falu[12]_turn unit can be subscribed only after
+;; ls2_falu[12]_turn_enabled unit is subscribed.
+(define_cpu_unit "ls2_falu1_turn_enabled,ls2_falu2_turn_enabled"
+  "ls2_falu")
+(presence_set "ls2_falu1_turn" "ls2_falu1_turn_enabled")
+(presence_set "ls2_falu2_turn" "ls2_falu2_turn_enabled")
+
+;; Reservations for FALU1 (FALU2) instructions.
+;; Instruction goes to FALU1 (FALU2) and makes next FALU1/2 instruction to
+;; be dispatched to FALU2 (FALU1).
+(define_reservation "ls2_falu1"
+  "(ls2_falu1_core+ls2_falu2_turn_enabled)|ls2_falu1_core")
+(define_reservation "ls2_falu2"
+  "(ls2_falu2_core+ls2_falu1_turn_enabled)|ls2_falu2_core")
+
+;; Reservation for FALU1/2 instructions.
+;; Instruction will go to FALU1 iff ls2_falu1_turn_enabled is subscribed and
+;; switch the turn to FALU2 by subscribing ls2_falu2_turn_enabled.
+;; Or to FALU2 otherwise.
+(define_reservation "ls2_falu"
+  "(ls2_falu1+ls2_falu1_turn+ls2_falu2_turn_enabled)
+   |(ls2_falu1+ls2_falu1_turn)
+   |(ls2_falu2+ls2_falu2_turn+ls2_falu1_turn_enabled)
+   |(ls2_falu2+ls2_falu2_turn)")
+
+;; The following 4 instructions each subscribe one of
+;; ls2_[f]alu{1,2}_turn_enabled units according to this attribute.
+;; These instructions are used in mips.c: sched_ls2_dfa_post_advance_cycle.
+
+(define_attr "ls2_turn_type" "alu1,alu2,falu1,falu2,unknown"
+  (const_string "unknown"))
+
+;; Subscribe ls2_alu1_turn_enabled.
+(define_insn "ls2_alu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "alu1")])
+
+(define_insn_reservation "ls2_alu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu1")
+  "ls2_alu1_turn_enabled")
+
+;; Subscribe ls2_alu2_turn_enabled.
+(define_insn "ls2_alu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "alu2")])
+
+(define_insn_reservation "ls2_alu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu2")
+  "ls2_alu2_turn_enabled")
+
+;; Subscribe ls2_falu1_turn_enabled.
+(define_insn "ls2_falu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "falu1")])
+
+(define_insn_reservation "ls2_falu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu1")
+  "ls2_falu1_turn_enabled")
+
+;; Subscribe ls2_falu2_turn_enabled.
+(define_insn "ls2_falu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "falu2")])
+
+(define_insn_reservation "ls2_falu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu2")
+  "ls2_falu2_turn_enabled")
+
+;; Automaton for memory operations.
+(define_automaton "ls2_mem")
+
+;; Memory unit.
+(define_query_cpu_unit "ls2_mem" "ls2_mem")
+
+;; Reservation for integer instructions.
+(define_insn_reservation "ls2_alu" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "arith,condmove,const,logical,mfhilo,move,
+                        mthilo,nop,shift,signext,slt"))
+  "ls2_alu")
+
+;; Reservation for branch instructions.
+(define_insn_reservation "ls2_branch" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "branch,jump,call,trap"))
+  "ls2_alu1")
+
+;; Reservation for integer multiplication instructions.
+(define_insn_reservation "ls2_imult" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "imul,imul3"))
+  "ls2_alu2,ls2_alu2_core")
+
+;; Reservation for integer division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take 2-38 cycles.
+(define_insn_reservation "ls2_idiv" 20
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "idiv"))
+  "ls2_alu2,ls2_alu2_core*18")
+
+;; Reservation for memory load instructions.
+(define_insn_reservation "ls2_load" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "load,fpload,mfc,mtc"))
+  "ls2_mem")
+
+;; Reservation for memory store instructions.
+;; With stores we assume they don't alias with dependent loads.
+;; Therefore we set the latency to zero.
+(define_insn_reservation "ls2_store" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "store,fpstore"))
+  "ls2_mem")
+
+;; Reservation for floating-point instructions of latency 3.
+(define_insn_reservation "ls2_fp3" 3
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fabs,fneg,fcmp,fmove"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions of latency 5.
+(define_insn_reservation "ls2_fp5" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fcvt"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions that can go
+;; to either of FALU1/2 units.
+(define_insn_reservation "ls2_falu" 7
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fadd,fmul,fmadd"))
+  "ls2_falu")
+
+;; Reservation for floating-point division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take a variable amount
+;; of cycles:
+;; div.s takes 5-11 cycles
+;; div.d takes 5-18 cycles
+(define_insn_reservation "ls2_fdiv" 9
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fdiv"))
+  "ls2_falu2,ls2_falu2_core*7")
+
+;; Reservation for floating-point sqrt instructions.
+;; These instructions use the SRT algorithm and hence take a variable amount
+;; of cycles:
+;; sqrt.s takes 5-17 cycles
+;; sqrt.d takes 5-32 cycles
+(define_insn_reservation "ls2_fsqrt" 15
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fsqrt"))
+  "ls2_falu2,ls2_falu2_core*13")
+
+;; Two consecutive ALU instructions.
+(define_insn_reservation "ls2_multi" 4
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "multi"))
+  "(ls2_alu1,ls2_alu2_core)|(ls2_alu2,ls2_alu1_core)")
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(revision 66)
+++ gcc/config/mips/mips.md	(working copy)
@@ -235,6 +235,12 @@
    (UNSPEC_LOONGSON_PSHUFH		517)
    (UNSPEC_LOONGSON_UNPACK_HIGH		518)
    (UNSPEC_LOONGSON_UNPACK_LOW		519)
+
+   ;; Used in loongson2ef.md
+   (UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN   530)
+   (UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN   531)
+   (UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN  532)
+   (UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN  533)
   ]
 )
 
@@ -437,7 +443,7 @@
 ;; Attribute describing the processor.  This attribute must match exactly
 ;; with the processor_type enumeration in mips.h.
 (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,loongson_2e,loongson_2f"
   (const (symbol_ref "mips_tune")))
 
 ;; The type of hardware hazard associated with this instruction.
@@ -783,6 +789,7 @@
 (include "9000.md")
 (include "sb1.md")
 (include "sr71k.md")
+(include "loongson2ef.md")
 (include "generic.md")
 \f
 ;;
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 66)
+++ gcc/config/mips/mips.c	(working copy)
@@ -273,6 +273,40 @@ struct mips_frame_info GTY(()) {
   HOST_WIDE_INT hard_frame_pointer_offset;
 };
 
+/* Variables and flags used in scheduler hooks when tuning for
+   Loongson 2E/2F.  */
+struct sched_ls2_def GTY(())
+{
+  /* Variables to support Loongson 2E/2F round-robin [F]ALU1/2 dispatch
+     strategy.  */
+
+  /* If true, then next ALU1/2 instruction will go to ALU1.  */
+  bool alu1_turn_p;
+
+  /* If true, then next FALU1/2 instruction will go to FALU1.  */
+  bool falu1_turn_p;
+
+  /* Codes to query if [f]alu{1,2}_core units are subscribed or not.  */
+  int alu1_core_unit_code;
+  int alu2_core_unit_code;
+  int falu1_core_unit_code;
+  int falu2_core_unit_code;
+
+  /* True if current cycle has a multi instruction.
+     This flag is used in sched_ls2_dfa_post_advance_cycle.  */
+  bool cycle_has_multi_p;
+
+  /* Instructions to subscribe ls2_[f]alu{1,2}_turn_enabled units.
+     These are used in sched_ls2_dfa_post_advance_cycle to initialize
+     DFA state.
+     E.g., when alu1_turn_enabled_insn is issued it makes the next ALU1/2
+     instruction go to ALU1.  */
+  rtx alu1_turn_enabled_insn;
+  rtx alu2_turn_enabled_insn;
+  rtx falu1_turn_enabled_insn;
+  rtx falu2_turn_enabled_insn;
+};
+
 struct machine_function GTY(()) {
   /* The register returned by mips16_gp_pseudo_reg; see there for details.  */
   rtx mips16_gp_pseudo_rtx;
@@ -301,8 +335,14 @@ struct machine_function GTY(()) {
   /* True if we have emitted an instruction to initialize
      mips16_gp_pseudo_rtx.  */
   bool initialized_mips16_gp_pseudo_p;
+
+  /* Data used when scheduling for Loongson 2E/2F.  */
+  struct sched_ls2_def _sched_ls2;
 };
 
+/* A convenient shortcut.  */
+#define sched_ls2 (cfun->machine->_sched_ls2)
+
 /* Information about a single argument.  */
 struct mips_arg_info {
   /* True if the argument is passed in a floating-point register, or
@@ -9707,11 +9747,115 @@ mips_issue_rate (void)
 	 reach the theoretical max of 4.  */
       return 3;
 
+    case PROCESSOR_LOONGSON_2E:
+    case PROCESSOR_LOONGSON_2F:
+      return 4;
+
     default:
       return 1;
     }
 }
 
+/* Implement TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN hook.
+   Init data used in mips_dfa_post_advance_cycle.  */
+static void
+mips_init_dfa_post_cycle_insn (void)
+{
+  if (TUNE_LOONGSON_2EF)
+    {
+      start_sequence ();
+      emit_insn (gen_ls2_alu1_turn_enabled_insn ());
+      sched_ls2.alu1_turn_enabled_insn = get_insns ();
+      end_sequence ();
+
+      start_sequence ();
+      emit_insn (gen_ls2_alu2_turn_enabled_insn ());
+      sched_ls2.alu2_turn_enabled_insn = get_insns ();
+      end_sequence ();
+
+      start_sequence ();
+      emit_insn (gen_ls2_falu1_turn_enabled_insn ());
+      sched_ls2.falu1_turn_enabled_insn = get_insns ();
+      end_sequence ();
+
+      start_sequence ();
+      emit_insn (gen_ls2_falu2_turn_enabled_insn ());
+      sched_ls2.falu2_turn_enabled_insn = get_insns ();
+      end_sequence ();
+
+      sched_ls2.alu1_core_unit_code = get_cpu_unit_code ("ls2_alu1_core");
+      sched_ls2.alu2_core_unit_code = get_cpu_unit_code ("ls2_alu2_core");
+      sched_ls2.falu1_core_unit_code = get_cpu_unit_code ("ls2_falu1_core");
+      sched_ls2.falu2_core_unit_code = get_cpu_unit_code ("ls2_falu2_core");
+    }
+}
+
+/* Initialize STATE when scheduling for Loongson 2E/2F.
+   Support round-robin dispatch scheme by enabling only one of
+   ALU1/ALU2 and one of FALU1/FALU2 units for ALU1/2 and FALU1/2 instructions
+   respectively.  */
+static void
+sched_ls2_dfa_post_advance_cycle (state_t state)
+{
+  if (cpu_unit_reservation_p (state, sched_ls2.alu1_core_unit_code))
+    {
+      /* Though there are no non-pipelined ALU1 insns,
+	 we can get an instruction of type 'multi' before reload.  */
+      gcc_assert (sched_ls2.cycle_has_multi_p);
+      sched_ls2.alu1_turn_p = false;
+    }
+
+  sched_ls2.cycle_has_multi_p = false;
+
+  if (cpu_unit_reservation_p (state, sched_ls2.alu2_core_unit_code))
+    /* We have a non-pipelined alu instruction in the core,
+       adjust round-robin counter.  */
+    sched_ls2.alu1_turn_p = true;
+
+  if (sched_ls2.alu1_turn_p)
+    {
+      if (state_transition (state, sched_ls2.alu1_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (state_transition (state, sched_ls2.alu2_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+
+  if (cpu_unit_reservation_p (state, sched_ls2.falu1_core_unit_code))
+    {
+      /* There are no non-pipelined FALU1 insns.  */
+      gcc_unreachable ();
+      sched_ls2.falu1_turn_p = false;
+    }
+
+  if (cpu_unit_reservation_p (state, sched_ls2.falu2_core_unit_code))
+    /* We have a non-pipelined falu instruction in the core,
+       adjust round-robin counter.  */
+    sched_ls2.falu1_turn_p = true;
+
+  if (sched_ls2.falu1_turn_p)
+    {
+      if (state_transition (state, sched_ls2.falu1_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (state_transition (state, sched_ls2.falu2_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+}
+
+/* Implement TARGET_SCHED_DFA_POST_ADVANCE_CYCLE.
+   This hook is being called at the start of each cycle.  */
+static void
+mips_dfa_post_advance_cycle (void)
+{
+  if (TUNE_LOONGSON_2EF)
+    sched_ls2_dfa_post_advance_cycle (curr_state);
+}
+
 /* Implement TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD.  This should
    be as wide as the scheduling freedom in the DFA.  */
 
@@ -9722,6 +9866,9 @@ mips_multipass_dfa_lookahead (void)
   if (TUNE_SB1)
     return 4;
 
+  if (TUNE_LOONGSON_2EF)
+    return 4;
+
   return 0;
 }
 \f
@@ -9982,6 +10129,14 @@ mips_sched_init (FILE *file ATTRIBUTE_UN
   mips_macc_chains_last_hilo = 0;
   vr4130_last_insn = 0;
   mips_74k_agen_init (NULL_RTX);
+
+  if (TUNE_LOONGSON_2EF)
+    {
+      /* Branch instructions go to ALU1, therefore a basic block is most likely
+ 	 to start with the round-robin counter pointing to ALU2.  */
+      sched_ls2.alu1_turn_p = false;
+      sched_ls2.falu1_turn_p = true;
+    }
 }
 
 /* Implement TARGET_SCHED_REORDER and TARGET_SCHED_REORDER2.  */
@@ -10007,6 +10162,33 @@ mips_sched_reorder (FILE *file ATTRIBUTE
   return mips_issue_rate ();
 }
 
+/* Update round-robin counters for ALU1/2 and FALU1/2.  */
+static void
+mips_ls2_variable_issue (void)
+{
+  if (sched_ls2.alu1_turn_p)
+    {
+      if (cpu_unit_reservation_p (curr_state, sched_ls2.alu1_core_unit_code))
+	sched_ls2.alu1_turn_p = false;
+    }
+  else
+    {
+      if (cpu_unit_reservation_p (curr_state, sched_ls2.alu2_core_unit_code))
+	sched_ls2.alu1_turn_p = true;
+    }
+
+  if (sched_ls2.falu1_turn_p)
+    {
+      if (cpu_unit_reservation_p (curr_state, sched_ls2.falu1_core_unit_code))
+	sched_ls2.falu1_turn_p = false;
+    }
+  else
+    {
+      if (cpu_unit_reservation_p (curr_state, sched_ls2.falu2_core_unit_code))
+	sched_ls2.falu1_turn_p = true;
+    }
+}
+
 /* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
 
 static int
@@ -10022,6 +10204,21 @@ mips_variable_issue (FILE *file ATTRIBUT
       vr4130_last_insn = insn;
       if (TUNE_74K)
 	mips_74k_agen_init (insn);
+      else if (TUNE_LOONGSON_2EF)
+	{
+	  mips_ls2_variable_issue ();
+
+	  if (recog_memoized (insn) >= 0)
+	    {
+	      sched_ls2.cycle_has_multi_p |= (get_attr_type (insn)
+					      == TYPE_MULTI);
+
+	      /* Instructions of type 'multi' should all be split before
+		 second scheduling pass.  */
+	      gcc_assert (!sched_ls2.cycle_has_multi_p
+			  || !reload_completed);
+	    }
+	}
     }
   return more;
 }
@@ -12835,6 +13032,10 @@ mips_expand_vector_init (rtx target, rtx
 #define TARGET_SCHED_ADJUST_COST mips_adjust_cost
 #undef TARGET_SCHED_ISSUE_RATE
 #define TARGET_SCHED_ISSUE_RATE mips_issue_rate
+#undef TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN
+#define TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN mips_init_dfa_post_cycle_insn
+#undef TARGET_SCHED_DFA_POST_ADVANCE_CYCLE
+#define TARGET_SCHED_DFA_POST_ADVANCE_CYCLE mips_dfa_post_advance_cycle
 #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
   mips_multipass_dfa_lookahead
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	(revision 66)
+++ gcc/config/mips/mips.h	(working copy)
@@ -265,6 +265,8 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF1_1  \
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
+#define TUNE_LOONGSON_2EF           (mips_tune == PROCESSOR_LOONGSON_2E	\
+				     || mips_tune == PROCESSOR_LOONGSON_2F)
 
 /* Whether vector modes and intrinsics for ST Microelectronics
    Loongson-2E/2F processors should be enabled.  In o32 pairs of
@@ -908,10 +910,12 @@ enum mips_code_readable_setting {
 				 && !TARGET_MIPS16)
 
 /* Likewise mtc1 and mfc1.  */
-#define ISA_HAS_XFER_DELAY	(mips_isa <= 3)
+#define ISA_HAS_XFER_DELAY	(mips_isa <= 3			\
+				 && !TARGET_LOONGSON_2EF)
 
 /* Likewise floating-point comparisons.  */
-#define ISA_HAS_FCMP_DELAY	(mips_isa <= 3)
+#define ISA_HAS_FCMP_DELAY	(mips_isa <= 3			\
+				 && !TARGET_LOONGSON_2EF)
 
 /* True if mflo and mfhi can be immediately followed by instructions
    which write to the HI and LO registers.
@@ -928,7 +932,8 @@ enum mips_code_readable_setting {
 #define ISA_HAS_HILO_INTERLOCKS	(ISA_MIPS32				\
 				 || ISA_MIPS32R2			\
 				 || ISA_MIPS64				\
-				 || TARGET_MIPS5500)
+				 || TARGET_MIPS5500			\
+				 || TARGET_LOONGSON_2EF)
 
 /* ISA includes synci, jr.hb and jalr.hb.  */
 #define ISA_HAS_SYNCI (ISA_MIPS32R2 && !TARGET_MIPS16)
@@ -3229,3 +3234,6 @@ extern const struct mips_cpu_info *mips_
 extern const struct mips_rtx_cost_data *mips_cost;
 extern enum mips_code_readable_setting mips_code_readable;
 #endif
+
+/* Enable querying of DFA units.  */
+#define CPU_UNITS_QUERY 1

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-05-22 17:29 [MIPS] ST Loongson 2E/2F submission Maxim Kuvyrkov
                   ` (3 preceding siblings ...)
  2008-05-22 18:22 ` [MIPS][LS2][4/5] Scheduling and tuning Maxim Kuvyrkov
@ 2008-05-22 18:29 ` Maxim Kuvyrkov
  2008-05-25 12:02   ` Richard Sandiford
  2008-05-22 19:25 ` [MIPS] ST Loongson 2E/2F submission Gerald Pfeifer
  2008-05-23  4:46 ` Eric Fisher
  6 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-22 18:29 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Zhang Le, Eric Fisher

[-- Attachment #1: Type: text/plain, Size: 321 bytes --]

Hello,

This patch adds support for -march=native and -mtune=native options.

The file driver-st.c contains a routine that checks the "cpu model" line in
/proc/cpuinfo and appends the proper option to the compiler command line.
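
For example, on a Loongson 2F box (where the "cpu model" line contains
one of the "Godson2 V0.3" / "Loongson-2 V0.3" strings the routine looks
for), -march=native should end up being rewritten to -march=loongson2f;
the V0.2 strings map to loongson2e.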

This patch also specifies layout for Loongson multilibs.


OK for trunk?

--
Maxim Kuvyrkov
CodeSourcery

[-- Attachment #2: fsf-ls2ef-5-configure.ChangeLog --]
[-- Type: text/plain, Size: 312 bytes --]

2008-02-12  Daniel Jacobowitz  <dan@codesourcery.com>
	    Kazu Hirata  <kazu@codesourcery.com>

	* config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
	* config.host: Use driver-st.o and mips/x-st for ST MIPS.
	* config/mips/st.h, config/mips/t-st, config/mips/driver-st.c,
	config/mips/x-st: New.

[-- Attachment #3: fsf-ls2ef-5-configure.patch --]
[-- Type: text/plain, Size: 6543 bytes --]

Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 68)
+++ gcc/config.gcc	(working copy)
@@ -1697,6 +1697,12 @@ mips64*-*-linux*)
 	tm_file="dbxelf.h elfos.h svr4.h linux.h ${tm_file} mips/linux.h mips/linux64.h"
 	tmake_file="${tmake_file} mips/t-linux64"
 	tm_defines="${tm_defines} MIPS_ABI_DEFAULT=ABI_N32"
+	case ${target} in
+		mips64el-st-linux-gnu)
+			tm_file="${tm_file} mips/st.h"
+			tmake_file="${tmake_file} mips/t-st"
+			;;
+	esac
 	gnu_ld=yes
 	gas=yes
 	test x$with_llsc != x || with_llsc=yes
Index: gcc/config.host
===================================================================
--- gcc/config.host	(revision 68)
+++ gcc/config.host	(working copy)
@@ -104,6 +104,14 @@ case ${host} in
 	;;
     esac
     ;;
+  mips*-*-linux*)
+    case ${target} in
+      mips64el-st-linux-gnu)
+	host_extra_gcc_objs="driver-st.o"
+	host_xmake_file="${host_xmake_file} mips/x-st"
+      ;;
+    esac
+    ;;
 esac
 
 case ${host} in
Index: gcc/config/mips/driver-st.c
===================================================================
--- gcc/config/mips/driver-st.c	(revision 0)
+++ gcc/config/mips/driver-st.c	(revision 0)
@@ -0,0 +1,71 @@
+/* Subroutines for the gcc driver.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+
+/* This will be called by the spec parser in gcc.c when it sees
+   a %:local_cpu_detect(args) construct.  Currently it will be called
+   with either "arch" or "tune" as argument depending on if -march=native
+   or -mtune=native is to be substituted.
+
+   It returns a string containing the new command-line parameters to be
+   put in place of the above two options, depending on the CPU this is
+   executed on.  E.g. "-march=loongson2f" on a Loongson 2F for
+   -march=native.
+
+   ARGC and ARGV are set depending on the actual arguments given
+   in the spec.  */
+const char *
+host_detect_local_cpu (int argc, const char **argv)
+{
+  const char *cpu = NULL;
+  char buf[128];
+  FILE *f;
+  bool arch;
+
+  if (argc < 1)
+    return NULL;
+
+  arch = strcmp (argv[0], "arch") == 0;
+  if (!arch && strcmp (argv[0], "tune"))
+    return NULL;
+
+  f = fopen ("/proc/cpuinfo", "r");
+  if (f == NULL)
+    return NULL;
+
+  while (fgets (buf, sizeof (buf), f) != NULL)
+    if (strncmp (buf, "cpu model", sizeof ("cpu model") - 1) == 0)
+      {
+	if (strstr (buf, "Godson2 V0.2") != NULL
+	    || strstr (buf, "Loongson-2 V0.2") != NULL)
+	  cpu = "loongson2e";
+	else if (strstr (buf, "Godson2 V0.3") != NULL
+		 || strstr (buf, "Loongson-2 V0.3") != NULL)
+	  cpu = "loongson2f";
+	break;
+      }
+
+  fclose (f);
+
+  if (cpu == NULL)
+    return NULL;
+  return concat ("-m", argv[0], "=", cpu, NULL);
+}
Index: gcc/config/mips/t-st
===================================================================
--- gcc/config/mips/t-st	(revision 0)
+++ gcc/config/mips/t-st	(revision 0)
@@ -0,0 +1,14 @@
+MULTILIB_OPTIONS = march=loongson2e/march=loongson2f mabi=n32/mabi=32/mabi=64 
+MULTILIB_DIRNAMES = 2e 2f lib32 lib lib64
+
+MULTILIB_OSDIRNAMES  = march.loongson2e/mabi.n32=../lib32/2e
+MULTILIB_OSDIRNAMES += march.loongson2e/mabi.32=../lib/2e
+MULTILIB_OSDIRNAMES += march.loongson2e/mabi.64=../lib64/2e
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.n32=../lib32/2f
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.32=../lib/2f
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.64=../lib64/2f
+MULTILIB_OSDIRNAMES += mabi.n32=../lib32
+MULTILIB_OSDIRNAMES += mabi.32=../lib
+MULTILIB_OSDIRNAMES += mabi.64=../lib64
+
+EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
Index: gcc/config/mips/x-st
===================================================================
--- gcc/config/mips/x-st	(revision 0)
+++ gcc/config/mips/x-st	(revision 0)
@@ -0,0 +1,3 @@
+driver-st.o : $(srcdir)/config/mips/driver-st.c \
+  $(CONFIG_H) $(SYSTEM_H)
+	$(CC) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
Index: gcc/config/mips/st.h
===================================================================
--- gcc/config/mips/st.h	(revision 0)
+++ gcc/config/mips/st.h	(revision 0)
@@ -0,0 +1,47 @@
+/* ST 2e / 2f GNU/Linux Configuration.
+   Copyright (C) 2008
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* The various C libraries each have their own subdirectory.  */
+#undef SYSROOT_SUFFIX_SPEC
+#define SYSROOT_SUFFIX_SPEC			\
+  "%{march=loongson2e:/2e ;			\
+     march=loongson2f:/2f}"
+
+#undef STARTFILE_PREFIX_SPEC
+#define STARTFILE_PREFIX_SPEC				\
+  "%{mabi=32: /usr/local/lib/ /lib/ /usr/lib/}		\
+   %{mabi=n32: /usr/local/lib32/ /lib32/ /usr/lib32/}	\
+   %{mabi=64: /usr/local/lib64/ /lib64/ /usr/lib64/}"
+
+/* -march=native handling only makes sense with compiler running on
+   a MIPS chip.  */
+#if defined(__mips__)
+extern const char *host_detect_local_cpu (int argc, const char **argv);
+# define EXTRA_SPEC_FUNCTIONS \
+  { "local_cpu_detect", host_detect_local_cpu },
+
+#undef SUBTARGET_SELF_SPECS
+#define SUBTARGET_SELF_SPECS \
+"%{!EB:%{!EL:%(endian_spec)}}", \
+"%{!mabi=*: -mabi=n32}", \
+"%{march=native:%<march=native %:local_cpu_detect(arch) \
+  %{!mtune=*:%<mtune=native %:local_cpu_detect(tune)}} \
+%{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"
+#endif

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS] ST Loongson 2E/2F submission
  2008-05-22 17:29 [MIPS] ST Loongson 2E/2F submission Maxim Kuvyrkov
                   ` (4 preceding siblings ...)
  2008-05-22 18:29 ` [MIPS][LS2][5/5] Support for native MIPS GCC Maxim Kuvyrkov
@ 2008-05-22 19:25 ` Gerald Pfeifer
  2008-05-23  4:46 ` Eric Fisher
  6 siblings, 0 replies; 66+ messages in thread
From: Gerald Pfeifer @ 2008-05-22 19:25 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Richard Sandiford, gcc-patches, Zhang Le, Eric Fisher

On Thu, 22 May 2008, Maxim Kuvyrkov wrote:
> In this thread I'll post patches that add support for ST Loongson 2E/2F 
> CPUs to MIPS GCC port.

Once these are in, would you mind also adding this to gcc-4.4/changes.html
and, if you (or Richard) think this is of sufficient prominence, also
providing a news entry for our main page?

Thanks!

Gerald

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][1/5] Generic support
  2008-05-22 17:53 ` [MIPS][LS2][1/5] Generic support Maxim Kuvyrkov
@ 2008-05-22 19:27   ` Richard Sandiford
  2008-05-26 13:47     ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-05-22 19:27 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Hi Maxim,

Thanks for the patches.  I've reviewed 1-3 so far; hope to get to
4 and 5 over the weekend.

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> This patch adds generic support for ST Loongson 2E/2F CPUs such as 
> loongson2e and loongson2f values for -march= and mtune= options.
>
> OK for trunk?

As per the comment above mips_cpu_info_table:

   To ease comparison, please keep this table in the same order
   as GAS's mips_cpu_info_table.  Please also make sure that
   MIPS_ISA_LEVEL_SPEC and MIPS_ARCH_FLOAT_SPEC handle all -march
   options correctly.  */

(Not a whinge, it's easy to miss.)  So the patch needs to add the new
options to MIPS_ISA_LEVEL_SPEC too.
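
(Since Loongson is a MIPS3 core, I'd expect that to be just a matter of
adding march=loongson2* to the -mips3 alternative of that spec, but
please double-check against the spec's existing entries.)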

MIPS_ARCH_FLOAT_SPEC lists soft-float processors, so no change is needed
there.

> @@ -11925,7 +11925,8 @@ The processor names are:
>  @samp{sb1},
>  @samp{sr71000},
>  @samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300},
> -@samp{vr5000}, @samp{vr5400} and @samp{vr5500}.
> +@samp{vr5000}, @samp{vr5400}, @samp{vr5500}, @samp{loongson2e} and
> +@samp{loongson2f}.

Please keep this list alphabetically sorted.

> Index: gcc/config/mips/mips.h
> ===================================================================
> --- gcc/config/mips/mips.h	(revision 60)
> +++ gcc/config/mips/mips.h	(working copy)
> @@ -67,6 +67,8 @@ enum processor_type {
>    PROCESSOR_SB1,
>    PROCESSOR_SB1A,
>    PROCESSOR_SR71000,
> +  PROCESSOR_LOONGSON_2E,
> +  PROCESSOR_LOONGSON_2F,
>    PROCESSOR_MAX
>  };

Likewise.  The patch (and commit) that adds to processor_type must also
be the one that adds to the "cpu" attribute in mips.md.

(For avoidance of doubt, please submit the new patches with the
comments addressed.  The second review should just be a formality.)

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-05-22 18:16 ` [MIPS][LS2][3/5] Miscellaneous instructions Maxim Kuvyrkov
@ 2008-05-22 19:33   ` Richard Sandiford
  2008-06-08 19:59     ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-05-22 19:33 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

> This patch adds support for several non-MIPS3 instructions that ST
> Loongson 2E/2F CPUs support.
>
> Generally, Loongson is a MIPS3 core, but it also supports
>
> * movn/movz for integer modes
> * [n]madd/[n]msub instructions, which are analogues of respective 
> instructions from MIPS4 ISA.
> * A subset of paired-single float instructions of MIPS5 ISA.
>
> Support for first two items is pretty straightforward.
>
> To add support for Loongson paired-single float instructions the 
> following was done.
>
> To select the subset of paired-single float instructions that Loongson
> supports, a new macro TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2 and a new
> mode_iterator ANYF_MIPS5_LS2 were declared.
>
> MIPS5 .ps instructions are described with help of mode_iterator ANYF in 
> mips.md:
>
> ;; This mode macro allows :ANYF to be used wherever a scalar or vector
> ;; floating-point mode is allowed.
> (define_mode_iterator ANYF
>   [(SF "TARGET_HARD_FLOAT")
>    (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
>    (V2SF "TARGET_PAIRED_SINGLE_FLOAT")])
>
> To facilitate the declaration of .ps instructions for Loongson2 I introduced
> one more target mask, MASK_PAIRED_SINGLE_FLOAT_MIPS5_LS2, which is set
> whenever TARGET_PAIRED_SINGLE_FLOAT or TARGET_LOONGSON_2EF is set.
>
> Then I used ANYF_MIPS5_LS2 mode_iterator (see below) to describe 
> instructions both MIPS5 and Loongson support and ANYF mode_macro to 
> describe those which only MIPS5 ISA has.
>
> ;; This mode macro allows :ANYF_MIPS5_LS2 to be used wherever
> ;; a scalar or Loongson2 vector floating-point mode is allowed.
> (define_mode_macro ANYF_MIPS5_LS2
>   [(SF "TARGET_HARD_FLOAT")
>    (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
>    (V2SF "TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2")])
>
> Hence the instructions from MIPS5 ISA, which happen to be supported by 
> Loongson2, are declared with ANYF_MIPS5_LS2 mode_iterator.  I don't like 
> this, but can't figure out an alternative way to name / describe these 
> instructions.

I'd prefer to keep TARGET_PAIRED_SINGLE_FLOAT for the cases that
are common between Loongson and non-Loongson mode (i.e. the cases in
which .ps is available in some form).  Then add new ISA_HAS_FOO
macros for each class of instruction that Loongson doesn't have.
E.g.

    ISA_HAS_PXX_PS

for PUU.PS & co.

Feel free to run a list of ISA_HAS_* macros by me before testing.
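
I.e. something along these lines in mips.h (just a sketch; the exact
macro name, the guard condition and the instruction grouping are open
for discussion):

    /* ISA has the paired-single PLL.PS, PLU.PS, PUL.PS and PUU.PS
       instructions, which Loongson-2E/2F does not implement.  */
    #define ISA_HAS_PXX_PS	(TARGET_PAIRED_SINGLE_FLOAT \
				 && !TARGET_LOONGSON_2EF)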

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi	(revision 64)
> +++ gcc/doc/invoke.texi	(working copy)
> @@ -12205,6 +12205,16 @@ Use (do not use) paired-single floating-
>  @xref{MIPS Paired-Single Support}.  This option requires
>  hardware floating-point support to be enabled.
>  
> +@item -mpaired-single-loongson2
> +@itemx -mno-paired-single-loongson2
> +@opindex mpaired-single-loongson2
> +@opindex mno-paired-single-loongson2
> +Use (do not use) paired-single floating-point instructions of
> +ST Loongson 2E/2F CPUs.
> +@xref{MIPS Paired-Single Support}.  This option can only be used
> +when generating 64-bit code and requires hardware floating-point
> +support to be enabled.  !!! This should be fixed by NathanS' patch.

Don't think you meant to keep that ;)

Why would you want to enable and disable these instructions separately
from -march=loongson?  If you wouldn't, I think it's better to keep
things simple: make TARGET_LOONGSON_2EF select all the new stuff
whenever it's available.

Apart from that, the patch looks good.  E.g. I agree with the way
you've split up the existing ISA_HAS_* macros.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-05-22 18:08 ` [MIPS][LS2][2/5] Vector intrinsics Maxim Kuvyrkov
@ 2008-05-22 19:35   ` Richard Sandiford
  2008-05-28 12:52     ` Maxim Kuvyrkov
  2008-06-05 10:38     ` Maxim Kuvyrkov
  0 siblings, 2 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-05-22 19:35 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

The patch generally looks good.  My main concern is the FPR move handling.
It looks like you use MOV.D for 64-bit integer moves, but this is usually
incorrect.  In the standard ISA spec, MOV.D is unpredictable unless
(a) the source is uninterpreted or (b) it has been interpreted as a
double-precision floating-point value.

So: does Loongson specifically exempt itself from this restriction?
Or does it have special MOV.FOO instructions for the new modes?

Either way, the patch is inconsistent.  mips_mode_ok_for_mov_fmt_p
should return true for any mode that can/will be handled by MOV.FMT.

I don't understand why you need FPR<->FPR DImode moves for 32-bit
targets but not 64-bit targets.  (movdi_64bit doesn't have FPR<->FPR
moves either.)

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Index: gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc/testsuite/lib/target-supports.exp	(revision 62)
> +++ gcc/testsuite/lib/target-supports.exp	(working copy)
> @@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw 
>      } "-mfpu=neon -mfloat-abi=softfp"]
>  }
>  
> +# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
> +# the Loongson vector modes.
> +
> +proc check_effective_target_mips_loongson { } {
> +    return [check_no_compiler_messages loongson assembly {
> +	#if !defined(_MIPS_LOONGSON_VECTOR_MODES)
> +	#error FOO
> +	#endif
> +    }]
> +}
> +

I think this is a poor choice of name for a user-visible macro.  "modes"
are an internal gcc concept, and your .h-based API shields the user from
the "__attribute__"s needed to construct the types.

> +;; Move patterns.
> +
> +;; Expander to legitimize moves involving values of vector modes.
> +(define_expand "mov<mode>"
> +  [(set (match_operand:VWHB 0 "nonimmediate_operand")
> +	(match_operand:VWHB 1 "move_operand"))]
> +  ""
> +{
> +  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
> +    DONE;
> +})

Hmm.  This is probably going to cause problems if other ASEs use
the same modes in future, but I guess this is OK until then.

Local style is not to have predicates for move expanders.
The predicates aren't checked, and I think it's confusing
to have an expander with "move_operand" as its predicate,
and to then call a function (mips_legitimize_move) that
deals with non-move_operands.  So:

  [(set (match_operand:VWHB 0)
	(match_operand:VWHB 1))]

> +;; Handle legitimized moves between values of vector modes.
> +(define_insn "mov<mode>_internal"
> +  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,r,f,r,m,f")
> +	(match_operand:VWHB 1 "move_operand" "f,m,f,f,r,r,YG,YG"))]

"d" rather than "r".

> +  "HAVE_LOONGSON_VECTOR_MODES"
> +{
> +  return mips_output_move (operands[0], operands[1]);
> +}

Local style is to format single-line C blocks as:

  "HAVE_LOONGSON_VECTOR_MODES"
  { return mips_output_move (operands[0], operands[1]); }

> +  [(set_attr "type" "fpstore,fpload,*,mfc,mtc,*,fpstore,mtc")

"type" shouldn't be "*", but you fixed this in patch 4.
Please include this fix, and the other type attributes,
in the original loongson.md patch.

> +   (set_attr "mode" "<MODE>")])
> +
> +;; Initialization of a vector.
> +
> +(define_expand "vec_init<mode>"
> +  [(set (match_operand:VWHB 0 "register_operand" "=f")
> +	(match_operand 1 "" ""))]
> +  "HAVE_LOONGSON_VECTOR_MODES"
> +  {
> +    mips_expand_vector_init (operands[0], operands[1]);
> +    DONE;
> +  }
> +)

Expanders shouldn't have constraints.  Inconsistent formatting wrt
previous patterns (which followed local style):

(define_expand "vec_init<mode>"
  [(set (match_operand:VWHB 0 "register_operand")
	(match_operand 1))]
  "HAVE_LOONGSON_VECTOR_MODES"
{
  mips_expand_vector_init (operands[0], operands[1]);
  DONE;
})

> +;; Instruction patterns for SIMD instructions.
> +
> +;; Pack with signed saturation.
> +(define_insn "vec_pack_ssat_<mode>"
> +  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
> +        (vec_concat:<V_squash_double>
> +	  (ss_truncate:<V_squash> (match_operand:VWH 1 "register_operand" "f"))
> +          (ss_truncate:<V_squash> (match_operand:VWH 2 "register_operand" "f")))
> +	
> +   )]
> +  "HAVE_LOONGSON_VECTOR_MODES"
> +  "packss<V_squash_double_suffix>\t%0,%1,%2"
> +)

Inconsistent indentation (tabs vs. spaces by the looks of things).
Inconsistent position for closing ")" (which you fixed in patch 4).

In general, local style is to put ")" and "]" on the same line as the
thing they're closing, even if it means breaking a line.  So:

(define_insn "vec_pack_ssat_<mode>"
  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
	(vec_concat:<V_squash_double>
	  (ss_truncate:<V_squash>
	    (match_operand:VWH 1 "register_operand" "f"))
	  (ss_truncate:<V_squash>
	    (match_operand:VWH 2 "register_operand" "f"))))]
  "HAVE_LOONGSON_VECTOR_MODES"
  "packss<V_squash_double_suffix>\t%0,%1,%2")

Other instances.

> @@ -494,7 +516,10 @@
>  
>  ;; 64-bit modes for which we provide move patterns.
>  (define_mode_iterator MOVE64
> -  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
> +  [(DI "!TARGET_64BIT") (DF "!TARGET_64BIT")
> +   (V2SF "!TARGET_64BIT && TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
> +   (V2SI "HAVE_LOONGSON_VECTOR_MODES") (V4HI "HAVE_LOONGSON_VECTOR_MODES")
> +   (V8QI "HAVE_LOONGSON_VECTOR_MODES")])

Since we need more than one line, put V2SF and each new entry on its
own line.  The changes to the existing modes aren't right; they aren't
consistent with the comment.

> Index: gcc/config/mips/mips.c
> ===================================================================
> --- gcc/config/mips/mips.c	(revision 62)
> +++ gcc/config/mips/mips.c	(working copy)
> @@ -3518,6 +3518,23 @@ mips_output_move (rtx dest, rtx src)
>    if (dbl_p && mips_split_64bit_move_p (dest, src))
>      return "#";
>  
> +  /* Handle cases where the source is a constant zero vector on
> +     Loongson targets.  */
> +  if (HAVE_LOONGSON_VECTOR_MODES && src_code == CONST_VECTOR)
> +    {
> +      if (dest_code == REG)
> +	{
> +	  /* Move constant zero vector to floating-point register.  */
> +	  gcc_assert (FP_REG_P (REGNO (dest)));
> +	  return "dmtc1\t$0,%0";
> +	}
> +      else if (dest_code == MEM)
> +	/* Move constant zero vector to memory.  */
> +	return "sd\t$0,%0";
> +      else
> +	gcc_unreachable ();
> +    }
> +

Why doesn't the normal zero handling work?

> +/* Initialize vector TARGET to VALS.  */
> +
> +void
> +mips_expand_vector_init (rtx target, rtx vals)
> +{
> +  enum machine_mode mode = GET_MODE (target);
> +  enum machine_mode inner = GET_MODE_INNER (mode);
> +  unsigned int i, n_elts = GET_MODE_NUNITS (mode);
> +  rtx mem;
> +
> +  gcc_assert (VECTOR_MODE_P (mode));
> +
> +  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
> +  for (i = 0; i < n_elts; i++)
> +    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
> +                    XVECEXP (vals, 0, i));
> +
> +  emit_move_insn (target, mem);
> +}

Please keep initialisation and code separate.

Do we really want to create a new stack slot for every initialisation?
It seems on the face of it that some sort of reuse would be nice.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-05-22 18:22 ` [MIPS][LS2][4/5] Scheduling and tuning Maxim Kuvyrkov
@ 2008-05-23  3:07   ` Zhang Le
  2008-05-23 13:17     ` Maxim Kuvyrkov
  2008-05-25 11:57   ` Richard Sandiford
  1 sibling, 1 reply; 66+ messages in thread
From: Zhang Le @ 2008-05-23  3:07 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Richard Sandiford, gcc-patches, Eric Fisher

On 21:57 Thu 22 May     , Maxim Kuvyrkov wrote:
> Index: gcc/config/mips/loongson2ef.md

Hi, Maxim
There are some duplicated parts in this file:

> ===================================================================
> --- gcc/config/mips/loongson2ef.md	(revision 0)
> +++ gcc/config/mips/loongson2ef.md	(revision 0)
> @@ -0,0 +1,486 @@
> +;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
> +
> +;; Copyright (C) 2008 Free Software Foundation, Inc.
> +;; Contributed by CodeSourcery.

[...]

> +;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
> +
> +;; Copyright (C) 2008 Free Software Foundation, Inc.
> +;; Contributed by CodeSourcery.
> +;;

I got this error when trying this patch:
/var/tmp/portage/sys-devel/gcc-4.3.0/work/gcc-4.3.0/gcc/config/mips/loongson2ef.md:338:
duplicate definition for attribute ls2_turn_type
/var/tmp/portage/sys-devel/gcc-4.3.0/work/gcc-4.3.0/gcc/config/mips/loongson2ef.md:95:
previous definition

Regards,

Robert

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS] ST Loongson 2E/2F submission
  2008-05-22 17:29 [MIPS] ST Loongson 2E/2F submission Maxim Kuvyrkov
                   ` (5 preceding siblings ...)
  2008-05-22 19:25 ` [MIPS] ST Loongson 2E/2F submission Gerald Pfeifer
@ 2008-05-23  4:46 ` Eric Fisher
  2008-05-23  6:05   ` Zhang Le
  6 siblings, 1 reply; 66+ messages in thread
From: Eric Fisher @ 2008-05-23  4:46 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Richard Sandiford, gcc-patches, Zhang Le

Hello,

 I'd like to test the patches on my box. Are the patches based on the
4.3.0 release or the svn version?

Thanks,

Eric Fisher

2008/5/23 Maxim Kuvyrkov <maxim@codesourcery.com>:
> Hello Richard,
>
> In this thread I'll post patches that add support for ST Loongson 2E/2F CPUs
> to MIPS GCC port.
>
> These patches were developed at CodeSourcery Inc. by
>
> * Mark Shinwell
> * Nathan Sidwell
> * Daniel Jacobowitz
> * Kazu Hirata
> * me
>
> I'm now testing the cumulative patch at mips64el-st-linux-gnu Loongson 2E
> and Loongson 2F boxes.
>
>
> Thanks,
>
> Maxim Kuvyrkov
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS] ST Loongson 2E/2F submission
  2008-05-23  4:46 ` Eric Fisher
@ 2008-05-23  6:05   ` Zhang Le
  0 siblings, 0 replies; 66+ messages in thread
From: Zhang Le @ 2008-05-23  6:05 UTC (permalink / raw)
  To: Eric Fisher; +Cc: Maxim Kuvyrkov, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 303 bytes --]

On 11:06 Fri 23 May     , Eric Fisher wrote:
> Hello,
> 
>  I'd like to test the patches on my boxd. Are the patches based on the
> 4.3.0 or svn version?

The patches can be applied on top of 4.3.0.

btw, I have modified patch 4, included in the attachment.

Regards,

Zhang Le
http://dev.gentoo.org/~r0bertz

[-- Attachment #2: fsf-ls2ef-4-sched.patch --]
[-- Type: text/plain, Size: 28466 bytes --]

Index: gcc/config/mips/loongson.md
===================================================================
--- gcc/config/mips/loongson.md	(revision 66)
+++ gcc/config/mips/loongson.md	(working copy)
@@ -85,7 +85,7 @@
 {
   return mips_output_move (operands[0], operands[1]);
 }
-  [(set_attr "type" "fpstore,fpload,*,mfc,mtc,*,fpstore,mtc")
+  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
    (set_attr "mode" "<MODE>")])
 
 ;; Initialization of a vector.
@@ -112,7 +112,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "packss<V_squash_double_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Pack with unsigned saturation.
 (define_insn "vec_pack_usat_<mode>"
@@ -124,6 +124,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "packus<V_squash_double_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")]
 )
 
 ;; Addition, treating overflow by wraparound.
@@ -136,7 +137,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "padd<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Addition of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
@@ -149,7 +150,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "paddd\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Addition, treating overflow by signed saturation.
 (define_insn "ssadd<mode>3"
@@ -161,7 +162,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "padds<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Addition, treating overflow by unsigned saturation.
 (define_insn "usadd<mode>3"
@@ -172,7 +173,8 @@
 	)
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
-  "paddus<V_suffix>\t%0,%1,%2")
+  "paddus<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Logical AND NOT.
 (define_insn "loongson_and_not_<mode>"
@@ -184,7 +186,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pandn\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Average.
 (define_insn "loongson_average_<mode>"
@@ -197,7 +199,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pavg<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Equality test.
 (define_insn "loongson_eq_<mode>"
@@ -210,7 +212,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pcmpeq<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Greater-than test.
 (define_insn "loongson_gt_<mode>"
@@ -223,7 +225,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pcmpgt<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Extract halfword.
 (define_insn "loongson_extract_halfword"
@@ -236,7 +238,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pextr<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Insert halfword.
 (define_insn "loongson_insert_halfword_0"
@@ -249,7 +251,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pinsr<V_suffix>_0\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_insert_halfword_1"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -260,7 +262,7 @@
   ]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pinsr<V_suffix>_1\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_insert_halfword_2"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -272,7 +274,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pinsr<V_suffix>_2\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_insert_halfword_3"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -284,7 +286,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pinsr<V_suffix>_3\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 ;; Multiply and add packed integers.
 (define_insn "loongson_mult_add"
@@ -297,7 +299,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmadd<V_stretch_half_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Maximum of signed halfwords.
 (define_insn "smax<mode>3"
@@ -309,7 +311,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmaxs<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Maximum of unsigned bytes.
 (define_insn "umax<mode>3"
@@ -321,7 +323,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmaxu<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Minimum of signed halfwords.
 (define_insn "smin<mode>3"
@@ -333,7 +335,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmins<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Minimum of unsigned bytes.
 (define_insn "umin<mode>3"
@@ -345,7 +347,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pminu<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Move byte mask.
 (define_insn "loongson_move_byte_mask"
@@ -357,7 +359,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmovmsk<V_suffix>\t%0,%1"
-)
+  [(set_attr "type" "fabs")])
 
 ;; Multiply unsigned integers and store high result.
 (define_insn "umul<mode>3_highpart"
@@ -370,7 +372,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmulhu<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Multiply signed integers and store high result.
 (define_insn "smul<mode>3_highpart"
@@ -383,7 +385,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmulh<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Multiply signed integers and store low result.
 (define_insn "loongson_smul_lowpart"
@@ -396,7 +398,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmull<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Multiply unsigned word integers.
 (define_insn "loongson_umul_word"
@@ -409,7 +411,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pmulu<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Absolute difference.
 (define_insn "loongson_pasubub"
@@ -422,7 +424,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pasubub\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Sum of unsigned byte integers.
 (define_insn "reduc_uplus_<mode>"
@@ -434,7 +436,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "biadd\t%0,%1"
-)
+  [(set_attr "type" "fabs")])
 
 ;; Sum of absolute differences.
 (define_insn "loongson_psadbh"
@@ -447,7 +449,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pasubub\t%0,%1,%2;biadd\t%0,%0"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Shuffle halfwords.
 (define_insn "loongson_pshufh"
@@ -461,7 +463,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "pshufh\t%0,%2,%3"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Shift left logical.
 (define_insn "loongson_psll<mode>"
@@ -473,7 +475,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psll<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fmul")])
 
 ;; Shift right arithmetic.
 (define_insn "loongson_psra<mode>"
@@ -485,7 +487,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psra<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 ;; Shift right logical.
 (define_insn "loongson_psrl<mode>"
@@ -497,7 +499,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psrl<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 ;; Subtraction, treating overflow by wraparound.
 (define_insn "sub<mode>3"
@@ -509,7 +511,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psub<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
@@ -522,7 +524,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psubd\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction, treating overflow by signed saturation.
 (define_insn "sssub<mode>3"
@@ -534,7 +536,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psubs<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction, treating overflow by unsigned saturation.
 (define_insn "ussub<mode>3"
@@ -546,7 +548,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "psubus<V_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fadd")])
 
 ;; Unpack high data.
 (define_insn "vec_interleave_high<mode>"
@@ -559,7 +561,7 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "punpckh<V_stretch_half_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
 
 ;; Unpack low data.
 (define_insn "vec_interleave_low<mode>"
@@ -572,4 +574,4 @@
    )]
   "HAVE_LOONGSON_VECTOR_MODES"
   "punpckl<V_stretch_half_suffix>\t%0,%1,%2"
-)
+  [(set_attr "type" "fdiv")])
Index: gcc/config/mips/loongson2ef.md
===================================================================
--- gcc/config/mips/loongson2ef.md	(revision 0)
+++ gcc/config/mips/loongson2ef.md	(revision 0)
@@ -0,0 +1,486 @@
+;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
+
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Automaton for integer instructions.
+(define_automaton "ls2_alu")
+
+;; ALU1 and ALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_alu1_core,ls2_alu2_core" "ls2_alu")
+
+;; Pseudo units to help modeling of ALU1/2 round-robin dispatch strategy.
+(define_cpu_unit "ls2_alu1_turn,ls2_alu2_turn" "ls2_alu")
+
+;; Pseudo units to enable/disable ls2_alu[12]_turn units.
+;; ls2_alu[12]_turn unit can be subscribed only after ls2_alu[12]_turn_enabled
+;; unit is subscribed.
+(define_cpu_unit "ls2_alu1_turn_enabled,ls2_alu2_turn_enabled" "ls2_alu")
+(presence_set "ls2_alu1_turn" "ls2_alu1_turn_enabled")
+(presence_set "ls2_alu2_turn" "ls2_alu2_turn_enabled")
+
+;; Reservations for ALU1 (ALU2) instructions.
+;; An instruction goes to ALU1 (ALU2) and causes the next ALU1/2 instruction
+;; to be dispatched to ALU2 (ALU1).
+(define_reservation "ls2_alu1"
+  "(ls2_alu1_core+ls2_alu2_turn_enabled)|ls2_alu1_core")
+(define_reservation "ls2_alu2"
+  "(ls2_alu2_core+ls2_alu1_turn_enabled)|ls2_alu2_core")
+
+;; Reservation for ALU1/2 instructions.
+;; An instruction will go to ALU1 iff ls2_alu1_turn_enabled is subscribed,
+;; and it will switch the turn to ALU2 by subscribing ls2_alu2_turn_enabled.
+;; Otherwise it will go to ALU2.
+(define_reservation "ls2_alu"
+  "(ls2_alu1_core+ls2_alu1_turn+ls2_alu2_turn_enabled)
+   |(ls2_alu1_core+ls2_alu1_turn)
+   |(ls2_alu2_core+ls2_alu2_turn+ls2_alu1_turn_enabled)
+   |(ls2_alu2_core+ls2_alu2_turn)")
+
+;; Automaton for floating-point instructions.
+(define_automaton "ls2_falu")
+
+;; FALU1 and FALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_falu1_core,ls2_falu2_core" "ls2_falu")
+
+;; Pseudo units to help modeling of FALU1/2 round-robin dispatch strategy. 
+(define_cpu_unit "ls2_falu1_turn,ls2_falu2_turn" "ls2_falu")
+
+;; Pseudo units to enable/disable ls2_falu[12]_turn units.
+;; ls2_falu[12]_turn unit can be subscribed only after
+;; ls2_falu[12]_turn_enabled unit is subscribed.
+(define_cpu_unit "ls2_falu1_turn_enabled,ls2_falu2_turn_enabled"
+  "ls2_falu")
+(presence_set "ls2_falu1_turn" "ls2_falu1_turn_enabled")
+(presence_set "ls2_falu2_turn" "ls2_falu2_turn_enabled")
+
+;; Reservations for FALU1 (FALU2) instructions.
+;; An instruction goes to FALU1 (FALU2) and causes the next FALU1/2
+;; instruction to be dispatched to FALU2 (FALU1).
+(define_reservation "ls2_falu1"
+  "(ls2_falu1_core+ls2_falu2_turn_enabled)|ls2_falu1_core")
+(define_reservation "ls2_falu2"
+  "(ls2_falu2_core+ls2_falu1_turn_enabled)|ls2_falu2_core")
+
+;; Reservation for FALU1/2 instructions.
+;; An instruction will go to FALU1 iff ls2_falu1_turn_enabled is subscribed,
+;; and it will switch the turn to FALU2 by subscribing ls2_falu2_turn_enabled.
+;; Otherwise it will go to FALU2.
+(define_reservation "ls2_falu"
+  "(ls2_falu1+ls2_falu1_turn+ls2_falu2_turn_enabled)
+   |(ls2_falu1+ls2_falu1_turn)
+   |(ls2_falu2+ls2_falu2_turn+ls2_falu1_turn_enabled)
+   |(ls2_falu2+ls2_falu2_turn)")
+
+;; The following 4 instructions each subscribe one of
+;; ls2_[f]alu{1,2}_turn_enabled units according to this attribute.
+;; These instructions are used in mips.c: sched_ls2_dfa_post_advance_cycle.
+
+(define_attr "ls2_turn_type" "alu1,alu2,falu1,falu2,unknown"
+  (const_string "unknown"))
+
+;; Subscribe ls2_alu1_turn_enabled.
+(define_insn "ls2_alu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "alu1")])
+
+(define_insn_reservation "ls2_alu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu1")
+  "ls2_alu1_turn_enabled")
+
+;; Subscribe ls2_alu2_turn_enabled.
+(define_insn "ls2_alu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "alu2")])
+
+(define_insn_reservation "ls2_alu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu2")
+  "ls2_alu2_turn_enabled")
+
+;; Subscribe ls2_falu1_turn_enabled.
+(define_insn "ls2_falu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "falu1")])
+
+(define_insn_reservation "ls2_falu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu1")
+  "ls2_falu1_turn_enabled")
+
+;; Subscribe ls2_falu2_turn_enabled.
+(define_insn "ls2_falu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+{
+  gcc_unreachable ();
+  return "";
+}
+  [(set_attr "ls2_turn_type" "falu2")])
+
+(define_insn_reservation "ls2_falu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu2")
+  "ls2_falu2_turn_enabled")
+
+;; Automaton for memory operations.
+(define_automaton "ls2_mem")
+
+;; Memory unit.
+(define_query_cpu_unit "ls2_mem" "ls2_mem")
+
+;; Reservation for integer instructions.
+(define_insn_reservation "ls2_alu" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "arith,condmove,const,logical,mfhilo,move,
+                        mthilo,nop,shift,signext,slt"))
+  "ls2_alu")
+
+;; Reservation for branch instructions.
+(define_insn_reservation "ls2_branch" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "branch,jump,call,trap"))
+  "ls2_alu1")
+
+;; Reservation for integer multiplication instructions.
+(define_insn_reservation "ls2_imult" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "imul,imul3"))
+  "ls2_alu2,ls2_alu2_core")
+
+;; Reservation for integer division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take 2-38 cycles.
+(define_insn_reservation "ls2_idiv" 20
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "idiv"))
+  "ls2_alu2,ls2_alu2_core*18")
+
+;; Reservation for memory load instructions.
+(define_insn_reservation "ls2_load" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "load,fpload,mfc,mtc"))
+  "ls2_mem")
+
+;; Reservation for memory store instructions.
+;; With stores we assume they don't alias with dependent loads.
+;; Therefore we set the latency to zero.
+(define_insn_reservation "ls2_store" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "store,fpstore"))
+  "ls2_mem")
+
+;; Reservation for floating-point instructions of latency 3.
+(define_insn_reservation "ls2_fp3" 3
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fabs,fneg,fcmp,fmove"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions of latency 5.
+(define_insn_reservation "ls2_fp5" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fcvt"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions that can go
+;; to either of FALU1/2 units.
+(define_insn_reservation "ls2_falu" 7
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fadd,fmul,fmadd"))
+  "ls2_falu")
+
+;; Reservation for floating-point division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take a variable amount
+;; of cycles:
+;; div.s takes 5-11 cycles
+;; div.d takes 5-18 cycles
+(define_insn_reservation "ls2_fdiv" 9
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fdiv"))
+  "ls2_falu2,ls2_falu2_core*7")
+
+;; Reservation for floating-point sqrt instructions.
+;; These instructions use the SRT algorithm and hence take a variable amount
+;; of cycles:
+;; sqrt.s takes 5-17 cycles
+;; sqrt.d takes 5-32 cycles
+(define_insn_reservation "ls2_fsqrt" 15
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fsqrt"))
+  "ls2_falu2,ls2_falu2_core*13")
+
+;; Two consecutive ALU instructions.
+(define_insn_reservation "ls2_multi" 4
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "multi"))
+  "(ls2_alu1,ls2_alu2_core)|(ls2_alu2,ls2_alu1_core)")
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(revision 66)
+++ gcc/config/mips/mips.md	(working copy)
@@ -235,6 +235,12 @@
    (UNSPEC_LOONGSON_PSHUFH		517)
    (UNSPEC_LOONGSON_UNPACK_HIGH		518)
    (UNSPEC_LOONGSON_UNPACK_LOW		519)
+
+   ;; Used in loongson2ef.md
+   (UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN   530)
+   (UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN   531)
+   (UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN  532)
+   (UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN  533)
   ]
 )
 
@@ -437,7 +443,7 @@
 ;; Attribute describing the processor.  This attribute must match exactly
 ;; with the processor_type enumeration in mips.h.
 (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,loongson_2e,loongson_2f"
   (const (symbol_ref "mips_tune")))
 
 ;; The type of hardware hazard associated with this instruction.
@@ -783,6 +789,7 @@
 (include "9000.md")
 (include "sb1.md")
 (include "sr71k.md")
+(include "loongson2ef.md")
 (include "generic.md")
 \f
 ;;
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 66)
+++ gcc/config/mips/mips.c	(working copy)
@@ -273,6 +273,40 @@ struct mips_frame_info GTY(()) {
   HOST_WIDE_INT hard_frame_pointer_offset;
 };
 
+/* Variables and flags used in scheduler hooks when tuning for
+   Loongson 2E/2F.  */
+struct sched_ls2_def GTY(())
+{
+  /* Variables to support Loongson 2E/2F round-robin [F]ALU1/2 dispatch
+     strategy.  */
+
+  /* If true, then the next ALU1/2 instruction will go to ALU1.  */
+  bool alu1_turn_p;
+
+  /* If true, then the next FALU1/2 instruction will go to FALU1.  */
+  bool falu1_turn_p;
+
+  /* Codes to query if [f]alu{1,2}_core units are subscribed or not.  */
+  int alu1_core_unit_code;
+  int alu2_core_unit_code;
+  int falu1_core_unit_code;
+  int falu2_core_unit_code;
+
+  /* True if the current cycle has a multi instruction.
+     This flag is used in sched_ls2_dfa_post_advance_cycle.  */
+  bool cycle_has_multi_p;
+
+  /* Instructions to subscribe ls2_[f]alu{1,2}_turn_enabled units.
+     These are used in sched_ls2_dfa_post_advance_cycle to initialize
+     DFA state.
+     E.g., when alu1_turn_enabled_insn is issued it makes the next ALU1/2
+     instruction go to ALU1.  */
+  rtx alu1_turn_enabled_insn;
+  rtx alu2_turn_enabled_insn;
+  rtx falu1_turn_enabled_insn;
+  rtx falu2_turn_enabled_insn;
+};
+
 struct machine_function GTY(()) {
   /* The register returned by mips16_gp_pseudo_reg; see there for details.  */
   rtx mips16_gp_pseudo_rtx;
@@ -301,8 +335,14 @@ struct machine_function GTY(()) {
   /* True if we have emitted an instruction to initialize
      mips16_gp_pseudo_rtx.  */
   bool initialized_mips16_gp_pseudo_p;
+
+  /* Data used when scheduling for Loongson 2E/2F.  */
+  struct sched_ls2_def _sched_ls2;
 };
 
+/* A convenient shortcut.  */
+#define sched_ls2 (cfun->machine->_sched_ls2)
+
 /* Information about a single argument.  */
 struct mips_arg_info {
   /* True if the argument is passed in a floating-point register, or
@@ -9707,11 +9747,115 @@ mips_issue_rate (void)
 	 reach the theoretical max of 4.  */
       return 3;
 
+    case PROCESSOR_LOONGSON_2E:
+    case PROCESSOR_LOONGSON_2F:
+      return 4;
+
     default:
       return 1;
     }
 }
 
+/* Implement TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN hook.
+   Init data used in mips_dfa_post_advance_cycle.  */
+static void
+mips_init_dfa_post_cycle_insn (void)
+{
+  if (TUNE_LOONGSON_2EF)
+    {
+      start_sequence ();
+      emit_insn (gen_ls2_alu1_turn_enabled_insn ());
+      sched_ls2.alu1_turn_enabled_insn = get_insns ();
+      end_sequence ();
+
+      start_sequence ();
+      emit_insn (gen_ls2_alu2_turn_enabled_insn ());
+      sched_ls2.alu2_turn_enabled_insn = get_insns ();
+      end_sequence ();
+
+      start_sequence ();
+      emit_insn (gen_ls2_falu1_turn_enabled_insn ());
+      sched_ls2.falu1_turn_enabled_insn = get_insns ();
+      end_sequence ();
+
+      start_sequence ();
+      emit_insn (gen_ls2_falu2_turn_enabled_insn ());
+      sched_ls2.falu2_turn_enabled_insn = get_insns ();
+      end_sequence ();
+
+      sched_ls2.alu1_core_unit_code = get_cpu_unit_code ("ls2_alu1_core");
+      sched_ls2.alu2_core_unit_code = get_cpu_unit_code ("ls2_alu2_core");
+      sched_ls2.falu1_core_unit_code = get_cpu_unit_code ("ls2_falu1_core");
+      sched_ls2.falu2_core_unit_code = get_cpu_unit_code ("ls2_falu2_core");
+    }
+}
+
+/* Initialize STATE when scheduling for Loongson 2E/2F.
+   Support round-robin dispatch scheme by enabling only one of
+   ALU1/ALU2 and one of FALU1/FALU2 units for ALU1/2 and FALU1/2 instructions
+   respectively.  */
+static void
+sched_ls2_dfa_post_advance_cycle (state_t state)
+{
+  if (cpu_unit_reservation_p (state, sched_ls2.alu1_core_unit_code))
+    {
+      /* Though there are no non-pipelined ALU1 insns,
+	 we can get an instruction of type 'multi' before reload.  */
+      gcc_assert (sched_ls2.cycle_has_multi_p);
+      sched_ls2.alu1_turn_p = false;
+    }
+
+  sched_ls2.cycle_has_multi_p = false;
+
+  if (cpu_unit_reservation_p (state, sched_ls2.alu2_core_unit_code))
+    /* We have a non-pipelined alu instruction in the core,
+       adjust round-robin counter.  */
+    sched_ls2.alu1_turn_p = true;
+
+  if (sched_ls2.alu1_turn_p)
+    {
+      if (state_transition (state, sched_ls2.alu1_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (state_transition (state, sched_ls2.alu2_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+
+  if (cpu_unit_reservation_p (state, sched_ls2.falu1_core_unit_code))
+    {
+      /* There are no non-pipelined FALU1 insns.  */
+      gcc_unreachable ();
+      sched_ls2.falu1_turn_p = false;
+    }
+
+  if (cpu_unit_reservation_p (state, sched_ls2.falu2_core_unit_code))
+    /* We have a non-pipelined falu instruction in the core,
+       adjust round-robin counter.  */
+    sched_ls2.falu1_turn_p = true;
+
+  if (sched_ls2.falu1_turn_p)
+    {
+      if (state_transition (state, sched_ls2.falu1_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (state_transition (state, sched_ls2.falu2_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+}
+
+/* Implement TARGET_SCHED_DFA_POST_ADVANCE_CYCLE.
+   This hook is called at the start of each cycle.  */
+static void
+mips_dfa_post_advance_cycle (void)
+{
+  if (TUNE_LOONGSON_2EF)
+    sched_ls2_dfa_post_advance_cycle (curr_state);
+}
+
 /* Implement TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD.  This should
    be as wide as the scheduling freedom in the DFA.  */
 
@@ -9722,6 +9866,9 @@ mips_multipass_dfa_lookahead (void)
   if (TUNE_SB1)
     return 4;
 
+  if (TUNE_LOONGSON_2EF)
+    return 4;
+
   return 0;
 }
 \f
@@ -9982,6 +10129,14 @@ mips_sched_init (FILE *file ATTRIBUTE_UN
   mips_macc_chains_last_hilo = 0;
   vr4130_last_insn = 0;
   mips_74k_agen_init (NULL_RTX);
+
+  if (TUNE_LOONGSON_2EF)
+    {
+      /* Branch instructions go to ALU1, therefore a basic block is most likely
+ 	 to start with the round-robin counter pointing to ALU2.  */
+      sched_ls2.alu1_turn_p = false;
+      sched_ls2.falu1_turn_p = true;
+    }
 }
 
 /* Implement TARGET_SCHED_REORDER and TARGET_SCHED_REORDER2.  */
@@ -10007,6 +10162,33 @@ mips_sched_reorder (FILE *file ATTRIBUTE
   return mips_issue_rate ();
 }
 
+/* Update round-robin counters for ALU1/2 and FALU1/2.  */
+static void
+mips_ls2_variable_issue (void)
+{
+  if (sched_ls2.alu1_turn_p)
+    {
+      if (cpu_unit_reservation_p (curr_state, sched_ls2.alu1_core_unit_code))
+	sched_ls2.alu1_turn_p = false;
+    }
+  else
+    {
+      if (cpu_unit_reservation_p (curr_state, sched_ls2.alu2_core_unit_code))
+	sched_ls2.alu1_turn_p = true;
+    }
+
+  if (sched_ls2.falu1_turn_p)
+    {
+      if (cpu_unit_reservation_p (curr_state, sched_ls2.falu1_core_unit_code))
+	sched_ls2.falu1_turn_p = false;
+    }
+  else
+    {
+      if (cpu_unit_reservation_p (curr_state, sched_ls2.falu2_core_unit_code))
+	sched_ls2.falu1_turn_p = true;
+    }
+}
+
 /* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
 
 static int
@@ -10022,6 +10204,21 @@ mips_variable_issue (FILE *file ATTRIBUT
       vr4130_last_insn = insn;
       if (TUNE_74K)
 	mips_74k_agen_init (insn);
+      else if (TUNE_LOONGSON_2EF)
+	{
+	  mips_ls2_variable_issue ();
+
+	  if (recog_memoized (insn) >= 0)
+	    {
+	      sched_ls2.cycle_has_multi_p |= (get_attr_type (insn)
+					      == TYPE_MULTI);
+
+	      /* Instructions of type 'multi' should all be split before
+		 second scheduling pass.  */
+	      gcc_assert (!sched_ls2.cycle_has_multi_p
+			  || !reload_completed);
+	    }
+	}
     }
   return more;
 }
@@ -12835,6 +13032,10 @@ mips_expand_vector_init (rtx target, rtx
 #define TARGET_SCHED_ADJUST_COST mips_adjust_cost
 #undef TARGET_SCHED_ISSUE_RATE
 #define TARGET_SCHED_ISSUE_RATE mips_issue_rate
+#undef TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN
+#define TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN mips_init_dfa_post_cycle_insn
+#undef TARGET_SCHED_DFA_POST_ADVANCE_CYCLE
+#define TARGET_SCHED_DFA_POST_ADVANCE_CYCLE mips_dfa_post_advance_cycle
 #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
   mips_multipass_dfa_lookahead
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	(revision 66)
+++ gcc/config/mips/mips.h	(working copy)
@@ -265,6 +265,8 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF1_1  \
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
+#define TUNE_LOONGSON_2EF           (mips_tune == PROCESSOR_LOONGSON_2E	\
+				     || mips_tune == PROCESSOR_LOONGSON_2F)
 
 /* Whether vector modes and intrinsics for ST Microelectronics
    Loongson-2E/2F processors should be enabled.  In o32 pairs of
@@ -908,10 +910,12 @@ enum mips_code_readable_setting {
 				 && !TARGET_MIPS16)
 
 /* Likewise mtc1 and mfc1.  */
-#define ISA_HAS_XFER_DELAY	(mips_isa <= 3)
+#define ISA_HAS_XFER_DELAY	(mips_isa <= 3			\
+				 && !TARGET_LOONGSON_2EF)
 
 /* Likewise floating-point comparisons.  */
-#define ISA_HAS_FCMP_DELAY	(mips_isa <= 3)
+#define ISA_HAS_FCMP_DELAY	(mips_isa <= 3			\
+				 && !TARGET_LOONGSON_2EF)
 
 /* True if mflo and mfhi can be immediately followed by instructions
    which write to the HI and LO registers.
@@ -928,7 +932,8 @@ enum mips_code_readable_setting {
 #define ISA_HAS_HILO_INTERLOCKS	(ISA_MIPS32				\
 				 || ISA_MIPS32R2			\
 				 || ISA_MIPS64				\
-				 || TARGET_MIPS5500)
+				 || TARGET_MIPS5500			\
+				 || TARGET_LOONGSON_2EF)
 
 /* ISA includes synci, jr.hb and jalr.hb.  */
 #define ISA_HAS_SYNCI (ISA_MIPS32R2 && !TARGET_MIPS16)
@@ -3229,3 +3234,6 @@ extern const struct mips_cpu_info *mips_
 extern const struct mips_rtx_cost_data *mips_cost;
 extern enum mips_code_readable_setting mips_code_readable;
 #endif
+
+/* Enable querying of DFA units.  */
+#define CPU_UNITS_QUERY 1

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-05-23  3:07   ` Zhang Le
@ 2008-05-23 13:17     ` Maxim Kuvyrkov
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-23 13:17 UTC (permalink / raw)
  To: Zhang Le; +Cc: Richard Sandiford, gcc-patches, Eric Fisher

Zhang Le wrote:
> On 21:57 Thu 22 May     , Maxim Kuvyrkov wrote:
>> Index: gcc/config/mips/loongson2ef.md
> 
> Hi, Maxim
> There are some duplicated parts in this file

The whole file is duplicated; that's just a copy-paste-paste mistake.
I'll repost the patch after the review.

--
Maxim

> 
>> ===================================================================
>> --- gcc/config/mips/loongson2ef.md	(revision 0)
>> +++ gcc/config/mips/loongson2ef.md	(revision 0)
>> @@ -0,0 +1,486 @@
>> +;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
>> +
>> +;; Copyright (C) 2008 Free Software Foundation, Inc.
>> +;; Contributed by CodeSourcery.
> 
> [...]
> 
>> +;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
>> +
>> +;; Copyright (C) 2008 Free Software Foundation, Inc.
>> +;; Contributed by CodeSourcery.
>> +;;
> 
> I got this error when trying this patch:
> /var/tmp/portage/sys-devel/gcc-4.3.0/work/gcc-4.3.0/gcc/config/mips/loongson2ef.md:338:
> duplicate definition for attribute ls2_turn_type
> /var/tmp/portage/sys-devel/gcc-4.3.0/work/gcc-4.3.0/gcc/config/mips/loongson2ef.md:95:
> previous definition
> 
> Regards,
> 
> Robert
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-05-22 18:22 ` [MIPS][LS2][4/5] Scheduling and tuning Maxim Kuvyrkov
  2008-05-23  3:07   ` Zhang Le
@ 2008-05-25 11:57   ` Richard Sandiford
  2008-06-12 13:45     ` Maxim Kuvyrkov
  1 sibling, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-05-25 11:57 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

[Silly thing, but I tried applying these patches, and was alerted
to trailing whitespace in:

    gcc/testsuite/gcc.target/mips/loongson-simd.c
    gcc/config/mips/loongson.md
    gcc/config/mips/loongson2ef.md

Please run them through your favourite trailing-whitespace remover
before applying.]

Anyway, this patch looks good, thanks.  A neat use of CPU querying ;)
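
[For readers following the thread from the archive: the dispatch rule being
modelled is simply that an instruction which may use either ALU goes to
whichever of ALU1/ALU2 currently holds the "turn", and issuing to a unit
passes the turn to the other one.  The stand-alone C sketch below
illustrates only that rule -- the unit and function names come from the
patch, while the driver and the instruction kinds are made up for
illustration and are not part of the patch or of GCC.]

  #include <stdbool.h>
  #include <stdio.h>

  enum insn_kind { ALU_EITHER, ALU1_ONLY, ALU2_ONLY };

  /* The patch starts each scheduling block with the turn pointing at
     ALU2, because branches always go to ALU1.  */
  static bool alu1_turn_p = false;

  static int
  dispatch (enum insn_kind kind)
  {
    int unit;

    if (kind == ALU1_ONLY)
      unit = 1;
    else if (kind == ALU2_ONLY)
      unit = 2;
    else
      unit = alu1_turn_p ? 1 : 2;

    /* Whichever unit was used, the next "either" instruction goes to
       the other one; mips_ls2_variable_issue tracks this by querying
       the ls2_alu[12]_core units.  */
    alu1_turn_p = (unit == 2);
    return unit;
  }

  int
  main (void)
  {
    enum insn_kind prog[] = { ALU_EITHER, ALU_EITHER, ALU1_ONLY, ALU_EITHER };
    unsigned int i;

    for (i = 0; i < sizeof prog / sizeof prog[0]; i++)
      printf ("insn %u goes to ALU%d\n", i, dispatch (prog[i]));
    return 0;
  }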

My usual handful of mind-numbing nits follow...

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> +;; Pseudo units to enable/disable ls2_falu[12]_turn units.
> +;; ls2_falu[12]_turn unit can be subscribed only after
> +;; ls2_falu[12]_turn_enabled unit is subscribed.
> +(define_cpu_unit "ls2_falu1_turn_enabled,ls2_falu2_turn_enabled"
> +  "ls2_falu")

Needless line break (inconsistent with integer/core stuff).

> +;; Subscribe ls2_alu1_turn_enabled.
> +(define_insn "ls2_alu1_turn_enabled_insn"
> +  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN)]
> +  "TUNE_LOONGSON_2EF"
> +{
> +  gcc_unreachable ();
> +  return "";
> +}

Plain:

  { gcc_unreachable (); }

ought to be OK.  gcc_unreachable() is noreturn.

> @@ -301,8 +335,14 @@ struct machine_function GTY(()) {
>    /* True if we have emitted an instruction to initialize
>       mips16_gp_pseudo_rtx.  */
>    bool initialized_mips16_gp_pseudo_p;
> +
> +  /* Data used when scheduling for Loongson 2E/2F.  */
> +  struct sched_ls2_def _sched_ls2;
>  };
>  
> +/* A convenient shortcut.  */
> +#define sched_ls2 (cfun->machine->_sched_ls2)
> +

Hmm, we really shouldn't use _foo names.

I don't see why this has to be in machine_function anyway.  It's
pass-local, so why not just use a static sched_ls2 variable, like
we do for other scheduling stuff?

(I assume putting this in machine_function keeps the fake insns
around between sched1 and sched2, even though we create new insns
during the initialisation phase of sched2.)

Actually, make that "ls2_sched" or "mips_ls2_sched", for consistency
with other ISA- or processor-prefixed names.  Same with the other
similar names in the patch; sometimes you've used "sched_ls2_foo"
and sometimes you've used "mips_ls2_foo".  "mips_ls2_foo" is fine.
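
[Sketch, not a review requirement: with the rename and the move out of
machine_function, the pass-local state could end up looking roughly like
the fragment below.  The field names are taken from the patch; "fake_insn"
only stands in for the rtx fields so that the fragment compiles on its own
outside of GCC.]

  #include <stdbool.h>

  typedef struct fake_insn fake_insn;	/* stand-in for GCC's rtx */

  static struct
  {
    /* Round-robin state: if true, the next ALU1/2 (FALU1/2) instruction
       goes to ALU1 (FALU1).  */
    bool alu1_turn_p;
    bool falu1_turn_p;

    /* DFA unit codes used to query the [f]alu{1,2} core reservations.  */
    int alu1_core_unit_code;
    int alu2_core_unit_code;
    int falu1_core_unit_code;
    int falu2_core_unit_code;

    /* True if the current cycle has a multi instruction.  */
    bool cycle_has_multi_p;

    /* Fake instructions used to subscribe the *_turn_enabled units.  */
    fake_insn *alu1_turn_enabled_insn;
    fake_insn *alu2_turn_enabled_insn;
    fake_insn *falu1_turn_enabled_insn;
    fake_insn *falu2_turn_enabled_insn;
  } mips_ls2;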

> +/* Implement TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN hook.
> +   Init data used in mips_dfa_post_advance_cycle.  */
> +static void
> +mips_init_dfa_post_cycle_insn (void)

Local style is to have a blank line before the function.
Other instances.

> +{
> +  if (TUNE_LOONGSON_2EF)
> +    {
> +      start_sequence ();
> +      emit_insn (gen_ls2_alu1_turn_enabled_insn ());
> +      sched_ls2.alu1_turn_enabled_insn = get_insns ();
> +      end_sequence ();
> +
> +      start_sequence ();
> +      emit_insn (gen_ls2_alu2_turn_enabled_insn ());
> +      sched_ls2.alu2_turn_enabled_insn = get_insns ();
> +      end_sequence ();
> +
> +      start_sequence ();
> +      emit_insn (gen_ls2_falu1_turn_enabled_insn ());
> +      sched_ls2.falu1_turn_enabled_insn = get_insns ();
> +      end_sequence ();
> +
> +      start_sequence ();
> +      emit_insn (gen_ls2_falu2_turn_enabled_insn ());
> +      sched_ls2.falu2_turn_enabled_insn = get_insns ();
> +      end_sequence ();
> +
> +      sched_ls2.alu1_core_unit_code = get_cpu_unit_code ("ls2_alu1_core");
> +      sched_ls2.alu2_core_unit_code = get_cpu_unit_code ("ls2_alu2_core");
> +      sched_ls2.falu1_core_unit_code = get_cpu_unit_code ("ls2_falu1_core");
> +      sched_ls2.falu2_core_unit_code = get_cpu_unit_code ("ls2_falu2_core");
> +    }

Please put the "if" body in a separate ls2_init_dfa_post_cycle_insn
function, for consistency with other hooks.  (You did this for
sched_ls2_dfa_post_advance_cycle, thanks.)
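
[A sketch of the shape being asked for; the stubbed TUNE_LOONGSON_2EF
variable exists only so the fragment stands alone -- in GCC it is of
course a target macro, and the real function bodies are the ones quoted
above.]

  #include <stdbool.h>

  static bool TUNE_LOONGSON_2EF;	/* stand-in for the real target macro */

  /* Loongson-specific body (the contents of the "if" above).  */
  static void
  ls2_init_dfa_post_cycle_insn (void)
  {
    /* ... emit the four *_turn_enabled insns, look up the unit codes ...  */
  }

  /* The target hook stays a thin, target-conditional wrapper.  */
  static void
  mips_init_dfa_post_cycle_insn (void)
  {
    if (TUNE_LOONGSON_2EF)
      ls2_init_dfa_post_cycle_insn ();
  }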

> @@ -9982,6 +10129,14 @@ mips_sched_init (FILE *file ATTRIBUTE_UN
>    mips_macc_chains_last_hilo = 0;
>    vr4130_last_insn = 0;
>    mips_74k_agen_init (NULL_RTX);
> +
> +  if (TUNE_LOONGSON_2EF)
> +    {
> +      /* Branch instructions go to ALU1, therefore a basic block is most likely
> + 	 to start with the round-robin counter pointing to ALU2.  */
> +      sched_ls2.alu1_turn_p = false;
> +      sched_ls2.falu1_turn_p = true;
> +    }

As you can see from the context, we initialise other schedulers'
information unconditionally.  That's amenable to change if you like,
but whatever we do, let's be consistent.

> @@ -10022,6 +10204,21 @@ mips_variable_issue (FILE *file ATTRIBUT
>        vr4130_last_insn = insn;
>        if (TUNE_74K)
>  	mips_74k_agen_init (insn);
> +      else if (TUNE_LOONGSON_2EF)
> +	{
> +	  mips_ls2_variable_issue ();
> +
> +	  if (recog_memoized (insn) >= 0)
> +	    {
> +	      sched_ls2.cycle_has_multi_p |= (get_attr_type (insn)
> +					      == TYPE_MULTI);
> +
> +	      /* Instructions of type 'multi' should all be split before
> +		 second scheduling pass.  */
> +	      gcc_assert (!sched_ls2.cycle_has_multi_p
> +			  || !reload_completed);
> +	    }
> +	}

Please put all the Loongson bits in mips_ls2_variable_issue.

I agree that the assert is a good thing, but I think we should require
it for all targets or none.  If we don't, it's too likely that someone
working on another target will accidentally break Loongson.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-05-22 18:29 ` [MIPS][LS2][5/5] Support for native MIPS GCC Maxim Kuvyrkov
@ 2008-05-25 12:02   ` Richard Sandiford
  2008-06-16 18:55     ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-05-25 12:02 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> This patch adds support for -march=native and -mtune=native options.
>
> File driver-st.c contains a routine that checks "cpu model" line in 
> /proc/cpuinfo and appends a proper option to compiler command line.
>
> This patch also specifies layout for Loongson multilibs.

OK, the obvious question here is: should the -march=native support
be specific to mips*-st*-linux* configurations, or should it apply
to all mips*-linux* configurations?  I can imagine you discussed
this internally; if you did, why settle on the former?

My gut feeling was that -march=native ought to be supported for all
mips*-linux* configurations, and that mips*-st*-linux* ought simply to
specify a particular selection of multilibs (and associated multilib
layout).  Thus driver-st.c would be called something more generic and
would be included by config/mips/linux.h.  We could then add other
names to the list as the need arises.

I think it would be worth adding a comment saying that, if we can't
detect a known processor, we simply discard the -march or -mtune option.
This is in contrast to x86, where we force a lowest common denominator.
(For the record, I agree the behaviour you've got makes sense.)
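
[To make the discarding behaviour concrete, here is a stand-alone sketch
of the kind of detection involved; the "cpu model" strings and the option
spellings are assumptions for illustration and are not taken from
driver-st.c itself.]

  #include <stdio.h>
  #include <string.h>

  /* Return the -march option implied by /proc/cpuinfo, or NULL if the
     processor is not recognised -- in which case the driver would simply
     drop the option, as described above.  */
  static const char *
  march_from_cpuinfo (void)
  {
    char line[256] = "";
    FILE *f = fopen ("/proc/cpuinfo", "r");

    if (f == NULL)
      return NULL;
    while (fgets (line, sizeof line, f) != NULL)
      if (strncmp (line, "cpu model", 9) == 0)
        break;
    fclose (f);

    if (strstr (line, "Loongson-2 V0.2") != NULL)
      return "-march=loongson2e";
    if (strstr (line, "Loongson-2 V0.3") != NULL)
      return "-march=loongson2f";
    return NULL;
  }

  int
  main (void)
  {
    const char *opt = march_from_cpuinfo ();

    printf ("%s\n", opt != NULL ? opt : "(unknown processor: option discarded)");
    return 0;
  }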

You need to document the new option.

The implementation itself looks fine, thanks.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][1/5] Generic support
  2008-05-22 19:27   ` Richard Sandiford
@ 2008-05-26 13:47     ` Maxim Kuvyrkov
  2008-05-27 18:41       ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-26 13:47 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

Richard Sandiford wrote:
> Hi Maxim,
> 
> Thanks for the patches.  I've reviewed 1-3 so far; hope to get to
> 4 and 5 over the weekend.

Richard,

Thanks for the review.  I'll be posting updated patches as they are ready.

> 
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> This patch adds generic support for ST Loongson 2E/2F CPUs such as 
>> loongson2e and loongson2f values for -march= and mtune= options.
>>
>> OK for trunk?
> 
> As per the comment above mips_cpu_info_table:
> 
>    To ease comparison, please keep this table in the same order
>    as GAS's mips_cpu_info_table.  Please also make sure that
>    MIPS_ISA_LEVEL_SPEC and MIPS_ARCH_FLOAT_SPEC handle all -march
>    options correctly.  */

In GAS, the loongson entries are placed between the entries for sb1a and
octeon.  The GCC port doesn't have an entry for octeon, but it does have
one for sr71000.  So I placed the loongson entries between the entries
for sb1a and sr71000.

Everything else fixed thusly.

--
Maxim

[-- Attachment #2: fsf-ls2ef-1-generic.ChangeLog --]
[-- Type: text/plain, Size: 555 bytes --]

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>

	* config/mips/mips.c (mips_cpu_info_table): Add loongson2e
	and loongson2f entries.
	(mips_rtx_cost_data): Add entries for Loongson-2E/2F.
	* config/mips/mips.h (processor_type): Add Loongson-2E
	and Loongson-2F entries.
	(TARGET_LOONGSON_2E, TARGET_LOONGSON_2F, TARGET_LOONGSON_2EF): New.
	(MIPS_ISA_LEVEL_SPEC): Handle Loongson-2E/2F.
	* config/mips/mips.md (define_attr cpu): Add loongson2e and loongson2f.
	* doc/invoke.texi (MIPS Options): Document loongson2e
	and loongson2f processor names.

[-- Attachment #3: fsf-ls2ef-1-generic.patch --]
[-- Type: text/plain, Size: 3996 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(.../gcc-trunk)	(revision 72)
+++ gcc/doc/invoke.texi	(.../gcc-1)	(revision 72)
@@ -11917,6 +11917,7 @@ The processor names are:
 @samp{24kec}, @samp{24kef2_1}, @samp{24kef1_1},
 @samp{34kc}, @samp{34kf2_1}, @samp{34kf1_1},
 @samp{74kc}, @samp{74kf2_1}, @samp{74kf1_1}, @samp{74kf3_2},
+@samp{loongson2e}, @samp{loongson2f},
 @samp{m4k},
 @samp{orion},
 @samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(.../gcc-trunk)	(revision 72)
+++ gcc/config/mips/mips.md	(.../gcc-1)	(revision 72)
@@ -415,7 +415,7 @@
 ;; Attribute describing the processor.  This attribute must match exactly
 ;; with the processor_type enumeration in mips.h.
 (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson2e,loongson2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000"
   (const (symbol_ref "mips_tune")))
 
 ;; The type of hardware hazard associated with this instruction.
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(.../gcc-trunk)	(revision 72)
+++ gcc/config/mips/mips.c	(.../gcc-1)	(revision 72)
@@ -641,6 +641,11 @@ static const struct mips_cpu_info mips_c
   { "20kc", PROCESSOR_20KC, 64, PTF_AVOID_BRANCHLIKELY },
   { "sb1", PROCESSOR_SB1, 64, PTF_AVOID_BRANCHLIKELY },
   { "sb1a", PROCESSOR_SB1A, 64, PTF_AVOID_BRANCHLIKELY },
+
+  /* ST Loongson 2E/2F processors.  */
+  { "loongson2e", PROCESSOR_LOONGSON_2E, 3, PTF_AVOID_BRANCHLIKELY },
+  { "loongson2f", PROCESSOR_LOONGSON_2F, 3, PTF_AVOID_BRANCHLIKELY },
+
   { "sr71000", PROCESSOR_SR71000, 64, PTF_AVOID_BRANCHLIKELY },
 };
 
@@ -1003,6 +1008,12 @@ static const struct mips_rtx_cost_data m
 		     1,           /* branch_cost */
 		     4            /* memory_latency */
   },
+  { /* Loongson-2E */
+    DEFAULT_COSTS
+  },
+  { /* Loongson-2F */
+    DEFAULT_COSTS
+  },
   { /* SR71000 */
     DEFAULT_COSTS
   },
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	(.../gcc-trunk)	(revision 72)
+++ gcc/config/mips/mips.h	(.../gcc-1)	(revision 72)
@@ -47,6 +47,8 @@ enum processor_type {
   PROCESSOR_74KF2_1,
   PROCESSOR_74KF1_1,
   PROCESSOR_74KF3_2,
+  PROCESSOR_LOONGSON_2E,
+  PROCESSOR_LOONGSON_2F,
   PROCESSOR_M4K,
   PROCESSOR_R3900,
   PROCESSOR_R6000,
@@ -237,6 +239,9 @@ enum mips_code_readable_setting {
 #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
 				     || mips_arch == PROCESSOR_SB1A)
 #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
+#define TARGET_LOONGSON_2E          (mips_arch == PROCESSOR_LOONGSON_2E)
+#define TARGET_LOONGSON_2F          (mips_arch == PROCESSOR_LOONGSON_2F)
+#define TARGET_LOONGSON_2EF         (TARGET_LOONGSON_2E || TARGET_LOONGSON_2F)
 
 /* Scheduling target defines.  */
 #define TUNE_MIPS3000               (mips_tune == PROCESSOR_R3000)
@@ -646,7 +651,7 @@ enum mips_code_readable_setting {
   "%{" MIPS_ISA_LEVEL_OPTION_SPEC ":;: \
      %{march=mips1|march=r2000|march=r3000|march=r3900:-mips1} \
      %{march=mips2|march=r6000:-mips2} \
-     %{march=mips3|march=r4*|march=vr4*|march=orion:-mips3} \
+     %{march=mips3|march=r4*|march=vr4*|march=orion|march=loongson2*:-mips3} \
      %{march=mips4|march=r8000|march=vr5*|march=rm7000|march=rm9000:-mips4} \
      %{march=mips32|march=4kc|march=4km|march=4kp|march=4ksc:-mips32} \
      %{march=mips32r2|march=m4k|march=4ke*|march=4ksd|march=24k* \

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][1/5] Generic support
  2008-05-26 13:47     ` Maxim Kuvyrkov
@ 2008-05-27 18:41       ` Richard Sandiford
  2008-05-28 12:29         ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-05-27 18:41 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> In GAS, the loongson entries are placed between the entries for sb1a and
> octeon.  The GCC port doesn't have an entry for octeon, but it does have
> one for sr71000.  So I placed the loongson entries between the entries
> for sb1a and sr71000.

I think that's the wrong place.  In both GAS and GCC, processors
are supposed to be grouped by ISA level, so Loongson ought to go in
the MIPS III list.  I think you can change the GAS table as obvious.

OK with that change, thanks.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][1/5] Generic support
  2008-05-27 18:41       ` Richard Sandiford
@ 2008-05-28 12:29         ` Maxim Kuvyrkov
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-28 12:29 UTC (permalink / raw)
  To: gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> In GAS, the loongson entries are placed between the entries for sb1a and
>> octeon.  The GCC port doesn't have an entry for octeon, but it does have
>> one for sr71000.  So I placed the loongson entries between the entries
>> for sb1a and sr71000.
> 
> I think that's the wrong place.  In both GAS and GCC, processors
> are supposed to be grouped by ISA level, so Loongson ought to go in
> the MIPS III list.  I think you can change the GAS table as obvious.
> 
> OK with that change, thanks.

I posted a patch to binutils@ and checked in this patch (rev. 136071).


Thanks,

Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-05-22 19:35   ` Richard Sandiford
@ 2008-05-28 12:52     ` Maxim Kuvyrkov
  2008-05-28 18:20       ` Richard Sandiford
  2008-06-05 10:38     ` Maxim Kuvyrkov
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-05-28 12:52 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard,

Thanks for the review; I'll post an updated patch when it's finished.
Below are a couple of notes.

> 
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> Index: gcc/testsuite/lib/target-supports.exp
>> ===================================================================
>> --- gcc/testsuite/lib/target-supports.exp	(revision 62)
>> +++ gcc/testsuite/lib/target-supports.exp	(working copy)
>> @@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw 
>>      } "-mfpu=neon -mfloat-abi=softfp"]
>>  }
>>  
>> +# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
>> +# the Loongson vector modes.
>> +
>> +proc check_effective_target_mips_loongson { } {
>> +    return [check_no_compiler_messages loongson assembly {
>> +	#if !defined(_MIPS_LOONGSON_VECTOR_MODES)
>> +	#error FOO
>> +	#endif
>> +    }]
>> +}
>> +
> 
> I think this is a poor choice of name for a user-visible macro.  "modes"
> are an internal gcc concept, and your .h-based API shields the user from
> the "__attribute__"s needed to construct the types.

Does "_MIPS_LOONGSON_VECTOR_INSNS" look good?

> 
>> +;; Move patterns.
>> +
>> +;; Expander to legitimize moves involving values of vector modes.
>> +(define_expand "mov<mode>"
>> +  [(set (match_operand:VWHB 0 "nonimmediate_operand")
>> +	(match_operand:VWHB 1 "move_operand"))]
>> +  ""
>> +{
>> +  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
>> +    DONE;
>> +})
> 
> Hmm.  This is probably going to cause problems if other ASEs use
> the same modes in future, but I guess this is OK until then.
> 
> Local style is not to have predicates for move expanders.
> The predicates aren't checked, and I think it's confusing
> to have an expander with "move_operand" as its predicate,
> and to then call a function (mips_legitimize_move) that
> deals with non-move_operands.  So:
> 
>   [(set (match_operand:VWHB 0)
> 	(match_operand:VWHB 1)]

I don't mean to challenge local style, but it is valid and useful to 
specify predicates for operands in define_expand as well as for the 
whole expand.  At least, this is what GCC Internals say:

<sic>
14.15 Defining RTL Sequences for Code Generation

...

<In the context of define_expand, not define_insn>:
The RTL template, in addition to controlling generation of RTL insns, 
also describes the operands that need to be specified when this pattern 
is used. In particular, it gives a predicate for each operand.

A true operand, which needs to be specified in order to generate RTL 
from the pattern, should be described with a match_operand in its first 
occurrence in the RTL template. This enters information on the operand's 
predicate into the tables that record such things. GCC uses the 
information to preload the operand into a register if that is required 
for valid RTL code. If the operand is referred to more than once, 
subsequent references should use match_dup.
</sic>

Am I missing something?


--
Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-05-28 12:52     ` Maxim Kuvyrkov
@ 2008-05-28 18:20       ` Richard Sandiford
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-05-28 18:20 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>> Index: gcc/testsuite/lib/target-supports.exp
>>> ===================================================================
>>> --- gcc/testsuite/lib/target-supports.exp	(revision 62)
>>> +++ gcc/testsuite/lib/target-supports.exp	(working copy)
>>> @@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw 
>>>      } "-mfpu=neon -mfloat-abi=softfp"]
>>>  }
>>>  
>>> +# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
>>> +# the Loongson vector modes.
>>> +
>>> +proc check_effective_target_mips_loongson { } {
>>> +    return [check_no_compiler_messages loongson assembly {
>>> +	#if !defined(_MIPS_LOONGSON_VECTOR_MODES)
>>> +	#error FOO
>>> +	#endif
>>> +    }]
>>> +}
>>> +
>> 
>> I think this is a poor choice of name for a user-visible macro.  "modes"
>> are an internal gcc concept, and your .h-based API shields the user from
>> the "__attribute__"s needed to construct the types.
>
> Does "_MIPS_LOONGSON_VECTOR_INSNS" look good?

How about "__mips_loongson_vector_rev"?  We could then define it
to 2 or more if future iterations have extra capabilities.
I'm open to other suggestions if you don't like that.
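
[For what it's worth, a user-side test of the proposed macro might then
look like the fragment below; the guarded #include assumes the loongson.h
intrinsics header from patch 2 and is only illustrative.]

  #if defined (__mips_loongson_vector_rev) && __mips_loongson_vector_rev >= 1
  # include <loongson.h>
  # define HAVE_LOONGSON_VECTORS 1
  #else
  # define HAVE_LOONGSON_VECTORS 0
  #endif

  int
  main (void)
  {
    return HAVE_LOONGSON_VECTORS ? 0 : 1;
  }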

>>> +;; Move patterns.
>>> +
>>> +;; Expander to legitimize moves involving values of vector modes.
>>> +(define_expand "mov<mode>"
>>> +  [(set (match_operand:VWHB 0 "nonimmediate_operand")
>>> +	(match_operand:VWHB 1 "move_operand"))]
>>> +  ""
>>> +{
>>> +  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
>>> +    DONE;
>>> +})
>> 
>> Hmm.  This is probably going to cause problems if other ASEs use
>> the same modes in future, but I guess this is OK until then.
>> 
>> Local style is not to have predicates for move expanders.
>> The predicates aren't checked, and I think it's confusing
>> to have an expander with "move_operand" as its predicate,
>> and to then call a function (mips_legitimize_move) that
>> deals with non-move_operands.  So:
>> 
>>   [(set (match_operand:VWHB 0)
>> 	(match_operand:VWHB 1)]
>
> I don't mean to challenge local style, but it is valid and useful to 
> specify predicates for operands in define_expand as well as for the 
> whole expand.  At least, this is what GCC Internals say:
> [...]
> Am I missing something?

Well, like I say, predicates on move expanders aren't checked,
so using them in the way you did doesn't really achieve anything.
The move expander above will still get passed destinations that
aren't nonimmediate_operands and sources that aren't move_operands.

And that's how it has to be.  The usual way of enforcing a predicate on
a source operand is:

   if (!insn_data...predicate (foo, mode))
     foo = force_reg (mode, foo);

but doing this for move expanders wouldn't work, because the
force_reg would then need to move the rejected value of FOO
into the new register.  We wouldn't make progress.  The same
thing applies to destination operands.

(Forcing constants to memory is another way of making operands match,
but that's handled by LEGITIMATE_CONSTANT_P instead.)

There's also reload to consider.  Quoting from the movMM
documentation in md.texi:

---------------------------------------------------------------------
During reload a memory reference with an invalid address may be passed
as an operand.  Such an address will be replaced with a valid address
later in the reload pass.  In this case, nothing may be done with the
address except to use it as it stands.  If it is copied, it will not be
replaced with a valid address.  No attempt should be made to make such
an address into a valid address and no routine (such as
@code{change_address}) that will do so may be called.  Note that
@code{general_operand} will fail when applied to such an address.
---------------------------------------------------------------------

(with emphasis on the last line).

Also, one of mips_legitimize_move's jobs is to handle source values
that are general_operands but not move_operands.

My attitude is that, if folk see predicates on the move expanders,
they'll assume that those predicates are enforced.  I think it's
less confusing if we don't add them, even if they would only be
there for "decorative" purposes.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-05-22 19:35   ` Richard Sandiford
  2008-05-28 12:52     ` Maxim Kuvyrkov
@ 2008-06-05 10:38     ` Maxim Kuvyrkov
  2008-06-05 16:16       ` Richard Sandiford
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-05 10:38 UTC (permalink / raw)
  To: gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 9029 bytes --]

Richard Sandiford wrote:
> The patch generally looks good.

Hi Richard,

Sorry for the delay; I needed time to refactor the patch to fix the issues 
you pointed out.  I've mostly changed parts of the patch for mips.md and 
mips.c.  I think they look much cleaner now.  New changes are at the 
beginning of the patch.

The patch was regtested on gcc, g++ and libstdc++ testsuites for n32, 
o32 and 64 ABIs with and without -march=loongson2?.

The only new FAIL was gcc.dg/tree-ssa/gen-vect-11c.c when 
-march=loongson2? is used.  This testcase should be fixed to either 
XFAIL when compiled for loongson or to disable loongson vector 
instructions (the switch we don't have).

> My main concern is the FPR move handling.
> It looks like you use MOV.D for 64-bit integer moves, but this is usually
> incorrect.  In the standard ISA spec, MOV.D is unpredictable unless
> (a) the source is uninterpreted or (b) it has been interpreted as a
> double-precision floating-point value.
> 
> So: does Loongson specifically exempt itself from this restriction?
> Or does it have special MOV.FOO instructions for the new modes?
> 
> Either way, the patch is inconsistent.  mips_mode_ok_for_mov_fmt_p
> should return true for any mode that can/will be handled by MOV.FMT.
> 
> I don't understand why you need FPR<->FPR DImode moves for 32-bit
> targets but not 64-bit targets.  (movdi_64bit doesn't have FPR<->FPR
> moves either.)

Loongson behaves like generic MIPS III with respect to moves to and from 
FP registers.  So to handle the new modes I added them to the MOVE64 and 
SPLITF mode iterators and adjusted mips_split_doubleword_move () accordingly.

To handle the new vector types I had to change mips_builtin_vector_type () 
to distinguish between signed and unsigned basic types, as the new modes 
are used for both signed and unsigned cases.

> 
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> Index: gcc/testsuite/lib/target-supports.exp
>> ===================================================================
>> --- gcc/testsuite/lib/target-supports.exp	(revision 62)
>> +++ gcc/testsuite/lib/target-supports.exp	(working copy)
>> @@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw 
>>      } "-mfpu=neon -mfloat-abi=softfp"]
>>  }
>>  
>> +# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
>> +# the Loongson vector modes.
>> +
>> +proc check_effective_target_mips_loongson { } {
>> +    return [check_no_compiler_messages loongson assembly {
>> +	#if !defined(_MIPS_LOONGSON_VECTOR_MODES)
>> +	#error FOO
>> +	#endif
>> +    }]
>> +}
>> +
> 
> I think this is a poor choice of name for a user-visible macro.  "modes"
> are an internal gcc concept, and your .h-based API shields the user from
> the "__attribute__"s needed to construct the types.

__mips_loongson_vector_rev it is.

> 
>> +;; Move patterns.
>> +
>> +;; Expander to legitimize moves involving values of vector modes.
>> +(define_expand "mov<mode>"
>> +  [(set (match_operand:VWHB 0 "nonimmediate_operand")
>> +	(match_operand:VWHB 1 "move_operand"))]
>> +  ""
>> +{
>> +  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
>> +    DONE;
>> +})
> 
> Hmm.  This is probably going to cause problems if other ASEs use
> the same modes in future, but I guess this is OK until then.
> 
> Local style is not to have predicates for move expanders.
> The predicates aren't checked, and I think it's confusing
> to have an expander with "move_operand" as its predicate,
> and to then call a function (mips_legitimize_move) that
> deals with non-move_operands.  So:
> 
>   [(set (match_operand:VWHB 0)
> 	(match_operand:VWHB 1)]

Fixed.

> 
>> +;; Handle legitimized moves between values of vector modes.
>> +(define_insn "mov<mode>_internal"
>> +  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,r,f,r,m,f")
>> +	(match_operand:VWHB 1 "move_operand" "f,m,f,f,r,r,YG,YG"))]
> 
> "d" rather than "r".

Fixed.

> 
>> +  "HAVE_LOONGSON_VECTOR_MODES"
>> +{
>> +  return mips_output_move (operands[0], operands[1]);
>> +}
> 
> Local style is to format single-line C blocks as:
> 
>   "HAVE_LOONGSON_VECTOR_MODES"
>   { return mips_output_move (operands[0], operands[1]); }
> 
>> +  [(set_attr "type" "fpstore,fpload,*,mfc,mtc,*,fpstore,mtc")
> 
> "type" shouldn't be "*", but you fixed this in patch 4.
> Please include this fix, and the other type attributes,
> in the original loongson.md patch.

Fixed.

> 
>> +   (set_attr "mode" "<MODE>")])

"mode" set to "DI".

>> +
>> +;; Initialization of a vector.
>> +
>> +(define_expand "vec_init<mode>"
>> +  [(set (match_operand:VWHB 0 "register_operand" "=f")
>> +	(match_operand 1 "" ""))]
>> +  "HAVE_LOONGSON_VECTOR_MODES"
>> +  {
>> +    mips_expand_vector_init (operands[0], operands[1]);
>> +    DONE;
>> +  }
>> +)
> 
> Expanders shouldn't have constraints.  Inconsistent formatting wrt
> previous patterns (which followed local style):
> 
> (define_expand "vec_init<mode>"
>   [(set (match_operand:VWHB 0 "register_operand")
> 	(match_operand 1))]
>   "HAVE_LOONGSON_VECTOR_MODES"
> {
>   mips_expand_vector_init (operands[0], operands[1]);
>   DONE;
> })

Fixed.

> 
>> +;; Instruction patterns for SIMD instructions.
>> +
>> +;; Pack with signed saturation.
>> +(define_insn "vec_pack_ssat_<mode>"
>> +  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
>> +        (vec_concat:<V_squash_double>
>> +	  (ss_truncate:<V_squash> (match_operand:VWH 1 "register_operand" "f"))
>> +          (ss_truncate:<V_squash> (match_operand:VWH 2 "register_operand" "f")))
>> +	
>> +   )]
>> +  "HAVE_LOONGSON_VECTOR_MODES"
>> +  "packss<V_squash_double_suffix>\t%0,%1,%2"
>> +)
> 
> Inconsistent indentation (tabs vs. spaces by the looks of things).
> Inconsistent position for closing ")" (which you fixed in patch 4).
> 
> In general, local style is to put ")" and "]" on the same line as the
> thing they're closing, even if it means breaking a line.  So:
> 
> (define_insn "vec_pack_ssat_<mode>"
>   [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
> 	(vec_concat:<V_squash_double>
> 	  (ss_truncate:<V_squash>
> 	    (match_operand:VWH 1 "register_operand" "f"))
> 	  (ss_truncate:<V_squash>
> 	    (match_operand:VWH 2 "register_operand" "f"))))]
>   "HAVE_LOONGSON_VECTOR_MODES"
>   "packss<V_squash_double_suffix>\t%0,%1,%2")
> 
> Other instances.

Fixed.

> 
>> @@ -494,7 +516,10 @@
>>  
>>  ;; 64-bit modes for which we provide move patterns.
>>  (define_mode_iterator MOVE64
>> -  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
>> +  [(DI "!TARGET_64BIT") (DF "!TARGET_64BIT")
>> +   (V2SF "!TARGET_64BIT && TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
>> +   (V2SI "HAVE_LOONGSON_VECTOR_MODES") (V4HI "HAVE_LOONGSON_VECTOR_MODES")
>> +   (V8QI "HAVE_LOONGSON_VECTOR_MODES")])
> 
> Since we need more than one line, put V2SF and each new entry on its
> own line.  The changes to the existing modes aren't right; they aren't
> consistent with the comment.

Fixed.  Turned out the changes to existing modes weren't necessary at all.

> 
>> Index: gcc/config/mips/mips.c
>> ===================================================================
>> --- gcc/config/mips/mips.c	(revision 62)
>> +++ gcc/config/mips/mips.c	(working copy)
>> @@ -3518,6 +3518,23 @@ mips_output_move (rtx dest, rtx src)
>>    if (dbl_p && mips_split_64bit_move_p (dest, src))
>>      return "#";
>>  
>> +  /* Handle cases where the source is a constant zero vector on
>> +     Loongson targets.  */
>> +  if (HAVE_LOONGSON_VECTOR_MODES && src_code == CONST_VECTOR)
>> +    {
>> +      if (dest_code == REG)
>> +	{
>> +	  /* Move constant zero vector to floating-point register.  */
>> +	  gcc_assert (FP_REG_P (REGNO (dest)));
>> +	  return "dmtc1\t$0,%0";
>> +	}
>> +      else if (dest_code == MEM)
>> +	/* Move constant zero vector to memory.  */
>> +	return "sd\t$0,%0";
>> +      else
>> +	gcc_unreachable ();
>> +    }
>> +
> 
> Why doesn't the normal zero handling work?

Don't know.  I removed this piece and everything worked.

> 
>> +/* Initialize vector TARGET to VALS.  */
>> +
>> +void
>> +mips_expand_vector_init (rtx target, rtx vals)
>> +{
>> +  enum machine_mode mode = GET_MODE (target);
>> +  enum machine_mode inner = GET_MODE_INNER (mode);
>> +  unsigned int i, n_elts = GET_MODE_NUNITS (mode);
>> +  rtx mem;
>> +
>> +  gcc_assert (VECTOR_MODE_P (mode));
>> +
>> +  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
>> +  for (i = 0; i < n_elts; i++)
>> +    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
>> +                    XVECEXP (vals, 0, i));
>> +
>> +  emit_move_insn (target, mem);
>> +}
> 
> Please keep initialisation and code separate.

Fixed.

> 
> Do we really want to create a new stack slot for every initialisation?
> It seems on the face of it that some sort of reuse would be nice.

I didn't address this last issue.  What I don't understand is how we can 
reuse stack slots given that the accesses to different variables can 
easily step on each other's toes.


--
Maxim


[-- Attachment #2: fsf-ls2ef-2-vector.ChangeLog --]
[-- Type: text/plain, Size: 1738 bytes --]

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>
	    Nathan Sidwell  <nathan@codesourcery.com>
	    Maxim Kuvyrkov  <maxim@codesourcery.com>
	
	* config/mips/mips-modes.def: Add V8QI, V4HI and V2SI modes.
	* config/mips/mips-protos.h (mips_expand_vector_init): New.
	* config/mips/mips-ftypes.def: Add function types for Loongson-2E/2F
	builtins.
	* config/mips/mips.c (mips_split_doubleword_move): Handle new modes.
	(mips_hard_regno_mode_ok_p): Allow 64-bit vector modes for Loongson.
	(mips_vector_mode_supported_p): Add V2SImode, V4HImode and
	V8QImode cases.
	(LOONGSON_BUILTIN): New.
	(mips_loongson_2ef_bdesc): New.
	(mips_bdesc_arrays): Add mips_loongson_2ef_bdesc.
	(mips_builtin_vector_type): Handle unsigned versions of vector modes.
	Add new parameter for that.
	(MIPS_ATYPE_UQI, MIPS_ATYPE_UDI, MIPS_ATYPE_V2SI, MIPS_ATYPE_UV2SI)
	(MIPS_ATYPE_V4HI, MIPS_ATYPE_UV4HI, MIPS_ATYPE_V8QI, MIPS_ATYPE_UV8QI):
	New.
	(mips_init_builtins): Initialize Loongson builtins if
	appropriate.
	(mips_expand_vector_init): New.
	* config/mips/mips.h (HAVE_LOONGSON_VECTOR_MODES): New.
	(TARGET_CPU_CPP_BUILTINS): Define __mips_loongson_vector_rev
	if appropriate.
	* config/mips/mips.md: Add unspec numbers for Loongson
	builtins.  Include loongson.md.
	(MOVE64): Include Loongson vector modes.
	(SPLITF): Include Loongson vector modes.
	(HALFMODE): Handle Loongson vector modes.
	* config/mips/loongson.md: New.
	* config/mips/loongson.h: New.
	* config.gcc: Add loongson.h header for mips*-*-* targets.
	* doc/extend.texi (MIPS Loongson Built-in Functions): New.

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>

	* lib/target-supports.exp (check_effective_target_mips_loongson): New.
	* gcc.target/mips/loongson-simd.c: New.

[-- Attachment #3: fsf-ls2ef-2-vector.patch --]
[-- Type: text/plain, Size: 119638 bytes --]

--- config/mips/mips.md	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips.md	(/local/gcc-2/gcc)	(revision 373)
@@ -213,6 +213,28 @@
    (UNSPEC_DPAQX_SA_W_PH	446)
    (UNSPEC_DPSQX_S_W_PH		447)
    (UNSPEC_DPSQX_SA_W_PH	448)
+
+   ;; ST Microelectronics Loongson-2E/2F.
+   (UNSPEC_LOONGSON_AVERAGE		500)
+   (UNSPEC_LOONGSON_EQ			501)
+   (UNSPEC_LOONGSON_GT			502)
+   (UNSPEC_LOONGSON_EXTRACT_HALFWORD	503)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_0	504)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_1	505)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_2	506)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_3	507)
+   (UNSPEC_LOONGSON_MULT_ADD		508)
+   (UNSPEC_LOONGSON_MOVE_BYTE_MASK	509)
+   (UNSPEC_LOONGSON_UMUL_HIGHPART	510)
+   (UNSPEC_LOONGSON_SMUL_HIGHPART	511)
+   (UNSPEC_LOONGSON_SMUL_LOWPART	512)
+   (UNSPEC_LOONGSON_UMUL_WORD		513)
+   (UNSPEC_LOONGSON_PASUBUB             514)
+   (UNSPEC_LOONGSON_BIADD		515)
+   (UNSPEC_LOONGSON_PSADBH		516)
+   (UNSPEC_LOONGSON_PSHUFH		517)
+   (UNSPEC_LOONGSON_UNPACK_HIGH		518)
+   (UNSPEC_LOONGSON_UNPACK_LOW		519)
   ]
 )
 
@@ -494,7 +516,11 @@
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
-  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
+  [DI DF
+   (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "HAVE_LOONGSON_VECTOR_MODES")
+   (V4HI "HAVE_LOONGSON_VECTOR_MODES")
+   (V8QI "HAVE_LOONGSON_VECTOR_MODES")])
 
 ;; 128-bit modes for which we provide move patterns on 64-bit targets.
 (define_mode_iterator MOVE128 [TF])
@@ -521,6 +547,9 @@
   [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (V2SF "!TARGET_64BIT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
+   (V4HI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
+   (V8QI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
    (TF "TARGET_64BIT && TARGET_FLOAT64")])
 
 ;; In GPR templates, a string like "<d>subu" will expand to "subu" in the
@@ -573,7 +602,9 @@
 
 ;; This attribute gives the integer mode that has half the size of
 ;; the controlling mode.
-(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") (TF "DI")])
+(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI")
+                            (V2SI "SI") (V4HI "SI") (V8QI "SI")
+			    (TF "DI")])
 
 ;; This attribute works around the early SB-1 rev2 core "F2" erratum:
 ;;
@@ -6406,3 +6437,6 @@
 
 ; MIPS fixed-point instructions.
 (include "mips-fixed.md")
+
+; ST Microelectronics Loongson-2E/2F-specific patterns.
+(include "loongson.md")
--- config/mips/mips.c	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips.c	(/local/gcc-2/gcc)	(revision 373)
@@ -3531,6 +3531,12 @@ mips_split_doubleword_move (rtx dest, rt
 	emit_insn (gen_move_doubleword_fprdf (dest, src));
       else if (!TARGET_64BIT && GET_MODE (dest) == V2SFmode)
 	emit_insn (gen_move_doubleword_fprv2sf (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V2SImode)
+	emit_insn (gen_move_doubleword_fprv2si (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V4HImode)
+	emit_insn (gen_move_doubleword_fprv4hi (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V8QImode)
+	emit_insn (gen_move_doubleword_fprv8qi (dest, src));
       else if (TARGET_64BIT && GET_MODE (dest) == TFmode)
 	emit_insn (gen_move_doubleword_fprtf (dest, src));
       else
@@ -8922,6 +8928,14 @@ mips_hard_regno_mode_ok_p (unsigned int 
       if (mode == TFmode && ISA_HAS_8CC)
 	return true;
 
+      /* Allow 64-bit vector modes for Loongson-2E/2F.  */
+      if (HAVE_LOONGSON_VECTOR_MODES
+	  && (mode == V2SImode
+	      || mode == V4HImode
+	      || mode == V8QImode
+	      || mode == DImode))
+	return true;
+
       if (class == MODE_FLOAT
 	  || class == MODE_COMPLEX_FLOAT
 	  || class == MODE_VECTOR_FLOAT)
@@ -9268,6 +9282,11 @@ mips_vector_mode_supported_p (enum machi
     case V4UQQmode:
       return TARGET_DSP;
 
+    case V2SImode:
+    case V4HImode:
+    case V8QImode:
+      return HAVE_LOONGSON_VECTOR_MODES;
+
     default:
       return false;
     }
@@ -10388,6 +10407,213 @@ static const struct mips_builtin_descrip
   DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2)
 };
 
+/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<FN_NAME>
+   for instruction CODE_FOR_<INSN>.  FUNCTION_TYPE is a builtin_description
+   field; the target flags are always zero.  */
+#define LOONGSON_BUILTIN(FN_NAME, INSN, FUNCTION_TYPE)		\
+  { CODE_FOR_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
+    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, 0 }
+
+/* Builtin functions for ST Microelectronics Loongson-2E/2F cores.  */
+static const struct mips_builtin_description mips_loongson_2ef_bdesc [] =
+{
+  /* Pack with signed saturation.  */
+  LOONGSON_BUILTIN (packsswh, vec_pack_ssat_v2si,
+                    MIPS_V4HI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (packsshb, vec_pack_ssat_v4hi,
+                    MIPS_V8QI_FTYPE_V4HI_V4HI),
+  /* Pack with unsigned saturation.  */
+  LOONGSON_BUILTIN (packushb, vec_pack_usat_v4hi,
+                    MIPS_UV8QI_FTYPE_UV4HI_UV4HI),
+  /* Vector addition, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (paddw_u, addv2si3, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (paddh_u, addv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddb_u, addv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (paddw_s, addv2si3, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (paddh_s, addv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddb_s, addv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Addition of doubleword integers, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (paddd_u, paddd, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (paddd_s, paddd, MIPS_DI_FTYPE_DI_DI),
+  /* Vector addition, treating overflow by signed saturation.  */
+  LOONGSON_BUILTIN (paddsh, ssaddv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddsb, ssaddv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Vector addition, treating overflow by unsigned saturation.  */
+  LOONGSON_BUILTIN (paddush, usaddv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddusb, usaddv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Logical AND NOT.  */
+  LOONGSON_BUILTIN (pandn_ud, loongson_and_not_di, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (pandn_uw, loongson_and_not_v2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pandn_uh, loongson_and_not_v4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pandn_ub, loongson_and_not_v8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pandn_sd, loongson_and_not_di, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (pandn_sw, loongson_and_not_v2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pandn_sh, loongson_and_not_v4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pandn_sb, loongson_and_not_v8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Average.  */
+  LOONGSON_BUILTIN (pavgh, loongson_average_v4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pavgb, loongson_average_v8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Equality test.  */
+  LOONGSON_BUILTIN (pcmpeqw_u, loongson_eq_v2si, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pcmpeqh_u, loongson_eq_v4hi, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pcmpeqb_u, loongson_eq_v8qi, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pcmpeqw_s, loongson_eq_v2si, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pcmpeqh_s, loongson_eq_v4hi, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pcmpeqb_s, loongson_eq_v8qi, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Greater-than test.  */
+  LOONGSON_BUILTIN (pcmpgtw_u, loongson_gt_v2si, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pcmpgth_u, loongson_gt_v4hi, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pcmpgtb_u, loongson_gt_v8qi, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pcmpgtw_s, loongson_gt_v2si, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pcmpgth_s, loongson_gt_v4hi, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pcmpgtb_s, loongson_gt_v8qi, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Extract halfword.  */
+  LOONGSON_BUILTIN (pextrh_u, loongson_extract_halfword,
+  		    MIPS_UV4HI_FTYPE_UV4HI_USI),
+  LOONGSON_BUILTIN (pextrh_s, loongson_extract_halfword,
+  		    MIPS_V4HI_FTYPE_V4HI_USI),
+  /* Insert halfword.  */
+  LOONGSON_BUILTIN (pinsrh_0_u, loongson_insert_halfword_0,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_1_u, loongson_insert_halfword_1,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_2_u, loongson_insert_halfword_2,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_3_u, loongson_insert_halfword_3,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_0_s, loongson_insert_halfword_0,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_1_s, loongson_insert_halfword_1,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_2_s, loongson_insert_halfword_2,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_3_s, loongson_insert_halfword_3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply and add.  */
+  LOONGSON_BUILTIN (pmaddhw, loongson_mult_add,
+  		    MIPS_V2SI_FTYPE_V4HI_V4HI),
+  /* Maximum of signed halfwords.  */
+  LOONGSON_BUILTIN (pmaxsh, smaxv4hi3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Maximum of unsigned bytes.  */
+  LOONGSON_BUILTIN (pmaxub, umaxv8qi3,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Minimum of signed halfwords.  */
+  LOONGSON_BUILTIN (pminsh, sminv4hi3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Minimum of unsigned bytes.  */
+  LOONGSON_BUILTIN (pminub, uminv8qi3,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Move byte mask.  */
+  LOONGSON_BUILTIN (pmovmskb_u, loongson_move_byte_mask,
+  		    MIPS_UV8QI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN (pmovmskb_s, loongson_move_byte_mask,
+  		    MIPS_V8QI_FTYPE_V8QI),
+  /* Multiply unsigned integers and store high result.  */
+  LOONGSON_BUILTIN (pmulhuh, umulv4hi3_highpart,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  /* Multiply signed integers and store high result.  */
+  LOONGSON_BUILTIN (pmulhh, smulv4hi3_highpart,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply signed integers and store low result.  */
+  LOONGSON_BUILTIN (pmullh, loongson_smul_lowpart,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply unsigned word integers.  */
+  LOONGSON_BUILTIN (pmuluw, loongson_umul_word,
+  		    MIPS_UDI_FTYPE_UV2SI_UV2SI),
+  /* Absolute difference.  */
+  LOONGSON_BUILTIN (pasubub, loongson_pasubub,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Sum of unsigned byte integers.  */
+  LOONGSON_BUILTIN (biadd, reduc_uplus_v8qi,
+		    MIPS_UV4HI_FTYPE_UV8QI),
+  /* Sum of absolute differences.  */
+  LOONGSON_BUILTIN (psadbh, loongson_psadbh,
+  		    MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
+  /* Shuffle halfwords.  */
+  LOONGSON_BUILTIN (pshufh_u, loongson_pshufh,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
+  LOONGSON_BUILTIN (pshufh_s, loongson_pshufh,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
+  /* Shift left logical.  */
+  LOONGSON_BUILTIN (psllh_u, loongson_psllv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psllh_s, loongson_psllv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psllw_u, loongson_psllv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psllw_s, loongson_psllv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Shift right arithmetic.  */
+  LOONGSON_BUILTIN (psrah_u, loongson_psrav4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psrah_s, loongson_psrav4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psraw_u, loongson_psrav2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psraw_s, loongson_psrav2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Shift right logical.  */
+  LOONGSON_BUILTIN (psrlh_u, loongson_psrlv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psrlh_s, loongson_psrlv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psrlw_u, loongson_psrlv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psrlw_s, loongson_psrlv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Vector subtraction, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (psubw_u, subv2si3, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (psubh_u, subv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubb_u, subv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (psubw_s, subv2si3, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (psubh_s, subv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubb_s, subv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Subtraction of doubleword integers, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (psubd_u, psubd, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (psubd_s, psubd, MIPS_DI_FTYPE_DI_DI),
+  /* Vector subtraction, treating overflow by signed saturation.  */
+  LOONGSON_BUILTIN (psubsh, sssubv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubsb, sssubv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Vector subtraction, treating overflow by unsigned saturation.  */
+  LOONGSON_BUILTIN (psubush, ussubv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubusb, ussubv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Unpack high data.  */
+  LOONGSON_BUILTIN (punpckhbh_u, vec_interleave_highv8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (punpckhhw_u, vec_interleave_highv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (punpckhwd_u, vec_interleave_highv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (punpckhbh_s, vec_interleave_highv8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (punpckhhw_s, vec_interleave_highv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (punpckhwd_s, vec_interleave_highv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI),
+  /* Unpack low data.  */
+  LOONGSON_BUILTIN (punpcklbh_u, vec_interleave_lowv8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (punpcklhw_u, vec_interleave_lowv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (punpcklwd_u, vec_interleave_lowv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (punpcklbh_s, vec_interleave_lowv8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (punpcklhw_s, vec_interleave_lowv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (punpcklwd_s, vec_interleave_lowv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI)
+};
+
 /* This structure describes an array of mips_builtin_description entries.  */
 struct mips_bdesc_map {
   /* The array that this entry describes.  */
@@ -10411,20 +10637,29 @@ static const struct mips_bdesc_map mips_
   { mips_sb1_bdesc, ARRAY_SIZE (mips_sb1_bdesc), PROCESSOR_SB1, 0 },
   { mips_dsp_bdesc, ARRAY_SIZE (mips_dsp_bdesc), PROCESSOR_MAX, 0 },
   { mips_dsp_32only_bdesc, ARRAY_SIZE (mips_dsp_32only_bdesc),
-    PROCESSOR_MAX, MASK_64BIT }
+    PROCESSOR_MAX, MASK_64BIT },
+  { mips_loongson_2ef_bdesc, ARRAY_SIZE (mips_loongson_2ef_bdesc),
+    PROCESSOR_MAX, 0 }
 };
 
-/* MODE is a vector mode whose elements have type TYPE.  Return the type
-   of the vector itself.  */
+/* MODE is a vector mode whose elements have type TYPE.
+   TYPE is signed or unsigned depending on UNSIGNED_P.
+   Return the type of the vector itself.  */
 
 static tree
-mips_builtin_vector_type (tree type, enum machine_mode mode)
+mips_builtin_vector_type (tree type, enum machine_mode mode, bool unsigned_p)
 {
-  static tree types[(int) MAX_MACHINE_MODE];
+  static tree types[2 * (int) MAX_MACHINE_MODE];
+  int mode_index;
+
+  mode_index = (int) mode;
 
-  if (types[(int) mode] == NULL_TREE)
-    types[(int) mode] = build_vector_type_for_mode (type, mode);
-  return types[(int) mode];
+  if (unsigned_p)
+    mode_index += MAX_MACHINE_MODE;
+
+  if (types[mode_index] == NULL_TREE)
+    types[mode_index] = build_vector_type_for_mode (type, mode);
+  return types[mode_index];
 }
 
 /* Source-level argument types.  */
@@ -10433,16 +10668,33 @@ mips_builtin_vector_type (tree type, enu
 #define MIPS_ATYPE_POINTER ptr_type_node
 
 /* Standard mode-based argument types.  */
+#define MIPS_ATYPE_UQI unsigned_intQI_type_node
 #define MIPS_ATYPE_SI intSI_type_node
 #define MIPS_ATYPE_USI unsigned_intSI_type_node
 #define MIPS_ATYPE_DI intDI_type_node
+#define MIPS_ATYPE_UDI unsigned_intDI_type_node
 #define MIPS_ATYPE_SF float_type_node
 #define MIPS_ATYPE_DF double_type_node
 
 /* Vector argument types.  */
-#define MIPS_ATYPE_V2SF mips_builtin_vector_type (float_type_node, V2SFmode)
-#define MIPS_ATYPE_V2HI mips_builtin_vector_type (intHI_type_node, V2HImode)
-#define MIPS_ATYPE_V4QI mips_builtin_vector_type (intQI_type_node, V4QImode)
+#define MIPS_ATYPE_V2SF						\
+  mips_builtin_vector_type (float_type_node, V2SFmode, false)
+#define MIPS_ATYPE_V2HI						\
+  mips_builtin_vector_type (intHI_type_node, V2HImode, false)
+#define MIPS_ATYPE_V2SI						\
+  mips_builtin_vector_type (intSI_type_node, V2SImode, false)
+#define MIPS_ATYPE_V4QI						\
+  mips_builtin_vector_type (intQI_type_node, V4QImode, false)
+#define MIPS_ATYPE_V4HI						\
+  mips_builtin_vector_type (intHI_type_node, V4HImode, false)
+#define MIPS_ATYPE_V8QI						\
+  mips_builtin_vector_type (intQI_type_node, V8QImode, false)
+#define MIPS_ATYPE_UV2SI						\
+  mips_builtin_vector_type (unsigned_intSI_type_node, V2SImode, true)
+#define MIPS_ATYPE_UV4HI						\
+  mips_builtin_vector_type (unsigned_intHI_type_node, V4HImode, true)
+#define MIPS_ATYPE_UV8QI						\
+  mips_builtin_vector_type (unsigned_intQI_type_node, V8QImode, true)
 
 /* MIPS_FTYPE_ATYPESN takes N MIPS_FTYPES-like type codes and lists
    their associated MIPS_ATYPEs.  */
@@ -10500,10 +10752,14 @@ mips_init_builtins (void)
        m < &mips_bdesc_arrays[ARRAY_SIZE (mips_bdesc_arrays)];
        m++)
     {
+      bool loongson_p = (m->bdesc == mips_loongson_2ef_bdesc);
+
       if ((m->proc == PROCESSOR_MAX || m->proc == mips_arch)
-	  && (m->unsupported_target_flags & target_flags) == 0)
+ 	  && (m->unsupported_target_flags & target_flags) == 0
+ 	  && (!loongson_p || HAVE_LOONGSON_VECTOR_MODES))
 	for (d = m->bdesc; d < &m->bdesc[m->size]; d++)
-	  if ((d->target_flags & target_flags) == d->target_flags)
+ 	  if (((d->target_flags & target_flags) == d->target_flags)
+ 	      || loongson_p)
 	    add_builtin_function (d->name,
 				  mips_build_function_type (d->function_type),
 				  d - m->bdesc + offset,
@@ -12603,6 +12859,30 @@ mips_order_regs_for_local_alloc (void)
       reg_alloc_order[24] = 0;
     }
 }
+
+/* Initialize vector TARGET to VALS.  */
+
+void
+mips_expand_vector_init (rtx target, rtx vals)
+{
+  enum machine_mode mode;
+  enum machine_mode inner;
+  unsigned int i, n_elts;
+  rtx mem;
+
+  mode = GET_MODE (target);
+  inner = GET_MODE_INNER (mode);
+  n_elts = GET_MODE_NUNITS (mode);
+
+  gcc_assert (VECTOR_MODE_P (mode));
+
+  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
+  for (i = 0; i < n_elts; i++)
+    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
+                    XVECEXP (vals, 0, i));
+
+  emit_move_insn (target, mem);
+}
 \f
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
--- doc/extend.texi	(/local/gcc-trunk/gcc)	(revision 373)
+++ doc/extend.texi	(/local/gcc-2/gcc)	(revision 373)
@@ -6788,6 +6788,7 @@ instructions, but allow the compiler to 
 * X86 Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
+* MIPS Loongson Built-in Functions::
 * PowerPC AltiVec Built-in Functions::
 * SPARC VIS Built-in Functions::
 * SPU Built-in Functions::
@@ -8667,6 +8668,150 @@ value is the upper one.  The opposite or
 For example, the code above will set the lower half of @code{a} to
 @code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
 
+@node MIPS Loongson Built-in Functions
+@subsection MIPS Loongson Built-in Functions
+
+GCC provides intrinsics to access the SIMD instructions provided by the
+ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
+available after inclusion of the @code{loongson.h} header file,
+operate on the following 64-bit vector types:
+
+@itemize
+@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
+@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
+@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
+@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
+@item @code{int16x4_t}, a vector of four signed 16-bit integers;
+@item @code{int32x2_t}, a vector of two signed 32-bit integers.
+@end itemize
+
+The intrinsics provided are listed below; each is named after the
+machine instruction to which it corresponds, with suffixes added as
+appropriate to distinguish intrinsics that expand to the same machine
+instruction yet have different argument types.  Refer to the architecture
+documentation for a description of the functionality of each
+instruction.
+
+@smallexample
+int16x4_t packsswh (int32x2_t s, int32x2_t t);
+int8x8_t packsshb (int16x4_t s, int16x4_t t);
+uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
+uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t paddw_s (int32x2_t s, int32x2_t t);
+int16x4_t paddh_s (int16x4_t s, int16x4_t t);
+int8x8_t paddb_s (int8x8_t s, int8x8_t t);
+uint64_t paddd_u (uint64_t s, uint64_t t);
+int64_t paddd_s (int64_t s, int64_t t);
+int16x4_t paddsh (int16x4_t s, int16x4_t t);
+int8x8_t paddsb (int8x8_t s, int8x8_t t);
+uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
+uint64_t pandn_ud (uint64_t s, uint64_t t);
+uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
+uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
+int64_t pandn_sd (int64_t s, int64_t t);
+int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
+int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
+int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
+uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
+uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
+uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
+uint16x4_t pextrh_u (uint16x4_t s, int field);
+int16x4_t pextrh_s (int16x4_t s, int field);
+uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
+int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
+int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
+int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
+uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
+int16x4_t pminsh (int16x4_t s, int16x4_t t);
+uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
+uint8x8_t pmovmskb_u (uint8x8_t s);
+int8x8_t pmovmskb_s (int8x8_t s);
+uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
+int16x4_t pmulhh (int16x4_t s, int16x4_t t);
+int16x4_t pmullh (int16x4_t s, int16x4_t t);
+uint64_t pmuluw (uint32x2_t s, uint32x2_t t);
+uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
+uint16x4_t biadd (uint8x8_t s);
+uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
+uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
+int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
+uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psllh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psllw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrah_s (int16x4_t s, uint8_t amount);
+uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psraw_s (int32x2_t s, uint8_t amount);
+uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t psubw_s (int32x2_t s, int32x2_t t);
+int16x4_t psubh_s (int16x4_t s, int16x4_t t);
+int8x8_t psubb_s (int8x8_t s, int8x8_t t);
+uint64_t psubd_u (uint64_t s, uint64_t t);
+int64_t psubd_s (int64_t s, int64_t t);
+int16x4_t psubsh (int16x4_t s, int16x4_t t);
+int8x8_t psubsb (int8x8_t s, int8x8_t t);
+uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
+uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
+uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
+@end smallexample
+
+Also provided are helper functions for loading and storing values of the
+above 64-bit vector types to and from memory:
+
+@smallexample
+uint32x2_t vec_load_uw (uint32x2_t *src);
+uint16x4_t vec_load_uh (uint16x4_t *src);
+uint8x8_t vec_load_ub (uint8x8_t *src);
+int32x2_t vec_load_sw (int32x2_t *src);
+int16x4_t vec_load_sh (int16x4_t *src);
+int8x8_t vec_load_sb (int8x8_t *src);
+void vec_store_uw (uint32x2_t v, uint32x2_t *dest);
+void vec_store_uh (uint16x4_t v, uint16x4_t *dest);
+void vec_store_ub (uint8x8_t v, uint8x8_t *dest);
+void vec_store_sw (int32x2_t v, int32x2_t *dest);
+void vec_store_sh (int16x4_t v, int16x4_t *dest);
+void vec_store_sb (int8x8_t v, int8x8_t *dest);
+@end smallexample
+
 @menu
 * Paired-Single Arithmetic::
 * Paired-Single Built-in Functions::
--- testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-trunk/gcc)	(revision 373)
+++ testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-2/gcc)	(revision 373)
@@ -0,0 +1,2380 @@
+/* Test cases for ST Microelectronics Loongson-2E/2F SIMD intrinsics.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target mips_loongson } */
+
+#include "loongson.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+#include <limits.h>
+
+typedef union { int32x2_t v; int32_t a[2]; } int32x2_encap_t;
+typedef union { int16x4_t v; int16_t a[4]; } int16x4_encap_t;
+typedef union { int8x8_t v; int8_t a[8]; } int8x8_encap_t;
+typedef union { uint32x2_t v; uint32_t a[2]; } uint32x2_encap_t;
+typedef union { uint16x4_t v; uint16_t a[4]; } uint16x4_encap_t;
+typedef union { uint8x8_t v; uint8_t a[8]; } uint8x8_encap_t;
+
+#define UINT16x4_MAX USHRT_MAX
+#define UINT8x8_MAX UCHAR_MAX
+#define INT8x8_MAX SCHAR_MAX
+#define INT16x4_MAX SHRT_MAX
+#define INT32x2_MAX INT_MAX
+
+static void test_packsswh (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = INT16x4_MAX - 2;
+  s.a[1] = INT16x4_MAX - 1;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX + 1;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = packsswh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 2);
+  assert (r.a[1] == INT16x4_MAX - 1);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_packsshb (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = INT8x8_MAX - 6;
+  s.a[1] = INT8x8_MAX - 5;
+  s.a[2] = INT8x8_MAX - 4;
+  s.a[3] = INT8x8_MAX - 3;
+  t.a[0] = INT8x8_MAX - 2;
+  t.a[1] = INT8x8_MAX - 1;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX + 1;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = packsshb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_packushb (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = UINT8x8_MAX - 6;
+  s.a[1] = UINT8x8_MAX - 5;
+  s.a[2] = UINT8x8_MAX - 4;
+  s.a[3] = UINT8x8_MAX - 3;
+  t.a[0] = UINT8x8_MAX - 2;
+  t.a[1] = UINT8x8_MAX - 1;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX + 1;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = packushb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == UINT8x8_MAX - 6);
+  assert (r.a[1] == UINT8x8_MAX - 5);
+  assert (r.a[2] == UINT8x8_MAX - 4);
+  assert (r.a[3] == UINT8x8_MAX - 3);
+  assert (r.a[4] == UINT8x8_MAX - 2);
+  assert (r.a[5] == UINT8x8_MAX - 1);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_paddw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = paddw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 6);
+}
+
+static void test_paddw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = paddw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_paddh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = paddh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 6);
+  assert (r.a[1] == 8);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 12);
+}
+
+static void test_paddh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = paddh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+}
+
+static void test_paddb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 5;
+  s.a[5] = 6;
+  s.a[6] = 7;
+  s.a[7] = 8;
+  t.a[0] = 9;
+  t.a[1] = 10;
+  t.a[2] = 11;
+  t.a[3] = 12;
+  t.a[4] = 13;
+  t.a[5] = 14;
+  t.a[6] = 15;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = paddb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 12);
+  assert (r.a[2] == 14);
+  assert (r.a[3] == 16);
+  assert (r.a[4] == 18);
+  assert (r.a[5] == 20);
+  assert (r.a[6] == 22);
+  assert (r.a[7] == 24);
+}
+
+static void test_paddb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = paddb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+  assert (r.a[4] == -45);
+  assert (r.a[5] == -54);
+  assert (r.a[6] == -63);
+  assert (r.a[7] == -72);
+}
+
+static void test_paddd_u (void)
+{
+  uint64_t d = 123456;
+  uint64_t e = 789012;
+  uint64_t r;
+  r = paddd_u (d, e);
+  assert (r == 912468);
+}
+
+static void test_paddd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = paddd_s (d, e);
+  assert (r == -665556);
+}
+
+static void test_paddsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX;
+  t.a[2] = INT16x4_MAX;
+  t.a[3] = INT16x4_MAX;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = paddsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_paddsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = INT8x8_MAX;
+  t.a[1] = INT8x8_MAX;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX;
+  t.a[4] = INT8x8_MAX;
+  t.a[5] = INT8x8_MAX;
+  t.a[6] = INT8x8_MAX;
+  t.a[7] = INT8x8_MAX;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = paddsb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_paddush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  t.a[0] = UINT16x4_MAX;
+  t.a[1] = UINT16x4_MAX;
+  t.a[2] = UINT16x4_MAX;
+  t.a[3] = UINT16x4_MAX;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = paddush (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == UINT16x4_MAX);
+  assert (r.a[1] == UINT16x4_MAX);
+  assert (r.a[2] == UINT16x4_MAX);
+  assert (r.a[3] == UINT16x4_MAX);
+}
+
+static void test_paddusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  s.a[4] = 0;
+  s.a[5] = 1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = UINT8x8_MAX;
+  t.a[1] = UINT8x8_MAX;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX;
+  t.a[4] = UINT8x8_MAX;
+  t.a[5] = UINT8x8_MAX;
+  t.a[6] = UINT8x8_MAX;
+  t.a[7] = UINT8x8_MAX;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = paddusb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == UINT8x8_MAX);
+  assert (r.a[1] == UINT8x8_MAX);
+  assert (r.a[2] == UINT8x8_MAX);
+  assert (r.a[3] == UINT8x8_MAX);
+  assert (r.a[4] == UINT8x8_MAX);
+  assert (r.a[5] == UINT8x8_MAX);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_pandn_ud (void)
+{
+  uint64_t d1 = 0x0000ffff0000ffffull;
+  uint64_t d2 = 0x0000ffff0000ffffull;
+  uint64_t r;
+  r = pandn_ud (d1, d2);
+  assert (r == 0);
+}
+
+static void test_pandn_sd (void)
+{
+  int64_t d1 = (int64_t) 0x0000000000000000ull;
+  int64_t d2 = (int64_t) 0xfffffffffffffffeull;
+  int64_t r;
+  r = pandn_sd (d1, d2);
+  assert (r == -2);
+}
+
+static void test_pandn_uw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0x00000000;
+  t.a[1] = 0xffffffff;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pandn_uw (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pandn_sw (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0xffffffff;
+  t.a[1] = 0xfffffffe;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pandn_sw (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+}
+
+static void test_pandn_uh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0x0000;
+  t.a[1] = 0xffff;
+  t.a[2] = 0x0000;
+  t.a[3] = 0xffff;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pandn_uh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pandn_sh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0xffff;
+  t.a[1] = 0xfffe;
+  t.a[2] = 0xffff;
+  t.a[3] = 0xfffe;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pandn_sh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+}
+
+static void test_pandn_ub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0x00;
+  t.a[1] = 0xff;
+  t.a[2] = 0x00;
+  t.a[3] = 0xff;
+  t.a[4] = 0x00;
+  t.a[5] = 0xff;
+  t.a[6] = 0x00;
+  t.a[7] = 0xff;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pandn_ub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pandn_sb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0xff;
+  t.a[1] = 0xfe;
+  t.a[2] = 0xff;
+  t.a[3] = 0xfe;
+  t.a[4] = 0xff;
+  t.a[5] = 0xfe;
+  t.a[6] = 0xff;
+  t.a[7] = 0xfe;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pandn_sb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -2);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -2);
+}
+
+static void test_pavgh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pavgh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+}
+
+static void test_pavgb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 1;
+  s.a[5] = 2;
+  s.a[6] = 3;
+  s.a[7] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pavgb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+  assert (r.a[4] == 3);
+  assert (r.a[5] == 4);
+  assert (r.a[6] == 5);
+  assert (r.a[7] == 6);
+}
+
+static void test_pcmpeqw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pcmpeqw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpeqh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pcmpeqh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpeqb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 42;
+  s.a[5] = 43;
+  s.a[6] = 42;
+  s.a[7] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  t.a[4] = 43;
+  t.a[5] = 43;
+  t.a[6] = 43;
+  t.a[7] = 43;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pcmpeqb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpeqw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pcmpeqw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+}
+
+static void test_pcmpeqh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pcmpeqh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpeqb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = -42;
+  s.a[5] = -42;
+  s.a[6] = -42;
+  s.a[7] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = -42;
+  t.a[6] = 42;
+  t.a[7] = -42;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pcmpeqb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -1);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -1);
+}
+
+static void test_pcmpgtw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 42;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pcmpgtw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpgth_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 40;
+  t.a[1] = 41;
+  t.a[2] = 43;
+  t.a[3] = 42;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pcmpgth_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0x0000);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpgtb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 44;
+  s.a[5] = 45;
+  s.a[6] = 46;
+  s.a[7] = 47;
+  t.a[0] = 48;
+  t.a[1] = 47;
+  t.a[2] = 46;
+  t.a[3] = 45;
+  t.a[4] = 44;
+  t.a[5] = 43;
+  t.a[6] = 42;
+  t.a[7] = 41;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pcmpgtb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0x00);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0x00);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0xff);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpgtw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = -42;
+  t.a[0] = -42;
+  t.a[1] = -42;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pcmpgtw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 0);
+}
+
+static void test_pcmpgth_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = 43;
+  t.a[2] = 44;
+  t.a[3] = -43;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pcmpgth_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpgtb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = 42;
+  s.a[5] = 42;
+  s.a[6] = 42;
+  s.a[7] = 42;
+  t.a[0] = -45;
+  t.a[1] = -44;
+  t.a[2] = -43;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = 43;
+  t.a[6] = 41;
+  t.a[7] = 40;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pcmpgtb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == -1);
+  assert (r.a[7] == -1);
+}
+
+static void test_pextrh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s1 = vec_load_uh (&s.v);
+  r1 = pextrh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 41);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pextrh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -40;
+  s.a[1] = -41;
+  s.a[2] = -42;
+  s.a[3] = -43;
+  s1 = vec_load_sh (&s.v);
+  r1 = pextrh_s (s1, 2);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pinsrh_0123_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  s1 = vec_load_uh (&s.v);
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  t1 = vec_load_uh (&t.v);
+  r1 = pinsrh_0_u (t1, s1);
+  r1 = pinsrh_1_u (r1, s1);
+  r1 = pinsrh_2_u (r1, s1);
+  r1 = pinsrh_3_u (r1, s1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 42);
+  assert (r.a[1] == 42);
+  assert (r.a[2] == 42);
+  assert (r.a[3] == 42);
+}
+
+static void test_pinsrh_0123_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  t1 = vec_load_sh (&t.v);
+  r1 = pinsrh_0_s (t1, s1);
+  r1 = pinsrh_1_s (r1, s1);
+  r1 = pinsrh_2_s (r1, s1);
+  r1 = pinsrh_3_s (r1, s1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == -42);
+  assert (r.a[2] == -42);
+  assert (r.a[3] == -42);
+}
+
+static void test_pmaddhw (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -5;
+  s.a[1] = -4;
+  s.a[2] = -3;
+  s.a[3] = -2;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 10;
+  t.a[1] = 11;
+  t.a[2] = 12;
+  t.a[3] = 13;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmaddhw (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == (-5*10 + -4*11));
+  assert (r.a[1] == (-3*12 + -2*13));
+}
+
+static void test_pmaxsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmaxsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 20);
+  assert (r.a[1] == 40);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 50);
+}
+
+static void test_pmaxub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pmaxub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 80);
+  assert (r.a[1] == 70);
+  assert (r.a[2] == 60);
+  assert (r.a[3] == 50);
+  assert (r.a[4] == 50);
+  assert (r.a[5] == 60);
+  assert (r.a[6] == 70);
+  assert (r.a[7] == 80);
+}
+
+static void test_pminsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  t1 = vec_load_sh (&t.v);
+  r1 = pminsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -20);
+  assert (r.a[1] == -40);
+  assert (r.a[2] == -10);
+  assert (r.a[3] == -50);
+}
+
+static void test_pminub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pminub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 20);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 40);
+  assert (r.a[4] == 40);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 20);
+  assert (r.a[7] == 10);
+}
+
+static void test_pmovmskb_u (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_t s1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0xf0;
+  s.a[1] = 0x40;
+  s.a[2] = 0xf0;
+  s.a[3] = 0x40;
+  s.a[4] = 0xf0;
+  s.a[5] = 0x40;
+  s.a[6] = 0xf0;
+  s.a[7] = 0x40;
+  s1 = vec_load_ub (&s.v);
+  r1 = pmovmskb_u (s1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmovmskb_s (void)
+{
+  int8x8_encap_t s;
+  int8x8_t s1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 1;
+  s.a[2] = -1;
+  s.a[3] = 1;
+  s.a[4] = -1;
+  s.a[5] = 1;
+  s.a[6] = -1;
+  s.a[7] = 1;
+  s1 = vec_load_sb (&s.v);
+  r1 = pmovmskb_s (s1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmulhuh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xff00;
+  s.a[1] = 0xff00;
+  s.a[2] = 0xff00;
+  s.a[3] = 0xff00;
+  s1 = vec_load_uh (&s.v);
+  t.a[0] = 16;
+  t.a[1] = 16;
+  t.a[2] = 16;
+  t.a[3] = 16;
+  t1 = vec_load_uh (&t.v);
+  r1 = pmulhuh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x000f);
+  assert (r.a[1] == 0x000f);
+  assert (r.a[2] == 0x000f);
+  assert (r.a[3] == 0x000f);
+}
+
+static void test_pmulhh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmulhh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -16);
+  assert (r.a[1] == -16);
+  assert (r.a[2] == -16);
+  assert (r.a[3] == -16);
+}
+
+static void test_pmullh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmullh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 4096);
+  assert (r.a[1] == 4096);
+  assert (r.a[2] == 4096);
+  assert (r.a[3] == 4096);
+}
+
+static void test_pmuluw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint64_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xdeadbeef;
+  s.a[1] = 0;
+  t.a[0] = 0x0f00baaa;
+  t.a[1] = 0;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pmuluw (s1, t1);
+  assert (r1 == 0xd0cd08e1d1a70b6ull);
+}
+
+static void test_pasubub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pasubub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 70);
+  assert (r.a[1] == 50);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 10);
+  assert (r.a[4] == 10);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 50);
+  assert (r.a[7] == 70);
+}
+
+static void test_biadd (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  r1 = biadd (s1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 360);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psadbh (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = psadbh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0140);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pshufh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s1 = vec_load_uh (&s.v);
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r1 = vec_load_uh (&r.v);
+  r1 = pshufh_u (r1, s1, 0xe5);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_pshufh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 2;
+  s.a[2] = -3;
+  s.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r1 = vec_load_sh (&r.v);
+  r1 = pshufh_s (r1, s1, 0xe5);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+}
+
+static void test_psllh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0xffff;
+  s.a[2] = 0xffff;
+  s.a[3] = 0xffff;
+  s1 = vec_load_uh (&s.v);
+  r1 = psllh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0xfffe);
+  assert (r.a[1] == 0xfffe);
+  assert (r.a[2] == 0xfffe);
+  assert (r.a[3] == 0xfffe);
+}
+
+static void test_psllw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0xffffffff;
+  s1 = vec_load_uw (&s.v);
+  r1 = psllw_u (s1, 2);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0xfffffffc);
+  assert (r.a[1] == 0xfffffffc);
+}
+
+static void test_psllh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  s1 = vec_load_sh (&s.v);
+  r1 = psllh_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -2);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == -2);
+  assert (r.a[3] == -2);
+}
+
+static void test_psllw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s1 = vec_load_sw (&s.v);
+  r1 = psllw_s (s1, 2);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -4);
+  assert (r.a[1] == -4);
+}
+
+static void test_psrah_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  s1 = vec_load_uh (&s.v);
+  r1 = psrah_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0xfff7);
+  assert (r.a[1] == 0xfff7);
+  assert (r.a[2] == 0xfff7);
+  assert (r.a[3] == 0xfff7);
+}
+
+static void test_psraw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  s1 = vec_load_uw (&s.v);
+  r1 = psraw_u (s1, 1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0xfffffff7);
+  assert (r.a[1] == 0xfffffff7);
+}
+
+static void test_psrah_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s.a[2] = -2;
+  s.a[3] = -2;
+  s1 = vec_load_sh (&s.v);
+  r1 = psrah_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == -1);
+}
+
+static void test_psraw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s1 = vec_load_sw (&s.v);
+  r1 = psraw_s (s1, 1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+}
+
+static void test_psrlh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  s1 = vec_load_uh (&s.v);
+  r1 = psrlh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x7ff7);
+  assert (r.a[1] == 0x7ff7);
+  assert (r.a[2] == 0x7ff7);
+  assert (r.a[3] == 0x7ff7);
+}
+
+static void test_psrlw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  s1 = vec_load_uw (&s.v);
+  r1 = psrlw_u (s1, 1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x7ffffff7);
+  assert (r.a[1] == 0x7ffffff7);
+}
+
+static void test_psrlh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  s1 = vec_load_sh (&s.v);
+  r1 = psrlh_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psrlw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s1 = vec_load_sw (&s.v);
+  r1 = psrlw_s (s1, 1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == INT32x2_MAX);
+  assert (r.a[1] == INT32x2_MAX);
+}
+
+static void test_psubw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 3;
+  s.a[1] = 4;
+  t.a[0] = 2;
+  t.a[1] = 1;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = psubw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = -4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = psubw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 5;
+  s.a[1] = 6;
+  s.a[2] = 7;
+  s.a[3] = 8;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = psubh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 4);
+  assert (r.a[3] == 4);
+}
+
+static void test_psubh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = psubh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+}
+
+static void test_psubb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 11;
+  s.a[2] = 12;
+  s.a[3] = 13;
+  s.a[4] = 14;
+  s.a[5] = 15;
+  s.a[6] = 16;
+  s.a[7] = 17;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = psubb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 9);
+  assert (r.a[2] == 9);
+  assert (r.a[3] == 9);
+  assert (r.a[4] == 9);
+  assert (r.a[5] == 9);
+  assert (r.a[6] == 9);
+  assert (r.a[7] == 9);
+}
+
+static void test_psubb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = psubb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+  assert (r.a[4] == -55);
+  assert (r.a[5] == -66);
+  assert (r.a[6] == -77);
+  assert (r.a[7] == -88);
+}
+
+static void test_psubd_u (void)
+{
+  uint64_t d = 789012;
+  uint64_t e = 123456;
+  uint64_t r;
+  r = psubd_u (d, e);
+  assert (r == 665556);
+}
+
+static void test_psubd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = psubd_s (d, e);
+  assert (r == 912468);
+}
+
+static void test_psubsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = -INT16x4_MAX;
+  t.a[1] = -INT16x4_MAX;
+  t.a[2] = -INT16x4_MAX;
+  t.a[3] = -INT16x4_MAX;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = psubsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psubsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = -INT8x8_MAX;
+  t.a[1] = -INT8x8_MAX;
+  t.a[2] = -INT8x8_MAX;
+  t.a[3] = -INT8x8_MAX;
+  t.a[4] = -INT8x8_MAX;
+  t.a[5] = -INT8x8_MAX;
+  t.a[6] = -INT8x8_MAX;
+  t.a[7] = -INT8x8_MAX;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = psubsb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_psubush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = psubush (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psubusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  s.a[4] = 4;
+  s.a[5] = 5;
+  s.a[6] = 6;
+  s.a[7] = 7;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  t.a[4] = 5;
+  t.a[5] = 5;
+  t.a[6] = 7;
+  t.a[7] = 7;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = psubusb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_punpckhbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = punpckhbh_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == -11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == -13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == -15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = punpckhbh_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == 11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == 13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == 15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = punpckhhw_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == -6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = punpckhhw_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 5);
+  assert (r.a[1] == 6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = -4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = punpckhwd_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == -4);
+}
+
+static void test_punpckhwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = punpckhwd_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+}
+
+static void test_punpcklbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = punpcklbh_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == -5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == -7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = punpcklbh_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == 5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == 7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = punpcklhw_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = punpcklhw_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = punpcklwd_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == -2);
+}
+
+static void test_punpcklwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = punpcklwd_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+}
+
+int main (void)
+{
+  test_packsswh ();
+  test_packsshb ();
+  test_packushb ();
+  test_paddw_u ();
+  test_paddw_s ();
+  test_paddh_u ();
+  test_paddh_s ();
+  test_paddb_u ();
+  test_paddb_s ();
+  test_paddd_u ();
+  test_paddd_s ();
+  test_paddsh ();
+  test_paddsb ();
+  test_paddush ();
+  test_paddusb ();
+  test_pandn_ud ();
+  test_pandn_sd ();
+  test_pandn_uw ();
+  test_pandn_sw ();
+  test_pandn_uh ();
+  test_pandn_sh ();
+  test_pandn_ub ();
+  test_pandn_sb ();
+  test_pavgh ();
+  test_pavgb ();
+  test_pcmpeqw_u ();
+  test_pcmpeqh_u ();
+  test_pcmpeqb_u ();
+  test_pcmpeqw_s ();
+  test_pcmpeqh_s ();
+  test_pcmpeqb_s ();
+  test_pcmpgtw_u ();
+  test_pcmpgth_u ();
+  test_pcmpgtb_u ();
+  test_pcmpgtw_s ();
+  test_pcmpgth_s ();
+  test_pcmpgtb_s ();
+  test_pextrh_u ();
+  test_pextrh_s ();
+  test_pinsrh_0123_u ();
+  test_pinsrh_0123_s ();
+  test_pmaddhw ();
+  test_pmaxsh ();
+  test_pmaxub ();
+  test_pminsh ();
+  test_pminub ();
+  test_pmovmskb_u ();
+  test_pmovmskb_s ();
+  test_pmulhuh ();
+  test_pmulhh ();
+  test_pmullh ();
+  test_pmuluw ();
+  test_pasubub ();
+  test_biadd ();
+  test_psadbh ();
+  test_pshufh_u ();
+  test_pshufh_s ();
+  test_psllh_u ();
+  test_psllw_u ();
+  test_psllh_s ();
+  test_psllw_s ();
+  test_psrah_u ();
+  test_psraw_u ();
+  test_psrah_s ();
+  test_psraw_s ();
+  test_psrlh_u ();
+  test_psrlw_u ();
+  test_psrlh_s ();
+  test_psrlw_s ();
+  test_psubw_u ();
+  test_psubw_s ();
+  test_psubh_u ();
+  test_psubh_s ();
+  test_psubb_u ();
+  test_psubb_s ();
+  test_psubd_u ();
+  test_psubd_s ();
+  test_psubsh ();
+  test_psubsb ();
+  test_psubush ();
+  test_psubusb ();
+  test_punpckhbh_s ();
+  test_punpckhbh_u ();
+  test_punpckhhw_s ();
+  test_punpckhhw_u ();
+  test_punpckhwd_s ();
+  test_punpckhwd_u ();
+  test_punpcklbh_s ();
+  test_punpcklbh_u ();
+  test_punpcklhw_s ();
+  test_punpcklhw_u ();
+  test_punpcklwd_s ();
+  test_punpcklwd_u ();
+  return 0;
+}
--- testsuite/lib/target-supports.exp	(/local/gcc-trunk/gcc)	(revision 373)
+++ testsuite/lib/target-supports.exp	(/local/gcc-2/gcc)	(revision 373)
@@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw 
     } "-mfpu=neon -mfloat-abi=softfp"]
 }
 
+# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
+# the Loongson vector modes.
+
+proc check_effective_target_mips_loongson { } {
+    return [check_no_compiler_messages loongson assembly {
+	#if !defined(__mips_loongson_vector_rev)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {
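
A testcase that depends on these vector modes would then be expected to gate
itself on the new effective-target keyword in the usual DejaGnu way.  A
minimal sketch (the body is just a placeholder; a real test would exercise
the intrinsics the way the test above does):

  /* { dg-do run { target mips_loongson } } */
  #include <loongson.h>

  int
  main (void)
  {
    /* ... exercise the Loongson intrinsics here ... */
    return 0;
  }
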
--- config.gcc	(/local/gcc-trunk/gcc)	(revision 373)
+++ config.gcc	(/local/gcc-2/gcc)	(revision 373)
@@ -349,6 +349,7 @@ m68k-*-*)
 mips*-*-*)
 	cpu_type=mips
 	need_64bit_hwint=yes
+	extra_headers="loongson.h"
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
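
With loongson.h listed in extra_headers, the intrinsics header is installed
alongside the compiler's other target headers, so user code can include it
directly.  A minimal sketch, using the types and functions defined in
loongson.h further down (the wrapper name add4 is purely illustrative):

  #include <loongson.h>

  /* Element-wise addition of two vectors of unsigned halfwords.  When
     compiled with -march=loongson2e or -march=loongson2f, this should
     expand to a single paddh instruction.  */
  uint16x4_t
  add4 (uint16x4_t a, uint16x4_t b)
  {
    return paddh_u (a, b);
  }
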
--- config/mips/loongson.md	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/loongson.md	(/local/gcc-2/gcc)	(revision 373)
@@ -0,0 +1,429 @@
+;; Machine description for ST Microelectronics Loongson-2E/2F.
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Mode iterators and attributes.
+
+;; 64-bit vectors of bytes.
+(define_mode_iterator VB [V8QI])
+
+;; 64-bit vectors of halfwords.
+(define_mode_iterator VH [V4HI])
+
+;; 64-bit vectors of words.
+(define_mode_iterator VW [V2SI])
+
+;; 64-bit vectors of halfwords and bytes.
+(define_mode_iterator VHB [V4HI V8QI])
+
+;; 64-bit vectors of words and halfwords.
+(define_mode_iterator VWH [V2SI V4HI])
+
+;; 64-bit vectors of words, halfwords and bytes.
+(define_mode_iterator VWHB [V2SI V4HI V8QI])
+
+;; 64-bit vectors of words, halfwords and bytes; and DImode.
+(define_mode_iterator VWHBDI [V2SI V4HI V8QI DI])
+
+;; The Loongson instruction suffixes corresponding to the modes in the
+;; VWHB iterator.
+(define_mode_attr V_suffix [(V2SI "w") (V4HI "h") (V8QI "b")])
+
+;; Given a vector type T, the mode of a vector half the size of T
+;; and with the same number of elements.
+(define_mode_attr V_squash [(V2SI "V2HI") (V4HI "V4QI")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with half as many elements.
+(define_mode_attr V_stretch_half [(V2SI "DI") (V4HI "V2SI") (V8QI "V4HI")])
+
+;; The Loongson instruction suffixes corresponding to the transformation
+;; expressed by V_stretch_half.
+(define_mode_attr V_stretch_half_suffix [(V2SI "wd") (V4HI "hw") (V8QI "bh")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with twice as many elements.
+(define_mode_attr V_squash_double [(V2SI "V4HI") (V4HI "V8QI")])
+
+;; The Loongson instruction suffixes corresponding to the conversions
+;; specified by V_squash_double.
+(define_mode_attr V_squash_double_suffix [(V2SI "wh") (V4HI "hb")])
+
+;; Move patterns.
+
+;; Expander to legitimize moves involving values of vector modes.
+(define_expand "mov<mode>"
+  [(set (match_operand:VWHB 0)
+	(match_operand:VWHB 1))]
+  ""
+{
+  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
+    DONE;
+})
+
+;; Handle legitimized moves between values of vector modes.
+(define_insn "mov<mode>_internal"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,d,f,d,m, f")
+	(match_operand:VWHB 1 "move_operand"          "f,m,f,f,d,d,YG,YG"))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  { return mips_output_move (operands[0], operands[1]); }
+  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
+   (set_attr "mode" "DI")])
+
+;; Initialization of a vector.
+
+(define_expand "vec_init<mode>"
+  [(set (match_operand:VWHB 0 "register_operand")
+	(match_operand 1 ""))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+{
+  mips_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})
+
+;; Instruction patterns for SIMD instructions.
+
+;; Pack with signed saturation.
+(define_insn "vec_pack_ssat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 1 "register_operand" "f"))
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 2 "register_operand" "f"))))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packss<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Pack with unsigned saturation.
+(define_insn "vec_pack_usat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 1 "register_operand" "f"))
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 2 "register_operand" "f"))))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packus<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by wraparound.
+(define_insn "add<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (plus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		   (match_operand:VWHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "padd<V_suffix>\t%0,%1,%2")
+
+;; Addition of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "paddd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (plus:DI (match_operand:DI 1 "register_operand" "f")
+		 (match_operand:DI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "paddd\t%0,%1,%2")
+
+;; Addition, treating overflow by signed saturation.
+(define_insn "ssadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "padds<V_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by unsigned saturation.
+(define_insn "usadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "paddus<V_suffix>\t%0,%1,%2")
+
+;; Logical AND NOT.
+(define_insn "loongson_and_not_<mode>"
+  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
+        (and:VWHBDI
+	 (not:VWHBDI (match_operand:VWHBDI 1 "register_operand" "f"))
+	 (match_operand:VWHBDI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pandn\t%0,%1,%2")
+
+;; Average.
+(define_insn "loongson_average_<mode>"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (unspec:VHB [(match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")]
+		    UNSPEC_LOONGSON_AVERAGE))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pavg<V_suffix>\t%0,%1,%2")
+
+;; Equality test.
+(define_insn "loongson_eq_<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_EQ))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pcmpeq<V_suffix>\t%0,%1,%2")
+
+;; Greater-than test.
+(define_insn "loongson_gt_<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_GT))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pcmpgt<V_suffix>\t%0,%1,%2")
+
+;; Extract halfword.
+(define_insn "loongson_extract_halfword"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+ 		    (match_operand:SI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_EXTRACT_HALFWORD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pextr<V_suffix>\t%0,%1,%2")
+
+;; Insert halfword.
+(define_insn "loongson_insert_halfword_0"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_0))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_0\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_1"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_1))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_1\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_2"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_2))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_2\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_3))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_3\t%0,%1,%2")
+
+;; Multiply and add packed integers.
+(define_insn "loongson_mult_add"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VH 1 "register_operand" "f")
+				  (match_operand:VH 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_MULT_ADD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmadd<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Maximum of signed halfwords.
+(define_insn "smax<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smax:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmaxs<V_suffix>\t%0,%1,%2")
+
+;; Maximum of unsigned bytes.
+(define_insn "umax<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umax:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmaxu<V_suffix>\t%0,%1,%2")
+
+;; Minimum of signed halfwords.
+(define_insn "smin<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smin:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmins<V_suffix>\t%0,%1,%2")
+
+;; Minimum of unsigned bytes.
+(define_insn "umin<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umin:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pminu<V_suffix>\t%0,%1,%2")
+
+;; Move byte mask.
+(define_insn "loongson_move_byte_mask"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")]
+		   UNSPEC_LOONGSON_MOVE_BYTE_MASK))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmovmsk<V_suffix>\t%0,%1")
+
+;; Multiply unsigned integers and store high result.
+(define_insn "umul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_UMUL_HIGHPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulhu<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store high result.
+(define_insn "smul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_SMUL_HIGHPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulh<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store low result.
+(define_insn "loongson_smul_lowpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_SMUL_LOWPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmull<V_suffix>\t%0,%1,%2")
+
+;; Multiply unsigned word integers.
+(define_insn "loongson_umul_word"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:VW 1 "register_operand" "f")
+		    (match_operand:VW 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_UMUL_WORD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulu<V_suffix>\t%0,%1,%2")
+
+;; Absolute difference.
+(define_insn "loongson_pasubub"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")
+		    (match_operand:VB 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PASUBUB))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pasubub\t%0,%1,%2")
+
+;; Sum of unsigned byte integers.
+(define_insn "reduc_uplus_<mode>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")]
+				 UNSPEC_LOONGSON_BIADD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "biadd\t%0,%1")
+
+;; Sum of absolute differences.
+(define_insn "loongson_psadbh"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")
+				  (match_operand:VB 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PSADBH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pasubub\t%0,%1,%2;biadd\t%0,%0")
+
+;; Shuffle halfwords.
+(define_insn "loongson_pshufh"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "0")
+		    (match_operand:VH 2 "register_operand" "f")
+		    (match_operand:SI 3 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSHUFH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pshufh\t%0,%2,%3")
+
+;; Shift left logical.
+(define_insn "loongson_psll<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashift:VWH (match_operand:VWH 1 "register_operand" "f")
+		    (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psll<V_suffix>\t%0,%1,%2")
+
+;; Shift right arithmetic.
+(define_insn "loongson_psra<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psra<V_suffix>\t%0,%1,%2")
+
+;; Shift right logical.
+(define_insn "loongson_psrl<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (lshiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psrl<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by wraparound.
+(define_insn "sub<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (minus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		    (match_operand:VWHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psub<V_suffix>\t%0,%1,%2")
+
+;; Subtraction of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "psubd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (minus:DI (match_operand:DI 1 "register_operand" "f")
+		  (match_operand:DI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubd\t%0,%1,%2")
+
+;; Subtraction, treating overflow by signed saturation.
+(define_insn "sssub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubs<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by unsigned saturation.
+(define_insn "ussub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubus<V_suffix>\t%0,%1,%2")
+
+;; Unpack high data.
+(define_insn "vec_interleave_high<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_UNPACK_HIGH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "punpckh<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Unpack low data.
+(define_insn "vec_interleave_low<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_UNPACK_LOW))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "punpckl<V_stretch_half_suffix>\t%0,%1,%2")
--- config/mips/mips-ftypes.def	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips-ftypes.def	(/local/gcc-2/gcc)	(revision 373)
@@ -66,6 +66,24 @@ DEF_MIPS_FTYPE (1, (SF, SF))
 DEF_MIPS_FTYPE (2, (SF, SF, SF))
 DEF_MIPS_FTYPE (1, (SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (UDI, UDI, UDI))
+DEF_MIPS_FTYPE (2, (UDI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UQI))
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV4HI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV8QI, UV8QI))
+
+DEF_MIPS_FTYPE (2, (UV8QI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV8QI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV8QI, UV8QI, UV8QI))
+
 DEF_MIPS_FTYPE (1, (V2HI, SI))
 DEF_MIPS_FTYPE (2, (V2HI, SI, SI))
 DEF_MIPS_FTYPE (3, (V2HI, SI, SI, SI))
@@ -81,12 +99,27 @@ DEF_MIPS_FTYPE (2, (V2SF, V2SF, V2SF))
 DEF_MIPS_FTYPE (3, (V2SF, V2SF, V2SF, INT))
 DEF_MIPS_FTYPE (4, (V2SF, V2SF, V2SF, V2SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, UQI))
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V2SI, V4HI, V4HI))
+
+DEF_MIPS_FTYPE (2, (V4HI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, USI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, V4HI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, USI))
+
 DEF_MIPS_FTYPE (1, (V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V2HI, V2HI))
 DEF_MIPS_FTYPE (1, (V4QI, V4QI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, V4QI))
 
+DEF_MIPS_FTYPE (2, (V8QI, V4HI, V4HI))
+DEF_MIPS_FTYPE (1, (V8QI, V8QI))
+DEF_MIPS_FTYPE (2, (V8QI, V8QI, V8QI))
+
 DEF_MIPS_FTYPE (2, (VOID, SI, SI))
 DEF_MIPS_FTYPE (2, (VOID, V2HI, V2HI))
 DEF_MIPS_FTYPE (2, (VOID, V4QI, V4QI))
--- config/mips/mips-protos.h	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips-protos.h	(/local/gcc-2/gcc)	(revision 373)
@@ -303,4 +303,6 @@ union mips_gen_fn_ptrs
 extern void mips_expand_atomic_qihi (union mips_gen_fn_ptrs,
 				     rtx, rtx, rtx, rtx);
 
+extern void mips_expand_vector_init (rtx, rtx);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
--- config/mips/loongson.h	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/loongson.h	(/local/gcc-2/gcc)	(revision 373)
@@ -0,0 +1,769 @@
+/* Intrinsics for ST Microelectronics Loongson-2E/2F SIMD operations.
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 2, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to the
+   Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+#ifndef _GCC_LOONGSON_H
+#define _GCC_LOONGSON_H
+
+#if !defined(__mips_loongson_vector_rev)
+# error "You must select -march=loongson2e or -march=loongson2f to use loongson.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Vectors of unsigned bytes, halfwords and words.  */
+typedef uint8_t uint8x8_t __attribute__((vector_size (8)));
+typedef uint16_t uint16x4_t __attribute__((vector_size (8)));
+typedef uint32_t uint32x2_t __attribute__((vector_size (8)));
+
+/* Vectors of signed bytes, halfwords and words.  */
+typedef int8_t int8x8_t __attribute__((vector_size (8)));
+typedef int16_t int16x4_t __attribute__((vector_size (8)));
+typedef int32_t int32x2_t __attribute__((vector_size (8)));
+
+/* Helpers for loading and storing vectors.  */
+
+/* Load from memory.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vec_load_uw (uint32x2_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vec_load_uh (uint16x4_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vec_load_ub (uint8x8_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vec_load_sw (int32x2_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vec_load_sh (int16x4_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vec_load_sb (int8x8_t *src)
+{
+  return *src;
+}
+
+/* Store to memory.  */
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_uw (uint32x2_t v, uint32x2_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_uh (uint16x4_t v, uint16x4_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_ub (uint8x8_t v, uint8x8_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sw (int32x2_t v, int32x2_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sh (int16x4_t v, int16x4_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sb (int8x8_t v, int8x8_t *dest)
+{
+  *dest = v;
+}
+
+/* SIMD intrinsics.
+   Unless otherwise noted, calls to the functions below will expand into
+   precisely one machine instruction, modulo any moves required to
+   satisfy register allocation constraints.  */
+
+/* Pack with signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+packsswh (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_packsswh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+packsshb (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_packsshb (s, t);
+}
+
+/* Pack with unsigned saturation.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+packushb (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_packushb (s, t);
+}
+
+/* Vector addition, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+paddw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_paddw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+paddw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_paddw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddb_s (s, t);
+}
+
+/* Addition of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+paddd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_paddd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+paddd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_paddd_s (s, t);
+}
+
+/* Vector addition, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddsb (s, t);
+}
+
+/* Vector addition, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddusb (s, t);
+}
+
+/* Logical AND NOT.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+pandn_ud (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_pandn_ud (s, t);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pandn_uw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pandn_uw (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pandn_uh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pandn_uh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pandn_ub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pandn_ub (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pandn_sd (int64_t s, int64_t t)
+{
+  return __builtin_loongson_pandn_sd (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pandn_sw (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pandn_sw (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pandn_sh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pandn_sh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pandn_sb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pandn_sb (s, t);
+}
+
+/* Average.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pavgh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pavgh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pavgb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pavgb (s, t);
+}
+
+/* Equality test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_s (s, t);
+}
+
+/* Greater-than test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpgth_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpgth_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_s (s, t);
+}
+
+/* Extract halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pextrh_u (uint16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_u (s, field);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pextrh_s (int16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_s (s, field);
+}
+
+/* Insert halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_u (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_s (s, t);
+}
+
+/* Multiply and add.  */
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pmaddhw (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaddhw (s, t);
+}
+
+/* Maximum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmaxsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaxsh (s, t);
+}
+
+/* Maximum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmaxub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pmaxub (s, t);
+}
+
+/* Minimum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pminsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pminsh (s, t);
+}
+
+/* Minimum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pminub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pminub (s, t);
+}
+
+/* Move byte mask.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmovmskb_u (uint8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_u (s);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pmovmskb_s (int8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_s (s);
+}
+
+/* Multiply unsigned integers and store high result.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pmulhuh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pmulhuh (s, t);
+}
+
+/* Multiply signed integers and store high result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmulhh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmulhh (s, t);
+}
+
+/* Multiply signed integers and store low result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmullh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmullh (s, t);
+}
+
+/* Multiply unsigned word integers.  */
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pmuluw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pmuluw (s, t);
+}
+
+/* Absolute difference.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pasubub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pasubub (s, t);
+}
+
+/* Sum of unsigned byte integers.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+biadd (uint8x8_t s)
+{
+  return __builtin_loongson_biadd (s);
+}
+
+/* Sum of absolute differences.
+   Note that this intrinsic expands into two machine instructions:
+   PASUBUB followed by BIADD.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psadbh (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psadbh (s, t);
+}
+
+/* Shuffle halfwords.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_u (dest, s, order);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_s (dest, s, order);
+}
+
+/* Shift left logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psllh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psllh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psllw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psllw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_s (s, amount);
+}
+
+/* Shift right logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrlh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrlh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psrlw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psrlw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_s (s, amount);
+}
+
+/* Shift right arithmetic.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrah_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrah_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psraw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psraw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_s (s, amount);
+}
+
+/* Vector subtraction, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psubw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_psubw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psubw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_psubw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubb_s (s, t);
+}
+
+/* Subtraction of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+psubd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_psubd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+psubd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_psubd_s (s, t);
+}
+
+/* Vector subtraction, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubsb (s, t);
+}
+
+/* Vector subtraction, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubusb (s, t);
+}
+
+/* Unpack high data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpckhwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpckhhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpckhbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpckhwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpckhhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpckhbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_s (s, t);
+}
+
+/* Unpack low data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpcklwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpcklhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpcklbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpcklwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpcklhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpcklbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_s (s, t);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- config/mips/mips.h	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips.h	(/local/gcc-2/gcc)	(revision 373)
@@ -266,6 +266,11 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
 
+/* Whether vector modes and intrinsics for ST Microelectronics
+   Loongson-2E/2F processors should be enabled.  In o32 pairs of
+   floating-point registers provide 64-bit values.  */
+#define HAVE_LOONGSON_VECTOR_MODES TARGET_LOONGSON_2EF
+
 /* True if the pre-reload scheduler should try to create chains of
    multiply-add or multiply-subtract instructions.  For example,
    suppose we have:
@@ -496,6 +501,10 @@ enum mips_code_readable_setting {
 	  builtin_define_std ("MIPSEL");				\
 	  builtin_define ("_MIPSEL");					\
 	}								\
+                                                                        \
+      /* Whether Loongson vector modes are enabled.  */                 \
+      if (HAVE_LOONGSON_VECTOR_MODES)                                   \
+        builtin_define ("__mips_loongson_vector_rev");                  \
 									\
       /* Macros dependent on the C dialect.  */				\
       if (preprocessing_asm_p ())					\
--- config/mips/mips-modes.def	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips-modes.def	(/local/gcc-2/gcc)	(revision 373)
@@ -26,6 +26,7 @@ RESET_FLOAT_FORMAT (DF, mips_double_form
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
+VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
 VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-05 10:38     ` Maxim Kuvyrkov
@ 2008-06-05 16:16       ` Richard Sandiford
  2008-06-06  8:08         ` Ruan Beihong
  2008-06-06 12:31         ` Maxim Kuvyrkov
  0 siblings, 2 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-05 16:16 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Hi Maxim,

Thanks for the update.

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>> My main concern is the FPR move handling.
>> It looks like you use MOV.D for 64-bit integer moves, but this is usually
>> incorrect.  In the standard ISA spec, MOV.D is unpredictable unless
>> (a) the source is uninterpreted or (b) it has been interpreted as a
>> double-precision floating-point value.
>> 
>> So: does Loongson specifically exempt itself from this restriction?
>> Or does it have special MOV.FOO instructions for the new modes?
>> 
>> Either way, the patch is inconsistent.  mips_mode_ok_for_mov_fmt_p
>> should return true for any mode that can/will be handled by MOV.FMT.
>> 
>> I don't understand why you need FPR<->FPR DImode moves for 32-bit
>> targets but not 64-bit targets.  (movdi_64bit doesn't have FPR<->FPR
>> moves either.)
>
> Loongson behaves as generic MIPS III with respect to moves to and from
> FP registers.  So to handle the new modes I added them to MOVE64 and SPLITF
> mode_iterators and adjusted mips_split_doubleword_move() accordingly.

This part looks good as far as it goes, thanks, but your vector move
pattern still has an "f<-f" (aka "fmove") alternative.  Like I say,
that causes GCC to use a MOV.D instruction for something that is not a
double-precision floating-point value.  My main concern was that using
MOV.D in this way is usually incorrect; see:

  - section 5.7 in volume 1 of the MIPS32/64 ISA spec
  - the documentation of MOV.FMT in volume 2 of the MIPS32/64 ISA spec

for more details.

Thus the standard ISA spec has two separate 64-bit move instructions:
MOV.D and MOV.PS.  Both instructions move one 64-bit FPR to another,
but they are used for two different kinds of 64-bit data.

There is no MOV.L instruction, so 64-bit integer moves must be done
through a GPR.  (This is true for both 32-bit and 64-bit targets.)
Using MOV.D or MOV.PS for 64-bit integers is incorrect.

So when it comes to the new modes, I think there are two cases:

  (1) Loongson specifically exempts itself from this restriction.
      You can use MOV.D for any kind of data, regardless of how
      the source FPR has been used, or how the destination FPR
      will be used.

  (2) We need to move through GPRs for the new modes too.

The current implementation falls between two stools.  It provides
an "fmove" alternative in the move pattern (suggesting (1)),
but mips_mode_ok_for_mov_fmt_p still returns false (suggesting (2)).

If (2) is correct, you need to remove the "fmove" alternative.
If (1) is correct, you need to make mips_mode_ok_for_mov_fmt_p
return true for the new modes, with a comment saying that this is
explicitly OK for Loongson.  There should also be a comment
in the MOV.D part of mips_output_move.

If the Loongson spec doesn't say one way or the other, it'd be
nice to have confirmation from the Loongson folks as to which
is right.  It's best not to go for "appears to work" for
something like this. ;)  (2) is the conservatively-correct
choice though.

> To handle new vector types I had to change mips_builtin_vector_type() to 
> distinguish between signed and unsigned basic types as new modes are 
> used both for signed and unsigned cases.

This looks good, but see below for a possible simplification.

>>> @@ -494,7 +516,10 @@
>>>  
>>>  ;; 64-bit modes for which we provide move patterns.
>>>  (define_mode_iterator MOVE64
>>> -  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
>>> +  [(DI "!TARGET_64BIT") (DF "!TARGET_64BIT")
>>> +   (V2SF "!TARGET_64BIT && TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
>>> +   (V2SI "HAVE_LOONGSON_VECTOR_MODES") (V4HI "HAVE_LOONGSON_VECTOR_MODES")
>>> +   (V8QI "HAVE_LOONGSON_VECTOR_MODES")])
>> 
>> Since we need more than one line, put V2SF and each new entry on its
>> own line.  The changes to the existing modes aren't right; they aren't
>> consistent with the comment.
>
> Fixed.  Turned out the changes to existing modes weren't necessary at all.

Thanks.

>>> Index: gcc/config/mips/mips.c
>>> ===================================================================
>>> --- gcc/config/mips/mips.c	(revision 62)
>>> +++ gcc/config/mips/mips.c	(working copy)
>>> @@ -3518,6 +3518,23 @@ mips_output_move (rtx dest, rtx src)
>>>    if (dbl_p && mips_split_64bit_move_p (dest, src))
>>>      return "#";
>>>  
>>> +  /* Handle cases where the source is a constant zero vector on
>>> +     Loongson targets.  */
>>> +  if (HAVE_LOONGSON_VECTOR_MODES && src_code == CONST_VECTOR)
>>> +    {
>>> +      if (dest_code == REG)
>>> +	{
>>> +	  /* Move constant zero vector to floating-point register.  */
>>> +	  gcc_assert (FP_REG_P (REGNO (dest)));
>>> +	  return "dmtc1\t$0,%0";
>>> +	}
>>> +      else if (dest_code == MEM)
>>> +	/* Move constant zero vector to memory.  */
>>> +	return "sd\t$0,%0";
>>> +      else
>>> +	gcc_unreachable ();
>>> +    }
>>> +
>> 
>> Why doesn't the normal zero handling work?
>
> Don't know.  I removed this piece and everything worked.

The best result ;)

>>> +/* Initialize vector TARGET to VALS.  */
>>> +
>>> +void
>>> +mips_expand_vector_init (rtx target, rtx vals)
>>> +{
>>> +  enum machine_mode mode = GET_MODE (target);
>>> +  enum machine_mode inner = GET_MODE_INNER (mode);
>>> +  unsigned int i, n_elts = GET_MODE_NUNITS (mode);
>>> +  rtx mem;
>>> +
>>> +  gcc_assert (VECTOR_MODE_P (mode));
>>> +
>>> +  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
>>> +  for (i = 0; i < n_elts; i++)
>>> +    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
>>> +                    XVECEXP (vals, 0, i));
>>> +
>>> +  emit_move_insn (target, mem);
>>> +}
[...]
>> Do we really want to create a new stack slot for every initialisation?
>> It seems on the face of it that some sort of reuse would be nice.
>
> I didn't address this last issue.  What I don't understand is how we can 
> reuse stack slots given that the accesses to different variables can 
> easily step on each other's toes.

Yeah, this was more a general question than an objection.  I'm happy
keeping it as-is for now.

> -/* MODE is a vector mode whose elements have type TYPE.  Return the type
> -   of the vector itself.  */
> +/* MODE is a vector mode whose elements have type TYPE.
> +   TYPE is signed or unsigned depending on UNSIGNED_P.
> +   Return the type of the vector itself.  */
>  
>  static tree
> -mips_builtin_vector_type (tree type, enum machine_mode mode)
> +mips_builtin_vector_type (tree type, enum machine_mode mode, bool unsigned_p)
>  {
> -  static tree types[(int) MAX_MACHINE_MODE];
> +  static tree types[2 * (int) MAX_MACHINE_MODE];
> +  int mode_index;
> +
> +  mode_index = (int) mode;
>  
> -  if (types[(int) mode] == NULL_TREE)
> -    types[(int) mode] = build_vector_type_for_mode (type, mode);
> -  return types[(int) mode];
> +  if (unsigned_p)
> +    mode_index += MAX_MACHINE_MODE;
> +
> +  if (types[mode_index] == NULL_TREE)
> +    types[mode_index] = build_vector_type_for_mode (type, mode);
> +  return types[mode_index];
>  }

Couldn't you replace the unsigned_p argument with the test:

  if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type))

?  That'd avoid accidentally mixing the unsigned_p and type arguments.
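
For illustration, the whole function with that simplification folded in
would look something like this (a sketch only, not necessarily the form
you'll end up committing):

static tree
mips_builtin_vector_type (tree type, enum machine_mode mode)
{
  static tree types[2 * (int) MAX_MACHINE_MODE];
  int mode_index;

  mode_index = (int) mode;

  /* Only unsigned integer element types need the second half of the
     table; signed and floating-point element types use the first half.  */
  if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type))
    mode_index += MAX_MACHINE_MODE;

  if (types[mode_index] == NULL_TREE)
    types[mode_index] = build_vector_type_for_mode (type, mode);
  return types[mode_index];
}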

> +Also provided are helper functions for loading and storing values of the
> +above 64-bit vector types to and from memory:
> +
> +@smallexample
> +uint32x2_t vec_load_uw (uint32x2_t *src);
> +uint16x4_t vec_load_uh (uint16x4_t *src);
> +uint8x8_t vec_load_ub (uint8x8_t *src);
> +int32x2_t vec_load_sw (int32x2_t *src);
> +int16x4_t vec_load_sh (int16x4_t *src);
> +int8x8_t vec_load_sb (int8x8_t *src);
> +void vec_store_uw (uint32x2_t v, uint32x2_t *dest);
> +void vec_store_uh (uint16x4_t v, uint16x4_t *dest);
> +void vec_store_ub (uint8x8_t v, uint8x8_t *dest);
> +void vec_store_sw (int32x2_t v, int32x2_t *dest);
> +void vec_store_sh (int16x4_t v, int16x4_t *dest);
> +void vec_store_sb (int8x8_t v, int8x8_t *dest);
> +@end smallexample

I assume this is an existing cross-compiler API you're implementing.
Is it worth saying that plain C pointer dereferences would also work?
Or do we explicitly want to steer the user away from that, because
doing so doesn't conform to the API?  I think we should say something
either way.

(For the record, I'm OK with keeping these functions, if you're
implementing an existing API.)

I believe:

   foo_t vec_load_bar (const volatile foo_t *src);

would be more general; as it stands, I think using the functions
on constant or volatile data would result in a warning.  Likewise:

   void vec_store_bar (foo_t v, volatile foo_t *dest);

But again, if you need to be precisely compatible with an API,
I'm OK keeping it as-is.
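
For what it's worth, I'd expect each of these helpers to amount to
nothing more than a dereference.  A sketch in the header's inline style
(an assumption on my part; I haven't seen the actual definitions):

__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
vec_load_uw (uint32x2_t *src)
{
  /* A plain load of the 64-bit vector pointed to by SRC.  */
  return *src;
}

__extension__ static __inline void __attribute__ ((__always_inline__))
vec_store_uw (uint32x2_t v, uint32x2_t *dest)
{
  /* A plain store of V to *DEST.  */
  *dest = v;
}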

> +;; Handle legitimized moves between values of vector modes.
> +(define_insn "mov<mode>_internal"
> +  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,d,f,d,m, f")
> +	(match_operand:VWHB 1 "move_operand"          "f,m,f,f,d,d,YG,YG"))]
> +  "HAVE_LOONGSON_VECTOR_MODES"
> +  { return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
                                                          ^^^^^^^
Just "store"; this is a GPR store.  The pattern is missing "m<-d",
"d<-m" and "d<-YG", which you also need to handle.  I think:

(define_insn "mov<mode>_internal"
  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,d,m,d")
	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
  "HAVE_LOONGSON_VECTOR_MODES"
  { return mips_output_move (operands[0], operands[1]); }
  [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")

would be correct for (2) above.  Add the fmove alternative back for (1).

> --- config/mips/mips.h	(/local/gcc-trunk/gcc)	(revision 373)
> +++ config/mips/mips.h	(/local/gcc-2/gcc)	(revision 373)
> @@ -266,6 +266,11 @@ enum mips_code_readable_setting {
>  				     || mips_tune == PROCESSOR_74KF3_2)
>  #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
>  
> +/* Whether vector modes and intrinsics for ST Microelectronics
> +   Loongson-2E/2F processors should be enabled.  In o32 pairs of
> +   floating-point registers provide 64-bit values.  */
> +#define HAVE_LOONGSON_VECTOR_MODES TARGET_LOONGSON_2EF
> +
>  /* True if the pre-reload scheduler should try to create chains of
>     multiply-add or multiply-subtract instructions.  For example,
>     suppose we have:

I think this needs to depend on TARGET_HARD_FLOAT as well.

The patch looks good otherwise, thanks.  The only major hurdle is the
fmove question.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-05 16:16       ` Richard Sandiford
@ 2008-06-06  8:08         ` Ruan Beihong
  2008-06-09 18:24           ` Maxim Kuvyrkov
  2008-06-06 12:31         ` Maxim Kuvyrkov
  1 sibling, 1 reply; 66+ messages in thread
From: Ruan Beihong @ 2008-06-06  8:08 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Hi everyone,
There is something special about Loongson.
See below (extracted from binutils-2.18.50.0.5/opcodes/mips-opc.c); I
wonder whether these instructions will be supported in GCC.
210:{"add",     "D,S,T",        0x4b40000c,     0xffe0003f,
RD_S|RD_T|WR_D|FP_S,    0,      IL2F    },
240:{"and",     "D,S,T",        0x4bc00002,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
546:{"dadd",    "D,S,T",        0x4b60000c,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
660:{"dsll",    "D,S,T",        0x4b20000e,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
667:{"dsra",    "D,S,T",        0x4b60000f,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
674:{"dsrl",    "D,S,T",        0x4b20000f,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
678:{"dsub",    "D,S,T",        0x4b60000d,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
1041:{"nor",    "D,S,T",        0x4ba00002,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
1051:{"or",     "D,S,T",        0x4b20000c,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
1175:{"seq",    "S,T",          0x4ba0000c,     0xffe007ff,
RD_S|RD_T|WR_CC|FP_D,   0,      IL2F    },
1203:{"sle",    "S,T",          0x4ba0000e,     0xffe007ff,
RD_S|RD_T|WR_CC|FP_D,   0,      IL2F    },
1207:{"sleu",   "S,T",          0x4b80000e,     0xffe007ff,
RD_S|RD_T|WR_CC|FP_D,   0,      IL2F    },
1212:{"sll",    "D,S,T",        0x4b00000e,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
1220:{"slt",    "S,T",          0x4ba0000d,     0xffe007ff,
RD_S|RD_T|WR_CC|FP_D,   0,      IL2F    },
1226:{"sltu",   "S,T",          0x4b80000d,     0xffe007ff,
RD_S|RD_T|WR_CC|FP_D,   0,      IL2F    },
1236:{"sra",    "D,S,T",        0x4b40000f,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
1242:{"srl",    "D,S,T",        0x4b00000f,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },
1252:{"sub",    "D,S,T",        0x4b40000d,     0xffe0003f,
RD_S|RD_T|WR_D|FP_S,    0,      IL2F    },
1259:{"sub.ps",  "D,V,T",       0x46c00001, 0xffe0003f,
WR_D|RD_S|RD_T|FP_D,    0,              I5_33|IL2F      },
1269:{"subu",   "D,S,T",        0x4b00000d,     0xffe0003f,
RD_S|RD_T|WR_D|FP_S,    0,      IL2F    },
1372:{"xor",    "D,S,T",        0x4b800002,     0xffe0003f,
RD_S|RD_T|WR_D|FP_D,    0,      IL2F    },

These instructions are designed to use the FPU for some simple ALU
tasks, thus reducing the use of m[ft]c1.
Loongson has both mov.d and mov.ps, plus one more: "or" (1051) on the FPU.

James Ruan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-05 16:16       ` Richard Sandiford
  2008-06-06  8:08         ` Ruan Beihong
@ 2008-06-06 12:31         ` Maxim Kuvyrkov
  2008-06-06 14:06           ` Richard Sandiford
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-06 12:31 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard Sandiford wrote:

...

> Maxim Kuvyrkov <maxim@codesourcery.com> writes:

...

>> Loongson behaves as generic MIPS III with respect to moves to and from
>> FP registers.  So to handle the new modes I added them to MOVE64 and SPLITF
>> mode_iterators and adjusted mips_split_doubleword_move() accordingly.
> 
> This part looks good as far as it goes, thanks, but your vector move
> pattern still has an "f<-f" (aka "fmove") alternative.  Like I say,
> that causes GCC to use a MOV.D instruction for something that is not a
> double-precision floating-point value.  My main concern was that using
> MOV.D in this way is usually incorrect; see:
> 
>   - section 5.7 in volume 1 of the MIPS32/64 ISA spec
>   - the documentation of MOV.FMT in volume 2 of the MIPS32/64 ISA spec
> 
> for more details.
> 
> Thus the standard ISA spec has two separate 64-bit move instructions:
> MOV.D and MOV.PS.  Both instructions move one 64-bit FPR to another,
> but they are used for two different kinds of 64-bit data.
> 
> There is no MOV.L instruction, so 64-bit integer moves must be done
> through a GPR.  (This is true for both 32-bit and 64-bit targets.)
> Using MOV.D or MOV.PS for 64-bit integers is incorrect.
> 
> So when it comes to the new modes, I think there are two cases:
> 
>   (1) Loongson specifically exempts itself from this restriction.
>       You can use MOV.D for any kind of data, regardless of how
>       the source FPR has been used, or how the destination FPR
>       will be used.
> 
>   (2) We need to move through GPRs for the new modes too.
> 
> The current implementation falls between two stools.  It provides
> an "fmove" alternative in the move pattern (suggesting (1)),
> but mips_mode_ok_for_mov_fmt_p still returns false (suggesting (2)).

It is (2).  I didn't spot the "f" -> "f" alternative in
mov<mode>_internal when I fixed the patch; this alternative should be
removed.  I checked with the Loongson designers and they confirmed that
MOV.D should not be used in this case.

...

> Couldn't you replace the unsigned_p argument with the test:
> 
>   if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type))
> 
> ?  That'd avoid accidentally mixing the unsigned_p and type arguments.

Fixed.

> 
>> +Also provided are helper functions for loading and storing values of the
>> +above 64-bit vector types to and from memory:
>> +
>> +@smallexample
>> +uint32x2_t vec_load_uw (uint32x2_t *src);
>> +uint16x4_t vec_load_uh (uint16x4_t *src);
>> +uint8x8_t vec_load_ub (uint8x8_t *src);
>> +int32x2_t vec_load_sw (int32x2_t *src);
>> +int16x4_t vec_load_sh (int16x4_t *src);
>> +int8x8_t vec_load_sb (int8x8_t *src);
>> +void vec_store_uw (uint32x2_t v, uint32x2_t *dest);
>> +void vec_store_uh (uint16x4_t v, uint16x4_t *dest);
>> +void vec_store_ub (uint8x8_t v, uint8x8_t *dest);
>> +void vec_store_sw (int32x2_t v, int32x2_t *dest);
>> +void vec_store_sh (int16x4_t v, int16x4_t *dest);
>> +void vec_store_sb (int8x8_t v, int8x8_t *dest);
>> +@end smallexample
> 
> I assume this is an existing cross-compiler API you're implementing.

This API was developed at CodeSourcery and was signed off by ST 
Microelectronics, the producers of Loongson CPUs.  I speculate that the 
design of the API is similar to arm_neon.h.

> Is it worth saying that plain C pointer dereferences would also work?
> Or do we explicitly want to steer the user away from that, because
> doing so doesn't conform to the API?  I think we should say something
> either way.
> 
> (For the record, I'm OK with keeping these functions, if you're
> implementing an existing API.)

How about "While it is possible to use plain C pointer dereferences, the 
following helper functions provide a stable interface for loading and 
storing values of the above 64-bit vector types to and from memory" ?

> 
> I believe:
> 
>    foo_t vec_load_bar (const volatile foo_t *src);
> 
> would be more general; as it stands, I think using the functions
> on constant or volatile data would result in a warning.  Likewise:
> 
>    void vec_store_bar (foo_t v, volatile foo_t *dest);
> 
> But again, if you need to be precisely compatible with an API,
> I'm OK keeping it as-is.

"volatile" will probably kill CSE optimizations.  "const" seems like a 
good addition.

> 
>> +;; Handle legitimized moves between values of vector modes.
>> +(define_insn "mov<mode>_internal"
>> +  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,d,f,d,m, f")
>> +	(match_operand:VWHB 1 "move_operand"          "f,m,f,f,d,d,YG,YG"))]
>> +  "HAVE_LOONGSON_VECTOR_MODES"
>> +  { return mips_output_move (operands[0], operands[1]); }
>> +  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
>                                                           ^^^^^^^
> Just "store"; this is a GPR store.  The pattern is missing "m<-d",
> "d<-m" and "d<-YG", which you also need to handle.  I think:
> 
> (define_insn "mov<mode>_internal"
>   [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,d,m,d")
> 	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
>   "HAVE_LOONGSON_VECTOR_MODES"
>   { return mips_output_move (operands[0], operands[1]); }
>   [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")
> 
> would be correct for (2) above.  Add the fmove alternative back for (1).

Fixed, no fmove alternative now.  Thanks for pointing this out.

> 
>> --- config/mips/mips.h	(/local/gcc-trunk/gcc)	(revision 373)
>> +++ config/mips/mips.h	(/local/gcc-2/gcc)	(revision 373)
>> @@ -266,6 +266,11 @@ enum mips_code_readable_setting {
>>  				     || mips_tune == PROCESSOR_74KF3_2)
>>  #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
>>  
>> +/* Whether vector modes and intrinsics for ST Microelectronics
>> +   Loongson-2E/2F processors should be enabled.  In o32 pairs of
>> +   floating-point registers provide 64-bit values.  */
>> +#define HAVE_LOONGSON_VECTOR_MODES TARGET_LOONGSON_2EF
>> +
>>  /* True if the pre-reload scheduler should try to create chains of
>>     multiply-add or multiply-subtract instructions.  For example,
>>     suppose we have:
> 
> I think this needs to depend on TARGET_HARD_FLOAT as well.

Fixed.

--
Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-06 12:31         ` Maxim Kuvyrkov
@ 2008-06-06 14:06           ` Richard Sandiford
  2008-06-09 18:27             ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-06-06 14:06 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>>> Loongson behaves as generic MIPS III with respect to moves to and from
>>> FP registers.  So to handle the new modes I added them to MOVE64 and SPLITF
>>> mode_iterators and adjusted mips_split_doubleword_move() accordingly.
>> 
>> This part looks good as far as it goes, thanks, but your vector move
>> pattern still has an "f<-f" (aka "fmove") alternative.  Like I say,
>> that causes GCC to use a MOV.D instruction for something that is not a
>> double-precision floating-point value.  My main concern was that using
>> MOV.D in this way is usually incorrect; see:
>> 
>>   - section 5.7 in volume 1 of the MIPS32/64 ISA spec
>>   - the documentation of MOV.FMT in volume 2 of the MIPS32/64 ISA spec
>> 
>> for more details.
>> 
>> Thus the standard ISA spec has two separate 64-bit move instructions:
>> MOV.D and MOV.PS.  Both instructions move one 64-bit FPR to another,
>> but they are used for two different kinds of 64-bit data.
>> 
>> There is no MOV.L instruction, so 64-bit integer moves must be done
>> through a GPR.  (This is true for both 32-bit and 64-bit targets.)
>> Using MOV.D or MOV.PS for 64-bit integers is incorrect.
>> 
>> So when it comes to the new modes, I think there are two cases:
>> 
>>   (1) Loongson specifically exempts itself from this restriction.
>>       You can use MOV.D for any kind of data, regardless of how
>>       the source FPR has been used, or how the destination FPR
>>       will be used.
>> 
>>   (2) We need to move through GPRs for the new modes too.
>> 
>> The current implementation falls between two stools.  It provides
>> an "fmove" alternative in the move pattern (suggesting (1)),
>> but mips_mode_ok_for_mov_fmt_p still returns false (suggesting (2)).
>
> It is (2).  I didn't spot the "f" -> "f" alternative in
> mov<mode>_internal when I fixed the patch; this alternative should be
> removed.  I checked with the Loongson designers and they confirmed that
> MOV.D should not be used in this case.
>
> ...

OK, thanks.

>>> +Also provided are helper functions for loading and storing values of the
>>> +above 64-bit vector types to and from memory:
>>> +
>>> +@smallexample
>>> +uint32x2_t vec_load_uw (uint32x2_t *src);
>>> +uint16x4_t vec_load_uh (uint16x4_t *src);
>>> +uint8x8_t vec_load_ub (uint8x8_t *src);
>>> +int32x2_t vec_load_sw (int32x2_t *src);
>>> +int16x4_t vec_load_sh (int16x4_t *src);
>>> +int8x8_t vec_load_sb (int8x8_t *src);
>>> +void vec_store_uw (uint32x2_t v, uint32x2_t *dest);
>>> +void vec_store_uh (uint16x4_t v, uint16x4_t *dest);
>>> +void vec_store_ub (uint8x8_t v, uint8x8_t *dest);
>>> +void vec_store_sw (int32x2_t v, int32x2_t *dest);
>>> +void vec_store_sh (int16x4_t v, int16x4_t *dest);
>>> +void vec_store_sb (int8x8_t v, int8x8_t *dest);
>>> +@end smallexample
>> 
>> I assume this is an existing cross-compiler API you're implementing.
>
> This API was developed at CodeSourcery and was signed off by ST 
> Microelectronics, the producers of Loongson CPUs.  I speculate that the 
> design of the API is similar to arm_neon.h.
>
>> Is it worth saying that plain C pointer dereferences would also work?
>> Or do we explicitly want to steer the user away from that, because
>> doing so doesn't conform to the API?  I think we should say something
>> either way.
>> 
>> (For the record, I'm OK with keeping these functions, if you're
>> implementing an existing API.)
>
> How about "While it is possible to use plain C pointer dereferences, the 
> following helper functions provide a stable interface for loading and 
> storing values of the above 64-bit vector types to and from memory" ?

Hmm.  If this is a newly-defined interface, I really have to question
the wisdom of these functions.  The wording above suggests that there's
something "unstable" about normal C pointer and array accesses.
There shouldn't be ;)  They ought to work as expected.

The patch rightly uses well-known insn names for well-known operations
like vector addition, vector maximum, and so on.  As well as allowing
autovectorisation, I believe this means you could write:

    uint8x8_t *a;

    a[0] = a[1] + a[2];

(It might be nice to have tests to make sure that this does indeed
work when using the new header file.  It could just be cut-&-paste
from the version that uses intrinsic functions.)
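
For instance, a cut-down sketch of such a test using only the generic
vector extension (the typedef is my assumption about how the header
spells uint8x8_t; the real test would presumably include the header
instead):

typedef unsigned char uint8x8_t __attribute__ ((vector_size (8)));

void
f (uint8x8_t *a)
{
  /* Plain C vector arithmetic; no intrinsic or builtin involved.  */
  a[0] = a[1] + a[2];
}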

I just think that, given GCC's vector extensions, having these
functions as well is confusing.  I take what you say about it
being consistent with arm_neon.h, but AltiVec doesn't have these
sorts of function, and GCC's generic vector support was heavily
influenced by AltiVec.

Sorry for shooting the messenger here.  I realise this wasn't
your decision.

>> I believe:
>> 
>>    foo_t vec_load_bar (const volatile foo_t *src);
>> 
>> would be more general; as it stands, I think using the functions
>> on constant or volatile data would result in a warning.  Likewise:
>> 
>>    void vec_store_bar (foo_t v, volatile foo_t *dest);
>> 
>> But again, if you need to be precisely compatible with an API,
>> I'm OK keeping it as-is.
>
> "volatile" will probably kill CSE optimizations.  "const" seems like a 
> good addition.

Hmm, good point.  That's another mark against these functions IMO ;)

>>> +;; Handle legitimized moves between values of vector modes.
>>> +(define_insn "mov<mode>_internal"
>>> +  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,d,f,d,m, f")
>>> +	(match_operand:VWHB 1 "move_operand"          "f,m,f,f,d,d,YG,YG"))]
>>> +  "HAVE_LOONGSON_VECTOR_MODES"
>>> +  { return mips_output_move (operands[0], operands[1]); }
>>> +  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
>>                                                           ^^^^^^^
>> Just "store"; this is a GPR store.  The pattern is missing "m<-d",
>> "d<-m" and "d<-YG", which you also need to handle.  I think:
>> 
>> (define_insn "mov<mode>_internal"
>>   [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,d,m,d")
>> 	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
>>   "HAVE_LOONGSON_VECTOR_MODES"
>>   { return mips_output_move (operands[0], operands[1]); }
>>   [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")
>> 
>> would be correct for (2) above.  Add the fmove alternative back for (1).
>
> Fixed, no fmove alternative now.

Thanks.  It sounds from James Ruan's message that 2F could use an FPU
OR instruction here, but it's fine to handle that separately.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-05-22 19:33   ` Richard Sandiford
@ 2008-06-08 19:59     ` Maxim Kuvyrkov
  2008-06-09 13:16       ` Maxim Kuvyrkov
  2008-06-09 17:39       ` Richard Sandiford
  0 siblings, 2 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-08 19:59 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

Richard Sandiford wrote:

...

>> ;; This mode macro allows :ANYF_MIPS5_LS2 to be used wherever
>> ;; a scalar or Loongson2 vector floating-point mode is allowed.
>> (define_mode_macro ANYF_MIPS5_LS2
>>   [(SF "TARGET_HARD_FLOAT")
>>    (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
>>    (V2SF "TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2")])
>>
>> Hence the instructions from the MIPS5 ISA, which happen to be supported by
>> Loongson2, are declared with the ANYF_MIPS5_LS2 mode_iterator.  I don't like
>> this, but can't figure out an alternative way to name / describe these
>> instructions.
> 
> I'd prefer to keep TARGET_PAIRED_SINGLE_FLOAT for the cases that
> are common between Loongson and non-Loongson mode (i.e. the cases in
> which .ps is available in some form).  Then add new ISA_HAS_FOO
> macros for each class of instruction that Loongson doesn't have.
> E.g.
> 
>     ISA_HAS_PXX_PS
> 
> for PUU.PS & co.
> 
> Feel free to run a list of ISA_HAS_* macros by me before testing.

How does the following patch look?  Now the patch disables certain .ps
instructions when compiling for Loongson and leaves everything else
enabled (in contrast, the previous version enabled the instructions that
Loongson does support and disabled everything else).

I also noticed that the previous version of the patch didn't enable the
[n]madd3/msub3 instructions and have fixed that in this one.
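
For reference, the new macros gate the .ps-only instructions on
!TARGET_LOONGSON_2EF while leaving paired-single support itself enabled;
e.g. ISA_HAS_PXX_PS ends up as something along the lines of (approximate
shape only -- see the mips.h hunk below for the exact definition):

/* Approximate shape; PUL.PS & co. are not implemented by Loongson.  */
#define ISA_HAS_PXX_PS		(TARGET_HARD_FLOAT			\
				 && TARGET_PAIRED_SINGLE_FLOAT		\
				 && !TARGET_LOONGSON_2EF)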

--
Maxim

[-- Attachment #2: fsf-ls2ef-3-insns.ChangeLog --]
[-- Type: text/plain, Size: 2096 bytes --]

2008-05-22  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/mips/mips.h (ISA_HAS_CONDMOVE): Slice ISA_HAS_FP_CONDMOVE
	from it.
	(ISA_HAS_FP_CONDMOVE): New macro.
	(ISA_HAS_PAIRED_SINGLE_LS2): New macro.
	(ISA_HAS_PAIRED_SINGLE): Use it.
	(ISA_HAS_FP_MADD4_MSUB4, ISA_HAS_FP_MADD3_MSUB3): New macros.
	(ISA_HAS_NMADD_NMSUB): Rename to ISA_HAS_NMADD4_NMSUB4.
	(ISA_HAS_NMADD3_NMSUB3): New macro.
	(ISA_HAS_MOVCC_PS, ISA_HAS_PXX_PS, ISA_HAS_CVT_PS, ISA_HAS_ALNV_PS)
	(ISA_HAS_ADDR_PS, ISA_HAS_MULR_PS, ISA_HAS_ABS_PS, ISA_HAS_CABS_PS)
	(ISA_HAS_C_COND_4S, ISA_HAS_C_COND_PS, ISA_HAS_SCC_PS)
	(ISA_HAS_BC1ANY_PS, ISA_HAS_RSQRT_PS): New macros.
	* config/mips/mips.c (mips_rtx_costs): Update.
	(override_options): Enable paired-single float instructions when
	compiling for ST Loongson 2E/2F.
	(mips_vector_mode_supported_p): Update.
	* config/mips/mips-ps-3d.md (movcc_v2sf_<mode>, mips_cond_move_tf_ps)
	(movv2sfcc, mips_pul_ps, mips_pXX_ps, vec_initv2sf)
	(vec_initv2sf_internal, vec_extractv2sf, vec_setv2sf, mips_cvt_ps_s)
	(mips_cvt_s_pl, mips_cvt_s_pu, mips_alnv_ps, mips_addr_ps)
	(mips_cvt_pw_ps, mips_cvt_ps_pw, mips_mulr_ps, mips_abs_ps)
	(mips_cabs_cond_<fmt>, mips_c_cond_4s, mips_cabs_cond_4s)
	(mips_c_cond_ps, mips_cabs_cond_ps, s<code>_ps, bc1any4X, bc1any2X)
	(mips_rsqrt1_<fmt>, mips_rsqrt2_<fmt>, mips_recip1_<fmt>)
	(mips_recip2_<fmt>, vcondv2sf, sminv2sf3, smaxv2sf3):
	Update predicates.
	* config/mips/mips.md (MOVECC): Don't use FP conditional moves when
	compiling for ST Loongson 2E/2F.
	(madd<mode>): Rename to madd4<mode>.  Update.
	(madd3<mode>): New pattern.
	(msub<mode>): Rename to msub4<mode>.  Update.
	(msub3<mode>): New pattern.
	(nmadd<mode>): Rename to nmadd4<mode>.  Update.
	(nmadd3<mode>): New pattern.
	(nmadd<mode>_fastmath): Rename to nmadd4<mode>_fastmath.  Update.
	(nmadd3<mode>_fastmath): New pattern.
	(nmsub<mode>): Rename to nmsub4<mode>.  Update.
	(nmsub3<mode>): New pattern.
	(nmsub<mode>_fastmath): Rename to nmsub4<mode>_fastmath.  Update.
	(nmsub3<mode>_fastmath): New pattern.
	(mov<SCALARF:mode>_on_<MOVECC:mode>, mov<mode>cc): Update.

[-- Attachment #3: fsf-ls2ef-3-insns.patch --]
[-- Type: text/plain, Size: 25913 bytes --]

--- gcc/config/mips/mips-ps-3d.md	(/local/gcc-2)	(revision 381)
+++ gcc/config/mips/mips-ps-3d.md	(/local/gcc-3)	(revision 381)
@@ -25,7 +25,7 @@
 			  (const_int 0)])
 	 (match_operand:V2SF 2 "register_operand" "f,0")
 	 (match_operand:V2SF 3 "register_operand" "0,f")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_MOVCC_PS"
   "@
     mov%T4.ps\t%0,%2,%1
     mov%t4.ps\t%0,%3,%1"
@@ -38,7 +38,7 @@
 		      (match_operand:V2SF 2 "register_operand" "0,f")
 		      (match_operand:CCV2 3 "register_operand" "z,z")]
 		     UNSPEC_MOVE_TF_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_MOVCC_PS"
   "@
     movt.ps\t%0,%1,%3
     movf.ps\t%0,%2,%3"
@@ -51,7 +51,7 @@
 	(if_then_else:V2SF (match_dup 5)
 			   (match_operand:V2SF 2 "register_operand")
 			   (match_operand:V2SF 3 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_MOVCC_PS"
 {
   /* We can only support MOVN.PS and MOVZ.PS.
      NOTE: MOVT.PS and MOVF.PS have different semantics from MOVN.PS and 
@@ -72,7 +72,7 @@
 	 (match_operand:V2SF 1 "register_operand" "f")
 	 (match_operand:V2SF 2 "register_operand" "f")
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_PXX_PS"
   "pul.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -86,7 +86,7 @@
 			  (parallel [(const_int 1)
 				     (const_int 0)]))
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_PXX_PS"
   "puu.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -100,7 +100,7 @@
 				     (const_int 0)]))
 	 (match_operand:V2SF 2 "register_operand" "f")
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_PXX_PS"
   "pll.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -116,7 +116,7 @@
 			  (parallel [(const_int 1)
 				     (const_int 0)]))
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_PXX_PS"
   "plu.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -125,7 +125,7 @@
 (define_expand "vec_initv2sf"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CVT_PS"
 {
   rtx op0 = force_reg (SFmode, XVECEXP (operands[1], 0, 0));
   rtx op1 = force_reg (SFmode, XVECEXP (operands[1], 0, 1));
@@ -138,7 +138,7 @@
 	(vec_concat:V2SF
 	 (match_operand:SF 1 "register_operand" "f")
 	 (match_operand:SF 2 "register_operand" "f")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CVT_PS"
 {
   if (BYTES_BIG_ENDIAN)
     return "cvt.ps.s\t%0,%1,%2";
@@ -157,7 +157,7 @@
 	(vec_select:SF (match_operand:V2SF 1 "register_operand" "f")
 		       (parallel
 			[(match_operand 2 "const_0_or_1_operand" "")])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CVT_PS"
 {
   if (INTVAL (operands[2]) == !BYTES_BIG_ENDIAN)
     return "cvt.s.pu\t%0,%1";
@@ -174,7 +174,7 @@
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:SF 1 "register_operand")
    (match_operand 2 "const_0_or_1_operand")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_PXX_PS"
 {
   rtx temp;
 
@@ -194,7 +194,7 @@
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:SF 1 "register_operand")
    (match_operand:SF 2 "register_operand")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CVT_PS"
 {
   if (BYTES_BIG_ENDIAN)
     emit_insn (gen_vec_initv2sf_internal (operands[0], operands[1],
@@ -210,7 +210,7 @@
   [(set (match_operand:SF 0 "register_operand")
 	(vec_select:SF (match_operand:V2SF 1 "register_operand")
 		       (parallel [(match_dup 2)])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CVT_PS"
   { operands[2] = GEN_INT (BYTES_BIG_ENDIAN); })
 
 ; cvt.s.pu - Floating Point Convert Pair Upper to Single Floating Point
@@ -218,7 +218,7 @@
   [(set (match_operand:SF 0 "register_operand")
 	(vec_select:SF (match_operand:V2SF 1 "register_operand")
 		       (parallel [(match_dup 2)])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CVT_PS"
   { operands[2] = GEN_INT (!BYTES_BIG_ENDIAN); })
 
 ; alnv.ps - Floating Point Align Variable
@@ -228,7 +228,7 @@
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand:SI 3 "register_operand" "d")]
 		     UNSPEC_ALNV_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_ALNV_PS"
   "alnv.ps\t%0,%1,%2,%3"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -239,7 +239,7 @@
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")
 		      (match_operand:V2SF 2 "register_operand" "f")]
 		     UNSPEC_ADDR_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_ADDR_PS"
   "addr.ps\t%0,%1,%2"
   [(set_attr "type" "fadd")
    (set_attr "mode" "SF")])
@@ -249,7 +249,7 @@
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_CVT_PW_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CVT_PS"
   "cvt.pw.ps\t%0,%1"
   [(set_attr "type" "fcvt")
    (set_attr "mode" "SF")])
@@ -259,7 +259,7 @@
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_CVT_PS_PW))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CVT_PS"
   "cvt.ps.pw\t%0,%1"
   [(set_attr "type" "fcvt")
    (set_attr "mode" "SF")])
@@ -270,7 +270,7 @@
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")
 		      (match_operand:V2SF 2 "register_operand" "f")]
 		     UNSPEC_MULR_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_MULR_PS"
   "mulr.ps\t%0,%1,%2"
   [(set_attr "type" "fmul")
    (set_attr "mode" "SF")])
@@ -280,7 +280,7 @@
   [(set (match_operand:V2SF 0 "register_operand")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand")]
 		     UNSPEC_ABS_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_ABS_PS"
 {
   /* If we can ignore NaNs, this operation is equivalent to the
      rtl ABS code.  */
@@ -295,7 +295,7 @@
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_ABS_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_ABS_PS"
   "abs.ps\t%0,%1"
   [(set_attr "type" "fabs")
    (set_attr "mode" "SF")])
@@ -310,7 +310,7 @@
 		    (match_operand:SCALARF 2 "register_operand" "f")
 		    (match_operand 3 "const_int_operand" "")]
 		   UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CABS_PS"
   "cabs.%Y3.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -328,7 +328,7 @@
 		      (match_operand:V2SF 4 "register_operand" "f")
 		      (match_operand 5 "const_int_operand" "")]
 		     UNSPEC_C))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_C_COND_4S"
   "#"
   "&& reload_completed"
   [(set (match_dup 6)
@@ -357,7 +357,7 @@
 		      (match_operand:V2SF 4 "register_operand" "f")
 		      (match_operand 5 "const_int_operand" "")]
 		     UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CABS_PS"
   "#"
   "&& reload_completed"
   [(set (match_dup 6)
@@ -389,7 +389,7 @@
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand 3 "const_int_operand" "")]
 		     UNSPEC_C))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_C_COND_PS"
   "c.%Y3.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -400,7 +400,7 @@
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand 3 "const_int_operand" "")]
 		     UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_CABS_PS"
   "cabs.%Y3.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -416,7 +416,7 @@
 	   [(fcond (match_operand:V2SF 1 "register_operand" "f")
 		   (match_operand:V2SF 2 "register_operand" "f"))]
 	   UNSPEC_SCC))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_SCC_PS"
   "c.<fcond>.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -427,7 +427,7 @@
 	   [(swapped_fcond (match_operand:V2SF 1 "register_operand" "f")
 			   (match_operand:V2SF 2 "register_operand" "f"))]
 	   UNSPEC_SCC))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_SCC_PS"
   "c.<swapped_fcond>.ps\t%0,%2,%1"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -443,7 +443,7 @@
 			  (const_int 0))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_BC1ANY_PS"
   "%*bc1any4t\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -455,7 +455,7 @@
 			  (const_int -1))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_BC1ANY_PS"
   "%*bc1any4f\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -467,7 +467,7 @@
 			  (const_int 0))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_BC1ANY_PS"
   "%*bc1any2t\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -479,7 +479,7 @@
 			  (const_int -1))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_BC1ANY_PS"
   "%*bc1any2f\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -545,7 +545,7 @@
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
 		     UNSPEC_RSQRT1))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_RSQRT_PS"
   "rsqrt1.<fmt>\t%0,%1"
   [(set_attr "type" "frsqrt1")
    (set_attr "mode" "<UNITMODE>")])
@@ -555,7 +555,7 @@
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
 		      (match_operand:ANYF 2 "register_operand" "f")]
 		     UNSPEC_RSQRT2))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_RSQRT_PS"
   "rsqrt2.<fmt>\t%0,%1,%2"
   [(set_attr "type" "frsqrt2")
    (set_attr "mode" "<UNITMODE>")])
@@ -564,7 +564,7 @@
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
 		     UNSPEC_RECIP1))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_RSQRT_PS"
   "recip1.<fmt>\t%0,%1"
   [(set_attr "type" "frdiv1")
    (set_attr "mode" "<UNITMODE>")])
@@ -574,7 +574,7 @@
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
 		      (match_operand:ANYF 2 "register_operand" "f")]
 		     UNSPEC_RECIP2))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_RSQRT_PS"
   "recip2.<fmt>\t%0,%1,%2"
   [(set_attr "type" "frdiv2")
    (set_attr "mode" "<UNITMODE>")])
@@ -587,7 +587,7 @@
 	     (match_operand:V2SF 5 "register_operand")])
 	  (match_operand:V2SF 1 "register_operand")
 	  (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 GET_CODE (operands[3]), operands[4], operands[5]);
@@ -598,7 +598,7 @@
   [(set (match_operand:V2SF 0 "register_operand")
 	(smin:V2SF (match_operand:V2SF 1 "register_operand")
 		   (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 LE, operands[1], operands[2]);
@@ -609,7 +609,7 @@
   [(set (match_operand:V2SF 0 "register_operand")
 	(smax:V2SF (match_operand:V2SF 1 "register_operand")
 		   (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 LE, operands[2], operands[1]);
--- gcc/config/mips/mips.md	(/local/gcc-2)	(revision 381)
+++ gcc/config/mips/mips.md	(/local/gcc-3)	(revision 381)
@@ -512,7 +512,8 @@
 
 ;; This mode iterator allows :MOVECC to be used anywhere that a
 ;; conditional-move-type condition is needed.
-(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT") (CC "TARGET_HARD_FLOAT")])
+(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT")
+                              (CC "TARGET_HARD_FLOAT && !TARGET_LOONGSON_2EF")])
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
@@ -1902,33 +1903,55 @@
 
 ;; Floating point multiply accumulate instructions.
 
-(define_insn "*madd<mode>"
+(define_insn "*madd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "madd.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*msub<mode>"
+(define_insn "*madd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD
+   && !HONOR_NANS (<MODE>mode)"
+  "madd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*msub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			       (match_operand:ANYF 2 "register_operand" "f"))
 		    (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "msub.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>"
+(define_insn "*msub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			       (match_operand:ANYF 2 "register_operand" "f"))
+		    (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD
+   && !HONOR_NANS (<MODE>mode)"
+  "msub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (plus:ANYF
 		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1936,13 +1959,27 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>_fastmath"
+(define_insn "*nmadd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (plus:ANYF
+		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
 		    (match_operand:ANYF 2 "register_operand" "f"))
 	 (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1950,13 +1987,27 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>"
+(define_insn "*nmadd3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
+		    (match_operand:ANYF 2 "register_operand" "f"))
+	 (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (minus:ANYF
 		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 			      (match_operand:ANYF 3 "register_operand" "f"))
 		   (match_operand:ANYF 1 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1964,19 +2015,47 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>_fastmath"
+(define_insn "*nmsub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (minus:ANYF
+		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+			      (match_operand:ANYF 3 "register_operand" "f"))
+		   (match_operand:ANYF 1 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (match_operand:ANYF 1 "register_operand" "f")
 	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 		    (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
   "nmsub.<fmt>\t%0,%1,%2,%3"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (match_operand:ANYF 1 "register_operand" "f")
+	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+		    (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
 \f
 ;;
 ;;  ....................
@@ -6301,7 +6380,7 @@
 		 (const_int 0)])
 	 (match_operand:SCALARF 2 "register_operand" "f,0")
 	 (match_operand:SCALARF 3 "register_operand" "0,f")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
   "@
     mov%T4.<fmt>\t%0,%2,%1
     mov%t4.<fmt>\t%0,%3,%1"
@@ -6328,7 +6407,7 @@
 	(if_then_else:SCALARF (match_dup 5)
 			      (match_operand:SCALARF 2 "register_operand")
 			      (match_operand:SCALARF 3 "register_operand")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
 {
   mips_expand_conditional_move (operands);
   DONE;
--- gcc/config/mips/mips.c	(/local/gcc-2)	(revision 381)
+++ gcc/config/mips/mips.c	(/local/gcc-3)	(revision 381)
@@ -3287,7 +3287,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case MINUS:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && !HONOR_SIGNED_ZEROS (mode))
@@ -3338,7 +3338,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case NEG:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && HONOR_SIGNED_ZEROS (mode))
@@ -12641,8 +12641,9 @@ mips_override_options (void)
       && !TARGET_PAIRED_SINGLE_FLOAT)
     error ("%<-mips3d%> requires %<-mpaired-single%>");
 
-  /* If TARGET_MIPS3D, enable MASK_PAIRED_SINGLE_FLOAT.  */
-  if (TARGET_MIPS3D)
+  /* If TARGET_MIPS3D or compiling for Loongson,
+     enable MASK_PAIRED_SINGLE_FLOAT.  */
+  if (TARGET_MIPS3D || ISA_HAS_PAIRED_SINGLE_LS2)
     target_flags |= MASK_PAIRED_SINGLE_FLOAT;
 
   /* Make sure that when TARGET_PAIRED_SINGLE_FLOAT is true, TARGET_FLOAT64
--- gcc/config/mips/mips.h	(/local/gcc-2)	(revision 381)
+++ gcc/config/mips/mips.h	(/local/gcc-3)	(revision 381)
@@ -742,14 +742,19 @@ enum mips_code_readable_setting {
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS16)
 
-/* ISA has the conditional move instructions introduced in mips4.  */
-#define ISA_HAS_CONDMOVE	((ISA_MIPS4				\
+/* ISA has the floating-point conditional move instructions introduced
+   in mips4.  */
+#define ISA_HAS_FP_CONDMOVE	((ISA_MIPS4				\
 				  || ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS5500			\
 				 && !TARGET_MIPS16)
 
+/* ISA has the integer conditional move instructions introduced in mips4 and
+   ST Loongson 2E/2F.  */
+#define ISA_HAS_CONDMOVE        (ISA_HAS_FP_CONDMOVE || TARGET_LOONGSON_2EF)
+
 /* ISA has LDC1 and SDC1.  */
 #define ISA_HAS_LDC1_SDC1	(!ISA_MIPS1 && !TARGET_MIPS16)
 
@@ -768,8 +773,14 @@ enum mips_code_readable_setting {
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS16)
 
+#define ISA_HAS_PAIRED_SINGLE_LS2					\
+                                (TARGET_LOONGSON_2EF			\
+				 && TARGET_HARD_FLOAT			\
+				 && TARGET_FLOAT64)
+
 /* ISA has paired-single instructions.  */
-#define ISA_HAS_PAIRED_SINGLE	(ISA_MIPS32R2 || ISA_MIPS64)
+#define ISA_HAS_PAIRED_SINGLE	(ISA_MIPS32R2 || ISA_MIPS64	\
+				 || ISA_HAS_PAIRED_SINGLE_LS2)
 
 /* ISA has conditional trap instructions.  */
 #define ISA_HAS_COND_TRAP	(!ISA_MIPS1				\
@@ -784,14 +795,28 @@ enum mips_code_readable_setting {
 /* Integer multiply-accumulate instructions should be generated.  */
 #define GENERATE_MADD_MSUB      (ISA_HAS_MADD_MSUB && !TUNE_74K)
 
-/* ISA has floating-point nmadd and nmsub instructions for mode MODE.  */
-#define ISA_HAS_NMADD_NMSUB(MODE) \
+/* ISA has floating-point madd and msub instructions 'd = a * b [+-] c'.  */
+#define ISA_HAS_FP_MADD4_MSUB4  ISA_HAS_FP4
+
+/* ISA has floating-point madd and msub instructions 'c [+-]= a * b'.  */
+#define ISA_HAS_FP_MADD3_MSUB3  (!ISA_HAS_FP_MADD4_MSUB4	\
+				 && TARGET_LOONGSON_2EF)
+
+/* ISA has floating-point nmadd and nmsub instructions
+   'd = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD4_NMSUB4(MODE)					\
 				((ISA_MIPS4				\
 				  || (ISA_MIPS32R2 && (MODE) == V2SFmode) \
 				  || ISA_MIPS64)			\
 				 && (!TARGET_MIPS5400 || TARGET_MAD)	\
 				 && !TARGET_MIPS16)
 
+/* ISA has floating-point nmadd and nmsub instructions
+   'c = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD3_NMSUB3(MODE)					\
+                                (!ISA_HAS_NMADD4_NMSUB4 (MODE)		\
+				 && TARGET_LOONGSON_2EF)
+
 /* ISA has count leading zeroes/ones instruction (not implemented).  */
 #define ISA_HAS_CLZ_CLO		((ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
@@ -930,6 +955,52 @@ enum mips_code_readable_setting {
   (target_flags_explicit & MASK_LLSC	\
    ? TARGET_LLSC && !TARGET_MIPS16	\
    : ISA_HAS_LL_SC)
+
+/* Predicates for paired-single float instructions.
+   ST Loongson 2E/2F CPUs support only a subset of the
+   paired-single float instructions, so we use the predicates below
+   to disable the unsupported ones.  */
+
+#define ISA_HAS_MOVCC_PS   (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_PXX_PS     (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_CVT_PS     (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_ALNV_PS    (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_ADDR_PS    (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_MULR_PS    (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+
+#define ISA_HAS_ABS_PS     (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT)
+
+#define ISA_HAS_CABS_PS    (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_C_COND_4S  (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+
+#define ISA_HAS_C_COND_PS  (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT)
+#define ISA_HAS_SCC_PS     (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT)
+
+#define ISA_HAS_BC1ANY_PS  (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_RSQRT_PS   (TARGET_HARD_FLOAT			\
+			    && TARGET_PAIRED_SINGLE_FLOAT	\
+			    && !TARGET_LOONGSON_2EF)
 \f
 /* Add -G xx support.  */
 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-06-08 19:59     ` Maxim Kuvyrkov
@ 2008-06-09 13:16       ` Maxim Kuvyrkov
  2008-06-09 17:45         ` Richard Sandiford
  2008-06-09 17:39       ` Richard Sandiford
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-09 13:16 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Maxim Kuvyrkov wrote:
> Richard Sandiford wrote:
> 
> ...
> 
>>> ;; This mode macro allows :ANYF_MIPS5_LS2 to be used wherever
>>> ;; a scalar or Loongson2 vector floating-point mode is allowed.
>>> (define_mode_macro ANYF_MIPS5_LS2
>>>   [(SF "TARGET_HARD_FLOAT")
>>>    (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
>>>    (V2SF "TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2")])
>>>
>>> Hence the instructions from the MIPS5 ISA, which happen to be supported 
>>> by Loongson2, are declared with the ANYF_MIPS5_LS2 mode_iterator.  I 
>>> don't like this, but can't figure out an alternative way to name / 
>>> describe these instructions.
>>
>> I'd prefer to keep TARGET_PAIRED_SINGLE_FLOAT for the cases that
>> are common between Loongson and non-Loongson mode (i.e. the cases in
>> which .ps is available in some form).  Then add new ISA_HAS_FOO
>> macros for each class of instruction that Loongson doesn't have.
>> E.g.
>>
>>     ISA_HAS_PXX_PS
>>
>> for PUU.PS & co.
>>
>> Feel free to run a list of ISA_HAS_* macros by me before testing.
> 
> How does the following patch look?  Now the patch disables certain .ps 
> instructions when compiling for Loongson and keeps everything else 
> enabled (in contrast, the previous version enabled the instructions that 
> Loongson does support and disabled everything else).
> 
> I also noticed that the previous version of the patch didn't enable the 
> [n]madd3/msub3 instructions, so I fixed that in this one.

During testing I discovered that the last version of the patch doesn't 
properly handle paired-single float builtins.

The last field in mips_builtin_description is target_flags.  For all 
builtins for paired-single float instructions, target_flags is 
MASK_PAIRED_SINGLE_FLOAT.  Because this mask tells the backend that all 
paired-single float instructions are supported, Loongson gets too many 
intrinsics that it cannot back up.

I think I misunderstood your original suggestion to make 
MASK_PAIRED_SINGLE_FLOAT specify the instructions that both Loongson and 
generic MIPS5 have.  If so, then we need a new mask to use with the 
builtins that MIPS5 supports and Loongson doesn't.
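
For concreteness, a builtin such as pul.ps (which Loongson does not 
have) currently gets an entry along these lines; the exact function 
type here is from memory, so treat it as illustrative only:

  DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF,
                  MASK_PAIRED_SINGLE_FLOAT),

so the only thing gating it is MASK_PAIRED_SINGLE_FLOAT, which is now 
set for Loongson as well.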

--
Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-06-08 19:59     ` Maxim Kuvyrkov
  2008-06-09 13:16       ` Maxim Kuvyrkov
@ 2008-06-09 17:39       ` Richard Sandiford
  2008-06-17 19:52         ` Maxim Kuvyrkov
  2008-06-18 19:07         ` Maxim Kuvyrkov
  1 sibling, 2 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-09 17:39 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>>> ;; This mode macro allows :ANYF_MIPS5_LS2 to be used wherever
>>> ;; a scalar or Loongson2 vector floating-point mode is allowed.
>>> (define_mode_macro ANYF_MIPS5_LS2
>>>   [(SF "TARGET_HARD_FLOAT")
>>>    (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")
>>>    (V2SF "TARGET_PAIRED_SINGLE_FLOAT_MIPS5_LS2")])
>>>
>>> Hence the instructions from the MIPS5 ISA, which happen to be supported by 
>>> Loongson2, are declared with the ANYF_MIPS5_LS2 mode_iterator.  I don't like 
>>> this, but can't figure out an alternative way to name / describe these 
>>> instructions.
>> 
>> I'd prefer to keep TARGET_PAIRED_SINGLE_FLOAT for the cases that
>> are common between Loongson and non-Loongson mode (i.e. the cases in
>> which .ps is available in some form).  Then add new ISA_HAS_FOO
>> macros for each class of instruction that Loongson doesn't have.
>> E.g.
>> 
>>     ISA_HAS_PXX_PS
>> 
>> for PUU.PS & co.
>> 
>> Feel free to run a list of ISA_HAS_* macros by me before testing.
>
> How does the following patch look?  Now the patch disables certain .ps 
> instructions when compiling for Loongson and keeps everything else 
> enabled (in contrast, the previous version enabled the instructions that 
> Loongson does support and disabled everything else).
>
> I also noticed that the previous version of the patch didn't enable the 
> [n]madd3/msub3 instructions, so I fixed that in this one.

Looks pretty good, thanks.  I'll reply to your follow-on post
about built-in functions separately.

The classification looks good.  However, the macros were simply
supposed to replace uses of TARGET_PAIRED_SINGLE_FLOAT; they weren't
meant to include the TARGET_HARD_FLOAT condition.

Looking back, my review didn't say either way, sorry, so I can
understand why you did what you did.  I take the blame for that and
I've attached an adjusted patch below.
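
To take ISA_HAS_MOVCC_PS as an example, instead of

  #define ISA_HAS_MOVCC_PS   (TARGET_HARD_FLOAT                  \
                              && TARGET_PAIRED_SINGLE_FLOAT      \
                              && !TARGET_LOONGSON_2EF)

the adjusted patch has simply

  #define ISA_HAS_MOVCC_PS   (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)

with the .ps patterns themselves keeping their TARGET_HARD_FLOAT checks.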

> @@ -768,8 +773,14 @@ enum mips_code_readable_setting {
>  				  || ISA_MIPS64)			\
>  				 && !TARGET_MIPS16)
>  
> +#define ISA_HAS_PAIRED_SINGLE_LS2					\
> +                                (TARGET_LOONGSON_2EF			\
> +				 && TARGET_HARD_FLOAT			\
> +				 && TARGET_FLOAT64)
> +
>  /* ISA has paired-single instructions.  */
> -#define ISA_HAS_PAIRED_SINGLE	(ISA_MIPS32R2 || ISA_MIPS64)
> +#define ISA_HAS_PAIRED_SINGLE	(ISA_MIPS32R2 || ISA_MIPS64	\
> +				 || ISA_HAS_PAIRED_SINGLE_LS2)

Just check TARGET_LOONGSON_2EF in ISA_HAS_PAIRED_SINGLE.  In the
other use of ISA_HAS_PAIRED_SINGLE_LS2, here...

> @@ -12641,8 +12641,9 @@ mips_override_options (void)
>        && !TARGET_PAIRED_SINGLE_FLOAT)
>      error ("%<-mips3d%> requires %<-mpaired-single%>");
>  
> -  /* If TARGET_MIPS3D, enable MASK_PAIRED_SINGLE_FLOAT.  */
> -  if (TARGET_MIPS3D)
> +  /* If TARGET_MIPS3D or compiling for Loongson,
> +     enable MASK_PAIRED_SINGLE_FLOAT.  */
> +  if (TARGET_MIPS3D || ISA_HAS_PAIRED_SINGLE_LS2)
>      target_flags |= MASK_PAIRED_SINGLE_FLOAT;
>  
>    /* Make sure that when TARGET_PAIRED_SINGLE_FLOAT is true, TARGET_FLOAT64

...the condition should not technically depend on TARGET_HARD_FLOAT,
but TARGET_HARD_FLOAT_ABI.  This only makes a difference for MIPS16,
which Loongson probably doesn't support, but the principle still stands.

I suggest:

  /* Make sure that when TARGET_PAIRED_SINGLE_FLOAT is true, TARGET_FLOAT64
     and TARGET_HARD_FLOAT_ABI are both true.  */
  if (TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI)
    {
      /* Make sure that the user didn't turn off paired single support when
	 MIPS-3D support is requested.  */
      if (TARGET_MIPS3D
	  && (target_flags_explicit & MASK_PAIRED_SINGLE_FLOAT)
	  && !TARGET_PAIRED_SINGLE_FLOAT)
	error ("%<-mips3d%> requires %<-mpaired-single%>");

      /* We can use paired-single instructions.  Select them for targets
	 that always provide them.  */
      if (TARGET_LOONGSON_2EF || TARGET_MIPS3D)
	target_flags |= MASK_PAIRED_SINGLE_FLOAT;
    }
  else
    {
      const char *missing_option;

      /* If we need TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI, pick the
	 most likely missing option.  -mfp64 only makes sense for
	 -mhard-float, so if both conditions are false, warn about
	 -mhard-float.  */
      missing_option = (TARGET_HARD_FLOAT_ABI ? "-mfp64" : "-mhard-float");
      if (TARGET_MIPS3D)
	error ("%qs must be used with %qs", "-mips3d", missing_option);
      else if (TARGET_PAIRED_SINGLE_FLOAT)
	error ("%qs must be used with %qs", "-mpaired-single", missing_option);
    }


> +/* ISA has floating-point madd and msub instructions 'd = a * b [+-] c'.  */
> +#define ISA_HAS_FP_MADD4_MSUB4  ISA_HAS_FP4
> +
> +/* ISA has floating-point madd and msub instructions 'c [+-]= a * b'.  */
> +#define ISA_HAS_FP_MADD3_MSUB3  (!ISA_HAS_FP_MADD4_MSUB4	\
> +				 && TARGET_LOONGSON_2EF)

I realise that you're trying to make sure that the two macros are
mutually-exclusive, but they should be naturally exclusive without
the explicit check.  So:

#define ISA_HAS_FP_MADD3_MSUB3  TARGET_LOONGSON_2EF

Same...

> +/* ISA has floating-point nmadd and nmsub instructions
> +   'd = -(a * b) [+-] c'.  */
> +#define ISA_HAS_NMADD4_NMSUB4(MODE)					\
>  				((ISA_MIPS4				\
>  				  || (ISA_MIPS32R2 && (MODE) == V2SFmode) \
>  				  || ISA_MIPS64)			\
>  				 && (!TARGET_MIPS5400 || TARGET_MAD)	\
>  				 && !TARGET_MIPS16)
>  
> +/* ISA has floating-point nmadd and nmsub instructions
> +   'c = -(a * b) [+-] c'.  */
> +#define ISA_HAS_NMADD3_NMSUB3(MODE)					\
> +                                (!ISA_HAS_NMADD4_NMSUB4 (MODE)		\
> +				 && TARGET_LOONGSON_2EF)

...here.

>  /* ISA has count leading zeroes/ones instruction (not implemented).  */
>  #define ISA_HAS_CLZ_CLO		((ISA_MIPS32				\
>  				  || ISA_MIPS32R2			\
> @@ -930,6 +955,52 @@ enum mips_code_readable_setting {
>    (target_flags_explicit & MASK_LLSC	\
>     ? TARGET_LLSC && !TARGET_MIPS16	\
>     : ISA_HAS_LL_SC)
> +
> +/* Predicates for paired-single float instructions.
> +   ST Loongson 2E/2F CPUs support only a subset of the
> +   paired-single float instructions, so we use the predicates below
> +   to disable the unsupported ones.  */
> +
> +#define ISA_HAS_MOVCC_PS   (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +#define ISA_HAS_PXX_PS     (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +#define ISA_HAS_CVT_PS     (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +#define ISA_HAS_ALNV_PS    (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +#define ISA_HAS_ADDR_PS    (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +#define ISA_HAS_MULR_PS    (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +
> +#define ISA_HAS_ABS_PS     (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT)
> +
> +#define ISA_HAS_CABS_PS    (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +#define ISA_HAS_C_COND_4S  (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +
> +#define ISA_HAS_C_COND_PS  (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT)
> +#define ISA_HAS_SCC_PS     (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT)
> +
> +#define ISA_HAS_BC1ANY_PS  (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)
> +#define ISA_HAS_RSQRT_PS   (TARGET_HARD_FLOAT			\
> +			    && TARGET_PAIRED_SINGLE_FLOAT	\
> +			    && !TARGET_LOONGSON_2EF)

The ordering and blanks look odd to me here.  How about putting all the
Loongson ones first, then a blank line, then all the non-Loongson ones?

> @@ -1902,33 +1903,55 @@
>  
>  ;; Floating point multiply accumulate instructions.
>  
> -(define_insn "*madd<mode>"
> +(define_insn "*madd4<mode>"
>    [(set (match_operand:ANYF 0 "register_operand" "=f")
>  	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
>  			      (match_operand:ANYF 2 "register_operand" "f"))
>  		   (match_operand:ANYF 3 "register_operand" "f")))]
> -  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
> +  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
>    "madd.<fmt>\t%0,%3,%1,%2"
>    [(set_attr "type" "fmadd")
>     (set_attr "mode" "<UNITMODE>")])
>  
> -(define_insn "*msub<mode>"
> +(define_insn "*madd3<mode>"
> +  [(set (match_operand:ANYF 0 "register_operand" "=f")
> +	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
> +			      (match_operand:ANYF 2 "register_operand" "f"))
> +		   (match_operand:ANYF 3 "register_operand" "0")))]
> +  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD
> +   && !HONOR_NANS (<MODE>mode)"
> +  "madd.<fmt>\t%0,%1,%2"
> +  [(set_attr "type" "fmadd")
> +   (set_attr "mode" "<UNITMODE>")])
> +
> +(define_insn "*msub4<mode>"
>    [(set (match_operand:ANYF 0 "register_operand" "=f")
>  	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
>  			       (match_operand:ANYF 2 "register_operand" "f"))
>  		    (match_operand:ANYF 3 "register_operand" "f")))]
> -  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
> +  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
>    "msub.<fmt>\t%0,%3,%1,%2"
>    [(set_attr "type" "fmadd")
>     (set_attr "mode" "<UNITMODE>")])
>  
> -(define_insn "*nmadd<mode>"
> +(define_insn "*msub3<mode>"
> +  [(set (match_operand:ANYF 0 "register_operand" "=f")
> +	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
> +			       (match_operand:ANYF 2 "register_operand" "f"))
> +		    (match_operand:ANYF 3 "register_operand" "0")))]
> +  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD
> +   && !HONOR_NANS (<MODE>mode)"
> +  "msub.<fmt>\t%0,%1,%2"
> +  [(set_attr "type" "fmadd")
> +   (set_attr "mode" "<UNITMODE>")])
> +

I'd like a comment to say why we need !HONOR_NANS (<MODE>mode) for
the 3-operand form but not the 4-operand form.

Like I say, here's an adjusted patch.  I've made all the changes
except for the last one (because I don't know the reason why we
need the HONOR_NANS check).

If the revised patch is OK with you, then it's OK to commit after testing.
Please run the HONOR_NANS comment past me first, though.  (Alternatively,
if we don't need the extra !HONOR_NANS condition, the patch is OK with
that removed.)

Thanks,
Richard


Index: gcc/config/mips/mips-ps-3d.md
===================================================================
--- gcc/config/mips/mips-ps-3d.md	2008-06-09 18:20:40.000000000 +0100
+++ gcc/config/mips/mips-ps-3d.md	2008-06-09 18:25:04.000000000 +0100
@@ -25,7 +25,7 @@ (define_insn "*movcc_v2sf_<mode>"
 			  (const_int 0)])
 	 (match_operand:V2SF 2 "register_operand" "f,0")
 	 (match_operand:V2SF 3 "register_operand" "0,f")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
   "@
     mov%T4.ps\t%0,%2,%1
     mov%t4.ps\t%0,%3,%1"
@@ -38,7 +38,7 @@ (define_insn "mips_cond_move_tf_ps"
 		      (match_operand:V2SF 2 "register_operand" "0,f")
 		      (match_operand:CCV2 3 "register_operand" "z,z")]
 		     UNSPEC_MOVE_TF_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
   "@
     movt.ps\t%0,%1,%3
     movf.ps\t%0,%2,%3"
@@ -51,7 +51,7 @@ (define_expand "movv2sfcc"
 	(if_then_else:V2SF (match_dup 5)
 			   (match_operand:V2SF 2 "register_operand")
 			   (match_operand:V2SF 3 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
 {
   /* We can only support MOVN.PS and MOVZ.PS.
      NOTE: MOVT.PS and MOVF.PS have different semantics from MOVN.PS and 
@@ -72,7 +72,7 @@ (define_insn "mips_pul_ps"
 	 (match_operand:V2SF 1 "register_operand" "f")
 	 (match_operand:V2SF 2 "register_operand" "f")
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
   "pul.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -86,7 +86,7 @@ (define_insn "mips_puu_ps"
 			  (parallel [(const_int 1)
 				     (const_int 0)]))
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
   "puu.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -100,7 +100,7 @@ (define_insn "mips_pll_ps"
 				     (const_int 0)]))
 	 (match_operand:V2SF 2 "register_operand" "f")
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
   "pll.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -116,7 +116,7 @@ (define_insn "mips_plu_ps"
 			  (parallel [(const_int 1)
 				     (const_int 0)]))
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
   "plu.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -125,7 +125,7 @@ (define_insn "mips_plu_ps"
 (define_expand "vec_initv2sf"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
 {
   rtx op0 = force_reg (SFmode, XVECEXP (operands[1], 0, 0));
   rtx op1 = force_reg (SFmode, XVECEXP (operands[1], 0, 1));
@@ -138,7 +138,7 @@ (define_insn "vec_initv2sf_internal"
 	(vec_concat:V2SF
 	 (match_operand:SF 1 "register_operand" "f")
 	 (match_operand:SF 2 "register_operand" "f")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
 {
   if (BYTES_BIG_ENDIAN)
     return "cvt.ps.s\t%0,%1,%2";
@@ -157,7 +157,7 @@ (define_insn "vec_extractv2sf"
 	(vec_select:SF (match_operand:V2SF 1 "register_operand" "f")
 		       (parallel
 			[(match_operand 2 "const_0_or_1_operand" "")])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
 {
   if (INTVAL (operands[2]) == !BYTES_BIG_ENDIAN)
     return "cvt.s.pu\t%0,%1";
@@ -174,7 +174,7 @@ (define_expand "vec_setv2sf"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:SF 1 "register_operand")
    (match_operand 2 "const_0_or_1_operand")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
 {
   rtx temp;
 
@@ -194,7 +194,7 @@ (define_expand "mips_cvt_ps_s"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:SF 1 "register_operand")
    (match_operand:SF 2 "register_operand")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
 {
   if (BYTES_BIG_ENDIAN)
     emit_insn (gen_vec_initv2sf_internal (operands[0], operands[1],
@@ -210,7 +210,7 @@ (define_expand "mips_cvt_s_pl"
   [(set (match_operand:SF 0 "register_operand")
 	(vec_select:SF (match_operand:V2SF 1 "register_operand")
 		       (parallel [(match_dup 2)])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
   { operands[2] = GEN_INT (BYTES_BIG_ENDIAN); })
 
 ; cvt.s.pu - Floating Point Convert Pair Upper to Single Floating Point
@@ -218,7 +218,7 @@ (define_expand "mips_cvt_s_pu"
   [(set (match_operand:SF 0 "register_operand")
 	(vec_select:SF (match_operand:V2SF 1 "register_operand")
 		       (parallel [(match_dup 2)])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
   { operands[2] = GEN_INT (!BYTES_BIG_ENDIAN); })
 
 ; alnv.ps - Floating Point Align Variable
@@ -228,7 +228,7 @@ (define_insn "mips_alnv_ps"
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand:SI 3 "register_operand" "d")]
 		     UNSPEC_ALNV_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_ALNV_PS"
   "alnv.ps\t%0,%1,%2,%3"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -239,7 +239,7 @@ (define_insn "mips_addr_ps"
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")
 		      (match_operand:V2SF 2 "register_operand" "f")]
 		     UNSPEC_ADDR_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_ADDR_PS"
   "addr.ps\t%0,%1,%2"
   [(set_attr "type" "fadd")
    (set_attr "mode" "SF")])
@@ -249,7 +249,7 @@ (define_insn "mips_cvt_pw_ps"
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_CVT_PW_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
   "cvt.pw.ps\t%0,%1"
   [(set_attr "type" "fcvt")
    (set_attr "mode" "SF")])
@@ -259,7 +259,7 @@ (define_insn "mips_cvt_ps_pw"
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_CVT_PS_PW))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
   "cvt.ps.pw\t%0,%1"
   [(set_attr "type" "fcvt")
    (set_attr "mode" "SF")])
@@ -270,7 +270,7 @@ (define_insn "mips_mulr_ps"
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")
 		      (match_operand:V2SF 2 "register_operand" "f")]
 		     UNSPEC_MULR_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MULR_PS"
   "mulr.ps\t%0,%1,%2"
   [(set_attr "type" "fmul")
    (set_attr "mode" "SF")])
@@ -280,7 +280,7 @@ (define_expand "mips_abs_ps"
   [(set (match_operand:V2SF 0 "register_operand")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand")]
 		     UNSPEC_ABS_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_ABS_PS"
 {
   /* If we can ignore NaNs, this operation is equivalent to the
      rtl ABS code.  */
@@ -295,7 +295,7 @@ (define_insn "*mips_abs_ps"
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_ABS_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_ABS_PS"
   "abs.ps\t%0,%1"
   [(set_attr "type" "fabs")
    (set_attr "mode" "SF")])
@@ -310,7 +310,7 @@ (define_insn "mips_cabs_cond_<fmt>"
 		    (match_operand:SCALARF 2 "register_operand" "f")
 		    (match_operand 3 "const_int_operand" "")]
 		   UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CABS_PS"
   "cabs.%Y3.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -328,7 +328,7 @@ (define_insn_and_split "mips_c_cond_4s"
 		      (match_operand:V2SF 4 "register_operand" "f")
 		      (match_operand 5 "const_int_operand" "")]
 		     UNSPEC_C))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_C_COND_4S"
   "#"
   "&& reload_completed"
   [(set (match_dup 6)
@@ -357,7 +357,7 @@ (define_insn_and_split "mips_cabs_cond_4
 		      (match_operand:V2SF 4 "register_operand" "f")
 		      (match_operand 5 "const_int_operand" "")]
 		     UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CABS_PS"
   "#"
   "&& reload_completed"
   [(set (match_dup 6)
@@ -389,7 +389,7 @@ (define_insn "mips_c_cond_ps"
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand 3 "const_int_operand" "")]
 		     UNSPEC_C))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_C_COND_PS"
   "c.%Y3.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -400,7 +400,7 @@ (define_insn "mips_cabs_cond_ps"
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand 3 "const_int_operand" "")]
 		     UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CABS_PS"
   "cabs.%Y3.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -416,7 +416,7 @@ (define_insn "s<code>_ps"
 	   [(fcond (match_operand:V2SF 1 "register_operand" "f")
 		   (match_operand:V2SF 2 "register_operand" "f"))]
 	   UNSPEC_SCC))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_SCC_PS"
   "c.<fcond>.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -427,7 +427,7 @@ (define_insn "s<code>_ps"
 	   [(swapped_fcond (match_operand:V2SF 1 "register_operand" "f")
 			   (match_operand:V2SF 2 "register_operand" "f"))]
 	   UNSPEC_SCC))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_SCC_PS"
   "c.<swapped_fcond>.ps\t%0,%2,%1"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -443,7 +443,7 @@ (define_insn "bc1any4t"
 			  (const_int 0))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_BC1ANY_PS"
   "%*bc1any4t\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -455,7 +455,7 @@ (define_insn "bc1any4f"
 			  (const_int -1))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_BC1ANY_PS"
   "%*bc1any4f\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -467,7 +467,7 @@ (define_insn "bc1any2t"
 			  (const_int 0))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_BC1ANY_PS"
   "%*bc1any2t\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -479,7 +479,7 @@ (define_insn "bc1any2f"
 			  (const_int -1))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_BC1ANY_PS"
   "%*bc1any2f\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -545,7 +545,7 @@ (define_insn "mips_rsqrt1_<fmt>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
 		     UNSPEC_RSQRT1))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_RSQRT_PS"
   "rsqrt1.<fmt>\t%0,%1"
   [(set_attr "type" "frsqrt1")
    (set_attr "mode" "<UNITMODE>")])
@@ -555,7 +555,7 @@ (define_insn "mips_rsqrt2_<fmt>"
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
 		      (match_operand:ANYF 2 "register_operand" "f")]
 		     UNSPEC_RSQRT2))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_RSQRT_PS"
   "rsqrt2.<fmt>\t%0,%1,%2"
   [(set_attr "type" "frsqrt2")
    (set_attr "mode" "<UNITMODE>")])
@@ -564,7 +564,7 @@ (define_insn "mips_recip1_<fmt>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
 		     UNSPEC_RECIP1))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_RSQRT_PS"
   "recip1.<fmt>\t%0,%1"
   [(set_attr "type" "frdiv1")
    (set_attr "mode" "<UNITMODE>")])
@@ -574,7 +574,7 @@ (define_insn "mips_recip2_<fmt>"
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
 		      (match_operand:ANYF 2 "register_operand" "f")]
 		     UNSPEC_RECIP2))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_RSQRT_PS"
   "recip2.<fmt>\t%0,%1,%2"
   [(set_attr "type" "frdiv2")
    (set_attr "mode" "<UNITMODE>")])
@@ -587,7 +587,7 @@ (define_expand "vcondv2sf"
 	     (match_operand:V2SF 5 "register_operand")])
 	  (match_operand:V2SF 1 "register_operand")
 	  (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 GET_CODE (operands[3]), operands[4], operands[5]);
@@ -598,7 +598,7 @@ (define_expand "sminv2sf3"
   [(set (match_operand:V2SF 0 "register_operand")
 	(smin:V2SF (match_operand:V2SF 1 "register_operand")
 		   (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 LE, operands[1], operands[2]);
@@ -609,7 +609,7 @@ (define_expand "smaxv2sf3"
   [(set (match_operand:V2SF 0 "register_operand")
 	(smax:V2SF (match_operand:V2SF 1 "register_operand")
 		   (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 LE, operands[2], operands[1]);
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2008-06-09 18:20:40.000000000 +0100
+++ gcc/config/mips/mips.c	2008-06-09 18:31:32.000000000 +0100
@@ -3286,7 +3286,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case MINUS:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && !HONOR_SIGNED_ZEROS (mode))
@@ -3337,7 +3337,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case NEG:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && HONOR_SIGNED_ZEROS (mode))
@@ -12440,23 +12440,34 @@ mips_override_options (void)
   REAL_MODE_FORMAT (TFmode) = &MIPS_TFMODE_FORMAT;
 #endif
 
-  /* Make sure that the user didn't turn off paired single support when
-     MIPS-3D support is requested.  */
-  if (TARGET_MIPS3D
-      && (target_flags_explicit & MASK_PAIRED_SINGLE_FLOAT)
-      && !TARGET_PAIRED_SINGLE_FLOAT)
-    error ("%<-mips3d%> requires %<-mpaired-single%>");
-
-  /* If TARGET_MIPS3D, enable MASK_PAIRED_SINGLE_FLOAT.  */
-  if (TARGET_MIPS3D)
-    target_flags |= MASK_PAIRED_SINGLE_FLOAT;
-
-  /* Make sure that when TARGET_PAIRED_SINGLE_FLOAT is true, TARGET_FLOAT64
-     and TARGET_HARD_FLOAT_ABI are both true.  */
-  if (TARGET_PAIRED_SINGLE_FLOAT && !(TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI))
-    error ("%qs must be used with %qs",
-	   TARGET_MIPS3D ? "-mips3d" : "-mpaired-single",
-	   TARGET_HARD_FLOAT_ABI ? "-mfp64" : "-mhard-float");
+  if (TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI)
+    {
+      /* Make sure that the user didn't turn off paired single support when
+	 MIPS-3D support is requested.  */
+      if (TARGET_MIPS3D
+	  && (target_flags_explicit & MASK_PAIRED_SINGLE_FLOAT)
+	  && !TARGET_PAIRED_SINGLE_FLOAT)
+	error ("%<-mips3d%> requires %<-mpaired-single%>");
+
+      /* We can use paired-single instructions.  Select them for targets
+	 that always provide them.  */
+      if (TARGET_LOONGSON_2EF || TARGET_MIPS3D)
+	target_flags |= MASK_PAIRED_SINGLE_FLOAT;
+    }
+  else
+    {
+      const char *missing_option;
+
+      /* If we need TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI, pick the
+	 most likely missing option.  -mfp64 only makes sense for
+	 -mhard-float, so if both conditions are false, complain about
+	 -mhard-float.  */
+      missing_option = (TARGET_HARD_FLOAT_ABI ? "-mfp64" : "-mhard-float");
+      if (TARGET_MIPS3D)
+	error ("%qs must be used with %qs", "-mips3d", missing_option);
+      else if (TARGET_PAIRED_SINGLE_FLOAT)
+	error ("%qs must be used with %qs", "-mpaired-single", missing_option);
+    }
 
   /* Make sure that the ISA supports TARGET_PAIRED_SINGLE_FLOAT when it is
      enabled.  */
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	2008-06-09 18:20:40.000000000 +0100
+++ gcc/config/mips/mips.h	2008-06-09 18:25:04.000000000 +0100
@@ -733,14 +733,19 @@ #define ISA_HAS_MUL3		((TARGET_MIPS3900 
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS16)
 
-/* ISA has the conditional move instructions introduced in mips4.  */
-#define ISA_HAS_CONDMOVE	((ISA_MIPS4				\
+/* ISA has the floating-point conditional move instructions introduced
+   in mips4.  */
+#define ISA_HAS_FP_CONDMOVE	((ISA_MIPS4				\
 				  || ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS5500			\
 				 && !TARGET_MIPS16)
 
+/* ISA has the integer conditional move instructions introduced in mips4 and
+   ST Loongson 2E/2F.  */
+#define ISA_HAS_CONDMOVE        (ISA_HAS_FP_CONDMOVE || TARGET_LOONGSON_2EF)
+
 /* ISA has LDC1 and SDC1.  */
 #define ISA_HAS_LDC1_SDC1	(!ISA_MIPS1 && !TARGET_MIPS16)
 
@@ -760,7 +765,9 @@ #define ISA_HAS_FP4		((ISA_MIPS4				\
 				 && !TARGET_MIPS16)
 
 /* ISA has paired-single instructions.  */
-#define ISA_HAS_PAIRED_SINGLE	(ISA_MIPS32R2 || ISA_MIPS64)
+#define ISA_HAS_PAIRED_SINGLE	(ISA_MIPS32R2				\
+				 || ISA_MIPS64				\
+				 || TARGET_LOONGSON_2EF)
 
 /* ISA has conditional trap instructions.  */
 #define ISA_HAS_COND_TRAP	(!ISA_MIPS1				\
@@ -775,14 +782,26 @@ #define ISA_HAS_MADD_MSUB	((ISA_MIPS32		
 /* Integer multiply-accumulate instructions should be generated.  */
 #define GENERATE_MADD_MSUB      (ISA_HAS_MADD_MSUB && !TUNE_74K)
 
-/* ISA has floating-point nmadd and nmsub instructions for mode MODE.  */
-#define ISA_HAS_NMADD_NMSUB(MODE) \
+/* ISA has floating-point madd and msub instructions 'd = a * b [+-] c'.  */
+#define ISA_HAS_FP_MADD4_MSUB4  ISA_HAS_FP4
+
+/* ISA has floating-point madd and msub instructions 'c [+-]= a * b'.  */
+#define ISA_HAS_FP_MADD3_MSUB3  TARGET_LOONGSON_2EF
+
+/* ISA has floating-point nmadd and nmsub instructions
+   'd = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD4_NMSUB4(MODE)					\
 				((ISA_MIPS4				\
 				  || (ISA_MIPS32R2 && (MODE) == V2SFmode) \
 				  || ISA_MIPS64)			\
 				 && (!TARGET_MIPS5400 || TARGET_MAD)	\
 				 && !TARGET_MIPS16)
 
+/* ISA has floating-point nmadd and nmsub instructions
+   'c = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD3_NMSUB3(MODE)					\
+                                TARGET_LOONGSON_2EF
+
 /* ISA has count leading zeroes/ones instruction (not implemented).  */
 #define ISA_HAS_CLZ_CLO		((ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
@@ -921,6 +940,26 @@ #define GENERATE_LL_SC			\
   (target_flags_explicit & MASK_LLSC	\
    ? TARGET_LLSC && !TARGET_MIPS16	\
    : ISA_HAS_LL_SC)
+
+/* Predicates for paired-single float instructions.
+   ST Loongson 2E/2F CPUs support only a subset of the
+   paired-single float instructions, so we use the predicates below
+   to disable the unsupported ones.  */
+
+#define ISA_HAS_ABS_PS     TARGET_PAIRED_SINGLE_FLOAT
+#define ISA_HAS_C_COND_PS  TARGET_PAIRED_SINGLE_FLOAT
+#define ISA_HAS_SCC_PS     TARGET_PAIRED_SINGLE_FLOAT
+
+#define ISA_HAS_MOVCC_PS   (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_PXX_PS     (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_CVT_PS     (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_ALNV_PS    (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_ADDR_PS    (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_MULR_PS    (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_CABS_PS    (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_C_COND_4S  (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_BC1ANY_PS  (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_RSQRT_PS   (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
 \f
 /* Add -G xx support.  */
 
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	2008-06-09 18:20:40.000000000 +0100
+++ gcc/config/mips/mips.md	2008-06-09 18:25:04.000000000 +0100
@@ -496,7 +496,8 @@ (define_mode_iterator P [(SI "Pmode == S
 
 ;; This mode iterator allows :MOVECC to be used anywhere that a
 ;; conditional-move-type condition is needed.
-(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT") (CC "TARGET_HARD_FLOAT")])
+(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT")
+                              (CC "TARGET_HARD_FLOAT && !TARGET_LOONGSON_2EF")])
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
@@ -1864,33 +1865,53 @@ (define_insn "<u>maddsidi4"
 
 ;; Floating point multiply accumulate instructions.
 
-(define_insn "*madd<mode>"
+(define_insn "*madd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "madd.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*msub<mode>"
+(define_insn "*madd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD && !HONOR_NANS (<MODE>mode)"
+  "madd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*msub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			       (match_operand:ANYF 2 "register_operand" "f"))
 		    (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "msub.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>"
+(define_insn "*msub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			       (match_operand:ANYF 2 "register_operand" "f"))
+		    (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD && !HONOR_NANS (<MODE>mode)"
+  "msub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (plus:ANYF
 		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1898,13 +1919,27 @@ (define_insn "*nmadd<mode>"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>_fastmath"
+(define_insn "*nmadd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (plus:ANYF
+		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
 		    (match_operand:ANYF 2 "register_operand" "f"))
 	 (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1912,13 +1947,27 @@ (define_insn "*nmadd<mode>_fastmath"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>"
+(define_insn "*nmadd3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
+		    (match_operand:ANYF 2 "register_operand" "f"))
+	 (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (minus:ANYF
 		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 			      (match_operand:ANYF 3 "register_operand" "f"))
 		   (match_operand:ANYF 1 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1926,19 +1975,47 @@ (define_insn "*nmsub<mode>"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>_fastmath"
+(define_insn "*nmsub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (minus:ANYF
+		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+			      (match_operand:ANYF 3 "register_operand" "f"))
+		   (match_operand:ANYF 1 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (match_operand:ANYF 1 "register_operand" "f")
 	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 		    (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
   "nmsub.<fmt>\t%0,%1,%2,%3"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (match_operand:ANYF 1 "register_operand" "f")
+	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+		    (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
 \f
 ;;
 ;;  ....................
@@ -6299,7 +6376,7 @@ (define_insn "*mov<SCALARF:mode>_on_<MOV
 		 (const_int 0)])
 	 (match_operand:SCALARF 2 "register_operand" "f,0")
 	 (match_operand:SCALARF 3 "register_operand" "0,f")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
   "@
     mov%T4.<fmt>\t%0,%2,%1
     mov%t4.<fmt>\t%0,%3,%1"
@@ -6326,7 +6403,7 @@ (define_expand "mov<mode>cc"
 	(if_then_else:SCALARF (match_dup 5)
 			      (match_operand:SCALARF 2 "register_operand")
 			      (match_operand:SCALARF 3 "register_operand")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
 {
   mips_expand_conditional_move (operands);
   DONE;

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-06-09 13:16       ` Maxim Kuvyrkov
@ 2008-06-09 17:45         ` Richard Sandiford
  2008-06-13  6:59           ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-06-09 17:45 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> During testing I discovered that the last version of the patch doesn't 
> properly handle paired-single float builtins.
>
> The last field in mips_builtin_description is target_flags.  For all 
> builtins for paired-single float instructions, target_flags is 
> MASK_PAIRED_SINGLE_FLOAT.  Because this mask tells the backend that all 
> paired-single float instructions are supported, Loongson gets too many 
> intrinsics that it cannot back up.
>
> I think I misunderstood your original suggestion to make 
> MASK_PAIRED_SINGLE_FLOAT specify the instructions that both Loongson and 
> generic MIPS5 have.  If so, then we need a new mask to use with the 
> builtins that MIPS5 supports and Loongson doesn't.

Nope, you understood it fine.  This is just something that needs to
be patched too.

As luck would have it, I'd already written a patch to replace the
current target_flags/mips_bdesc_arrays stuff with predicate functions.
I'd written it as part of a patch to add R10000 cache barriers that
I sent out a while ago.  The interest in that patch seems to have
waned, but it looks like the builtins patch is useful on its own.

With this, you should be able to attach accurate ISA_HAS_* predicates
to each paired-single function.  Let me know if it works.
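
To give an idea of the intended shape (the predicate name and function 
type below are only examples, not something the patch adds), you would 
declare an availability predicate from the relevant ISA_HAS_* macro:

  AVAIL_NON_MIPS16 (pxx_ps, ISA_HAS_PXX_PS)

and then use it instead of a target_flags mask when defining the 
builtin:

  DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, pxx_ps),

so each paired-single builtin is only available when the corresponding 
ISA_HAS_* condition holds.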

Thanks,
Richard


gcc/
	* config/mips/mips.c (BUILTIN_AVAIL_NON_MIPS16): New macro.
	(AVAIL_NON_MIPS16): Likewise.
	(mips_builtin_description): Replace target_flags with a predicate.
	(paired_single, sb1_paired_single, mips3d, dsp, dspr2, dsp_32)
	(dspr2_32): New availability predicates.
	(MIPS_BUILTIN): New macro.
	(DIRECT_BUILTIN, CMP_SCALAR_BUILTINS, CMP_PS_BUILTINS)
	(CMP_4S_BUILTINS, MOVTF_BUILTINS, CMP_BUILTINS)
	(DIRECT_NO_TARGET_BUILTIN, BPOSGE_BUILTIN): Use it.
	Replace the TARGET_FLAGS parameters with AVAIL parameters.
	(mips_ps_bdesc, mips_sb1_bdesc, mips_dsp_bdesc)
	(mips_dsp_32only_bdesc): Merge into...
	(mips_builtins): ...this new array.
	(mips_bdesc_map, mips_bdesc_arrays): Delete.
	(mips_init_builtins): Update after above changes.
	(mips_expand_builtin_1): Merge into...
	(mips_expand_builtin): ...here and update after above changes.

Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2008-06-09 18:31:32.000000000 +0100
+++ gcc/config/mips/mips.c	2008-06-09 18:38:48.000000000 +0100
@@ -10146,6 +10146,23 @@ mips_prefetch_cookie (rtx write, rtx loc
   return GEN_INT (INTVAL (write) + 6);
 }
 \f
+/* Flags that indicate when a built-in function is available.
+
+   BUILTIN_AVAIL_NON_MIPS16
+	The function is available on the current target, but only
+	in non-MIPS16 mode.  */
+#define BUILTIN_AVAIL_NON_MIPS16 1
+
+/* Declare an availability predicate for built-in functions that
+   require non-MIPS16 mode and also require COND to be true.
+   NAME is the main part of the predicate's name.  */
+#define AVAIL_NON_MIPS16(NAME, COND)					\
+ static unsigned int							\
+ mips_builtin_avail_##NAME (void)					\
+ {									\
+   return (COND) ? BUILTIN_AVAIL_NON_MIPS16 : 0;			\
+ }
+
 /* This structure describes a single built-in function.  */
 struct mips_builtin_description {
   /* The code of the main .md file instruction.  See mips_builtin_type
@@ -10164,309 +10181,297 @@ struct mips_builtin_description {
   /* The function's prototype.  */
   enum mips_function_type function_type;
 
-  /* The target flags required for this function.  */
-  int target_flags;
+  /* Whether the function is available.  */
+  unsigned int (*avail) (void);
 };
 
-/* Define a MIPS_BUILTIN_DIRECT function for instruction CODE_FOR_mips_<INSN>.
-   FUNCTION_TYPE and TARGET_FLAGS are mips_builtin_description fields.  */
-#define DIRECT_BUILTIN(INSN, FUNCTION_TYPE, TARGET_FLAGS)		\
-  { CODE_FOR_mips_ ## INSN, 0, "__builtin_mips_" #INSN,			\
-    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, TARGET_FLAGS }
+AVAIL_NON_MIPS16 (paired_single, TARGET_PAIRED_SINGLE_FLOAT)
+AVAIL_NON_MIPS16 (sb1_paired_single, TARGET_SB1 && TARGET_PAIRED_SINGLE_FLOAT)
+AVAIL_NON_MIPS16 (mips3d, TARGET_MIPS3D)
+AVAIL_NON_MIPS16 (dsp, TARGET_DSP)
+AVAIL_NON_MIPS16 (dspr2, TARGET_DSPR2)
+AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
+AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
+
+/* Construct a mips_builtin_description from the given arguments.
+
+   INSN is the name of the associated instruction pattern, without the
+   leading CODE_FOR_mips_.
+
+   CODE is the floating-point condition code associated with the
+   function.  It can be 'f' if the field is not applicable.
+
+   NAME is the name of the function itself, without the leading
+   "__builtin_mips_".
+
+   BUILTIN_TYPE and FUNCTION_TYPE are mips_builtin_description fields.
+
+   AVAIL is the name of the availability predicate, without the leading
+   mips_builtin_avail_.  */
+#define MIPS_BUILTIN(INSN, COND, NAME, BUILTIN_TYPE,			\
+		     FUNCTION_TYPE, AVAIL)				\
+  { CODE_FOR_mips_ ## INSN, MIPS_FP_COND_ ## COND,			\
+    "__builtin_mips_" NAME, BUILTIN_TYPE, FUNCTION_TYPE,		\
+    mips_builtin_avail_ ## AVAIL }
+
+/* Define __builtin_mips_<INSN>, which is a MIPS_BUILTIN_DIRECT function
+   mapped to instruction CODE_FOR_mips_<INSN>.  FUNCTION_TYPE and AVAIL
+   are as for MIPS_BUILTIN.  */
+#define DIRECT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL)			\
+  MIPS_BUILTIN (INSN, f, #INSN, MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, AVAIL)
 
 /* Define __builtin_mips_<INSN>_<COND>_{s,d} functions, both of which
-   require TARGET_FLAGS.  */
-#define CMP_SCALAR_BUILTINS(INSN, COND, TARGET_FLAGS)			\
-  { CODE_FOR_mips_ ## INSN ## _cond_s, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_" #INSN "_" #COND "_s",				\
-    MIPS_BUILTIN_CMP_SINGLE, MIPS_INT_FTYPE_SF_SF, TARGET_FLAGS },	\
-  { CODE_FOR_mips_ ## INSN ## _cond_d, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_" #INSN "_" #COND "_d",				\
-    MIPS_BUILTIN_CMP_SINGLE, MIPS_INT_FTYPE_DF_DF, TARGET_FLAGS }
+   are subject to mips_builtin_avail_<AVAIL>.  */
+#define CMP_SCALAR_BUILTINS(INSN, COND, AVAIL)				\
+  MIPS_BUILTIN (INSN ## _cond_s, COND, #INSN "_" #COND "_s",		\
+		MIPS_BUILTIN_CMP_SINGLE, MIPS_INT_FTYPE_SF_SF, AVAIL),	\
+  MIPS_BUILTIN (INSN ## _cond_d, COND, #INSN "_" #COND "_d",		\
+		MIPS_BUILTIN_CMP_SINGLE, MIPS_INT_FTYPE_DF_DF, AVAIL)
 
 /* Define __builtin_mips_{any,all,upper,lower}_<INSN>_<COND>_ps.
-   The lower and upper forms require TARGET_FLAGS while the any and all
-   forms require MASK_MIPS3D.  */
-#define CMP_PS_BUILTINS(INSN, COND, TARGET_FLAGS)			\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_any_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_CMP_ANY, MIPS_INT_FTYPE_V2SF_V2SF, MASK_MIPS3D },	\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_all_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_CMP_ALL, MIPS_INT_FTYPE_V2SF_V2SF, MASK_MIPS3D },	\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_lower_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_CMP_LOWER, MIPS_INT_FTYPE_V2SF_V2SF, TARGET_FLAGS },	\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_upper_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_CMP_UPPER, MIPS_INT_FTYPE_V2SF_V2SF, TARGET_FLAGS }
+   The lower and upper forms are subject to mips_builtin_avail_<AVAIL>
+   while the any and all forms are subject to mips_builtin_avail_mips3d.  */
+#define CMP_PS_BUILTINS(INSN, COND, AVAIL)				\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "any_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_CMP_ANY, MIPS_INT_FTYPE_V2SF_V2SF,		\
+		mips3d),						\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "all_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_CMP_ALL, MIPS_INT_FTYPE_V2SF_V2SF,		\
+		mips3d),						\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "lower_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_CMP_LOWER, MIPS_INT_FTYPE_V2SF_V2SF,	\
+		AVAIL),							\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "upper_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_CMP_UPPER, MIPS_INT_FTYPE_V2SF_V2SF,	\
+		AVAIL)
 
 /* Define __builtin_mips_{any,all}_<INSN>_<COND>_4s.  The functions
-   require MASK_MIPS3D.  */
+   are subject to mips_builtin_avail_mips3d.  */
 #define CMP_4S_BUILTINS(INSN, COND)					\
-  { CODE_FOR_mips_ ## INSN ## _cond_4s, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_any_" #INSN "_" #COND "_4s",			\
-    MIPS_BUILTIN_CMP_ANY, MIPS_INT_FTYPE_V2SF_V2SF_V2SF_V2SF,		\
-    MASK_MIPS3D },							\
-  { CODE_FOR_mips_ ## INSN ## _cond_4s, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_all_" #INSN "_" #COND "_4s",			\
-    MIPS_BUILTIN_CMP_ALL, MIPS_INT_FTYPE_V2SF_V2SF_V2SF_V2SF,		\
-    MASK_MIPS3D }
+  MIPS_BUILTIN (INSN ## _cond_4s, COND, "any_" #INSN "_" #COND "_4s",	\
+		MIPS_BUILTIN_CMP_ANY,					\
+		MIPS_INT_FTYPE_V2SF_V2SF_V2SF_V2SF, mips3d),		\
+  MIPS_BUILTIN (INSN ## _cond_4s, COND, "all_" #INSN "_" #COND "_4s",	\
+		MIPS_BUILTIN_CMP_ALL,					\
+		MIPS_INT_FTYPE_V2SF_V2SF_V2SF_V2SF, mips3d)
 
 /* Define __builtin_mips_mov{t,f}_<INSN>_<COND>_ps.  The comparison
-   instruction requires TARGET_FLAGS.  */
-#define MOVTF_BUILTINS(INSN, COND, TARGET_FLAGS)			\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_movt_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_MOVT, MIPS_V2SF_FTYPE_V2SF_V2SF_V2SF_V2SF,		\
-    TARGET_FLAGS },							\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_movf_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_MOVF, MIPS_V2SF_FTYPE_V2SF_V2SF_V2SF_V2SF,		\
-    TARGET_FLAGS }
+   instruction requires mips_builtin_avail_<AVAIL>.  */
+#define MOVTF_BUILTINS(INSN, COND, AVAIL)				\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "movt_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_MOVT, MIPS_V2SF_FTYPE_V2SF_V2SF_V2SF_V2SF,	\
+		AVAIL),							\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "movf_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_MOVF, MIPS_V2SF_FTYPE_V2SF_V2SF_V2SF_V2SF,	\
+		AVAIL)
 
 /* Define all the built-in functions related to C.cond.fmt condition COND.  */
 #define CMP_BUILTINS(COND)						\
-  MOVTF_BUILTINS (c, COND, MASK_PAIRED_SINGLE_FLOAT),			\
-  MOVTF_BUILTINS (cabs, COND, MASK_MIPS3D),				\
-  CMP_SCALAR_BUILTINS (cabs, COND, MASK_MIPS3D),			\
-  CMP_PS_BUILTINS (c, COND, MASK_PAIRED_SINGLE_FLOAT),			\
-  CMP_PS_BUILTINS (cabs, COND, MASK_MIPS3D),				\
+  MOVTF_BUILTINS (c, COND, paired_single),				\
+  MOVTF_BUILTINS (cabs, COND, mips3d),					\
+  CMP_SCALAR_BUILTINS (cabs, COND, mips3d),				\
+  CMP_PS_BUILTINS (c, COND, paired_single),				\
+  CMP_PS_BUILTINS (cabs, COND, mips3d),					\
   CMP_4S_BUILTINS (c, COND),						\
   CMP_4S_BUILTINS (cabs, COND)
 
-static const struct mips_builtin_description mips_ps_bdesc[] = {
-  DIRECT_BUILTIN (pll_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (plu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (puu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (cvt_ps_s, MIPS_V2SF_FTYPE_SF_SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (cvt_s_pl, MIPS_SF_FTYPE_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (cvt_s_pu, MIPS_SF_FTYPE_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (abs_ps, MIPS_V2SF_FTYPE_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-
-  DIRECT_BUILTIN (alnv_ps, MIPS_V2SF_FTYPE_V2SF_V2SF_INT,
-		  MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (addr_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (mulr_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (cvt_pw_ps, MIPS_V2SF_FTYPE_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (cvt_ps_pw, MIPS_V2SF_FTYPE_V2SF, MASK_MIPS3D),
-
-  DIRECT_BUILTIN (recip1_s, MIPS_SF_FTYPE_SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip1_d, MIPS_DF_FTYPE_DF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip1_ps, MIPS_V2SF_FTYPE_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip2_s, MIPS_SF_FTYPE_SF_SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip2_d, MIPS_DF_FTYPE_DF_DF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip2_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_MIPS3D),
-
-  DIRECT_BUILTIN (rsqrt1_s, MIPS_SF_FTYPE_SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt1_d, MIPS_DF_FTYPE_DF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt1_ps, MIPS_V2SF_FTYPE_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt2_s, MIPS_SF_FTYPE_SF_SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt2_d, MIPS_DF_FTYPE_DF_DF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt2_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_MIPS3D),
-
-  MIPS_FP_CONDITIONS (CMP_BUILTINS)
-};
+/* Define __builtin_mips_<INSN>, which is a MIPS_BUILTIN_DIRECT_NO_TARGET
+   function mapped to instruction CODE_FOR_mips_<INSN>.  FUNCTION_TYPE
+   and AVAIL are as for MIPS_BUILTIN.  */
+#define DIRECT_NO_TARGET_BUILTIN(INSN, FUNCTION_TYPE, AVAIL)		\
+  MIPS_BUILTIN (INSN, f, #INSN,	MIPS_BUILTIN_DIRECT_NO_TARGET,		\
+		FUNCTION_TYPE, AVAIL)
 
-/* Built-in functions for the SB-1 processor.  */
+/* Define __builtin_mips_bposge<VALUE>.  <VALUE> is 32 for the MIPS32 DSP
+   branch instruction.  AVAIL is as for MIPS_BUILTIN.  */
+#define BPOSGE_BUILTIN(VALUE, AVAIL)					\
+  MIPS_BUILTIN (bposge, f, "bposge" #VALUE,				\
+		MIPS_BUILTIN_BPOSGE ## VALUE, MIPS_SI_FTYPE_VOID, AVAIL)
 
 #define CODE_FOR_mips_sqrt_ps CODE_FOR_sqrtv2sf2
-
-static const struct mips_builtin_description mips_sb1_bdesc[] = {
-  DIRECT_BUILTIN (sqrt_ps, MIPS_V2SF_FTYPE_V2SF, MASK_PAIRED_SINGLE_FLOAT)
-};
-
-/* Built-in functions for the DSP ASE.  */
-
 #define CODE_FOR_mips_addq_ph CODE_FOR_addv2hi3
 #define CODE_FOR_mips_addu_qb CODE_FOR_addv4qi3
 #define CODE_FOR_mips_subq_ph CODE_FOR_subv2hi3
 #define CODE_FOR_mips_subu_qb CODE_FOR_subv4qi3
 #define CODE_FOR_mips_mul_ph CODE_FOR_mulv2hi3
 
-/* Define a MIPS_BUILTIN_DIRECT_NO_TARGET function for instruction
-   CODE_FOR_mips_<INSN>.  FUNCTION_TYPE and TARGET_FLAGS are
-   mips_builtin_description fields.  */
-#define DIRECT_NO_TARGET_BUILTIN(INSN, FUNCTION_TYPE, TARGET_FLAGS)	\
-  { CODE_FOR_mips_ ## INSN, 0, "__builtin_mips_" #INSN,			\
-    MIPS_BUILTIN_DIRECT_NO_TARGET, FUNCTION_TYPE, TARGET_FLAGS }
-
-/* Define __builtin_mips_bposge<VALUE>.  <VALUE> is 32 for the MIPS32 DSP
-   branch instruction.  TARGET_FLAGS is a mips_builtin_description field.  */
-#define BPOSGE_BUILTIN(VALUE, TARGET_FLAGS)				\
-  { CODE_FOR_mips_bposge, 0, "__builtin_mips_bposge" #VALUE,		\
-    MIPS_BUILTIN_BPOSGE ## VALUE, MIPS_SI_FTYPE_VOID, TARGET_FLAGS }
-
-static const struct mips_builtin_description mips_dsp_bdesc[] = {
-  DIRECT_BUILTIN (addq_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (addq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (addq_s_w, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (addu_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (addu_s_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (subq_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (subq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (subq_s_w, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (subu_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (subu_s_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (addsc, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (addwc, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (modsub, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (raddu_w_qb, MIPS_SI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (absq_s_ph, MIPS_V2HI_FTYPE_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (absq_s_w, MIPS_SI_FTYPE_SI, MASK_DSP),
-  DIRECT_BUILTIN (precrq_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (precrq_ph_w, MIPS_V2HI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (precrq_rs_ph_w, MIPS_V2HI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (precrqu_s_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (preceq_w_phl, MIPS_SI_FTYPE_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (preceq_w_phr, MIPS_SI_FTYPE_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (precequ_ph_qbl, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (precequ_ph_qbr, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (precequ_ph_qbla, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (precequ_ph_qbra, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (preceu_ph_qbl, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (preceu_ph_qbr, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (preceu_ph_qbla, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (preceu_ph_qbra, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (shll_qb, MIPS_V4QI_FTYPE_V4QI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shll_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shll_s_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shll_s_w, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shrl_qb, MIPS_V4QI_FTYPE_V4QI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shra_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shra_r_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shra_r_w, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (muleu_s_ph_qbl, MIPS_V2HI_FTYPE_V4QI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (muleu_s_ph_qbr, MIPS_V2HI_FTYPE_V4QI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (mulq_rs_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (muleq_s_w_phl, MIPS_SI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (muleq_s_w_phr, MIPS_SI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (bitrev, MIPS_SI_FTYPE_SI, MASK_DSP),
-  DIRECT_BUILTIN (insv, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (repl_qb, MIPS_V4QI_FTYPE_SI, MASK_DSP),
-  DIRECT_BUILTIN (repl_ph, MIPS_V2HI_FTYPE_SI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmpu_eq_qb, MIPS_VOID_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmpu_lt_qb, MIPS_VOID_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmpu_le_qb, MIPS_VOID_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (cmpgu_eq_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (cmpgu_lt_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (cmpgu_le_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmp_eq_ph, MIPS_VOID_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmp_lt_ph, MIPS_VOID_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmp_le_ph, MIPS_VOID_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (pick_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (pick_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (packrl_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (wrdsp, MIPS_VOID_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (rddsp, MIPS_SI_FTYPE_SI, MASK_DSP),
-  DIRECT_BUILTIN (lbux, MIPS_SI_FTYPE_POINTER_SI, MASK_DSP),
-  DIRECT_BUILTIN (lhx, MIPS_SI_FTYPE_POINTER_SI, MASK_DSP),
-  DIRECT_BUILTIN (lwx, MIPS_SI_FTYPE_POINTER_SI, MASK_DSP),
-  BPOSGE_BUILTIN (32, MASK_DSP),
-
-  /* The following are for the MIPS DSP ASE REV 2.  */
-  DIRECT_BUILTIN (absq_s_qb, MIPS_V4QI_FTYPE_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (addu_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (addu_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (adduh_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (adduh_r_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (append, MIPS_SI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (balign, MIPS_SI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (cmpgdu_eq_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (cmpgdu_lt_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (cmpgdu_le_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (mul_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (mul_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (mulq_rs_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (mulq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (mulq_s_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (precr_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (precr_sra_ph_w, MIPS_V2HI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (precr_sra_r_ph_w, MIPS_V2HI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (prepend, MIPS_SI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (shra_qb, MIPS_V4QI_FTYPE_V4QI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (shra_r_qb, MIPS_V4QI_FTYPE_V4QI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (shrl_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (subu_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (subu_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (subuh_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (subuh_r_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (addqh_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (addqh_r_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (addqh_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (addqh_r_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (subqh_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (subqh_r_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (subqh_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (subqh_r_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2)
-};
-
-static const struct mips_builtin_description mips_dsp_32only_bdesc[] = {
-  DIRECT_BUILTIN (dpau_h_qbl, MIPS_DI_FTYPE_DI_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (dpau_h_qbr, MIPS_DI_FTYPE_DI_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (dpsu_h_qbl, MIPS_DI_FTYPE_DI_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (dpsu_h_qbr, MIPS_DI_FTYPE_DI_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (dpaq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (dpsq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (mulsaq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (dpaq_sa_l_w, MIPS_DI_FTYPE_DI_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (dpsq_sa_l_w, MIPS_DI_FTYPE_DI_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (maq_s_w_phl, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (maq_s_w_phr, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (maq_sa_w_phl, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (maq_sa_w_phr, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (extr_w, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extr_r_w, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extr_rs_w, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extr_s_h, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extp, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extpdp, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shilo, MIPS_DI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (mthlip, MIPS_DI_FTYPE_DI_SI, MASK_DSP),
-
-  /* The following are for the MIPS DSP ASE REV 2.  */
-  DIRECT_BUILTIN (dpa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dps_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (madd, MIPS_DI_FTYPE_DI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (maddu, MIPS_DI_FTYPE_DI_USI_USI, MASK_DSPR2),
-  DIRECT_BUILTIN (msub, MIPS_DI_FTYPE_DI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (msubu, MIPS_DI_FTYPE_DI_USI_USI, MASK_DSPR2),
-  DIRECT_BUILTIN (mulsa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (mult, MIPS_DI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (multu, MIPS_DI_FTYPE_USI_USI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpax_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpsx_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpaqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpaqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpsqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2)
-};
-
-/* This structure describes an array of mips_builtin_description entries.  */
-struct mips_bdesc_map {
-  /* The array that this entry describes.  */
-  const struct mips_builtin_description *bdesc;
-
-  /* The number of entries in BDESC.  */
-  unsigned int size;
-
-  /* The target processor that supports the functions in BDESC.
-     PROCESSOR_MAX means we enable them for all processors.  */
-  enum processor_type proc;
-
-  /* The functions in BDESC are not supported if any of these
-     target flags are set.  */
-  int unsupported_target_flags;
-};
-
-/* All MIPS-specific built-in functions.  */
-static const struct mips_bdesc_map mips_bdesc_arrays[] = {
-  { mips_ps_bdesc, ARRAY_SIZE (mips_ps_bdesc), PROCESSOR_MAX, 0 },
-  { mips_sb1_bdesc, ARRAY_SIZE (mips_sb1_bdesc), PROCESSOR_SB1, 0 },
-  { mips_dsp_bdesc, ARRAY_SIZE (mips_dsp_bdesc), PROCESSOR_MAX, 0 },
-  { mips_dsp_32only_bdesc, ARRAY_SIZE (mips_dsp_32only_bdesc),
-    PROCESSOR_MAX, MASK_64BIT }
+static const struct mips_builtin_description mips_builtins[] = {
+  DIRECT_BUILTIN (pll_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
+  DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
+  DIRECT_BUILTIN (plu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
+  DIRECT_BUILTIN (puu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
+  DIRECT_BUILTIN (cvt_ps_s, MIPS_V2SF_FTYPE_SF_SF, paired_single),
+  DIRECT_BUILTIN (cvt_s_pl, MIPS_SF_FTYPE_V2SF, paired_single),
+  DIRECT_BUILTIN (cvt_s_pu, MIPS_SF_FTYPE_V2SF, paired_single),
+  DIRECT_BUILTIN (abs_ps, MIPS_V2SF_FTYPE_V2SF, paired_single),
+
+  DIRECT_BUILTIN (alnv_ps, MIPS_V2SF_FTYPE_V2SF_V2SF_INT, paired_single),
+  DIRECT_BUILTIN (addr_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, mips3d),
+  DIRECT_BUILTIN (mulr_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, mips3d),
+  DIRECT_BUILTIN (cvt_pw_ps, MIPS_V2SF_FTYPE_V2SF, mips3d),
+  DIRECT_BUILTIN (cvt_ps_pw, MIPS_V2SF_FTYPE_V2SF, mips3d),
+
+  DIRECT_BUILTIN (recip1_s, MIPS_SF_FTYPE_SF, mips3d),
+  DIRECT_BUILTIN (recip1_d, MIPS_DF_FTYPE_DF, mips3d),
+  DIRECT_BUILTIN (recip1_ps, MIPS_V2SF_FTYPE_V2SF, mips3d),
+  DIRECT_BUILTIN (recip2_s, MIPS_SF_FTYPE_SF_SF, mips3d),
+  DIRECT_BUILTIN (recip2_d, MIPS_DF_FTYPE_DF_DF, mips3d),
+  DIRECT_BUILTIN (recip2_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, mips3d),
+
+  DIRECT_BUILTIN (rsqrt1_s, MIPS_SF_FTYPE_SF, mips3d),
+  DIRECT_BUILTIN (rsqrt1_d, MIPS_DF_FTYPE_DF, mips3d),
+  DIRECT_BUILTIN (rsqrt1_ps, MIPS_V2SF_FTYPE_V2SF, mips3d),
+  DIRECT_BUILTIN (rsqrt2_s, MIPS_SF_FTYPE_SF_SF, mips3d),
+  DIRECT_BUILTIN (rsqrt2_d, MIPS_DF_FTYPE_DF_DF, mips3d),
+  DIRECT_BUILTIN (rsqrt2_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, mips3d),
+
+  MIPS_FP_CONDITIONS (CMP_BUILTINS),
+
+  /* Built-in functions for the SB-1 processor.  */
+  DIRECT_BUILTIN (sqrt_ps, MIPS_V2SF_FTYPE_V2SF, sb1_paired_single),
+
+  /* Built-in functions for the DSP ASE (32-bit and 64-bit).  */
+  DIRECT_BUILTIN (addq_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (addq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (addq_s_w, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (addu_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (addu_s_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (subq_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (subq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (subq_s_w, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (subu_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (subu_s_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (addsc, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (addwc, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (modsub, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (raddu_w_qb, MIPS_SI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (absq_s_ph, MIPS_V2HI_FTYPE_V2HI, dsp),
+  DIRECT_BUILTIN (absq_s_w, MIPS_SI_FTYPE_SI, dsp),
+  DIRECT_BUILTIN (precrq_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (precrq_ph_w, MIPS_V2HI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (precrq_rs_ph_w, MIPS_V2HI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (precrqu_s_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (preceq_w_phl, MIPS_SI_FTYPE_V2HI, dsp),
+  DIRECT_BUILTIN (preceq_w_phr, MIPS_SI_FTYPE_V2HI, dsp),
+  DIRECT_BUILTIN (precequ_ph_qbl, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (precequ_ph_qbr, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (precequ_ph_qbla, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (precequ_ph_qbra, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (preceu_ph_qbl, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (preceu_ph_qbr, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (preceu_ph_qbla, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (preceu_ph_qbra, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (shll_qb, MIPS_V4QI_FTYPE_V4QI_SI, dsp),
+  DIRECT_BUILTIN (shll_ph, MIPS_V2HI_FTYPE_V2HI_SI, dsp),
+  DIRECT_BUILTIN (shll_s_ph, MIPS_V2HI_FTYPE_V2HI_SI, dsp),
+  DIRECT_BUILTIN (shll_s_w, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (shrl_qb, MIPS_V4QI_FTYPE_V4QI_SI, dsp),
+  DIRECT_BUILTIN (shra_ph, MIPS_V2HI_FTYPE_V2HI_SI, dsp),
+  DIRECT_BUILTIN (shra_r_ph, MIPS_V2HI_FTYPE_V2HI_SI, dsp),
+  DIRECT_BUILTIN (shra_r_w, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (muleu_s_ph_qbl, MIPS_V2HI_FTYPE_V4QI_V2HI, dsp),
+  DIRECT_BUILTIN (muleu_s_ph_qbr, MIPS_V2HI_FTYPE_V4QI_V2HI, dsp),
+  DIRECT_BUILTIN (mulq_rs_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (muleq_s_w_phl, MIPS_SI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (muleq_s_w_phr, MIPS_SI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (bitrev, MIPS_SI_FTYPE_SI, dsp),
+  DIRECT_BUILTIN (insv, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (repl_qb, MIPS_V4QI_FTYPE_SI, dsp),
+  DIRECT_BUILTIN (repl_ph, MIPS_V2HI_FTYPE_SI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmpu_eq_qb, MIPS_VOID_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmpu_lt_qb, MIPS_VOID_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmpu_le_qb, MIPS_VOID_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (cmpgu_eq_qb, MIPS_SI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (cmpgu_lt_qb, MIPS_SI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (cmpgu_le_qb, MIPS_SI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmp_eq_ph, MIPS_VOID_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmp_lt_ph, MIPS_VOID_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmp_le_ph, MIPS_VOID_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (pick_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (pick_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (packrl_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (wrdsp, MIPS_VOID_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (rddsp, MIPS_SI_FTYPE_SI, dsp),
+  DIRECT_BUILTIN (lbux, MIPS_SI_FTYPE_POINTER_SI, dsp),
+  DIRECT_BUILTIN (lhx, MIPS_SI_FTYPE_POINTER_SI, dsp),
+  DIRECT_BUILTIN (lwx, MIPS_SI_FTYPE_POINTER_SI, dsp),
+  BPOSGE_BUILTIN (32, dsp),
+
+  /* The following are for the MIPS DSP ASE REV 2 (32-bit and 64-bit).  */
+  DIRECT_BUILTIN (absq_s_qb, MIPS_V4QI_FTYPE_V4QI, dspr2),
+  DIRECT_BUILTIN (addu_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (addu_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (adduh_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (adduh_r_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (append, MIPS_SI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (balign, MIPS_SI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (cmpgdu_eq_qb, MIPS_SI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (cmpgdu_lt_qb, MIPS_SI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (cmpgdu_le_qb, MIPS_SI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (mul_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (mul_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (mulq_rs_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (mulq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (mulq_s_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (precr_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (precr_sra_ph_w, MIPS_V2HI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (precr_sra_r_ph_w, MIPS_V2HI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (prepend, MIPS_SI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (shra_qb, MIPS_V4QI_FTYPE_V4QI_SI, dspr2),
+  DIRECT_BUILTIN (shra_r_qb, MIPS_V4QI_FTYPE_V4QI_SI, dspr2),
+  DIRECT_BUILTIN (shrl_ph, MIPS_V2HI_FTYPE_V2HI_SI, dspr2),
+  DIRECT_BUILTIN (subu_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (subu_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (subuh_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (subuh_r_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (addqh_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (addqh_r_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (addqh_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (addqh_r_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (subqh_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (subqh_r_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (subqh_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (subqh_r_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+
+  /* Built-in functions for the DSP ASE (32-bit only).  */
+  DIRECT_BUILTIN (dpau_h_qbl, MIPS_DI_FTYPE_DI_V4QI_V4QI, dsp_32),
+  DIRECT_BUILTIN (dpau_h_qbr, MIPS_DI_FTYPE_DI_V4QI_V4QI, dsp_32),
+  DIRECT_BUILTIN (dpsu_h_qbl, MIPS_DI_FTYPE_DI_V4QI_V4QI, dsp_32),
+  DIRECT_BUILTIN (dpsu_h_qbr, MIPS_DI_FTYPE_DI_V4QI_V4QI, dsp_32),
+  DIRECT_BUILTIN (dpaq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (dpsq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (mulsaq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (dpaq_sa_l_w, MIPS_DI_FTYPE_DI_SI_SI, dsp_32),
+  DIRECT_BUILTIN (dpsq_sa_l_w, MIPS_DI_FTYPE_DI_SI_SI, dsp_32),
+  DIRECT_BUILTIN (maq_s_w_phl, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (maq_s_w_phr, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (maq_sa_w_phl, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (maq_sa_w_phr, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (extr_w, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extr_r_w, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extr_rs_w, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extr_s_h, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extp, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extpdp, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (shilo, MIPS_DI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (mthlip, MIPS_DI_FTYPE_DI_SI, dsp_32),
+
+  /* The following are for the MIPS DSP ASE REV 2 (32-bit only).  */
+  DIRECT_BUILTIN (dpa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dps_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (madd, MIPS_DI_FTYPE_DI_SI_SI, dspr2_32),
+  DIRECT_BUILTIN (maddu, MIPS_DI_FTYPE_DI_USI_USI, dspr2_32),
+  DIRECT_BUILTIN (msub, MIPS_DI_FTYPE_DI_SI_SI, dspr2_32),
+  DIRECT_BUILTIN (msubu, MIPS_DI_FTYPE_DI_USI_USI, dspr2_32),
+  DIRECT_BUILTIN (mulsa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (mult, MIPS_DI_FTYPE_SI_SI, dspr2_32),
+  DIRECT_BUILTIN (multu, MIPS_DI_FTYPE_USI_USI, dspr2_32),
+  DIRECT_BUILTIN (dpax_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpsx_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpaqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpaqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpsqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32)
 };
 
 /* MODE is a vector mode whose elements have type TYPE.  Return the type
@@ -10545,25 +10550,17 @@ #define DEF_MIPS_FTYPE(NUM, ARGS)					\
 mips_init_builtins (void)
 {
   const struct mips_builtin_description *d;
-  const struct mips_bdesc_map *m;
-  unsigned int offset;
+  unsigned int i;
 
   /* Iterate through all of the bdesc arrays, initializing all of the
      builtin functions.  */
-  offset = 0;
-  for (m = mips_bdesc_arrays;
-       m < &mips_bdesc_arrays[ARRAY_SIZE (mips_bdesc_arrays)];
-       m++)
-    {
-      if ((m->proc == PROCESSOR_MAX || m->proc == mips_arch)
-	  && (m->unsupported_target_flags & target_flags) == 0)
-	for (d = m->bdesc; d < &m->bdesc[m->size]; d++)
-	  if ((d->target_flags & target_flags) == d->target_flags)
-	    add_builtin_function (d->name,
-				  mips_build_function_type (d->function_type),
-				  d - m->bdesc + offset,
-				  BUILT_IN_MD, NULL, NULL);
-      offset += m->size;
+  for (i = 0; i < ARRAY_SIZE (mips_builtins); i++)
+    {
+      d = &mips_builtins[i];
+      if (d->avail ())
+	add_builtin_function (d->name,
+			      mips_build_function_type (d->function_type),
+			      i, BUILT_IN_MD, NULL, NULL);
     }
 }
 
@@ -10808,41 +10805,6 @@ mips_expand_builtin_bposge (enum mips_bu
 				       const1_rtx, const0_rtx);
 }
 
-/* EXP is a CALL_EXPR that calls the function described by BDESC.
-   Expand the call and return an rtx for its return value.
-   TARGET, if nonnull, suggests a good place to put this value.  */
-
-static rtx
-mips_expand_builtin_1 (const struct mips_builtin_description *bdesc,
-		       tree exp, rtx target)
-{
-  switch (bdesc->builtin_type)
-    {
-    case MIPS_BUILTIN_DIRECT:
-      return mips_expand_builtin_direct (bdesc->icode, target, exp, true);
-
-    case MIPS_BUILTIN_DIRECT_NO_TARGET:
-      return mips_expand_builtin_direct (bdesc->icode, target, exp, false);
-
-    case MIPS_BUILTIN_MOVT:
-    case MIPS_BUILTIN_MOVF:
-      return mips_expand_builtin_movtf (bdesc->builtin_type, bdesc->icode,
-					bdesc->cond, target, exp);
-
-    case MIPS_BUILTIN_CMP_ANY:
-    case MIPS_BUILTIN_CMP_ALL:
-    case MIPS_BUILTIN_CMP_UPPER:
-    case MIPS_BUILTIN_CMP_LOWER:
-    case MIPS_BUILTIN_CMP_SINGLE:
-      return mips_expand_builtin_compare (bdesc->builtin_type, bdesc->icode,
-					  bdesc->cond, target, exp);
-
-    case MIPS_BUILTIN_BPOSGE32:
-      return mips_expand_builtin_bposge (bdesc->builtin_type, target);
-    }
-  gcc_unreachable ();
-}
-
 /* Implement TARGET_EXPAND_BUILTIN.  */
 
 static rtx
@@ -10851,25 +10813,44 @@ mips_expand_builtin (tree exp, rtx targe
 		     int ignore ATTRIBUTE_UNUSED)
 {
   tree fndecl;
-  unsigned int fcode;
-  const struct mips_bdesc_map *m;
+  unsigned int fcode, avail;
+  const struct mips_builtin_description *d;
 
   fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   fcode = DECL_FUNCTION_CODE (fndecl);
+  gcc_assert (fcode < ARRAY_SIZE (mips_builtins));
+  d = &mips_builtins[fcode];
+  avail = d->avail ();
+  gcc_assert (avail != 0);
   if (TARGET_MIPS16)
     {
       error ("built-in function %qs not supported for MIPS16",
 	     IDENTIFIER_POINTER (DECL_NAME (fndecl)));
       return const0_rtx;
     }
+  switch (d->builtin_type)
+    {
+    case MIPS_BUILTIN_DIRECT:
+      return mips_expand_builtin_direct (d->icode, target, exp, true);
+
+    case MIPS_BUILTIN_DIRECT_NO_TARGET:
+      return mips_expand_builtin_direct (d->icode, target, exp, false);
+
+    case MIPS_BUILTIN_MOVT:
+    case MIPS_BUILTIN_MOVF:
+      return mips_expand_builtin_movtf (d->builtin_type, d->icode,
+					d->cond, target, exp);
 
-  for (m = mips_bdesc_arrays;
-       m < &mips_bdesc_arrays[ARRAY_SIZE (mips_bdesc_arrays)];
-       m++)
-    {
-      if (fcode < m->size)
-	return mips_expand_builtin_1 (m->bdesc + fcode, exp, target);
-      fcode -= m->size;
+    case MIPS_BUILTIN_CMP_ANY:
+    case MIPS_BUILTIN_CMP_ALL:
+    case MIPS_BUILTIN_CMP_UPPER:
+    case MIPS_BUILTIN_CMP_LOWER:
+    case MIPS_BUILTIN_CMP_SINGLE:
+      return mips_expand_builtin_compare (d->builtin_type, d->icode,
+					  d->cond, target, exp);
+
+    case MIPS_BUILTIN_BPOSGE32:
+      return mips_expand_builtin_bposge (d->builtin_type, target);
     }
   gcc_unreachable ();
 }
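
For reference, each AVAIL_NON_MIPS16 line near the start of this patch
presumably expands into a small availability predicate of roughly the
following shape (a sketch only; the real macro definition appears
earlier in the patch and also accounts for the MIPS16 restriction
implied by its name):

/* Hypothetical expansion of AVAIL_NON_MIPS16 (dsp, TARGET_DSP);
   the actual expansion may differ in detail.  */
static unsigned int
mips_builtin_avail_dsp (void)
{
  /* Nonzero when the DSP ASE built-in functions may be used.  */
  return TARGET_DSP ? 1 : 0;
}

mips_init_builtins then calls each predicate and registers only those
built-in functions whose predicate returns nonzero, which replaces the
old per-array target_flags checks.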

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-06  8:08         ` Ruan Beihong
@ 2008-06-09 18:24           ` Maxim Kuvyrkov
  2008-06-10  7:32             ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-09 18:24 UTC (permalink / raw)
  To: Ruan Beihong; +Cc: gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Ruan Beihong wrote:
> Hi everyone,
> There is something special about Loongson.
> See below (extracted from binutils-2.18.50.0.5/opcodes/mips-opc.c).  I
> wonder whether these instructions will be supported in gcc.

...

> Those instructions are designed to let the FPU do some easy ALU tasks,
> thus reducing the usage of m[ft]c1.
> Loongson has both mov.d and mov.ps, and one more: "or" (1051) on the FPU.

These instructions operate on 32-bit or 64-bit integer values placed
into FP registers.  They do indeed offload the integer ALU, but I don't
see how they can reduce the usage of m[ft]c1.  On the contrary,
additional m[ft]c1 instructions will be needed to transfer data between
the integer and FP ALUs.

Anyway, these instructions are not yet supported because there is no
effective optimization for moving integer ALU load over to the FP ALU.
The scheduler might be a good place to implement one, but that has not
been done.

--
Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-06 14:06           ` Richard Sandiford
@ 2008-06-09 18:27             ` Maxim Kuvyrkov
  2008-06-10 10:29               ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-09 18:27 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 1342 bytes --]

Richard Sandiford wrote:

...

> Hmm.  If this is a newly-defined interface, I really have to question
> the wisdom of these functions.  The wording above suggests that there's
> something "unstable" about normal C pointer and array accesses.
> There shouldn't be ;)  They ought to work as expected.
> 
> The patch rightly uses well-known insn names for well-known operations
> like vector addition, vector maximum, and so on.  As well as allowing
> autovectorisation, I believe this means you could write:
> 
>     uint8x8_t *a;
> 
>     a[0] = a[1] + a[2];
> 
> (It might be nice to have tests to make sure that this does indeed
> work when using the new header file.  It could just be cut-&-paste
> from the version that uses intrinsic functions.)
> 
> I just think that, given GCC's vector extensions, having these
> functions as well is confusing.  I take what you say about it
> being consistent with arm_neon.h, but AltiVec doesn't have these
> sorts of function, and GCC's generic vector support was heavily
> influenced by AltiVec.

OK, I removed vec_load_* and vec_store_* helpers along with the 
paragraph in extend.texi.

Also, I fixed the existing tests, but didn't add any new ones, such as a
test for vector '+'.  If you think such tests are really worthwhile,
I'll add them in a separate patch; a rough sketch follows below.
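
Such a test might look roughly like this.  The sketch is hypothetical
and not part of the attached patch: it uses the uint8x8_t type from
loongson.h, relies on GCC's generic vector '+' support, and reuses the
union-based checking style of loongson-simd.c.

/* { dg-do run } */
/* { dg-require-effective-target mips_loongson } */

#include "loongson.h"
#include <stdint.h>
#include <assert.h>

typedef union { uint8x8_t v; uint8_t a[8]; } uint8x8_encap_t;

int
main (void)
{
  uint8x8_encap_t s, t, r;
  int i;

  for (i = 0; i < 8; i++)
    {
      s.a[i] = i;
      t.a[i] = 2 * i;
    }
  /* Generic vector '+' on uint8x8_t values; this should use the
     Loongson byte-addition pattern rather than an intrinsic call.  */
  r.v = s.v + t.v;
  for (i = 0; i < 8; i++)
    assert (r.a[i] == (uint8_t) (3 * i));
  return 0;
}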

Any further comments?


Thanks,

Maxim

[-- Attachment #2: fsf-ls2ef-2-vector.ChangeLog --]
[-- Type: text/plain, Size: 1738 bytes --]

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>
	    Nathan Sidwell  <nathan@codesourcery.com>
	    Maxim Kuvyrkov  <maxim@codesourcery.com>
	
	* config/mips/mips-modes.def: Add V8QI, V4HI and V2SI modes.
	* config/mips/mips-protos.h (mips_expand_vector_init): New.
	* config/mips/mips-ftypes.def: Add function types for Loongson-2E/2F
	builtins.
	* config/mips/mips.c (mips_split_doubleword_move): Handle new modes.
	(mips_hard_regno_mode_ok_p): Allow 64-bit vector modes for Loongson.
	(mips_vector_mode_supported_p): Add V2SImode, V4HImode and
	V8QImode cases.
	(LOONGSON_BUILTIN): New.
	(mips_loongson_2ef_bdesc): New.
	(mips_bdesc_arrays): Add mips_loongson_2ef_bdesc.
	(mips_builtin_vector_type): Handle unsigned versions of vector modes.
	Add new parameter for that.
	(MIPS_ATYPE_UQI, MIPS_ATYPE_UDI, MIPS_ATYPE_V2SI, MIPS_ATYPE_UV2SI)
	(MIPS_ATYPE_V4HI, MIPS_ATYPE_UV4HI, MIPS_ATYPE_V8QI, MIPS_ATYPE_UV8QI):
	New.
	(mips_init_builtins): Initialize Loongson builtins if
	appropriate.
	(mips_expand_vector_init): New.
	* config/mips/mips.h (HAVE_LOONGSON_VECTOR_MODES): New.
	(TARGET_CPU_CPP_BUILTINS): Define __mips_loongson_vector_rev
	if appropriate.
	* config/mips/mips.md: Add unspec numbers for Loongson
	builtins.  Include loongson.md.
	(MOVE64): Include Loongson vector modes.
	(SPLITF): Include Loongson vector modes.
	(HALFMODE): Handle Loongson vector modes.
	* config/mips/loongson.md: New.
	* config/mips/loongson.h: New.
	* config.gcc: Add loongson.h header for mips*-*-* targets.
	* doc/extend.texi (MIPS Loongson Built-in Functions): New.

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>

	* lib/target-supports.exp (check_effective_target_mips_loongson): New.
	* gcc.target/mips/loongson-simd.c: New.

[-- Attachment #3: fsf-ls2ef-2-vector.patch --]
[-- Type: text/plain, Size: 107385 bytes --]

--- gcc/doc/extend.texi	(/local/gcc-trunk)	(revision 382)
+++ gcc/doc/extend.texi	(/local/gcc-2)	(revision 382)
@@ -6788,6 +6788,7 @@ instructions, but allow the compiler to 
 * X86 Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
+* MIPS Loongson Built-in Functions::
 * PowerPC AltiVec Built-in Functions::
 * SPARC VIS Built-in Functions::
 * SPU Built-in Functions::
@@ -8667,6 +8668,132 @@ value is the upper one.  The opposite or
 For example, the code above will set the lower half of @code{a} to
 @code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
 
+@node MIPS Loongson Built-in Functions
+@subsection MIPS Loongson Built-in Functions
+
+GCC provides intrinsics to access the SIMD instructions provided by the
+ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
+available after inclusion of the @code{loongson.h} header file,
+operate on the following 64-bit vector types:
+
+@itemize
+@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
+@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
+@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
+@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
+@item @code{int16x4_t}, a vector of four signed 16-bit integers;
+@item @code{int32x2_t}, a vector of two signed 32-bit integers.
+@end itemize
+
+The intrinsics provided are listed below; each is named after the
+machine instruction to which it corresponds, with suffixes added as
+appropriate to distinguish intrinsics that expand to the same machine
+instruction yet have different argument types.  Refer to the architecture
+documentation for a description of the functionality of each
+instruction.
+
+@smallexample
+int16x4_t packsswh (int32x2_t s, int32x2_t t);
+int8x8_t packsshb (int16x4_t s, int16x4_t t);
+uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
+uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t paddw_s (int32x2_t s, int32x2_t t);
+int16x4_t paddh_s (int16x4_t s, int16x4_t t);
+int8x8_t paddb_s (int8x8_t s, int8x8_t t);
+uint64_t paddd_u (uint64_t s, uint64_t t);
+int64_t paddd_s (int64_t s, int64_t t);
+int16x4_t paddsh (int16x4_t s, int16x4_t t);
+int8x8_t paddsb (int8x8_t s, int8x8_t t);
+uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
+uint64_t pandn_ud (uint64_t s, uint64_t t);
+uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
+uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
+int64_t pandn_sd (int64_t s, int64_t t);
+int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
+int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
+int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
+uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
+uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
+uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
+uint16x4_t pextrh_u (uint16x4_t s, int field);
+int16x4_t pextrh_s (int16x4_t s, int field);
+uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
+int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
+int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
+int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
+uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
+int16x4_t pminsh (int16x4_t s, int16x4_t t);
+uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
+uint8x8_t pmovmskb_u (uint8x8_t s);
+int8x8_t pmovmskb_s (int8x8_t s);
+uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
+int16x4_t pmulhh (int16x4_t s, int16x4_t t);
+int16x4_t pmullh (int16x4_t s, int16x4_t t);
+int64_t pmuluw (uint32x2_t s, uint32x2_t t);
+uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
+uint16x4_t biadd (uint8x8_t s);
+uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
+uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
+int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
+uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psllh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psllw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrah_s (int16x4_t s, uint8_t amount);
+uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psraw_s (int32x2_t s, uint8_t amount);
+uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t psubw_s (int32x2_t s, int32x2_t t);
+int16x4_t psubh_s (int16x4_t s, int16x4_t t);
+int8x8_t psubb_s (int8x8_t s, int8x8_t t);
+uint64_t psubd_u (uint64_t s, uint64_t t);
+int64_t psubd_s (int64_t s, int64_t t);
+int16x4_t psubsh (int16x4_t s, int16x4_t t);
+int8x8_t psubsb (int8x8_t s, int8x8_t t);
+uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
+uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
+uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
+@end smallexample
+
 @menu
 * Paired-Single Arithmetic::
 * Paired-Single Built-in Functions::
--- gcc/testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-trunk)	(revision 382)
+++ gcc/testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-2)	(revision 382)
@@ -0,0 +1,1963 @@
+/* Test cases for ST Microelectronics Loongson-2E/2F SIMD intrinsics.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target mips_loongson } */
+
+#include "loongson.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+#include <limits.h>
+
+typedef union { int32x2_t v; int32_t a[2]; } int32x2_encap_t;
+typedef union { int16x4_t v; int16_t a[4]; } int16x4_encap_t;
+typedef union { int8x8_t v; int8_t a[8]; } int8x8_encap_t;
+typedef union { uint32x2_t v; uint32_t a[2]; } uint32x2_encap_t;
+typedef union { uint16x4_t v; uint16_t a[4]; } uint16x4_encap_t;
+typedef union { uint8x8_t v; uint8_t a[8]; } uint8x8_encap_t;
+
+#define UINT16x4_MAX USHRT_MAX
+#define UINT8x8_MAX UCHAR_MAX
+#define INT8x8_MAX SCHAR_MAX
+#define INT16x4_MAX SHRT_MAX
+#define INT32x2_MAX INT_MAX
+
+static void test_packsswh (void)
+{
+  int32x2_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = INT16x4_MAX - 2;
+  s.a[1] = INT16x4_MAX - 1;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX + 1;
+  r.v = packsswh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 2);
+  assert (r.a[1] == INT16x4_MAX - 1);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_packsshb (void)
+{
+  int16x4_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = INT8x8_MAX - 6;
+  s.a[1] = INT8x8_MAX - 5;
+  s.a[2] = INT8x8_MAX - 4;
+  s.a[3] = INT8x8_MAX - 3;
+  t.a[0] = INT8x8_MAX - 2;
+  t.a[1] = INT8x8_MAX - 1;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX + 1;
+  r.v = packsshb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_packushb (void)
+{
+  uint16x4_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = UINT8x8_MAX - 6;
+  s.a[1] = UINT8x8_MAX - 5;
+  s.a[2] = UINT8x8_MAX - 4;
+  s.a[3] = UINT8x8_MAX - 3;
+  t.a[0] = UINT8x8_MAX - 2;
+  t.a[1] = UINT8x8_MAX - 1;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX + 1;
+  r.v = packushb (s.v, t.v);
+  assert (r.a[0] == UINT8x8_MAX - 6);
+  assert (r.a[1] == UINT8x8_MAX - 5);
+  assert (r.a[2] == UINT8x8_MAX - 4);
+  assert (r.a[3] == UINT8x8_MAX - 3);
+  assert (r.a[4] == UINT8x8_MAX - 2);
+  assert (r.a[5] == UINT8x8_MAX - 1);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_paddw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  r.v = paddw_u (s.v, t.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 6);
+}
+
+static void test_paddw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  r.v = paddw_s (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_paddh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  r.v = paddh_u (s.v, t.v);
+  assert (r.a[0] == 6);
+  assert (r.a[1] == 8);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 12);
+}
+
+static void test_paddh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = paddh_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+}
+
+static void test_paddb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 5;
+  s.a[5] = 6;
+  s.a[6] = 7;
+  s.a[7] = 8;
+  t.a[0] = 9;
+  t.a[1] = 10;
+  t.a[2] = 11;
+  t.a[3] = 12;
+  t.a[4] = 13;
+  t.a[5] = 14;
+  t.a[6] = 15;
+  t.a[7] = 16;
+  r.v = paddb_u (s.v, t.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 12);
+  assert (r.a[2] == 14);
+  assert (r.a[3] == 16);
+  assert (r.a[4] == 18);
+  assert (r.a[5] == 20);
+  assert (r.a[6] == 22);
+  assert (r.a[7] == 24);
+}
+
+static void test_paddb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = paddb_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+  assert (r.a[4] == -45);
+  assert (r.a[5] == -54);
+  assert (r.a[6] == -63);
+  assert (r.a[7] == -72);
+}
+
+static void test_paddd_u (void)
+{
+  uint64_t d = 123456;
+  uint64_t e = 789012;
+  uint64_t r;
+  r = paddd_u (d, e);
+  assert (r == 912468);
+}
+
+static void test_paddd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = paddd_s (d, e);
+  assert (r == -665556);
+}
+
+static void test_paddsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX;
+  t.a[2] = INT16x4_MAX;
+  t.a[3] = INT16x4_MAX;
+  r.v = paddsh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_paddsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = INT8x8_MAX;
+  t.a[1] = INT8x8_MAX;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX;
+  t.a[4] = INT8x8_MAX;
+  t.a[5] = INT8x8_MAX;
+  t.a[6] = INT8x8_MAX;
+  t.a[7] = INT8x8_MAX;
+  r.v = paddsb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_paddush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  t.a[0] = UINT16x4_MAX;
+  t.a[1] = UINT16x4_MAX;
+  t.a[2] = UINT16x4_MAX;
+  t.a[3] = UINT16x4_MAX;
+  r.v = paddush (s.v, t.v);
+  assert (r.a[0] == UINT16x4_MAX);
+  assert (r.a[1] == UINT16x4_MAX);
+  assert (r.a[2] == UINT16x4_MAX);
+  assert (r.a[3] == UINT16x4_MAX);
+}
+
+static void test_paddusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  s.a[4] = 0;
+  s.a[5] = 1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = UINT8x8_MAX;
+  t.a[1] = UINT8x8_MAX;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX;
+  t.a[4] = UINT8x8_MAX;
+  t.a[5] = UINT8x8_MAX;
+  t.a[6] = UINT8x8_MAX;
+  t.a[7] = UINT8x8_MAX;
+  r.v = paddusb (s.v, t.v);
+  assert (r.a[0] == UINT8x8_MAX);
+  assert (r.a[1] == UINT8x8_MAX);
+  assert (r.a[2] == UINT8x8_MAX);
+  assert (r.a[3] == UINT8x8_MAX);
+  assert (r.a[4] == UINT8x8_MAX);
+  assert (r.a[5] == UINT8x8_MAX);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_pandn_ud (void)
+{
+  uint64_t d1 = 0x0000ffff0000ffffull;
+  uint64_t d2 = 0x0000ffff0000ffffull;
+  uint64_t r;
+  r = pandn_ud (d1, d2);
+  assert (r == 0);
+}
+
+static void test_pandn_sd (void)
+{
+  int64_t d1 = (int64_t) 0x0000000000000000ull;
+  int64_t d2 = (int64_t) 0xfffffffffffffffeull;
+  int64_t r;
+  r = pandn_sd (d1, d2);
+  assert (r == -2);
+}
+
+static void test_pandn_uw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0x00000000;
+  t.a[1] = 0xffffffff;
+  r.v = pandn_uw (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pandn_sw (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0xffffffff;
+  t.a[1] = 0xfffffffe;
+  r.v = pandn_sw (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+}
+
+static void test_pandn_uh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0x0000;
+  t.a[1] = 0xffff;
+  t.a[2] = 0x0000;
+  t.a[3] = 0xffff;
+  r.v = pandn_uh (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pandn_sh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0xffff;
+  t.a[1] = 0xfffe;
+  t.a[2] = 0xffff;
+  t.a[3] = 0xfffe;
+  r.v = pandn_sh (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+}
+
+static void test_pandn_ub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0x00;
+  t.a[1] = 0xff;
+  t.a[2] = 0x00;
+  t.a[3] = 0xff;
+  t.a[4] = 0x00;
+  t.a[5] = 0xff;
+  t.a[6] = 0x00;
+  t.a[7] = 0xff;
+  r.v = pandn_ub (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pandn_sb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0xff;
+  t.a[1] = 0xfe;
+  t.a[2] = 0xff;
+  t.a[3] = 0xfe;
+  t.a[4] = 0xff;
+  t.a[5] = 0xfe;
+  t.a[6] = 0xff;
+  t.a[7] = 0xfe;
+  r.v = pandn_sb (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -2);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -2);
+}
+
+static void test_pavgh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  r.v = pavgh (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+}
+
+static void test_pavgb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 1;
+  s.a[5] = 2;
+  s.a[6] = 3;
+  s.a[7] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = pavgb (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+  assert (r.a[4] == 3);
+  assert (r.a[5] == 4);
+  assert (r.a[6] == 5);
+  assert (r.a[7] == 6);
+}
+
+static void test_pcmpeqw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  r.v = pcmpeqw_u (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpeqh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  r.v = pcmpeqh_u (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpeqb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 42;
+  s.a[5] = 43;
+  s.a[6] = 42;
+  s.a[7] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  t.a[4] = 43;
+  t.a[5] = 43;
+  t.a[6] = 43;
+  t.a[7] = 43;
+  r.v = pcmpeqb_u (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpeqw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  r.v = pcmpeqw_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+}
+
+static void test_pcmpeqh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  r.v = pcmpeqh_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpeqb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = -42;
+  s.a[5] = -42;
+  s.a[6] = -42;
+  s.a[7] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = -42;
+  t.a[6] = 42;
+  t.a[7] = -42;
+  r.v = pcmpeqb_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -1);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -1);
+}
+
+static void test_pcmpgtw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 42;
+  r.v = pcmpgtw_u (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpgth_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 40;
+  t.a[1] = 41;
+  t.a[2] = 43;
+  t.a[3] = 42;
+  r.v = pcmpgth_u (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0x0000);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpgtb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 44;
+  s.a[5] = 45;
+  s.a[6] = 46;
+  s.a[7] = 47;
+  t.a[0] = 48;
+  t.a[1] = 47;
+  t.a[2] = 46;
+  t.a[3] = 45;
+  t.a[4] = 44;
+  t.a[5] = 43;
+  t.a[6] = 42;
+  t.a[7] = 41;
+  r.v = pcmpgtb_u (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0x00);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0x00);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0xff);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpgtw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = -42;
+  t.a[0] = -42;
+  t.a[1] = -42;
+  r.v = pcmpgtw_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 0);
+}
+
+static void test_pcmpgth_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = 43;
+  t.a[2] = 44;
+  t.a[3] = -43;
+  r.v = pcmpgth_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpgtb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = 42;
+  s.a[5] = 42;
+  s.a[6] = 42;
+  s.a[7] = 42;
+  t.a[0] = -45;
+  t.a[1] = -44;
+  t.a[2] = -43;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = 43;
+  t.a[6] = 41;
+  t.a[7] = 40;
+  r.v = pcmpgtb_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == -1);
+  assert (r.a[7] == -1);
+}
+
+static void test_pextrh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  r.v = pextrh_u (s.v, 1);
+  assert (r.a[0] == 41);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pextrh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -40;
+  s.a[1] = -41;
+  s.a[2] = -42;
+  s.a[3] = -43;
+  r.v = pextrh_s (s.v, 2);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pinsrh_0123_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  r.v = pinsrh_0_u (t.v, s.v);
+  r.v = pinsrh_1_u (r.v, s.v);
+  r.v = pinsrh_2_u (r.v, s.v);
+  r.v = pinsrh_3_u (r.v, s.v);
+  assert (r.a[0] == 42);
+  assert (r.a[1] == 42);
+  assert (r.a[2] == 42);
+  assert (r.a[3] == 42);
+}
+
+static void test_pinsrh_0123_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  r.v = pinsrh_0_s (t.v, s.v);
+  r.v = pinsrh_1_s (r.v, s.v);
+  r.v = pinsrh_2_s (r.v, s.v);
+  r.v = pinsrh_3_s (r.v, s.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == -42);
+  assert (r.a[2] == -42);
+  assert (r.a[3] == -42);
+}
+
+static void test_pmaddhw (void)
+{
+  int16x4_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -5;
+  s.a[1] = -4;
+  s.a[2] = -3;
+  s.a[3] = -2;
+  t.a[0] = 10;
+  t.a[1] = 11;
+  t.a[2] = 12;
+  t.a[3] = 13;
+  r.v = pmaddhw (s.v, t.v);
+  assert (r.a[0] == (-5*10 + -4*11));
+  assert (r.a[1] == (-3*12 + -2*13));
+}
+
+static void test_pmaxsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  r.v = pmaxsh (s.v, t.v);
+  assert (r.a[0] == 20);
+  assert (r.a[1] == 40);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 50);
+}
+
+static void test_pmaxub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pmaxub (s.v, t.v);
+  assert (r.a[0] == 80);
+  assert (r.a[1] == 70);
+  assert (r.a[2] == 60);
+  assert (r.a[3] == 50);
+  assert (r.a[4] == 50);
+  assert (r.a[5] == 60);
+  assert (r.a[6] == 70);
+  assert (r.a[7] == 80);
+}
+
+static void test_pminsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  r.v = pminsh (s.v, t.v);
+  assert (r.a[0] == -20);
+  assert (r.a[1] == -40);
+  assert (r.a[2] == -10);
+  assert (r.a[3] == -50);
+}
+
+static void test_pminub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pminub (s.v, t.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 20);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 40);
+  assert (r.a[4] == 40);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 20);
+  assert (r.a[7] == 10);
+}
+
+static void test_pmovmskb_u (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_encap_t r;
+  s.a[0] = 0xf0;
+  s.a[1] = 0x40;
+  s.a[2] = 0xf0;
+  s.a[3] = 0x40;
+  s.a[4] = 0xf0;
+  s.a[5] = 0x40;
+  s.a[6] = 0xf0;
+  s.a[7] = 0x40;
+  r.v = pmovmskb_u (s.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmovmskb_s (void)
+{
+  int8x8_encap_t s;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 1;
+  s.a[2] = -1;
+  s.a[3] = 1;
+  s.a[4] = -1;
+  s.a[5] = 1;
+  s.a[6] = -1;
+  s.a[7] = 1;
+  r.v = pmovmskb_s (s.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmulhuh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0xff00;
+  s.a[1] = 0xff00;
+  s.a[2] = 0xff00;
+  s.a[3] = 0xff00;
+  t.a[0] = 16;
+  t.a[1] = 16;
+  t.a[2] = 16;
+  t.a[3] = 16;
+  r.v = pmulhuh (s.v, t.v);
+  assert (r.a[0] == 0x000f);
+  assert (r.a[1] == 0x000f);
+  assert (r.a[2] == 0x000f);
+  assert (r.a[3] == 0x000f);
+}
+
+static void test_pmulhh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  r.v = pmulhh (s.v, t.v);
+  assert (r.a[0] == -16);
+  assert (r.a[1] == -16);
+  assert (r.a[2] == -16);
+  assert (r.a[3] == -16);
+}
+
+static void test_pmullh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  r.v = pmullh (s.v, t.v);
+  assert (r.a[0] == 4096);
+  assert (r.a[1] == 4096);
+  assert (r.a[2] == 4096);
+  assert (r.a[3] == 4096);
+}
+
+static void test_pmuluw (void)
+{
+  uint32x2_encap_t s, t;
+  uint64_t r;
+  s.a[0] = 0xdeadbeef;
+  s.a[1] = 0;
+  t.a[0] = 0x0f00baaa;
+  t.a[1] = 0;
+  r = pmuluw (s.v, t.v);
+  assert (r == 0xd0cd08e1d1a70b6ull);
+}
+
+static void test_pasubub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pasubub (s.v, t.v);
+  assert (r.a[0] == 70);
+  assert (r.a[1] == 50);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 10);
+  assert (r.a[4] == 10);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 50);
+  assert (r.a[7] == 70);
+}
+
+static void test_biadd (void)
+{
+  uint8x8_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  r.v = biadd (s.v);
+  assert (r.a[0] == 360);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psadbh (void)
+{
+  uint8x8_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = psadbh (s.v, t.v);
+  assert (r.a[0] == 0x0140);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pshufh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r.v = pshufh_u (r.v, s.v, 0xe5);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_pshufh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 2;
+  s.a[2] = -3;
+  s.a[3] = 4;
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r.v = pshufh_s (r.v, s.v, 0xe5);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+}
+
+static void test_psllh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0xffff;
+  s.a[2] = 0xffff;
+  s.a[3] = 0xffff;
+  r.v = psllh_u (s.v, 1);
+  assert (r.a[0] == 0xfffe);
+  assert (r.a[1] == 0xfffe);
+  assert (r.a[2] == 0xfffe);
+  assert (r.a[3] == 0xfffe);
+}
+
+static void test_psllw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0xffffffff;
+  r.v = psllw_u (s.v, 2);
+  assert (r.a[0] == 0xfffffffc);
+  assert (r.a[1] == 0xfffffffc);
+}
+
+static void test_psllh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  r.v = psllh_s (s.v, 1);
+  assert (r.a[0] == -2);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == -2);
+  assert (r.a[3] == -2);
+}
+
+static void test_psllw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  r.v = psllw_s (s.v, 2);
+  assert (r.a[0] == -4);
+  assert (r.a[1] == -4);
+}
+
+static void test_psrah_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  r.v = psrah_u (s.v, 1);
+  assert (r.a[0] == 0xfff7);
+  assert (r.a[1] == 0xfff7);
+  assert (r.a[2] == 0xfff7);
+  assert (r.a[3] == 0xfff7);
+}
+
+static void test_psraw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  r.v = psraw_u (s.v, 1);
+  assert (r.a[0] == 0xfffffff7);
+  assert (r.a[1] == 0xfffffff7);
+}
+
+static void test_psrah_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s.a[2] = -2;
+  s.a[3] = -2;
+  r.v = psrah_s (s.v, 1);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == -1);
+}
+
+static void test_psraw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  r.v = psraw_s (s.v, 1);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+}
+
+static void test_psrlh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  r.v = psrlh_u (s.v, 1);
+  assert (r.a[0] == 0x7ff7);
+  assert (r.a[1] == 0x7ff7);
+  assert (r.a[2] == 0x7ff7);
+  assert (r.a[3] == 0x7ff7);
+}
+
+static void test_psrlw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  r.v = psrlw_u (s.v, 1);
+  assert (r.a[0] == 0x7ffffff7);
+  assert (r.a[1] == 0x7ffffff7);
+}
+
+static void test_psrlh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  r.v = psrlh_s (s.v, 1);
+  assert (r.a[0] == INT16x4_MAX);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psrlw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  r.v = psrlw_s (s.v, 1);
+  assert (r.a[0] == INT32x2_MAX);
+  assert (r.a[1] == INT32x2_MAX);
+}
+
+static void test_psubw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 3;
+  s.a[1] = 4;
+  t.a[0] = 2;
+  t.a[1] = 1;
+  r.v = psubw_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = -4;
+  r.v = psubw_s (s.v, t.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 5;
+  s.a[1] = 6;
+  s.a[2] = 7;
+  s.a[3] = 8;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = psubh_u (s.v, t.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 4);
+  assert (r.a[3] == 4);
+}
+
+static void test_psubh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = psubh_s (s.v, t.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+}
+
+static void test_psubb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 11;
+  s.a[2] = 12;
+  s.a[3] = 13;
+  s.a[4] = 14;
+  s.a[5] = 15;
+  s.a[6] = 16;
+  s.a[7] = 17;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = psubb_u (s.v, t.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 9);
+  assert (r.a[2] == 9);
+  assert (r.a[3] == 9);
+  assert (r.a[4] == 9);
+  assert (r.a[5] == 9);
+  assert (r.a[6] == 9);
+  assert (r.a[7] == 9);
+}
+
+static void test_psubb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = psubb_s (s.v, t.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+  assert (r.a[4] == -55);
+  assert (r.a[5] == -66);
+  assert (r.a[6] == -77);
+  assert (r.a[7] == -88);
+}
+
+static void test_psubd_u (void)
+{
+  uint64_t d = 789012;
+  uint64_t e = 123456;
+  uint64_t r;
+  r = psubd_u (d, e);
+  assert (r == 665556);
+}
+
+static void test_psubd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = psubd_s (d, e);
+  assert (r == 912468);
+}
+
+static void test_psubsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = -INT16x4_MAX;
+  t.a[1] = -INT16x4_MAX;
+  t.a[2] = -INT16x4_MAX;
+  t.a[3] = -INT16x4_MAX;
+  r.v = psubsh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psubsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = -INT8x8_MAX;
+  t.a[1] = -INT8x8_MAX;
+  t.a[2] = -INT8x8_MAX;
+  t.a[3] = -INT8x8_MAX;
+  t.a[4] = -INT8x8_MAX;
+  t.a[5] = -INT8x8_MAX;
+  t.a[6] = -INT8x8_MAX;
+  t.a[7] = -INT8x8_MAX;
+  r.v = psubsb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_psubush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  r.v = psubush (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psubusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  s.a[4] = 4;
+  s.a[5] = 5;
+  s.a[6] = 6;
+  s.a[7] = 7;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  t.a[4] = 5;
+  t.a[5] = 5;
+  t.a[6] = 7;
+  t.a[7] = 7;
+  r.v = psubusb (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_punpckhbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpckhbh_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == -11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == -13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == -15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpckhbh_u (s.v, t.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == 11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == 13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == 15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhhw_s (void)
+{ 
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  r.v = punpckhhw_s (s.v, t.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == -6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhhw_u (void)
+{ 
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  r.v = punpckhhw_u (s.v, t.v);
+  assert (r.a[0] == 5);
+  assert (r.a[1] == 6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhwd_s (void)
+{ 
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = -4;
+  r.v = punpckhwd_s (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == -4);
+}
+
+static void test_punpckhwd_u (void)
+{ 
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  r.v = punpckhwd_u (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+}
+
+static void test_punpcklbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpcklbh_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == -5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == -7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpcklbh_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == 5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == 7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklhw_s (void)
+{ 
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  r.v = punpcklhw_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklhw_u (void)
+{ 
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  r.v = punpcklhw_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklwd_s (void)
+{ 
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  r.v = punpcklwd_s (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == -2);
+}
+
+static void test_punpcklwd_u (void)
+{ 
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  r.v = punpcklwd_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+}
+
+int main (void)
+{
+  test_packsswh ();
+  test_packsshb ();
+  test_packushb ();
+  test_paddw_u ();
+  test_paddw_s ();
+  test_paddh_u ();
+  test_paddh_s ();
+  test_paddb_u ();
+  test_paddb_s ();
+  test_paddd_u ();
+  test_paddd_s ();
+  test_paddsh ();
+  test_paddsb ();
+  test_paddush ();
+  test_paddusb ();
+  test_pandn_ud ();
+  test_pandn_sd ();
+  test_pandn_uw ();
+  test_pandn_sw ();
+  test_pandn_uh ();
+  test_pandn_sh ();
+  test_pandn_ub ();
+  test_pandn_sb ();
+  test_pavgh ();
+  test_pavgb ();
+  test_pcmpeqw_u ();
+  test_pcmpeqh_u ();
+  test_pcmpeqb_u ();
+  test_pcmpeqw_s ();
+  test_pcmpeqh_s ();
+  test_pcmpeqb_s ();
+  test_pcmpgtw_u ();
+  test_pcmpgth_u ();
+  test_pcmpgtb_u ();
+  test_pcmpgtw_s ();
+  test_pcmpgth_s ();
+  test_pcmpgtb_s ();
+  test_pextrh_u ();
+  test_pextrh_s ();
+  test_pinsrh_0123_u ();
+  test_pinsrh_0123_s ();
+  test_pmaddhw ();
+  test_pmaxsh ();
+  test_pmaxub ();
+  test_pminsh ();
+  test_pminub ();
+  test_pmovmskb_u ();
+  test_pmovmskb_s ();
+  test_pmulhuh ();
+  test_pmulhh ();
+  test_pmullh ();
+  test_pmuluw ();
+  test_pasubub ();
+  test_biadd ();
+  test_psadbh ();
+  test_pshufh_u ();
+  test_pshufh_s ();
+  test_psllh_u ();
+  test_psllw_u ();
+  test_psllh_s ();
+  test_psllw_s ();
+  test_psrah_u ();
+  test_psraw_u ();
+  test_psrah_s ();
+  test_psraw_s ();
+  test_psrlh_u ();
+  test_psrlw_u ();
+  test_psrlh_s ();
+  test_psrlw_s ();
+  test_psubw_u ();
+  test_psubw_s ();
+  test_psubh_u ();
+  test_psubh_s ();
+  test_psubb_u ();
+  test_psubb_s ();
+  test_psubd_u ();
+  test_psubd_s ();
+  test_psubsh ();
+  test_psubsb ();
+  test_psubush ();
+  test_psubusb ();
+  test_punpckhbh_s ();
+  test_punpckhbh_u ();
+  test_punpckhhw_s ();
+  test_punpckhhw_u ();
+  test_punpckhwd_s ();
+  test_punpckhwd_u ();
+  test_punpcklbh_s ();
+  test_punpcklbh_u ();
+  test_punpcklhw_s ();
+  test_punpcklhw_u ();
+  test_punpcklwd_s ();
+  test_punpcklwd_u ();
+  return 0;
+}
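
As a usage sketch outside the testsuite: assuming loongson.h ends up installed
as a system header and the compiler is invoked with -march=loongson2e or
-march=loongson2f, a minimal standalone program would look roughly like the
following (it mirrors the element-wise checks in the test above):

    /* Sketch only; paddh_s is expected to expand to a single paddh
       instruction, as documented in loongson.h below.  */
    #include <assert.h>
    #include <stdint.h>
    #include <loongson.h>

    int
    main (void)
    {
      int16x4_t a = { 1, 2, 3, 4 };
      int16x4_t b = { 10, 20, 30, 40 };
      union { int16x4_t v; int16_t e[4]; } r;
      r.v = paddh_s (a, b);
      assert (r.e[0] == 11 && r.e[3] == 44);
      return 0;
    }
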
--- gcc/testsuite/lib/target-supports.exp	(/local/gcc-trunk)	(revision 382)
+++ gcc/testsuite/lib/target-supports.exp	(/local/gcc-2)	(revision 382)
@@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw 
     } "-mfpu=neon -mfloat-abi=softfp"]
 }
 
+# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
+# the Loongson vector modes.
+
+proc check_effective_target_mips_loongson { } {
+    return [check_no_compiler_messages loongson assembly {
+	#if !defined(__mips_loongson_vector_rev)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {
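
A testcase can then key off the new mips_loongson keyword in its target
selector.  A hedged sketch (not necessarily the exact directives the new
tests use):

    /* { dg-do run { target mips_loongson } } */
    #include <loongson.h>

    int
    main (void)
    {
      uint8x8_t z = { 0, 0, 0, 0, 0, 0, 0, 0 };
      uint8x8_t r = pandn_ub (z, z);   /* any intrinsic will do here */
      (void) r;
      return 0;
    }
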
--- gcc/config.gcc	(/local/gcc-trunk)	(revision 382)
+++ gcc/config.gcc	(/local/gcc-2)	(revision 382)
@@ -349,6 +349,7 @@ m68k-*-*)
 mips*-*-*)
 	cpu_type=mips
 	need_64bit_hwint=yes
+	extra_headers="loongson.h"
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
--- gcc/config/mips/loongson.md	(/local/gcc-trunk)	(revision 382)
+++ gcc/config/mips/loongson.md	(/local/gcc-2)	(revision 382)
@@ -0,0 +1,429 @@
+;; Machine description for ST Microelectronics Loongson-2E/2F.
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Mode iterators and attributes.
+
+;; 64-bit vectors of bytes.
+(define_mode_iterator VB [V8QI])
+
+;; 64-bit vectors of halfwords.
+(define_mode_iterator VH [V4HI])
+
+;; 64-bit vectors of words.
+(define_mode_iterator VW [V2SI])
+
+;; 64-bit vectors of halfwords and bytes.
+(define_mode_iterator VHB [V4HI V8QI])
+
+;; 64-bit vectors of words and halfwords.
+(define_mode_iterator VWH [V2SI V4HI])
+
+;; 64-bit vectors of words, halfwords and bytes.
+(define_mode_iterator VWHB [V2SI V4HI V8QI])
+
+;; 64-bit vectors of words, halfwords and bytes; and DImode.
+(define_mode_iterator VWHBDI [V2SI V4HI V8QI DI])
+
+;; The Loongson instruction suffixes corresponding to the modes in the
+;; VWHB iterator.
+(define_mode_attr V_suffix [(V2SI "w") (V4HI "h") (V8QI "b")])
+
+;; Given a vector type T, the mode of a vector half the size of T
+;; and with the same number of elements.
+(define_mode_attr V_squash [(V2SI "V2HI") (V4HI "V4QI")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with half as many elements.
+(define_mode_attr V_stretch_half [(V2SI "DI") (V4HI "V2SI") (V8QI "V4HI")])
+
+;; The Loongson instruction suffixes corresponding to the transformation
+;; expressed by V_stretch_half.
+(define_mode_attr V_stretch_half_suffix [(V2SI "wd") (V4HI "hw") (V8QI "bh")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with twice as many elements.
+(define_mode_attr V_squash_double [(V2SI "V4HI") (V4HI "V8QI")])
+
+;; The Loongson instruction suffixes corresponding to the conversions
+;; specified by V_squash_double.
+(define_mode_attr V_squash_double_suffix [(V2SI "wh") (V4HI "hb")])
+
+;; Move patterns.
+
+;; Expander to legitimize moves involving values of vector modes.
+(define_expand "mov<mode>"
+  [(set (match_operand:VWHB 0)
+	(match_operand:VWHB 1))]
+  ""
+{
+  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
+    DONE;
+})
+
+;; Handle legitimized moves between values of vector modes.
+(define_insn "mov<mode>_internal"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,  d,  m,  d")
+	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  { return mips_output_move (operands[0], operands[1]); }
+  [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")
+   (set_attr "mode" "DI")])
+
+;; Initialization of a vector.
+
+(define_expand "vec_init<mode>"
+  [(set (match_operand:VWHB 0 "register_operand")
+	(match_operand 1 ""))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+{
+  mips_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})
+
+;; Instruction patterns for SIMD instructions.
+
+;; Pack with signed saturation.
+(define_insn "vec_pack_ssat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 1 "register_operand" "f"))
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 2 "register_operand" "f"))))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packss<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Pack with unsigned saturation.
+(define_insn "vec_pack_usat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 1 "register_operand" "f"))
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 2 "register_operand" "f"))))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packus<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by wraparound.
+(define_insn "add<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (plus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		   (match_operand:VWHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "padd<V_suffix>\t%0,%1,%2")
+
+;; Addition of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "paddd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (plus:DI (match_operand:DI 1 "register_operand" "f")
+		 (match_operand:DI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "paddd\t%0,%1,%2")
+
+;; Addition, treating overflow by signed saturation.
+(define_insn "ssadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "padds<V_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by unsigned saturation.
+(define_insn "usadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "paddus<V_suffix>\t%0,%1,%2")
+
+;; Logical AND NOT.
+(define_insn "loongson_and_not_<mode>"
+  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
+        (and:VWHBDI
+	 (not:VWHBDI (match_operand:VWHBDI 1 "register_operand" "f"))
+	 (match_operand:VWHBDI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pandn\t%0,%1,%2")
+
+;; Average.
+(define_insn "loongson_average_<mode>"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (unspec:VHB [(match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")]
+		    UNSPEC_LOONGSON_AVERAGE))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pavg<V_suffix>\t%0,%1,%2")
+
+;; Equality test.
+(define_insn "loongson_eq_<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_EQ))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pcmpeq<V_suffix>\t%0,%1,%2")
+
+;; Greater-than test.
+(define_insn "loongson_gt_<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_GT))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pcmpgt<V_suffix>\t%0,%1,%2")
+
+;; Extract halfword.
+(define_insn "loongson_extract_halfword"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+ 		    (match_operand:SI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_EXTRACT_HALFWORD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pextr<V_suffix>\t%0,%1,%2")
+
+;; Insert halfword.
+(define_insn "loongson_insert_halfword_0"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_0))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_0\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_1"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_1))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_1\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_2"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_2))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_2\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_3))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_3\t%0,%1,%2")
+
+;; Multiply and add packed integers.
+(define_insn "loongson_mult_add"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VH 1 "register_operand" "f")
+				  (match_operand:VH 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_MULT_ADD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmadd<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Maximum of signed halfwords.
+(define_insn "smax<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smax:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmaxs<V_suffix>\t%0,%1,%2")
+
+;; Maximum of unsigned bytes.
+(define_insn "umax<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umax:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmaxu<V_suffix>\t%0,%1,%2")
+
+;; Minimum of signed halfwords.
+(define_insn "smin<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smin:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmins<V_suffix>\t%0,%1,%2")
+
+;; Minimum of unsigned bytes.
+(define_insn "umin<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umin:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pminu<V_suffix>\t%0,%1,%2")
+
+;; Move byte mask.
+(define_insn "loongson_move_byte_mask"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")]
+		   UNSPEC_LOONGSON_MOVE_BYTE_MASK))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmovmsk<V_suffix>\t%0,%1")
+
+;; Multiply unsigned integers and store high result.
+(define_insn "umul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_UMUL_HIGHPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulhu<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store high result.
+(define_insn "smul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_SMUL_HIGHPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulh<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store low result.
+(define_insn "loongson_smul_lowpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_SMUL_LOWPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmull<V_suffix>\t%0,%1,%2")
+
+;; Multiply unsigned word integers.
+(define_insn "loongson_umul_word"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:VW 1 "register_operand" "f")
+		    (match_operand:VW 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_UMUL_WORD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulu<V_suffix>\t%0,%1,%2")
+
+;; Absolute difference.
+(define_insn "loongson_pasubub"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")
+		    (match_operand:VB 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PASUBUB))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pasubub\t%0,%1,%2")
+
+;; Sum of unsigned byte integers.
+(define_insn "reduc_uplus_<mode>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")]
+				 UNSPEC_LOONGSON_BIADD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "biadd\t%0,%1")
+
+;; Sum of absolute differences.
+(define_insn "loongson_psadbh"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")
+				  (match_operand:VB 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PSADBH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pasubub\t%0,%1,%2;biadd\t%0,%0")
+
+;; Shuffle halfwords.
+(define_insn "loongson_pshufh"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "0")
+		    (match_operand:VH 2 "register_operand" "f")
+		    (match_operand:SI 3 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSHUFH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pshufh\t%0,%2,%3")
+
+;; Shift left logical.
+(define_insn "loongson_psll<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashift:VWH (match_operand:VWH 1 "register_operand" "f")
+		    (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psll<V_suffix>\t%0,%1,%2")
+
+;; Shift right arithmetic.
+(define_insn "loongson_psra<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psra<V_suffix>\t%0,%1,%2")
+
+;; Shift right logical.
+(define_insn "loongson_psrl<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (lshiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psrl<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by wraparound.
+(define_insn "sub<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (minus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		    (match_operand:VWHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psub<V_suffix>\t%0,%1,%2")
+
+;; Subtraction of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "psubd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (minus:DI (match_operand:DI 1 "register_operand" "f")
+		  (match_operand:DI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubd\t%0,%1,%2")
+
+;; Subtraction, treating overflow by signed saturation.
+(define_insn "sssub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubs<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by unsigned saturation.
+(define_insn "ussub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubus<V_suffix>\t%0,%1,%2")
+
+;; Unpack high data.
+(define_insn "vec_interleave_high<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_UNPACK_HIGH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "punpckh<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Unpack low data.
+(define_insn "vec_interleave_low<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_UNPACK_LOW))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "punpckl<V_stretch_half_suffix>\t%0,%1,%2")
--- gcc/config/mips/mips-ftypes.def	(/local/gcc-trunk)	(revision 382)
+++ gcc/config/mips/mips-ftypes.def	(/local/gcc-2)	(revision 382)
@@ -66,6 +66,24 @@ DEF_MIPS_FTYPE (1, (SF, SF))
 DEF_MIPS_FTYPE (2, (SF, SF, SF))
 DEF_MIPS_FTYPE (1, (SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (UDI, UDI, UDI))
+DEF_MIPS_FTYPE (2, (UDI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UQI))
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV4HI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV8QI, UV8QI))
+
+DEF_MIPS_FTYPE (2, (UV8QI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV8QI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV8QI, UV8QI, UV8QI))
+
 DEF_MIPS_FTYPE (1, (V2HI, SI))
 DEF_MIPS_FTYPE (2, (V2HI, SI, SI))
 DEF_MIPS_FTYPE (3, (V2HI, SI, SI, SI))
@@ -81,12 +99,27 @@ DEF_MIPS_FTYPE (2, (V2SF, V2SF, V2SF))
 DEF_MIPS_FTYPE (3, (V2SF, V2SF, V2SF, INT))
 DEF_MIPS_FTYPE (4, (V2SF, V2SF, V2SF, V2SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, UQI))
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V2SI, V4HI, V4HI))
+
+DEF_MIPS_FTYPE (2, (V4HI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, USI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, V4HI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, USI))
+
 DEF_MIPS_FTYPE (1, (V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V2HI, V2HI))
 DEF_MIPS_FTYPE (1, (V4QI, V4QI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, V4QI))
 
+DEF_MIPS_FTYPE (2, (V8QI, V4HI, V4HI))
+DEF_MIPS_FTYPE (1, (V8QI, V8QI))
+DEF_MIPS_FTYPE (2, (V8QI, V8QI, V8QI))
+
 DEF_MIPS_FTYPE (2, (VOID, SI, SI))
 DEF_MIPS_FTYPE (2, (VOID, V2HI, V2HI))
 DEF_MIPS_FTYPE (2, (VOID, V4QI, V4QI))
--- gcc/config/mips/mips.md	(/local/gcc-trunk)	(revision 382)
+++ gcc/config/mips/mips.md	(/local/gcc-2)	(revision 382)
@@ -213,6 +213,28 @@
    (UNSPEC_DPAQX_SA_W_PH	446)
    (UNSPEC_DPSQX_S_W_PH		447)
    (UNSPEC_DPSQX_SA_W_PH	448)
+
+   ;; ST Microelectronics Loongson-2E/2F.
+   (UNSPEC_LOONGSON_AVERAGE		500)
+   (UNSPEC_LOONGSON_EQ			501)
+   (UNSPEC_LOONGSON_GT			502)
+   (UNSPEC_LOONGSON_EXTRACT_HALFWORD	503)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_0	504)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_1	505)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_2	506)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_3	507)
+   (UNSPEC_LOONGSON_MULT_ADD		508)
+   (UNSPEC_LOONGSON_MOVE_BYTE_MASK	509)
+   (UNSPEC_LOONGSON_UMUL_HIGHPART	510)
+   (UNSPEC_LOONGSON_SMUL_HIGHPART	511)
+   (UNSPEC_LOONGSON_SMUL_LOWPART	512)
+   (UNSPEC_LOONGSON_UMUL_WORD		513)
+   (UNSPEC_LOONGSON_PASUBUB             514)
+   (UNSPEC_LOONGSON_BIADD		515)
+   (UNSPEC_LOONGSON_PSADBH		516)
+   (UNSPEC_LOONGSON_PSHUFH		517)
+   (UNSPEC_LOONGSON_UNPACK_HIGH		518)
+   (UNSPEC_LOONGSON_UNPACK_LOW		519)
   ]
 )
 
@@ -494,7 +516,11 @@
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
-  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
+  [DI DF
+   (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "HAVE_LOONGSON_VECTOR_MODES")
+   (V4HI "HAVE_LOONGSON_VECTOR_MODES")
+   (V8QI "HAVE_LOONGSON_VECTOR_MODES")])
 
 ;; 128-bit modes for which we provide move patterns on 64-bit targets.
 (define_mode_iterator MOVE128 [TF])
@@ -521,6 +547,9 @@
   [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (V2SF "!TARGET_64BIT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
+   (V4HI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
+   (V8QI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
    (TF "TARGET_64BIT && TARGET_FLOAT64")])
 
 ;; In GPR templates, a string like "<d>subu" will expand to "subu" in the
@@ -573,7 +602,9 @@
 
 ;; This attribute gives the integer mode that has half the size of
 ;; the controlling mode.
-(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") (TF "DI")])
+(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI")
+                            (V2SI "SI") (V4HI "SI") (V8QI "SI")
+			    (TF "DI")])
 
 ;; This attribute works around the early SB-1 rev2 core "F2" erratum:
 ;;
@@ -6406,3 +6437,6 @@
 
 ; MIPS fixed-point instructions.
 (include "mips-fixed.md")
+
+; ST-Microelectronics Loongson-2E/2F-specific patterns.
+(include "loongson.md")
--- gcc/config/mips/mips-protos.h	(/local/gcc-trunk)	(revision 382)
+++ gcc/config/mips/mips-protos.h	(/local/gcc-2)	(revision 382)
@@ -303,4 +303,6 @@ union mips_gen_fn_ptrs
 extern void mips_expand_atomic_qihi (union mips_gen_fn_ptrs,
 				     rtx, rtx, rtx, rtx);
 
+extern void mips_expand_vector_init (rtx, rtx);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
--- gcc/config/mips/loongson.h	(/local/gcc-trunk)	(revision 382)
+++ gcc/config/mips/loongson.h	(/local/gcc-2)	(revision 382)
@@ -0,0 +1,693 @@
+/* Intrinsics for ST Microelectronics Loongson-2E/2F SIMD operations.
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 2, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to the
+   Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+#ifndef _GCC_LOONGSON_H
+#define _GCC_LOONGSON_H
+
+#if !defined(__mips_loongson_vector_rev)
+# error "You must select -march=loongson2e or -march=loongson2f to use loongson.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Vectors of unsigned bytes, halfwords and words.  */
+typedef uint8_t uint8x8_t __attribute__((vector_size (8)));
+typedef uint16_t uint16x4_t __attribute__((vector_size (8)));
+typedef uint32_t uint32x2_t __attribute__((vector_size (8)));
+
+/* Vectors of signed bytes, halfwords and words.  */
+typedef int8_t int8x8_t __attribute__((vector_size (8)));
+typedef int16_t int16x4_t __attribute__((vector_size (8)));
+typedef int32_t int32x2_t __attribute__((vector_size (8)));
+
+/* SIMD intrinsics.
+   Unless otherwise noted, calls to the functions below will expand into
+   precisely one machine instruction, modulo any moves required to
+   satisfy register allocation constraints.  */
+
+/* Pack with signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+packsswh (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_packsswh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+packsshb (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_packsshb (s, t);
+}
+
+/* Pack with unsigned saturation.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+packushb (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_packushb (s, t);
+}
+
+/* Vector addition, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+paddw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_paddw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+paddw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_paddw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddb_s (s, t);
+}
+
+/* Addition of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+paddd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_paddd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+paddd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_paddd_s (s, t);
+}
+
+/* Vector addition, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddsb (s, t);
+}
+
+/* Vector addition, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddusb (s, t);
+}
+
+/* Logical AND NOT.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+pandn_ud (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_pandn_ud (s, t);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pandn_uw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pandn_uw (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pandn_uh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pandn_uh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pandn_ub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pandn_ub (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pandn_sd (int64_t s, int64_t t)
+{
+  return __builtin_loongson_pandn_sd (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pandn_sw (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pandn_sw (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pandn_sh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pandn_sh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pandn_sb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pandn_sb (s, t);
+}
+
+/* Average.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pavgh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pavgh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pavgb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pavgb (s, t);
+}
+
+/* Equality test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_s (s, t);
+}
+
+/* Greater-than test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpgth_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpgth_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_s (s, t);
+}
+
+/* Extract halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pextrh_u (uint16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_u (s, field);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pextrh_s (int16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_s (s, field);
+}
+
+/* Insert halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_u (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_s (s, t);
+}
+
+/* Multiply and add.  */
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pmaddhw (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaddhw (s, t);
+}
+
+/* Maximum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmaxsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaxsh (s, t);
+}
+
+/* Maximum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmaxub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pmaxub (s, t);
+}
+
+/* Minimum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pminsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pminsh (s, t);
+}
+
+/* Minimum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pminub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pminub (s, t);
+}
+
+/* Move byte mask.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmovmskb_u (uint8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_u (s);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pmovmskb_s (int8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_s (s);
+}
+
+/* Multiply unsigned integers and store high result.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pmulhuh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pmulhuh (s, t);
+}
+
+/* Multiply signed integers and store high result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmulhh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmulhh (s, t);
+}
+
+/* Multiply signed integers and store low result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmullh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmullh (s, t);
+}
+
+/* Multiply unsigned word integers.  */
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pmuluw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pmuluw (s, t);
+}
+
+/* Absolute difference.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pasubub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pasubub (s, t);
+}
+
+/* Sum of unsigned byte integers.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+biadd (uint8x8_t s)
+{
+  return __builtin_loongson_biadd (s);
+}
+
+/* Sum of absolute differences.
+   Note that this intrinsic expands into two machine instructions:
+   PASUBUB followed by BIADD.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psadbh (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psadbh (s, t);
+}
+
+/* Shuffle halfwords.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_u (dest, s, order);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_s (dest, s, order);
+}
+
+/* Shift left logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psllh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psllh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psllw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psllw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_s (s, amount);
+}
+
+/* Shift right logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrlh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrlh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psrlw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psrlw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_s (s, amount);
+}
+
+/* Shift right arithmetic.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrah_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrah_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psraw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psraw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_s (s, amount);
+}
+
+/* Vector subtraction, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psubw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_psubw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psubw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_psubw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubb_s (s, t);
+}
+
+/* Subtraction of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+psubd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_psubd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+psubd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_psubd_s (s, t);
+}
+
+/* Vector subtraction, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubsb (s, t);
+}
+
+/* Vector subtraction, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubusb (s, t);
+}
+
+/* Unpack high data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpckhwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpckhhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpckhbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpckhwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpckhhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpckhbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_s (s, t);
+}
+
+/* Unpack low data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpcklwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpcklhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpcklbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpcklwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpcklhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpcklbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_s (s, t);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
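
For reference, here is a minimal sketch of how the wrappers above might be
used from ordinary C code.  It is an illustration only, not part of the
patch; the "loongson.h" include name and the -march=loongson2f build flag
are assumptions about how the header will eventually be installed and
enabled.

/* Compile with something like:
   mipsel-linux-gnu-gcc -march=loongson2f -O2 -c avg.c  */
#include "loongson.h"

/* Elementwise average of two vectors of unsigned bytes, clamped from
   below, using the pavgb and pmaxub wrappers defined above.  */
uint8x8_t
clamped_average (uint8x8_t a, uint8x8_t b, uint8x8_t floor)
{
  uint8x8_t avg = pavgb (a, b);   /* elementwise average (see pavgb above) */
  return pmaxub (avg, floor);     /* elementwise unsigned maximum */
}
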
--- gcc/config/mips/mips.c	(/local/gcc-trunk)	(revision 382)
+++ gcc/config/mips/mips.c	(/local/gcc-2)	(revision 382)
@@ -3531,6 +3531,12 @@ mips_split_doubleword_move (rtx dest, rt
 	emit_insn (gen_move_doubleword_fprdf (dest, src));
       else if (!TARGET_64BIT && GET_MODE (dest) == V2SFmode)
 	emit_insn (gen_move_doubleword_fprv2sf (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V2SImode)
+	emit_insn (gen_move_doubleword_fprv2si (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V4HImode)
+	emit_insn (gen_move_doubleword_fprv4hi (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V8QImode)
+	emit_insn (gen_move_doubleword_fprv8qi (dest, src));
       else if (TARGET_64BIT && GET_MODE (dest) == TFmode)
 	emit_insn (gen_move_doubleword_fprtf (dest, src));
       else
@@ -8922,6 +8928,14 @@ mips_hard_regno_mode_ok_p (unsigned int 
       if (mode == TFmode && ISA_HAS_8CC)
 	return true;
 
+      /* Allow 64-bit vector modes for Loongson-2E/2F.  */
+      if (HAVE_LOONGSON_VECTOR_MODES
+	  && (mode == V2SImode
+	      || mode == V4HImode
+	      || mode == V8QImode
+	      || mode == DImode))
+	return true;
+
       if (class == MODE_FLOAT
 	  || class == MODE_COMPLEX_FLOAT
 	  || class == MODE_VECTOR_FLOAT)
@@ -9268,6 +9282,11 @@ mips_vector_mode_supported_p (enum machi
     case V4UQQmode:
       return TARGET_DSP;
 
+    case V2SImode:
+    case V4HImode:
+    case V8QImode:
+      return HAVE_LOONGSON_VECTOR_MODES;
+
     default:
       return false;
     }
@@ -10388,6 +10407,213 @@ static const struct mips_builtin_descrip
   DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2)
 };
 
+/* Define a Loongson MIPS_BUILTIN_DIRECT function for instruction
+   CODE_FOR_mips_<INSN>.  FUNCTION_TYPE and TARGET_FLAGS are
+   builtin_description fields.  */
+#define LOONGSON_BUILTIN(FN_NAME, INSN, FUNCTION_TYPE)		\
+  { CODE_FOR_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
+    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, 0 }
+
+/* Builtin functions for ST Microelectronics Loongson-2E/2F cores.  */
+static const struct mips_builtin_description mips_loongson_2ef_bdesc [] =
+{
+  /* Pack with signed saturation.  */
+  LOONGSON_BUILTIN (packsswh, vec_pack_ssat_v2si,
+                    MIPS_V4HI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (packsshb, vec_pack_ssat_v4hi,
+                    MIPS_V8QI_FTYPE_V4HI_V4HI),
+  /* Pack with unsigned saturation.  */
+  LOONGSON_BUILTIN (packushb, vec_pack_usat_v4hi,
+                    MIPS_UV8QI_FTYPE_UV4HI_UV4HI),
+  /* Vector addition, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (paddw_u, addv2si3, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (paddh_u, addv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddb_u, addv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (paddw_s, addv2si3, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (paddh_s, addv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddb_s, addv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Addition of doubleword integers, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (paddd_u, paddd, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (paddd_s, paddd, MIPS_DI_FTYPE_DI_DI),
+  /* Vector addition, treating overflow by signed saturation.  */
+  LOONGSON_BUILTIN (paddsh, ssaddv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddsb, ssaddv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Vector addition, treating overflow by unsigned saturation.  */
+  LOONGSON_BUILTIN (paddush, usaddv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddusb, usaddv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Logical AND NOT.  */
+  LOONGSON_BUILTIN (pandn_ud, loongson_and_not_di, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (pandn_uw, loongson_and_not_v2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pandn_uh, loongson_and_not_v4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pandn_ub, loongson_and_not_v8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pandn_sd, loongson_and_not_di, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (pandn_sw, loongson_and_not_v2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pandn_sh, loongson_and_not_v4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pandn_sb, loongson_and_not_v8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Average.  */
+  LOONGSON_BUILTIN (pavgh, loongson_average_v4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pavgb, loongson_average_v8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Equality test.  */
+  LOONGSON_BUILTIN (pcmpeqw_u, loongson_eq_v2si, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pcmpeqh_u, loongson_eq_v4hi, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pcmpeqb_u, loongson_eq_v8qi, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pcmpeqw_s, loongson_eq_v2si, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pcmpeqh_s, loongson_eq_v4hi, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pcmpeqb_s, loongson_eq_v8qi, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Greater-than test.  */
+  LOONGSON_BUILTIN (pcmpgtw_u, loongson_gt_v2si, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pcmpgth_u, loongson_gt_v4hi, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pcmpgtb_u, loongson_gt_v8qi, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pcmpgtw_s, loongson_gt_v2si, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pcmpgth_s, loongson_gt_v4hi, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pcmpgtb_s, loongson_gt_v8qi, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Extract halfword.  */
+  LOONGSON_BUILTIN (pextrh_u, loongson_extract_halfword,
+  		    MIPS_UV4HI_FTYPE_UV4HI_USI),
+  LOONGSON_BUILTIN (pextrh_s, loongson_extract_halfword,
+  		    MIPS_V4HI_FTYPE_V4HI_USI),
+  /* Insert halfword.  */
+  LOONGSON_BUILTIN (pinsrh_0_u, loongson_insert_halfword_0,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_1_u, loongson_insert_halfword_1,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_2_u, loongson_insert_halfword_2,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_3_u, loongson_insert_halfword_3,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_0_s, loongson_insert_halfword_0,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_1_s, loongson_insert_halfword_1,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_2_s, loongson_insert_halfword_2,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_3_s, loongson_insert_halfword_3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply and add.  */
+  LOONGSON_BUILTIN (pmaddhw, loongson_mult_add,
+  		    MIPS_V2SI_FTYPE_V4HI_V4HI),
+  /* Maximum of signed halfwords.  */
+  LOONGSON_BUILTIN (pmaxsh, smaxv4hi3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Maximum of unsigned bytes.  */
+  LOONGSON_BUILTIN (pmaxub, umaxv8qi3,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Minimum of signed halfwords.  */
+  LOONGSON_BUILTIN (pminsh, sminv4hi3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Minimum of unsigned bytes.  */
+  LOONGSON_BUILTIN (pminub, uminv8qi3,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Move byte mask.  */
+  LOONGSON_BUILTIN (pmovmskb_u, loongson_move_byte_mask,
+  		    MIPS_UV8QI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN (pmovmskb_s, loongson_move_byte_mask,
+  		    MIPS_V8QI_FTYPE_V8QI),
+  /* Multiply unsigned integers and store high result.  */
+  LOONGSON_BUILTIN (pmulhuh, umulv4hi3_highpart,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  /* Multiply signed integers and store high result.  */
+  LOONGSON_BUILTIN (pmulhh, smulv4hi3_highpart,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply signed integers and store low result.  */
+  LOONGSON_BUILTIN (pmullh, loongson_smul_lowpart,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply unsigned word integers.  */
+  LOONGSON_BUILTIN (pmuluw, loongson_umul_word,
+  		    MIPS_UDI_FTYPE_UV2SI_UV2SI),
+  /* Absolute difference.  */
+  LOONGSON_BUILTIN (pasubub, loongson_pasubub,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Sum of unsigned byte integers.  */
+  LOONGSON_BUILTIN (biadd, reduc_uplus_v8qi,
+		    MIPS_UV4HI_FTYPE_UV8QI),
+  /* Sum of absolute differences.  */
+  LOONGSON_BUILTIN (psadbh, loongson_psadbh,
+  		    MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
+  /* Shuffle halfwords.  */
+  LOONGSON_BUILTIN (pshufh_u, loongson_pshufh,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
+  LOONGSON_BUILTIN (pshufh_s, loongson_pshufh,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
+  /* Shift left logical.  */
+  LOONGSON_BUILTIN (psllh_u, loongson_psllv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psllh_s, loongson_psllv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psllw_u, loongson_psllv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psllw_s, loongson_psllv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Shift right arithmetic.  */
+  LOONGSON_BUILTIN (psrah_u, loongson_psrav4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psrah_s, loongson_psrav4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psraw_u, loongson_psrav2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psraw_s, loongson_psrav2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Shift right logical.  */
+  LOONGSON_BUILTIN (psrlh_u, loongson_psrlv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psrlh_s, loongson_psrlv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psrlw_u, loongson_psrlv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psrlw_s, loongson_psrlv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Vector subtraction, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (psubw_u, subv2si3, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (psubh_u, subv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubb_u, subv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (psubw_s, subv2si3, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (psubh_s, subv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubb_s, subv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Subtraction of doubleword integers, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (psubd_u, psubd, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (psubd_s, psubd, MIPS_DI_FTYPE_DI_DI),
+  /* Vector subtraction, treating overflow by signed saturation.  */
+  LOONGSON_BUILTIN (psubsh, sssubv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubsb, sssubv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Vector subtraction, treating overflow by unsigned saturation.  */
+  LOONGSON_BUILTIN (psubush, ussubv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubusb, ussubv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Unpack high data.  */
+  LOONGSON_BUILTIN (punpckhbh_u, vec_interleave_highv8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (punpckhhw_u, vec_interleave_highv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (punpckhwd_u, vec_interleave_highv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (punpckhbh_s, vec_interleave_highv8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (punpckhhw_s, vec_interleave_highv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (punpckhwd_s, vec_interleave_highv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI),
+  /* Unpack low data.  */
+  LOONGSON_BUILTIN (punpcklbh_u, vec_interleave_lowv8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (punpcklhw_u, vec_interleave_lowv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (punpcklwd_u, vec_interleave_lowv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (punpcklbh_s, vec_interleave_lowv8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (punpcklhw_s, vec_interleave_lowv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (punpcklwd_s, vec_interleave_lowv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI)
+};
+
 /* This structure describes an array of mips_builtin_description entries.  */
 struct mips_bdesc_map {
   /* The array that this entry describes.  */
@@ -10411,7 +10637,9 @@ static const struct mips_bdesc_map mips_
   { mips_sb1_bdesc, ARRAY_SIZE (mips_sb1_bdesc), PROCESSOR_SB1, 0 },
   { mips_dsp_bdesc, ARRAY_SIZE (mips_dsp_bdesc), PROCESSOR_MAX, 0 },
   { mips_dsp_32only_bdesc, ARRAY_SIZE (mips_dsp_32only_bdesc),
-    PROCESSOR_MAX, MASK_64BIT }
+    PROCESSOR_MAX, MASK_64BIT },
+  { mips_loongson_2ef_bdesc, ARRAY_SIZE (mips_loongson_2ef_bdesc),
+    PROCESSOR_MAX, 0 }
 };
 
 /* MODE is a vector mode whose elements have type TYPE.  Return the type
@@ -10420,11 +10648,17 @@ static const struct mips_bdesc_map mips_
 static tree
 mips_builtin_vector_type (tree type, enum machine_mode mode)
 {
-  static tree types[(int) MAX_MACHINE_MODE];
+  static tree types[2 * (int) MAX_MACHINE_MODE];
+  int mode_index;
+
+  mode_index = (int) mode;
 
-  if (types[(int) mode] == NULL_TREE)
-    types[(int) mode] = build_vector_type_for_mode (type, mode);
-  return types[(int) mode];
+  if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type))
+    mode_index += MAX_MACHINE_MODE;
+
+  if (types[mode_index] == NULL_TREE)
+    types[mode_index] = build_vector_type_for_mode (type, mode);
+  return types[mode_index];
 }
 
 /* Source-level argument types.  */
@@ -10433,16 +10667,27 @@ mips_builtin_vector_type (tree type, enu
 #define MIPS_ATYPE_POINTER ptr_type_node
 
 /* Standard mode-based argument types.  */
+#define MIPS_ATYPE_UQI unsigned_intQI_type_node
 #define MIPS_ATYPE_SI intSI_type_node
 #define MIPS_ATYPE_USI unsigned_intSI_type_node
 #define MIPS_ATYPE_DI intDI_type_node
+#define MIPS_ATYPE_UDI unsigned_intDI_type_node
 #define MIPS_ATYPE_SF float_type_node
 #define MIPS_ATYPE_DF double_type_node
 
 /* Vector argument types.  */
 #define MIPS_ATYPE_V2SF mips_builtin_vector_type (float_type_node, V2SFmode)
 #define MIPS_ATYPE_V2HI mips_builtin_vector_type (intHI_type_node, V2HImode)
+#define MIPS_ATYPE_V2SI mips_builtin_vector_type (intSI_type_node, V2SImode)
 #define MIPS_ATYPE_V4QI mips_builtin_vector_type (intQI_type_node, V4QImode)
+#define MIPS_ATYPE_V4HI mips_builtin_vector_type (intHI_type_node, V4HImode)
+#define MIPS_ATYPE_V8QI mips_builtin_vector_type (intQI_type_node, V8QImode)
+#define MIPS_ATYPE_UV2SI					\
+  mips_builtin_vector_type (unsigned_intSI_type_node, V2SImode)
+#define MIPS_ATYPE_UV4HI					\
+  mips_builtin_vector_type (unsigned_intHI_type_node, V4HImode)
+#define MIPS_ATYPE_UV8QI					\
+  mips_builtin_vector_type (unsigned_intQI_type_node, V8QImode)
 
 /* MIPS_FTYPE_ATYPESN takes N MIPS_FTYPES-like type codes and lists
    their associated MIPS_ATYPEs.  */
@@ -10500,10 +10745,14 @@ mips_init_builtins (void)
        m < &mips_bdesc_arrays[ARRAY_SIZE (mips_bdesc_arrays)];
        m++)
     {
+      bool loongson_p = (m->bdesc == mips_loongson_2ef_bdesc);
+
       if ((m->proc == PROCESSOR_MAX || m->proc == mips_arch)
-	  && (m->unsupported_target_flags & target_flags) == 0)
+ 	  && (m->unsupported_target_flags & target_flags) == 0
+ 	  && (!loongson_p || HAVE_LOONGSON_VECTOR_MODES))
 	for (d = m->bdesc; d < &m->bdesc[m->size]; d++)
-	  if ((d->target_flags & target_flags) == d->target_flags)
+ 	  if (((d->target_flags & target_flags) == d->target_flags)
+ 	      || loongson_p)
 	    add_builtin_function (d->name,
 				  mips_build_function_type (d->function_type),
 				  d - m->bdesc + offset,
@@ -12603,6 +12852,30 @@ mips_order_regs_for_local_alloc (void)
       reg_alloc_order[24] = 0;
     }
 }
+
+/* Initialize vector TARGET to VALS.  */
+
+void
+mips_expand_vector_init (rtx target, rtx vals)
+{
+  enum machine_mode mode;
+  enum machine_mode inner;
+  unsigned int i, n_elts;
+  rtx mem;
+
+  mode = GET_MODE (target);
+  inner = GET_MODE_INNER (mode);
+  n_elts = GET_MODE_NUNITS (mode);
+
+  gcc_assert (VECTOR_MODE_P (mode));
+
+  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
+  for (i = 0; i < n_elts; i++)
+    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
+                    XVECEXP (vals, 0, i));
+
+  emit_move_insn (target, mem);
+}
 \f
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
--- gcc/config/mips/mips.h	(/local/gcc-trunk)	(revision 382)
+++ gcc/config/mips/mips.h	(/local/gcc-2)	(revision 382)
@@ -266,6 +266,12 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
 
+/* Whether vector modes and intrinsics for ST Microelectronics
+   Loongson-2E/2F processors should be enabled.  In o32 pairs of
+   floating-point registers provide 64-bit values.  */
+#define HAVE_LOONGSON_VECTOR_MODES  (TARGET_LOONGSON_2EF	\
+				     && TARGET_HARD_FLOAT)
+
 /* True if the pre-reload scheduler should try to create chains of
    multiply-add or multiply-subtract instructions.  For example,
    suppose we have:
@@ -496,6 +502,10 @@ enum mips_code_readable_setting {
 	  builtin_define_std ("MIPSEL");				\
 	  builtin_define ("_MIPSEL");					\
 	}								\
+                                                                        \
+      /* Whether Loongson vector modes are enabled.  */                 \
+      if (HAVE_LOONGSON_VECTOR_MODES)                                   \
+        builtin_define ("__mips_loongson_vector_rev");                  \
 									\
       /* Macros dependent on the C dialect.  */				\
       if (preprocessing_asm_p ())					\
--- gcc/config/mips/mips-modes.def	(/local/gcc-trunk)	(revision 382)
+++ gcc/config/mips/mips-modes.def	(/local/gcc-2)	(revision 382)
@@ -26,6 +26,7 @@ RESET_FLOAT_FORMAT (DF, mips_double_form
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
+VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
 VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
 



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-09 18:24           ` Maxim Kuvyrkov
@ 2008-06-10  7:32             ` Richard Sandiford
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-10  7:32 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Ruan Beihong, gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Ruan Beihong wrote:
>> Hi every one,
>> There are something special with Loongson.
>> See below. (Extracted from binutils-2.18.50.0.5/opcodes/mips-opc.c) I
>> wonder if these instruction would be in gcc.
>
> ...
>
>> These instructions are designed to use the FPU for some simple ALU
>> tasks, thus reducing the use of m[ft]c1.
>> Loongson has both mov.d and mov.ps, plus one more: "or" (1051) on the FPU.
>
> These instructions operate on 32-bit or 64-bit integer values placed 
> into FP registers.  They do indeed offload the integer ALU, but I don't see 
> how they can reduce the usage of m[ft]c1.  On the contrary, additional 
> m[ft]c1 instructions will be needed to transfer data between the integer and 
> FP ALUs.

FWIW, if these insns can be safely applied to vector data[*],
you could use AND, OR, NOR and XOR as vector operations.
And as discussed, you could probably use them for FPR<->FPR
vector moves.

  [*] Such as if the Loongson FPU treats vector data as a
      variant of the FPU L format, so that 64-bit integer
      operations can be freely mixed with vector operations.

No requirement to do that, of course.  Just saying.

> Anyway, these instructions are not yet supported because there is no 
> effective optimization for moving integer ALU load over to the FP ALU. 
> The scheduler may be a good place for this, but it is not implemented.

Agreed that we don't have the appropriate load-balancing infrastructure.
Even so, perhaps exposing these operations to the register allocator
might avoid spills in some cases?

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-09 18:27             ` Maxim Kuvyrkov
@ 2008-06-10 10:29               ` Richard Sandiford
  2008-06-11 10:09                 ` Maxim Kuvyrkov
  2008-06-11 20:34                 ` Maxim Kuvyrkov
  0 siblings, 2 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-10 10:29 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> Hmm.  If this is a newly-defined interface, I really have to question
>> the wisdom of these functions.  The wording above suggests that there's
>> something "unstable" about normal C pointer and array accesses.
>> There shouldn't be ;)  They ought to work as expected.
>> 
>> The patch rightly uses well-known insn names for well-known operations
>> like vector addition, vector maximum, and so on.  As well as allowing
>> autovectorisation, I believe this means you could write:
>> 
>>     uint8x8_t *a;
>> 
>>     a[0] = a[1] + a[2];
>> 
>> (It might be nice to have tests to make sure that this does indeed
>> work when using the new header file.  It could just be cut-&-paste
>> from the version that uses intrinsic functions.)
>> 
>> I just think that, given GCC's vector extensions, having these
>> functions as well is confusing.  I take what you say about it
>> being consistent with arm_neon.h, but AltiVec doesn't have these
>> sorts of function, and GCC's generic vector support was heavily
>> influenced by AltiVec.
>
> OK, I removed vec_load_* and vec_store_* helpers along with the 
> paragraph in extend.texi.

Thanks.

> Also I fixed the existing tests, but didn't add any new tests, such as a 
> test for vector '+'.  If you think these new tests are really worthwhile, 
> I'll add them in a separate patch.

I think it's worth it, but feel free to do it separately.
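
A test along those lines could be as small as the sketch below.  The file
name and the dg-options needed to select a Loongson multilib are left open,
and paddh_u is assumed to be the wraparound unsigned-halfword add wrapper
from the header:

/* { dg-do run } */
#include <string.h>
#include <stdlib.h>
#include "loongson.h"

int
main (void)
{
  uint16x4_t a = { 1, 2, 3, 4 };
  uint16x4_t b = { 10, 20, 30, 40 };
  uint16x4_t sum = a + b;		/* generic vector addition */
  uint16x4_t ref = paddh_u (a, b);	/* the corresponding intrinsic */

  if (memcmp (&sum, &ref, sizeof (sum)) != 0)
    abort ();
  return 0;
}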

It might also be worth adding scan-assembler tests for testers without
Loongson support.  See below for one problem that scan-assembler tests
would have caught.
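
A scan-assembler variant needs no Loongson hardware at all.  A sketch,
where the dg-options line is an assumption about how such tests would
force a suitable -march:

/* { dg-do compile } */
/* { dg-options "-march=loongson2f -mhard-float" } */
#include "loongson.h"

uint16x4_t
f (uint16x4_t s, uint16x4_t t)
{
  return paddush (s, t);	/* saturating unsigned halfword add */
}

/* { dg-final { scan-assembler "paddush" } } */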

> Any further comments?

A couple, I'm afraid ;)

> 	(mips_builtin_vector_type): Handle unsigned versions of vector modes.
> 	Add new parameter for that.
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
No longer true.

> +;; Expander to legitimize moves involving values of vector modes.
> +(define_expand "mov<mode>"
> +  [(set (match_operand:VWHB 0)
> +	(match_operand:VWHB 1))]
> +  ""
> +{
> +  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
> +    DONE;
> +})

We probably ought to use an insn condition here to restrict the moves
to targets that support the modes.  (Insn conditions are checked,
unlike predicates.)
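
Concretely, that just means putting the availability macro in the condition
slot, along these lines (sketch only; the revised patch below renames the
macro to TARGET_LOONGSON_VECTORS):

;; Expander to legitimize moves involving values of vector modes,
;; restricted to targets that actually support those modes.
(define_expand "mov<mode>"
  [(set (match_operand:VWHB 0)
	(match_operand:VWHB 1))]
  "HAVE_LOONGSON_VECTOR_MODES"
{
  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
    DONE;
})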

> +;; Addition of doubleword integers stored in FP registers.
> +;; Overflow is treated by wraparound.
> +(define_insn "paddd"
> +  [(set (match_operand:DI 0 "register_operand" "=f")
> +        (plus:DI (match_operand:DI 1 "register_operand" "f")
> +		 (match_operand:DI 2 "register_operand" "f")))]
> +  "HAVE_LOONGSON_VECTOR_MODES"
> +  "paddd\t%0,%1,%2")

I don't think this pattern or psubd will ever be used for 64-bit ABIs;
they'll be trumped by the normal addition and subtraction patterns.
Thus paddd (...) and psubd (...) will actually expand to "daddu" and
"dsubu", moving to and from FPRs if necessary.  Also, you _might_ end up
using these patterns for 64-bit addition on 32-bit ABIs, even though the
cost of moving to and from FPRs is higher than the usual add/shift
sequence.
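
To make that concrete: source like the sketch below is, per the comment
above, expected to end up as a plain daddu on 64-bit ABIs rather than an
FPR paddd (the builtin name is taken from the table in this patch):

#include <stdint.h>

int64_t
add64 (int64_t x, int64_t y)
{
  /* On n32/n64 the ordinary DImode add pattern is expected to win,
     so this should compile to daddu in GPRs.  */
  return __builtin_loongson_paddd_s (x, y);
}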

> +/* Define a Loongson MIPS_BUILTIN_DIRECT function for instruction
> +   CODE_FOR_mips_<INSN>.  FUNCTION_TYPE and TARGET_FLAGS are
> +   builtin_description fields.  */
> +#define LOONGSON_BUILTIN(FN_NAME, INSN, FUNCTION_TYPE)		\
> +  { CODE_FOR_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
> +    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, 0 }

Comment doesn't match code: not CODE_FOR_mips, and no TARGET_FLAGS.

> +/* Vectors of unsigned bytes, halfwords and words.  */
> +typedef uint8_t uint8x8_t __attribute__((vector_size (8)));
> +typedef uint16_t uint16x4_t __attribute__((vector_size (8)));
> +typedef uint32_t uint32x2_t __attribute__((vector_size (8)));
> +
> +/* Vectors of signed bytes, halfwords and words.  */
> +typedef int8_t int8x8_t __attribute__((vector_size (8)));
> +typedef int16_t int16x4_t __attribute__((vector_size (8)));
> +typedef int32_t int32x2_t __attribute__((vector_size (8)));

More of a heads-up than anything, but the foo32x2_t definitions don't
seem to work with newlib.  Things work with "uint32_t" replaced by
"unsigned int" and "int32_t" replaced by "int".  (Maybe newlib uses
"long" instead?)

Also, I fluffed the last review.  I said that HAVE_LOONGSON_MODES
ought to check TARGET_HARD_FLOAT, but as discussed in the thread
for the other patch, it should be TARGET_HARD_FLOAT_ABI.  We should
then check TARGET_HARD_FLOAT at insn-generation time.  Thus we
register the functions if TARGET_HARD_FLOAT_ABI, and allow the
modes if TARGET_HARD_FLOAT_ABI.

(Yes, this whole ISA_HAS_*/GENERATE_*/TARGET_* area needs a revamp.
It's on my list...)

Since it was my mistake, I've updated the patch with that change.

I also picked another nit.  Sometimes the patterns were named
after the insn:

    (define_insn "loongson_pasubub"

Sometimes they were named after the insn, but with modes instead of
the Loongson vector suffixes:

    (define_insn "loongson_psra<mode>"

And sometimes they used English descriptions:

    (define_insn "loongson_and_not_<mode>"

I'd rather stick to the first, like we do for other built-in functions.
I also changed the UNSPEC_LOONGSON_* numbers accordingly.

I also changed the built-in function descriptions to match the style
used elsewhere.

Finally, I adjusted the patch so that it applies on top of the
built-in-table patch I sent yesterday in the [3/5] thread.

I've not done anything about the paddd/psubd thing; I'll leave
that to you ;)  Otherwise, does this look OK to you?

Richard


Adjusted gcc/ changelog:

	* config/mips/mips-modes.def: Add V8QI, V4HI and V2SI modes.
	* config/mips/mips-protos.h (mips_expand_vector_init): New.
	* config/mips/mips-ftypes.def: Add function types for Loongson-2E/2F
	builtins.
	* config/mips/mips.c (mips_split_doubleword_move): Handle new modes.
	(mips_hard_regno_mode_ok_p): Allow 64-bit vector modes for Loongson.
	(mips_vector_mode_supported_p): Add V2SImode, V4HImode and
	V8QImode cases.
	(LOONGSON_BUILTIN, LOONGSON_BUILTIN_ALIAS): New.
	(CODE_FOR_loongson_packsswh, CODE_FOR_loongson_packsshb)
	(CODE_FOR_loongson_packushb, CODE_FOR_loongson_paddw)
	(CODE_FOR_loongson_paddh, CODE_FOR_loongson_paddb)
	(CODE_FOR_loongson_paddsh, CODE_FOR_loongson_paddsb)
	(CODE_FOR_loongson_paddush, CODE_FOR_loongson_paddusb)
	(CODE_FOR_loongson_pmaxsh, CODE_FOR_loongson_pmaxub)
	(CODE_FOR_loongson_pminsh, CODE_FOR_loongson_pminub)
	(CODE_FOR_loongson_pmulhuh, CODE_FOR_loongson_pmulhh)
	(CODE_FOR_loongson_biadd, CODE_FOR_loongson_psubw)
	(CODE_FOR_loongson_psubh, CODE_FOR_loongson_psubb)
	(CODE_FOR_loongson_psubsh, CODE_FOR_loongson_psubsb)
	(CODE_FOR_loongson_psubush, CODE_FOR_loongson_psubusb)
	(CODE_FOR_loongson_punpckhbh, CODE_FOR_loongson_punpckhhw)
	(CODE_FOR_loongson_punpckhwd, CODE_FOR_loongson_punpcklbh)
	(CODE_FOR_loongson_punpcklhw, CODE_FOR_loongson_punpcklwd): New.
	(mips_builtins): Add Loongson builtins.
	(mips_loongson_2ef_bdesc): New.
	(mips_bdesc_arrays): Add mips_loongson_2ef_bdesc.
	(mips_builtin_vector_type): Handle unsigned versions of vector modes.
	(MIPS_ATYPE_UQI, MIPS_ATYPE_UDI, MIPS_ATYPE_V2SI, MIPS_ATYPE_UV2SI)
	(MIPS_ATYPE_V4HI, MIPS_ATYPE_UV4HI, MIPS_ATYPE_V8QI, MIPS_ATYPE_UV8QI):
	New.
	(mips_expand_vector_init): New.
	* config/mips/mips.h (HAVE_LOONGSON_VECTOR_MODES): New.
	(TARGET_CPU_CPP_BUILTINS): Define __mips_loongson_vector_rev
	if appropriate.
	* config/mips/mips.md: Add unspec numbers for Loongson
	builtins.  Include loongson.md.
	(MOVE64): Include Loongson vector modes.
	(SPLITF): Include Loongson vector modes.
	(HALFMODE): Handle Loongson vector modes.
	* config/mips/loongson.md: New.
	* config/mips/loongson.h: New.
	* config.gcc: Add loongson.h header for mips*-*-* targets.
	* doc/extend.texi (MIPS Loongson Built-in Functions): New.

Index: gcc/config/mips/mips-modes.def
===================================================================
--- gcc/config/mips/mips-modes.def	2008-06-10 08:47:42.000000000 +0100
+++ gcc/config/mips/mips-modes.def	2008-06-10 08:47:43.000000000 +0100
@@ -26,6 +26,7 @@ RESET_FLOAT_FORMAT (DF, mips_double_form
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
+VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
 VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
 
Index: gcc/config/mips/mips-protos.h
===================================================================
--- gcc/config/mips/mips-protos.h	2008-06-10 08:47:41.000000000 +0100
+++ gcc/config/mips/mips-protos.h	2008-06-10 08:47:43.000000000 +0100
@@ -303,4 +303,6 @@ extern bool mips16e_save_restore_pattern
 extern void mips_expand_atomic_qihi (union mips_gen_fn_ptrs,
 				     rtx, rtx, rtx, rtx);
 
+extern void mips_expand_vector_init (rtx, rtx);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
Index: gcc/config/mips/mips-ftypes.def
===================================================================
--- gcc/config/mips/mips-ftypes.def	2008-06-10 08:47:41.000000000 +0100
+++ gcc/config/mips/mips-ftypes.def	2008-06-10 08:47:43.000000000 +0100
@@ -66,6 +66,24 @@ DEF_MIPS_FTYPE (1, (SF, SF))
 DEF_MIPS_FTYPE (2, (SF, SF, SF))
 DEF_MIPS_FTYPE (1, (SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (UDI, UDI, UDI))
+DEF_MIPS_FTYPE (2, (UDI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UQI))
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV4HI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV8QI, UV8QI))
+
+DEF_MIPS_FTYPE (2, (UV8QI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV8QI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV8QI, UV8QI, UV8QI))
+
 DEF_MIPS_FTYPE (1, (V2HI, SI))
 DEF_MIPS_FTYPE (2, (V2HI, SI, SI))
 DEF_MIPS_FTYPE (3, (V2HI, SI, SI, SI))
@@ -81,12 +99,27 @@ DEF_MIPS_FTYPE (2, (V2SF, V2SF, V2SF))
 DEF_MIPS_FTYPE (3, (V2SF, V2SF, V2SF, INT))
 DEF_MIPS_FTYPE (4, (V2SF, V2SF, V2SF, V2SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, UQI))
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V2SI, V4HI, V4HI))
+
+DEF_MIPS_FTYPE (2, (V4HI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, USI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, V4HI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, USI))
+
 DEF_MIPS_FTYPE (1, (V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V2HI, V2HI))
 DEF_MIPS_FTYPE (1, (V4QI, V4QI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, V4QI))
 
+DEF_MIPS_FTYPE (2, (V8QI, V4HI, V4HI))
+DEF_MIPS_FTYPE (1, (V8QI, V8QI))
+DEF_MIPS_FTYPE (2, (V8QI, V8QI, V8QI))
+
 DEF_MIPS_FTYPE (2, (VOID, SI, SI))
 DEF_MIPS_FTYPE (2, (VOID, V2HI, V2HI))
 DEF_MIPS_FTYPE (2, (VOID, V4QI, V4QI))
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2008-06-10 08:47:41.000000000 +0100
+++ gcc/config/mips/mips.c	2008-06-10 10:25:37.000000000 +0100
@@ -3532,6 +3532,12 @@ mips_split_doubleword_move (rtx dest, rt
 	emit_insn (gen_move_doubleword_fprdf (dest, src));
       else if (!TARGET_64BIT && GET_MODE (dest) == V2SFmode)
 	emit_insn (gen_move_doubleword_fprv2sf (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V2SImode)
+	emit_insn (gen_move_doubleword_fprv2si (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V4HImode)
+	emit_insn (gen_move_doubleword_fprv4hi (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V8QImode)
+	emit_insn (gen_move_doubleword_fprv8qi (dest, src));
       else if (TARGET_64BIT && GET_MODE (dest) == TFmode)
 	emit_insn (gen_move_doubleword_fprtf (dest, src));
       else
@@ -8960,6 +8966,14 @@ mips_hard_regno_mode_ok_p (unsigned int 
       if (mode == TFmode && ISA_HAS_8CC)
 	return true;
 
+      /* Allow 64-bit vector modes for Loongson-2E/2F.  */
+      if (TARGET_LOONGSON_VECTORS
+	  && (mode == V2SImode
+	      || mode == V4HImode
+	      || mode == V8QImode
+	      || mode == DImode))
+	return true;
+
       if (class == MODE_FLOAT
 	  || class == MODE_COMPLEX_FLOAT
 	  || class == MODE_VECTOR_FLOAT)
@@ -9323,6 +9337,11 @@ mips_vector_mode_supported_p (enum machi
     case V4UQQmode:
       return TARGET_DSP;
 
+    case V2SImode:
+    case V4HImode:
+    case V8QImode:
+      return TARGET_LOONGSON_VECTORS;
+
     default:
       return false;
     }
@@ -10192,6 +10211,7 @@ AVAIL_NON_MIPS16 (dsp, TARGET_DSP)
 AVAIL_NON_MIPS16 (dspr2, TARGET_DSPR2)
 AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
+AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_VECTORS)
 
 /* Construct a mips_builtin_description from the given arguments.
 
@@ -10288,6 +10308,25 @@ #define BPOSGE_BUILTIN(VALUE, AVAIL)				
   MIPS_BUILTIN (bposge, f, "bposge" #VALUE,				\
 		MIPS_BUILTIN_BPOSGE ## VALUE, MIPS_SI_FTYPE_VOID, AVAIL)
 
+/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<FN_NAME>
+   for instruction CODE_FOR_loongson_<INSN>.  FUNCTION_TYPE is a
+   builtin_description field.  */
+#define LOONGSON_BUILTIN_ALIAS(INSN, FN_NAME, FUNCTION_TYPE)		\
+  { CODE_FOR_loongson_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
+    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, mips_builtin_avail_loongson }
+
+/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<INSN>
+   for instruction CODE_FOR_loongson_<INSN>.  FUNCTION_TYPE is a
+   builtin_description field.  */
+#define LOONGSON_BUILTIN(INSN, FUNCTION_TYPE)				\
+  LOONGSON_BUILTIN_ALIAS (INSN, INSN, FUNCTION_TYPE)
+
+/* Like LOONGSON_BUILTIN, but add _<SUFFIX> to the end of the function name.
+   We use functions of this form when the same insn can be usefully applied
+   to more than one datatype.  */
+#define LOONGSON_BUILTIN_SUFFIX(INSN, SUFFIX, FUNCTION_TYPE)		\
+  LOONGSON_BUILTIN_ALIAS (INSN, INSN ## _ ## SUFFIX, FUNCTION_TYPE)
+
 #define CODE_FOR_mips_sqrt_ps CODE_FOR_sqrtv2sf2
 #define CODE_FOR_mips_addq_ph CODE_FOR_addv2hi3
 #define CODE_FOR_mips_addu_qb CODE_FOR_addv4qi3
@@ -10295,6 +10334,37 @@ #define CODE_FOR_mips_subq_ph CODE_FOR_s
 #define CODE_FOR_mips_subu_qb CODE_FOR_subv4qi3
 #define CODE_FOR_mips_mul_ph CODE_FOR_mulv2hi3
 
+#define CODE_FOR_loongson_packsswh CODE_FOR_vec_pack_ssat_v2si
+#define CODE_FOR_loongson_packsshb CODE_FOR_vec_pack_ssat_v4hi
+#define CODE_FOR_loongson_packushb CODE_FOR_vec_pack_usat_v4hi
+#define CODE_FOR_loongson_paddw CODE_FOR_addv2si3
+#define CODE_FOR_loongson_paddh CODE_FOR_addv4hi3
+#define CODE_FOR_loongson_paddb CODE_FOR_addv8qi3
+#define CODE_FOR_loongson_paddsh CODE_FOR_ssaddv4hi3
+#define CODE_FOR_loongson_paddsb CODE_FOR_ssaddv8qi3
+#define CODE_FOR_loongson_paddush CODE_FOR_usaddv4hi3
+#define CODE_FOR_loongson_paddusb CODE_FOR_usaddv8qi3
+#define CODE_FOR_loongson_pmaxsh CODE_FOR_smaxv4hi3
+#define CODE_FOR_loongson_pmaxub CODE_FOR_umaxv8qi3
+#define CODE_FOR_loongson_pminsh CODE_FOR_sminv4hi3
+#define CODE_FOR_loongson_pminub CODE_FOR_uminv8qi3
+#define CODE_FOR_loongson_pmulhuh CODE_FOR_umulv4hi3_highpart
+#define CODE_FOR_loongson_pmulhh CODE_FOR_smulv4hi3_highpart
+#define CODE_FOR_loongson_biadd CODE_FOR_reduc_uplus_v8qi
+#define CODE_FOR_loongson_psubw CODE_FOR_subv2si3
+#define CODE_FOR_loongson_psubh CODE_FOR_subv4hi3
+#define CODE_FOR_loongson_psubb CODE_FOR_subv8qi3
+#define CODE_FOR_loongson_psubsh CODE_FOR_sssubv4hi3
+#define CODE_FOR_loongson_psubsb CODE_FOR_sssubv8qi3
+#define CODE_FOR_loongson_psubush CODE_FOR_ussubv4hi3
+#define CODE_FOR_loongson_psubusb CODE_FOR_ussubv8qi3
+#define CODE_FOR_loongson_punpckhbh CODE_FOR_vec_interleave_highv8qi
+#define CODE_FOR_loongson_punpckhhw CODE_FOR_vec_interleave_highv4hi
+#define CODE_FOR_loongson_punpckhwd CODE_FOR_vec_interleave_highv2si
+#define CODE_FOR_loongson_punpcklbh CODE_FOR_vec_interleave_lowv8qi
+#define CODE_FOR_loongson_punpcklhw CODE_FOR_vec_interleave_lowv4hi
+#define CODE_FOR_loongson_punpcklwd CODE_FOR_vec_interleave_lowv2si
+
 static const struct mips_builtin_description mips_builtins[] = {
   DIRECT_BUILTIN (pll_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
   DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
@@ -10471,7 +10541,108 @@ static const struct mips_builtin_descrip
   DIRECT_BUILTIN (dpaqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
   DIRECT_BUILTIN (dpaqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
   DIRECT_BUILTIN (dpsqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
-  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32)
+  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+
+  /* Builtin functions for ST Microelectronics Loongson-2E/2F cores.  */
+  LOONGSON_BUILTIN (packsswh, MIPS_V4HI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (packsshb, MIPS_V8QI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (packushb, MIPS_UV8QI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (paddh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (paddw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (paddh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (paddd, u, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_SUFFIX (paddd, s, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (paddsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddsb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (paddush, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddusb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_ALIAS (pandn_d, pandn_ud, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_ALIAS (pandn_w, pandn_uw, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_ALIAS (pandn_h, pandn_uh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_ALIAS (pandn_b, pandn_ub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_ALIAS (pandn_d, pandn_sd, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN_ALIAS (pandn_w, pandn_sw, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_ALIAS (pandn_h, pandn_sh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_ALIAS (pandn_b, pandn_sb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (pavgh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pavgb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgth, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgth, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (pextrh, u, MIPS_UV4HI_FTYPE_UV4HI_USI),
+  LOONGSON_BUILTIN_SUFFIX (pextrh, s, MIPS_V4HI_FTYPE_V4HI_USI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_0, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_1, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_2, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_3, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_0, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_1, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_2, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_3, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaddhw, MIPS_V2SI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaxsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaxub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pminsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pminub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pmovmskb, u, MIPS_UV8QI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pmovmskb, s, MIPS_V8QI_FTYPE_V8QI),
+  LOONGSON_BUILTIN (pmulhuh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pmulhh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmullh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmuluw, MIPS_UDI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pasubub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (biadd, MIPS_UV4HI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN (psadbh, MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pshufh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (pshufh, s, MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrah, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrah, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psraw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psraw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psubw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (psubh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (psubb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (psubw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (psubh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (psubb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (psubd, u, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_SUFFIX (psubd, s, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (psubsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubsb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (psubush, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubusb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhbh, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhhw, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhwd, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhbh, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhhw, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklbh, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklhw, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklwd, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklbh, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklhw, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI)
 };
 
 /* MODE is a vector mode whose elements have type TYPE.  Return the type
@@ -10480,11 +10651,17 @@ static const struct mips_builtin_descrip
 static tree
 mips_builtin_vector_type (tree type, enum machine_mode mode)
 {
-  static tree types[(int) MAX_MACHINE_MODE];
+  static tree types[2 * (int) MAX_MACHINE_MODE];
+  int mode_index;
+
+  mode_index = (int) mode;
 
-  if (types[(int) mode] == NULL_TREE)
-    types[(int) mode] = build_vector_type_for_mode (type, mode);
-  return types[(int) mode];
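+  /* Vectors with unsigned element types are cached separately, in the
+     second half of the TYPES array.  */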
+  if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type))
+    mode_index += MAX_MACHINE_MODE;
+
+  if (types[mode_index] == NULL_TREE)
+    types[mode_index] = build_vector_type_for_mode (type, mode);
+  return types[mode_index];
 }
 
 /* Source-level argument types.  */
@@ -10493,16 +10670,27 @@ #define MIPS_ATYPE_INT integer_type_node
 #define MIPS_ATYPE_POINTER ptr_type_node
 
 /* Standard mode-based argument types.  */
+#define MIPS_ATYPE_UQI unsigned_intQI_type_node
 #define MIPS_ATYPE_SI intSI_type_node
 #define MIPS_ATYPE_USI unsigned_intSI_type_node
 #define MIPS_ATYPE_DI intDI_type_node
+#define MIPS_ATYPE_UDI unsigned_intDI_type_node
 #define MIPS_ATYPE_SF float_type_node
 #define MIPS_ATYPE_DF double_type_node
 
 /* Vector argument types.  */
 #define MIPS_ATYPE_V2SF mips_builtin_vector_type (float_type_node, V2SFmode)
 #define MIPS_ATYPE_V2HI mips_builtin_vector_type (intHI_type_node, V2HImode)
+#define MIPS_ATYPE_V2SI mips_builtin_vector_type (intSI_type_node, V2SImode)
 #define MIPS_ATYPE_V4QI mips_builtin_vector_type (intQI_type_node, V4QImode)
+#define MIPS_ATYPE_V4HI mips_builtin_vector_type (intHI_type_node, V4HImode)
+#define MIPS_ATYPE_V8QI mips_builtin_vector_type (intQI_type_node, V8QImode)
+#define MIPS_ATYPE_UV2SI					\
+  mips_builtin_vector_type (unsigned_intSI_type_node, V2SImode)
+#define MIPS_ATYPE_UV4HI					\
+  mips_builtin_vector_type (unsigned_intHI_type_node, V4HImode)
+#define MIPS_ATYPE_UV8QI					\
+  mips_builtin_vector_type (unsigned_intQI_type_node, V8QImode)
 
 /* MIPS_FTYPE_ATYPESN takes N MIPS_FTYPES-like type codes and lists
    their associated MIPS_ATYPEs.  */
@@ -12650,6 +12838,30 @@ mips_order_regs_for_local_alloc (void)
       reg_alloc_order[24] = 0;
     }
 }
+
+/* Initialize vector TARGET to VALS.  */
+
+void
+mips_expand_vector_init (rtx target, rtx vals)
+{
+  enum machine_mode mode;
+  enum machine_mode inner;
+  unsigned int i, n_elts;
+  rtx mem;
+
+  mode = GET_MODE (target);
+  inner = GET_MODE_INNER (mode);
+  n_elts = GET_MODE_NUNITS (mode);
+
+  gcc_assert (VECTOR_MODE_P (mode));
+
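+  /* Build the vector in a stack temporary: store each element of VALS
+     individually, then copy the whole temporary into TARGET.  */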
+  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
+  for (i = 0; i < n_elts; i++)
+    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
+                    XVECEXP (vals, 0, i));
+
+  emit_move_insn (target, mem);
+}
 \f
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	2008-06-10 08:47:41.000000000 +0100
+++ gcc/config/mips/mips.h	2008-06-10 10:47:11.000000000 +0100
@@ -267,6 +267,12 @@ #define TUNE_74K                    (mip
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
 
+/* Whether vector modes and intrinsics for ST Microelectronics
+   Loongson-2E/2F processors should be enabled.  Under the o32 ABI,
+   pairs of floating-point registers provide 64-bit values.  */
+#define TARGET_LOONGSON_VECTORS	    (TARGET_HARD_FLOAT_ABI		\
+				     && TARGET_LOONGSON_2EF)
+
 /* True if the pre-reload scheduler should try to create chains of
    multiply-add or multiply-subtract instructions.  For example,
    suppose we have:
@@ -497,6 +503,10 @@ #define TARGET_CPU_CPP_BUILTINS()					\
 	  builtin_define_std ("MIPSEL");				\
 	  builtin_define ("_MIPSEL");					\
 	}								\
+                                                                        \
+      /* Whether Loongson vector modes are enabled.  */                 \
+      if (TARGET_LOONGSON_VECTORS)					\
+        builtin_define ("__mips_loongson_vector_rev");                  \
 									\
       /* Macros dependent on the C dialect.  */				\
       if (preprocessing_asm_p ())					\
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	2008-06-10 08:47:41.000000000 +0100
+++ gcc/config/mips/mips.md	2008-06-10 10:34:24.000000000 +0100
@@ -215,6 +215,28 @@ (define_constants
    (UNSPEC_DPAQX_SA_W_PH	446)
    (UNSPEC_DPSQX_S_W_PH		447)
    (UNSPEC_DPSQX_SA_W_PH	448)
+
+   ;; ST Microelectronics Loongson-2E/2F.
+   (UNSPEC_LOONGSON_PAVG	500)
+   (UNSPEC_LOONGSON_PCMPEQ	501)
+   (UNSPEC_LOONGSON_PCMPGT	502)
+   (UNSPEC_LOONGSON_PEXTR	503)
+   (UNSPEC_LOONGSON_PINSR_0	504)
+   (UNSPEC_LOONGSON_PINSR_1	505)
+   (UNSPEC_LOONGSON_PINSR_2	506)
+   (UNSPEC_LOONGSON_PINSR_3	507)
+   (UNSPEC_LOONGSON_PMADD	508)
+   (UNSPEC_LOONGSON_PMOVMSK	509)
+   (UNSPEC_LOONGSON_PMULHU	510)
+   (UNSPEC_LOONGSON_PMULH	511)
+   (UNSPEC_LOONGSON_PMULL	512)
+   (UNSPEC_LOONGSON_PMULU	513)
+   (UNSPEC_LOONGSON_PASUBUB	514)
+   (UNSPEC_LOONGSON_BIADD	515)
+   (UNSPEC_LOONGSON_PSADBH	516)
+   (UNSPEC_LOONGSON_PSHUFH	517)
+   (UNSPEC_LOONGSON_PUNPCKH	518)
+   (UNSPEC_LOONGSON_PUNPCKL	519)
   ]
 )
 
@@ -501,7 +523,11 @@ (define_mode_iterator MOVECC [SI (DI "TA
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
-  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
+  [DI DF
+   (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")
+   (V4HI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")
+   (V8QI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")])
 
 ;; 128-bit modes for which we provide move patterns on 64-bit targets.
 (define_mode_iterator MOVE128 [TI TF])
@@ -528,6 +554,9 @@ (define_mode_iterator SPLITF
   [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (V2SF "!TARGET_64BIT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
+   (V4HI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
+   (V8QI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
    (TF "TARGET_64BIT && TARGET_FLOAT64")])
 
 ;; In GPR templates, a string like "<d>subu" will expand to "subu" in the
@@ -580,7 +609,9 @@ (define_mode_attr IMODE [(QQ "QI") (HQ "
 
 ;; This attribute gives the integer mode that has half the size of
 ;; the controlling mode.
-(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") (TF "DI")])
+(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI")
+			    (V2SI "SI") (V4HI "SI") (V8QI "SI")
+			    (TF "DI")])
 
 ;; This attribute works around the early SB-1 rev2 core "F2" erratum:
 ;;
@@ -6512,3 +6543,6 @@ (include "mips-dspr2.md")
 
 ; MIPS fixed-point instructions.
 (include "mips-fixed.md")
+
+; ST-Microelectronics Loongson-2E/2F-specific patterns.
+(include "loongson.md")
Index: gcc/config/mips/loongson.md
===================================================================
--- /dev/null	2008-06-08 10:32:14.544096500 +0100
+++ gcc/config/mips/loongson.md	2008-06-10 10:50:17.000000000 +0100
@@ -0,0 +1,429 @@
+;; Machine description for ST Microelectronics Loongson-2E/2F.
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Mode iterators and attributes.
+
+;; 64-bit vectors of bytes.
+(define_mode_iterator VB [V8QI])
+
+;; 64-bit vectors of halfwords.
+(define_mode_iterator VH [V4HI])
+
+;; 64-bit vectors of words.
+(define_mode_iterator VW [V2SI])
+
+;; 64-bit vectors of halfwords and bytes.
+(define_mode_iterator VHB [V4HI V8QI])
+
+;; 64-bit vectors of words and halfwords.
+(define_mode_iterator VWH [V2SI V4HI])
+
+;; 64-bit vectors of words, halfwords and bytes.
+(define_mode_iterator VWHB [V2SI V4HI V8QI])
+
+;; 64-bit vectors of words, halfwords and bytes; and DImode.
+(define_mode_iterator VWHBDI [V2SI V4HI V8QI DI])
+
+;; The Loongson instruction suffixes corresponding to the modes in the
+;; VWHBDI iterator.
+(define_mode_attr V_suffix [(V2SI "w") (V4HI "h") (V8QI "b") (DI "d")])
+
+;; Given a vector type T, the mode of a vector half the size of T
+;; and with the same number of elements.
+(define_mode_attr V_squash [(V2SI "V2HI") (V4HI "V4QI")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with half as many elements.
+(define_mode_attr V_stretch_half [(V2SI "DI") (V4HI "V2SI") (V8QI "V4HI")])
+
+;; The Loongson instruction suffixes corresponding to the transformation
+;; expressed by V_stretch_half.
+(define_mode_attr V_stretch_half_suffix [(V2SI "wd") (V4HI "hw") (V8QI "bh")])
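+
+;; For example, V4HI stretches to V2SI, and the corresponding suffix is
+;; "hw", as in the pmaddhw and punpckhhw instructions.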
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with twice as many elements.
+(define_mode_attr V_squash_double [(V2SI "V4HI") (V4HI "V8QI")])
+
+;; The Loongson instruction suffixes corresponding to the conversions
+;; specified by V_squash_double.
+(define_mode_attr V_squash_double_suffix [(V2SI "wh") (V4HI "hb")])
+
+;; Move patterns.
+
+;; Expander to legitimize moves involving values of vector modes.
+(define_expand "mov<mode>"
+  [(set (match_operand:VWHB 0)
+	(match_operand:VWHB 1))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+{
+  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
+    DONE;
+})
+
+;; Handle legitimized moves between values of vector modes.
+(define_insn "mov<mode>_internal"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,  d,  m,  d")
+	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  { return mips_output_move (operands[0], operands[1]); }
+  [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")
+   (set_attr "mode" "DI")])
+
+;; Initialization of a vector.
+
+(define_expand "vec_init<mode>"
+  [(set (match_operand:VWHB 0 "register_operand")
+	(match_operand 1 ""))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+{
+  mips_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})
+
+;; Instruction patterns for SIMD instructions.
+
+;; Pack with signed saturation.
+(define_insn "vec_pack_ssat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 1 "register_operand" "f"))
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 2 "register_operand" "f"))))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "packss<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Pack with unsigned saturation.
+(define_insn "vec_pack_usat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 1 "register_operand" "f"))
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 2 "register_operand" "f"))))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "packus<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by wraparound.
+(define_insn "add<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (plus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		   (match_operand:VWHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "padd<V_suffix>\t%0,%1,%2")
+
+;; Addition of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "loongson_paddd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (plus:DI (match_operand:DI 1 "register_operand" "f")
+		 (match_operand:DI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "paddd\t%0,%1,%2")
+
+;; Addition, treating overflow by signed saturation.
+(define_insn "ssadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "padds<V_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by unsigned saturation.
+(define_insn "usadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "paddus<V_suffix>\t%0,%1,%2")
+
+;; Logical AND NOT.
+(define_insn "loongson_pandn_<V_suffix>"
+  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
+        (and:VWHBDI
+	 (not:VWHBDI (match_operand:VWHBDI 1 "register_operand" "f"))
+	 (match_operand:VWHBDI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pandn\t%0,%1,%2")
+
+;; Average.
+(define_insn "loongson_pavg<V_suffix>"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (unspec:VHB [(match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")]
+		    UNSPEC_LOONGSON_PAVG))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pavg<V_suffix>\t%0,%1,%2")
+
+;; Equality test.
+(define_insn "loongson_pcmpeq<V_suffix>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PCMPEQ))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pcmpeq<V_suffix>\t%0,%1,%2")
+
+;; Greater-than test.
+(define_insn "loongson_pcmpgt<V_suffix>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PCMPGT))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pcmpgt<V_suffix>\t%0,%1,%2")
+
+;; Extract halfword.
+(define_insn "loongson_pextr<V_suffix>"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+ 		    (match_operand:SI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PEXTR))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pextr<V_suffix>\t%0,%1,%2")
+
+;; Insert halfword.
+(define_insn "loongson_pinsr<V_suffix>_0"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_0))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_0\t%0,%1,%2")
+
+(define_insn "loongson_pinsr<V_suffix>_1"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_1))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_1\t%0,%1,%2")
+
+(define_insn "loongson_pinsr<V_suffix>_2"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_2))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_2\t%0,%1,%2")
+
+(define_insn "loongson_pinsr<V_suffix>_3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_3))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_3\t%0,%1,%2")
+
+;; Multiply and add packed integers.
+(define_insn "loongson_pmadd<V_stretch_half_suffix>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VH 1 "register_operand" "f")
+				  (match_operand:VH 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PMADD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmadd<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Maximum of signed halfwords.
+(define_insn "smax<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smax:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmaxs<V_suffix>\t%0,%1,%2")
+
+;; Maximum of unsigned bytes.
+(define_insn "umax<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umax:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmaxu<V_suffix>\t%0,%1,%2")
+
+;; Minimum of signed halfwords.
+(define_insn "smin<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smin:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmins<V_suffix>\t%0,%1,%2")
+
+;; Minimum of unsigned bytes.
+(define_insn "umin<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umin:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pminu<V_suffix>\t%0,%1,%2")
+
+;; Move byte mask.
+(define_insn "loongson_pmovmsk<V_suffix>"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMOVMSK))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmovmsk<V_suffix>\t%0,%1")
+
+;; Multiply unsigned integers and store high result.
+(define_insn "umul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULHU))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulhu<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store high result.
+(define_insn "smul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulh<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store low result.
+(define_insn "loongson_pmull<V_suffix>"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULL))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmull<V_suffix>\t%0,%1,%2")
+
+;; Multiply unsigned word integers.
+(define_insn "loongson_pmulu<V_suffix>"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:VW 1 "register_operand" "f")
+		    (match_operand:VW 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULU))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulu<V_suffix>\t%0,%1,%2")
+
+;; Absolute difference.
+(define_insn "loongson_pasubub"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")
+		    (match_operand:VB 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PASUBUB))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pasubub\t%0,%1,%2")
+
+;; Sum of unsigned byte integers.
+(define_insn "reduc_uplus_<mode>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")]
+				 UNSPEC_LOONGSON_BIADD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "biadd\t%0,%1")
+
+;; Sum of absolute differences.
+(define_insn "loongson_psadbh"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")
+				  (match_operand:VB 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PSADBH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pasubub\t%0,%1,%2;biadd\t%0,%0")
+
+;; Shuffle halfwords.
+(define_insn "loongson_pshufh"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "0")
+		    (match_operand:VH 2 "register_operand" "f")
+		    (match_operand:SI 3 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSHUFH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pshufh\t%0,%2,%3")
+
+;; Shift left logical.
+(define_insn "loongson_psll<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashift:VWH (match_operand:VWH 1 "register_operand" "f")
+		    (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psll<V_suffix>\t%0,%1,%2")
+
+;; Shift right arithmetic.
+(define_insn "loongson_psra<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psra<V_suffix>\t%0,%1,%2")
+
+;; Shift right logical.
+(define_insn "loongson_psrl<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (lshiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psrl<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by wraparound.
+(define_insn "sub<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (minus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		    (match_operand:VWHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psub<V_suffix>\t%0,%1,%2")
+
+;; Subtraction of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "loongson_psubd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (minus:DI (match_operand:DI 1 "register_operand" "f")
+		  (match_operand:DI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubd\t%0,%1,%2")
+
+;; Subtraction, treating overflow by signed saturation.
+(define_insn "sssub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubs<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by unsigned saturation.
+(define_insn "ussub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubus<V_suffix>\t%0,%1,%2")
+
+;; Unpack high data.
+(define_insn "vec_interleave_high<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PUNPCKH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "punpckh<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Unpack low data.
+(define_insn "vec_interleave_low<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PUNPCKL))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "punpckl<V_stretch_half_suffix>\t%0,%1,%2")
Index: gcc/config/mips/loongson.h
===================================================================
--- /dev/null	2008-06-08 10:32:14.544096500 +0100
+++ gcc/config/mips/loongson.h	2008-06-10 11:23:01.000000000 +0100
@@ -0,0 +1,693 @@
+/* Intrinsics for ST Microelectronics Loongson-2E/2F SIMD operations.
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 2, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to the
+   Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+#ifndef _GCC_LOONGSON_H
+#define _GCC_LOONGSON_H
+
+#if !defined(__mips_loongson_vector_rev)
+# error "You must select -march=loongson2e or -march=loongson2f to use loongson.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Vectors of unsigned bytes, halfwords and words.  */
+typedef uint8_t uint8x8_t __attribute__((vector_size (8)));
+typedef uint16_t uint16x4_t __attribute__((vector_size (8)));
+typedef uint32_t uint32x2_t __attribute__((vector_size (8)));
+
+/* Vectors of signed bytes, halfwords and words.  */
+typedef int8_t int8x8_t __attribute__((vector_size (8)));
+typedef int16_t int16x4_t __attribute__((vector_size (8)));
+typedef int32_t int32x2_t __attribute__((vector_size (8)));
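+
+/* Objects of the above types may be initialized in the usual way for
+   GCC vector types, for example:
+
+     uint8x8_t v = { 1, 2, 3, 4, 5, 6, 7, 8 };  */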
+
+/* SIMD intrinsics.
+   Unless otherwise noted, calls to the functions below will expand into
+   precisely one machine instruction, modulo any moves required to
+   satisfy register allocation constraints.  */
+
+/* Pack with signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+packsswh (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_packsswh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+packsshb (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_packsshb (s, t);
+}
+
+/* Pack with unsigned saturation.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+packushb (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_packushb (s, t);
+}
+
+/* Vector addition, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+paddw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_paddw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+paddw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_paddw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddb_s (s, t);
+}
+
+/* Addition of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+paddd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_paddd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+paddd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_paddd_s (s, t);
+}
+
+/* Vector addition, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddsb (s, t);
+}
+
+/* Vector addition, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddusb (s, t);
+}
+
+/* Logical AND NOT.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+pandn_ud (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_pandn_ud (s, t);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pandn_uw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pandn_uw (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pandn_uh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pandn_uh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pandn_ub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pandn_ub (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pandn_sd (int64_t s, int64_t t)
+{
+  return __builtin_loongson_pandn_sd (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pandn_sw (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pandn_sw (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pandn_sh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pandn_sh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pandn_sb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pandn_sb (s, t);
+}
+
+/* Average.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pavgh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pavgh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pavgb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pavgb (s, t);
+}
+
+/* Equality test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_s (s, t);
+}
+
+/* Greater-than test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpgth_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpgth_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_s (s, t);
+}
+
+/* Extract halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pextrh_u (uint16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_u (s, field);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pextrh_s (int16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_s (s, field);
+}
+
+/* Insert halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_u (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_s (s, t);
+}
+
+/* Multiply and add.  */
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pmaddhw (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaddhw (s, t);
+}
+
+/* Maximum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmaxsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaxsh (s, t);
+}
+
+/* Maximum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmaxub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pmaxub (s, t);
+}
+
+/* Minimum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pminsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pminsh (s, t);
+}
+
+/* Minimum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pminub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pminub (s, t);
+}
+
+/* Move byte mask.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmovmskb_u (uint8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_u (s);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pmovmskb_s (int8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_s (s);
+}
+
+/* Multiply unsigned integers and store high result.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pmulhuh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pmulhuh (s, t);
+}
+
+/* Multiply signed integers and store high result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmulhh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmulhh (s, t);
+}
+
+/* Multiply signed integers and store low result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmullh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmullh (s, t);
+}
+
+/* Multiply unsigned word integers.  */
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pmuluw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pmuluw (s, t);
+}
+
+/* Absolute difference.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pasubub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pasubub (s, t);
+}
+
+/* Sum of unsigned byte integers.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+biadd (uint8x8_t s)
+{
+  return __builtin_loongson_biadd (s);
+}
+
+/* Sum of absolute differences.
+   Note that this intrinsic expands into two machine instructions:
+   PASUBUB followed by BIADD.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psadbh (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psadbh (s, t);
+}
+
+/* Shuffle halfwords.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_u (dest, s, order);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_s (dest, s, order);
+}
+
+/* Shift left logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psllh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psllh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psllw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psllw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_s (s, amount);
+}
+
+/* Shift right logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrlh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrlh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psrlw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psrlw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_s (s, amount);
+}
+
+/* Shift right arithmetic.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrah_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrah_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psraw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psraw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_s (s, amount);
+}
+
+/* Vector subtraction, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psubw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_psubw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psubw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_psubw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubb_s (s, t);
+}
+
+/* Subtraction of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+psubd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_psubd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+psubd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_psubd_s (s, t);
+}
+
+/* Vector subtraction, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubsb (s, t);
+}
+
+/* Vector subtraction, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubusb (s, t);
+}
+
+/* Unpack high data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpckhwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpckhhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpckhbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpckhwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpckhhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpckhbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_s (s, t);
+}
+
+/* Unpack low data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpcklwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpcklhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpcklbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpcklwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpcklhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpcklbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_s (s, t);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	2008-06-10 08:47:40.000000000 +0100
+++ gcc/config.gcc	2008-06-10 08:47:43.000000000 +0100
@@ -307,6 +307,7 @@ m68k-*-*)
 mips*-*-*)
 	cpu_type=mips
 	need_64bit_hwint=yes
+	extra_headers="loongson.h"
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	2008-06-10 08:47:40.000000000 +0100
+++ gcc/doc/extend.texi	2008-06-10 08:47:43.000000000 +0100
@@ -6788,6 +6788,7 @@ instructions, but allow the compiler to 
 * X86 Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
+* MIPS Loongson Built-in Functions::
 * PowerPC AltiVec Built-in Functions::
 * SPARC VIS Built-in Functions::
 * SPU Built-in Functions::
@@ -8667,6 +8668,132 @@ value is the upper one.  The opposite or
 For example, the code above will set the lower half of @code{a} to
 @code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
 
+@node MIPS Loongson Built-in Functions
+@subsection MIPS Loongson Built-in Functions
+
+GCC provides intrinsics to access the SIMD instructions provided by the
+ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
+available after inclusion of the @code{loongson.h} header file,
+operate on the following 64-bit vector types:
+
+@itemize
+@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
+@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
+@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
+@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
+@item @code{int16x4_t}, a vector of four signed 16-bit integers;
+@item @code{int32x2_t}, a vector of two signed 32-bit integers.
+@end itemize
+
+The intrinsics provided are listed below; each is named after the
+machine instruction to which it corresponds, with suffixes added as
+appropriate to distinguish intrinsics that expand to the same machine
+instruction yet have different argument types.  Refer to the architecture
+documentation for a description of the functionality of each
+instruction.
+
+@smallexample
+int16x4_t packsswh (int32x2_t s, int32x2_t t);
+int8x8_t packsshb (int16x4_t s, int16x4_t t);
+uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
+uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t paddw_s (int32x2_t s, int32x2_t t);
+int16x4_t paddh_s (int16x4_t s, int16x4_t t);
+int8x8_t paddb_s (int8x8_t s, int8x8_t t);
+uint64_t paddd_u (uint64_t s, uint64_t t);
+int64_t paddd_s (int64_t s, int64_t t);
+int16x4_t paddsh (int16x4_t s, int16x4_t t);
+int8x8_t paddsb (int8x8_t s, int8x8_t t);
+uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
+uint64_t pandn_ud (uint64_t s, uint64_t t);
+uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
+uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
+int64_t pandn_sd (int64_t s, int64_t t);
+int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
+int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
+int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
+uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
+uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
+uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
+uint16x4_t pextrh_u (uint16x4_t s, int field);
+int16x4_t pextrh_s (int16x4_t s, int field);
+uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
+int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
+int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
+int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
+uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
+int16x4_t pminsh (int16x4_t s, int16x4_t t);
+uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
+uint8x8_t pmovmskb_u (uint8x8_t s);
+int8x8_t pmovmskb_s (int8x8_t s);
+uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
+int16x4_t pmulhh (int16x4_t s, int16x4_t t);
+int16x4_t pmullh (int16x4_t s, int16x4_t t);
+int64_t pmuluw (uint32x2_t s, uint32x2_t t);
+uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
+uint16x4_t biadd (uint8x8_t s);
+uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
+uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
+int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
+uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psllh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psllw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrah_s (int16x4_t s, uint8_t amount);
+uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psraw_s (int32x2_t s, uint8_t amount);
+uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t psubw_s (int32x2_t s, int32x2_t t);
+int16x4_t psubh_s (int16x4_t s, int16x4_t t);
+int8x8_t psubb_s (int8x8_t s, int8x8_t t);
+uint64_t psubd_u (uint64_t s, uint64_t t);
+int64_t psubd_s (int64_t s, int64_t t);
+int16x4_t psubsh (int16x4_t s, int16x4_t t);
+int8x8_t psubsb (int8x8_t s, int8x8_t t);
+uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
+uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
+uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
+@end smallexample
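+
+For example, on a Loongson-2E or -2F target with a hard-float ABI (so
+that the intrinsics above are available), a function like the following
+adds two vectors of unsigned bytes element-wise, treating overflow by
+wraparound:
+
+@smallexample
+#include <loongson.h>
+
+uint8x8_t
+add_bytes (uint8x8_t s, uint8x8_t t)
+@{
+  return paddb_u (s, t);
+@}
+@end smallexample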
+
 @menu
 * Paired-Single Arithmetic::
 * Paired-Single Built-in Functions::
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2008-06-10 08:47:40.000000000 +0100
+++ gcc/testsuite/lib/target-supports.exp	2008-06-10 08:47:43.000000000 +0100
@@ -1249,6 +1249,17 @@ proc check_effective_target_arm_neon_hw 
     } "-mfpu=neon -mfloat-abi=softfp"]
 }
 
+# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
+# the Loongson vector modes.
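+# Tests use it via dg-require-effective-target mips_loongson.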
+
+proc check_effective_target_mips_loongson { } {
+    return [check_no_compiler_messages loongson assembly {
+	#if !defined(__mips_loongson_vector_rev)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {
Index: gcc/testsuite/gcc.target/mips/loongson-simd.c
===================================================================
--- /dev/null	2008-06-08 10:32:14.544096500 +0100
+++ gcc/testsuite/gcc.target/mips/loongson-simd.c	2008-06-10 10:28:33.000000000 +0100
@@ -0,0 +1,1963 @@
+/* Test cases for ST Microelectronics Loongson-2E/2F SIMD intrinsics.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target mips_loongson } */
+
+#include "loongson.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+#include <limits.h>
+
+typedef union { int32x2_t v; int32_t a[2]; } int32x2_encap_t;
+typedef union { int16x4_t v; int16_t a[4]; } int16x4_encap_t;
+typedef union { int8x8_t v; int8_t a[8]; } int8x8_encap_t;
+typedef union { uint32x2_t v; uint32_t a[2]; } uint32x2_encap_t;
+typedef union { uint16x4_t v; uint16_t a[4]; } uint16x4_encap_t;
+typedef union { uint8x8_t v; uint8_t a[8]; } uint8x8_encap_t;
+
+#define UINT16x4_MAX USHRT_MAX
+#define UINT8x8_MAX UCHAR_MAX
+#define INT8x8_MAX SCHAR_MAX
+#define INT16x4_MAX SHRT_MAX
+#define INT32x2_MAX INT_MAX
+
+static void test_packsswh (void)
+{
+  int32x2_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = INT16x4_MAX - 2;
+  s.a[1] = INT16x4_MAX - 1;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX + 1;
+  r.v = packsswh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 2);
+  assert (r.a[1] == INT16x4_MAX - 1);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_packsshb (void)
+{
+  int16x4_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = INT8x8_MAX - 6;
+  s.a[1] = INT8x8_MAX - 5;
+  s.a[2] = INT8x8_MAX - 4;
+  s.a[3] = INT8x8_MAX - 3;
+  t.a[0] = INT8x8_MAX - 2;
+  t.a[1] = INT8x8_MAX - 1;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX + 1;
+  r.v = packsshb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_packushb (void)
+{
+  uint16x4_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = UINT8x8_MAX - 6;
+  s.a[1] = UINT8x8_MAX - 5;
+  s.a[2] = UINT8x8_MAX - 4;
+  s.a[3] = UINT8x8_MAX - 3;
+  t.a[0] = UINT8x8_MAX - 2;
+  t.a[1] = UINT8x8_MAX - 1;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX + 1;
+  r.v = packushb (s.v, t.v);
+  assert (r.a[0] == UINT8x8_MAX - 6);
+  assert (r.a[1] == UINT8x8_MAX - 5);
+  assert (r.a[2] == UINT8x8_MAX - 4);
+  assert (r.a[3] == UINT8x8_MAX - 3);
+  assert (r.a[4] == UINT8x8_MAX - 2);
+  assert (r.a[5] == UINT8x8_MAX - 1);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_paddw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  r.v = paddw_u (s.v, t.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 6);
+}
+
+static void test_paddw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  r.v = paddw_s (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_paddh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  r.v = paddh_u (s.v, t.v);
+  assert (r.a[0] == 6);
+  assert (r.a[1] == 8);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 12);
+}
+
+static void test_paddh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = paddh_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+}
+
+static void test_paddb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 5;
+  s.a[5] = 6;
+  s.a[6] = 7;
+  s.a[7] = 8;
+  t.a[0] = 9;
+  t.a[1] = 10;
+  t.a[2] = 11;
+  t.a[3] = 12;
+  t.a[4] = 13;
+  t.a[5] = 14;
+  t.a[6] = 15;
+  t.a[7] = 16;
+  r.v = paddb_u (s.v, t.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 12);
+  assert (r.a[2] == 14);
+  assert (r.a[3] == 16);
+  assert (r.a[4] == 18);
+  assert (r.a[5] == 20);
+  assert (r.a[6] == 22);
+  assert (r.a[7] == 24);
+}
+
+static void test_paddb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = paddb_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+  assert (r.a[4] == -45);
+  assert (r.a[5] == -54);
+  assert (r.a[6] == -63);
+  assert (r.a[7] == -72);
+}
+
+static void test_paddd_u (void)
+{
+  uint64_t d = 123456;
+  uint64_t e = 789012;
+  uint64_t r;
+  r = paddd_u (d, e);
+  assert (r == 912468);
+}
+
+static void test_paddd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = paddd_s (d, e);
+  assert (r == -665556);
+}
+
+static void test_paddsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX;
+  t.a[2] = INT16x4_MAX;
+  t.a[3] = INT16x4_MAX;
+  r.v = paddsh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_paddsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = INT8x8_MAX;
+  t.a[1] = INT8x8_MAX;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX;
+  t.a[4] = INT8x8_MAX;
+  t.a[5] = INT8x8_MAX;
+  t.a[6] = INT8x8_MAX;
+  t.a[7] = INT8x8_MAX;
+  r.v = paddsb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_paddush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  t.a[0] = UINT16x4_MAX;
+  t.a[1] = UINT16x4_MAX;
+  t.a[2] = UINT16x4_MAX;
+  t.a[3] = UINT16x4_MAX;
+  r.v = paddush (s.v, t.v);
+  assert (r.a[0] == UINT16x4_MAX);
+  assert (r.a[1] == UINT16x4_MAX);
+  assert (r.a[2] == UINT16x4_MAX);
+  assert (r.a[3] == UINT16x4_MAX);
+}
+
+static void test_paddusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  s.a[4] = 0;
+  s.a[5] = 1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = UINT8x8_MAX;
+  t.a[1] = UINT8x8_MAX;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX;
+  t.a[4] = UINT8x8_MAX;
+  t.a[5] = UINT8x8_MAX;
+  t.a[6] = UINT8x8_MAX;
+  t.a[7] = UINT8x8_MAX;
+  r.v = paddusb (s.v, t.v);
+  assert (r.a[0] == UINT8x8_MAX);
+  assert (r.a[1] == UINT8x8_MAX);
+  assert (r.a[2] == UINT8x8_MAX);
+  assert (r.a[3] == UINT8x8_MAX);
+  assert (r.a[4] == UINT8x8_MAX);
+  assert (r.a[5] == UINT8x8_MAX);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_pandn_ud (void)
+{
+  uint64_t d1 = 0x0000ffff0000ffffull;
+  uint64_t d2 = 0x0000ffff0000ffffull;
+  uint64_t r;
+  r = pandn_ud (d1, d2);
+  assert (r == 0);
+}
+
+static void test_pandn_sd (void)
+{
+  int64_t d1 = (int64_t) 0x0000000000000000ull;
+  int64_t d2 = (int64_t) 0xfffffffffffffffeull;
+  int64_t r;
+  r = pandn_sd (d1, d2);
+  assert (r == -2);
+}
+
+static void test_pandn_uw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0x00000000;
+  t.a[1] = 0xffffffff;
+  r.v = pandn_uw (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pandn_sw (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0xffffffff;
+  t.a[1] = 0xfffffffe;
+  r.v = pandn_sw (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+}
+
+static void test_pandn_uh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0x0000;
+  t.a[1] = 0xffff;
+  t.a[2] = 0x0000;
+  t.a[3] = 0xffff;
+  r.v = pandn_uh (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pandn_sh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0xffff;
+  t.a[1] = 0xfffe;
+  t.a[2] = 0xffff;
+  t.a[3] = 0xfffe;
+  r.v = pandn_sh (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+}
+
+static void test_pandn_ub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0x00;
+  t.a[1] = 0xff;
+  t.a[2] = 0x00;
+  t.a[3] = 0xff;
+  t.a[4] = 0x00;
+  t.a[5] = 0xff;
+  t.a[6] = 0x00;
+  t.a[7] = 0xff;
+  r.v = pandn_ub (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pandn_sb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0xff;
+  t.a[1] = 0xfe;
+  t.a[2] = 0xff;
+  t.a[3] = 0xfe;
+  t.a[4] = 0xff;
+  t.a[5] = 0xfe;
+  t.a[6] = 0xff;
+  t.a[7] = 0xfe;
+  r.v = pandn_sb (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -2);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -2);
+}
+
+static void test_pavgh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  r.v = pavgh (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+}
+
+static void test_pavgb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 1;
+  s.a[5] = 2;
+  s.a[6] = 3;
+  s.a[7] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = pavgb (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+  assert (r.a[4] == 3);
+  assert (r.a[5] == 4);
+  assert (r.a[6] == 5);
+  assert (r.a[7] == 6);
+}
+
+static void test_pcmpeqw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  r.v = pcmpeqw_u (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpeqh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  r.v = pcmpeqh_u (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpeqb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 42;
+  s.a[5] = 43;
+  s.a[6] = 42;
+  s.a[7] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  t.a[4] = 43;
+  t.a[5] = 43;
+  t.a[6] = 43;
+  t.a[7] = 43;
+  r.v = pcmpeqb_u (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpeqw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  r.v = pcmpeqw_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+}
+
+static void test_pcmpeqh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  r.v = pcmpeqh_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpeqb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = -42;
+  s.a[5] = -42;
+  s.a[6] = -42;
+  s.a[7] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = -42;
+  t.a[6] = 42;
+  t.a[7] = -42;
+  r.v = pcmpeqb_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -1);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -1);
+}
+
+static void test_pcmpgtw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 42;
+  r.v = pcmpgtw_u (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpgth_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 40;
+  t.a[1] = 41;
+  t.a[2] = 43;
+  t.a[3] = 42;
+  r.v = pcmpgth_u (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0x0000);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpgtb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 44;
+  s.a[5] = 45;
+  s.a[6] = 46;
+  s.a[7] = 47;
+  t.a[0] = 48;
+  t.a[1] = 47;
+  t.a[2] = 46;
+  t.a[3] = 45;
+  t.a[4] = 44;
+  t.a[5] = 43;
+  t.a[6] = 42;
+  t.a[7] = 41;
+  r.v = pcmpgtb_u (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0x00);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0x00);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0xff);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpgtw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = -42;
+  t.a[0] = -42;
+  t.a[1] = -42;
+  r.v = pcmpgtw_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 0);
+}
+
+static void test_pcmpgth_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = 43;
+  t.a[2] = 44;
+  t.a[3] = -43;
+  r.v = pcmpgth_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpgtb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = 42;
+  s.a[5] = 42;
+  s.a[6] = 42;
+  s.a[7] = 42;
+  t.a[0] = -45;
+  t.a[1] = -44;
+  t.a[2] = -43;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = 43;
+  t.a[6] = 41;
+  t.a[7] = 40;
+  r.v = pcmpgtb_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == -1);
+  assert (r.a[7] == -1);
+}
+
+static void test_pextrh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  r.v = pextrh_u (s.v, 1);
+  assert (r.a[0] == 41);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pextrh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -40;
+  s.a[1] = -41;
+  s.a[2] = -42;
+  s.a[3] = -43;
+  r.v = pextrh_s (s.v, 2);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pinsrh_0123_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  r.v = pinsrh_0_u (t.v, s.v);
+  r.v = pinsrh_1_u (r.v, s.v);
+  r.v = pinsrh_2_u (r.v, s.v);
+  r.v = pinsrh_3_u (r.v, s.v);
+  assert (r.a[0] == 42);
+  assert (r.a[1] == 42);
+  assert (r.a[2] == 42);
+  assert (r.a[3] == 42);
+}
+
+static void test_pinsrh_0123_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  r.v = pinsrh_0_s (t.v, s.v);
+  r.v = pinsrh_1_s (r.v, s.v);
+  r.v = pinsrh_2_s (r.v, s.v);
+  r.v = pinsrh_3_s (r.v, s.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == -42);
+  assert (r.a[2] == -42);
+  assert (r.a[3] == -42);
+}
+
+static void test_pmaddhw (void)
+{
+  int16x4_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -5;
+  s.a[1] = -4;
+  s.a[2] = -3;
+  s.a[3] = -2;
+  t.a[0] = 10;
+  t.a[1] = 11;
+  t.a[2] = 12;
+  t.a[3] = 13;
+  r.v = pmaddhw (s.v, t.v);
+  assert (r.a[0] == (-5*10 + -4*11));
+  assert (r.a[1] == (-3*12 + -2*13));
+}
+
+static void test_pmaxsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  r.v = pmaxsh (s.v, t.v);
+  assert (r.a[0] == 20);
+  assert (r.a[1] == 40);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 50);
+}
+
+static void test_pmaxub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pmaxub (s.v, t.v);
+  assert (r.a[0] == 80);
+  assert (r.a[1] == 70);
+  assert (r.a[2] == 60);
+  assert (r.a[3] == 50);
+  assert (r.a[4] == 50);
+  assert (r.a[5] == 60);
+  assert (r.a[6] == 70);
+  assert (r.a[7] == 80);
+}
+
+static void test_pminsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  r.v = pminsh (s.v, t.v);
+  assert (r.a[0] == -20);
+  assert (r.a[1] == -40);
+  assert (r.a[2] == -10);
+  assert (r.a[3] == -50);
+}
+
+static void test_pminub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pminub (s.v, t.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 20);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 40);
+  assert (r.a[4] == 40);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 20);
+  assert (r.a[7] == 10);
+}
+
+static void test_pmovmskb_u (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_encap_t r;
+  s.a[0] = 0xf0;
+  s.a[1] = 0x40;
+  s.a[2] = 0xf0;
+  s.a[3] = 0x40;
+  s.a[4] = 0xf0;
+  s.a[5] = 0x40;
+  s.a[6] = 0xf0;
+  s.a[7] = 0x40;
+  r.v = pmovmskb_u (s.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmovmskb_s (void)
+{
+  int8x8_encap_t s;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 1;
+  s.a[2] = -1;
+  s.a[3] = 1;
+  s.a[4] = -1;
+  s.a[5] = 1;
+  s.a[6] = -1;
+  s.a[7] = 1;
+  r.v = pmovmskb_s (s.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmulhuh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0xff00;
+  s.a[1] = 0xff00;
+  s.a[2] = 0xff00;
+  s.a[3] = 0xff00;
+  t.a[0] = 16;
+  t.a[1] = 16;
+  t.a[2] = 16;
+  t.a[3] = 16;
+  r.v = pmulhuh (s.v, t.v);
+  assert (r.a[0] == 0x000f);
+  assert (r.a[1] == 0x000f);
+  assert (r.a[2] == 0x000f);
+  assert (r.a[3] == 0x000f);
+}
+
+static void test_pmulhh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  r.v = pmulhh (s.v, t.v);
+  assert (r.a[0] == -16);
+  assert (r.a[1] == -16);
+  assert (r.a[2] == -16);
+  assert (r.a[3] == -16);
+}
+
+static void test_pmullh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  r.v = pmullh (s.v, t.v);
+  assert (r.a[0] == 4096);
+  assert (r.a[1] == 4096);
+  assert (r.a[2] == 4096);
+  assert (r.a[3] == 4096);
+}
+
+static void test_pmuluw (void)
+{
+  uint32x2_encap_t s, t;
+  uint64_t r;
+  s.a[0] = 0xdeadbeef;
+  s.a[1] = 0;
+  t.a[0] = 0x0f00baaa;
+  t.a[1] = 0;
+  r = pmuluw (s.v, t.v);
+  assert (r == 0xd0cd08e1d1a70b6ull);
+}
+
+static void test_pasubub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pasubub (s.v, t.v);
+  assert (r.a[0] == 70);
+  assert (r.a[1] == 50);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 10);
+  assert (r.a[4] == 10);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 50);
+  assert (r.a[7] == 70);
+}
+
+static void test_biadd (void)
+{
+  uint8x8_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  r.v = biadd (s.v);
+  assert (r.a[0] == 360);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psadbh (void)
+{
+  uint8x8_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = psadbh (s.v, t.v);
+  assert (r.a[0] == 0x0140);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pshufh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r.v = pshufh_u (r.v, s.v, 0xe5);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_pshufh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 2;
+  s.a[2] = -3;
+  s.a[3] = 4;
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r.v = pshufh_s (r.v, s.v, 0xe5);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+}
+
+static void test_psllh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0xffff;
+  s.a[2] = 0xffff;
+  s.a[3] = 0xffff;
+  r.v = psllh_u (s.v, 1);
+  assert (r.a[0] == 0xfffe);
+  assert (r.a[1] == 0xfffe);
+  assert (r.a[2] == 0xfffe);
+  assert (r.a[3] == 0xfffe);
+}
+
+static void test_psllw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0xffffffff;
+  r.v = psllw_u (s.v, 2);
+  assert (r.a[0] == 0xfffffffc);
+  assert (r.a[1] == 0xfffffffc);
+}
+
+static void test_psllh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  r.v = psllh_s (s.v, 1);
+  assert (r.a[0] == -2);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == -2);
+  assert (r.a[3] == -2);
+}
+
+static void test_psllw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  r.v = psllw_s (s.v, 2);
+  assert (r.a[0] == -4);
+  assert (r.a[1] == -4);
+}
+
+static void test_psrah_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  r.v = psrah_u (s.v, 1);
+  assert (r.a[0] == 0xfff7);
+  assert (r.a[1] == 0xfff7);
+  assert (r.a[2] == 0xfff7);
+  assert (r.a[3] == 0xfff7);
+}
+
+static void test_psraw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  r.v = psraw_u (s.v, 1);
+  assert (r.a[0] == 0xfffffff7);
+  assert (r.a[1] == 0xfffffff7);
+}
+
+static void test_psrah_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s.a[2] = -2;
+  s.a[3] = -2;
+  r.v = psrah_s (s.v, 1);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == -1);
+}
+
+static void test_psraw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  r.v = psraw_s (s.v, 1);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+}
+
+static void test_psrlh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  r.v = psrlh_u (s.v, 1);
+  assert (r.a[0] == 0x7ff7);
+  assert (r.a[1] == 0x7ff7);
+  assert (r.a[2] == 0x7ff7);
+  assert (r.a[3] == 0x7ff7);
+}
+
+static void test_psrlw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  r.v = psrlw_u (s.v, 1);
+  assert (r.a[0] == 0x7ffffff7);
+  assert (r.a[1] == 0x7ffffff7);
+}
+
+static void test_psrlh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  r.v = psrlh_s (s.v, 1);
+  assert (r.a[0] == INT16x4_MAX);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psrlw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  r.v = psrlw_s (s.v, 1);
+  assert (r.a[0] == INT32x2_MAX);
+  assert (r.a[1] == INT32x2_MAX);
+}
+
+static void test_psubw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 3;
+  s.a[1] = 4;
+  t.a[0] = 2;
+  t.a[1] = 1;
+  r.v = psubw_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = -4;
+  r.v = psubw_s (s.v, t.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 5;
+  s.a[1] = 6;
+  s.a[2] = 7;
+  s.a[3] = 8;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = psubh_u (s.v, t.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 4);
+  assert (r.a[3] == 4);
+}
+
+static void test_psubh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = psubh_s (s.v, t.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+}
+
+static void test_psubb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 11;
+  s.a[2] = 12;
+  s.a[3] = 13;
+  s.a[4] = 14;
+  s.a[5] = 15;
+  s.a[6] = 16;
+  s.a[7] = 17;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = psubb_u (s.v, t.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 9);
+  assert (r.a[2] == 9);
+  assert (r.a[3] == 9);
+  assert (r.a[4] == 9);
+  assert (r.a[5] == 9);
+  assert (r.a[6] == 9);
+  assert (r.a[7] == 9);
+}
+
+static void test_psubb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = psubb_s (s.v, t.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+  assert (r.a[4] == -55);
+  assert (r.a[5] == -66);
+  assert (r.a[6] == -77);
+  assert (r.a[7] == -88);
+}
+
+static void test_psubd_u (void)
+{
+  uint64_t d = 789012;
+  uint64_t e = 123456;
+  uint64_t r;
+  r = psubd_u (d, e);
+  assert (r == 665556);
+}
+
+static void test_psubd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = psubd_s (d, e);
+  assert (r == 912468);
+}
+
+static void test_psubsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = -INT16x4_MAX;
+  t.a[1] = -INT16x4_MAX;
+  t.a[2] = -INT16x4_MAX;
+  t.a[3] = -INT16x4_MAX;
+  r.v = psubsh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psubsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = -INT8x8_MAX;
+  t.a[1] = -INT8x8_MAX;
+  t.a[2] = -INT8x8_MAX;
+  t.a[3] = -INT8x8_MAX;
+  t.a[4] = -INT8x8_MAX;
+  t.a[5] = -INT8x8_MAX;
+  t.a[6] = -INT8x8_MAX;
+  t.a[7] = -INT8x8_MAX;
+  r.v = psubsb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_psubush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  r.v = psubush (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psubusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  s.a[4] = 4;
+  s.a[5] = 5;
+  s.a[6] = 6;
+  s.a[7] = 7;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  t.a[4] = 5;
+  t.a[5] = 5;
+  t.a[6] = 7;
+  t.a[7] = 7;
+  r.v = psubusb (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_punpckhbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpckhbh_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == -11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == -13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == -15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpckhbh_u (s.v, t.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == 11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == 13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == 15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  r.v = punpckhhw_s (s.v, t.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == -6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  r.v = punpckhhw_u (s.v, t.v);
+  assert (r.a[0] == 5);
+  assert (r.a[1] == 6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = -4;
+  r.v = punpckhwd_s (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == -4);
+}
+
+static void test_punpckhwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  r.v = punpckhwd_u (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+}
+
+static void test_punpcklbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpcklbh_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == -5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == -7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpcklbh_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == 5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == 7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  r.v = punpcklhw_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  r.v = punpcklhw_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  r.v = punpcklwd_s (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == -2);
+}
+
+static void test_punpcklwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  r.v = punpcklwd_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+}
+
+int main (void)
+{
+  test_packsswh ();
+  test_packsshb ();
+  test_packushb ();
+  test_paddw_u ();
+  test_paddw_s ();
+  test_paddh_u ();
+  test_paddh_s ();
+  test_paddb_u ();
+  test_paddb_s ();
+  test_paddd_u ();
+  test_paddd_s ();
+  test_paddsh ();
+  test_paddsb ();
+  test_paddush ();
+  test_paddusb ();
+  test_pandn_ud ();
+  test_pandn_sd ();
+  test_pandn_uw ();
+  test_pandn_sw ();
+  test_pandn_uh ();
+  test_pandn_sh ();
+  test_pandn_ub ();
+  test_pandn_sb ();
+  test_pavgh ();
+  test_pavgb ();
+  test_pcmpeqw_u ();
+  test_pcmpeqh_u ();
+  test_pcmpeqb_u ();
+  test_pcmpeqw_s ();
+  test_pcmpeqh_s ();
+  test_pcmpeqb_s ();
+  test_pcmpgtw_u ();
+  test_pcmpgth_u ();
+  test_pcmpgtb_u ();
+  test_pcmpgtw_s ();
+  test_pcmpgth_s ();
+  test_pcmpgtb_s ();
+  test_pextrh_u ();
+  test_pextrh_s ();
+  test_pinsrh_0123_u ();
+  test_pinsrh_0123_s ();
+  test_pmaddhw ();
+  test_pmaxsh ();
+  test_pmaxub ();
+  test_pminsh ();
+  test_pminub ();
+  test_pmovmskb_u ();
+  test_pmovmskb_s ();
+  test_pmulhuh ();
+  test_pmulhh ();
+  test_pmullh ();
+  test_pmuluw ();
+  test_pasubub ();
+  test_biadd ();
+  test_psadbh ();
+  test_pshufh_u ();
+  test_pshufh_s ();
+  test_psllh_u ();
+  test_psllw_u ();
+  test_psllh_s ();
+  test_psllw_s ();
+  test_psrah_u ();
+  test_psraw_u ();
+  test_psrah_s ();
+  test_psraw_s ();
+  test_psrlh_u ();
+  test_psrlw_u ();
+  test_psrlh_s ();
+  test_psrlw_s ();
+  test_psubw_u ();
+  test_psubw_s ();
+  test_psubh_u ();
+  test_psubh_s ();
+  test_psubb_u ();
+  test_psubb_s ();
+  test_psubd_u ();
+  test_psubd_s ();
+  test_psubsh ();
+  test_psubsb ();
+  test_psubush ();
+  test_psubusb ();
+  test_punpckhbh_s ();
+  test_punpckhbh_u ();
+  test_punpckhhw_s ();
+  test_punpckhhw_u ();
+  test_punpckhwd_s ();
+  test_punpckhwd_u ();
+  test_punpcklbh_s ();
+  test_punpcklbh_u ();
+  test_punpcklhw_s ();
+  test_punpcklhw_u ();
+  test_punpcklwd_s ();
+  test_punpcklwd_u ();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-10 10:29               ` Richard Sandiford
@ 2008-06-11 10:09                 ` Maxim Kuvyrkov
  2008-06-11 10:23                   ` Richard Sandiford
  2008-06-11 20:34                 ` Maxim Kuvyrkov
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-11 10:09 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:

...

>> Any further comments?
> 
> A couple, I'm afraid ;)

Richard, you've been very helpful with this work.  Thank you.

...

> I also changed the built-in function descriptions to match the style
> used elsewhere.
> 
> Finally, I adjusted the patch so that it applies on top of the
> built-in-table patch I sent yesterday in the [3/5] thread.

Are you sure you posted the built-in-table patch?  I couldn't find it 
either in the gcc-patches@ archive or in my mailbox.  The hunks below do 
not apply to current trunk, presumably because of this.


> @@ -10192,6 +10211,7 @@ AVAIL_NON_MIPS16 (dsp, TARGET_DSP)
>  AVAIL_NON_MIPS16 (dspr2, TARGET_DSPR2)
>  AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
>  AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
> +AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_VECTORS)
>  
>  /* Construct a mips_builtin_description from the given arguments.
>  
> @@ -10288,6 +10308,25 @@ #define BPOSGE_BUILTIN(VALUE, AVAIL)				
>    MIPS_BUILTIN (bposge, f, "bposge" #VALUE,				\
>  		MIPS_BUILTIN_BPOSGE ## VALUE, MIPS_SI_FTYPE_VOID, AVAIL)
>  
> +/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<FN_NAME>
> +   for instruction CODE_FOR_loongson_<INSN>.  FUNCTION_TYPE is a
> +   builtin_description field.  */
> +#define LOONGSON_BUILTIN_ALIAS(INSN, FN_NAME, FUNCTION_TYPE)		\
> +  { CODE_FOR_loongson_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
> +    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, mips_builtin_avail_loongson }
> +
> +/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<INSN>
> +   for instruction CODE_FOR_loongson_<INSN>.  FUNCTION_TYPE is a
> +   builtin_description field.  */
> +#define LOONGSON_BUILTIN(INSN, FUNCTION_TYPE)				\
> +  LOONGSON_BUILTIN_ALIAS (INSN, INSN, FUNCTION_TYPE)
> +
> +/* Like LOONGSON_BUILTIN, but add _<SUFFIX> to the end of the function name.
> +   We use functions of this form when the same insn can be usefully applied
> +   to more than one datatype.  */
> +#define LOONGSON_BUILTIN_SUFFIX(INSN, SUFFIX, FUNCTION_TYPE)		\
> +  LOONGSON_BUILTIN_ALIAS (INSN, INSN ## _ ## SUFFIX, FUNCTION_TYPE)
> +
>  #define CODE_FOR_mips_sqrt_ps CODE_FOR_sqrtv2sf2
>  #define CODE_FOR_mips_addq_ph CODE_FOR_addv2hi3
>  #define CODE_FOR_mips_addu_qb CODE_FOR_addv4qi3
> @@ -10295,6 +10334,37 @@ #define CODE_FOR_mips_subq_ph CODE_FOR_s
>  #define CODE_FOR_mips_subu_qb CODE_FOR_subv4qi3
>  #define CODE_FOR_mips_mul_ph CODE_FOR_mulv2hi3
>  
> +#define CODE_FOR_loongson_packsswh CODE_FOR_vec_pack_ssat_v2si
> +#define CODE_FOR_loongson_packsshb CODE_FOR_vec_pack_ssat_v4hi
> +#define CODE_FOR_loongson_packushb CODE_FOR_vec_pack_usat_v4hi
> +#define CODE_FOR_loongson_paddw CODE_FOR_addv2si3
> +#define CODE_FOR_loongson_paddh CODE_FOR_addv4hi3
> +#define CODE_FOR_loongson_paddb CODE_FOR_addv8qi3
> +#define CODE_FOR_loongson_paddsh CODE_FOR_ssaddv4hi3
> +#define CODE_FOR_loongson_paddsb CODE_FOR_ssaddv8qi3
> +#define CODE_FOR_loongson_paddush CODE_FOR_usaddv4hi3
> +#define CODE_FOR_loongson_paddusb CODE_FOR_usaddv8qi3
> +#define CODE_FOR_loongson_pmaxsh CODE_FOR_smaxv4hi3
> +#define CODE_FOR_loongson_pmaxub CODE_FOR_umaxv8qi3
> +#define CODE_FOR_loongson_pminsh CODE_FOR_sminv4hi3
> +#define CODE_FOR_loongson_pminub CODE_FOR_uminv8qi3
> +#define CODE_FOR_loongson_pmulhuh CODE_FOR_umulv4hi3_highpart
> +#define CODE_FOR_loongson_pmulhh CODE_FOR_smulv4hi3_highpart
> +#define CODE_FOR_loongson_biadd CODE_FOR_reduc_uplus_v8qi
> +#define CODE_FOR_loongson_psubw CODE_FOR_subv2si3
> +#define CODE_FOR_loongson_psubh CODE_FOR_subv4hi3
> +#define CODE_FOR_loongson_psubb CODE_FOR_subv8qi3
> +#define CODE_FOR_loongson_psubsh CODE_FOR_sssubv4hi3
> +#define CODE_FOR_loongson_psubsb CODE_FOR_sssubv8qi3
> +#define CODE_FOR_loongson_psubush CODE_FOR_ussubv4hi3
> +#define CODE_FOR_loongson_psubusb CODE_FOR_ussubv8qi3
> +#define CODE_FOR_loongson_punpckhbh CODE_FOR_vec_interleave_highv8qi
> +#define CODE_FOR_loongson_punpckhhw CODE_FOR_vec_interleave_highv4hi
> +#define CODE_FOR_loongson_punpckhwd CODE_FOR_vec_interleave_highv2si
> +#define CODE_FOR_loongson_punpcklbh CODE_FOR_vec_interleave_lowv8qi
> +#define CODE_FOR_loongson_punpcklhw CODE_FOR_vec_interleave_lowv4hi
> +#define CODE_FOR_loongson_punpcklwd CODE_FOR_vec_interleave_lowv2si
> +
>  static const struct mips_builtin_description mips_builtins[] = {
>    DIRECT_BUILTIN (pll_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
>    DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
> @@ -10471,7 +10541,108 @@ static const struct mips_builtin_descrip
>    DIRECT_BUILTIN (dpaqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
>    DIRECT_BUILTIN (dpaqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
>    DIRECT_BUILTIN (dpsqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
> -  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32)
> +  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
> +
> +  /* Builtin functions for ST Microelectronics Loongson-2E/2F cores.  */
> +  LOONGSON_BUILTIN (packsswh, MIPS_V4HI_FTYPE_V2SI_V2SI),
> +  LOONGSON_BUILTIN (packsshb, MIPS_V8QI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (packushb, MIPS_UV8QI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (paddw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
> +  LOONGSON_BUILTIN_SUFFIX (paddh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (paddb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (paddw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
> +  LOONGSON_BUILTIN_SUFFIX (paddh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (paddb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN_SUFFIX (paddd, u, MIPS_UDI_FTYPE_UDI_UDI),
> +  LOONGSON_BUILTIN_SUFFIX (paddd, s, MIPS_DI_FTYPE_DI_DI),
> +  LOONGSON_BUILTIN (paddsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (paddsb, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN (paddush, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN (paddusb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_ALIAS (pandn_d, pandn_ud, MIPS_UDI_FTYPE_UDI_UDI),
> +  LOONGSON_BUILTIN_ALIAS (pandn_w, pandn_uw, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
> +  LOONGSON_BUILTIN_ALIAS (pandn_h, pandn_uh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_ALIAS (pandn_b, pandn_ub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_ALIAS (pandn_d, pandn_sd, MIPS_DI_FTYPE_DI_DI),
> +  LOONGSON_BUILTIN_ALIAS (pandn_w, pandn_sw, MIPS_V2SI_FTYPE_V2SI_V2SI),
> +  LOONGSON_BUILTIN_ALIAS (pandn_h, pandn_sh, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_ALIAS (pandn_b, pandn_sb, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN (pavgh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN (pavgb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpeqw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpeqh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpeqb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpeqw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpeqh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpeqb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpgtw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpgth, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpgtb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpgtw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpgth, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pcmpgtb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN_SUFFIX (pextrh, u, MIPS_UV4HI_FTYPE_UV4HI_USI),
> +  LOONGSON_BUILTIN_SUFFIX (pextrh, s, MIPS_V4HI_FTYPE_V4HI_USI),
> +  LOONGSON_BUILTIN_SUFFIX (pinsrh_0, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pinsrh_1, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pinsrh_2, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pinsrh_3, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pinsrh_0, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pinsrh_1, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pinsrh_2, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (pinsrh_3, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (pmaddhw, MIPS_V2SI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (pmaxsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (pmaxub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN (pminsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (pminub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (pmovmskb, u, MIPS_UV8QI_FTYPE_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (pmovmskb, s, MIPS_V8QI_FTYPE_V8QI),
> +  LOONGSON_BUILTIN (pmulhuh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN (pmulhh, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (pmullh, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (pmuluw, MIPS_UDI_FTYPE_UV2SI_UV2SI),
> +  LOONGSON_BUILTIN (pasubub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN (biadd, MIPS_UV4HI_FTYPE_UV8QI),
> +  LOONGSON_BUILTIN (psadbh, MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (pshufh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (pshufh, s, MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psllh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psllh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psllw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psllw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psrah, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psrah, s, MIPS_V4HI_FTYPE_V4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psraw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psraw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psrlh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psrlh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psrlw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psrlw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (psubw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
> +  LOONGSON_BUILTIN_SUFFIX (psubh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (psubb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (psubw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
> +  LOONGSON_BUILTIN_SUFFIX (psubh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (psubb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN_SUFFIX (psubd, u, MIPS_UDI_FTYPE_UDI_UDI),
> +  LOONGSON_BUILTIN_SUFFIX (psubd, s, MIPS_DI_FTYPE_DI_DI),
> +  LOONGSON_BUILTIN (psubsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN (psubsb, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN (psubush, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN (psubusb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (punpckhbh, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (punpckhhw, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (punpckhwd, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
> +  LOONGSON_BUILTIN_SUFFIX (punpckhbh, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN_SUFFIX (punpckhhw, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (punpckhwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
> +  LOONGSON_BUILTIN_SUFFIX (punpcklbh, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
> +  LOONGSON_BUILTIN_SUFFIX (punpcklhw, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
> +  LOONGSON_BUILTIN_SUFFIX (punpcklwd, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
> +  LOONGSON_BUILTIN_SUFFIX (punpcklbh, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
> +  LOONGSON_BUILTIN_SUFFIX (punpcklhw, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
> +  LOONGSON_BUILTIN_SUFFIX (punpcklwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI)
>  };
>  
>  /* MODE is a vector mode whose elements have type TYPE.  Return the type

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-11 10:09                 ` Maxim Kuvyrkov
@ 2008-06-11 10:23                   ` Richard Sandiford
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-11 10:23 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>> I also changed the built-in function descriptions to match the style
>> used elsewhere.
>> 
>> Finally, I adjusted the patch so that it applies on top of the
>> built-in-table patch I sent yesterday in the [3/5] thread.
>
> Are you sure you posted the built-in-table patch?  I couldn't find it 
> either in the gcc-patches@ archive or in my mailbox.  The hunks below do 
> not apply to current trunk, presumably because of this.

I meant the thing I posted in the [3/5] thread:

    http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00497.html

That patch allows you to use tighter conditions for the paired-single
built-in functions, avoiding the problem you described earlier in
that thread.
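
(Concretely, and only as a sketch reusing names that already appear in
the hunks you quoted -- the "paired_single" avail flag and the
AVAIL_NON_MIPS16 macro; the exact condition below is my guess -- it lets
the paired-single entries be guarded with something like:

        AVAIL_NON_MIPS16 (paired_single, TARGET_PAIRED_SINGLE_FLOAT)

so those built-ins are only registered when the target actually
supports paired-single operations.)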

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-10 10:29               ` Richard Sandiford
  2008-06-11 10:09                 ` Maxim Kuvyrkov
@ 2008-06-11 20:34                 ` Maxim Kuvyrkov
  2008-06-12  8:16                   ` Richard Sandiford
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-11 20:34 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 2076 bytes --]

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:

...

>> +;; Addition of doubleword integers stored in FP registers.
>> +;; Overflow is treated by wraparound.
>> +(define_insn "paddd"
>> +  [(set (match_operand:DI 0 "register_operand" "=f")
>> +        (plus:DI (match_operand:DI 1 "register_operand" "f")
>> +		 (match_operand:DI 2 "register_operand" "f")))]
>> +  "HAVE_LOONGSON_VECTOR_MODES"
>> +  "paddd\t%0,%1,%2")
> 
> I don't think this pattern or psubd will ever be used for 64-bit ABIs;
> they'll be trumped by the normal addition and subtraction patterns.
> Thus paddd (...) and psub (...) will actually expand to "daddu" and
> "dsubu", moving to and from FPRs if necessary.

I think this is a separate problem from what this patch tries to solve. 
The main objective of this patch is to add intrinsics that can be used 
to write hand-optimized code.

I tried to write proper support for all the different cases of DImode 
add/sub today, and the result just doesn't look good enough.

> Also, you _might_ end up
> using these patterns for 64-bit addition on 32-bit ABIs, even though the
> cost of moving to and from FPRs is higher than the usual add/shift
> sequence.

Right.  Although my recollection from the last time I looked at the 
splitter is that if it sees something it can split, it splits it.

For the time being, I propose to replace 'plus' in paddd and 'minus' in 
psubd with unspecs.  This way the intrinsics will still work, and 
there'll be no clashes with add<mode>3 and sub<mode>3.

I attached two patches that implement different approaches.  My opinion 
is that we shouldn't complicate things and, therefore, should accept the 
simple solution.

> Finally, I adjusted the patch so that it applies on top of the
> built-in-table patch I sent yesterday in the [3/5] thread.
> 
> I've not done anything about the paddd/psubd thing; I'll leave
> that to you ;)  Otherwise, does this look OK to you?

Yes, the patch looks great, thanks.  I'm testing it right now together 
with the simple fix for the paddd/psubd issue.

--
Maxim

[-- Attachment #2: fsf-ls2ef-2.1-vector.patch --]
[-- Type: text/plain, Size: 2192 bytes --]

Index: gcc/config/mips/loongson.md
===================================================================
--- gcc/config/mips/loongson.md	(revision 512)
+++ gcc/config/mips/loongson.md	(working copy)
@@ -131,10 +131,15 @@
 
 ;; Addition of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
+;; We use 'unspec' instead of 'plus' here to avoid a clash with
+;; mips.md::add<mode>3.  If 'plus' were used, such an instruction
+;; would be recognized as adddi3 and reload would make it use
+;; GPRs instead of FPRs.  The same applies to loongson_psubd.
 (define_insn "loongson_paddd"
   [(set (match_operand:DI 0 "register_operand" "=f")
-        (plus:DI (match_operand:DI 1 "register_operand" "f")
-		 (match_operand:DI 2 "register_operand" "f")))]
+        (unspec:DI [(match_operand:DI 1 "register_operand" "f")
+		    (match_operand:DI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PADDD))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
   "paddd\t%0,%1,%2")
 
@@ -387,10 +392,15 @@
 
 ;; Subtraction of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
+;; We use 'unspec' instead of 'minus' here to avoid a clash with
+;; mips.md::sub<mode>3.  If 'minus' were used, such an instruction
+;; would be recognized as subdi3 and reload would make it use
+;; GPRs instead of FPRs.  The same applies to loongson_paddd.
 (define_insn "loongson_psubd"
   [(set (match_operand:DI 0 "register_operand" "=f")
-        (minus:DI (match_operand:DI 1 "register_operand" "f")
-		  (match_operand:DI 2 "register_operand" "f")))]
+        (unspec:DI [(match_operand:DI 1 "register_operand" "f")
+		    (match_operand:DI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSUBD))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
   "psubd\t%0,%1,%2")
 
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(revision 512)
+++ gcc/config/mips/mips.md	(working copy)
@@ -237,6 +237,8 @@
    (UNSPEC_LOONGSON_PSHUFH	517)
    (UNSPEC_LOONGSON_PUNPCKH	518)
    (UNSPEC_LOONGSON_PUNPCKL	519)
+   (UNSPEC_LOONGSON_PADDD       520)
+   (UNSPEC_LOONGSON_PSUBD       521)
   ]
 )
 

[-- Attachment #3: fsf-ls2ef-2.2-vector.patch --]
[-- Type: text/plain, Size: 4469 bytes --]

Index: gcc/config/mips/loongson.md
===================================================================
--- gcc/config/mips/loongson.md	(revision 512)
+++ gcc/config/mips/loongson.md	(working copy)
@@ -131,12 +131,11 @@
 
 ;; Addition of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
-(define_insn "loongson_paddd"
-  [(set (match_operand:DI 0 "register_operand" "=f")
-        (plus:DI (match_operand:DI 1 "register_operand" "f")
-		 (match_operand:DI 2 "register_operand" "f")))]
-  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "paddd\t%0,%1,%2")
+(define_expand "loongson_paddd"
+  [(set (match_operand:DI 0)
+        (plus:DI (match_operand:DI 1)
+		 (match_operand:DI 2)))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")
 
 ;; Addition, treating overflow by signed saturation.
 (define_insn "ssadd<mode>3"
@@ -387,12 +386,11 @@
 
 ;; Subtraction of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
-(define_insn "loongson_psubd"
-  [(set (match_operand:DI 0 "register_operand" "=f")
-        (minus:DI (match_operand:DI 1 "register_operand" "f")
-		  (match_operand:DI 2 "register_operand" "f")))]
-  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "psubd\t%0,%1,%2")
+(define_expand "loongson_psubd"
+  [(set (match_operand:DI 0)
+        (minus:DI (match_operand:DI 1)
+		  (match_operand:DI 2)))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")
 
 ;; Subtraction, treating overflow by signed saturation.
 (define_insn "sssub<mode>3"
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(revision 512)
+++ gcc/config/mips/mips.md	(working copy)
@@ -865,13 +865,41 @@
   [(set (match_operand:GPR 0 "register_operand" "=d,d")
 	(plus:GPR (match_operand:GPR 1 "register_operand" "d,d")
 		  (match_operand:GPR 2 "arith_operand" "d,Q")))]
-  "!TARGET_MIPS16"
+  "!TARGET_MIPS16
+   && !(TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS)"
   "@
     <d>addu\t%0,%1,%2
     <d>addiu\t%0,%1,%2"
   [(set_attr "type" "arith")
    (set_attr "mode" "<MODE>")])
 
+(define_mode_attr ls2_add1 [(SI "d,d,d") (DI "d,d,f")])
+(define_mode_attr ls2_add2 [(SI "d,Q,d") (DI "d,Q,f")])
+ 
+(define_insn "*add<mode>3_loongson"
+  [(set (match_operand:GPR 0 "register_operand" "=<ls2_add1>")
+	(plus:GPR (match_operand:GPR 1 "register_operand" "<ls2_add1>")
+		  (match_operand:GPR 2 "arith_operand" "<ls2_add2>")))]
+  "!TARGET_MIPS16
+   && TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "@
+    <d>addu\t%0,%1,%2
+    <d>addiu\t%0,%1,%2
+    paddd\t%0,%1,%2"
+  [(set_attr "type" "arith,arith,fadd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*adddi3_loongson_32bit"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+	(plus:DI (match_operand:DI 1 "register_operand" "f")
+		 (match_operand:DI 2 "register_operand" "f")))]
+  "!TARGET_MIPS16
+   && TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS
+   && !TARGET_64BIT"
+  "paddd\t%0,%1,%2"
+  [(set_attr "type" "fadd")
+   (set_attr "mode" "DI")])
+
 (define_insn "*add<mode>3_mips16"
   [(set (match_operand:GPR 0 "register_operand" "=ks,d,d,d,d")
 	(plus:GPR (match_operand:GPR 1 "register_operand" "ks,ks,0,d,d")
@@ -1061,11 +1089,34 @@
   [(set (match_operand:GPR 0 "register_operand" "=d")
 	(minus:GPR (match_operand:GPR 1 "register_operand" "d")
 		   (match_operand:GPR 2 "register_operand" "d")))]
-  ""
+  "!(TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS)"
   "<d>subu\t%0,%1,%2"
   [(set_attr "type" "arith")
    (set_attr "mode" "<MODE>")])
 
+(define_mode_attr ls2_sub1 [(SI "d,d") (DI "d,f")])
+
+(define_insn "sub<mode>3_loongson"
+  [(set (match_operand:GPR 0 "register_operand" "=<ls2_sub1>")
+	(minus:GPR (match_operand:GPR 1 "register_operand" "<ls2_sub1>")
+		   (match_operand:GPR 2 "register_operand" "<ls2_sub1>")))]
+  "(TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS)"
+  "@
+    <d>subu\t%0,%1,%2
+    psubd\t%0,%1,%2"
+  [(set_attr "type" "arith,fadd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "subdi3_loongson_32bit"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+	(minus:DI (match_operand:DI 1 "register_operand" "f")
+		  (match_operand:DI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS
+   && !TARGET_64BIT"
+  "psubd\t%0,%1,%2"
+  [(set_attr "type" "fadd")
+   (set_attr "mode" "DI")])
+
 (define_insn "*subsi3_extended"
   [(set (match_operand:DI 0 "register_operand" "=d")
 	(sign_extend:DI

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-11 20:34                 ` Maxim Kuvyrkov
@ 2008-06-12  8:16                   ` Richard Sandiford
  2008-06-12  8:45                     ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-06-12  8:16 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>> +;; Addition of doubleword integers stored in FP registers.
>>> +;; Overflow is treated by wraparound.
>>> +(define_insn "paddd"
>>> +  [(set (match_operand:DI 0 "register_operand" "=f")
>>> +        (plus:DI (match_operand:DI 1 "register_operand" "f")
>>> +		 (match_operand:DI 2 "register_operand" "f")))]
>>> +  "HAVE_LOONGSON_VECTOR_MODES"
>>> +  "paddd\t%0,%1,%2")
>> 
>> I don't think this pattern or psubd will ever be used for 64-bit ABIs;
>> they'll be trumped by the normal addition and subtraction patterns.
>> Thus paddd (...) and psub (...) will actually expand to "daddu" and
>> "dsubu", moving to and from FPRs if necessary.
>
> I think this is a separate problem from what this patch tries to solve.

Sorry, but I disagree.  As you say...

> The main objective of this patch is to add intrinsics that can be used 
> to write hand-optimized code.

...so users of paddd() would reasonably expect it to generate paddd
instead of daddu, because presumably the code has been hand-optimised
that way.  So they'd expect to see:

        paddd   $f0,$f1,$f2

rather than (on 64-bit targets)

        dmfc1   $5,$f1
        dmfc1   $6,$f2
        daddu   $4,$5,$6
        dmtc1   $4,$f0

Although since your patch fixes this, perhaps you thought I was saying
something else.  For avoidance of doubt, there really was no subtext.
I was simply saying that paddd() ought to expand to paddd rather than daddu.

>> Also, you _might_ end up
>> using these patterns for 64-bit addition on 32-bit ABIs, even though the
>> cost of moving to and from FPRs is higher than the usual add/shift
>> sequence.
>
> Right.  Although, my recollection from the last time I looked at the
> splitter is that if the splitter sees something it can split, it splits it.

There is no splitter for 64-bit addition though.  We decompose it
during expand.  The problem is that we then attach a REG_EQUAL
note that contains the original 64-bit addition.  Passes which
use that note could in principle recognise this instruction.
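(As an illustrative aside -- the fragment below is a hand-written sketch,
not taken from an actual dump: after expand, the last insn of the
decomposed DImode addition ends up carrying a note roughly like

    (expr_list:REG_EQUAL (plus:DI (reg:DI 198)
                                  (reg:DI 199))
        (nil))

so any pass that consults REG_EQUAL notes could, at least in principle,
use that DImode plus to re-recognise the addition as a single
adddi3-style insn, which is where the clash with the Loongson pattern
would come from.)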

> For the time being, I propose to replace 'plus' in paddd and 'minus' in 
> psubd with unspecs.  This way the intrinsics will still work, and 
> there'll be no clashes with add<mode>3 and sub<mode>3.

Yeah, FWIW, I suspected this is the way we'd go.  I certainly can't
think of anything better.

We lose the ability to constant-fold, but:

  (a) I imagine that's less likely to happen in hand-optimised code.
  (b) We don't constant-fold most built-in functions.  There's no
      intrinsic reason why this one should be any different.
  (c) It would be simple to add a "simplify_unspec" target hook at some
      point, if anyone was so inclined.
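
(Purely as a sketch of what (c) could look like -- no such hook exists
today, and the name, signature and behaviour below are invented here for
illustration only:

    /* Hypothetical target hook: fold a Loongson paddd/psubd unspec when
       both operands are constants, mirroring the wraparound semantics of
       the underlying instruction.  Returns NULL_RTX to decline.  */
    static rtx
    mips_simplify_unspec (rtx x)
    {
      if (GET_CODE (x) == UNSPEC
          && (XINT (x, 1) == UNSPEC_LOONGSON_PADDD
              || XINT (x, 1) == UNSPEC_LOONGSON_PSUBD)
          && GET_CODE (XVECEXP (x, 0, 0)) == CONST_INT
          && GET_CODE (XVECEXP (x, 0, 1)) == CONST_INT)
        return simplify_binary_operation
          (XINT (x, 1) == UNSPEC_LOONGSON_PADDD ? PLUS : MINUS,
           GET_MODE (x), XVECEXP (x, 0, 0), XVECEXP (x, 0, 1));

      return NULL_RTX;
    }

simplify-rtx would then call the hook for unspecs it doesn't otherwise
understand and use the folded rtx when one is returned.)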

> +;; We use 'unspec' instead of 'plus' here to avoid clash with
> +;; mips.md::add<mode>3.  If 'plus' was used, then such instruction
> +;; would be recognized as adddi3 and reload would make it use
> +;; GPRs instead of FPRs.  Same is valid for loongson_paddd.
                                               ^^^^^^^^^^^^^^
>  (define_insn "loongson_paddd"

Not sure I understand the last sentence.  Did you mean loongson_psubd?
Either way, I think it would be clearer with this sentence removed.
Here...

>  ;; Subtraction of doubleword integers stored in FP registers.
>  ;; Overflow is treated by wraparound.
> +;; We use 'unspec' instead of 'minus' here to avoid clash with
> +;; mips.md::sub<mode>3.  If 'minus' was used, then such instruction
> +;; would be recognized as subdi3 and reload would make it use
> +;; GPRs instead of FPRs.  Same is valid for loongson_paddd.

...you can just say something like:

;; See loongson_paddd for the reason we use 'unspec' rather than 'minus' here.

>> Finally, I adjusted the patch so that it applies on top of the
>> built-in-table patch I sent yesterday in the [3/5] thread.
>> 
>> I've not done anything about the paddd/psubd thing; I'll leave
>> that to you ;)  Otherwise, does this look OK to you?
>
> Yes, the patch looks great, thanks.  I'm testing it right now together 
> with the simple fix for paddd/psubd issue.

Thanks.  I'll test the builtin patch separately on mipsisa64-elfoabi,
just to be sure.  Once that's in, the combination of:

    http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00554.html

and the paddd patch (adjusted as above) is OK to install,
if testing succeeds.

Thanks for being so patient about all this.

Richard

PS. Just as a reminder, I think it would be a good idea to have
    scan-assembler tests too, as a separate submission.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-12  8:16                   ` Richard Sandiford
@ 2008-06-12  8:45                     ` Maxim Kuvyrkov
  2008-06-12  9:03                       ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-12  8:45 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> Richard Sandiford wrote:
>>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>>> +;; Addition of doubleword integers stored in FP registers.
>>>> +;; Overflow is treated by wraparound.
>>>> +(define_insn "paddd"
>>>> +  [(set (match_operand:DI 0 "register_operand" "=f")
>>>> +        (plus:DI (match_operand:DI 1 "register_operand" "f")
>>>> +		 (match_operand:DI 2 "register_operand" "f")))]
>>>> +  "HAVE_LOONGSON_VECTOR_MODES"
>>>> +  "paddd\t%0,%1,%2")
>>> I don't think this pattern or psubd will ever be used for 64-bit ABIs;
>>> they'll be trumped by the normal addition and subtraction patterns.
>>> Thus paddd (...) and psub (...) will actually expand to "daddu" and
>>> "dsubu", moving to and from FPRs if necessary.
>> I think this is a separate problem from what this patch tries to solve.
> 
> Sorry, but I disagree.  As you say...
> 
>> The main objective of this patch is to add intrinsics that can be used 
>> to write hand-optimized code.
> 
> ...so users of paddd() would reasonably expect it to generate paddd
> instead of daddu, because presumably the code has been hand-optimised
> that way.  So they'd expect to see:
> 
>         paddd   $f0,$f1,$f2
> 
> rather than (on 64-bit targets)
> 
>         dmfc1   $5,$f1
>         dmfc1   $6,$f2
>         daddu   $4,$5,$6
>         dmtc1   $4,$f0

What I meant was that the patch should provide proper support for all
intrinsics it defines, so that, when the paddd intrinsic is used, the
compiler outputs a paddd instruction.  This patch should not, however, go
far beyond its primary objective and try to add optimizations that are
not trivial to accomplish.  Sorry if I wasn't clear.

...

>>> Also, you _might_ end up
>>> using these patterns for 64-bit addition on 32-bit ABIs, even though the
>>> cost of moving to and from FPRs is higher than the usual add/shift
>>> sequence.
>> Right.  Although, my recollection from the last time I looked at the
>> splitter is that if the splitter sees something it can split, it splits it.
> 
> There is no splitter for 64-bit addition though.  We decompose it
> during expand.  The problem is that we then attach a REG_EQUAL
> note that contains the original 64-bit addition.  Passes which
> use that note could in principle recognise this instruction.
> 
>> For the time being, I propose to replace 'plus' in paddd and 'minus' in 
>> psubd with unspecs.  This way the intrinsics will still work, and 
>> there'll be no clashes with add<mode>3 and sub<mode>3.
> 
> Yeah, FWIW, I suspected this is the way we'd go.  I certainly can't
> think of anything better.
> 
> We lose the ability to constant-fold, but:
> 
>   (a) I imagine that's less likely to happen in hand-optimised code.
>   (b) We don't constant-fold most built-in functions.  There's no
>       intrinsic reason why this one should be any different.
>   (c) It would be simple to add a "simplify_unspec" target hook at some
>       point, if anyone was so inclined.

Unspecs it is.

> 
>> +;; We use 'unspec' instead of 'plus' here to avoid clash with
>> +;; mips.md::add<mode>3.  If 'plus' was used, then such instruction
>> +;; would be recognized as adddi3 and reload would make it use
>> +;; GPRs instead of FPRs.  Same is valid for loongson_paddd.
>                                                ^^^^^^^^^^^^^^
>>  (define_insn "loongson_paddd"
> 
> Not sure I understand the last sentence.  Did you mean loongson_psubd?
> Either way, I think it would be clearer with this sentence removed.
> Here...

OK.

> 
>>  ;; Subtraction of doubleword integers stored in FP registers.
>>  ;; Overflow is treated by wraparound.
>> +;; We use 'unspec' instead of 'minus' here to avoid clash with
>> +;; mips.md::sub<mode>3.  If 'minus' was used, then such instruction
>> +;; would be recognized as subdi3 and reload would make it use
>> +;; GPRs instead of FPRs.  Same is valid for loongson_paddd.
> 
> ...you can just say something like:
> 
> ;; See loongson_paddd for the reason we use 'unspec' rather than 'minus' here.

OK.

> 
>>> Finally, I adjusted the patch so that it applies on top of the
>>> built-in-table patch I sent yesterday in the [3/5] thread.
>>>
>>> I've not done anything about the paddd/psubd thing; I'll leave
>>> that to you ;)  Otherwise, does this look OK to you?
>> Yes, the patch looks great, thanks.  I'm testing it right now together 
>> with the simple fix for paddd/psubd issue.
> 
> Thanks.  I'll test the builtin patch separately on mipsisa64-elfoabi,
> just to be sure.  Once that's in, the combination of:
> 
>     http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00554.html
> 
> and the paddd patch (adjusted as above) is OK to install,
> if testing succeeds.

I'll post the final patch once the tests are finished.  I'm also testing 
your built-in-table patch and will get back to you with the results.

...

> PS. Just as a reminder, I think it would be a good idea to have
>     scan-assembler tests too, as a separate submission.

This is on my list.


Thanks,

Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-12  8:45                     ` Maxim Kuvyrkov
@ 2008-06-12  9:03                       ` Richard Sandiford
  2008-06-13 18:36                         ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-06-12  9:03 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>> Richard Sandiford wrote:
>>>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>>>> +;; Addition of doubleword integers stored in FP registers.
>>>>> +;; Overflow is treated by wraparound.
>>>>> +(define_insn "paddd"
>>>>> +  [(set (match_operand:DI 0 "register_operand" "=f")
>>>>> +        (plus:DI (match_operand:DI 1 "register_operand" "f")
>>>>> +		 (match_operand:DI 2 "register_operand" "f")))]
>>>>> +  "HAVE_LOONGSON_VECTOR_MODES"
>>>>> +  "paddd\t%0,%1,%2")
>>>> I don't think this pattern or psubd will ever be used for 64-bit ABIs;
>>>> they'll be trumped by the normal addition and subtraction patterns.
>>>> Thus paddd (...) and psub (...) will actually expand to "daddu" and
>>>> "dsubu", moving to and from FPRs if necessary.
>>> I think this is a separate problem from what this patch tries to solve.
>> 
>> Sorry, but I disagree.  As you say...
>> 
>>> The main objective of this patch is to add intrinsics that can be used 
>>> to write hand-optimized code.
>> 
>> ...so users of paddd() would reasonably expect it to generate paddd
>> instead of daddu, because presumably the code has been hand-optimised
>> that way.  So they'd expect to see:
>> 
>>         paddd   $f0,$f1,$f2
>> 
>> rather than (on 64-bit targets)
>> 
>>         dmfc1   $5,$f1
>>         dmfc1   $6,$f2
>>         daddu   $4,$5,$6
>>         dmtc1   $4,$f0
>
> What I meant was that the patch should provide proper support for all
> intrinsics it defines, so that, when the paddd intrinsic is used, the
> compiler outputs a paddd instruction.  This patch should not, however, go
> far beyond its primary objective and try to add optimizations that are
> not trivial to accomplish.  Sorry if I wasn't clear.

Well, I wasn't talking about adding optimisations, so the reply was
a bit confusing ;)  Looks like we're really in violent agreement though.

>> PS. Just as a reminder, I think it would be a good idea to have
>>     scan-assembler tests too, as a separate submission.
>
> This is on my list.

Thanks.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-05-25 11:57   ` Richard Sandiford
@ 2008-06-12 13:45     ` Maxim Kuvyrkov
  2008-06-12 17:49       ` Richard Sandiford
       [not found]       ` <48515794.7050007@codesourcery.com>
  0 siblings, 2 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-12 13:45 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 4332 bytes --]

Richard Sandiford wrote:
> [Silly thing, but I tried applying these patches, and was alerted
> to trailing whitespace in:
> 
>     gcc/testsuite/gcc.target/mips/loongson-simd.c
>     gcc/config/mips/loongson.md
>     gcc/config/mips/loongson2ef.md
> 
> Please run them through your favourite trailing-whitespace remover
> before applying.]

Thanks for the detailed review.  I've attached an updated patch with all
the issues fixed.  I've also added a couple more things in it:

* Catch-all reservation in loongson2ef.md.  This is the last reservation
in the file; please check that you are happy with the comment ;)

* (define_attr "cpu"): I've renamed the values loongson2? to loongson_2?.
This corrects a typo in the [1/5] patch.


I've fixed all the formatting issues.  Thanks for pointing them out.

> 
> Anyway, this patch looks good, thanks.  A neat use of CPU querying ;)

Thanks.

...

>> @@ -301,8 +335,14 @@ struct machine_function GTY(()) {
>>    /* True if we have emitted an instruction to initialize
>>       mips16_gp_pseudo_rtx.  */
>>    bool initialized_mips16_gp_pseudo_p;
>> +
>> +  /* Data used when scheduling for Loongson 2E/2F.  */
>> +  struct sched_ls2_def _sched_ls2;
>>  };
>>  
>> +/* A convenient shortcut.  */
>> +#define sched_ls2 (cfun->machine->_sched_ls2)
>> +
> 
> Hmm, we really shouldn't use _foo names.
> 
> I don't see why this has to be in machine_function anyway.  It's
> pass-local, so why not just use a static sched_ls2 variable, like
> we do for other scheduling stuff?
> 
> (I assume putting this in machine_function keeps the fake insns
> around between sched1 and sched2, even though we create new insns
> during the initialisation phase of sched2.)

I've made it struct { ... } mips_ls2;.

> Actually, make that "ls2_sched" or "mips_ls2_sched", for consistency
> with other ISA- or processor-prefixed names.  Same with with the other
> similar names in the patch; sometimes you've used "sched_ls2_foo"
> and sometimes you've used "mips_ls2_foo".  "mips_ls2_foo" is fine.

The rationale here was that exported stuff (like hooks) has
mips_ls2_sched_* names while static variables and static functions are
named sched_ls2_*.  I've changed all the names to use mips_ls2.

> 
>> +/* Implement TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN hook.
>> +   Init data used in mips_dfa_post_advance_cycle.  */
>> +static void
>> +mips_init_dfa_post_cycle_insn (void)
> 
> Local style is to have a blank line before the function.
> Other instances.

Fixed.

...

> Please put the "if" body in a separate ls2_init_dfa_post_cycle_insn
> function, for consistency with other hooks.  (You did this for
> sched_ls2_dfa_post_advance_cycle, thanks.)

Fixed.

> 
>> @@ -9982,6 +10129,14 @@ mips_sched_init (FILE *file ATTRIBUTE_UN
>>    mips_macc_chains_last_hilo = 0;
>>    vr4130_last_insn = 0;
>>    mips_74k_agen_init (NULL_RTX);
>> +
>> +  if (TUNE_LOONGSON_2EF)
>> +    {
>> +      /* Branch instructions go to ALU1, therefore basic block is most likely
>> + 	 to start with round-robin counter pointed to ALU2.  */
>> +      sched_ls2.alu1_turn_p = false;
>> +      sched_ls2.falu1_turn_p = true;
>> +    }
> 
> As you can see from the context, we initialise other schedulers'
> information unconditionally.  That's amenable to change if you like,
> but whatever we do, let's be consistent.

The initialization is unconditional now.

> 
>> @@ -10022,6 +10204,21 @@ mips_variable_issue (FILE *file ATTRIBUT
>>        vr4130_last_insn = insn;
>>        if (TUNE_74K)
>>  	mips_74k_agen_init (insn);
>> +      else if (TUNE_LOONGSON_2EF)
>> +	{
>> +	  mips_ls2_variable_issue ();
>> +
>> +	  if (recog_memoized (insn) >= 0)
>> +	    {
>> +	      sched_ls2.cycle_has_multi_p |= (get_attr_type (insn)
>> +					      == TYPE_MULTI);
>> +
>> +	      /* Instructions of type 'multi' should all be split before
>> +		 second scheduling pass.  */
>> +	      gcc_assert (!sched_ls2.cycle_has_multi_p
>> +			  || !reload_completed);
>> +	    }
>> +	}
> 
> Please put all the Loongson bits in mips_ls2_variable_issue.
> 
> I agree that the assert is a good thing, but I think we should require
> it for all targets or none.  If we don't, it's too likely that someone
> working on another target will accidentally break Loongson.

I've put the Loongson bits into mips_ls2_variable_issue and made the assert
work for all architectures.


--
Maxim

[-- Attachment #2: fsf-ls2ef-4-sched.ChangeLog --]
[-- Type: text/plain, Size: 2099 bytes --]

2008-05-22  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/mips/loongson2ef.md: New file.
	* config/mips/mips.md (UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN)
	(UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN)
	(UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN)
	(UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN): New constants.
	(define_attr "cpu"): Rename loongson2e and loongson2f to loongson_2e
	and loongson_2f.
	(loongson2ef.md): New include.
	* config/mips/loongson.md (vec_pack_ssat_<mode>, vec_pack_usat_<mode>)
	(add<mode>3, paddd, ssadd<mode>3, usadd<mode>3)
	(loongson_and_not_<mode>, loongson_average_<mode>, loongson_eq_<mode>)
	(loongson_gt_<mode>, loongson_extract_halfword)
	(loongson_insert_halfword_0, loongson_insert_halfword_2)
	(loongson_insert_halfword_3, loongson_mult_add, smax<mode>3)
	(umax<mode>3, smin<mode>3, umin<mode>3, loongson_move_byte_mask)
	(umul<mode>3_highpart, smul<mode>3_highpart, loongson_smul_lowpart)
	(loongson_umul_word, loongson_pasubub, reduc_uplus_<mode>)
	(loongson_psadbh, loongson_pshufh, loongson_psll<mode>)
	(loongson_psra<mode>, loongson_psrl<mode>, sub<mode>3, psubd)
	(sssub<mode>3, ussub<mode>3, vec_interleave_high<mode>)
	(vec_interleave_low<mode>): Define type attribute.
	* config/mips/mips.c (mips_ls2): New static variable.
	(mips_issue_rate): Update to handle tuning for Loongson 2E/2F.
	(mips_ls2_init_dfa_post_cycle_insn, mips_init_dfa_post_cycle_insn)
	(mips_ls2_dfa_post_advance_cycle, mips_dfa_post_advance_cycle):
	Implement target scheduling hooks.
	(mips_multipass_dfa_lookahead): Update to handle tuning for
	Loongson 2E/2F.
	(mips_sched_init): Initialize data for Loongson scheduling.
	(mips_ls2_variable_issue): New static function.
	(mips_variable_issue): Update to handle tuning for Loongson 2E/2F.
	Add sanity check.
	(TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN)
	(TARGET_SCHED_DFA_POST_ADVANCE_CYCLE): Override target hooks.
	* config/mips/mips.h (TUNE_LOONGSON_2EF): New macro.
	(ISA_HAS_XFER_DELAY, ISA_HAS_FCMP_DELAY, ISA_HAS_HILO_INTERLOCKS):
	Handle ST Loongson 2E/2F cores.
	(CPU_UNITS_QUERY): Define macro to enable querying of DFA units.

[-- Attachment #3: fsf-ls2ef-4-sched.patch --]
[-- Type: text/plain, Size: 34016 bytes --]

--- gcc/config/mips/loongson.md	(/local/gcc-3)	(revision 523)
+++ gcc/config/mips/loongson.md	(/local/gcc-4)	(revision 523)
@@ -108,7 +108,8 @@
 	 (ss_truncate:<V_squash>
 	  (match_operand:VWH 2 "register_operand" "f"))))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "packss<V_squash_double_suffix>\t%0,%1,%2")
+  "packss<V_squash_double_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Pack with unsigned saturation.
 (define_insn "vec_pack_usat_<mode>"
@@ -119,7 +120,8 @@
 	 (us_truncate:<V_squash>
 	  (match_operand:VH 2 "register_operand" "f"))))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "packus<V_squash_double_suffix>\t%0,%1,%2")
+  "packus<V_squash_double_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Addition, treating overflow by wraparound.
 (define_insn "add<mode>3"
@@ -127,7 +129,8 @@
         (plus:VWHB (match_operand:VWHB 1 "register_operand" "f")
 		   (match_operand:VWHB 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "padd<V_suffix>\t%0,%1,%2")
+  "padd<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Addition of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
@@ -141,7 +144,8 @@
 		    (match_operand:DI 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PADDD))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "paddd\t%0,%1,%2")
+  "paddd\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Addition, treating overflow by signed saturation.
 (define_insn "ssadd<mode>3"
@@ -149,7 +153,8 @@
         (ss_plus:VHB (match_operand:VHB 1 "register_operand" "f")
 		     (match_operand:VHB 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "padds<V_suffix>\t%0,%1,%2")
+  "padds<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Addition, treating overflow by unsigned saturation.
 (define_insn "usadd<mode>3"
@@ -157,7 +162,8 @@
         (us_plus:VHB (match_operand:VHB 1 "register_operand" "f")
 		     (match_operand:VHB 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "paddus<V_suffix>\t%0,%1,%2")
+  "paddus<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Logical AND NOT.
 (define_insn "loongson_pandn_<V_suffix>"
@@ -166,7 +172,8 @@
 	 (not:VWHBDI (match_operand:VWHBDI 1 "register_operand" "f"))
 	 (match_operand:VWHBDI 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pandn\t%0,%1,%2")
+  "pandn\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Average.
 (define_insn "loongson_pavg<V_suffix>"
@@ -175,7 +182,8 @@
 		     (match_operand:VHB 2 "register_operand" "f")]
 		    UNSPEC_LOONGSON_PAVG))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pavg<V_suffix>\t%0,%1,%2")
+  "pavg<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Equality test.
 (define_insn "loongson_pcmpeq<V_suffix>"
@@ -184,7 +192,8 @@
 		      (match_operand:VWHB 2 "register_operand" "f")]
 		     UNSPEC_LOONGSON_PCMPEQ))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pcmpeq<V_suffix>\t%0,%1,%2")
+  "pcmpeq<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Greater-than test.
 (define_insn "loongson_pcmpgt<V_suffix>"
@@ -193,7 +202,8 @@
 		      (match_operand:VWHB 2 "register_operand" "f")]
 		     UNSPEC_LOONGSON_PCMPGT))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pcmpgt<V_suffix>\t%0,%1,%2")
+  "pcmpgt<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Extract halfword.
 (define_insn "loongson_pextr<V_suffix>"
@@ -202,7 +212,8 @@
  		    (match_operand:SI 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PEXTR))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pextr<V_suffix>\t%0,%1,%2")
+  "pextr<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Insert halfword.
 (define_insn "loongson_pinsr<V_suffix>_0"
@@ -211,7 +222,8 @@
 		    (match_operand:VH 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PINSR_0))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pinsr<V_suffix>_0\t%0,%1,%2")
+  "pinsr<V_suffix>_0\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_pinsr<V_suffix>_1"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -219,7 +231,8 @@
 		    (match_operand:VH 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PINSR_1))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pinsr<V_suffix>_1\t%0,%1,%2")
+  "pinsr<V_suffix>_1\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_pinsr<V_suffix>_2"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -227,7 +240,8 @@
 		    (match_operand:VH 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PINSR_2))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pinsr<V_suffix>_2\t%0,%1,%2")
+  "pinsr<V_suffix>_2\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
 
 (define_insn "loongson_pinsr<V_suffix>_3"
   [(set (match_operand:VH 0 "register_operand" "=f")
@@ -235,7 +249,8 @@
 		    (match_operand:VH 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PINSR_3))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pinsr<V_suffix>_3\t%0,%1,%2")
+  "pinsr<V_suffix>_3\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
 
 ;; Multiply and add packed integers.
 (define_insn "loongson_pmadd<V_stretch_half_suffix>"
@@ -244,7 +259,8 @@
 				  (match_operand:VH 2 "register_operand" "f")]
 				 UNSPEC_LOONGSON_PMADD))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmadd<V_stretch_half_suffix>\t%0,%1,%2")
+  "pmadd<V_stretch_half_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Maximum of signed halfwords.
 (define_insn "smax<mode>3"
@@ -252,7 +268,8 @@
         (smax:VH (match_operand:VH 1 "register_operand" "f")
 		 (match_operand:VH 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmaxs<V_suffix>\t%0,%1,%2")
+  "pmaxs<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Maximum of unsigned bytes.
 (define_insn "umax<mode>3"
@@ -260,7 +277,8 @@
         (umax:VB (match_operand:VB 1 "register_operand" "f")
 		 (match_operand:VB 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmaxu<V_suffix>\t%0,%1,%2")
+  "pmaxu<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Minimum of signed halfwords.
 (define_insn "smin<mode>3"
@@ -268,7 +286,8 @@
         (smin:VH (match_operand:VH 1 "register_operand" "f")
 		 (match_operand:VH 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmins<V_suffix>\t%0,%1,%2")
+  "pmins<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Minimum of unsigned bytes.
 (define_insn "umin<mode>3"
@@ -276,7 +295,8 @@
         (umin:VB (match_operand:VB 1 "register_operand" "f")
 		 (match_operand:VB 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pminu<V_suffix>\t%0,%1,%2")
+  "pminu<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Move byte mask.
 (define_insn "loongson_pmovmsk<V_suffix>"
@@ -284,7 +304,8 @@
         (unspec:VB [(match_operand:VB 1 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PMOVMSK))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmovmsk<V_suffix>\t%0,%1")
+  "pmovmsk<V_suffix>\t%0,%1"
+  [(set_attr "type" "fabs")])
 
 ;; Multiply unsigned integers and store high result.
 (define_insn "umul<mode>3_highpart"
@@ -293,7 +314,8 @@
 		    (match_operand:VH 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PMULHU))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmulhu<V_suffix>\t%0,%1,%2")
+  "pmulhu<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Multiply signed integers and store high result.
 (define_insn "smul<mode>3_highpart"
@@ -302,7 +324,8 @@
 		    (match_operand:VH 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PMULH))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmulh<V_suffix>\t%0,%1,%2")
+  "pmulh<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Multiply signed integers and store low result.
 (define_insn "loongson_pmull<V_suffix>"
@@ -311,7 +334,8 @@
 		    (match_operand:VH 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PMULL))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmull<V_suffix>\t%0,%1,%2")
+  "pmull<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Multiply unsigned word integers.
 (define_insn "loongson_pmulu<V_suffix>"
@@ -320,7 +344,8 @@
 		    (match_operand:VW 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PMULU))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pmulu<V_suffix>\t%0,%1,%2")
+  "pmulu<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Absolute difference.
 (define_insn "loongson_pasubub"
@@ -329,7 +354,8 @@
 		    (match_operand:VB 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PASUBUB))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pasubub\t%0,%1,%2")
+  "pasubub\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Sum of unsigned byte integers.
 (define_insn "reduc_uplus_<mode>"
@@ -337,7 +363,8 @@
         (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")]
 				 UNSPEC_LOONGSON_BIADD))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "biadd\t%0,%1")
+  "biadd\t%0,%1"
+  [(set_attr "type" "fabs")])
 
 ;; Sum of absolute differences.
 (define_insn "loongson_psadbh"
@@ -346,7 +373,8 @@
 				  (match_operand:VB 2 "register_operand" "f")]
 				 UNSPEC_LOONGSON_PSADBH))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pasubub\t%0,%1,%2;biadd\t%0,%0")
+  "pasubub\t%0,%1,%2;biadd\t%0,%0"
+  [(set_attr "type" "fadd")])
 
 ;; Shuffle halfwords.
 (define_insn "loongson_pshufh"
@@ -356,7 +384,8 @@
 		    (match_operand:SI 3 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PSHUFH))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pshufh\t%0,%2,%3")
+  "pshufh\t%0,%2,%3"
+  [(set_attr "type" "fmul")])
 
 ;; Shift left logical.
 (define_insn "loongson_psll<V_suffix>"
@@ -364,7 +393,8 @@
         (ashift:VWH (match_operand:VWH 1 "register_operand" "f")
 		    (match_operand:SI 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "psll<V_suffix>\t%0,%1,%2")
+  "psll<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
 
 ;; Shift right arithmetic.
 (define_insn "loongson_psra<V_suffix>"
@@ -372,7 +402,8 @@
         (ashiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
 		      (match_operand:SI 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "psra<V_suffix>\t%0,%1,%2")
+  "psra<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
 
 ;; Shift right logical.
 (define_insn "loongson_psrl<V_suffix>"
@@ -380,7 +411,8 @@
         (lshiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
 		      (match_operand:SI 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "psrl<V_suffix>\t%0,%1,%2")
+  "psrl<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
 
 ;; Subtraction, treating overflow by wraparound.
 (define_insn "sub<mode>3"
@@ -388,7 +420,8 @@
         (minus:VWHB (match_operand:VWHB 1 "register_operand" "f")
 		    (match_operand:VWHB 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "psub<V_suffix>\t%0,%1,%2")
+  "psub<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction of doubleword integers stored in FP registers.
 ;; Overflow is treated by wraparound.
@@ -400,7 +433,8 @@
 		    (match_operand:DI 2 "register_operand" "f")]
 		   UNSPEC_LOONGSON_PSUBD))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "psubd\t%0,%1,%2")
+  "psubd\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction, treating overflow by signed saturation.
 (define_insn "sssub<mode>3"
@@ -408,7 +442,8 @@
         (ss_minus:VHB (match_operand:VHB 1 "register_operand" "f")
 		      (match_operand:VHB 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "psubs<V_suffix>\t%0,%1,%2")
+  "psubs<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Subtraction, treating overflow by unsigned saturation.
 (define_insn "ussub<mode>3"
@@ -416,7 +451,8 @@
         (us_minus:VHB (match_operand:VHB 1 "register_operand" "f")
 		      (match_operand:VHB 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "psubus<V_suffix>\t%0,%1,%2")
+  "psubus<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
 
 ;; Unpack high data.
 (define_insn "vec_interleave_high<mode>"
@@ -425,7 +461,8 @@
 		      (match_operand:VWHB 2 "register_operand" "f")]
 		     UNSPEC_LOONGSON_PUNPCKH))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "punpckh<V_stretch_half_suffix>\t%0,%1,%2")
+  "punpckh<V_stretch_half_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
 
 ;; Unpack low data.
 (define_insn "vec_interleave_low<mode>"
@@ -434,4 +471,5 @@
 		      (match_operand:VWHB 2 "register_operand" "f")]
 		     UNSPEC_LOONGSON_PUNPCKL))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "punpckl<V_stretch_half_suffix>\t%0,%1,%2")
+  "punpckl<V_stretch_half_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
--- gcc/config/mips/loongson2ef.md	(/local/gcc-3)	(revision 523)
+++ gcc/config/mips/loongson2ef.md	(/local/gcc-4)	(revision 523)
@@ -0,0 +1,247 @@
+;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
+
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Automaton for integer instructions.
+(define_automaton "ls2_alu")
+
+;; ALU1 and ALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_alu1_core,ls2_alu2_core" "ls2_alu")
+
+;; Pseudo units to help modeling of ALU1/2 round-robin dispatch strategy.
+(define_cpu_unit "ls2_alu1_turn,ls2_alu2_turn" "ls2_alu")
+
+;; Pseudo units to enable/disable ls2_alu[12]_turn units.
+;; ls2_alu[12]_turn unit can be subscribed only after ls2_alu[12]_turn_enabled
+;; unit is subscribed.
+(define_cpu_unit "ls2_alu1_turn_enabled,ls2_alu2_turn_enabled" "ls2_alu")
+(presence_set "ls2_alu1_turn" "ls2_alu1_turn_enabled")
+(presence_set "ls2_alu2_turn" "ls2_alu2_turn_enabled")
+
+;; Reservations for ALU1 (ALU2) instructions.
+;; An instruction goes to ALU1 (ALU2) and causes the next ALU1/2
+;; instruction to be dispatched to ALU2 (ALU1).
+(define_reservation "ls2_alu1"
+  "(ls2_alu1_core+ls2_alu2_turn_enabled)|ls2_alu1_core")
+(define_reservation "ls2_alu2"
+  "(ls2_alu2_core+ls2_alu1_turn_enabled)|ls2_alu2_core")
+
+;; Reservation for ALU1/2 instructions.
+;; Instruction will go to ALU1 iff ls2_alu1_turn_enabled is subscribed and
+;; switch the turn to ALU2 by subscribing ls2_alu2_turn_enabled.
+;; Or to ALU2 otherwise.
+(define_reservation "ls2_alu"
+  "(ls2_alu1_core+ls2_alu1_turn+ls2_alu2_turn_enabled)
+   |(ls2_alu1_core+ls2_alu1_turn)
+   |(ls2_alu2_core+ls2_alu2_turn+ls2_alu1_turn_enabled)
+   |(ls2_alu2_core+ls2_alu2_turn)")
+
+;; Automaton for floating-point instructions.
+(define_automaton "ls2_falu")
+
+;; FALU1 and FALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_falu1_core,ls2_falu2_core" "ls2_falu")
+
+;; Pseudo units to help modeling of FALU1/2 round-robin dispatch strategy.
+(define_cpu_unit "ls2_falu1_turn,ls2_falu2_turn" "ls2_falu")
+
+;; Pseudo units to enable/disable ls2_falu[12]_turn units.
+;; ls2_falu[12]_turn unit can be subscribed only after
+;; ls2_falu[12]_turn_enabled unit is subscribed.
+(define_cpu_unit "ls2_falu1_turn_enabled,ls2_falu2_turn_enabled" "ls2_falu")
+(presence_set "ls2_falu1_turn" "ls2_falu1_turn_enabled")
+(presence_set "ls2_falu2_turn" "ls2_falu2_turn_enabled")
+
+;; Reservations for FALU1 (FALU2) instructions.
+;; An instruction goes to FALU1 (FALU2) and causes the next FALU1/2
+;; instruction to be dispatched to FALU2 (FALU1).
+(define_reservation "ls2_falu1"
+  "(ls2_falu1_core+ls2_falu2_turn_enabled)|ls2_falu1_core")
+(define_reservation "ls2_falu2"
+  "(ls2_falu2_core+ls2_falu1_turn_enabled)|ls2_falu2_core")
+
+;; Reservation for FALU1/2 instructions.
+;; Instruction will go to FALU1 iff ls2_falu1_turn_enabled is subscribed and
+;; switch the turn to FALU2 by subscribing ls2_falu2_turn_enabled.
+;; Or to FALU2 otherwise.
+(define_reservation "ls2_falu"
+  "(ls2_falu1+ls2_falu1_turn+ls2_falu2_turn_enabled)
+   |(ls2_falu1+ls2_falu1_turn)
+   |(ls2_falu2+ls2_falu2_turn+ls2_falu1_turn_enabled)
+   |(ls2_falu2+ls2_falu2_turn)")
+
+;; The following 4 instructions each subscribe one of
+;; ls2_[f]alu{1,2}_turn_enabled units according to this attribute.
+;; These instructions are used in mips.c: mips_ls2_dfa_post_advance_cycle.
+
+(define_attr "ls2_turn_type" "alu1,alu2,falu1,falu2,unknown"
+  (const_string "unknown"))
+
+;; Subscribe ls2_alu1_turn_enabled.
+(define_insn "ls2_alu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+  { gcc_unreachable (); }
+  [(set_attr "ls2_turn_type" "alu1")])
+
+(define_insn_reservation "ls2_alu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu1")
+  "ls2_alu1_turn_enabled")
+
+;; Subscribe ls2_alu2_turn_enabled.
+(define_insn "ls2_alu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+  { gcc_unreachable (); }
+  [(set_attr "ls2_turn_type" "alu2")])
+
+(define_insn_reservation "ls2_alu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu2")
+  "ls2_alu2_turn_enabled")
+
+;; Subscribe ls2_falu1_turn_enabled.
+(define_insn "ls2_falu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+  { gcc_unreachable (); }
+  [(set_attr "ls2_turn_type" "falu1")])
+
+(define_insn_reservation "ls2_falu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu1")
+  "ls2_falu1_turn_enabled")
+
+;; Subscribe ls2_falu2_turn_enabled.
+(define_insn "ls2_falu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+  { gcc_unreachable (); }
+  [(set_attr "ls2_turn_type" "falu2")])
+
+(define_insn_reservation "ls2_falu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu2")
+  "ls2_falu2_turn_enabled")
+
+;; Automaton for memory operations.
+(define_automaton "ls2_mem")
+
+;; Memory unit.
+(define_query_cpu_unit "ls2_mem" "ls2_mem")
+
+;; Reservation for integer instructions.
+(define_insn_reservation "ls2_alu" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "arith,condmove,const,logical,mfhilo,move,
+                        mthilo,nop,shift,signext,slt"))
+  "ls2_alu")
+
+;; Reservation for branch instructions.
+(define_insn_reservation "ls2_branch" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "branch,jump,call,trap"))
+  "ls2_alu1")
+
+;; Reservation for integer multiplication instructions.
+(define_insn_reservation "ls2_imult" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "imul,imul3"))
+  "ls2_alu2,ls2_alu2_core")
+
+;; Reservation for integer division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take 2-38 cycles.
+(define_insn_reservation "ls2_idiv" 20
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "idiv"))
+  "ls2_alu2,ls2_alu2_core*18")
+
+;; Reservation for memory load instructions.
+(define_insn_reservation "ls2_load" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "load,fpload,mfc,mtc"))
+  "ls2_mem")
+
+;; Reservation for memory store instructions.
+;; With stores we assume they don't alias with dependent loads.
+;; Therefore we set the latency to zero.
+(define_insn_reservation "ls2_store" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "store,fpstore"))
+  "ls2_mem")
+
+;; Reservation for floating-point instructions of latency 3.
+(define_insn_reservation "ls2_fp3" 3
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fabs,fneg,fcmp,fmove"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions of latency 5.
+(define_insn_reservation "ls2_fp5" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fcvt"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions that can go
+;; to either of FALU1/2 units.
+(define_insn_reservation "ls2_falu" 7
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fadd,fmul,fmadd"))
+  "ls2_falu")
+
+;; Reservation for floating-point division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take a variable amount
+;; of cycles:
+;; div.s takes 5-11 cycles
+;; div.d takes 5-18 cycles
+(define_insn_reservation "ls2_fdiv" 9
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fdiv"))
+  "ls2_falu2,ls2_falu2_core*7")
+
+;; Reservation for floating-point sqrt instructions.
+;; These instructions use the SRT algorithm and hence take a variable amount
+;; of cycles:
+;; sqrt.s takes 5-17 cycles
+;; sqrt.d takes 5-32 cycles
+(define_insn_reservation "ls2_fsqrt" 15
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fsqrt"))
+  "ls2_falu2,ls2_falu2_core*13")
+
+;; Two consecutive ALU instructions.
+(define_insn_reservation "ls2_multi" 4
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "multi"))
+  "(ls2_alu1,ls2_alu2_core)|(ls2_alu2,ls2_alu1_core)")
+
+(define_insn_reservation "ls2_ghost" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "ghost"))
+  "nothing")
+
+;; Reservation for everything else.  Normally, this reservation
+;; will only be used to handle cases like compiling
+;; for non-loongson CPU with -mtune=loongson2?.
+;;
+;; !!! It is not good to depend on the fact that the DFA will check
+;; reservations in the same order as they appear in the file, but it
+;; seems to work for the time being.  Anyway, we already use this DFA
+;; property heavily with generic.md.
+(define_insn_reservation "ls2_unknown" 1
+  (eq_attr "cpu" "loongson_2e,loongson_2f")
+  "ls2_alu1_core+ls2_alu2_core+ls2_falu1_core+ls2_falu2_core+ls2_mem")
--- gcc/config/mips/mips.md	(/local/gcc-3)	(revision 523)
+++ gcc/config/mips/mips.md	(/local/gcc-4)	(revision 523)
@@ -239,6 +239,12 @@
    (UNSPEC_LOONGSON_PUNPCKL	519)
    (UNSPEC_LOONGSON_PADDD	520)
    (UNSPEC_LOONGSON_PSUBD	521)
+
+   ;; Used in loongson2ef.md
+   (UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN   530)
+   (UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN   531)
+   (UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN  532)
+   (UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN  533)
   ]
 )
 
@@ -441,7 +447,7 @@
 ;; Attribute describing the processor.  This attribute must match exactly
 ;; with the processor_type enumeration in mips.h.
 (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson2e,loongson2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
   (const (symbol_ref "mips_tune")))
 
 ;; The type of hardware hazard associated with this instruction.
@@ -794,6 +800,7 @@
 (include "sb1.md")
 (include "sr71k.md")
 (include "xlr.md")
+(include "loongson2ef.md")
 (include "generic.md")
 \f
 ;;
--- gcc/config/mips/mips.c	(/local/gcc-3)	(revision 523)
+++ gcc/config/mips/mips.c	(/local/gcc-4)	(revision 523)
@@ -9778,6 +9778,41 @@ mips_store_data_bypass_p (rtx out_insn, 
   return !store_data_bypass_p (out_insn, in_insn);
 }
 \f
+
+/* Variables and flags used in scheduler hooks when tuning for
+   Loongson 2E/2F.  */
+static struct
+{
+  /* Variables to support Loongson 2E/2F round-robin [F]ALU1/2 dispatch
+     strategy.  */
+
+  /* If true, then next ALU1/2 instruction will go to ALU1.  */
+  bool alu1_turn_p;
+
+  /* If true, then next FALU1/2 instruction will go to FALU1.  */
+  bool falu1_turn_p;
+
+  /* Codes to query if [f]alu{1,2}_core units are subscribed or not.  */
+  int alu1_core_unit_code;
+  int alu2_core_unit_code;
+  int falu1_core_unit_code;
+  int falu2_core_unit_code;
+
+  /* True if current cycle has a multi instruction.
+     This flag is used in mips_ls2_dfa_post_advance_cycle.  */
+  bool cycle_has_multi_p;
+
+  /* Instructions to subscribe ls2_[f]alu{1,2}_turn_enabled units.
+     These are used in mips_ls2_dfa_post_advance_cycle to initialize
+     DFA state.
+     E.g., when alu1_turn_enabled_insn is issued it causes the next ALU1/2
+     instruction to go to ALU1.  */
+  rtx alu1_turn_enabled_insn;
+  rtx alu2_turn_enabled_insn;
+  rtx falu1_turn_enabled_insn;
+  rtx falu2_turn_enabled_insn;
+} mips_ls2;
+
 /* Implement TARGET_SCHED_ADJUST_COST.  We assume that anti and output
    dependencies have no cost, except on the 20Kc where output-dependence
    is treated like input-dependence.  */
@@ -9828,11 +9863,124 @@ mips_issue_rate (void)
 	 reach the theoretical max of 4.  */
       return 3;
 
+    case PROCESSOR_LOONGSON_2E:
+    case PROCESSOR_LOONGSON_2F:
+      return 4;
+
     default:
       return 1;
     }
 }
 
+/* Implement TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN hook for Loongson2.  */
+
+static void
+mips_ls2_init_dfa_post_cycle_insn (void)
+{
+  start_sequence ();
+  emit_insn (gen_ls2_alu1_turn_enabled_insn ());
+  mips_ls2.alu1_turn_enabled_insn = get_insns ();
+  end_sequence ();
+
+  start_sequence ();
+  emit_insn (gen_ls2_alu2_turn_enabled_insn ());
+  mips_ls2.alu2_turn_enabled_insn = get_insns ();
+  end_sequence ();
+
+  start_sequence ();
+  emit_insn (gen_ls2_falu1_turn_enabled_insn ());
+  mips_ls2.falu1_turn_enabled_insn = get_insns ();
+  end_sequence ();
+
+  start_sequence ();
+  emit_insn (gen_ls2_falu2_turn_enabled_insn ());
+  mips_ls2.falu2_turn_enabled_insn = get_insns ();
+  end_sequence ();
+
+  mips_ls2.alu1_core_unit_code = get_cpu_unit_code ("ls2_alu1_core");
+  mips_ls2.alu2_core_unit_code = get_cpu_unit_code ("ls2_alu2_core");
+  mips_ls2.falu1_core_unit_code = get_cpu_unit_code ("ls2_falu1_core");
+  mips_ls2.falu2_core_unit_code = get_cpu_unit_code ("ls2_falu2_core");
+}
+
+/* Implement TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN hook.
+   Init data used in mips_dfa_post_advance_cycle.  */
+
+static void
+mips_init_dfa_post_cycle_insn (void)
+{
+  if (TUNE_LOONGSON_2EF)
+    mips_ls2_init_dfa_post_cycle_insn ();
+}
+
+/* Initialize STATE when scheduling for Loongson 2E/2F.
+   Support round-robin dispatch scheme by enabling only one of
+   ALU1/ALU2 and one of FALU1/FALU2 units for ALU1/2 and FALU1/2 instructions
+   respectively.  */
+
+static void
+mips_ls2_dfa_post_advance_cycle (state_t state)
+{
+  if (cpu_unit_reservation_p (state, mips_ls2.alu1_core_unit_code))
+    {
+      /* Though there are no non-pipelined ALU1 insns,
+	 we can get an instruction of type 'multi' before reload.  */
+      gcc_assert (mips_ls2.cycle_has_multi_p);
+      mips_ls2.alu1_turn_p = false;
+    }
+
+  mips_ls2.cycle_has_multi_p = false;
+
+  if (cpu_unit_reservation_p (state, mips_ls2.alu2_core_unit_code))
+    /* We have a non-pipelined alu instruction in the core,
+       adjust round-robin counter.  */
+    mips_ls2.alu1_turn_p = true;
+
+  if (mips_ls2.alu1_turn_p)
+    {
+      if (state_transition (state, mips_ls2.alu1_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (state_transition (state, mips_ls2.alu2_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+
+  if (cpu_unit_reservation_p (state, mips_ls2.falu1_core_unit_code))
+    {
+      /* There are no non-pipelined FALU1 insns.  */
+      gcc_unreachable ();
+      mips_ls2.falu1_turn_p = false;
+    }
+
+  if (cpu_unit_reservation_p (state, mips_ls2.falu2_core_unit_code))
+    /* We have a non-pipelined falu instruction in the core,
+       adjust round-robin counter.  */
+    mips_ls2.falu1_turn_p = true;
+
+  if (mips_ls2.falu1_turn_p)
+    {
+      if (state_transition (state, mips_ls2.falu1_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (state_transition (state, mips_ls2.falu2_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+}
+
+/* Implement TARGET_SCHED_DFA_POST_ADVANCE_CYCLE.
+   This hook is being called at the start of each cycle.  */
+
+static void
+mips_dfa_post_advance_cycle (void)
+{
+  if (TUNE_LOONGSON_2EF)
+    mips_ls2_dfa_post_advance_cycle (curr_state);
+}
+
 /* Implement TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD.  This should
    be as wide as the scheduling freedom in the DFA.  */
 
@@ -9843,6 +9991,9 @@ mips_multipass_dfa_lookahead (void)
   if (TUNE_SB1)
     return 4;
 
+  if (TUNE_LOONGSON_2EF)
+    return 4;
+
   return 0;
 }
 \f
@@ -10103,6 +10254,12 @@ mips_sched_init (FILE *file ATTRIBUTE_UN
   mips_macc_chains_last_hilo = 0;
   vr4130_last_insn = 0;
   mips_74k_agen_init (NULL_RTX);
+
+  /* When scheduling for Loongson2, branch instructions go to ALU1,
+     therefore a basic block is most likely to start with the round-robin
+     counter pointing to ALU2.  */
+  mips_ls2.alu1_turn_p = false;
+  mips_ls2.falu1_turn_p = true;
 }
 
 /* Implement TARGET_SCHED_REORDER and TARGET_SCHED_REORDER2.  */
@@ -10128,6 +10285,37 @@ mips_sched_reorder (FILE *file ATTRIBUTE
   return mips_issue_rate ();
 }
 
+/* Update round-robin counters for ALU1/2 and FALU1/2.  */
+
+static void
+mips_ls2_variable_issue (rtx insn)
+{
+  if (mips_ls2.alu1_turn_p)
+    {
+      if (cpu_unit_reservation_p (curr_state, mips_ls2.alu1_core_unit_code))
+	mips_ls2.alu1_turn_p = false;
+    }
+  else
+    {
+      if (cpu_unit_reservation_p (curr_state, mips_ls2.alu2_core_unit_code))
+	mips_ls2.alu1_turn_p = true;
+    }
+
+  if (mips_ls2.falu1_turn_p)
+    {
+      if (cpu_unit_reservation_p (curr_state, mips_ls2.falu1_core_unit_code))
+	mips_ls2.falu1_turn_p = false;
+    }
+  else
+    {
+      if (cpu_unit_reservation_p (curr_state, mips_ls2.falu2_core_unit_code))
+	mips_ls2.falu1_turn_p = true;
+    }
+
+  if (recog_memoized (insn) >= 0)
+    mips_ls2.cycle_has_multi_p |= (get_attr_type (insn) == TYPE_MULTI);
+}
+
 /* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
 
 static int
@@ -10143,7 +10331,20 @@ mips_variable_issue (FILE *file ATTRIBUT
       vr4130_last_insn = insn;
       if (TUNE_74K)
 	mips_74k_agen_init (insn);
+      else if (TUNE_LOONGSON_2EF)
+	mips_ls2_variable_issue (insn);
     }
+
+  if (recog_memoized (insn) >= 0)
+    /* Instructions of type 'multi' should all be split before
+       second scheduling pass.  */
+    {
+      bool multi_p;
+
+      multi_p = (get_attr_type (insn) == TYPE_MULTI);
+      gcc_assert (!multi_p || !reload_completed);
+    }
+
   return more;
 }
 \f
@@ -12894,6 +13095,10 @@ mips_order_regs_for_local_alloc (void)
 #define TARGET_SCHED_ADJUST_COST mips_adjust_cost
 #undef TARGET_SCHED_ISSUE_RATE
 #define TARGET_SCHED_ISSUE_RATE mips_issue_rate
+#undef TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN
+#define TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN mips_init_dfa_post_cycle_insn
+#undef TARGET_SCHED_DFA_POST_ADVANCE_CYCLE
+#define TARGET_SCHED_DFA_POST_ADVANCE_CYCLE mips_dfa_post_advance_cycle
 #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
   mips_multipass_dfa_lookahead
--- gcc/config/mips/mips.h	(/local/gcc-3)	(revision 523)
+++ gcc/config/mips/mips.h	(/local/gcc-4)	(revision 523)
@@ -266,6 +266,8 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF1_1  \
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
+#define TUNE_LOONGSON_2EF           (mips_tune == PROCESSOR_LOONGSON_2E	\
+				     || mips_tune == PROCESSOR_LOONGSON_2F)
 
 /* Whether vector modes and intrinsics for ST Microelectronics
    Loongson-2E/2F processors should be enabled.  In o32 pairs of
@@ -910,10 +912,12 @@ enum mips_code_readable_setting {
 				 && !TARGET_MIPS16)
 
 /* Likewise mtc1 and mfc1.  */
-#define ISA_HAS_XFER_DELAY	(mips_isa <= 3)
+#define ISA_HAS_XFER_DELAY	(mips_isa <= 3			\
+				 && !TARGET_LOONGSON_2EF)
 
 /* Likewise floating-point comparisons.  */
-#define ISA_HAS_FCMP_DELAY	(mips_isa <= 3)
+#define ISA_HAS_FCMP_DELAY	(mips_isa <= 3			\
+				 && !TARGET_LOONGSON_2EF)
 
 /* True if mflo and mfhi can be immediately followed by instructions
    which write to the HI and LO registers.
@@ -930,7 +934,8 @@ enum mips_code_readable_setting {
 #define ISA_HAS_HILO_INTERLOCKS	(ISA_MIPS32				\
 				 || ISA_MIPS32R2			\
 				 || ISA_MIPS64				\
-				 || TARGET_MIPS5500)
+				 || TARGET_MIPS5500			\
+				 || TARGET_LOONGSON_2EF)
 
 /* ISA includes synci, jr.hb and jalr.hb.  */
 #define ISA_HAS_SYNCI (ISA_MIPS32R2 && !TARGET_MIPS16)
@@ -3251,3 +3256,6 @@ extern const struct mips_cpu_info *mips_
 extern const struct mips_rtx_cost_data *mips_cost;
 extern enum mips_code_readable_setting mips_code_readable;
 #endif
+
+/* Enable querying of DFA units.  */
+#define CPU_UNITS_QUERY 1



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
       [not found]       ` <48515794.7050007@codesourcery.com>
@ 2008-06-12 17:21         ` Maxim Kuvyrkov
  2008-06-12 18:43           ` Richard Sandiford
  2008-06-12 18:06         ` Richard Sandiford
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-12 17:21 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 447 bytes --]

Maxim Kuvyrkov wrote:

...

> I'm attaching a reduced testcase, dumps and a full patch against recent
> trunk that is needed to reproduce this error.  I tried to reproduce the
> failure on other MIPS architectures and didn't succeed.

In the previous message I attached a patch against the wrong branch,
sorry.  Here is the correct one.

I also forgot to mention that I will gladly help with investigating this
issue further.

--
Maxim


[-- Attachment #2: fsf-ls2ef-1234.patch --]
[-- Type: text/plain, Size: 190986 bytes --]

--- gcc/doc/extend.texi	(/local/gcc-trunk)	(revision 525)
+++ gcc/doc/extend.texi	(/local/gcc-4)	(revision 525)
@@ -6788,6 +6788,7 @@ instructions, but allow the compiler to 
 * X86 Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
+* MIPS Loongson Built-in Functions::
 * PowerPC AltiVec Built-in Functions::
 * SPARC VIS Built-in Functions::
 * SPU Built-in Functions::
@@ -8667,6 +8668,132 @@ value is the upper one.  The opposite or
 For example, the code above will set the lower half of @code{a} to
 @code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
 
+@node MIPS Loongson Built-in Functions
+@subsection MIPS Loongson Built-in Functions
+
+GCC provides intrinsics to access the SIMD instructions provided by the
+ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
+available after inclusion of the @code{loongson.h} header file,
+operate on the following 64-bit vector types:
+
+@itemize
+@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
+@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
+@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
+@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
+@item @code{int16x4_t}, a vector of four signed 16-bit integers;
+@item @code{int32x2_t}, a vector of two signed 32-bit integers.
+@end itemize
+
+The intrinsics provided are listed below; each is named after the
+machine instruction to which it corresponds, with suffixes added as
+appropriate to distinguish intrinsics that expand to the same machine
+instruction yet have different argument types.  Refer to the architecture
+documentation for a description of the functionality of each
+instruction.
+
+@smallexample
+int16x4_t packsswh (int32x2_t s, int32x2_t t);
+int8x8_t packsshb (int16x4_t s, int16x4_t t);
+uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
+uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t paddw_s (int32x2_t s, int32x2_t t);
+int16x4_t paddh_s (int16x4_t s, int16x4_t t);
+int8x8_t paddb_s (int8x8_t s, int8x8_t t);
+uint64_t paddd_u (uint64_t s, uint64_t t);
+int64_t paddd_s (int64_t s, int64_t t);
+int16x4_t paddsh (int16x4_t s, int16x4_t t);
+int8x8_t paddsb (int8x8_t s, int8x8_t t);
+uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
+uint64_t pandn_ud (uint64_t s, uint64_t t);
+uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
+uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
+int64_t pandn_sd (int64_t s, int64_t t);
+int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
+int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
+int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
+uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
+uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
+uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
+uint16x4_t pextrh_u (uint16x4_t s, int field);
+int16x4_t pextrh_s (int16x4_t s, int field);
+uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
+int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
+int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
+int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
+uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
+int16x4_t pminsh (int16x4_t s, int16x4_t t);
+uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
+uint8x8_t pmovmskb_u (uint8x8_t s);
+int8x8_t pmovmskb_s (int8x8_t s);
+uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
+int16x4_t pmulhh (int16x4_t s, int16x4_t t);
+int16x4_t pmullh (int16x4_t s, int16x4_t t);
+int64_t pmuluw (uint32x2_t s, uint32x2_t t);
+uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
+uint16x4_t biadd (uint8x8_t s);
+uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
+uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
+int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
+uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psllh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psllw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrah_s (int16x4_t s, uint8_t amount);
+uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psraw_s (int32x2_t s, uint8_t amount);
+uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t psubw_s (int32x2_t s, int32x2_t t);
+int16x4_t psubh_s (int16x4_t s, int16x4_t t);
+int8x8_t psubb_s (int8x8_t s, int8x8_t t);
+uint64_t psubd_u (uint64_t s, uint64_t t);
+int64_t psubd_s (int64_t s, int64_t t);
+int16x4_t psubsh (int16x4_t s, int16x4_t t);
+int8x8_t psubsb (int8x8_t s, int8x8_t t);
+uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
+uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
+uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
+@end smallexample
+
 @menu
 * Paired-Single Arithmetic::
 * Paired-Single Built-in Functions::
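
As a quick illustration of the interface documented above (not part of the
patch itself), a minimal C program in the spirit of the attached
loongson-simd.c test might look like the sketch below; the union wrapper
mirrors the idiom used in that test, and only the documented paddh_s
intrinsic is assumed.

#include "loongson.h"
#include <stdint.h>
#include <assert.h>

/* Wrap a vector so individual elements can be read and written,
   as done in loongson-simd.c.  */
typedef union { int16x4_t v; int16_t a[4]; } int16x4_encap_t;

int
main (void)
{
  int16x4_encap_t s, t, r;
  int i;

  for (i = 0; i < 4; i++)
    {
      s.a[i] = i;	/* {0, 1, 2, 3}  */
      t.a[i] = 10 * i;	/* {0, 10, 20, 30}  */
    }

  /* Element-wise signed halfword addition (paddh).  */
  r.v = paddh_s (s.v, t.v);

  for (i = 0; i < 4; i++)
    assert (r.a[i] == 11 * i);

  return 0;
}
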
--- gcc/testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-trunk)	(revision 525)
+++ gcc/testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-4)	(revision 525)
@@ -0,0 +1,1963 @@
+/* Test cases for ST Microelectronics Loongson-2E/2F SIMD intrinsics.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target mips_loongson } */
+
+#include "loongson.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+#include <limits.h>
+
+typedef union { int32x2_t v; int32_t a[2]; } int32x2_encap_t;
+typedef union { int16x4_t v; int16_t a[4]; } int16x4_encap_t;
+typedef union { int8x8_t v; int8_t a[8]; } int8x8_encap_t;
+typedef union { uint32x2_t v; uint32_t a[2]; } uint32x2_encap_t;
+typedef union { uint16x4_t v; uint16_t a[4]; } uint16x4_encap_t;
+typedef union { uint8x8_t v; uint8_t a[8]; } uint8x8_encap_t;
+
+#define UINT16x4_MAX USHRT_MAX
+#define UINT8x8_MAX UCHAR_MAX
+#define INT8x8_MAX SCHAR_MAX
+#define INT16x4_MAX SHRT_MAX
+#define INT32x2_MAX INT_MAX
+
+static void test_packsswh (void)
+{
+  int32x2_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = INT16x4_MAX - 2;
+  s.a[1] = INT16x4_MAX - 1;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX + 1;
+  r.v = packsswh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 2);
+  assert (r.a[1] == INT16x4_MAX - 1);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_packsshb (void)
+{
+  int16x4_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = INT8x8_MAX - 6;
+  s.a[1] = INT8x8_MAX - 5;
+  s.a[2] = INT8x8_MAX - 4;
+  s.a[3] = INT8x8_MAX - 3;
+  t.a[0] = INT8x8_MAX - 2;
+  t.a[1] = INT8x8_MAX - 1;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX + 1;
+  r.v = packsshb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_packushb (void)
+{
+  uint16x4_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = UINT8x8_MAX - 6;
+  s.a[1] = UINT8x8_MAX - 5;
+  s.a[2] = UINT8x8_MAX - 4;
+  s.a[3] = UINT8x8_MAX - 3;
+  t.a[0] = UINT8x8_MAX - 2;
+  t.a[1] = UINT8x8_MAX - 1;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX + 1;
+  r.v = packushb (s.v, t.v);
+  assert (r.a[0] == UINT8x8_MAX - 6);
+  assert (r.a[1] == UINT8x8_MAX - 5);
+  assert (r.a[2] == UINT8x8_MAX - 4);
+  assert (r.a[3] == UINT8x8_MAX - 3);
+  assert (r.a[4] == UINT8x8_MAX - 2);
+  assert (r.a[5] == UINT8x8_MAX - 1);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_paddw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  r.v = paddw_u (s.v, t.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 6);
+}
+
+static void test_paddw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  r.v = paddw_s (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_paddh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  r.v = paddh_u (s.v, t.v);
+  assert (r.a[0] == 6);
+  assert (r.a[1] == 8);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 12);
+}
+
+static void test_paddh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = paddh_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+}
+
+static void test_paddb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 5;
+  s.a[5] = 6;
+  s.a[6] = 7;
+  s.a[7] = 8;
+  t.a[0] = 9;
+  t.a[1] = 10;
+  t.a[2] = 11;
+  t.a[3] = 12;
+  t.a[4] = 13;
+  t.a[5] = 14;
+  t.a[6] = 15;
+  t.a[7] = 16;
+  r.v = paddb_u (s.v, t.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 12);
+  assert (r.a[2] == 14);
+  assert (r.a[3] == 16);
+  assert (r.a[4] == 18);
+  assert (r.a[5] == 20);
+  assert (r.a[6] == 22);
+  assert (r.a[7] == 24);
+}
+
+static void test_paddb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = paddb_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+  assert (r.a[4] == -45);
+  assert (r.a[5] == -54);
+  assert (r.a[6] == -63);
+  assert (r.a[7] == -72);
+}
+
+static void test_paddd_u (void)
+{
+  uint64_t d = 123456;
+  uint64_t e = 789012;
+  uint64_t r;
+  r = paddd_u (d, e);
+  assert (r == 912468);
+}
+
+static void test_paddd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = paddd_s (d, e);
+  assert (r == -665556);
+}
+
+static void test_paddsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX;
+  t.a[2] = INT16x4_MAX;
+  t.a[3] = INT16x4_MAX;
+  r.v = paddsh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_paddsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = INT8x8_MAX;
+  t.a[1] = INT8x8_MAX;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX;
+  t.a[4] = INT8x8_MAX;
+  t.a[5] = INT8x8_MAX;
+  t.a[6] = INT8x8_MAX;
+  t.a[7] = INT8x8_MAX;
+  r.v = paddsb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_paddush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  t.a[0] = UINT16x4_MAX;
+  t.a[1] = UINT16x4_MAX;
+  t.a[2] = UINT16x4_MAX;
+  t.a[3] = UINT16x4_MAX;
+  r.v = paddush (s.v, t.v);
+  assert (r.a[0] == UINT16x4_MAX);
+  assert (r.a[1] == UINT16x4_MAX);
+  assert (r.a[2] == UINT16x4_MAX);
+  assert (r.a[3] == UINT16x4_MAX);
+}
+
+static void test_paddusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  s.a[4] = 0;
+  s.a[5] = 1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = UINT8x8_MAX;
+  t.a[1] = UINT8x8_MAX;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX;
+  t.a[4] = UINT8x8_MAX;
+  t.a[5] = UINT8x8_MAX;
+  t.a[6] = UINT8x8_MAX;
+  t.a[7] = UINT8x8_MAX;
+  r.v = paddusb (s.v, t.v);
+  assert (r.a[0] == UINT8x8_MAX);
+  assert (r.a[1] == UINT8x8_MAX);
+  assert (r.a[2] == UINT8x8_MAX);
+  assert (r.a[3] == UINT8x8_MAX);
+  assert (r.a[4] == UINT8x8_MAX);
+  assert (r.a[5] == UINT8x8_MAX);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_pandn_ud (void)
+{
+  uint64_t d1 = 0x0000ffff0000ffffull;
+  uint64_t d2 = 0x0000ffff0000ffffull;
+  uint64_t r;
+  r = pandn_ud (d1, d2);
+  assert (r == 0);
+}
+
+static void test_pandn_sd (void)
+{
+  int64_t d1 = (int64_t) 0x0000000000000000ull;
+  int64_t d2 = (int64_t) 0xfffffffffffffffeull;
+  int64_t r;
+  r = pandn_sd (d1, d2);
+  assert (r == -2);
+}
+
+static void test_pandn_uw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0x00000000;
+  t.a[1] = 0xffffffff;
+  r.v = pandn_uw (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pandn_sw (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0xffffffff;
+  t.a[1] = 0xfffffffe;
+  r.v = pandn_sw (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+}
+
+static void test_pandn_uh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0x0000;
+  t.a[1] = 0xffff;
+  t.a[2] = 0x0000;
+  t.a[3] = 0xffff;
+  r.v = pandn_uh (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pandn_sh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0xffff;
+  t.a[1] = 0xfffe;
+  t.a[2] = 0xffff;
+  t.a[3] = 0xfffe;
+  r.v = pandn_sh (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+}
+
+static void test_pandn_ub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0x00;
+  t.a[1] = 0xff;
+  t.a[2] = 0x00;
+  t.a[3] = 0xff;
+  t.a[4] = 0x00;
+  t.a[5] = 0xff;
+  t.a[6] = 0x00;
+  t.a[7] = 0xff;
+  r.v = pandn_ub (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pandn_sb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0xff;
+  t.a[1] = 0xfe;
+  t.a[2] = 0xff;
+  t.a[3] = 0xfe;
+  t.a[4] = 0xff;
+  t.a[5] = 0xfe;
+  t.a[6] = 0xff;
+  t.a[7] = 0xfe;
+  r.v = pandn_sb (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -2);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -2);
+}
+
+static void test_pavgh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  r.v = pavgh (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+}
+
+static void test_pavgb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 1;
+  s.a[5] = 2;
+  s.a[6] = 3;
+  s.a[7] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = pavgb (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+  assert (r.a[4] == 3);
+  assert (r.a[5] == 4);
+  assert (r.a[6] == 5);
+  assert (r.a[7] == 6);
+}
+
+static void test_pcmpeqw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  r.v = pcmpeqw_u (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpeqh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  r.v = pcmpeqh_u (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpeqb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 42;
+  s.a[5] = 43;
+  s.a[6] = 42;
+  s.a[7] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  t.a[4] = 43;
+  t.a[5] = 43;
+  t.a[6] = 43;
+  t.a[7] = 43;
+  r.v = pcmpeqb_u (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpeqw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  r.v = pcmpeqw_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+}
+
+static void test_pcmpeqh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  r.v = pcmpeqh_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpeqb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = -42;
+  s.a[5] = -42;
+  s.a[6] = -42;
+  s.a[7] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = -42;
+  t.a[6] = 42;
+  t.a[7] = -42;
+  r.v = pcmpeqb_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -1);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -1);
+}
+
+static void test_pcmpgtw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 42;
+  r.v = pcmpgtw_u (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpgth_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 40;
+  t.a[1] = 41;
+  t.a[2] = 43;
+  t.a[3] = 42;
+  r.v = pcmpgth_u (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0x0000);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpgtb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 44;
+  s.a[5] = 45;
+  s.a[6] = 46;
+  s.a[7] = 47;
+  t.a[0] = 48;
+  t.a[1] = 47;
+  t.a[2] = 46;
+  t.a[3] = 45;
+  t.a[4] = 44;
+  t.a[5] = 43;
+  t.a[6] = 42;
+  t.a[7] = 41;
+  r.v = pcmpgtb_u (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0x00);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0x00);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0xff);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpgtw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = -42;
+  t.a[0] = -42;
+  t.a[1] = -42;
+  r.v = pcmpgtw_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 0);
+}
+
+static void test_pcmpgth_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = 43;
+  t.a[2] = 44;
+  t.a[3] = -43;
+  r.v = pcmpgth_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpgtb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = 42;
+  s.a[5] = 42;
+  s.a[6] = 42;
+  s.a[7] = 42;
+  t.a[0] = -45;
+  t.a[1] = -44;
+  t.a[2] = -43;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = 43;
+  t.a[6] = 41;
+  t.a[7] = 40;
+  r.v = pcmpgtb_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == -1);
+  assert (r.a[7] == -1);
+}
+
+static void test_pextrh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  r.v = pextrh_u (s.v, 1);
+  assert (r.a[0] == 41);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pextrh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -40;
+  s.a[1] = -41;
+  s.a[2] = -42;
+  s.a[3] = -43;
+  r.v = pextrh_s (s.v, 2);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pinsrh_0123_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  r.v = pinsrh_0_u (t.v, s.v);
+  r.v = pinsrh_1_u (r.v, s.v);
+  r.v = pinsrh_2_u (r.v, s.v);
+  r.v = pinsrh_3_u (r.v, s.v);
+  assert (r.a[0] == 42);
+  assert (r.a[1] == 42);
+  assert (r.a[2] == 42);
+  assert (r.a[3] == 42);
+}
+
+static void test_pinsrh_0123_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  r.v = pinsrh_0_s (t.v, s.v);
+  r.v = pinsrh_1_s (r.v, s.v);
+  r.v = pinsrh_2_s (r.v, s.v);
+  r.v = pinsrh_3_s (r.v, s.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == -42);
+  assert (r.a[2] == -42);
+  assert (r.a[3] == -42);
+}
+
+static void test_pmaddhw (void)
+{
+  int16x4_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -5;
+  s.a[1] = -4;
+  s.a[2] = -3;
+  s.a[3] = -2;
+  t.a[0] = 10;
+  t.a[1] = 11;
+  t.a[2] = 12;
+  t.a[3] = 13;
+  r.v = pmaddhw (s.v, t.v);
+  assert (r.a[0] == (-5*10 + -4*11));
+  assert (r.a[1] == (-3*12 + -2*13));
+}
+
+static void test_pmaxsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  r.v = pmaxsh (s.v, t.v);
+  assert (r.a[0] == 20);
+  assert (r.a[1] == 40);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 50);
+}
+
+static void test_pmaxub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pmaxub (s.v, t.v);
+  assert (r.a[0] == 80);
+  assert (r.a[1] == 70);
+  assert (r.a[2] == 60);
+  assert (r.a[3] == 50);
+  assert (r.a[4] == 50);
+  assert (r.a[5] == 60);
+  assert (r.a[6] == 70);
+  assert (r.a[7] == 80);
+}
+
+static void test_pminsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  r.v = pminsh (s.v, t.v);
+  assert (r.a[0] == -20);
+  assert (r.a[1] == -40);
+  assert (r.a[2] == -10);
+  assert (r.a[3] == -50);
+}
+
+static void test_pminub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pminub (s.v, t.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 20);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 40);
+  assert (r.a[4] == 40);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 20);
+  assert (r.a[7] == 10);
+}
+
+static void test_pmovmskb_u (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_encap_t r;
+  s.a[0] = 0xf0;
+  s.a[1] = 0x40;
+  s.a[2] = 0xf0;
+  s.a[3] = 0x40;
+  s.a[4] = 0xf0;
+  s.a[5] = 0x40;
+  s.a[6] = 0xf0;
+  s.a[7] = 0x40;
+  r.v = pmovmskb_u (s.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmovmskb_s (void)
+{
+  int8x8_encap_t s;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 1;
+  s.a[2] = -1;
+  s.a[3] = 1;
+  s.a[4] = -1;
+  s.a[5] = 1;
+  s.a[6] = -1;
+  s.a[7] = 1;
+  r.v = pmovmskb_s (s.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmulhuh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0xff00;
+  s.a[1] = 0xff00;
+  s.a[2] = 0xff00;
+  s.a[3] = 0xff00;
+  t.a[0] = 16;
+  t.a[1] = 16;
+  t.a[2] = 16;
+  t.a[3] = 16;
+  r.v = pmulhuh (s.v, t.v);
+  assert (r.a[0] == 0x000f);
+  assert (r.a[1] == 0x000f);
+  assert (r.a[2] == 0x000f);
+  assert (r.a[3] == 0x000f);
+}
+
+static void test_pmulhh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  r.v = pmulhh (s.v, t.v);
+  assert (r.a[0] == -16);
+  assert (r.a[1] == -16);
+  assert (r.a[2] == -16);
+  assert (r.a[3] == -16);
+}
+
+static void test_pmullh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  r.v = pmullh (s.v, t.v);
+  assert (r.a[0] == 4096);
+  assert (r.a[1] == 4096);
+  assert (r.a[2] == 4096);
+  assert (r.a[3] == 4096);
+}
+
+static void test_pmuluw (void)
+{
+  uint32x2_encap_t s, t;
+  uint64_t r;
+  s.a[0] = 0xdeadbeef;
+  s.a[1] = 0;
+  t.a[0] = 0x0f00baaa;
+  t.a[1] = 0;
+  r = pmuluw (s.v, t.v);
+  assert (r == 0xd0cd08e1d1a70b6ull);
+}
+
+static void test_pasubub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pasubub (s.v, t.v);
+  assert (r.a[0] == 70);
+  assert (r.a[1] == 50);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 10);
+  assert (r.a[4] == 10);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 50);
+  assert (r.a[7] == 70);
+}
+
+static void test_biadd (void)
+{
+  uint8x8_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  r.v = biadd (s.v);
+  assert (r.a[0] == 360);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psadbh (void)
+{
+  uint8x8_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = psadbh (s.v, t.v);
+  assert (r.a[0] == 0x0140);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pshufh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r.v = pshufh_u (r.v, s.v, 0xe5);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_pshufh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 2;
+  s.a[2] = -3;
+  s.a[3] = 4;
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r.v = pshufh_s (r.v, s.v, 0xe5);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+}
+
+static void test_psllh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0xffff;
+  s.a[2] = 0xffff;
+  s.a[3] = 0xffff;
+  r.v = psllh_u (s.v, 1);
+  assert (r.a[0] == 0xfffe);
+  assert (r.a[1] == 0xfffe);
+  assert (r.a[2] == 0xfffe);
+  assert (r.a[3] == 0xfffe);
+}
+
+static void test_psllw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0xffffffff;
+  r.v = psllw_u (s.v, 2);
+  assert (r.a[0] == 0xfffffffc);
+  assert (r.a[1] == 0xfffffffc);
+}
+
+static void test_psllh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  r.v = psllh_s (s.v, 1);
+  assert (r.a[0] == -2);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == -2);
+  assert (r.a[3] == -2);
+}
+
+static void test_psllw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  r.v = psllw_s (s.v, 2);
+  assert (r.a[0] == -4);
+  assert (r.a[1] == -4);
+}
+
+static void test_psrah_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  r.v = psrah_u (s.v, 1);
+  assert (r.a[0] == 0xfff7);
+  assert (r.a[1] == 0xfff7);
+  assert (r.a[2] == 0xfff7);
+  assert (r.a[3] == 0xfff7);
+}
+
+static void test_psraw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  r.v = psraw_u (s.v, 1);
+  assert (r.a[0] == 0xfffffff7);
+  assert (r.a[1] == 0xfffffff7);
+}
+
+static void test_psrah_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s.a[2] = -2;
+  s.a[3] = -2;
+  r.v = psrah_s (s.v, 1);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == -1);
+}
+
+static void test_psraw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  r.v = psraw_s (s.v, 1);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+}
+
+static void test_psrlh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  r.v = psrlh_u (s.v, 1);
+  assert (r.a[0] == 0x7ff7);
+  assert (r.a[1] == 0x7ff7);
+  assert (r.a[2] == 0x7ff7);
+  assert (r.a[3] == 0x7ff7);
+}
+
+static void test_psrlw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  r.v = psrlw_u (s.v, 1);
+  assert (r.a[0] == 0x7ffffff7);
+  assert (r.a[1] == 0x7ffffff7);
+}
+
+static void test_psrlh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  r.v = psrlh_s (s.v, 1);
+  assert (r.a[0] == INT16x4_MAX);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psrlw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  r.v = psrlw_s (s.v, 1);
+  assert (r.a[0] == INT32x2_MAX);
+  assert (r.a[1] == INT32x2_MAX);
+}
+
+static void test_psubw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 3;
+  s.a[1] = 4;
+  t.a[0] = 2;
+  t.a[1] = 1;
+  r.v = psubw_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = -4;
+  r.v = psubw_s (s.v, t.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 5;
+  s.a[1] = 6;
+  s.a[2] = 7;
+  s.a[3] = 8;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = psubh_u (s.v, t.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 4);
+  assert (r.a[3] == 4);
+}
+
+static void test_psubh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = psubh_s (s.v, t.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+}
+
+static void test_psubb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 11;
+  s.a[2] = 12;
+  s.a[3] = 13;
+  s.a[4] = 14;
+  s.a[5] = 15;
+  s.a[6] = 16;
+  s.a[7] = 17;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = psubb_u (s.v, t.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 9);
+  assert (r.a[2] == 9);
+  assert (r.a[3] == 9);
+  assert (r.a[4] == 9);
+  assert (r.a[5] == 9);
+  assert (r.a[6] == 9);
+  assert (r.a[7] == 9);
+}
+
+static void test_psubb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = psubb_s (s.v, t.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+  assert (r.a[4] == -55);
+  assert (r.a[5] == -66);
+  assert (r.a[6] == -77);
+  assert (r.a[7] == -88);
+}
+
+static void test_psubd_u (void)
+{
+  uint64_t d = 789012;
+  uint64_t e = 123456;
+  uint64_t r;
+  r = psubd_u (d, e);
+  assert (r == 665556);
+}
+
+static void test_psubd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = psubd_s (d, e);
+  assert (r == 912468);
+}
+
+static void test_psubsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = -INT16x4_MAX;
+  t.a[1] = -INT16x4_MAX;
+  t.a[2] = -INT16x4_MAX;
+  t.a[3] = -INT16x4_MAX;
+  r.v = psubsh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psubsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = -INT8x8_MAX;
+  t.a[1] = -INT8x8_MAX;
+  t.a[2] = -INT8x8_MAX;
+  t.a[3] = -INT8x8_MAX;
+  t.a[4] = -INT8x8_MAX;
+  t.a[5] = -INT8x8_MAX;
+  t.a[6] = -INT8x8_MAX;
+  t.a[7] = -INT8x8_MAX;
+  r.v = psubsb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_psubush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  r.v = psubush (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psubusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  s.a[4] = 4;
+  s.a[5] = 5;
+  s.a[6] = 6;
+  s.a[7] = 7;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  t.a[4] = 5;
+  t.a[5] = 5;
+  t.a[6] = 7;
+  t.a[7] = 7;
+  r.v = psubusb (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_punpckhbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpckhbh_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == -11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == -13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == -15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpckhbh_u (s.v, t.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == 11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == 13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == 15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  r.v = punpckhhw_s (s.v, t.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == -6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  r.v = punpckhhw_u (s.v, t.v);
+  assert (r.a[0] == 5);
+  assert (r.a[1] == 6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = -4;
+  r.v = punpckhwd_s (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == -4);
+}
+
+static void test_punpckhwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  r.v = punpckhwd_u (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+}
+
+static void test_punpcklbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpcklbh_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == -5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == -7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpcklbh_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == 5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == 7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  r.v = punpcklhw_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  r.v = punpcklhw_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  r.v = punpcklwd_s (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == -2);
+}
+
+static void test_punpcklwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  r.v = punpcklwd_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+}
+
+int main (void)
+{
+  test_packsswh ();
+  test_packsshb ();
+  test_packushb ();
+  test_paddw_u ();
+  test_paddw_s ();
+  test_paddh_u ();
+  test_paddh_s ();
+  test_paddb_u ();
+  test_paddb_s ();
+  test_paddd_u ();
+  test_paddd_s ();
+  test_paddsh ();
+  test_paddsb ();
+  test_paddush ();
+  test_paddusb ();
+  test_pandn_ud ();
+  test_pandn_sd ();
+  test_pandn_uw ();
+  test_pandn_sw ();
+  test_pandn_uh ();
+  test_pandn_sh ();
+  test_pandn_ub ();
+  test_pandn_sb ();
+  test_pavgh ();
+  test_pavgb ();
+  test_pcmpeqw_u ();
+  test_pcmpeqh_u ();
+  test_pcmpeqb_u ();
+  test_pcmpeqw_s ();
+  test_pcmpeqh_s ();
+  test_pcmpeqb_s ();
+  test_pcmpgtw_u ();
+  test_pcmpgth_u ();
+  test_pcmpgtb_u ();
+  test_pcmpgtw_s ();
+  test_pcmpgth_s ();
+  test_pcmpgtb_s ();
+  test_pextrh_u ();
+  test_pextrh_s ();
+  test_pinsrh_0123_u ();
+  test_pinsrh_0123_s ();
+  test_pmaddhw ();
+  test_pmaxsh ();
+  test_pmaxub ();
+  test_pminsh ();
+  test_pminub ();
+  test_pmovmskb_u ();
+  test_pmovmskb_s ();
+  test_pmulhuh ();
+  test_pmulhh ();
+  test_pmullh ();
+  test_pmuluw ();
+  test_pasubub ();
+  test_biadd ();
+  test_psadbh ();
+  test_pshufh_u ();
+  test_pshufh_s ();
+  test_psllh_u ();
+  test_psllw_u ();
+  test_psllh_s ();
+  test_psllw_s ();
+  test_psrah_u ();
+  test_psraw_u ();
+  test_psrah_s ();
+  test_psraw_s ();
+  test_psrlh_u ();
+  test_psrlw_u ();
+  test_psrlh_s ();
+  test_psrlw_s ();
+  test_psubw_u ();
+  test_psubw_s ();
+  test_psubh_u ();
+  test_psubh_s ();
+  test_psubb_u ();
+  test_psubb_s ();
+  test_psubd_u ();
+  test_psubd_s ();
+  test_psubsh ();
+  test_psubsb ();
+  test_psubush ();
+  test_psubusb ();
+  test_punpckhbh_s ();
+  test_punpckhbh_u ();
+  test_punpckhhw_s ();
+  test_punpckhhw_u ();
+  test_punpckhwd_s ();
+  test_punpckhwd_u ();
+  test_punpcklbh_s ();
+  test_punpcklbh_u ();
+  test_punpcklhw_s ();
+  test_punpcklhw_u ();
+  test_punpcklwd_s ();
+  test_punpcklwd_u ();
+  return 0;
+}
--- gcc/testsuite/lib/target-supports.exp	(/local/gcc-trunk)	(revision 525)
+++ gcc/testsuite/lib/target-supports.exp	(/local/gcc-4)	(revision 525)
@@ -1249,6 +1249,17 @@ proc check_effective_target_arm_neon_hw 
     } "-mfpu=neon -mfloat-abi=softfp"]
 }
 
+# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
+# the Loongson vector modes.
+
+proc check_effective_target_mips_loongson { } {
+    return [check_no_compiler_messages loongson assembly {
+	#if !defined(__mips_loongson_vector_rev)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {
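
For reference, a new execution test guarded by this effective target would
start with the same directives as loongson-simd.c above; a minimal,
hypothetical example (not part of the patch) could be:

/* { dg-do run } */
/* { dg-require-effective-target mips_loongson } */

#include "loongson.h"
#include <stdint.h>
#include <assert.h>

int
main (void)
{
  /* paddd_u adds two doubleword integers held in FP registers,
     wrapping on overflow.  */
  uint64_t r = paddd_u (1, 2);
  assert (r == 3);
  return 0;
}
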
--- gcc/config.gcc	(/local/gcc-trunk)	(revision 525)
+++ gcc/config.gcc	(/local/gcc-4)	(revision 525)
@@ -307,6 +307,7 @@ m68k-*-*)
 mips*-*-*)
 	cpu_type=mips
 	need_64bit_hwint=yes
+	extra_headers="loongson.h"
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
--- gcc/config/mips/loongson.md	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/loongson.md	(/local/gcc-4)	(revision 525)
@@ -0,0 +1,475 @@
+;; Machine description for ST Microelectronics Loongson-2E/2F.
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Mode iterators and attributes.
+
+;; 64-bit vectors of bytes.
+(define_mode_iterator VB [V8QI])
+
+;; 64-bit vectors of halfwords.
+(define_mode_iterator VH [V4HI])
+
+;; 64-bit vectors of words.
+(define_mode_iterator VW [V2SI])
+
+;; 64-bit vectors of halfwords and bytes.
+(define_mode_iterator VHB [V4HI V8QI])
+
+;; 64-bit vectors of words and halfwords.
+(define_mode_iterator VWH [V2SI V4HI])
+
+;; 64-bit vectors of words, halfwords and bytes.
+(define_mode_iterator VWHB [V2SI V4HI V8QI])
+
+;; 64-bit vectors of words, halfwords and bytes; and DImode.
+(define_mode_iterator VWHBDI [V2SI V4HI V8QI DI])
+
+;; The Loongson instruction suffixes corresponding to the modes in the
+;; VWHBDI iterator.
+(define_mode_attr V_suffix [(V2SI "w") (V4HI "h") (V8QI "b") (DI "d")])
+
+;; Given a vector type T, the mode of a vector half the size of T
+;; and with the same number of elements.
+(define_mode_attr V_squash [(V2SI "V2HI") (V4HI "V4QI")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with half as many elements.
+(define_mode_attr V_stretch_half [(V2SI "DI") (V4HI "V2SI") (V8QI "V4HI")])
+
+;; The Loongson instruction suffixes corresponding to the transformation
+;; expressed by V_stretch_half.
+(define_mode_attr V_stretch_half_suffix [(V2SI "wd") (V4HI "hw") (V8QI "bh")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with twice as many elements.
+(define_mode_attr V_squash_double [(V2SI "V4HI") (V4HI "V8QI")])
+
+;; The Loongson instruction suffixes corresponding to the conversions
+;; specified by V_squash_double.
+(define_mode_attr V_squash_double_suffix [(V2SI "wh") (V4HI "hb")])
+
+;; Move patterns.
+
+;; Expander to legitimize moves involving values of vector modes.
+(define_expand "mov<mode>"
+  [(set (match_operand:VWHB 0)
+	(match_operand:VWHB 1))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+{
+  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
+    DONE;
+})
+
+;; Handle legitimized moves between values of vector modes.
+(define_insn "mov<mode>_internal"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,  d,  m,  d")
+	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  { return mips_output_move (operands[0], operands[1]); }
+  [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")
+   (set_attr "mode" "DI")])
+
+;; Initialization of a vector.
+
+(define_expand "vec_init<mode>"
+  [(set (match_operand:VWHB 0 "register_operand")
+	(match_operand 1 ""))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+{
+  mips_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})
+
+;; Instruction patterns for SIMD instructions.
+
+;; Pack with signed saturation.
+(define_insn "vec_pack_ssat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 1 "register_operand" "f"))
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 2 "register_operand" "f"))))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "packss<V_squash_double_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Pack with unsigned saturation.
+(define_insn "vec_pack_usat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 1 "register_operand" "f"))
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 2 "register_operand" "f"))))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "packus<V_squash_double_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Addition, treating overflow by wraparound.
+(define_insn "add<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (plus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		   (match_operand:VWHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "padd<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Addition of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+;; We use 'unspec' instead of 'plus' here to avoid a clash with
+;; mips.md::add<mode>3.  If 'plus' were used, such an instruction
+;; would be recognized as adddi3, and reload would make it use
+;; GPRs instead of FPRs.
+(define_insn "loongson_paddd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:DI 1 "register_operand" "f")
+		    (match_operand:DI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PADDD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "paddd\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Addition, treating overflow by signed saturation.
+(define_insn "ssadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "padds<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Addition, treating overflow by unsigned saturation.
+(define_insn "usadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "paddus<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Logical AND NOT.
+(define_insn "loongson_pandn_<V_suffix>"
+  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
+        (and:VWHBDI
+	 (not:VWHBDI (match_operand:VWHBDI 1 "register_operand" "f"))
+	 (match_operand:VWHBDI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pandn\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Average.
+(define_insn "loongson_pavg<V_suffix>"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (unspec:VHB [(match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")]
+		    UNSPEC_LOONGSON_PAVG))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pavg<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Equality test.
+(define_insn "loongson_pcmpeq<V_suffix>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PCMPEQ))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pcmpeq<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Greater-than test.
+(define_insn "loongson_pcmpgt<V_suffix>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PCMPGT))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pcmpgt<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Extract halfword.
+(define_insn "loongson_pextr<V_suffix>"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+ 		    (match_operand:SI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PEXTR))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pextr<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Insert halfword.
+(define_insn "loongson_pinsr<V_suffix>_0"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_0))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_0\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
+
+(define_insn "loongson_pinsr<V_suffix>_1"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_1))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_1\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
+
+(define_insn "loongson_pinsr<V_suffix>_2"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_2))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_2\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
+
+(define_insn "loongson_pinsr<V_suffix>_3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_3))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_3\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
+
+;; Multiply and add packed integers.
+(define_insn "loongson_pmadd<V_stretch_half_suffix>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VH 1 "register_operand" "f")
+				  (match_operand:VH 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PMADD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmadd<V_stretch_half_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Maximum of signed halfwords.
+(define_insn "smax<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smax:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmaxs<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Maximum of unsigned bytes.
+(define_insn "umax<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umax:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmaxu<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Minimum of signed halfwords.
+(define_insn "smin<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smin:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmins<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Minimum of unsigned bytes.
+(define_insn "umin<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umin:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pminu<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Move byte mask.
+(define_insn "loongson_pmovmsk<V_suffix>"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMOVMSK))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmovmsk<V_suffix>\t%0,%1"
+  [(set_attr "type" "fabs")])
+
+;; Multiply unsigned integers and store high result.
+(define_insn "umul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULHU))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulhu<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Multiply signed integers and store high result.
+(define_insn "smul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulh<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Multiply signed integers and store low result.
+(define_insn "loongson_pmull<V_suffix>"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULL))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmull<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Multiply unsigned word integers.
+(define_insn "loongson_pmulu<V_suffix>"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:VW 1 "register_operand" "f")
+		    (match_operand:VW 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULU))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulu<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Absolute difference.
+(define_insn "loongson_pasubub"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")
+		    (match_operand:VB 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PASUBUB))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pasubub\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Sum of unsigned byte integers.
+(define_insn "reduc_uplus_<mode>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")]
+				 UNSPEC_LOONGSON_BIADD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "biadd\t%0,%1"
+  [(set_attr "type" "fabs")])
+
+;; Sum of absolute differences.
+(define_insn "loongson_psadbh"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")
+				  (match_operand:VB 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PSADBH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pasubub\t%0,%1,%2;biadd\t%0,%0"
+  [(set_attr "type" "fadd")])
+
+;; Shuffle halfwords.
+(define_insn "loongson_pshufh"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "0")
+		    (match_operand:VH 2 "register_operand" "f")
+		    (match_operand:SI 3 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSHUFH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pshufh\t%0,%2,%3"
+  [(set_attr "type" "fmul")])
+
+;; Shift left logical.
+(define_insn "loongson_psll<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashift:VWH (match_operand:VWH 1 "register_operand" "f")
+		    (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psll<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Shift right arithmetic.
+(define_insn "loongson_psra<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psra<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
+
+;; Shift right logical.
+(define_insn "loongson_psrl<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (lshiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psrl<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
+
+;; Subtraction, treating overflow by wraparound.
+(define_insn "sub<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (minus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		    (match_operand:VWHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psub<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Subtraction of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+;; See loongson_paddd for the reason we use 'unspec' rather than
+;; 'minus' here.
+(define_insn "loongson_psubd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:DI 1 "register_operand" "f")
+		    (match_operand:DI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSUBD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubd\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Subtraction, treating overflow by signed saturation.
+(define_insn "sssub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubs<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Subtraction, treating overflow by unsigned saturation.
+(define_insn "ussub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubus<V_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fadd")])
+
+;; Unpack high data.
+(define_insn "vec_interleave_high<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PUNPCKH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "punpckh<V_stretch_half_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
+
+;; Unpack low data.
+(define_insn "vec_interleave_low<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PUNPCKL))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "punpckl<V_stretch_half_suffix>\t%0,%1,%2"
+  [(set_attr "type" "fdiv")])
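
For illustration only (not part of the patch): because the patterns above
use the standard names sub<mode>3, smax<mode>3, umin<mode>3 and so on,
plain GCC generic-vector code can reach them without intrinsics.  A
minimal hedged sketch, assuming -march=loongson2f and a typedef that
mirrors the int16x4_t type from the loongson.h header later in this patch:

/* Hedged example; the typedef matches the vector_size (8) types
   that loongson.h introduces below.  */
typedef short int16x4 __attribute__ ((vector_size (8)));

int16x4
vec_sub (int16x4 a, int16x4 b)
{
  /* Element-wise wraparound subtraction; with the sub<mode>3 pattern
     above this can be emitted as a single psubh instruction.  */
  return a - b;
}
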
--- gcc/config/mips/loongson2ef.md	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/loongson2ef.md	(/local/gcc-4)	(revision 525)
@@ -0,0 +1,247 @@
+;; Pipeline model for ST Microelectronics Loongson-2E/2F cores.
+
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Automaton for integer instructions.
+(define_automaton "ls2_alu")
+
+;; ALU1 and ALU2.
+;; We need to query these units to adjust the round-robin counter.
+(define_query_cpu_unit "ls2_alu1_core,ls2_alu2_core" "ls2_alu")
+
+;; Pseudo units to help modeling of ALU1/2 round-robin dispatch strategy.
+(define_cpu_unit "ls2_alu1_turn,ls2_alu2_turn" "ls2_alu")
+
+;; Pseudo units to enable/disable ls2_alu[12]_turn units.
+;; An ls2_alu[12]_turn unit can be subscribed only after the corresponding
+;; ls2_alu[12]_turn_enabled unit is subscribed.
+(define_cpu_unit "ls2_alu1_turn_enabled,ls2_alu2_turn_enabled" "ls2_alu")
+(presence_set "ls2_alu1_turn" "ls2_alu1_turn_enabled")
+(presence_set "ls2_alu2_turn" "ls2_alu2_turn_enabled")
+
+;; Reservations for ALU1 (ALU2) instructions.
+;; An instruction goes to ALU1 (ALU2) and causes the next ALU1/2
+;; instruction to be dispatched to ALU2 (ALU1).
+(define_reservation "ls2_alu1"
+  "(ls2_alu1_core+ls2_alu2_turn_enabled)|ls2_alu1_core")
+(define_reservation "ls2_alu2"
+  "(ls2_alu2_core+ls2_alu1_turn_enabled)|ls2_alu2_core")
+
+;; Reservation for ALU1/2 instructions.
+;; An instruction will go to ALU1 iff ls2_alu1_turn_enabled is subscribed,
+;; and will switch the turn to ALU2 by subscribing ls2_alu2_turn_enabled;
+;; otherwise it goes to ALU2.
+(define_reservation "ls2_alu"
+  "(ls2_alu1_core+ls2_alu1_turn+ls2_alu2_turn_enabled)
+   |(ls2_alu1_core+ls2_alu1_turn)
+   |(ls2_alu2_core+ls2_alu2_turn+ls2_alu1_turn_enabled)
+   |(ls2_alu2_core+ls2_alu2_turn)")
+
+;; Automaton for floating-point instructions.
+(define_automaton "ls2_falu")
+
+;; FALU1 and FALU2.
+;; We need to query these units to adjust round-robin counter.
+(define_query_cpu_unit "ls2_falu1_core,ls2_falu2_core" "ls2_falu")
+
+;; Pseudo units to help modeling of FALU1/2 round-robin dispatch strategy.
+(define_cpu_unit "ls2_falu1_turn,ls2_falu2_turn" "ls2_falu")
+
+;; Pseudo units to enable/disable ls2_falu[12]_turn units.
+;; An ls2_falu[12]_turn unit can be subscribed only after the corresponding
+;; ls2_falu[12]_turn_enabled unit is subscribed.
+(define_cpu_unit "ls2_falu1_turn_enabled,ls2_falu2_turn_enabled" "ls2_falu")
+(presence_set "ls2_falu1_turn" "ls2_falu1_turn_enabled")
+(presence_set "ls2_falu2_turn" "ls2_falu2_turn_enabled")
+
+;; Reservations for FALU1 (FALU2) instructions.
+;; An instruction goes to FALU1 (FALU2) and causes the next FALU1/2
+;; instruction to be dispatched to FALU2 (FALU1).
+(define_reservation "ls2_falu1"
+  "(ls2_falu1_core+ls2_falu2_turn_enabled)|ls2_falu1_core")
+(define_reservation "ls2_falu2"
+  "(ls2_falu2_core+ls2_falu1_turn_enabled)|ls2_falu2_core")
+
+;; Reservation for FALU1/2 instructions.
+;; An instruction will go to FALU1 iff ls2_falu1_turn_enabled is subscribed,
+;; and will switch the turn to FALU2 by subscribing ls2_falu2_turn_enabled;
+;; otherwise it goes to FALU2.
+(define_reservation "ls2_falu"
+  "(ls2_falu1+ls2_falu1_turn+ls2_falu2_turn_enabled)
+   |(ls2_falu1+ls2_falu1_turn)
+   |(ls2_falu2+ls2_falu2_turn+ls2_falu1_turn_enabled)
+   |(ls2_falu2+ls2_falu2_turn)")
+
+;; The following four instructions each subscribe one of the
+;; ls2_[f]alu{1,2}_turn_enabled units, as selected by the attribute below.
+;; These instructions are used in mips.c:sched_ls2_dfa_post_advance_cycle.
+
+(define_attr "ls2_turn_type" "alu1,alu2,falu1,falu2,unknown"
+  (const_string "unknown"))
+
+;; Subscribe ls2_alu1_turn_enabled.
+(define_insn "ls2_alu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+  { gcc_unreachable (); }
+  [(set_attr "ls2_turn_type" "alu1")])
+
+(define_insn_reservation "ls2_alu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu1")
+  "ls2_alu1_turn_enabled")
+
+;; Subscribe ls2_alu2_turn_enabled.
+(define_insn "ls2_alu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+  { gcc_unreachable (); }
+  [(set_attr "ls2_turn_type" "alu2")])
+
+(define_insn_reservation "ls2_alu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "alu2")
+  "ls2_alu2_turn_enabled")
+
+;; Subscribe ls2_falu1_turn_enabled.
+(define_insn "ls2_falu1_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+  { gcc_unreachable (); }
+  [(set_attr "ls2_turn_type" "falu1")])
+
+(define_insn_reservation "ls2_falu1_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu1")
+  "ls2_falu1_turn_enabled")
+
+;; Subscribe ls2_falu2_turn_enabled.
+(define_insn "ls2_falu2_turn_enabled_insn"
+  [(unspec [(const_int 0)] UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN)]
+  "TUNE_LOONGSON_2EF"
+  { gcc_unreachable (); }
+  [(set_attr "ls2_turn_type" "falu2")])
+
+(define_insn_reservation "ls2_falu2_turn_enabled" 0
+  (eq_attr "ls2_turn_type" "falu2")
+  "ls2_falu2_turn_enabled")
+
+;; Automaton for memory operations.
+(define_automaton "ls2_mem")
+
+;; Memory unit.
+(define_query_cpu_unit "ls2_mem" "ls2_mem")
+
+;; Reservation for integer instructions.
+(define_insn_reservation "ls2_alu" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "arith,condmove,const,logical,mfhilo,move,
+                        mthilo,nop,shift,signext,slt"))
+  "ls2_alu")
+
+;; Reservation for branch instructions.
+(define_insn_reservation "ls2_branch" 2
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "branch,jump,call,trap"))
+  "ls2_alu1")
+
+;; Reservation for integer multiplication instructions.
+(define_insn_reservation "ls2_imult" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "imul,imul3"))
+  "ls2_alu2,ls2_alu2_core")
+
+;; Reservation for integer division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take 2-38 cycles.
+(define_insn_reservation "ls2_idiv" 20
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "idiv"))
+  "ls2_alu2,ls2_alu2_core*18")
+
+;; Reservation for memory load instructions.
+(define_insn_reservation "ls2_load" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "load,fpload,mfc,mtc"))
+  "ls2_mem")
+
+;; Reservation for memory store instructions.
+;; We assume stores do not alias with later dependent loads,
+;; so we set the latency to zero.
+(define_insn_reservation "ls2_store" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "store,fpstore"))
+  "ls2_mem")
+
+;; Reservation for floating-point instructions of latency 3.
+(define_insn_reservation "ls2_fp3" 3
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fabs,fneg,fcmp,fmove"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions of latency 5.
+(define_insn_reservation "ls2_fp5" 5
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fcvt"))
+  "ls2_falu1")
+
+;; Reservation for floating-point instructions that can go
+;; to either of FALU1/2 units.
+(define_insn_reservation "ls2_falu" 7
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fadd,fmul,fmadd"))
+  "ls2_falu")
+
+;; Reservation for floating-point division / remainder instructions.
+;; These instructions use the SRT algorithm and hence take a variable
+;; number of cycles:
+;; div.s takes 5-11 cycles
+;; div.d takes 5-18 cycles
+(define_insn_reservation "ls2_fdiv" 9
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fdiv"))
+  "ls2_falu2,ls2_falu2_core*7")
+
+;; Reservation for floating-point sqrt instructions.
+;; These instructions use the SRT algorithm and hence take a variable
+;; number of cycles:
+;; sqrt.s takes 5-17 cycles
+;; sqrt.d takes 5-32 cycles
+(define_insn_reservation "ls2_fsqrt" 15
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "fsqrt"))
+  "ls2_falu2,ls2_falu2_core*13")
+
+;; Two consecutive ALU instructions.
+(define_insn_reservation "ls2_multi" 4
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "multi"))
+  "(ls2_alu1,ls2_alu2_core)|(ls2_alu2,ls2_alu1_core)")
+
+(define_insn_reservation "ls2_ghost" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "ghost"))
+  "nothing")
+
+;; Reservation for everything else.  Normally, this reservation
+;; will only be used to handle cases like compiling for a
+;; non-Loongson CPU with -mtune=loongson2?.
+;;
+;; !!! It is not good to depend on the DFA checking reservations
+;; in the same order as they appear in the file, but it seems to
+;; work for the time being.  Anyway, we already rely on this DFA
+;; property heavily in generic.md.
+(define_insn_reservation "ls2_unknown" 1
+  (eq_attr "cpu" "loongson_2e,loongson_2f")
+  "ls2_alu1_core+ls2_alu2_core+ls2_falu1_core+ls2_falu2_core+ls2_mem")
--- gcc/config/mips/mips-ftypes.def	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/mips-ftypes.def	(/local/gcc-4)	(revision 525)
@@ -66,6 +66,24 @@ DEF_MIPS_FTYPE (1, (SF, SF))
 DEF_MIPS_FTYPE (2, (SF, SF, SF))
 DEF_MIPS_FTYPE (1, (SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (UDI, UDI, UDI))
+DEF_MIPS_FTYPE (2, (UDI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UQI))
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV4HI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV8QI, UV8QI))
+
+DEF_MIPS_FTYPE (2, (UV8QI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV8QI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV8QI, UV8QI, UV8QI))
+
 DEF_MIPS_FTYPE (1, (V2HI, SI))
 DEF_MIPS_FTYPE (2, (V2HI, SI, SI))
 DEF_MIPS_FTYPE (3, (V2HI, SI, SI, SI))
@@ -81,12 +99,27 @@ DEF_MIPS_FTYPE (2, (V2SF, V2SF, V2SF))
 DEF_MIPS_FTYPE (3, (V2SF, V2SF, V2SF, INT))
 DEF_MIPS_FTYPE (4, (V2SF, V2SF, V2SF, V2SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, UQI))
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V2SI, V4HI, V4HI))
+
+DEF_MIPS_FTYPE (2, (V4HI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, USI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, V4HI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, USI))
+
 DEF_MIPS_FTYPE (1, (V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V2HI, V2HI))
 DEF_MIPS_FTYPE (1, (V4QI, V4QI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, V4QI))
 
+DEF_MIPS_FTYPE (2, (V8QI, V4HI, V4HI))
+DEF_MIPS_FTYPE (1, (V8QI, V8QI))
+DEF_MIPS_FTYPE (2, (V8QI, V8QI, V8QI))
+
 DEF_MIPS_FTYPE (2, (VOID, SI, SI))
 DEF_MIPS_FTYPE (2, (VOID, V2HI, V2HI))
 DEF_MIPS_FTYPE (2, (VOID, V4QI, V4QI))
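
For reference, a hedged sketch of how one of the new signatures is used
(the builtin and wrapper names are taken from the loongson.h header later
in this patch):

/* DEF_MIPS_FTYPE (2, (UDI, UDI, UDI)) covers builtins that take two
   64-bit unsigned operands and return one, such as the pandn builtin
   wrapped by pandn_ud () in loongson.h.  */
#include <stdint.h>

uint64_t
andn64 (uint64_t s, uint64_t t)
{
  return __builtin_loongson_pandn_ud (s, t);  /* ~s & t, one pandn insn.  */
}
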
--- gcc/config/mips/mips-ps-3d.md	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/mips-ps-3d.md	(/local/gcc-4)	(revision 525)
@@ -25,7 +25,7 @@
 			  (const_int 0)])
 	 (match_operand:V2SF 2 "register_operand" "f,0")
 	 (match_operand:V2SF 3 "register_operand" "0,f")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
   "@
     mov%T4.ps\t%0,%2,%1
     mov%t4.ps\t%0,%3,%1"
@@ -38,7 +38,7 @@
 		      (match_operand:V2SF 2 "register_operand" "0,f")
 		      (match_operand:CCV2 3 "register_operand" "z,z")]
 		     UNSPEC_MOVE_TF_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
   "@
     movt.ps\t%0,%1,%3
     movf.ps\t%0,%2,%3"
@@ -51,7 +51,7 @@
 	(if_then_else:V2SF (match_dup 5)
 			   (match_operand:V2SF 2 "register_operand")
 			   (match_operand:V2SF 3 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
 {
   /* We can only support MOVN.PS and MOVZ.PS.
      NOTE: MOVT.PS and MOVF.PS have different semantics from MOVN.PS and 
@@ -72,7 +72,7 @@
 	 (match_operand:V2SF 1 "register_operand" "f")
 	 (match_operand:V2SF 2 "register_operand" "f")
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
   "pul.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -86,7 +86,7 @@
 			  (parallel [(const_int 1)
 				     (const_int 0)]))
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
   "puu.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -100,7 +100,7 @@
 				     (const_int 0)]))
 	 (match_operand:V2SF 2 "register_operand" "f")
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
   "pll.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -116,7 +116,7 @@
 			  (parallel [(const_int 1)
 				     (const_int 0)]))
 	 (const_int 2)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
   "plu.ps\t%0,%1,%2"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -125,7 +125,7 @@
 (define_expand "vec_initv2sf"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
 {
   rtx op0 = force_reg (SFmode, XVECEXP (operands[1], 0, 0));
   rtx op1 = force_reg (SFmode, XVECEXP (operands[1], 0, 1));
@@ -138,7 +138,7 @@
 	(vec_concat:V2SF
 	 (match_operand:SF 1 "register_operand" "f")
 	 (match_operand:SF 2 "register_operand" "f")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
 {
   if (BYTES_BIG_ENDIAN)
     return "cvt.ps.s\t%0,%1,%2";
@@ -157,7 +157,7 @@
 	(vec_select:SF (match_operand:V2SF 1 "register_operand" "f")
 		       (parallel
 			[(match_operand 2 "const_0_or_1_operand" "")])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
 {
   if (INTVAL (operands[2]) == !BYTES_BIG_ENDIAN)
     return "cvt.s.pu\t%0,%1";
@@ -174,7 +174,7 @@
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:SF 1 "register_operand")
    (match_operand 2 "const_0_or_1_operand")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_PXX_PS"
 {
   rtx temp;
 
@@ -194,7 +194,7 @@
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:SF 1 "register_operand")
    (match_operand:SF 2 "register_operand")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
 {
   if (BYTES_BIG_ENDIAN)
     emit_insn (gen_vec_initv2sf_internal (operands[0], operands[1],
@@ -210,7 +210,7 @@
   [(set (match_operand:SF 0 "register_operand")
 	(vec_select:SF (match_operand:V2SF 1 "register_operand")
 		       (parallel [(match_dup 2)])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
   { operands[2] = GEN_INT (BYTES_BIG_ENDIAN); })
 
 ; cvt.s.pu - Floating Point Convert Pair Upper to Single Floating Point
@@ -218,7 +218,7 @@
   [(set (match_operand:SF 0 "register_operand")
 	(vec_select:SF (match_operand:V2SF 1 "register_operand")
 		       (parallel [(match_dup 2)])))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
   { operands[2] = GEN_INT (!BYTES_BIG_ENDIAN); })
 
 ; alnv.ps - Floating Point Align Variable
@@ -228,7 +228,7 @@
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand:SI 3 "register_operand" "d")]
 		     UNSPEC_ALNV_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_ALNV_PS"
   "alnv.ps\t%0,%1,%2,%3"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
@@ -239,7 +239,7 @@
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")
 		      (match_operand:V2SF 2 "register_operand" "f")]
 		     UNSPEC_ADDR_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_ADDR_PS"
   "addr.ps\t%0,%1,%2"
   [(set_attr "type" "fadd")
    (set_attr "mode" "SF")])
@@ -249,7 +249,7 @@
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_CVT_PW_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
   "cvt.pw.ps\t%0,%1"
   [(set_attr "type" "fcvt")
    (set_attr "mode" "SF")])
@@ -259,7 +259,7 @@
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_CVT_PS_PW))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CVT_PS"
   "cvt.ps.pw\t%0,%1"
   [(set_attr "type" "fcvt")
    (set_attr "mode" "SF")])
@@ -270,7 +270,7 @@
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")
 		      (match_operand:V2SF 2 "register_operand" "f")]
 		     UNSPEC_MULR_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MULR_PS"
   "mulr.ps\t%0,%1,%2"
   [(set_attr "type" "fmul")
    (set_attr "mode" "SF")])
@@ -280,7 +280,7 @@
   [(set (match_operand:V2SF 0 "register_operand")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand")]
 		     UNSPEC_ABS_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_ABS_PS"
 {
   /* If we can ignore NaNs, this operation is equivalent to the
      rtl ABS code.  */
@@ -295,7 +295,7 @@
   [(set (match_operand:V2SF 0 "register_operand" "=f")
 	(unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")]
 		     UNSPEC_ABS_PS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_ABS_PS"
   "abs.ps\t%0,%1"
   [(set_attr "type" "fabs")
    (set_attr "mode" "SF")])
@@ -310,7 +310,7 @@
 		    (match_operand:SCALARF 2 "register_operand" "f")
 		    (match_operand 3 "const_int_operand" "")]
 		   UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CABS_PS"
   "cabs.%Y3.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -328,7 +328,7 @@
 		      (match_operand:V2SF 4 "register_operand" "f")
 		      (match_operand 5 "const_int_operand" "")]
 		     UNSPEC_C))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_C_COND_4S"
   "#"
   "&& reload_completed"
   [(set (match_dup 6)
@@ -357,7 +357,7 @@
 		      (match_operand:V2SF 4 "register_operand" "f")
 		      (match_operand 5 "const_int_operand" "")]
 		     UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CABS_PS"
   "#"
   "&& reload_completed"
   [(set (match_dup 6)
@@ -389,7 +389,7 @@
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand 3 "const_int_operand" "")]
 		     UNSPEC_C))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_C_COND_PS"
   "c.%Y3.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -400,7 +400,7 @@
 		      (match_operand:V2SF 2 "register_operand" "f")
 		      (match_operand 3 "const_int_operand" "")]
 		     UNSPEC_CABS))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_CABS_PS"
   "cabs.%Y3.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -416,7 +416,7 @@
 	   [(fcond (match_operand:V2SF 1 "register_operand" "f")
 		   (match_operand:V2SF 2 "register_operand" "f"))]
 	   UNSPEC_SCC))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_SCC_PS"
   "c.<fcond>.ps\t%0,%1,%2"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -427,7 +427,7 @@
 	   [(swapped_fcond (match_operand:V2SF 1 "register_operand" "f")
 			   (match_operand:V2SF 2 "register_operand" "f"))]
 	   UNSPEC_SCC))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_SCC_PS"
   "c.<swapped_fcond>.ps\t%0,%2,%1"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FPSW")])
@@ -443,7 +443,7 @@
 			  (const_int 0))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_BC1ANY_PS"
   "%*bc1any4t\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -455,7 +455,7 @@
 			  (const_int -1))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_BC1ANY_PS"
   "%*bc1any4f\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -467,7 +467,7 @@
 			  (const_int 0))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_BC1ANY_PS"
   "%*bc1any2t\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -479,7 +479,7 @@
 			  (const_int -1))
 		      (label_ref (match_operand 1 "" ""))
 		      (pc)))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_BC1ANY_PS"
   "%*bc1any2f\t%0,%1%/"
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
@@ -545,7 +545,7 @@
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
 		     UNSPEC_RSQRT1))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_RSQRT_PS"
   "rsqrt1.<fmt>\t%0,%1"
   [(set_attr "type" "frsqrt1")
    (set_attr "mode" "<UNITMODE>")])
@@ -555,7 +555,7 @@
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
 		      (match_operand:ANYF 2 "register_operand" "f")]
 		     UNSPEC_RSQRT2))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_RSQRT_PS"
   "rsqrt2.<fmt>\t%0,%1,%2"
   [(set_attr "type" "frsqrt2")
    (set_attr "mode" "<UNITMODE>")])
@@ -564,7 +564,7 @@
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
 		     UNSPEC_RECIP1))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_RSQRT_PS"
   "recip1.<fmt>\t%0,%1"
   [(set_attr "type" "frdiv1")
    (set_attr "mode" "<UNITMODE>")])
@@ -574,7 +574,7 @@
 	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
 		      (match_operand:ANYF 2 "register_operand" "f")]
 		     UNSPEC_RECIP2))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_RSQRT_PS"
   "recip2.<fmt>\t%0,%1,%2"
   [(set_attr "type" "frdiv2")
    (set_attr "mode" "<UNITMODE>")])
@@ -587,7 +587,7 @@
 	     (match_operand:V2SF 5 "register_operand")])
 	  (match_operand:V2SF 1 "register_operand")
 	  (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 GET_CODE (operands[3]), operands[4], operands[5]);
@@ -598,7 +598,7 @@
   [(set (match_operand:V2SF 0 "register_operand")
 	(smin:V2SF (match_operand:V2SF 1 "register_operand")
 		   (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 LE, operands[1], operands[2]);
@@ -609,7 +609,7 @@
   [(set (match_operand:V2SF 0 "register_operand")
 	(smax:V2SF (match_operand:V2SF 1 "register_operand")
 		   (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
+  "TARGET_HARD_FLOAT && ISA_HAS_MOVCC_PS"
 {
   mips_expand_vcondv2sf (operands[0], operands[1], operands[2],
 			 LE, operands[2], operands[1]);
--- gcc/config/mips/mips.md	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/mips.md	(/local/gcc-4)	(revision 525)
@@ -215,6 +215,36 @@
    (UNSPEC_DPAQX_SA_W_PH	446)
    (UNSPEC_DPSQX_S_W_PH		447)
    (UNSPEC_DPSQX_SA_W_PH	448)
+
+   ;; ST Microelectronics Loongson-2E/2F.
+   (UNSPEC_LOONGSON_PAVG	500)
+   (UNSPEC_LOONGSON_PCMPEQ	501)
+   (UNSPEC_LOONGSON_PCMPGT	502)
+   (UNSPEC_LOONGSON_PEXTR	503)
+   (UNSPEC_LOONGSON_PINSR_0	504)
+   (UNSPEC_LOONGSON_PINSR_1	505)
+   (UNSPEC_LOONGSON_PINSR_2	506)
+   (UNSPEC_LOONGSON_PINSR_3	507)
+   (UNSPEC_LOONGSON_PMADD	508)
+   (UNSPEC_LOONGSON_PMOVMSK	509)
+   (UNSPEC_LOONGSON_PMULHU	510)
+   (UNSPEC_LOONGSON_PMULH	511)
+   (UNSPEC_LOONGSON_PMULL	512)
+   (UNSPEC_LOONGSON_PMULU	513)
+   (UNSPEC_LOONGSON_PASUBUB	514)
+   (UNSPEC_LOONGSON_BIADD	515)
+   (UNSPEC_LOONGSON_PSADBH	516)
+   (UNSPEC_LOONGSON_PSHUFH	517)
+   (UNSPEC_LOONGSON_PUNPCKH	518)
+   (UNSPEC_LOONGSON_PUNPCKL	519)
+   (UNSPEC_LOONGSON_PADDD	520)
+   (UNSPEC_LOONGSON_PSUBD	521)
+
+   ;; Used in loongson2ef.md
+   (UNSPEC_LOONGSON_ALU1_TURN_ENABLED_INSN   530)
+   (UNSPEC_LOONGSON_ALU2_TURN_ENABLED_INSN   531)
+   (UNSPEC_LOONGSON_FALU1_TURN_ENABLED_INSN  532)
+   (UNSPEC_LOONGSON_FALU2_TURN_ENABLED_INSN  533)
   ]
 )
 
@@ -417,7 +447,7 @@
 ;; Attribute describing the processor.  This attribute must match exactly
 ;; with the processor_type enumeration in mips.h.
 (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson2e,loongson2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
   (const (symbol_ref "mips_tune")))
 
 ;; The type of hardware hazard associated with this instruction.
@@ -496,11 +526,16 @@
 
 ;; This mode iterator allows :MOVECC to be used anywhere that a
 ;; conditional-move-type condition is needed.
-(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT") (CC "TARGET_HARD_FLOAT")])
+(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT")
+                              (CC "TARGET_HARD_FLOAT && !TARGET_LOONGSON_2EF")])
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
-  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
+  [DI DF
+   (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")
+   (V4HI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")
+   (V8QI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")])
 
 ;; 128-bit modes for which we provide move patterns on 64-bit targets.
 (define_mode_iterator MOVE128 [TI TF])
@@ -527,6 +562,9 @@
   [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (V2SF "!TARGET_64BIT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
+   (V4HI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
+   (V8QI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
    (TF "TARGET_64BIT && TARGET_FLOAT64")])
 
 ;; In GPR templates, a string like "<d>subu" will expand to "subu" in the
@@ -579,7 +617,9 @@
 
 ;; This attribute gives the integer mode that has half the size of
 ;; the controlling mode.
-(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") (TF "DI")])
+(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI")
+			    (V2SI "SI") (V4HI "SI") (V8QI "SI")
+			    (TF "DI")])
 
 ;; This attribute works around the early SB-1 rev2 core "F2" erratum:
 ;;
@@ -760,6 +800,7 @@
 (include "sb1.md")
 (include "sr71k.md")
 (include "xlr.md")
+(include "loongson2ef.md")
 (include "generic.md")
 \f
 ;;
@@ -1864,33 +1905,53 @@
 
 ;; Floating point multiply accumulate instructions.
 
-(define_insn "*madd<mode>"
+(define_insn "*madd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "madd.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*msub<mode>"
+(define_insn "*madd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD"
+  "madd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*msub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			       (match_operand:ANYF 2 "register_operand" "f"))
 		    (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "msub.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>"
+(define_insn "*msub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			       (match_operand:ANYF 2 "register_operand" "f"))
+		    (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD"
+  "msub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (plus:ANYF
 		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1898,13 +1959,27 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>_fastmath"
+(define_insn "*nmadd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (plus:ANYF
+		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
 		    (match_operand:ANYF 2 "register_operand" "f"))
 	 (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1912,13 +1987,27 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>"
+(define_insn "*nmadd3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
+		    (match_operand:ANYF 2 "register_operand" "f"))
+	 (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (minus:ANYF
 		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 			      (match_operand:ANYF 3 "register_operand" "f"))
 		   (match_operand:ANYF 1 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1926,20 +2015,48 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>_fastmath"
+(define_insn "*nmsub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (minus:ANYF
+		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+			      (match_operand:ANYF 3 "register_operand" "f"))
+		   (match_operand:ANYF 1 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (match_operand:ANYF 1 "register_operand" "f")
 	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 		    (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
   "nmsub.<fmt>\t%0,%1,%2,%3"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
-\f
+
+(define_insn "*nmsub3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (match_operand:ANYF 1 "register_operand" "f")
+	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+		    (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
 ;;
 ;;  ....................
 ;;
@@ -6299,7 +6416,7 @@
 		 (const_int 0)])
 	 (match_operand:SCALARF 2 "register_operand" "f,0")
 	 (match_operand:SCALARF 3 "register_operand" "0,f")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
   "@
     mov%T4.<fmt>\t%0,%2,%1
     mov%t4.<fmt>\t%0,%3,%1"
@@ -6326,7 +6443,7 @@
 	(if_then_else:SCALARF (match_dup 5)
 			      (match_operand:SCALARF 2 "register_operand")
 			      (match_operand:SCALARF 3 "register_operand")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
 {
   mips_expand_conditional_move (operands);
   DONE;
@@ -6435,3 +6552,6 @@
 
 ; MIPS fixed-point instructions.
 (include "mips-fixed.md")
+
+; ST Microelectronics Loongson-2E/2F-specific patterns.
+(include "loongson.md")
--- gcc/config/mips/mips-protos.h	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/mips-protos.h	(/local/gcc-4)	(revision 525)
@@ -303,4 +303,6 @@ union mips_gen_fn_ptrs
 extern void mips_expand_atomic_qihi (union mips_gen_fn_ptrs,
 				     rtx, rtx, rtx, rtx);
 
+extern void mips_expand_vector_init (rtx, rtx);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
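
For completeness, a hedged usage sketch of the intrinsics header added
below (illustration only; compile with -march=loongson2e or
-march=loongson2f):

#include <loongson.h>

/* Sum of absolute differences over eight bytes.  As documented below,
   psadbh () expands into two instructions: pasubub followed by biadd.  */
uint16x4_t
sad8 (uint8x8_t a, uint8x8_t b)
{
  return psadbh (a, b);
}
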
--- gcc/config/mips/loongson.h	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/loongson.h	(/local/gcc-4)	(revision 525)
@@ -0,0 +1,693 @@
+/* Intrinsics for ST Microelectronics Loongson-2E/2F SIMD operations.
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 2, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to the
+   Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+#ifndef _GCC_LOONGSON_H
+#define _GCC_LOONGSON_H
+
+#if !defined(__mips_loongson_vector_rev)
+# error "You must select -march=loongson2e or -march=loongson2f to use loongson.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Vectors of unsigned bytes, halfwords and words.  */
+typedef uint8_t uint8x8_t __attribute__((vector_size (8)));
+typedef uint16_t uint16x4_t __attribute__((vector_size (8)));
+typedef uint32_t uint32x2_t __attribute__((vector_size (8)));
+
+/* Vectors of signed bytes, halfwords and words.  */
+typedef int8_t int8x8_t __attribute__((vector_size (8)));
+typedef int16_t int16x4_t __attribute__((vector_size (8)));
+typedef int32_t int32x2_t __attribute__((vector_size (8)));
+
+/* SIMD intrinsics.
+   Unless otherwise noted, calls to the functions below will expand into
+   precisely one machine instruction, modulo any moves required to
+   satisfy register allocation constraints.  */
+
+/* Pack with signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+packsswh (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_packsswh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+packsshb (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_packsshb (s, t);
+}
+
+/* Pack with unsigned saturation.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+packushb (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_packushb (s, t);
+}
+
+/* Vector addition, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+paddw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_paddw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+paddw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_paddw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddb_s (s, t);
+}
+
+/* Addition of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+paddd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_paddd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+paddd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_paddd_s (s, t);
+}
+
+/* Vector addition, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddsb (s, t);
+}
+
+/* Vector addition, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddusb (s, t);
+}
+
+/* Logical AND NOT.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+pandn_ud (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_pandn_ud (s, t);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pandn_uw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pandn_uw (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pandn_uh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pandn_uh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pandn_ub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pandn_ub (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pandn_sd (int64_t s, int64_t t)
+{
+  return __builtin_loongson_pandn_sd (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pandn_sw (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pandn_sw (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pandn_sh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pandn_sh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pandn_sb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pandn_sb (s, t);
+}
+
+/* Average.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pavgh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pavgh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pavgb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pavgb (s, t);
+}
+
+/* Equality test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_s (s, t);
+}
+
+/* Greater-than test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpgth_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpgth_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_s (s, t);
+}
+
+/* Extract halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pextrh_u (uint16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_u (s, field);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pextrh_s (int16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_s (s, field);
+}
+
+/* Insert halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_u (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_s (s, t);
+}
+
+/* Multiply and add.  */
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pmaddhw (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaddhw (s, t);
+}
+
+/* Maximum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmaxsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaxsh (s, t);
+}
+
+/* Maximum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmaxub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pmaxub (s, t);
+}
+
+/* Minimum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pminsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pminsh (s, t);
+}
+
+/* Minimum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pminub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pminub (s, t);
+}
+
+/* Move byte mask.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmovmskb_u (uint8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_u (s);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pmovmskb_s (int8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_s (s);
+}
+
+/* Multiply unsigned integers and store high result.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pmulhuh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pmulhuh (s, t);
+}
+
+/* Multiply signed integers and store high result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmulhh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmulhh (s, t);
+}
+
+/* Multiply signed integers and store low result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmullh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmullh (s, t);
+}
+
+/* Multiply unsigned word integers.  */
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pmuluw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pmuluw (s, t);
+}
+
+/* Absolute difference.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pasubub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pasubub (s, t);
+}
+
+/* Sum of unsigned byte integers.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+biadd (uint8x8_t s)
+{
+  return __builtin_loongson_biadd (s);
+}
+
+/* Sum of absolute differences.
+   Note that this intrinsic expands into two machine instructions:
+   PASUBUB followed by BIADD.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psadbh (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psadbh (s, t);
+}
+
+/* Shuffle halfwords.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_u (dest, s, order);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_s (dest, s, order);
+}
+
+/* Shift left logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psllh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psllh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psllw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psllw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_s (s, amount);
+}
+
+/* Shift right logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrlh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrlh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psrlw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psrlw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_s (s, amount);
+}
+
+/* Shift right arithmetic.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrah_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrah_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psraw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psraw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_s (s, amount);
+}
+
+/* Vector subtraction, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psubw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_psubw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psubw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_psubw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubb_s (s, t);
+}
+
+/* Subtraction of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+psubd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_psubd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+psubd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_psubd_s (s, t);
+}
+
+/* Vector subtraction, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubsb (s, t);
+}
+
+/* Vector subtraction, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubusb (s, t);
+}
+
+/* Unpack high data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpckhwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpckhhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpckhbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpckhwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpckhhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpckhbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_s (s, t);
+}
+
+/* Unpack low data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpcklwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpcklhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpcklbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpcklwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpcklhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpcklbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_s (s, t);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
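
As a usage sketch for the intrinsics above (assuming the new header is
installed as <loongson.h>, that the int16x4_t/uint8x8_t typedefs earlier
in it are GCC vector types, and that code is built with -march=loongson2e
or -march=loongson2f), element-wise clamping and absolute differences
might look like this:

#include <loongson.h>

/* Clamp each of four signed 16-bit samples to [lo, hi] using the
   element-wise min/max intrinsics declared above.  */
int16x4_t
clamp4 (int16x4_t x, int16x4_t lo, int16x4_t hi)
{
  return pmaxsh (lo, pminsh (x, hi));
}

/* Element-wise absolute difference of two rows of eight unsigned
   bytes; psadbh would additionally reduce the eight differences to a
   single 16-bit sum (PASUBUB followed by BIADD, as noted above).  */
uint8x8_t
absdiff8 (uint8x8_t s, uint8x8_t t)
{
  return pasubub (s, t);
}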
--- gcc/config/mips/mips.c	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/mips.c	(/local/gcc-4)	(revision 525)
@@ -3286,7 +3286,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case MINUS:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && !HONOR_SIGNED_ZEROS (mode))
@@ -3337,7 +3337,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case NEG:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && HONOR_SIGNED_ZEROS (mode))
@@ -3532,6 +3532,12 @@ mips_split_doubleword_move (rtx dest, rt
 	emit_insn (gen_move_doubleword_fprdf (dest, src));
       else if (!TARGET_64BIT && GET_MODE (dest) == V2SFmode)
 	emit_insn (gen_move_doubleword_fprv2sf (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V2SImode)
+	emit_insn (gen_move_doubleword_fprv2si (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V4HImode)
+	emit_insn (gen_move_doubleword_fprv4hi (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V8QImode)
+	emit_insn (gen_move_doubleword_fprv8qi (dest, src));
       else if (TARGET_64BIT && GET_MODE (dest) == TFmode)
 	emit_insn (gen_move_doubleword_fprtf (dest, src));
       else
@@ -8960,6 +8966,14 @@ mips_hard_regno_mode_ok_p (unsigned int 
       if (mode == TFmode && ISA_HAS_8CC)
 	return true;
 
+      /* Allow 64-bit vector modes for Loongson-2E/2F.  */
+      if (TARGET_LOONGSON_VECTORS
+	  && (mode == V2SImode
+	      || mode == V4HImode
+	      || mode == V8QImode
+	      || mode == DImode))
+	return true;
+
       if (class == MODE_FLOAT
 	  || class == MODE_COMPLEX_FLOAT
 	  || class == MODE_VECTOR_FLOAT)
@@ -9323,6 +9337,11 @@ mips_vector_mode_supported_p (enum machi
     case V4UQQmode:
       return TARGET_DSP;
 
+    case V2SImode:
+    case V4HImode:
+    case V8QImode:
+      return TARGET_LOONGSON_VECTORS;
+
     default:
       return false;
     }
@@ -9759,6 +9778,41 @@ mips_store_data_bypass_p (rtx out_insn, 
   return !store_data_bypass_p (out_insn, in_insn);
 }
 \f
+
+/* Variables and flags used in scheduler hooks when tuning for
+   Loongson 2E/2F.  */
+static struct
+{
+  /* Variables to support Loongson 2E/2F round-robin [F]ALU1/2 dispatch
+     strategy.  */
+
+  /* If true, then next ALU1/2 instruction will go to ALU1.  */
+  bool alu1_turn_p;
+
+  /* If true, then next FALU1/2 instruction will go to FALU1.  */
+  bool falu1_turn_p;
+
+  /* Codes to query if [f]alu{1,2}_core units are subscribed or not.  */
+  int alu1_core_unit_code;
+  int alu2_core_unit_code;
+  int falu1_core_unit_code;
+  int falu2_core_unit_code;
+
+  /* True if current cycle has a multi instruction.
+     This flag is used in mips_ls2_dfa_post_advance_cycle.  */
+  bool cycle_has_multi_p;
+
+  /* Instructions to subscribe ls2_[f]alu{1,2}_turn_enabled units.
+     These are used in mips_ls2_dfa_post_advance_cycle to initialize
+     DFA state.
+     E.g., when alu1_turn_enabled_insn is issued, it makes the next
+     ALU1/2 instruction go to ALU1.  */
+  rtx alu1_turn_enabled_insn;
+  rtx alu2_turn_enabled_insn;
+  rtx falu1_turn_enabled_insn;
+  rtx falu2_turn_enabled_insn;
+} mips_ls2;
+
 /* Implement TARGET_SCHED_ADJUST_COST.  We assume that anti and output
    dependencies have no cost, except on the 20Kc where output-dependence
    is treated like input-dependence.  */
@@ -9809,11 +9863,124 @@ mips_issue_rate (void)
 	 reach the theoretical max of 4.  */
       return 3;
 
+    case PROCESSOR_LOONGSON_2E:
+    case PROCESSOR_LOONGSON_2F:
+      return 4;
+
     default:
       return 1;
     }
 }
 
+/* Implement TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN hook for Loongson2.  */
+
+static void
+mips_ls2_init_dfa_post_cycle_insn (void)
+{
+  start_sequence ();
+  emit_insn (gen_ls2_alu1_turn_enabled_insn ());
+  mips_ls2.alu1_turn_enabled_insn = get_insns ();
+  end_sequence ();
+
+  start_sequence ();
+  emit_insn (gen_ls2_alu2_turn_enabled_insn ());
+  mips_ls2.alu2_turn_enabled_insn = get_insns ();
+  end_sequence ();
+
+  start_sequence ();
+  emit_insn (gen_ls2_falu1_turn_enabled_insn ());
+  mips_ls2.falu1_turn_enabled_insn = get_insns ();
+  end_sequence ();
+
+  start_sequence ();
+  emit_insn (gen_ls2_falu2_turn_enabled_insn ());
+  mips_ls2.falu2_turn_enabled_insn = get_insns ();
+  end_sequence ();
+
+  mips_ls2.alu1_core_unit_code = get_cpu_unit_code ("ls2_alu1_core");
+  mips_ls2.alu2_core_unit_code = get_cpu_unit_code ("ls2_alu2_core");
+  mips_ls2.falu1_core_unit_code = get_cpu_unit_code ("ls2_falu1_core");
+  mips_ls2.falu2_core_unit_code = get_cpu_unit_code ("ls2_falu2_core");
+}
+
+/* Implement TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN hook.
+   Init data used in mips_dfa_post_advance_cycle.  */
+
+static void
+mips_init_dfa_post_cycle_insn (void)
+{
+  if (TUNE_LOONGSON_2EF)
+    mips_ls2_init_dfa_post_cycle_insn ();
+}
+
+/* Initialize STATE when scheduling for Loongson 2E/2F.
+   Support round-robin dispatch scheme by enabling only one of
+   ALU1/ALU2 and one of FALU1/FALU2 units for ALU1/2 and FALU1/2 instructions
+   respectively.  */
+
+static void
+mips_ls2_dfa_post_advance_cycle (state_t state)
+{
+  if (cpu_unit_reservation_p (state, mips_ls2.alu1_core_unit_code))
+    {
+      /* Though there are no non-pipelined ALU1 insns,
+	 we can get an instruction of type 'multi' before reload.  */
+      gcc_assert (mips_ls2.cycle_has_multi_p);
+      mips_ls2.alu1_turn_p = false;
+    }
+
+  mips_ls2.cycle_has_multi_p = false;
+
+  if (cpu_unit_reservation_p (state, mips_ls2.alu2_core_unit_code))
+    /* We have a non-pipelined alu instruction in the core,
+       adjust round-robin counter.  */
+    mips_ls2.alu1_turn_p = true;
+
+  if (mips_ls2.alu1_turn_p)
+    {
+      if (state_transition (state, mips_ls2.alu1_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (state_transition (state, mips_ls2.alu2_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+
+  if (cpu_unit_reservation_p (state, mips_ls2.falu1_core_unit_code))
+    {
+      /* There are no non-pipelined FALU1 insns.  */
+      gcc_unreachable ();
+      mips_ls2.falu1_turn_p = false;
+    }
+
+  if (cpu_unit_reservation_p (state, mips_ls2.falu2_core_unit_code))
+    /* We have a non-pipelined falu instruction in the core,
+       adjust round-robin counter.  */
+    mips_ls2.falu1_turn_p = true;
+
+  if (mips_ls2.falu1_turn_p)
+    {
+      if (state_transition (state, mips_ls2.falu1_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (state_transition (state, mips_ls2.falu2_turn_enabled_insn) >= 0)
+	gcc_unreachable ();
+    }
+}
+
+/* Implement TARGET_SCHED_DFA_POST_ADVANCE_CYCLE.
+   This hook is being called at the start of each cycle.  */
+
+static void
+mips_dfa_post_advance_cycle (void)
+{
+  if (TUNE_LOONGSON_2EF)
+    mips_ls2_dfa_post_advance_cycle (curr_state);
+}
+
 /* Implement TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD.  This should
    be as wide as the scheduling freedom in the DFA.  */
 
@@ -9824,6 +9991,9 @@ mips_multipass_dfa_lookahead (void)
   if (TUNE_SB1)
     return 4;
 
+  if (TUNE_LOONGSON_2EF)
+    return 4;
+
   return 0;
 }
 \f
@@ -10084,6 +10254,12 @@ mips_sched_init (FILE *file ATTRIBUTE_UN
   mips_macc_chains_last_hilo = 0;
   vr4130_last_insn = 0;
   mips_74k_agen_init (NULL_RTX);
+
+  /* When scheduling for Loongson2, branch instructions go to ALU1,
+     therefore a basic block is most likely to start with the round-robin
+     counter pointing to ALU2.  */
+  mips_ls2.alu1_turn_p = false;
+  mips_ls2.falu1_turn_p = true;
 }
 
 /* Implement TARGET_SCHED_REORDER and TARGET_SCHED_REORDER2.  */
@@ -10109,6 +10285,37 @@ mips_sched_reorder (FILE *file ATTRIBUTE
   return mips_issue_rate ();
 }
 
+/* Update round-robin counters for ALU1/2 and FALU1/2.  */
+
+static void
+mips_ls2_variable_issue (rtx insn)
+{
+  if (mips_ls2.alu1_turn_p)
+    {
+      if (cpu_unit_reservation_p (curr_state, mips_ls2.alu1_core_unit_code))
+	mips_ls2.alu1_turn_p = false;
+    }
+  else
+    {
+      if (cpu_unit_reservation_p (curr_state, mips_ls2.alu2_core_unit_code))
+	mips_ls2.alu1_turn_p = true;
+    }
+
+  if (mips_ls2.falu1_turn_p)
+    {
+      if (cpu_unit_reservation_p (curr_state, mips_ls2.falu1_core_unit_code))
+	mips_ls2.falu1_turn_p = false;
+    }
+  else
+    {
+      if (cpu_unit_reservation_p (curr_state, mips_ls2.falu2_core_unit_code))
+	mips_ls2.falu1_turn_p = true;
+    }
+
+  if (recog_memoized (insn) >= 0)
+    mips_ls2.cycle_has_multi_p |= (get_attr_type (insn) == TYPE_MULTI);
+}
+
 /* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
 
 static int
@@ -10124,7 +10331,20 @@ mips_variable_issue (FILE *file ATTRIBUT
       vr4130_last_insn = insn;
       if (TUNE_74K)
 	mips_74k_agen_init (insn);
+      else if (TUNE_LOONGSON_2EF)
+	mips_ls2_variable_issue (insn);
     }
+
+  if (recog_memoized (insn) >= 0)
+    /* Instructions of type 'multi' should all be split before
+       second scheduling pass.  */
+    {
+      bool multi_p;
+
+      multi_p = (get_attr_type (insn) == TYPE_MULTI);
+      gcc_assert (!multi_p || !reload_completed);
+    }
+
   return more;
 }
 \f
@@ -10146,6 +10366,23 @@ mips_prefetch_cookie (rtx write, rtx loc
   return GEN_INT (INTVAL (write) + 6);
 }
 \f
+/* Flags that indicate when a built-in function is available.
+
+   BUILTIN_AVAIL_NON_MIPS16
+	The function is available on the current target, but only
+	in non-MIPS16 mode.  */
+#define BUILTIN_AVAIL_NON_MIPS16 1
+
+/* Declare an availability predicate for built-in functions that
+   require non-MIPS16 mode and also require COND to be true.
+   NAME is the main part of the predicate's name.  */
+#define AVAIL_NON_MIPS16(NAME, COND)					\
+ static unsigned int							\
+ mips_builtin_avail_##NAME (void)					\
+ {									\
+   return (COND) ? BUILTIN_AVAIL_NON_MIPS16 : 0;			\
+ }
+
 /* This structure describes a single built-in function.  */
 struct mips_builtin_description {
   /* The code of the main .md file instruction.  See mips_builtin_type
@@ -10164,309 +10401,451 @@ struct mips_builtin_description {
   /* The function's prototype.  */
   enum mips_function_type function_type;
 
-  /* The target flags required for this function.  */
-  int target_flags;
+  /* Whether the function is available.  */
+  unsigned int (*avail) (void);
 };
 
-/* Define a MIPS_BUILTIN_DIRECT function for instruction CODE_FOR_mips_<INSN>.
-   FUNCTION_TYPE and TARGET_FLAGS are mips_builtin_description fields.  */
-#define DIRECT_BUILTIN(INSN, FUNCTION_TYPE, TARGET_FLAGS)		\
-  { CODE_FOR_mips_ ## INSN, 0, "__builtin_mips_" #INSN,			\
-    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, TARGET_FLAGS }
+AVAIL_NON_MIPS16 (paired_single, TARGET_PAIRED_SINGLE_FLOAT)
+AVAIL_NON_MIPS16 (sb1_paired_single, TARGET_SB1 && TARGET_PAIRED_SINGLE_FLOAT)
+AVAIL_NON_MIPS16 (paired_single_no_ls2,
+		  TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+AVAIL_NON_MIPS16 (mips3d, TARGET_MIPS3D)
+AVAIL_NON_MIPS16 (dsp, TARGET_DSP)
+AVAIL_NON_MIPS16 (dspr2, TARGET_DSPR2)
+AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
+AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
+AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_VECTORS)
+
+/* Construct a mips_builtin_description from the given arguments.
+
+   INSN is the name of the associated instruction pattern, without the
+   leading CODE_FOR_mips_.
+
+   CODE is the floating-point condition code associated with the
+   function.  It can be 'f' if the field is not applicable.
+
+   NAME is the name of the function itself, without the leading
+   "__builtin_mips_".
+
+   BUILTIN_TYPE and FUNCTION_TYPE are mips_builtin_description fields.
+
+   AVAIL is the name of the availability predicate, without the leading
+   mips_builtin_avail_.  */
+#define MIPS_BUILTIN(INSN, COND, NAME, BUILTIN_TYPE,			\
+		     FUNCTION_TYPE, AVAIL)				\
+  { CODE_FOR_mips_ ## INSN, MIPS_FP_COND_ ## COND,			\
+    "__builtin_mips_" NAME, BUILTIN_TYPE, FUNCTION_TYPE,		\
+    mips_builtin_avail_ ## AVAIL }
+
+/* Define __builtin_mips_<INSN>, which is a MIPS_BUILTIN_DIRECT function
+   mapped to instruction CODE_FOR_mips_<INSN>.  FUNCTION_TYPE and AVAIL
+   are as for MIPS_BUILTIN.  */
+#define DIRECT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL)			\
+  MIPS_BUILTIN (INSN, f, #INSN, MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, AVAIL)
 
 /* Define __builtin_mips_<INSN>_<COND>_{s,d} functions, both of which
-   require TARGET_FLAGS.  */
-#define CMP_SCALAR_BUILTINS(INSN, COND, TARGET_FLAGS)			\
-  { CODE_FOR_mips_ ## INSN ## _cond_s, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_" #INSN "_" #COND "_s",				\
-    MIPS_BUILTIN_CMP_SINGLE, MIPS_INT_FTYPE_SF_SF, TARGET_FLAGS },	\
-  { CODE_FOR_mips_ ## INSN ## _cond_d, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_" #INSN "_" #COND "_d",				\
-    MIPS_BUILTIN_CMP_SINGLE, MIPS_INT_FTYPE_DF_DF, TARGET_FLAGS }
+   are subject to mips_builtin_avail_<AVAIL>.  */
+#define CMP_SCALAR_BUILTINS(INSN, COND, AVAIL)				\
+  MIPS_BUILTIN (INSN ## _cond_s, COND, #INSN "_" #COND "_s",		\
+		MIPS_BUILTIN_CMP_SINGLE, MIPS_INT_FTYPE_SF_SF, AVAIL),	\
+  MIPS_BUILTIN (INSN ## _cond_d, COND, #INSN "_" #COND "_d",		\
+		MIPS_BUILTIN_CMP_SINGLE, MIPS_INT_FTYPE_DF_DF, AVAIL)
 
 /* Define __builtin_mips_{any,all,upper,lower}_<INSN>_<COND>_ps.
-   The lower and upper forms require TARGET_FLAGS while the any and all
-   forms require MASK_MIPS3D.  */
-#define CMP_PS_BUILTINS(INSN, COND, TARGET_FLAGS)			\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_any_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_CMP_ANY, MIPS_INT_FTYPE_V2SF_V2SF, MASK_MIPS3D },	\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_all_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_CMP_ALL, MIPS_INT_FTYPE_V2SF_V2SF, MASK_MIPS3D },	\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_lower_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_CMP_LOWER, MIPS_INT_FTYPE_V2SF_V2SF, TARGET_FLAGS },	\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_upper_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_CMP_UPPER, MIPS_INT_FTYPE_V2SF_V2SF, TARGET_FLAGS }
+   The lower and upper forms are subject to mips_builtin_avail_<AVAIL>
+   while the any and all forms are subject to mips_builtin_avail_mips3d.  */
+#define CMP_PS_BUILTINS(INSN, COND, AVAIL)				\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "any_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_CMP_ANY, MIPS_INT_FTYPE_V2SF_V2SF,		\
+		mips3d),						\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "all_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_CMP_ALL, MIPS_INT_FTYPE_V2SF_V2SF,		\
+		mips3d),						\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "lower_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_CMP_LOWER, MIPS_INT_FTYPE_V2SF_V2SF,	\
+		AVAIL),							\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "upper_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_CMP_UPPER, MIPS_INT_FTYPE_V2SF_V2SF,	\
+		AVAIL)
 
 /* Define __builtin_mips_{any,all}_<INSN>_<COND>_4s.  The functions
-   require MASK_MIPS3D.  */
+   are subject to mips_builtin_avail_mips3d.  */
 #define CMP_4S_BUILTINS(INSN, COND)					\
-  { CODE_FOR_mips_ ## INSN ## _cond_4s, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_any_" #INSN "_" #COND "_4s",			\
-    MIPS_BUILTIN_CMP_ANY, MIPS_INT_FTYPE_V2SF_V2SF_V2SF_V2SF,		\
-    MASK_MIPS3D },							\
-  { CODE_FOR_mips_ ## INSN ## _cond_4s, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_all_" #INSN "_" #COND "_4s",			\
-    MIPS_BUILTIN_CMP_ALL, MIPS_INT_FTYPE_V2SF_V2SF_V2SF_V2SF,		\
-    MASK_MIPS3D }
+  MIPS_BUILTIN (INSN ## _cond_4s, COND, "any_" #INSN "_" #COND "_4s",	\
+		MIPS_BUILTIN_CMP_ANY,					\
+		MIPS_INT_FTYPE_V2SF_V2SF_V2SF_V2SF, mips3d),		\
+  MIPS_BUILTIN (INSN ## _cond_4s, COND, "all_" #INSN "_" #COND "_4s",	\
+		MIPS_BUILTIN_CMP_ALL,					\
+		MIPS_INT_FTYPE_V2SF_V2SF_V2SF_V2SF, mips3d)
 
 /* Define __builtin_mips_mov{t,f}_<INSN>_<COND>_ps.  The comparison
-   instruction requires TARGET_FLAGS.  */
-#define MOVTF_BUILTINS(INSN, COND, TARGET_FLAGS)			\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_movt_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_MOVT, MIPS_V2SF_FTYPE_V2SF_V2SF_V2SF_V2SF,		\
-    TARGET_FLAGS },							\
-  { CODE_FOR_mips_ ## INSN ## _cond_ps, MIPS_FP_COND_ ## COND,		\
-    "__builtin_mips_movf_" #INSN "_" #COND "_ps",			\
-    MIPS_BUILTIN_MOVF, MIPS_V2SF_FTYPE_V2SF_V2SF_V2SF_V2SF,		\
-    TARGET_FLAGS }
+   instruction requires mips_builtin_avail_<AVAIL>.  */
+#define MOVTF_BUILTINS(INSN, COND, AVAIL)				\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "movt_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_MOVT, MIPS_V2SF_FTYPE_V2SF_V2SF_V2SF_V2SF,	\
+		AVAIL),							\
+  MIPS_BUILTIN (INSN ## _cond_ps, COND, "movf_" #INSN "_" #COND "_ps",	\
+		MIPS_BUILTIN_MOVF, MIPS_V2SF_FTYPE_V2SF_V2SF_V2SF_V2SF,	\
+		AVAIL)
 
 /* Define all the built-in functions related to C.cond.fmt condition COND.  */
 #define CMP_BUILTINS(COND)						\
-  MOVTF_BUILTINS (c, COND, MASK_PAIRED_SINGLE_FLOAT),			\
-  MOVTF_BUILTINS (cabs, COND, MASK_MIPS3D),				\
-  CMP_SCALAR_BUILTINS (cabs, COND, MASK_MIPS3D),			\
-  CMP_PS_BUILTINS (c, COND, MASK_PAIRED_SINGLE_FLOAT),			\
-  CMP_PS_BUILTINS (cabs, COND, MASK_MIPS3D),				\
+  MOVTF_BUILTINS (c, COND, paired_single_no_ls2),			\
+  MOVTF_BUILTINS (cabs, COND, mips3d),					\
+  CMP_SCALAR_BUILTINS (cabs, COND, mips3d),				\
+  CMP_PS_BUILTINS (c, COND, paired_single),				\
+  CMP_PS_BUILTINS (cabs, COND, mips3d),					\
   CMP_4S_BUILTINS (c, COND),						\
   CMP_4S_BUILTINS (cabs, COND)
 
-static const struct mips_builtin_description mips_ps_bdesc[] = {
-  DIRECT_BUILTIN (pll_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (plu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (puu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (cvt_ps_s, MIPS_V2SF_FTYPE_SF_SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (cvt_s_pl, MIPS_SF_FTYPE_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (cvt_s_pu, MIPS_SF_FTYPE_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (abs_ps, MIPS_V2SF_FTYPE_V2SF, MASK_PAIRED_SINGLE_FLOAT),
-
-  DIRECT_BUILTIN (alnv_ps, MIPS_V2SF_FTYPE_V2SF_V2SF_INT,
-		  MASK_PAIRED_SINGLE_FLOAT),
-  DIRECT_BUILTIN (addr_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (mulr_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (cvt_pw_ps, MIPS_V2SF_FTYPE_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (cvt_ps_pw, MIPS_V2SF_FTYPE_V2SF, MASK_MIPS3D),
-
-  DIRECT_BUILTIN (recip1_s, MIPS_SF_FTYPE_SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip1_d, MIPS_DF_FTYPE_DF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip1_ps, MIPS_V2SF_FTYPE_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip2_s, MIPS_SF_FTYPE_SF_SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip2_d, MIPS_DF_FTYPE_DF_DF, MASK_MIPS3D),
-  DIRECT_BUILTIN (recip2_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_MIPS3D),
-
-  DIRECT_BUILTIN (rsqrt1_s, MIPS_SF_FTYPE_SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt1_d, MIPS_DF_FTYPE_DF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt1_ps, MIPS_V2SF_FTYPE_V2SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt2_s, MIPS_SF_FTYPE_SF_SF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt2_d, MIPS_DF_FTYPE_DF_DF, MASK_MIPS3D),
-  DIRECT_BUILTIN (rsqrt2_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, MASK_MIPS3D),
-
-  MIPS_FP_CONDITIONS (CMP_BUILTINS)
-};
+/* Define __builtin_mips_<INSN>, which is a MIPS_BUILTIN_DIRECT_NO_TARGET
+   function mapped to instruction CODE_FOR_mips_<INSN>.  FUNCTION_TYPE
+   and AVAIL are as for MIPS_BUILTIN.  */
+#define DIRECT_NO_TARGET_BUILTIN(INSN, FUNCTION_TYPE, AVAIL)		\
+  MIPS_BUILTIN (INSN, f, #INSN,	MIPS_BUILTIN_DIRECT_NO_TARGET,		\
+		FUNCTION_TYPE, AVAIL)
 
-/* Built-in functions for the SB-1 processor.  */
+/* Define __builtin_mips_bposge<VALUE>.  <VALUE> is 32 for the MIPS32 DSP
+   branch instruction.  AVAIL is as for MIPS_BUILTIN.  */
+#define BPOSGE_BUILTIN(VALUE, AVAIL)					\
+  MIPS_BUILTIN (bposge, f, "bposge" #VALUE,				\
+		MIPS_BUILTIN_BPOSGE ## VALUE, MIPS_SI_FTYPE_VOID, AVAIL)
+
+/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<FN_NAME>
+   for instruction CODE_FOR_loongson_<INSN>.  FUNCTION_TYPE is a
+   mips_builtin_description field.  */
+#define LOONGSON_BUILTIN_ALIAS(INSN, FN_NAME, FUNCTION_TYPE)		\
+  { CODE_FOR_loongson_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
+    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, mips_builtin_avail_loongson }
+
+/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<INSN>
+   for instruction CODE_FOR_loongson_<INSN>.  FUNCTION_TYPE is a
+   mips_builtin_description field.  */
+#define LOONGSON_BUILTIN(INSN, FUNCTION_TYPE)				\
+  LOONGSON_BUILTIN_ALIAS (INSN, INSN, FUNCTION_TYPE)
+
+/* Like LOONGSON_BUILTIN, but add _<SUFFIX> to the end of the function name.
+   We use functions of this form when the same insn can be usefully applied
+   to more than one datatype.  */
+#define LOONGSON_BUILTIN_SUFFIX(INSN, SUFFIX, FUNCTION_TYPE)		\
+  LOONGSON_BUILTIN_ALIAS (INSN, INSN ## _ ## SUFFIX, FUNCTION_TYPE)
 
 #define CODE_FOR_mips_sqrt_ps CODE_FOR_sqrtv2sf2
-
-static const struct mips_builtin_description mips_sb1_bdesc[] = {
-  DIRECT_BUILTIN (sqrt_ps, MIPS_V2SF_FTYPE_V2SF, MASK_PAIRED_SINGLE_FLOAT)
-};
-
-/* Built-in functions for the DSP ASE.  */
-
 #define CODE_FOR_mips_addq_ph CODE_FOR_addv2hi3
 #define CODE_FOR_mips_addu_qb CODE_FOR_addv4qi3
 #define CODE_FOR_mips_subq_ph CODE_FOR_subv2hi3
 #define CODE_FOR_mips_subu_qb CODE_FOR_subv4qi3
 #define CODE_FOR_mips_mul_ph CODE_FOR_mulv2hi3
 
-/* Define a MIPS_BUILTIN_DIRECT_NO_TARGET function for instruction
-   CODE_FOR_mips_<INSN>.  FUNCTION_TYPE and TARGET_FLAGS are
-   mips_builtin_description fields.  */
-#define DIRECT_NO_TARGET_BUILTIN(INSN, FUNCTION_TYPE, TARGET_FLAGS)	\
-  { CODE_FOR_mips_ ## INSN, 0, "__builtin_mips_" #INSN,			\
-    MIPS_BUILTIN_DIRECT_NO_TARGET, FUNCTION_TYPE, TARGET_FLAGS }
-
-/* Define __builtin_mips_bposge<VALUE>.  <VALUE> is 32 for the MIPS32 DSP
-   branch instruction.  TARGET_FLAGS is a mips_builtin_description field.  */
-#define BPOSGE_BUILTIN(VALUE, TARGET_FLAGS)				\
-  { CODE_FOR_mips_bposge, 0, "__builtin_mips_bposge" #VALUE,		\
-    MIPS_BUILTIN_BPOSGE ## VALUE, MIPS_SI_FTYPE_VOID, TARGET_FLAGS }
-
-static const struct mips_builtin_description mips_dsp_bdesc[] = {
-  DIRECT_BUILTIN (addq_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (addq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (addq_s_w, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (addu_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (addu_s_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (subq_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (subq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (subq_s_w, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (subu_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (subu_s_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (addsc, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (addwc, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (modsub, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (raddu_w_qb, MIPS_SI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (absq_s_ph, MIPS_V2HI_FTYPE_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (absq_s_w, MIPS_SI_FTYPE_SI, MASK_DSP),
-  DIRECT_BUILTIN (precrq_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (precrq_ph_w, MIPS_V2HI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (precrq_rs_ph_w, MIPS_V2HI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (precrqu_s_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (preceq_w_phl, MIPS_SI_FTYPE_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (preceq_w_phr, MIPS_SI_FTYPE_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (precequ_ph_qbl, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (precequ_ph_qbr, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (precequ_ph_qbla, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (precequ_ph_qbra, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (preceu_ph_qbl, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (preceu_ph_qbr, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (preceu_ph_qbla, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (preceu_ph_qbra, MIPS_V2HI_FTYPE_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (shll_qb, MIPS_V4QI_FTYPE_V4QI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shll_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shll_s_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shll_s_w, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shrl_qb, MIPS_V4QI_FTYPE_V4QI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shra_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shra_r_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shra_r_w, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (muleu_s_ph_qbl, MIPS_V2HI_FTYPE_V4QI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (muleu_s_ph_qbr, MIPS_V2HI_FTYPE_V4QI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (mulq_rs_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (muleq_s_w_phl, MIPS_SI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (muleq_s_w_phr, MIPS_SI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (bitrev, MIPS_SI_FTYPE_SI, MASK_DSP),
-  DIRECT_BUILTIN (insv, MIPS_SI_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (repl_qb, MIPS_V4QI_FTYPE_SI, MASK_DSP),
-  DIRECT_BUILTIN (repl_ph, MIPS_V2HI_FTYPE_SI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmpu_eq_qb, MIPS_VOID_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmpu_lt_qb, MIPS_VOID_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmpu_le_qb, MIPS_VOID_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (cmpgu_eq_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (cmpgu_lt_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (cmpgu_le_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmp_eq_ph, MIPS_VOID_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmp_lt_ph, MIPS_VOID_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (cmp_le_ph, MIPS_VOID_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (pick_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (pick_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (packrl_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSP),
-  DIRECT_NO_TARGET_BUILTIN (wrdsp, MIPS_VOID_FTYPE_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (rddsp, MIPS_SI_FTYPE_SI, MASK_DSP),
-  DIRECT_BUILTIN (lbux, MIPS_SI_FTYPE_POINTER_SI, MASK_DSP),
-  DIRECT_BUILTIN (lhx, MIPS_SI_FTYPE_POINTER_SI, MASK_DSP),
-  DIRECT_BUILTIN (lwx, MIPS_SI_FTYPE_POINTER_SI, MASK_DSP),
-  BPOSGE_BUILTIN (32, MASK_DSP),
-
-  /* The following are for the MIPS DSP ASE REV 2.  */
-  DIRECT_BUILTIN (absq_s_qb, MIPS_V4QI_FTYPE_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (addu_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (addu_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (adduh_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (adduh_r_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (append, MIPS_SI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (balign, MIPS_SI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (cmpgdu_eq_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (cmpgdu_lt_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (cmpgdu_le_qb, MIPS_SI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (mul_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (mul_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (mulq_rs_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (mulq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (mulq_s_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (precr_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (precr_sra_ph_w, MIPS_V2HI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (precr_sra_r_ph_w, MIPS_V2HI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (prepend, MIPS_SI_FTYPE_SI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (shra_qb, MIPS_V4QI_FTYPE_V4QI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (shra_r_qb, MIPS_V4QI_FTYPE_V4QI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (shrl_ph, MIPS_V2HI_FTYPE_V2HI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (subu_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (subu_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (subuh_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (subuh_r_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, MASK_DSPR2),
-  DIRECT_BUILTIN (addqh_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (addqh_r_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (addqh_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (addqh_r_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (subqh_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (subqh_r_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (subqh_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (subqh_r_w, MIPS_SI_FTYPE_SI_SI, MASK_DSPR2)
-};
-
-static const struct mips_builtin_description mips_dsp_32only_bdesc[] = {
-  DIRECT_BUILTIN (dpau_h_qbl, MIPS_DI_FTYPE_DI_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (dpau_h_qbr, MIPS_DI_FTYPE_DI_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (dpsu_h_qbl, MIPS_DI_FTYPE_DI_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (dpsu_h_qbr, MIPS_DI_FTYPE_DI_V4QI_V4QI, MASK_DSP),
-  DIRECT_BUILTIN (dpaq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (dpsq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (mulsaq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (dpaq_sa_l_w, MIPS_DI_FTYPE_DI_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (dpsq_sa_l_w, MIPS_DI_FTYPE_DI_SI_SI, MASK_DSP),
-  DIRECT_BUILTIN (maq_s_w_phl, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (maq_s_w_phr, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (maq_sa_w_phl, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (maq_sa_w_phr, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSP),
-  DIRECT_BUILTIN (extr_w, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extr_r_w, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extr_rs_w, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extr_s_h, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extp, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (extpdp, MIPS_SI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (shilo, MIPS_DI_FTYPE_DI_SI, MASK_DSP),
-  DIRECT_BUILTIN (mthlip, MIPS_DI_FTYPE_DI_SI, MASK_DSP),
-
-  /* The following are for the MIPS DSP ASE REV 2.  */
-  DIRECT_BUILTIN (dpa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dps_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (madd, MIPS_DI_FTYPE_DI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (maddu, MIPS_DI_FTYPE_DI_USI_USI, MASK_DSPR2),
-  DIRECT_BUILTIN (msub, MIPS_DI_FTYPE_DI_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (msubu, MIPS_DI_FTYPE_DI_USI_USI, MASK_DSPR2),
-  DIRECT_BUILTIN (mulsa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (mult, MIPS_DI_FTYPE_SI_SI, MASK_DSPR2),
-  DIRECT_BUILTIN (multu, MIPS_DI_FTYPE_USI_USI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpax_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpsx_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpaqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpaqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpsqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2),
-  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2)
-};
-
-/* This structure describes an array of mips_builtin_description entries.  */
-struct mips_bdesc_map {
-  /* The array that this entry describes.  */
-  const struct mips_builtin_description *bdesc;
-
-  /* The number of entries in BDESC.  */
-  unsigned int size;
-
-  /* The target processor that supports the functions in BDESC.
-     PROCESSOR_MAX means we enable them for all processors.  */
-  enum processor_type proc;
-
-  /* The functions in BDESC are not supported if any of these
-     target flags are set.  */
-  int unsupported_target_flags;
-};
-
-/* All MIPS-specific built-in functions.  */
-static const struct mips_bdesc_map mips_bdesc_arrays[] = {
-  { mips_ps_bdesc, ARRAY_SIZE (mips_ps_bdesc), PROCESSOR_MAX, 0 },
-  { mips_sb1_bdesc, ARRAY_SIZE (mips_sb1_bdesc), PROCESSOR_SB1, 0 },
-  { mips_dsp_bdesc, ARRAY_SIZE (mips_dsp_bdesc), PROCESSOR_MAX, 0 },
-  { mips_dsp_32only_bdesc, ARRAY_SIZE (mips_dsp_32only_bdesc),
-    PROCESSOR_MAX, MASK_64BIT }
+#define CODE_FOR_loongson_packsswh CODE_FOR_vec_pack_ssat_v2si
+#define CODE_FOR_loongson_packsshb CODE_FOR_vec_pack_ssat_v4hi
+#define CODE_FOR_loongson_packushb CODE_FOR_vec_pack_usat_v4hi
+#define CODE_FOR_loongson_paddw CODE_FOR_addv2si3
+#define CODE_FOR_loongson_paddh CODE_FOR_addv4hi3
+#define CODE_FOR_loongson_paddb CODE_FOR_addv8qi3
+#define CODE_FOR_loongson_paddsh CODE_FOR_ssaddv4hi3
+#define CODE_FOR_loongson_paddsb CODE_FOR_ssaddv8qi3
+#define CODE_FOR_loongson_paddush CODE_FOR_usaddv4hi3
+#define CODE_FOR_loongson_paddusb CODE_FOR_usaddv8qi3
+#define CODE_FOR_loongson_pmaxsh CODE_FOR_smaxv4hi3
+#define CODE_FOR_loongson_pmaxub CODE_FOR_umaxv8qi3
+#define CODE_FOR_loongson_pminsh CODE_FOR_sminv4hi3
+#define CODE_FOR_loongson_pminub CODE_FOR_uminv8qi3
+#define CODE_FOR_loongson_pmulhuh CODE_FOR_umulv4hi3_highpart
+#define CODE_FOR_loongson_pmulhh CODE_FOR_smulv4hi3_highpart
+#define CODE_FOR_loongson_biadd CODE_FOR_reduc_uplus_v8qi
+#define CODE_FOR_loongson_psubw CODE_FOR_subv2si3
+#define CODE_FOR_loongson_psubh CODE_FOR_subv4hi3
+#define CODE_FOR_loongson_psubb CODE_FOR_subv8qi3
+#define CODE_FOR_loongson_psubsh CODE_FOR_sssubv4hi3
+#define CODE_FOR_loongson_psubsb CODE_FOR_sssubv8qi3
+#define CODE_FOR_loongson_psubush CODE_FOR_ussubv4hi3
+#define CODE_FOR_loongson_psubusb CODE_FOR_ussubv8qi3
+#define CODE_FOR_loongson_punpckhbh CODE_FOR_vec_interleave_highv8qi
+#define CODE_FOR_loongson_punpckhhw CODE_FOR_vec_interleave_highv4hi
+#define CODE_FOR_loongson_punpckhwd CODE_FOR_vec_interleave_highv2si
+#define CODE_FOR_loongson_punpcklbh CODE_FOR_vec_interleave_lowv8qi
+#define CODE_FOR_loongson_punpcklhw CODE_FOR_vec_interleave_lowv4hi
+#define CODE_FOR_loongson_punpcklwd CODE_FOR_vec_interleave_lowv2si
+
+static const struct mips_builtin_description mips_builtins[] = {
+  DIRECT_BUILTIN (pll_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single_no_ls2),
+  DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single_no_ls2),
+  DIRECT_BUILTIN (plu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single_no_ls2),
+  DIRECT_BUILTIN (puu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single_no_ls2),
+  DIRECT_BUILTIN (cvt_ps_s, MIPS_V2SF_FTYPE_SF_SF, paired_single_no_ls2),
+  DIRECT_BUILTIN (cvt_s_pl, MIPS_SF_FTYPE_V2SF, paired_single_no_ls2),
+  DIRECT_BUILTIN (cvt_s_pu, MIPS_SF_FTYPE_V2SF, paired_single_no_ls2),
+  DIRECT_BUILTIN (abs_ps, MIPS_V2SF_FTYPE_V2SF, paired_single),
+
+  DIRECT_BUILTIN (alnv_ps, MIPS_V2SF_FTYPE_V2SF_V2SF_INT, paired_single_no_ls2),
+  DIRECT_BUILTIN (addr_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, mips3d),
+  DIRECT_BUILTIN (mulr_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, mips3d),
+  DIRECT_BUILTIN (cvt_pw_ps, MIPS_V2SF_FTYPE_V2SF, mips3d),
+  DIRECT_BUILTIN (cvt_ps_pw, MIPS_V2SF_FTYPE_V2SF, mips3d),
+
+  DIRECT_BUILTIN (recip1_s, MIPS_SF_FTYPE_SF, mips3d),
+  DIRECT_BUILTIN (recip1_d, MIPS_DF_FTYPE_DF, mips3d),
+  DIRECT_BUILTIN (recip1_ps, MIPS_V2SF_FTYPE_V2SF, mips3d),
+  DIRECT_BUILTIN (recip2_s, MIPS_SF_FTYPE_SF_SF, mips3d),
+  DIRECT_BUILTIN (recip2_d, MIPS_DF_FTYPE_DF_DF, mips3d),
+  DIRECT_BUILTIN (recip2_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, mips3d),
+
+  DIRECT_BUILTIN (rsqrt1_s, MIPS_SF_FTYPE_SF, mips3d),
+  DIRECT_BUILTIN (rsqrt1_d, MIPS_DF_FTYPE_DF, mips3d),
+  DIRECT_BUILTIN (rsqrt1_ps, MIPS_V2SF_FTYPE_V2SF, mips3d),
+  DIRECT_BUILTIN (rsqrt2_s, MIPS_SF_FTYPE_SF_SF, mips3d),
+  DIRECT_BUILTIN (rsqrt2_d, MIPS_DF_FTYPE_DF_DF, mips3d),
+  DIRECT_BUILTIN (rsqrt2_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, mips3d),
+
+  MIPS_FP_CONDITIONS (CMP_BUILTINS),
+
+  /* Built-in functions for the SB-1 processor.  */
+  DIRECT_BUILTIN (sqrt_ps, MIPS_V2SF_FTYPE_V2SF, sb1_paired_single),
+
+  /* Built-in functions for the DSP ASE (32-bit and 64-bit).  */
+  DIRECT_BUILTIN (addq_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (addq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (addq_s_w, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (addu_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (addu_s_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (subq_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (subq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (subq_s_w, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (subu_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (subu_s_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (addsc, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (addwc, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (modsub, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (raddu_w_qb, MIPS_SI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (absq_s_ph, MIPS_V2HI_FTYPE_V2HI, dsp),
+  DIRECT_BUILTIN (absq_s_w, MIPS_SI_FTYPE_SI, dsp),
+  DIRECT_BUILTIN (precrq_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (precrq_ph_w, MIPS_V2HI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (precrq_rs_ph_w, MIPS_V2HI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (precrqu_s_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (preceq_w_phl, MIPS_SI_FTYPE_V2HI, dsp),
+  DIRECT_BUILTIN (preceq_w_phr, MIPS_SI_FTYPE_V2HI, dsp),
+  DIRECT_BUILTIN (precequ_ph_qbl, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (precequ_ph_qbr, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (precequ_ph_qbla, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (precequ_ph_qbra, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (preceu_ph_qbl, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (preceu_ph_qbr, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (preceu_ph_qbla, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (preceu_ph_qbra, MIPS_V2HI_FTYPE_V4QI, dsp),
+  DIRECT_BUILTIN (shll_qb, MIPS_V4QI_FTYPE_V4QI_SI, dsp),
+  DIRECT_BUILTIN (shll_ph, MIPS_V2HI_FTYPE_V2HI_SI, dsp),
+  DIRECT_BUILTIN (shll_s_ph, MIPS_V2HI_FTYPE_V2HI_SI, dsp),
+  DIRECT_BUILTIN (shll_s_w, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (shrl_qb, MIPS_V4QI_FTYPE_V4QI_SI, dsp),
+  DIRECT_BUILTIN (shra_ph, MIPS_V2HI_FTYPE_V2HI_SI, dsp),
+  DIRECT_BUILTIN (shra_r_ph, MIPS_V2HI_FTYPE_V2HI_SI, dsp),
+  DIRECT_BUILTIN (shra_r_w, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (muleu_s_ph_qbl, MIPS_V2HI_FTYPE_V4QI_V2HI, dsp),
+  DIRECT_BUILTIN (muleu_s_ph_qbr, MIPS_V2HI_FTYPE_V4QI_V2HI, dsp),
+  DIRECT_BUILTIN (mulq_rs_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (muleq_s_w_phl, MIPS_SI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (muleq_s_w_phr, MIPS_SI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (bitrev, MIPS_SI_FTYPE_SI, dsp),
+  DIRECT_BUILTIN (insv, MIPS_SI_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (repl_qb, MIPS_V4QI_FTYPE_SI, dsp),
+  DIRECT_BUILTIN (repl_ph, MIPS_V2HI_FTYPE_SI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmpu_eq_qb, MIPS_VOID_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmpu_lt_qb, MIPS_VOID_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmpu_le_qb, MIPS_VOID_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (cmpgu_eq_qb, MIPS_SI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (cmpgu_lt_qb, MIPS_SI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (cmpgu_le_qb, MIPS_SI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmp_eq_ph, MIPS_VOID_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmp_lt_ph, MIPS_VOID_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (cmp_le_ph, MIPS_VOID_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (pick_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dsp),
+  DIRECT_BUILTIN (pick_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_BUILTIN (packrl_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dsp),
+  DIRECT_NO_TARGET_BUILTIN (wrdsp, MIPS_VOID_FTYPE_SI_SI, dsp),
+  DIRECT_BUILTIN (rddsp, MIPS_SI_FTYPE_SI, dsp),
+  DIRECT_BUILTIN (lbux, MIPS_SI_FTYPE_POINTER_SI, dsp),
+  DIRECT_BUILTIN (lhx, MIPS_SI_FTYPE_POINTER_SI, dsp),
+  DIRECT_BUILTIN (lwx, MIPS_SI_FTYPE_POINTER_SI, dsp),
+  BPOSGE_BUILTIN (32, dsp),
+
+  /* The following are for the MIPS DSP ASE REV 2 (32-bit and 64-bit).  */
+  DIRECT_BUILTIN (absq_s_qb, MIPS_V4QI_FTYPE_V4QI, dspr2),
+  DIRECT_BUILTIN (addu_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (addu_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (adduh_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (adduh_r_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (append, MIPS_SI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (balign, MIPS_SI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (cmpgdu_eq_qb, MIPS_SI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (cmpgdu_lt_qb, MIPS_SI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (cmpgdu_le_qb, MIPS_SI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (mul_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (mul_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (mulq_rs_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (mulq_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (mulq_s_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (precr_qb_ph, MIPS_V4QI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (precr_sra_ph_w, MIPS_V2HI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (precr_sra_r_ph_w, MIPS_V2HI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (prepend, MIPS_SI_FTYPE_SI_SI_SI, dspr2),
+  DIRECT_BUILTIN (shra_qb, MIPS_V4QI_FTYPE_V4QI_SI, dspr2),
+  DIRECT_BUILTIN (shra_r_qb, MIPS_V4QI_FTYPE_V4QI_SI, dspr2),
+  DIRECT_BUILTIN (shrl_ph, MIPS_V2HI_FTYPE_V2HI_SI, dspr2),
+  DIRECT_BUILTIN (subu_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (subu_s_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (subuh_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (subuh_r_qb, MIPS_V4QI_FTYPE_V4QI_V4QI, dspr2),
+  DIRECT_BUILTIN (addqh_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (addqh_r_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (addqh_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (addqh_r_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (subqh_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (subqh_r_ph, MIPS_V2HI_FTYPE_V2HI_V2HI, dspr2),
+  DIRECT_BUILTIN (subqh_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+  DIRECT_BUILTIN (subqh_r_w, MIPS_SI_FTYPE_SI_SI, dspr2),
+
+  /* Built-in functions for the DSP ASE (32-bit only).  */
+  DIRECT_BUILTIN (dpau_h_qbl, MIPS_DI_FTYPE_DI_V4QI_V4QI, dsp_32),
+  DIRECT_BUILTIN (dpau_h_qbr, MIPS_DI_FTYPE_DI_V4QI_V4QI, dsp_32),
+  DIRECT_BUILTIN (dpsu_h_qbl, MIPS_DI_FTYPE_DI_V4QI_V4QI, dsp_32),
+  DIRECT_BUILTIN (dpsu_h_qbr, MIPS_DI_FTYPE_DI_V4QI_V4QI, dsp_32),
+  DIRECT_BUILTIN (dpaq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (dpsq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (mulsaq_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (dpaq_sa_l_w, MIPS_DI_FTYPE_DI_SI_SI, dsp_32),
+  DIRECT_BUILTIN (dpsq_sa_l_w, MIPS_DI_FTYPE_DI_SI_SI, dsp_32),
+  DIRECT_BUILTIN (maq_s_w_phl, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (maq_s_w_phr, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (maq_sa_w_phl, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (maq_sa_w_phr, MIPS_DI_FTYPE_DI_V2HI_V2HI, dsp_32),
+  DIRECT_BUILTIN (extr_w, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extr_r_w, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extr_rs_w, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extr_s_h, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extp, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (extpdp, MIPS_SI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (shilo, MIPS_DI_FTYPE_DI_SI, dsp_32),
+  DIRECT_BUILTIN (mthlip, MIPS_DI_FTYPE_DI_SI, dsp_32),
+
+  /* The following are for the MIPS DSP ASE REV 2 (32-bit only).  */
+  DIRECT_BUILTIN (dpa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dps_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (madd, MIPS_DI_FTYPE_DI_SI_SI, dspr2_32),
+  DIRECT_BUILTIN (maddu, MIPS_DI_FTYPE_DI_USI_USI, dspr2_32),
+  DIRECT_BUILTIN (msub, MIPS_DI_FTYPE_DI_SI_SI, dspr2_32),
+  DIRECT_BUILTIN (msubu, MIPS_DI_FTYPE_DI_USI_USI, dspr2_32),
+  DIRECT_BUILTIN (mulsa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (mult, MIPS_DI_FTYPE_SI_SI, dspr2_32),
+  DIRECT_BUILTIN (multu, MIPS_DI_FTYPE_USI_USI, dspr2_32),
+  DIRECT_BUILTIN (dpax_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpsx_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpaqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpaqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpsqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+
+  /* Built-in functions for ST Microelectronics Loongson-2E/2F cores.  */
+  LOONGSON_BUILTIN (packsswh, MIPS_V4HI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (packsshb, MIPS_V8QI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (packushb, MIPS_UV8QI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (paddh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (paddw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (paddh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (paddd, u, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_SUFFIX (paddd, s, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (paddsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddsb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (paddush, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddusb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_ALIAS (pandn_d, pandn_ud, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_ALIAS (pandn_w, pandn_uw, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_ALIAS (pandn_h, pandn_uh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_ALIAS (pandn_b, pandn_ub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_ALIAS (pandn_d, pandn_sd, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN_ALIAS (pandn_w, pandn_sw, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_ALIAS (pandn_h, pandn_sh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_ALIAS (pandn_b, pandn_sb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (pavgh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pavgb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgth, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgth, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (pextrh, u, MIPS_UV4HI_FTYPE_UV4HI_USI),
+  LOONGSON_BUILTIN_SUFFIX (pextrh, s, MIPS_V4HI_FTYPE_V4HI_USI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_0, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_1, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_2, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_3, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_0, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_1, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_2, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_3, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaddhw, MIPS_V2SI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaxsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaxub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pminsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pminub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pmovmskb, u, MIPS_UV8QI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pmovmskb, s, MIPS_V8QI_FTYPE_V8QI),
+  LOONGSON_BUILTIN (pmulhuh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pmulhh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmullh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmuluw, MIPS_UDI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pasubub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (biadd, MIPS_UV4HI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN (psadbh, MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pshufh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (pshufh, s, MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrah, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrah, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psraw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psraw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psubw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (psubh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (psubb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (psubw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (psubh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (psubb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (psubd, u, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_SUFFIX (psubd, s, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (psubsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubsb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (psubush, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubusb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhbh, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhhw, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhwd, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhbh, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhhw, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklbh, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklhw, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklwd, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklbh, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklhw, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI)
 };
 
 /* MODE is a vector mode whose elements have type TYPE.  Return the type
@@ -10475,11 +10854,17 @@ static const struct mips_bdesc_map mips_
 static tree
 mips_builtin_vector_type (tree type, enum machine_mode mode)
 {
-  static tree types[(int) MAX_MACHINE_MODE];
+  static tree types[2 * (int) MAX_MACHINE_MODE];
+  int mode_index;
+
+  mode_index = (int) mode;
 
-  if (types[(int) mode] == NULL_TREE)
-    types[(int) mode] = build_vector_type_for_mode (type, mode);
-  return types[(int) mode];
+  if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type))
+    mode_index += MAX_MACHINE_MODE;
+
+  if (types[mode_index] == NULL_TREE)
+    types[mode_index] = build_vector_type_for_mode (type, mode);
+  return types[mode_index];
 }
 
 /* Source-level argument types.  */
@@ -10488,16 +10873,27 @@ mips_builtin_vector_type (tree type, enu
 #define MIPS_ATYPE_POINTER ptr_type_node
 
 /* Standard mode-based argument types.  */
+#define MIPS_ATYPE_UQI unsigned_intQI_type_node
 #define MIPS_ATYPE_SI intSI_type_node
 #define MIPS_ATYPE_USI unsigned_intSI_type_node
 #define MIPS_ATYPE_DI intDI_type_node
+#define MIPS_ATYPE_UDI unsigned_intDI_type_node
 #define MIPS_ATYPE_SF float_type_node
 #define MIPS_ATYPE_DF double_type_node
 
 /* Vector argument types.  */
 #define MIPS_ATYPE_V2SF mips_builtin_vector_type (float_type_node, V2SFmode)
 #define MIPS_ATYPE_V2HI mips_builtin_vector_type (intHI_type_node, V2HImode)
+#define MIPS_ATYPE_V2SI mips_builtin_vector_type (intSI_type_node, V2SImode)
 #define MIPS_ATYPE_V4QI mips_builtin_vector_type (intQI_type_node, V4QImode)
+#define MIPS_ATYPE_V4HI mips_builtin_vector_type (intHI_type_node, V4HImode)
+#define MIPS_ATYPE_V8QI mips_builtin_vector_type (intQI_type_node, V8QImode)
+#define MIPS_ATYPE_UV2SI					\
+  mips_builtin_vector_type (unsigned_intSI_type_node, V2SImode)
+#define MIPS_ATYPE_UV4HI					\
+  mips_builtin_vector_type (unsigned_intHI_type_node, V4HImode)
+#define MIPS_ATYPE_UV8QI					\
+  mips_builtin_vector_type (unsigned_intQI_type_node, V8QImode)
 
 /* MIPS_FTYPE_ATYPESN takes N MIPS_FTYPES-like type codes and lists
    their associated MIPS_ATYPEs.  */
@@ -10545,25 +10941,17 @@ static void
 mips_init_builtins (void)
 {
   const struct mips_builtin_description *d;
-  const struct mips_bdesc_map *m;
-  unsigned int offset;
+  unsigned int i;
 
   /* Iterate through all of the bdesc arrays, initializing all of the
      builtin functions.  */
-  offset = 0;
-  for (m = mips_bdesc_arrays;
-       m < &mips_bdesc_arrays[ARRAY_SIZE (mips_bdesc_arrays)];
-       m++)
-    {
-      if ((m->proc == PROCESSOR_MAX || m->proc == mips_arch)
-	  && (m->unsupported_target_flags & target_flags) == 0)
-	for (d = m->bdesc; d < &m->bdesc[m->size]; d++)
-	  if ((d->target_flags & target_flags) == d->target_flags)
-	    add_builtin_function (d->name,
-				  mips_build_function_type (d->function_type),
-				  d - m->bdesc + offset,
-				  BUILT_IN_MD, NULL, NULL);
-      offset += m->size;
+  for (i = 0; i < ARRAY_SIZE (mips_builtins); i++)
+    {
+      d = &mips_builtins[i];
+      if (d->avail ())
+	add_builtin_function (d->name,
+			      mips_build_function_type (d->function_type),
+			      i, BUILT_IN_MD, NULL, NULL);
     }
 }
 
@@ -10808,41 +11196,6 @@ mips_expand_builtin_bposge (enum mips_bu
 				       const1_rtx, const0_rtx);
 }
 
-/* EXP is a CALL_EXPR that calls the function described by BDESC.
-   Expand the call and return an rtx for its return value.
-   TARGET, if nonnull, suggests a good place to put this value.  */
-
-static rtx
-mips_expand_builtin_1 (const struct mips_builtin_description *bdesc,
-		       tree exp, rtx target)
-{
-  switch (bdesc->builtin_type)
-    {
-    case MIPS_BUILTIN_DIRECT:
-      return mips_expand_builtin_direct (bdesc->icode, target, exp, true);
-
-    case MIPS_BUILTIN_DIRECT_NO_TARGET:
-      return mips_expand_builtin_direct (bdesc->icode, target, exp, false);
-
-    case MIPS_BUILTIN_MOVT:
-    case MIPS_BUILTIN_MOVF:
-      return mips_expand_builtin_movtf (bdesc->builtin_type, bdesc->icode,
-					bdesc->cond, target, exp);
-
-    case MIPS_BUILTIN_CMP_ANY:
-    case MIPS_BUILTIN_CMP_ALL:
-    case MIPS_BUILTIN_CMP_UPPER:
-    case MIPS_BUILTIN_CMP_LOWER:
-    case MIPS_BUILTIN_CMP_SINGLE:
-      return mips_expand_builtin_compare (bdesc->builtin_type, bdesc->icode,
-					  bdesc->cond, target, exp);
-
-    case MIPS_BUILTIN_BPOSGE32:
-      return mips_expand_builtin_bposge (bdesc->builtin_type, target);
-    }
-  gcc_unreachable ();
-}
-
 /* Implement TARGET_EXPAND_BUILTIN.  */
 
 static rtx
@@ -10851,25 +11204,44 @@ mips_expand_builtin (tree exp, rtx targe
 		     int ignore ATTRIBUTE_UNUSED)
 {
   tree fndecl;
-  unsigned int fcode;
-  const struct mips_bdesc_map *m;
+  unsigned int fcode, avail;
+  const struct mips_builtin_description *d;
 
   fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   fcode = DECL_FUNCTION_CODE (fndecl);
+  gcc_assert (fcode < ARRAY_SIZE (mips_builtins));
+  d = &mips_builtins[fcode];
+  avail = d->avail ();
+  gcc_assert (avail != 0);
   if (TARGET_MIPS16)
     {
       error ("built-in function %qs not supported for MIPS16",
 	     IDENTIFIER_POINTER (DECL_NAME (fndecl)));
       return const0_rtx;
     }
+  switch (d->builtin_type)
+    {
+    case MIPS_BUILTIN_DIRECT:
+      return mips_expand_builtin_direct (d->icode, target, exp, true);
+
+    case MIPS_BUILTIN_DIRECT_NO_TARGET:
+      return mips_expand_builtin_direct (d->icode, target, exp, false);
+
+    case MIPS_BUILTIN_MOVT:
+    case MIPS_BUILTIN_MOVF:
+      return mips_expand_builtin_movtf (d->builtin_type, d->icode,
+					d->cond, target, exp);
+
+    case MIPS_BUILTIN_CMP_ANY:
+    case MIPS_BUILTIN_CMP_ALL:
+    case MIPS_BUILTIN_CMP_UPPER:
+    case MIPS_BUILTIN_CMP_LOWER:
+    case MIPS_BUILTIN_CMP_SINGLE:
+      return mips_expand_builtin_compare (d->builtin_type, d->icode,
+					  d->cond, target, exp);
 
-  for (m = mips_bdesc_arrays;
-       m < &mips_bdesc_arrays[ARRAY_SIZE (mips_bdesc_arrays)];
-       m++)
-    {
-      if (fcode < m->size)
-	return mips_expand_builtin_1 (m->bdesc + fcode, exp, target);
-      fcode -= m->size;
+    case MIPS_BUILTIN_BPOSGE32:
+      return mips_expand_builtin_bposge (d->builtin_type, target);
     }
   gcc_unreachable ();
 }
@@ -12440,23 +12812,34 @@ mips_override_options (void)
   REAL_MODE_FORMAT (TFmode) = &MIPS_TFMODE_FORMAT;
 #endif
 
-  /* Make sure that the user didn't turn off paired single support when
-     MIPS-3D support is requested.  */
-  if (TARGET_MIPS3D
-      && (target_flags_explicit & MASK_PAIRED_SINGLE_FLOAT)
-      && !TARGET_PAIRED_SINGLE_FLOAT)
-    error ("%<-mips3d%> requires %<-mpaired-single%>");
-
-  /* If TARGET_MIPS3D, enable MASK_PAIRED_SINGLE_FLOAT.  */
-  if (TARGET_MIPS3D)
-    target_flags |= MASK_PAIRED_SINGLE_FLOAT;
-
-  /* Make sure that when TARGET_PAIRED_SINGLE_FLOAT is true, TARGET_FLOAT64
-     and TARGET_HARD_FLOAT_ABI are both true.  */
-  if (TARGET_PAIRED_SINGLE_FLOAT && !(TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI))
-    error ("%qs must be used with %qs",
-	   TARGET_MIPS3D ? "-mips3d" : "-mpaired-single",
-	   TARGET_HARD_FLOAT_ABI ? "-mfp64" : "-mhard-float");
+  if (TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI)
+    {
+      /* Make sure that the user didn't turn off paired single support when
+	 MIPS-3D support is requested.  */
+      if (TARGET_MIPS3D
+	  && (target_flags_explicit & MASK_PAIRED_SINGLE_FLOAT)
+	  && !TARGET_PAIRED_SINGLE_FLOAT)
+	error ("%<-mips3d%> requires %<-mpaired-single%>");
+
+      /* We can use paired-single instructions.  Select them for targets
+	 that always provide them.  */
+      if (TARGET_LOONGSON_2EF || TARGET_MIPS3D)
+	target_flags |= MASK_PAIRED_SINGLE_FLOAT;
+    }
+  else
+    {
+      const char *missing_option;
+
+      /* If we need TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI, pick the
+	 most likely missing option.  -mfp64 only makes sense for
+	 -mhard-float, so if both conditions are false, complain about
+	 -mhard-float.  */
+      missing_option = (TARGET_HARD_FLOAT_ABI ? "-mfp64" : "-mhard-float");
+      if (TARGET_MIPS3D)
+	error ("%qs must be used with %qs", "-mips3d", missing_option);
+      else if (TARGET_PAIRED_SINGLE_FLOAT)
+	error ("%qs must be used with %qs", "-mpaired-single", missing_option);
+    }
 
   /* Make sure that the ISA supports TARGET_PAIRED_SINGLE_FLOAT when it is
      enabled.  */
@@ -12637,6 +13020,30 @@ mips_conditional_register_usage (void)
     }
 }
 
+/* Initialize vector TARGET to VALS.  */
+
+void
+mips_expand_vector_init (rtx target, rtx vals)
+{
+  enum machine_mode mode;
+  enum machine_mode inner;
+  unsigned int i, n_elts;
+  rtx mem;
+
+  mode = GET_MODE (target);
+  inner = GET_MODE_INNER (mode);
+  n_elts = GET_MODE_NUNITS (mode);
+
+  gcc_assert (VECTOR_MODE_P (mode));
+
+  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
+  for (i = 0; i < n_elts; i++)
+    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
+                    XVECEXP (vals, 0, i));
+
+  emit_move_insn (target, mem);
+}
+
 /* When generating MIPS16 code, we want to allocate $24 (T_REG) before
    other registers for instructions for which it is possible.  This
    encourages the compiler to use CMP in cases where an XOR would
@@ -12688,6 +13095,10 @@ mips_order_regs_for_local_alloc (void)
 #define TARGET_SCHED_ADJUST_COST mips_adjust_cost
 #undef TARGET_SCHED_ISSUE_RATE
 #define TARGET_SCHED_ISSUE_RATE mips_issue_rate
+#undef TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN
+#define TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN mips_init_dfa_post_cycle_insn
+#undef TARGET_SCHED_DFA_POST_ADVANCE_CYCLE
+#define TARGET_SCHED_DFA_POST_ADVANCE_CYCLE mips_dfa_post_advance_cycle
 #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
   mips_multipass_dfa_lookahead
--- gcc/config/mips/mips.h	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/mips.h	(/local/gcc-4)	(revision 525)
@@ -266,6 +266,14 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF1_1  \
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
+#define TUNE_LOONGSON_2EF           (mips_tune == PROCESSOR_LOONGSON_2E	\
+				     || mips_tune == PROCESSOR_LOONGSON_2F)
+
+/* Whether vector modes and intrinsics for ST Microelectronics
+   Loongson-2E/2F processors should be enabled.  In o32 pairs of
+   floating-point registers provide 64-bit values.  */
+#define TARGET_LOONGSON_VECTORS	    (TARGET_HARD_FLOAT_ABI		\
+				     && TARGET_LOONGSON_2EF)
 
 /* True if the pre-reload scheduler should try to create chains of
    multiply-add or multiply-subtract instructions.  For example,
@@ -497,6 +505,10 @@ enum mips_code_readable_setting {
 	  builtin_define_std ("MIPSEL");				\
 	  builtin_define ("_MIPSEL");					\
 	}								\
+                                                                        \
+      /* Whether Loongson vector modes are enabled.  */                 \
+      if (TARGET_LOONGSON_VECTORS)					\
+        builtin_define ("__mips_loongson_vector_rev");                  \
 									\
       /* Macros dependent on the C dialect.  */				\
       if (preprocessing_asm_p ())					\
@@ -733,14 +745,19 @@ enum mips_code_readable_setting {
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS16)
 
-/* ISA has the conditional move instructions introduced in mips4.  */
-#define ISA_HAS_CONDMOVE	((ISA_MIPS4				\
+/* ISA has the floating-point conditional move instructions introduced
+   in mips4.  */
+#define ISA_HAS_FP_CONDMOVE	((ISA_MIPS4				\
 				  || ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS5500			\
 				 && !TARGET_MIPS16)
 
+/* ISA has the integer conditional move instructions introduced in mips4 and
+   ST Loongson 2E/2F.  */
+#define ISA_HAS_CONDMOVE        (ISA_HAS_FP_CONDMOVE || TARGET_LOONGSON_2EF)
+
 /* ISA has LDC1 and SDC1.  */
 #define ISA_HAS_LDC1_SDC1	(!ISA_MIPS1 && !TARGET_MIPS16)
 
@@ -760,7 +777,9 @@ enum mips_code_readable_setting {
 				 && !TARGET_MIPS16)
 
 /* ISA has paired-single instructions.  */
-#define ISA_HAS_PAIRED_SINGLE	(ISA_MIPS32R2 || ISA_MIPS64)
+#define ISA_HAS_PAIRED_SINGLE	(ISA_MIPS32R2				\
+				 || ISA_MIPS64				\
+				 || TARGET_LOONGSON_2EF)
 
 /* ISA has conditional trap instructions.  */
 #define ISA_HAS_COND_TRAP	(!ISA_MIPS1				\
@@ -775,14 +794,26 @@ enum mips_code_readable_setting {
 /* Integer multiply-accumulate instructions should be generated.  */
 #define GENERATE_MADD_MSUB      (ISA_HAS_MADD_MSUB && !TUNE_74K)
 
-/* ISA has floating-point nmadd and nmsub instructions for mode MODE.  */
-#define ISA_HAS_NMADD_NMSUB(MODE) \
+/* ISA has floating-point madd and msub instructions 'd = a * b [+-] c'.  */
+#define ISA_HAS_FP_MADD4_MSUB4  ISA_HAS_FP4
+
+/* ISA has floating-point madd and msub instructions 'c [+-]= a * b'.  */
+#define ISA_HAS_FP_MADD3_MSUB3  TARGET_LOONGSON_2EF
+
+/* ISA has floating-point nmadd and nmsub instructions
+   'd = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD4_NMSUB4(MODE)					\
 				((ISA_MIPS4				\
 				  || (ISA_MIPS32R2 && (MODE) == V2SFmode) \
 				  || ISA_MIPS64)			\
 				 && (!TARGET_MIPS5400 || TARGET_MAD)	\
 				 && !TARGET_MIPS16)
 
+/* ISA has floating-point nmadd and nmsub instructions
+   'c = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD3_NMSUB3(MODE)					\
+                                TARGET_LOONGSON_2EF
+
 /* ISA has count leading zeroes/ones instruction (not implemented).  */
 #define ISA_HAS_CLZ_CLO		((ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
@@ -881,10 +912,12 @@ enum mips_code_readable_setting {
 				 && !TARGET_MIPS16)
 
 /* Likewise mtc1 and mfc1.  */
-#define ISA_HAS_XFER_DELAY	(mips_isa <= 3)
+#define ISA_HAS_XFER_DELAY	(mips_isa <= 3			\
+				 && !TARGET_LOONGSON_2EF)
 
 /* Likewise floating-point comparisons.  */
-#define ISA_HAS_FCMP_DELAY	(mips_isa <= 3)
+#define ISA_HAS_FCMP_DELAY	(mips_isa <= 3			\
+				 && !TARGET_LOONGSON_2EF)
 
 /* True if mflo and mfhi can be immediately followed by instructions
    which write to the HI and LO registers.
@@ -901,7 +934,8 @@ enum mips_code_readable_setting {
 #define ISA_HAS_HILO_INTERLOCKS	(ISA_MIPS32				\
 				 || ISA_MIPS32R2			\
 				 || ISA_MIPS64				\
-				 || TARGET_MIPS5500)
+				 || TARGET_MIPS5500			\
+				 || TARGET_LOONGSON_2EF)
 
 /* ISA includes synci, jr.hb and jalr.hb.  */
 #define ISA_HAS_SYNCI (ISA_MIPS32R2 && !TARGET_MIPS16)
@@ -921,7 +955,27 @@ enum mips_code_readable_setting {
   (target_flags_explicit & MASK_LLSC	\
    ? TARGET_LLSC && !TARGET_MIPS16	\
    : ISA_HAS_LL_SC)
-\f
+
+/* Predicates for paired-single float instructions.
+   ST Loongson 2E/2F CPUs support only a subset of all
+   paired-single float instructions, so we use the predicates below
+   to restrict the unsupported instructions.  */
+
+#define ISA_HAS_ABS_PS     TARGET_PAIRED_SINGLE_FLOAT
+#define ISA_HAS_C_COND_PS  TARGET_PAIRED_SINGLE_FLOAT
+#define ISA_HAS_SCC_PS     TARGET_PAIRED_SINGLE_FLOAT
+
+#define ISA_HAS_MOVCC_PS   (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_PXX_PS     (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_CVT_PS     (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_ALNV_PS    (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_ADDR_PS    (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_MULR_PS    (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_CABS_PS    (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_C_COND_4S  (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_BC1ANY_PS  (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+#define ISA_HAS_RSQRT_PS   (TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)
+
 /* Add -G xx support.  */
 
 #undef  SWITCH_TAKES_ARG
@@ -3202,3 +3256,6 @@ extern const struct mips_cpu_info *mips_
 extern const struct mips_rtx_cost_data *mips_cost;
 extern enum mips_code_readable_setting mips_code_readable;
 #endif
+
+/* Enable querying of DFA units.  */
+#define CPU_UNITS_QUERY 1
--- gcc/config/mips/mips-modes.def	(/local/gcc-trunk)	(revision 525)
+++ gcc/config/mips/mips-modes.def	(/local/gcc-4)	(revision 525)
@@ -26,6 +26,7 @@ RESET_FLOAT_FORMAT (DF, mips_double_form
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
+VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
 VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
 



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-06-12 13:45     ` Maxim Kuvyrkov
@ 2008-06-12 17:49       ` Richard Sandiford
  2008-06-12 18:04         ` Maxim Kuvyrkov
       [not found]       ` <48515794.7050007@codesourcery.com>
  1 sibling, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-06-12 17:49 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> +(define_insn_reservation "ls2_ghost" 0
> +  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
> +       (eq_attr "type" "ghost"))
> +  "nothing")

This should be dead code because:

;; Ghost instructions produce no real code and introduce no hazards.
;; They exist purely to express an effect on dataflow.
(define_insn_reservation "ghost" 0
  (eq_attr "type" "ghost")
  "nothing")

in mips.md should (by design) trump it.  Let's just remove it.

> +;; Reservation for everything else.  Normally, this reservation
> +;; will only be used to handle cases like compiling
> +;; for non-loongson CPU with -mtune=loongson2?.
> +;;
> +;; !!! This is not a good thing to depend upon the fact that
> +;; DFA will check reservations in the same order as they appear
> +;; in the file, but it seems to work for the time being.
> +;; Anyway, we already use this DFA property heavily with generic.md.
> +(define_insn_reservation "ls2_unknown" 1
> +  (eq_attr "cpu" "loongson_2e,loongson_2f")
> +  "ls2_alu1_core+ls2_alu2_core+ls2_falu1_core+ls2_falu2_core+ls2_mem")

I disagree that it's a bad idea.  (And as you say, we deliberately
rely on this for legacy schedulers.)  Let's just tone it down to:

;; Reservation for everything else.  Normally, this reservation
;; will only be used to handle cases like compiling for non-Loongson
;; CPUs with -mtune=loongson2?.
;;
;; This reservation depends upon the fact that DFA will check
;; reservations in the same order as they appear in the file.

> @@ -10143,7 +10331,20 @@ mips_variable_issue (FILE *file ATTRIBUT
>        vr4130_last_insn = insn;
>        if (TUNE_74K)
>  	mips_74k_agen_init (insn);
> +      else if (TUNE_LOONGSON_2EF)
> +	mips_ls2_variable_issue (insn);
>      }
> +
> +  if (recog_memoized (insn) >= 0)
> +    /* Instructions of type 'multi' should all be split before
> +       second scheduling pass.  */
> +    {
> +      bool multi_p;
> +
> +      multi_p = (get_attr_type (insn) == TYPE_MULTI);
> +      gcc_assert (!multi_p || !reload_completed);
> +    }
> +

Simplify this "if" statement to:

  /* Instructions of type 'multi' should all be split before
     second scheduling pass.  */
  gcc_assert (!reload_completed
	      || recog_memoized (insn) < 0
	      || get_attr_type (insn) != TYPE_MULTI);

OK with those changes, thanks.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-06-12 17:49       ` Richard Sandiford
@ 2008-06-12 18:04         ` Maxim Kuvyrkov
  2008-06-12 18:53           ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-12 18:04 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> +(define_insn_reservation "ls2_ghost" 0
>> +  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
>> +       (eq_attr "type" "ghost"))
>> +  "nothing")
> 
> This should be dead code because:
> 
> ;; Ghost instructions produce no real code and introduce no hazards.
> ;; They exist purely to express an effect on dataflow.
> (define_insn_reservation "ghost" 0
>   (eq_attr "type" "ghost")
>   "nothing")
> 
> in mips.md should (by design) trump it.  Let's just remove it.

OK.

> 
>> +;; Reservation for everything else.  Normally, this reservation
>> +;; will only be used to handle cases like compiling
>> +;; for non-loongson CPU with -mtune=loongson2?.
>> +;;
>> +;; !!! This is not a good thing to depend upon the fact that
>> +;; DFA will check reservations in the same order as they appear
>> +;; in the file, but it seems to work for the time being.
>> +;; Anyway, we already use this DFA property heavily with generic.md.
>> +(define_insn_reservation "ls2_unknown" 1
>> +  (eq_attr "cpu" "loongson_2e,loongson_2f")
>> +  "ls2_alu1_core+ls2_alu2_core+ls2_falu1_core+ls2_falu2_core+ls2_mem")
> 
> I disagree that it's a bad idea.  (And as you say, we deliberately
> rely on this for legacy schedulers.)  Let's just tone it down to:
> 
> ;; Reservation for everything else.  Normally, this reservation
> ;; will only be used to handle cases like compiling for non-Loongson
> ;; CPUs with -mtune=loongson2?.
> ;;
> ;; This reservation depends upon the fact that DFA will check
> ;; reservations in the same order as they appear in the file.

I can update the comment, but first I'll cite GCC Internals:

/condition/ [of define_insn_reservation] defines what RTL insns are 
described by this construction. You should remember that you will be in 
trouble if condition for two or more different define_insn_reservation 
constructions is TRUE for an insn. In this case what reservation will be 
used for the insn is not defined.

> 
>> @@ -10143,7 +10331,20 @@ mips_variable_issue (FILE *file ATTRIBUT
>>        vr4130_last_insn = insn;
>>        if (TUNE_74K)
>>  	mips_74k_agen_init (insn);
>> +      else if (TUNE_LOONGSON_2EF)
>> +	mips_ls2_variable_issue (insn);
>>      }
>> +
>> +  if (recog_memoized (insn) >= 0)
>> +    /* Instructions of type 'multi' should all be split before
>> +       second scheduling pass.  */
>> +    {
>> +      bool multi_p;
>> +
>> +      multi_p = (get_attr_type (insn) == TYPE_MULTI);
>> +      gcc_assert (!multi_p || !reload_completed);
>> +    }
>> +
> 
> Simplify this "if" statement to:
> 
>   /* Instructions of type 'multi' should all be split before
>      second scheduling pass.  */
>   gcc_assert (!reload_completed
> 	      || recog_memoized (insn) < 0
> 	      || get_attr_type (insn) != TYPE_MULTI);

Err, I don't like putting functions with side-effects into asserts -- 
I've got burned once ;)  In this case, recog_memoized sets 
INSN_CODE(insn) if it was -1.
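
For reference, the behaviour being described amounts to roughly the
following (a simplified sketch of the inline in recog.h, shown only to
illustrate the caching; it is not meant as a patch):

static inline int
recog_memoized (rtx insn)
{
  /* recog () is only run the first time; the result is cached in
     INSN_CODE, so INSN_CODE (insn) changes from -1 to the matched
     insn code as a side effect of the first query.  */
  if (INSN_CODE (insn) < 0)
    INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
  return INSN_CODE (insn);
}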

I don't want to argue about these nit picks, so, if you prefer, I'll 
change the comment and the check.


Thanks,

Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
       [not found]       ` <48515794.7050007@codesourcery.com>
  2008-06-12 17:21         ` Maxim Kuvyrkov
@ 2008-06-12 18:06         ` Richard Sandiford
  2008-06-13 18:17           ` Maxim Kuvyrkov
  1 sibling, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-06-12 18:06 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Maxim Kuvyrkov wrote:
>> Richard Sandiford wrote:
>
> ...
>
>>> Anyway, this patch looks good, thanks.  A neat use of CPU querying ;)
>
> I believe this patch exposes a latent bug in MIPS backend.
>
> Though the bug is visible only when scheduling for Loongson, my 
> investigation suggests that it can occur on other MIPS architectures too.
>
> Here is the error:
>
> <snip>
> ./cc1 -quiet -march=loongson2e -O2 -o testcase.s testcase.c
> testcase.c: In function '_nl_normalize_codeset':
> testcase.c:211: error: unable to find a register to spill in class 'V1_REG'
> testcase.c:211: error: this is the insn:
> (insn 51 45 47 3 testcase.c:150 (set (reg:SI 11 $11 [279])
>          (unspec:SI [
>                  (const_int 0 [0x0])
>              ] 30)) 570 {tls_get_tp_si} (expr_list:REG_EQUIV (unspec:SI [
>                  (const_int 0 [0x0])
>              ] 30)
>          (nil)))
> testcase.c:211: internal compiler error: in spill_failure, at reload1.c:1999
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> </snip>

FWIW, this is PR 35802.  'Fraid I still haven't had a chance to look at it,
what with IRA, recog-related stuff and reviews.

> The error is triggered by two consecutive tls_get_tp_si instructions. 
> These instructions use "v" constraint, which specifies V1_REG, aka $3.
>
> If no scheduling is performed, then the two tls_get_tp_si instructions 
> are separated by one instruction; when the scheduler is on, they end up 
> together and reload crashes.
>
> In the case when two get_tp instructions are separated by a third 
> instruction, reload manages to remove the second get_tp instruction 
> (instructions are the same, so only one of them is needed) and 
> everything works out well.  If the two instructions are side-by-side 
> reload can't find a spill register for the second get_tp and crashes.
>
> My speculation is that mov<mode>_internal should be taught how to move 
> "v" into "d".  RichardS should know for sure.

It already knows how to do that.  "v" is a subclass of "d", so it's
just a normal GPR<->GPR move.

The problem is that "v" represents a single register: $3.  We initially
generate these instructions with $3 as the result, but the insn predicates
allow it to be replaced by any register_operand later.  Thus we can end
up with a pseudo register that is:

  (a) the output operand of a tls_get_tp_<mode> insn, and must thus
      be reloaded into $3;
  (b) live at the same time as an explicit $3.

We have two choices:

  - always use pseudo registers, rather than introducing uses of $3
    from the outset

  - force the destination of tls_get_tp_<mode> to be $3 only.

The second is probably the most conservative approach, since explicit
uses of $3 can occur through normal calls.  The patch below does this.

I admit I haven't verified any of this yet, so sorry if I'm off base here.
But does the patch fix things?

Richard


gcc/
	* config/mips/predicates.md (v1_operand): New predicate.
	* config/mips/mips.md (tls_get_tp_<mode>): Use it instead of
	register_operand.

Index: gcc/config/mips/predicates.md
===================================================================
--- gcc/config/mips/predicates.md	2008-06-12 18:54:01.000000000 +0100
+++ gcc/config/mips/predicates.md	2008-06-12 19:03:27.000000000 +0100
@@ -76,6 +76,10 @@ (define_predicate "const_0_or_1_operand"
        (ior (match_test "op == CONST0_RTX (GET_MODE (op))")
 	    (match_test "op == CONST1_RTX (GET_MODE (op))"))))
 
+(define_predicate "v1_operand"
+  (and (match_code "reg")
+       (match_test "REGNO (op) == GP_RETURN + 1")))
+
 (define_predicate "d_operand"
   (and (match_code "reg")
        (match_test "TARGET_MIPS16
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	2008-06-12 18:53:41.000000000 +0100
+++ gcc/config/mips/mips.md	2008-06-12 18:53:49.000000000 +0100
@@ -6410,7 +6410,7 @@ (define_insn "*mips16e_save_restore"
 ; accept it.
 
 (define_insn "tls_get_tp_<mode>"
-  [(set (match_operand:P 0 "register_operand" "=v")
+  [(set (match_operand:P 0 "v1_operand" "=v")
 	(unspec:P [(const_int 0)]
 		  UNSPEC_TLS_GET_TP))]
   "HAVE_AS_TLS && !TARGET_MIPS16"

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-06-12 17:21         ` Maxim Kuvyrkov
@ 2008-06-12 18:43           ` Richard Sandiford
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-12 18:43 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Hi Maxim,

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> +AVAIL_NON_MIPS16 (paired_single, TARGET_PAIRED_SINGLE_FLOAT)
> +AVAIL_NON_MIPS16 (sb1_paired_single, TARGET_SB1 && TARGET_PAIRED_SINGLE_FLOAT)
> +AVAIL_NON_MIPS16 (paired_single_no_ls2,
> +		  TARGET_PAIRED_SINGLE_FLOAT && !TARGET_LOONGSON_2EF)

I was hoping we'd match the insn conditions.  So:

+  MOVTF_BUILTINS (c, COND, paired_single_no_ls2),			\

...this would depend on:

AVAIL_NON_MIPS16 (movcc_ps, ISA_HAS_C_COND_PS && ISA_HAS_MOVCC_PS);

> +  CMP_PS_BUILTINS (c, COND, paired_single),				\

...this would depend on:

AVAIL_NON_MIPS16 (c_cond_ps, ISA_HAS_C_COND_PS);

> +  DIRECT_BUILTIN (pll_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single_no_ls2),
> +  DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single_no_ls2),
> +  DIRECT_BUILTIN (plu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single_no_ls2),
> +  DIRECT_BUILTIN (puu_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single_no_ls2),

...these would depend on:

AVAIL_NON_MIPS16 (pxx_ps, ISA_HAS_PXX_PS);

> +  DIRECT_BUILTIN (cvt_ps_s, MIPS_V2SF_FTYPE_SF_SF, paired_single_no_ls2),
> +  DIRECT_BUILTIN (cvt_s_pl, MIPS_SF_FTYPE_V2SF, paired_single_no_ls2),
> +  DIRECT_BUILTIN (cvt_s_pu, MIPS_SF_FTYPE_V2SF, paired_single_no_ls2),

...these would depend on:

AVAIL_NON_MIPS16 (cvt_ps, ISA_HAS_CVT_PS);

> +  DIRECT_BUILTIN (abs_ps, MIPS_V2SF_FTYPE_V2SF, paired_single),

...this would depend on:

AVAIL_NON_MIPS16 (abs_ps, ISA_HAS_ABS_PS);

> +  DIRECT_BUILTIN (alnv_ps, MIPS_V2SF_FTYPE_V2SF_V2SF_INT, paired_single_no_ls2),

...this would depend on:

AVAIL_NON_MIPS16 (alnv_ps, ISA_HAS_ALNV_PS);

I think that's all of the paired_single{,_no_ls2} uses sorted out.

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-06-12 18:04         ` Maxim Kuvyrkov
@ 2008-06-12 18:53           ` Richard Sandiford
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-12 18:53 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher, vmakarov

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>> +;; Reservation for everything else.  Normally, this reservation
>>> +;; will only be used to handle cases like compiling
>>> +;; for non-loongson CPU with -mtune=loongson2?.
>>> +;;
>>> +;; !!! This is not a good thing to depend upon the fact that
>>> +;; DFA will check reservations in the same order as they appear
>>> +;; in the file, but it seems to work for the time being.
>>> +;; Anyway, we already use this DFA property heavily with generic.md.
>>> +(define_insn_reservation "ls2_unknown" 1
>>> +  (eq_attr "cpu" "loongson_2e,loongson_2f")
>>> +  "ls2_alu1_core+ls2_alu2_core+ls2_falu1_core+ls2_falu2_core+ls2_mem")
>> 
>> I disagree that it's a bad idea.  (And as you say, we deliberately
>> rely on this for legacy schedulers.)  Let's just tone it down to:
>> 
>> ;; Reservation for everything else.  Normally, this reservation
>> ;; will only be used to handle cases like compiling for non-Loongson
>> ;; CPUs with -mtune=loongson2?.
>> ;;
>> ;; This reservation depends upon the fact that DFA will check
>> ;; reservations in the same order as they appear in the file.
>
> I can update the comment,

Thanks.

> but first I'll cite GCC Internals:
>
> /condition/ [of define_insn_reservation] defines what RTL insns are 
> described by this construction. You should remember that you will be in 
> trouble if condition for two or more different define_insn_reservation 
> constructions is TRUE for an insn. In this case what reservation will be 
> used for the insn is not defined.

Vlad, I think we should remove this, and guarantee .md-file order,
like we do for other .md things like insn patterns.  Would that be OK?

>> Simplify this "if" statement to:
>> 
>>   /* Instructions of type 'multi' should all be split before
>>      second scheduling pass.  */
>>   gcc_assert (!reload_completed
>> 	      || recog_memoized (insn) < 0
>> 	      || get_attr_type (insn) != TYPE_MULTI);
>
> Err, I don't like putting functions with side-effects into asserts -- 
> I've got burned once ;)  In this case, recog_memoized sets 
> INSN_CODE(insn) if it was -1.

But that's not conceptually a side-effect.  You can't sensibly
schedule something without finding out what it is first.

> I don't want to argue about these nit picks, so, if you prefer, I'll 
> change the comment and the check.

Please do, thanks.  BTW, just noticed a missing "the"; should be:

   /* Instructions of type 'multi' should all be split before
      the second scheduling pass.  */

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-06-09 17:45         ` Richard Sandiford
@ 2008-06-13  6:59           ` Richard Sandiford
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-13  6:59 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Richard Sandiford <rdsandiford@googlemail.com> writes:
> [...] I'd already written a patch to replace the
> current target_flags/mips_bdesc_arrays stuff with predicate functions.
> I'd written it as part of a patch to add R10000 cache barriers that
> I sent out a while ago.  The interest in that patch seems to have
> waned, but it looks like the builtins patch is useful on its own.
>
> With this, you should be able to attach accurate ISA_HAS_* predicates
> to each paired-single function.  Let me know if it works.

Tested on mipsisa64-elfoabi and applied to mainline.

> gcc/
> 	* config/mips/mips.c (BUILTIN_AVAIL_NON_MIPS16): New macro.
> 	(AVAIL_NON_MIPS16): Likewise.
> 	(mips_builtin_description): Replace target_flags with a predicate.
> 	(paired_single, sb1_paired_single, mips3d, dsp, dspr2, dsp_32)
> 	(dspr2_32): New availability predicates.
> 	(MIPS_BUILTIN): New macro.
> 	(DIRECT_BUILTIN, CMP_SCALAR_BUILTINS, CMP_PS_BUILTINS)
> 	(CMP_4S_BUILTINS, MOVTF_BUILTINS, CMP_BUILTINS)
> 	(DIRECT_NO_TARGET_BUILTIN, BPOSGE_BUILTIN): Use it.
> 	Replace the TARGET_FLAGS parameters with AVAIL parameters.
> 	(mips_ps_bdesc, mips_sb1_bdesc, mips_dsp_bdesc)
> 	(mips_dsp_32only_bdesc): Merge into...
> 	(mips_builtins): ...this new array.
> 	(mips_bdesc_map, mips_bdesc_arrays): Delete.
> 	(mips_init_builtins): Update after above changes.
> 	(mips_expand_builtin_1): Merge into...
> 	(mips_expand_builtin): ...here and update after above changes.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-06-12 18:06         ` Richard Sandiford
@ 2008-06-13 18:17           ` Maxim Kuvyrkov
  2008-06-14  8:32             ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-13 18:17 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard Sandiford wrote:

...

> FWIW, this is PR 35802.  'Fraid I still haven't had chance to look at it,
> what with IRA, recog-related stuff and reviews.

...

> We have two choices:
> 
>   - always use pseudo registers, rather than introducing uses of $3
>     from the outset
> 
>   - force the destination of tls_get_tp_<mode> to be $3 only.

I don't see how the first approach can be any better than the second.  We
will allocate register $3 for all those pseudos in the end.

> 
> The second is probably the most conservative approach, since explicit
> uses of $3 can occur through normal calls.  The patch below does this.
> 
> I admit I haven't verified any of this yet, so sorry if I'm off-ball.
> But does the patch fix things?

Looks like it does.  I didn't run the full regression testsuite, but
glibc now builds.

--
Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-12  9:03                       ` Richard Sandiford
@ 2008-06-13 18:36                         ` Maxim Kuvyrkov
  2008-06-14  8:20                           ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-13 18:36 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 825 bytes --]

Richard Sandiford wrote:

...

> Well, I wasn't talking about adding optimisations, so the reply was
> a bit confusing ;)  Looks like we're really in violent agreement though.
> 
>>> PS. Just as a reminder, I think it would be a good idea to have
>>>     scan-assembler tests too, as a separate submission.
>> This is on my list.

Here is the latest version of the patch.  The only thing changed from the
previous version is that I replaced 'plus' in loongson_paddd and 'minus' 
in loongson_psubd with unspecs.

It was tested with -march={mips3, loongson2?}/-mabi=n32,32,64 on 
Loongson-2E and Loongson-2F boxes.

I believe this patch is approved, so I'll check it in sometime tomorrow. 
  I still owe a scan-assembler test for the intrinsics, though.
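
For the record, the kind of scan-assembler test I have in mind would be
along these lines (just a sketch -- the function, the choice of
intrinsic and the exact scan pattern are illustrative, not the test
I'll actually submit):

/* { dg-do compile } */
/* { dg-require-effective-target mips_loongson } */

#include "loongson.h"

uint16x4_t
add_uh (uint16x4_t s, uint16x4_t t)
{
  /* paddh_u should expand to the Loongson paddh instruction.  */
  return paddh_u (s, t);
}

/* { dg-final { scan-assembler "\tpaddh\t" } } */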

Richard, thanks again for reviewing and helping with this patch.

--
Maxim

[-- Attachment #2: fsf-ls2ef-2-vector.ChangeLog --]
[-- Type: text/plain, Size: 2393 bytes --]

	* config/mips/mips-modes.def: Add V8QI, V4HI and V2SI modes.
	* config/mips/mips-protos.h (mips_expand_vector_init): New.
	* config/mips/mips-ftypes.def: Add function types for Loongson-2E/2F
	builtins.
	* config/mips/mips.c (mips_split_doubleword_move): Handle new modes.
	(mips_hard_regno_mode_ok_p): Allow 64-bit vector modes for Loongson.
	(mips_vector_mode_supported_p): Add V2SImode, V4HImode and
	V8QImode cases.
	(LOONGSON_BUILTIN, LOONGSON_BUILTIN_ALIAS): New.
	(CODE_FOR_loongson_packsswh, CODE_FOR_loongson_packsshb,
	(CODE_FOR_loongson_packushb, CODE_FOR_loongson_paddw,
	(CODE_FOR_loongson_paddh, CODE_FOR_loongson_paddb,
	(CODE_FOR_loongson_paddsh, CODE_FOR_loongson_paddsb)
	(CODE_FOR_loongson_paddush, CODE_FOR_loongson_paddusb)
	(CODE_FOR_loongson_pmaxsh, CODE_FOR_loongson_pmaxub)
	(CODE_FOR_loongson_pminsh, CODE_FOR_loongson_pminub)
	(CODE_FOR_loongson_pmulhuh, CODE_FOR_loongson_pmulhh)
	(CODE_FOR_loongson_biadd, CODE_FOR_loongson_psubw)
	(CODE_FOR_loongson_psubh, CODE_FOR_loongson_psubb)
	(CODE_FOR_loongson_psubsh, CODE_FOR_loongson_psubsb)
	(CODE_FOR_loongson_psubush, CODE_FOR_loongson_psubusb)
	(CODE_FOR_loongson_punpckhbh, CODE_FOR_loongson_punpckhhw)
	(CODE_FOR_loongson_punpckhwd, CODE_FOR_loongson_punpcklbh)
	(CODE_FOR_loongson_punpcklhw, CODE_FOR_loongson_punpcklwd): New.
	(mips_builtins): Add Loongson builtins.
	(mips_loongson_2ef_bdesc): New.
	(mips_bdesc_arrays): Add mips_loongson_2ef_bdesc.
	(mips_builtin_vector_type): Handle unsigned versions of vector modes.
	(MIPS_ATYPE_UQI, MIPS_ATYPE_UDI, MIPS_ATYPE_V2SI, MIPS_ATYPE_UV2SI)
	(MIPS_ATYPE_V4HI, MIPS_ATYPE_UV4HI, MIPS_ATYPE_V8QI, MIPS_ATYPE_UV8QI):
	New.
	(mips_expand_vector_init): New.
	* config/mips/mips.h (HAVE_LOONGSON_VECTOR_MODES): New.
	(TARGET_CPU_CPP_BUILTINS): Define __mips_loongson_vector_rev
	if appropriate.
	* config/mips/mips.md: Add unspec numbers for Loongson
	builtins.  Include loongson.md.
	(MOVE64): Include Loongson vector modes.
	(SPLITF): Include Loongson vector modes.
	(HALFMODE): Handle Loongson vector modes.
	* config/mips/loongson.md: New.
	* config/mips/loongson.h: New.
	* config.gcc: Add loongson.h header for mips*-*-* targets.
	* doc/extend.texi (MIPS Loongson Built-in Functions): New.

2008-06-13  Mark Shinwell  <shinwell@codesourcery.com>

	* lib/target-supports.exp (check_effective_target_mips_loongson): New.
	* gcc.target/mips/loongson-simd.c: New.

[-- Attachment #3: fsf-ls2ef-2-vector.patch --]
[-- Type: text/plain, Size: 107570 bytes --]

--- gcc/doc/extend.texi	(/local/gcc-1)	(revision 556)
+++ gcc/doc/extend.texi	(/local/gcc-2)	(revision 556)
@@ -6788,6 +6788,7 @@ instructions, but allow the compiler to 
 * X86 Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
+* MIPS Loongson Built-in Functions::
 * PowerPC AltiVec Built-in Functions::
 * SPARC VIS Built-in Functions::
 * SPU Built-in Functions::
@@ -8667,6 +8668,132 @@ value is the upper one.  The opposite or
 For example, the code above will set the lower half of @code{a} to
 @code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
 
+@node MIPS Loongson Built-in Functions
+@subsection MIPS Loongson Built-in Functions
+
+GCC provides intrinsics to access the SIMD instructions provided by the
+ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
+available after inclusion of the @code{loongson.h} header file,
+operate on the following 64-bit vector types:
+
+@itemize
+@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
+@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
+@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
+@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
+@item @code{int16x4_t}, a vector of four signed 16-bit integers;
+@item @code{int32x2_t}, a vector of two signed 32-bit integers.
+@end itemize
+
+The intrinsics provided are listed below; each is named after the
+machine instruction to which it corresponds, with suffixes added as
+appropriate to distinguish intrinsics that expand to the same machine
+instruction yet have different argument types.  Refer to the architecture
+documentation for a description of the functionality of each
+instruction.
+
+@smallexample
+int16x4_t packsswh (int32x2_t s, int32x2_t t);
+int8x8_t packsshb (int16x4_t s, int16x4_t t);
+uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
+uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t paddw_s (int32x2_t s, int32x2_t t);
+int16x4_t paddh_s (int16x4_t s, int16x4_t t);
+int8x8_t paddb_s (int8x8_t s, int8x8_t t);
+uint64_t paddd_u (uint64_t s, uint64_t t);
+int64_t paddd_s (int64_t s, int64_t t);
+int16x4_t paddsh (int16x4_t s, int16x4_t t);
+int8x8_t paddsb (int8x8_t s, int8x8_t t);
+uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
+uint64_t pandn_ud (uint64_t s, uint64_t t);
+uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
+uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
+int64_t pandn_sd (int64_t s, int64_t t);
+int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
+int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
+int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
+uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
+uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
+uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
+uint16x4_t pextrh_u (uint16x4_t s, int field);
+int16x4_t pextrh_s (int16x4_t s, int field);
+uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
+int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
+int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
+int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
+uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
+int16x4_t pminsh (int16x4_t s, int16x4_t t);
+uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
+uint8x8_t pmovmskb_u (uint8x8_t s);
+int8x8_t pmovmskb_s (int8x8_t s);
+uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
+int16x4_t pmulhh (int16x4_t s, int16x4_t t);
+int16x4_t pmullh (int16x4_t s, int16x4_t t);
+uint64_t pmuluw (uint32x2_t s, uint32x2_t t);
+uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
+uint16x4_t biadd (uint8x8_t s);
+uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
+uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
+int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
+uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psllh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psllw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrah_s (int16x4_t s, uint8_t amount);
+uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psraw_s (int32x2_t s, uint8_t amount);
+uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t psubw_s (int32x2_t s, int32x2_t t);
+int16x4_t psubh_s (int16x4_t s, int16x4_t t);
+int8x8_t psubb_s (int8x8_t s, int8x8_t t);
+uint64_t psubd_u (uint64_t s, uint64_t t);
+int64_t psubd_s (int64_t s, int64_t t);
+int16x4_t psubsh (int16x4_t s, int16x4_t t);
+int8x8_t psubsb (int8x8_t s, int8x8_t t);
+uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
+uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
+uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
+@end smallexample
+
 @menu
 * Paired-Single Arithmetic::
 * Paired-Single Built-in Functions::
--- gcc/testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-1)	(revision 556)
+++ gcc/testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-2)	(revision 556)
@@ -0,0 +1,1963 @@
+/* Test cases for ST Microelectronics Loongson-2E/2F SIMD intrinsics.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target mips_loongson } */
+
+#include "loongson.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+#include <limits.h>
+
+typedef union { int32x2_t v; int32_t a[2]; } int32x2_encap_t;
+typedef union { int16x4_t v; int16_t a[4]; } int16x4_encap_t;
+typedef union { int8x8_t v; int8_t a[8]; } int8x8_encap_t;
+typedef union { uint32x2_t v; uint32_t a[2]; } uint32x2_encap_t;
+typedef union { uint16x4_t v; uint16_t a[4]; } uint16x4_encap_t;
+typedef union { uint8x8_t v; uint8_t a[8]; } uint8x8_encap_t;
+
+#define UINT16x4_MAX USHRT_MAX
+#define UINT8x8_MAX UCHAR_MAX
+#define INT8x8_MAX SCHAR_MAX
+#define INT16x4_MAX SHRT_MAX
+#define INT32x2_MAX INT_MAX
+
+static void test_packsswh (void)
+{
+  int32x2_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = INT16x4_MAX - 2;
+  s.a[1] = INT16x4_MAX - 1;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX + 1;
+  r.v = packsswh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 2);
+  assert (r.a[1] == INT16x4_MAX - 1);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_packsshb (void)
+{
+  int16x4_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = INT8x8_MAX - 6;
+  s.a[1] = INT8x8_MAX - 5;
+  s.a[2] = INT8x8_MAX - 4;
+  s.a[3] = INT8x8_MAX - 3;
+  t.a[0] = INT8x8_MAX - 2;
+  t.a[1] = INT8x8_MAX - 1;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX + 1;
+  r.v = packsshb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_packushb (void)
+{
+  uint16x4_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = UINT8x8_MAX - 6;
+  s.a[1] = UINT8x8_MAX - 5;
+  s.a[2] = UINT8x8_MAX - 4;
+  s.a[3] = UINT8x8_MAX - 3;
+  t.a[0] = UINT8x8_MAX - 2;
+  t.a[1] = UINT8x8_MAX - 1;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX + 1;
+  r.v = packushb (s.v, t.v);
+  assert (r.a[0] == UINT8x8_MAX - 6);
+  assert (r.a[1] == UINT8x8_MAX - 5);
+  assert (r.a[2] == UINT8x8_MAX - 4);
+  assert (r.a[3] == UINT8x8_MAX - 3);
+  assert (r.a[4] == UINT8x8_MAX - 2);
+  assert (r.a[5] == UINT8x8_MAX - 1);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_paddw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  r.v = paddw_u (s.v, t.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 6);
+}
+
+static void test_paddw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  r.v = paddw_s (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_paddh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  r.v = paddh_u (s.v, t.v);
+  assert (r.a[0] == 6);
+  assert (r.a[1] == 8);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 12);
+}
+
+static void test_paddh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = paddh_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+}
+
+static void test_paddb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 5;
+  s.a[5] = 6;
+  s.a[6] = 7;
+  s.a[7] = 8;
+  t.a[0] = 9;
+  t.a[1] = 10;
+  t.a[2] = 11;
+  t.a[3] = 12;
+  t.a[4] = 13;
+  t.a[5] = 14;
+  t.a[6] = 15;
+  t.a[7] = 16;
+  r.v = paddb_u (s.v, t.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 12);
+  assert (r.a[2] == 14);
+  assert (r.a[3] == 16);
+  assert (r.a[4] == 18);
+  assert (r.a[5] == 20);
+  assert (r.a[6] == 22);
+  assert (r.a[7] == 24);
+}
+
+static void test_paddb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = paddb_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+  assert (r.a[4] == -45);
+  assert (r.a[5] == -54);
+  assert (r.a[6] == -63);
+  assert (r.a[7] == -72);
+}
+
+static void test_paddd_u (void)
+{
+  uint64_t d = 123456;
+  uint64_t e = 789012;
+  uint64_t r;
+  r = paddd_u (d, e);
+  assert (r == 912468);
+}
+
+static void test_paddd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = paddd_s (d, e);
+  assert (r == -665556);
+}
+
+static void test_paddsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX;
+  t.a[2] = INT16x4_MAX;
+  t.a[3] = INT16x4_MAX;
+  r.v = paddsh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_paddsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = INT8x8_MAX;
+  t.a[1] = INT8x8_MAX;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX;
+  t.a[4] = INT8x8_MAX;
+  t.a[5] = INT8x8_MAX;
+  t.a[6] = INT8x8_MAX;
+  t.a[7] = INT8x8_MAX;
+  r.v = paddsb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_paddush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  t.a[0] = UINT16x4_MAX;
+  t.a[1] = UINT16x4_MAX;
+  t.a[2] = UINT16x4_MAX;
+  t.a[3] = UINT16x4_MAX;
+  r.v = paddush (s.v, t.v);
+  assert (r.a[0] == UINT16x4_MAX);
+  assert (r.a[1] == UINT16x4_MAX);
+  assert (r.a[2] == UINT16x4_MAX);
+  assert (r.a[3] == UINT16x4_MAX);
+}
+
+static void test_paddusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  s.a[4] = 0;
+  s.a[5] = 1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = UINT8x8_MAX;
+  t.a[1] = UINT8x8_MAX;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX;
+  t.a[4] = UINT8x8_MAX;
+  t.a[5] = UINT8x8_MAX;
+  t.a[6] = UINT8x8_MAX;
+  t.a[7] = UINT8x8_MAX;
+  r.v = paddusb (s.v, t.v);
+  assert (r.a[0] == UINT8x8_MAX);
+  assert (r.a[1] == UINT8x8_MAX);
+  assert (r.a[2] == UINT8x8_MAX);
+  assert (r.a[3] == UINT8x8_MAX);
+  assert (r.a[4] == UINT8x8_MAX);
+  assert (r.a[5] == UINT8x8_MAX);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_pandn_ud (void)
+{
+  uint64_t d1 = 0x0000ffff0000ffffull;
+  uint64_t d2 = 0x0000ffff0000ffffull;
+  uint64_t r;
+  r = pandn_ud (d1, d2);
+  assert (r == 0);
+}
+
+static void test_pandn_sd (void)
+{
+  int64_t d1 = (int64_t) 0x0000000000000000ull;
+  int64_t d2 = (int64_t) 0xfffffffffffffffeull;
+  int64_t r;
+  r = pandn_sd (d1, d2);
+  assert (r == -2);
+}
+
+static void test_pandn_uw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0x00000000;
+  t.a[1] = 0xffffffff;
+  r.v = pandn_uw (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pandn_sw (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0xffffffff;
+  t.a[1] = 0xfffffffe;
+  r.v = pandn_sw (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+}
+
+static void test_pandn_uh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0x0000;
+  t.a[1] = 0xffff;
+  t.a[2] = 0x0000;
+  t.a[3] = 0xffff;
+  r.v = pandn_uh (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pandn_sh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0xffff;
+  t.a[1] = 0xfffe;
+  t.a[2] = 0xffff;
+  t.a[3] = 0xfffe;
+  r.v = pandn_sh (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+}
+
+static void test_pandn_ub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0x00;
+  t.a[1] = 0xff;
+  t.a[2] = 0x00;
+  t.a[3] = 0xff;
+  t.a[4] = 0x00;
+  t.a[5] = 0xff;
+  t.a[6] = 0x00;
+  t.a[7] = 0xff;
+  r.v = pandn_ub (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pandn_sb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0xff;
+  t.a[1] = 0xfe;
+  t.a[2] = 0xff;
+  t.a[3] = 0xfe;
+  t.a[4] = 0xff;
+  t.a[5] = 0xfe;
+  t.a[6] = 0xff;
+  t.a[7] = 0xfe;
+  r.v = pandn_sb (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -2);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -2);
+}
+
+static void test_pavgh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  r.v = pavgh (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+}
+
+static void test_pavgb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 1;
+  s.a[5] = 2;
+  s.a[6] = 3;
+  s.a[7] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = pavgb (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+  assert (r.a[4] == 3);
+  assert (r.a[5] == 4);
+  assert (r.a[6] == 5);
+  assert (r.a[7] == 6);
+}
+
+static void test_pcmpeqw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  r.v = pcmpeqw_u (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpeqh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  r.v = pcmpeqh_u (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpeqb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 42;
+  s.a[5] = 43;
+  s.a[6] = 42;
+  s.a[7] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  t.a[4] = 43;
+  t.a[5] = 43;
+  t.a[6] = 43;
+  t.a[7] = 43;
+  r.v = pcmpeqb_u (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpeqw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  r.v = pcmpeqw_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+}
+
+static void test_pcmpeqh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  r.v = pcmpeqh_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpeqb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = -42;
+  s.a[5] = -42;
+  s.a[6] = -42;
+  s.a[7] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = -42;
+  t.a[6] = 42;
+  t.a[7] = -42;
+  r.v = pcmpeqb_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -1);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -1);
+}
+
+static void test_pcmpgtw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 42;
+  r.v = pcmpgtw_u (s.v, t.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpgth_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 40;
+  t.a[1] = 41;
+  t.a[2] = 43;
+  t.a[3] = 42;
+  r.v = pcmpgth_u (s.v, t.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0x0000);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpgtb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 44;
+  s.a[5] = 45;
+  s.a[6] = 46;
+  s.a[7] = 47;
+  t.a[0] = 48;
+  t.a[1] = 47;
+  t.a[2] = 46;
+  t.a[3] = 45;
+  t.a[4] = 44;
+  t.a[5] = 43;
+  t.a[6] = 42;
+  t.a[7] = 41;
+  r.v = pcmpgtb_u (s.v, t.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0x00);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0x00);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0xff);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpgtw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = -42;
+  t.a[0] = -42;
+  t.a[1] = -42;
+  r.v = pcmpgtw_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 0);
+}
+
+static void test_pcmpgth_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = 43;
+  t.a[2] = 44;
+  t.a[3] = -43;
+  r.v = pcmpgth_s (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpgtb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = 42;
+  s.a[5] = 42;
+  s.a[6] = 42;
+  s.a[7] = 42;
+  t.a[0] = -45;
+  t.a[1] = -44;
+  t.a[2] = -43;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = 43;
+  t.a[6] = 41;
+  t.a[7] = 40;
+  r.v = pcmpgtb_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == -1);
+  assert (r.a[7] == -1);
+}
+
+static void test_pextrh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  r.v = pextrh_u (s.v, 1);
+  assert (r.a[0] == 41);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pextrh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -40;
+  s.a[1] = -41;
+  s.a[2] = -42;
+  s.a[3] = -43;
+  r.v = pextrh_s (s.v, 2);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pinsrh_0123_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  r.v = pinsrh_0_u (t.v, s.v);
+  r.v = pinsrh_1_u (r.v, s.v);
+  r.v = pinsrh_2_u (r.v, s.v);
+  r.v = pinsrh_3_u (r.v, s.v);
+  assert (r.a[0] == 42);
+  assert (r.a[1] == 42);
+  assert (r.a[2] == 42);
+  assert (r.a[3] == 42);
+}
+
+static void test_pinsrh_0123_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  r.v = pinsrh_0_s (t.v, s.v);
+  r.v = pinsrh_1_s (r.v, s.v);
+  r.v = pinsrh_2_s (r.v, s.v);
+  r.v = pinsrh_3_s (r.v, s.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == -42);
+  assert (r.a[2] == -42);
+  assert (r.a[3] == -42);
+}
+
+static void test_pmaddhw (void)
+{
+  int16x4_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -5;
+  s.a[1] = -4;
+  s.a[2] = -3;
+  s.a[3] = -2;
+  t.a[0] = 10;
+  t.a[1] = 11;
+  t.a[2] = 12;
+  t.a[3] = 13;
+  r.v = pmaddhw (s.v, t.v);
+  assert (r.a[0] == (-5*10 + -4*11));
+  assert (r.a[1] == (-3*12 + -2*13));
+}
+
+static void test_pmaxsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  r.v = pmaxsh (s.v, t.v);
+  assert (r.a[0] == 20);
+  assert (r.a[1] == 40);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 50);
+}
+
+static void test_pmaxub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pmaxub (s.v, t.v);
+  assert (r.a[0] == 80);
+  assert (r.a[1] == 70);
+  assert (r.a[2] == 60);
+  assert (r.a[3] == 50);
+  assert (r.a[4] == 50);
+  assert (r.a[5] == 60);
+  assert (r.a[6] == 70);
+  assert (r.a[7] == 80);
+}
+
+static void test_pminsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  r.v = pminsh (s.v, t.v);
+  assert (r.a[0] == -20);
+  assert (r.a[1] == -40);
+  assert (r.a[2] == -10);
+  assert (r.a[3] == -50);
+}
+
+static void test_pminub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pminub (s.v, t.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 20);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 40);
+  assert (r.a[4] == 40);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 20);
+  assert (r.a[7] == 10);
+}
+
+static void test_pmovmskb_u (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_encap_t r;
+  s.a[0] = 0xf0;
+  s.a[1] = 0x40;
+  s.a[2] = 0xf0;
+  s.a[3] = 0x40;
+  s.a[4] = 0xf0;
+  s.a[5] = 0x40;
+  s.a[6] = 0xf0;
+  s.a[7] = 0x40;
+  r.v = pmovmskb_u (s.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmovmskb_s (void)
+{
+  int8x8_encap_t s;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 1;
+  s.a[2] = -1;
+  s.a[3] = 1;
+  s.a[4] = -1;
+  s.a[5] = 1;
+  s.a[6] = -1;
+  s.a[7] = 1;
+  r.v = pmovmskb_s (s.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmulhuh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0xff00;
+  s.a[1] = 0xff00;
+  s.a[2] = 0xff00;
+  s.a[3] = 0xff00;
+  t.a[0] = 16;
+  t.a[1] = 16;
+  t.a[2] = 16;
+  t.a[3] = 16;
+  r.v = pmulhuh (s.v, t.v);
+  assert (r.a[0] == 0x000f);
+  assert (r.a[1] == 0x000f);
+  assert (r.a[2] == 0x000f);
+  assert (r.a[3] == 0x000f);
+}
+
+static void test_pmulhh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  r.v = pmulhh (s.v, t.v);
+  assert (r.a[0] == -16);
+  assert (r.a[1] == -16);
+  assert (r.a[2] == -16);
+  assert (r.a[3] == -16);
+}
+
+static void test_pmullh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  r.v = pmullh (s.v, t.v);
+  assert (r.a[0] == 4096);
+  assert (r.a[1] == 4096);
+  assert (r.a[2] == 4096);
+  assert (r.a[3] == 4096);
+}
+
+static void test_pmuluw (void)
+{
+  uint32x2_encap_t s, t;
+  uint64_t r;
+  s.a[0] = 0xdeadbeef;
+  s.a[1] = 0;
+  t.a[0] = 0x0f00baaa;
+  t.a[1] = 0;
+  r = pmuluw (s.v, t.v);
+  assert (r == 0xd0cd08e1d1a70b6ull);
+}
+
+static void test_pasubub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = pasubub (s.v, t.v);
+  assert (r.a[0] == 70);
+  assert (r.a[1] == 50);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 10);
+  assert (r.a[4] == 10);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 50);
+  assert (r.a[7] == 70);
+}
+
+static void test_biadd (void)
+{
+  uint8x8_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  r.v = biadd (s.v);
+  assert (r.a[0] == 360);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psadbh (void)
+{
+  uint8x8_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  r.v = psadbh (s.v, t.v);
+  assert (r.a[0] == 0x0140);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pshufh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r.v = pshufh_u (r.v, s.v, 0xe5);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_pshufh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 2;
+  s.a[2] = -3;
+  s.a[3] = 4;
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r.v = pshufh_s (r.v, s.v, 0xe5);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+}
+
+static void test_psllh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0xffff;
+  s.a[2] = 0xffff;
+  s.a[3] = 0xffff;
+  r.v = psllh_u (s.v, 1);
+  assert (r.a[0] == 0xfffe);
+  assert (r.a[1] == 0xfffe);
+  assert (r.a[2] == 0xfffe);
+  assert (r.a[3] == 0xfffe);
+}
+
+static void test_psllw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0xffffffff;
+  r.v = psllw_u (s.v, 2);
+  assert (r.a[0] == 0xfffffffc);
+  assert (r.a[1] == 0xfffffffc);
+}
+
+static void test_psllh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  r.v = psllh_s (s.v, 1);
+  assert (r.a[0] == -2);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == -2);
+  assert (r.a[3] == -2);
+}
+
+static void test_psllw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  r.v = psllw_s (s.v, 2);
+  assert (r.a[0] == -4);
+  assert (r.a[1] == -4);
+}
+
+static void test_psrah_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  r.v = psrah_u (s.v, 1);
+  assert (r.a[0] == 0xfff7);
+  assert (r.a[1] == 0xfff7);
+  assert (r.a[2] == 0xfff7);
+  assert (r.a[3] == 0xfff7);
+}
+
+static void test_psraw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  r.v = psraw_u (s.v, 1);
+  assert (r.a[0] == 0xfffffff7);
+  assert (r.a[1] == 0xfffffff7);
+}
+
+static void test_psrah_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s.a[2] = -2;
+  s.a[3] = -2;
+  r.v = psrah_s (s.v, 1);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == -1);
+}
+
+static void test_psraw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  r.v = psraw_s (s.v, 1);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+}
+
+static void test_psrlh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  r.v = psrlh_u (s.v, 1);
+  assert (r.a[0] == 0x7ff7);
+  assert (r.a[1] == 0x7ff7);
+  assert (r.a[2] == 0x7ff7);
+  assert (r.a[3] == 0x7ff7);
+}
+
+static void test_psrlw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  r.v = psrlw_u (s.v, 1);
+  assert (r.a[0] == 0x7ffffff7);
+  assert (r.a[1] == 0x7ffffff7);
+}
+
+static void test_psrlh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  r.v = psrlh_s (s.v, 1);
+  assert (r.a[0] == INT16x4_MAX);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psrlw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  r.v = psrlw_s (s.v, 1);
+  assert (r.a[0] == INT32x2_MAX);
+  assert (r.a[1] == INT32x2_MAX);
+}
+
+static void test_psubw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 3;
+  s.a[1] = 4;
+  t.a[0] = 2;
+  t.a[1] = 1;
+  r.v = psubw_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = -4;
+  r.v = psubw_s (s.v, t.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 5;
+  s.a[1] = 6;
+  s.a[2] = 7;
+  s.a[3] = 8;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = psubh_u (s.v, t.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 4);
+  assert (r.a[3] == 4);
+}
+
+static void test_psubh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  r.v = psubh_s (s.v, t.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+}
+
+static void test_psubb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 11;
+  s.a[2] = 12;
+  s.a[3] = 13;
+  s.a[4] = 14;
+  s.a[5] = 15;
+  s.a[6] = 16;
+  s.a[7] = 17;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = psubb_u (s.v, t.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 9);
+  assert (r.a[2] == 9);
+  assert (r.a[3] == 9);
+  assert (r.a[4] == 9);
+  assert (r.a[5] == 9);
+  assert (r.a[6] == 9);
+  assert (r.a[7] == 9);
+}
+
+static void test_psubb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  r.v = psubb_s (s.v, t.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+  assert (r.a[4] == -55);
+  assert (r.a[5] == -66);
+  assert (r.a[6] == -77);
+  assert (r.a[7] == -88);
+}
+
+static void test_psubd_u (void)
+{
+  uint64_t d = 789012;
+  uint64_t e = 123456;
+  uint64_t r;
+  r = psubd_u (d, e);
+  assert (r == 665556);
+}
+
+static void test_psubd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = psubd_s (d, e);
+  assert (r == 912468);
+}
+
+static void test_psubsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = -INT16x4_MAX;
+  t.a[1] = -INT16x4_MAX;
+  t.a[2] = -INT16x4_MAX;
+  t.a[3] = -INT16x4_MAX;
+  r.v = psubsh (s.v, t.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psubsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = -INT8x8_MAX;
+  t.a[1] = -INT8x8_MAX;
+  t.a[2] = -INT8x8_MAX;
+  t.a[3] = -INT8x8_MAX;
+  t.a[4] = -INT8x8_MAX;
+  t.a[5] = -INT8x8_MAX;
+  t.a[6] = -INT8x8_MAX;
+  t.a[7] = -INT8x8_MAX;
+  r.v = psubsb (s.v, t.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_psubush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  r.v = psubush (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psubusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  s.a[4] = 4;
+  s.a[5] = 5;
+  s.a[6] = 6;
+  s.a[7] = 7;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  t.a[4] = 5;
+  t.a[5] = 5;
+  t.a[6] = 7;
+  t.a[7] = 7;
+  r.v = psubusb (s.v, t.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_punpckhbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpckhbh_s (s.v, t.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == -11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == -13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == -15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpckhbh_u (s.v, t.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == 11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == 13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == 15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  r.v = punpckhhw_s (s.v, t.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == -6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  r.v = punpckhhw_u (s.v, t.v);
+  assert (r.a[0] == 5);
+  assert (r.a[1] == 6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = -4;
+  r.v = punpckhwd_s (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == -4);
+}
+
+static void test_punpckhwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  r.v = punpckhwd_u (s.v, t.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+}
+
+static void test_punpcklbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpcklbh_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == -5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == -7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  r.v = punpcklbh_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == 5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == 7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  r.v = punpcklhw_s (s.v, t.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  r.v = punpcklhw_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  r.v = punpcklwd_s (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == -2);
+}
+
+static void test_punpcklwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  r.v = punpcklwd_u (s.v, t.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+}
+
+int main (void)
+{
+  test_packsswh ();
+  test_packsshb ();
+  test_packushb ();
+  test_paddw_u ();
+  test_paddw_s ();
+  test_paddh_u ();
+  test_paddh_s ();
+  test_paddb_u ();
+  test_paddb_s ();
+  test_paddd_u ();
+  test_paddd_s ();
+  test_paddsh ();
+  test_paddsb ();
+  test_paddush ();
+  test_paddusb ();
+  test_pandn_ud ();
+  test_pandn_sd ();
+  test_pandn_uw ();
+  test_pandn_sw ();
+  test_pandn_uh ();
+  test_pandn_sh ();
+  test_pandn_ub ();
+  test_pandn_sb ();
+  test_pavgh ();
+  test_pavgb ();
+  test_pcmpeqw_u ();
+  test_pcmpeqh_u ();
+  test_pcmpeqb_u ();
+  test_pcmpeqw_s ();
+  test_pcmpeqh_s ();
+  test_pcmpeqb_s ();
+  test_pcmpgtw_u ();
+  test_pcmpgth_u ();
+  test_pcmpgtb_u ();
+  test_pcmpgtw_s ();
+  test_pcmpgth_s ();
+  test_pcmpgtb_s ();
+  test_pextrh_u ();
+  test_pextrh_s ();
+  test_pinsrh_0123_u ();
+  test_pinsrh_0123_s ();
+  test_pmaddhw ();
+  test_pmaxsh ();
+  test_pmaxub ();
+  test_pminsh ();
+  test_pminub ();
+  test_pmovmskb_u ();
+  test_pmovmskb_s ();
+  test_pmulhuh ();
+  test_pmulhh ();
+  test_pmullh ();
+  test_pmuluw ();
+  test_pasubub ();
+  test_biadd ();
+  test_psadbh ();
+  test_pshufh_u ();
+  test_pshufh_s ();
+  test_psllh_u ();
+  test_psllw_u ();
+  test_psllh_s ();
+  test_psllw_s ();
+  test_psrah_u ();
+  test_psraw_u ();
+  test_psrah_s ();
+  test_psraw_s ();
+  test_psrlh_u ();
+  test_psrlw_u ();
+  test_psrlh_s ();
+  test_psrlw_s ();
+  test_psubw_u ();
+  test_psubw_s ();
+  test_psubh_u ();
+  test_psubh_s ();
+  test_psubb_u ();
+  test_psubb_s ();
+  test_psubd_u ();
+  test_psubd_s ();
+  test_psubsh ();
+  test_psubsb ();
+  test_psubush ();
+  test_psubusb ();
+  test_punpckhbh_s ();
+  test_punpckhbh_u ();
+  test_punpckhhw_s ();
+  test_punpckhhw_u ();
+  test_punpckhwd_s ();
+  test_punpckhwd_u ();
+  test_punpcklbh_s ();
+  test_punpcklbh_u ();
+  test_punpcklhw_s ();
+  test_punpcklhw_u ();
+  test_punpcklwd_s ();
+  test_punpcklwd_u ();
+  return 0;
+}
--- gcc/testsuite/lib/target-supports.exp	(/local/gcc-1)	(revision 556)
+++ gcc/testsuite/lib/target-supports.exp	(/local/gcc-2)	(revision 556)
@@ -1249,6 +1249,17 @@ proc check_effective_target_arm_neon_hw 
     } "-mfpu=neon -mfloat-abi=softfp"]
 }
 
+# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
+# the Loongson vector modes.
+
+proc check_effective_target_mips_loongson { } {
+    return [check_no_compiler_messages loongson assembly {
+	#if !defined(__mips_loongson_vector_rev)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {
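For illustration, a minimal sketch (not part of the patch) of how a runtime test could key off the new mips_loongson effective target; the file name and body are hypothetical, but the dg- directives follow the usual DejaGnu conventions and the intrinsic comes from loongson.h below:

/* { dg-do run { target mips_loongson } } */
/* Hypothetical test, e.g. gcc.target/mips/loongson-paddh.c.  */
#include <assert.h>
#include <stdint.h>
#include "loongson.h"

int
main (void)
{
  union { uint16x4_t v; uint16_t a[4]; } s, t, r;
  s.a[0] = 1; s.a[1] = 2; s.a[2] = 3; s.a[3] = 4;
  t.a[0] = 5; t.a[1] = 6; t.a[2] = 7; t.a[3] = 8;
  r.v = paddh_u (s.v, t.v);	/* should expand to a single paddh */
  assert (r.a[0] == 6 && r.a[1] == 8 && r.a[2] == 10 && r.a[3] == 12);
  return 0;
}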
--- gcc/config.gcc	(/local/gcc-1)	(revision 556)
+++ gcc/config.gcc	(/local/gcc-2)	(revision 556)
@@ -307,6 +307,7 @@ m68k-*-*)
 mips*-*-*)
 	cpu_type=mips
 	need_64bit_hwint=yes
+	extra_headers="loongson.h"
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
--- gcc/config/mips/loongson.md	(/local/gcc-1)	(revision 556)
+++ gcc/config/mips/loongson.md	(/local/gcc-2)	(revision 556)
@@ -0,0 +1,437 @@
+;; Machine description for ST Microelectronics Loongson-2E/2F.
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Mode iterators and attributes.
+
+;; 64-bit vectors of bytes.
+(define_mode_iterator VB [V8QI])
+
+;; 64-bit vectors of halfwords.
+(define_mode_iterator VH [V4HI])
+
+;; 64-bit vectors of words.
+(define_mode_iterator VW [V2SI])
+
+;; 64-bit vectors of halfwords and bytes.
+(define_mode_iterator VHB [V4HI V8QI])
+
+;; 64-bit vectors of words and halfwords.
+(define_mode_iterator VWH [V2SI V4HI])
+
+;; 64-bit vectors of words, halfwords and bytes.
+(define_mode_iterator VWHB [V2SI V4HI V8QI])
+
+;; 64-bit vectors of words, halfwords and bytes; and DImode.
+(define_mode_iterator VWHBDI [V2SI V4HI V8QI DI])
+
+;; The Loongson instruction suffixes corresponding to the modes in the
+;; VWHBDI iterator.
+(define_mode_attr V_suffix [(V2SI "w") (V4HI "h") (V8QI "b") (DI "d")])
+
+;; Given a vector type T, the mode of a vector half the size of T
+;; and with the same number of elements.
+(define_mode_attr V_squash [(V2SI "V2HI") (V4HI "V4QI")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with half as many elements.
+(define_mode_attr V_stretch_half [(V2SI "DI") (V4HI "V2SI") (V8QI "V4HI")])
+
+;; The Loongson instruction suffixes corresponding to the transformation
+;; expressed by V_stretch_half.
+(define_mode_attr V_stretch_half_suffix [(V2SI "wd") (V4HI "hw") (V8QI "bh")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with twice as many elements.
+(define_mode_attr V_squash_double [(V2SI "V4HI") (V4HI "V8QI")])
+
+;; The Loongson instruction suffixes corresponding to the conversions
+;; specified by V_squash_double.
+(define_mode_attr V_squash_double_suffix [(V2SI "wh") (V4HI "hb")])
+
+;; Move patterns.
+
+;; Expander to legitimize moves involving values of vector modes.
+(define_expand "mov<mode>"
+  [(set (match_operand:VWHB 0)
+	(match_operand:VWHB 1))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+{
+  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
+    DONE;
+})
+
+;; Handle legitimized moves between values of vector modes.
+(define_insn "mov<mode>_internal"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,  d,  m,  d")
+	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  { return mips_output_move (operands[0], operands[1]); }
+  [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")
+   (set_attr "mode" "DI")])
+
+;; Initialization of a vector.
+
+(define_expand "vec_init<mode>"
+  [(set (match_operand:VWHB 0 "register_operand")
+	(match_operand 1 ""))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+{
+  mips_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})
+
+;; Instruction patterns for SIMD instructions.
+
+;; Pack with signed saturation.
+(define_insn "vec_pack_ssat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 1 "register_operand" "f"))
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 2 "register_operand" "f"))))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "packss<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Pack with unsigned saturation.
+(define_insn "vec_pack_usat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 1 "register_operand" "f"))
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 2 "register_operand" "f"))))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "packus<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by wraparound.
+(define_insn "add<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (plus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		   (match_operand:VWHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "padd<V_suffix>\t%0,%1,%2")
+
+;; Addition of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+;; We use 'unspec' instead of 'plus' here to avoid a clash with
+;; mips.md::add<mode>3.  If 'plus' were used, such an instruction
+;; would be recognized as adddi3 and reload would make it use
+;; GPRs instead of FPRs.
+(define_insn "loongson_paddd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:DI 1 "register_operand" "f")
+		    (match_operand:DI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PADDD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "paddd\t%0,%1,%2")
+
+;; Addition, treating overflow by signed saturation.
+(define_insn "ssadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "padds<V_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by unsigned saturation.
+(define_insn "usadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "paddus<V_suffix>\t%0,%1,%2")
+
+;; Logical AND NOT.
+(define_insn "loongson_pandn_<V_suffix>"
+  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
+        (and:VWHBDI
+	 (not:VWHBDI (match_operand:VWHBDI 1 "register_operand" "f"))
+	 (match_operand:VWHBDI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pandn\t%0,%1,%2")
+
+;; Average.
+(define_insn "loongson_pavg<V_suffix>"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (unspec:VHB [(match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")]
+		    UNSPEC_LOONGSON_PAVG))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pavg<V_suffix>\t%0,%1,%2")
+
+;; Equality test.
+(define_insn "loongson_pcmpeq<V_suffix>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PCMPEQ))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pcmpeq<V_suffix>\t%0,%1,%2")
+
+;; Greater-than test.
+(define_insn "loongson_pcmpgt<V_suffix>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PCMPGT))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pcmpgt<V_suffix>\t%0,%1,%2")
+
+;; Extract halfword.
+(define_insn "loongson_pextr<V_suffix>"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+ 		    (match_operand:SI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PEXTR))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pextr<V_suffix>\t%0,%1,%2")
+
+;; Insert halfword.
+(define_insn "loongson_pinsr<V_suffix>_0"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_0))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_0\t%0,%1,%2")
+
+(define_insn "loongson_pinsr<V_suffix>_1"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_1))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_1\t%0,%1,%2")
+
+(define_insn "loongson_pinsr<V_suffix>_2"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_2))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_2\t%0,%1,%2")
+
+(define_insn "loongson_pinsr<V_suffix>_3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PINSR_3))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pinsr<V_suffix>_3\t%0,%1,%2")
+
+;; Multiply and add packed integers.
+(define_insn "loongson_pmadd<V_stretch_half_suffix>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VH 1 "register_operand" "f")
+				  (match_operand:VH 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PMADD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmadd<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Maximum of signed halfwords.
+(define_insn "smax<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smax:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmaxs<V_suffix>\t%0,%1,%2")
+
+;; Maximum of unsigned bytes.
+(define_insn "umax<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umax:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmaxu<V_suffix>\t%0,%1,%2")
+
+;; Minimum of signed halfwords.
+(define_insn "smin<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smin:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmins<V_suffix>\t%0,%1,%2")
+
+;; Minimum of unsigned bytes.
+(define_insn "umin<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umin:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pminu<V_suffix>\t%0,%1,%2")
+
+;; Move byte mask.
+(define_insn "loongson_pmovmsk<V_suffix>"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMOVMSK))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmovmsk<V_suffix>\t%0,%1")
+
+;; Multiply unsigned integers and store high result.
+(define_insn "umul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULHU))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulhu<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store high result.
+(define_insn "smul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulh<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store low result.
+(define_insn "loongson_pmull<V_suffix>"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULL))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmull<V_suffix>\t%0,%1,%2")
+
+;; Multiply unsigned word integers.
+(define_insn "loongson_pmulu<V_suffix>"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:VW 1 "register_operand" "f")
+		    (match_operand:VW 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PMULU))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pmulu<V_suffix>\t%0,%1,%2")
+
+;; Absolute difference.
+(define_insn "loongson_pasubub"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")
+		    (match_operand:VB 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PASUBUB))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pasubub\t%0,%1,%2")
+
+;; Sum of unsigned byte integers.
+(define_insn "reduc_uplus_<mode>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")]
+				 UNSPEC_LOONGSON_BIADD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "biadd\t%0,%1")
+
+;; Sum of absolute differences.
+(define_insn "loongson_psadbh"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")
+				  (match_operand:VB 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PSADBH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pasubub\t%0,%1,%2;biadd\t%0,%0")
+
+;; Shuffle halfwords.
+(define_insn "loongson_pshufh"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "0")
+		    (match_operand:VH 2 "register_operand" "f")
+		    (match_operand:SI 3 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSHUFH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "pshufh\t%0,%2,%3")
+
+;; Shift left logical.
+(define_insn "loongson_psll<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashift:VWH (match_operand:VWH 1 "register_operand" "f")
+		    (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psll<V_suffix>\t%0,%1,%2")
+
+;; Shift right arithmetic.
+(define_insn "loongson_psra<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psra<V_suffix>\t%0,%1,%2")
+
+;; Shift right logical.
+(define_insn "loongson_psrl<V_suffix>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (lshiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psrl<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by wraparound.
+(define_insn "sub<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (minus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		    (match_operand:VWHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psub<V_suffix>\t%0,%1,%2")
+
+;; Subtraction of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+;; See loongson_paddd for the reason we use 'unspec' rather than
+;; 'minus' here.
+(define_insn "loongson_psubd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:DI 1 "register_operand" "f")
+		    (match_operand:DI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSUBD))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubd\t%0,%1,%2")
+
+;; Subtraction, treating overflow by signed saturation.
+(define_insn "sssub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubs<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by unsigned saturation.
+(define_insn "ussub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "psubus<V_suffix>\t%0,%1,%2")
+
+;; Unpack high data.
+(define_insn "vec_interleave_high<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PUNPCKH))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "punpckh<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Unpack low data.
+(define_insn "vec_interleave_low<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_PUNPCKL))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "punpckl<V_stretch_half_suffix>\t%0,%1,%2")
--- gcc/config/mips/mips-ftypes.def	(/local/gcc-1)	(revision 556)
+++ gcc/config/mips/mips-ftypes.def	(/local/gcc-2)	(revision 556)
@@ -66,6 +66,24 @@ DEF_MIPS_FTYPE (1, (SF, SF))
 DEF_MIPS_FTYPE (2, (SF, SF, SF))
 DEF_MIPS_FTYPE (1, (SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (UDI, UDI, UDI))
+DEF_MIPS_FTYPE (2, (UDI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UQI))
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV4HI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV8QI, UV8QI))
+
+DEF_MIPS_FTYPE (2, (UV8QI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV8QI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV8QI, UV8QI, UV8QI))
+
 DEF_MIPS_FTYPE (1, (V2HI, SI))
 DEF_MIPS_FTYPE (2, (V2HI, SI, SI))
 DEF_MIPS_FTYPE (3, (V2HI, SI, SI, SI))
@@ -81,12 +99,27 @@ DEF_MIPS_FTYPE (2, (V2SF, V2SF, V2SF))
 DEF_MIPS_FTYPE (3, (V2SF, V2SF, V2SF, INT))
 DEF_MIPS_FTYPE (4, (V2SF, V2SF, V2SF, V2SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, UQI))
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V2SI, V4HI, V4HI))
+
+DEF_MIPS_FTYPE (2, (V4HI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, USI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, V4HI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, USI))
+
 DEF_MIPS_FTYPE (1, (V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V2HI, V2HI))
 DEF_MIPS_FTYPE (1, (V4QI, V4QI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, V4QI))
 
+DEF_MIPS_FTYPE (2, (V8QI, V4HI, V4HI))
+DEF_MIPS_FTYPE (1, (V8QI, V8QI))
+DEF_MIPS_FTYPE (2, (V8QI, V8QI, V8QI))
+
 DEF_MIPS_FTYPE (2, (VOID, SI, SI))
 DEF_MIPS_FTYPE (2, (VOID, V2HI, V2HI))
 DEF_MIPS_FTYPE (2, (VOID, V4QI, V4QI))
--- gcc/config/mips/mips.md	(/local/gcc-1)	(revision 556)
+++ gcc/config/mips/mips.md	(/local/gcc-2)	(revision 556)
@@ -215,6 +215,30 @@
    (UNSPEC_DPAQX_SA_W_PH	446)
    (UNSPEC_DPSQX_S_W_PH		447)
    (UNSPEC_DPSQX_SA_W_PH	448)
+
+   ;; ST Microelectronics Loongson-2E/2F.
+   (UNSPEC_LOONGSON_PAVG	500)
+   (UNSPEC_LOONGSON_PCMPEQ	501)
+   (UNSPEC_LOONGSON_PCMPGT	502)
+   (UNSPEC_LOONGSON_PEXTR	503)
+   (UNSPEC_LOONGSON_PINSR_0	504)
+   (UNSPEC_LOONGSON_PINSR_1	505)
+   (UNSPEC_LOONGSON_PINSR_2	506)
+   (UNSPEC_LOONGSON_PINSR_3	507)
+   (UNSPEC_LOONGSON_PMADD	508)
+   (UNSPEC_LOONGSON_PMOVMSK	509)
+   (UNSPEC_LOONGSON_PMULHU	510)
+   (UNSPEC_LOONGSON_PMULH	511)
+   (UNSPEC_LOONGSON_PMULL	512)
+   (UNSPEC_LOONGSON_PMULU	513)
+   (UNSPEC_LOONGSON_PASUBUB	514)
+   (UNSPEC_LOONGSON_BIADD	515)
+   (UNSPEC_LOONGSON_PSADBH	516)
+   (UNSPEC_LOONGSON_PSHUFH	517)
+   (UNSPEC_LOONGSON_PUNPCKH	518)
+   (UNSPEC_LOONGSON_PUNPCKL	519)
+   (UNSPEC_LOONGSON_PADDD	520)
+   (UNSPEC_LOONGSON_PSUBD	521)
   ]
 )
 
@@ -500,7 +524,11 @@
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
-  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
+  [DI DF
+   (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")
+   (V4HI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")
+   (V8QI "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS")])
 
 ;; 128-bit modes for which we provide move patterns on 64-bit targets.
 (define_mode_iterator MOVE128 [TI TF])
@@ -527,6 +555,9 @@
   [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (V2SF "!TARGET_64BIT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
+   (V4HI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
+   (V8QI "!TARGET_64BIT && TARGET_LOONGSON_VECTORS")
    (TF "TARGET_64BIT && TARGET_FLOAT64")])
 
 ;; In GPR templates, a string like "<d>subu" will expand to "subu" in the
@@ -579,7 +610,9 @@
 
 ;; This attribute gives the integer mode that has half the size of
 ;; the controlling mode.
-(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") (TF "DI")])
+(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI")
+			    (V2SI "SI") (V4HI "SI") (V8QI "SI")
+			    (TF "DI")])
 
 ;; This attribute works around the early SB-1 rev2 core "F2" erratum:
 ;;
@@ -6435,3 +6468,6 @@
 
 ; MIPS fixed-point instructions.
 (include "mips-fixed.md")
+
+; ST Microelectronics Loongson-2E/2F-specific patterns.
+(include "loongson.md")
--- gcc/config/mips/mips-protos.h	(/local/gcc-1)	(revision 556)
+++ gcc/config/mips/mips-protos.h	(/local/gcc-2)	(revision 556)
@@ -303,4 +303,6 @@ union mips_gen_fn_ptrs
 extern void mips_expand_atomic_qihi (union mips_gen_fn_ptrs,
 				     rtx, rtx, rtx, rtx);
 
+extern void mips_expand_vector_init (rtx, rtx);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
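The mips_expand_vector_init entry point declared above backs the vec_init<mode> expander in loongson.md.  Purely as an illustration (assuming the loongson.h types; not part of the patch), constructing a vector from scalars is the kind of code that is expected to reach it:

#include "loongson.h"

int16x4_t
make_vec (int16_t a, int16_t b, int16_t c, int16_t d)
{
  /* A braced initializer with non-constant elements should expand
     through vec_initv4hi and thus mips_expand_vector_init.  */
  int16x4_t v = {a, b, c, d};
  return v;
}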
--- gcc/config/mips/loongson.h	(/local/gcc-1)	(revision 556)
+++ gcc/config/mips/loongson.h	(/local/gcc-2)	(revision 556)
@@ -0,0 +1,693 @@
+/* Intrinsics for ST Microelectronics Loongson-2E/2F SIMD operations.
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 2, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to the
+   Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+#ifndef _GCC_LOONGSON_H
+#define _GCC_LOONGSON_H
+
+#if !defined(__mips_loongson_vector_rev)
+# error "You must select -march=loongson2e or -march=loongson2f to use loongson.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Vectors of unsigned bytes, halfwords and words.  */
+typedef uint8_t uint8x8_t __attribute__((vector_size (8)));
+typedef uint16_t uint16x4_t __attribute__((vector_size (8)));
+typedef uint32_t uint32x2_t __attribute__((vector_size (8)));
+
+/* Vectors of signed bytes, halfwords and words.  */
+typedef int8_t int8x8_t __attribute__((vector_size (8)));
+typedef int16_t int16x4_t __attribute__((vector_size (8)));
+typedef int32_t int32x2_t __attribute__((vector_size (8)));
+
+/* SIMD intrinsics.
+   Unless otherwise noted, calls to the functions below will expand into
+   precisely one machine instruction, modulo any moves required to
+   satisfy register allocation constraints.  */
+
+/* Pack with signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+packsswh (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_packsswh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+packsshb (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_packsshb (s, t);
+}
+
+/* Pack with unsigned saturation.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+packushb (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_packushb (s, t);
+}
+
+/* Vector addition, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+paddw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_paddw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+paddw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_paddw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddb_s (s, t);
+}
+
+/* Addition of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+paddd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_paddd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+paddd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_paddd_s (s, t);
+}
+
+/* Vector addition, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddsb (s, t);
+}
+
+/* Vector addition, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddusb (s, t);
+}
+
+/* Logical AND NOT.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+pandn_ud (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_pandn_ud (s, t);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pandn_uw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pandn_uw (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pandn_uh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pandn_uh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pandn_ub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pandn_ub (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pandn_sd (int64_t s, int64_t t)
+{
+  return __builtin_loongson_pandn_sd (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pandn_sw (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pandn_sw (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pandn_sh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pandn_sh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pandn_sb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pandn_sb (s, t);
+}
+
+/* Average.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pavgh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pavgh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pavgb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pavgb (s, t);
+}
+
+/* Equality test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_s (s, t);
+}
+
+/* Greater-than test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpgth_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpgth_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_s (s, t);
+}
+
+/* Extract halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pextrh_u (uint16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_u (s, field);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pextrh_s (int16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_s (s, field);
+}
+
+/* Insert halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_u (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_s (s, t);
+}
+
+/* Multiply and add.  */
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pmaddhw (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaddhw (s, t);
+}
+
+/* Maximum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmaxsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaxsh (s, t);
+}
+
+/* Maximum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmaxub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pmaxub (s, t);
+}
+
+/* Minimum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pminsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pminsh (s, t);
+}
+
+/* Minimum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pminub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pminub (s, t);
+}
+
+/* Move byte mask.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmovmskb_u (uint8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_u (s);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pmovmskb_s (int8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_s (s);
+}
+
+/* Multiply unsigned integers and store high result.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pmulhuh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pmulhuh (s, t);
+}
+
+/* Multiply signed integers and store high result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmulhh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmulhh (s, t);
+}
+
+/* Multiply signed integers and store low result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmullh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmullh (s, t);
+}
+
+/* Multiply unsigned word integers.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+pmuluw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pmuluw (s, t);
+}
+
+/* Absolute difference.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pasubub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pasubub (s, t);
+}
+
+/* Sum of unsigned byte integers.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+biadd (uint8x8_t s)
+{
+  return __builtin_loongson_biadd (s);
+}
+
+/* Sum of absolute differences.
+   Note that this intrinsic expands into two machine instructions:
+   PASUBUB followed by BIADD.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psadbh (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psadbh (s, t);
+}
+
+/* Shuffle halfwords.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_u (dest, s, order);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_s (dest, s, order);
+}
+
+/* Shift left logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psllh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psllh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psllw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psllw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_s (s, amount);
+}
+
+/* Shift right logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrlh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrlh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psrlw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psrlw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_s (s, amount);
+}
+
+/* Shift right arithmetic.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrah_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrah_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psraw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psraw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_s (s, amount);
+}
+
+/* Vector subtraction, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psubw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_psubw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psubw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_psubw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubb_s (s, t);
+}
+
+/* Subtraction of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+psubd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_psubd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+psubd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_psubd_s (s, t);
+}
+
+/* Vector subtraction, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubsb (s, t);
+}
+
+/* Vector subtraction, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubusb (s, t);
+}
+
+/* Unpack high data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpckhwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpckhhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpckhbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpckhwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpckhhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpckhbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_s (s, t);
+}
+
+/* Unpack low data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpcklwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpcklhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpcklbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpcklwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpcklhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpcklbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_s (s, t);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- gcc/config/mips/mips.c	(/local/gcc-1)	(revision 556)
+++ gcc/config/mips/mips.c	(/local/gcc-2)	(revision 556)
@@ -3532,6 +3532,12 @@ mips_split_doubleword_move (rtx dest, rt
 	emit_insn (gen_move_doubleword_fprdf (dest, src));
       else if (!TARGET_64BIT && GET_MODE (dest) == V2SFmode)
 	emit_insn (gen_move_doubleword_fprv2sf (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V2SImode)
+	emit_insn (gen_move_doubleword_fprv2si (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V4HImode)
+	emit_insn (gen_move_doubleword_fprv4hi (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V8QImode)
+	emit_insn (gen_move_doubleword_fprv8qi (dest, src));
       else if (TARGET_64BIT && GET_MODE (dest) == TFmode)
 	emit_insn (gen_move_doubleword_fprtf (dest, src));
       else
@@ -8960,6 +8966,14 @@ mips_hard_regno_mode_ok_p (unsigned int 
       if (mode == TFmode && ISA_HAS_8CC)
 	return true;
 
+      /* Allow 64-bit vector modes for Loongson-2E/2F.  */
+      if (TARGET_LOONGSON_VECTORS
+	  && (mode == V2SImode
+	      || mode == V4HImode
+	      || mode == V8QImode
+	      || mode == DImode))
+	return true;
+
       if (class == MODE_FLOAT
 	  || class == MODE_COMPLEX_FLOAT
 	  || class == MODE_VECTOR_FLOAT)
@@ -9323,6 +9337,11 @@ mips_vector_mode_supported_p (enum machi
     case V4UQQmode:
       return TARGET_DSP;
 
+    case V2SImode:
+    case V4HImode:
+    case V8QImode:
+      return TARGET_LOONGSON_VECTORS;
+
     default:
       return false;
     }
@@ -10192,6 +10211,7 @@ AVAIL_NON_MIPS16 (dsp, TARGET_DSP)
 AVAIL_NON_MIPS16 (dspr2, TARGET_DSPR2)
 AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
+AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_VECTORS)
 
 /* Construct a mips_builtin_description from the given arguments.
 
@@ -10288,6 +10308,25 @@ AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BI
   MIPS_BUILTIN (bposge, f, "bposge" #VALUE,				\
 		MIPS_BUILTIN_BPOSGE ## VALUE, MIPS_SI_FTYPE_VOID, AVAIL)
 
+/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<FN_NAME>
+   for instruction CODE_FOR_loongson_<INSN>.  FUNCTION_TYPE is a
+   builtin_description field.  */
+#define LOONGSON_BUILTIN_ALIAS(INSN, FN_NAME, FUNCTION_TYPE)		\
+  { CODE_FOR_loongson_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
+    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, mips_builtin_avail_loongson }
+
+/* Define a Loongson MIPS_BUILTIN_DIRECT function __builtin_loongson_<INSN>
+   for instruction CODE_FOR_loongson_<INSN>.  FUNCTION_TYPE is a
+   builtin_description field.  */
+#define LOONGSON_BUILTIN(INSN, FUNCTION_TYPE)				\
+  LOONGSON_BUILTIN_ALIAS (INSN, INSN, FUNCTION_TYPE)
+
+/* Like LOONGSON_BUILTIN, but add _<SUFFIX> to the end of the function name.
+   We use functions of this form when the same insn can be usefully applied
+   to more than one datatype.  */
+#define LOONGSON_BUILTIN_SUFFIX(INSN, SUFFIX, FUNCTION_TYPE)		\
+  LOONGSON_BUILTIN_ALIAS (INSN, INSN ## _ ## SUFFIX, FUNCTION_TYPE)
+
 #define CODE_FOR_mips_sqrt_ps CODE_FOR_sqrtv2sf2
 #define CODE_FOR_mips_addq_ph CODE_FOR_addv2hi3
 #define CODE_FOR_mips_addu_qb CODE_FOR_addv4qi3
@@ -10295,6 +10334,37 @@ AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BI
 #define CODE_FOR_mips_subu_qb CODE_FOR_subv4qi3
 #define CODE_FOR_mips_mul_ph CODE_FOR_mulv2hi3
 
+#define CODE_FOR_loongson_packsswh CODE_FOR_vec_pack_ssat_v2si
+#define CODE_FOR_loongson_packsshb CODE_FOR_vec_pack_ssat_v4hi
+#define CODE_FOR_loongson_packushb CODE_FOR_vec_pack_usat_v4hi
+#define CODE_FOR_loongson_paddw CODE_FOR_addv2si3
+#define CODE_FOR_loongson_paddh CODE_FOR_addv4hi3
+#define CODE_FOR_loongson_paddb CODE_FOR_addv8qi3
+#define CODE_FOR_loongson_paddsh CODE_FOR_ssaddv4hi3
+#define CODE_FOR_loongson_paddsb CODE_FOR_ssaddv8qi3
+#define CODE_FOR_loongson_paddush CODE_FOR_usaddv4hi3
+#define CODE_FOR_loongson_paddusb CODE_FOR_usaddv8qi3
+#define CODE_FOR_loongson_pmaxsh CODE_FOR_smaxv4hi3
+#define CODE_FOR_loongson_pmaxub CODE_FOR_umaxv8qi3
+#define CODE_FOR_loongson_pminsh CODE_FOR_sminv4hi3
+#define CODE_FOR_loongson_pminub CODE_FOR_uminv8qi3
+#define CODE_FOR_loongson_pmulhuh CODE_FOR_umulv4hi3_highpart
+#define CODE_FOR_loongson_pmulhh CODE_FOR_smulv4hi3_highpart
+#define CODE_FOR_loongson_biadd CODE_FOR_reduc_uplus_v8qi
+#define CODE_FOR_loongson_psubw CODE_FOR_subv2si3
+#define CODE_FOR_loongson_psubh CODE_FOR_subv4hi3
+#define CODE_FOR_loongson_psubb CODE_FOR_subv8qi3
+#define CODE_FOR_loongson_psubsh CODE_FOR_sssubv4hi3
+#define CODE_FOR_loongson_psubsb CODE_FOR_sssubv8qi3
+#define CODE_FOR_loongson_psubush CODE_FOR_ussubv4hi3
+#define CODE_FOR_loongson_psubusb CODE_FOR_ussubv8qi3
+#define CODE_FOR_loongson_punpckhbh CODE_FOR_vec_interleave_highv8qi
+#define CODE_FOR_loongson_punpckhhw CODE_FOR_vec_interleave_highv4hi
+#define CODE_FOR_loongson_punpckhwd CODE_FOR_vec_interleave_highv2si
+#define CODE_FOR_loongson_punpcklbh CODE_FOR_vec_interleave_lowv8qi
+#define CODE_FOR_loongson_punpcklhw CODE_FOR_vec_interleave_lowv4hi
+#define CODE_FOR_loongson_punpcklwd CODE_FOR_vec_interleave_lowv2si
+
 static const struct mips_builtin_description mips_builtins[] = {
   DIRECT_BUILTIN (pll_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
   DIRECT_BUILTIN (pul_ps, MIPS_V2SF_FTYPE_V2SF_V2SF, paired_single),
@@ -10471,7 +10541,108 @@ static const struct mips_builtin_descrip
   DIRECT_BUILTIN (dpaqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
   DIRECT_BUILTIN (dpaqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
   DIRECT_BUILTIN (dpsqx_s_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
-  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32)
+  DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, dspr2_32),
+
+  /* Builtin functions for ST Microelectronics Loongson-2E/2F cores.  */
+  LOONGSON_BUILTIN (packsswh, MIPS_V4HI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (packsshb, MIPS_V8QI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (packushb, MIPS_UV8QI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (paddh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (paddw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (paddh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (paddb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (paddd, u, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_SUFFIX (paddd, s, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (paddsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddsb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (paddush, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddusb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_ALIAS (pandn_d, pandn_ud, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_ALIAS (pandn_w, pandn_uw, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_ALIAS (pandn_h, pandn_uh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_ALIAS (pandn_b, pandn_ub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_ALIAS (pandn_d, pandn_sd, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN_ALIAS (pandn_w, pandn_sw, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_ALIAS (pandn_h, pandn_sh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_ALIAS (pandn_b, pandn_sb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (pavgh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pavgb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpeqb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgth, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgth, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pcmpgtb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (pextrh, u, MIPS_UV4HI_FTYPE_UV4HI_USI),
+  LOONGSON_BUILTIN_SUFFIX (pextrh, s, MIPS_V4HI_FTYPE_V4HI_USI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_0, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_1, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_2, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_3, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_0, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_1, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_2, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (pinsrh_3, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaddhw, MIPS_V2SI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaxsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmaxub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pminsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pminub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pmovmskb, u, MIPS_UV8QI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pmovmskb, s, MIPS_V8QI_FTYPE_V8QI),
+  LOONGSON_BUILTIN (pmulhuh, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pmulhh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmullh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pmuluw, MIPS_UDI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pasubub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (biadd, MIPS_UV4HI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN (psadbh, MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (pshufh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (pshufh, s, MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psllw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrah, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrah, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psraw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psraw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlw, u, MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psrlw, s, MIPS_V2SI_FTYPE_V2SI_UQI),
+  LOONGSON_BUILTIN_SUFFIX (psubw, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (psubh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (psubb, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (psubw, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (psubh, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (psubb, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (psubd, u, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN_SUFFIX (psubd, s, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (psubsh, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubsb, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (psubush, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubusb, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhbh, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhhw, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhwd, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhbh, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhhw, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpckhwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklbh, u, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklhw, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklwd, u, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklbh, s, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklhw, s, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN_SUFFIX (punpcklwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI)
 };
 
 /* MODE is a vector mode whose elements have type TYPE.  Return the type
@@ -10480,11 +10651,17 @@ static const struct mips_builtin_descrip
 static tree
 mips_builtin_vector_type (tree type, enum machine_mode mode)
 {
-  static tree types[(int) MAX_MACHINE_MODE];
+  static tree types[2 * (int) MAX_MACHINE_MODE];
+  int mode_index;
+
+  mode_index = (int) mode;
 
-  if (types[(int) mode] == NULL_TREE)
-    types[(int) mode] = build_vector_type_for_mode (type, mode);
-  return types[(int) mode];
+  if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type))
+    mode_index += MAX_MACHINE_MODE;
+
+  if (types[mode_index] == NULL_TREE)
+    types[mode_index] = build_vector_type_for_mode (type, mode);
+  return types[mode_index];
 }
 
 /* Source-level argument types.  */
@@ -10493,16 +10670,27 @@ mips_builtin_vector_type (tree type, enu
 #define MIPS_ATYPE_POINTER ptr_type_node
 
 /* Standard mode-based argument types.  */
+#define MIPS_ATYPE_UQI unsigned_intQI_type_node
 #define MIPS_ATYPE_SI intSI_type_node
 #define MIPS_ATYPE_USI unsigned_intSI_type_node
 #define MIPS_ATYPE_DI intDI_type_node
+#define MIPS_ATYPE_UDI unsigned_intDI_type_node
 #define MIPS_ATYPE_SF float_type_node
 #define MIPS_ATYPE_DF double_type_node
 
 /* Vector argument types.  */
 #define MIPS_ATYPE_V2SF mips_builtin_vector_type (float_type_node, V2SFmode)
 #define MIPS_ATYPE_V2HI mips_builtin_vector_type (intHI_type_node, V2HImode)
+#define MIPS_ATYPE_V2SI mips_builtin_vector_type (intSI_type_node, V2SImode)
 #define MIPS_ATYPE_V4QI mips_builtin_vector_type (intQI_type_node, V4QImode)
+#define MIPS_ATYPE_V4HI mips_builtin_vector_type (intHI_type_node, V4HImode)
+#define MIPS_ATYPE_V8QI mips_builtin_vector_type (intQI_type_node, V8QImode)
+#define MIPS_ATYPE_UV2SI					\
+  mips_builtin_vector_type (unsigned_intSI_type_node, V2SImode)
+#define MIPS_ATYPE_UV4HI					\
+  mips_builtin_vector_type (unsigned_intHI_type_node, V4HImode)
+#define MIPS_ATYPE_UV8QI					\
+  mips_builtin_vector_type (unsigned_intQI_type_node, V8QImode)
 
 /* MIPS_FTYPE_ATYPESN takes N MIPS_FTYPES-like type codes and lists
    their associated MIPS_ATYPEs.  */
@@ -12618,6 +12806,30 @@ mips_conditional_register_usage (void)
     }
 }
 
+/* Initialize vector TARGET to VALS.  */
+
+void
+mips_expand_vector_init (rtx target, rtx vals)
+{
+  enum machine_mode mode;
+  enum machine_mode inner;
+  unsigned int i, n_elts;
+  rtx mem;
+
+  mode = GET_MODE (target);
+  inner = GET_MODE_INNER (mode);
+  n_elts = GET_MODE_NUNITS (mode);
+
+  gcc_assert (VECTOR_MODE_P (mode));
+
+  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
+  for (i = 0; i < n_elts; i++)
+    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
+                    XVECEXP (vals, 0, i));
+
+  emit_move_insn (target, mem);
+}
+
 /* When generating MIPS16 code, we want to allocate $24 (T_REG) before
    other registers for instructions for which it is possible.  This
    encourages the compiler to use CMP in cases where an XOR would
--- gcc/config/mips/mips.h	(/local/gcc-1)	(revision 556)
+++ gcc/config/mips/mips.h	(/local/gcc-2)	(revision 556)
@@ -267,6 +267,12 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
 
+/* Whether vector modes and intrinsics for ST Microelectronics
+   Loongson-2E/2F processors should be enabled.  In o32 pairs of
+   floating-point registers provide 64-bit values.  */
+#define TARGET_LOONGSON_VECTORS	    (TARGET_HARD_FLOAT_ABI		\
+				     && TARGET_LOONGSON_2EF)
+
 /* True if the pre-reload scheduler should try to create chains of
    multiply-add or multiply-subtract instructions.  For example,
    suppose we have:
@@ -497,6 +503,10 @@ enum mips_code_readable_setting {
 	  builtin_define_std ("MIPSEL");				\
 	  builtin_define ("_MIPSEL");					\
 	}								\
+                                                                        \
+      /* Whether Loongson vector modes are enabled.  */                 \
+      if (TARGET_LOONGSON_VECTORS)					\
+        builtin_define ("__mips_loongson_vector_rev");                  \
 									\
       /* Macros dependent on the C dialect.  */				\
       if (preprocessing_asm_p ())					\
--- gcc/config/mips/mips-modes.def	(/local/gcc-1)	(revision 556)
+++ gcc/config/mips/mips-modes.def	(/local/gcc-2)	(revision 556)
@@ -26,6 +26,7 @@ RESET_FLOAT_FORMAT (DF, mips_double_form
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
+VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
 VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][2/5] Vector intrinsics
  2008-06-13 18:36                         ` Maxim Kuvyrkov
@ 2008-06-14  8:20                           ` Richard Sandiford
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-14  8:20 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Here is the latest version of the patch.  The only thing changed from the 
> previous version is that I replaced 'plus' in loongson_paddd and 'minus' 
> in loongson_psubd with unspecs.
>
> It was tested with -march={mips3, loongson2?}/-mabi=n32,32,64 on 
> Loongson-2E and Loongson-2F boxes.
>
> I believe this patch is approved, so I'll check it in sometime tomorrow. 

It is.  Thanks again for your patience, and for making the changes.

Richard
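
As an illustration for readers of the archive, here is a minimal usage
sketch of the intrinsics added by this patch (the function names are
invented; it assumes a hard-float toolchain with -march=loongson2e or
-march=loongson2f, so that loongson.h accepts inclusion):

#include <loongson.h>

/* Element-wise addition of four signed halfwords.  Per the comment in
   loongson.h, each intrinsic call below should expand to a single
   machine instruction, modulo any register moves.  */
int16x4_t
add_halfwords (int16x4_t a, int16x4_t b)
{
  return paddh_s (a, b);
}

/* The vector typedefs are plain GCC vector_size types, so they accept
   brace initializers.  */
int16x4_t
halfword_constant (void)
{
  int16x4_t v = { 1, 2, 3, 4 };
  return v;
}

The SIMD operations use the floating-point registers, which is why the
TARGET_LOONGSON_VECTORS definition in the patch requires a hard-float ABI.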

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-06-13 18:17           ` Maxim Kuvyrkov
@ 2008-06-14  8:32             ` Richard Sandiford
  2008-06-15 17:28               ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-06-14  8:32 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>> FWIW, this is PR 35802.  'Fraid I still haven't had chance to look at it,
>> what with IRA, recog-related stuff and reviews.
>
> ...
>
>> We have two choices:
>> 
>>   - always use pseudo registers, rather than introducing uses of $3
>>     from the outset
>> 
>>   - force the destination of tls_get_tp_<mode> to be $3 only.
>
> I don't see how the first approach can be any better than the second.  We will 
> allocate register $3 for all those pseudos in the end.

The first approach is better if it works because some passes can only
optimise things that can have pseudo destinations.  E.g. after the patch,
gcse won't be able to optimise these patterns.

The ICE currently only occurs when a pass has made such a replacement,
presumably because it thought that the change was an improvement.
I imagine that the patch I posted prevents something that we thought
was an optimisation in your testcase.

That's the big drawback of the second approach.  These instructions
are emulated on the vast majority of processors, so we're losing
the ability to optimise very expensive instructions.

But like I say, I'm not convinced the first approach is going to avoid
the ICE in all cases.  We have a PR against a release branch, so in the
first instance, I think we need something that is safe over something
that leads to better optimisation.

A third alternative is to use ugly workarounds like:

(define_insn "tls_get_tp_<mode>"
  [(set (match_operand:P 0 "register_operand" "=v,???d")
	(unspec:P [(const_int 0)]
		  UNSPEC_TLS_GET_TP))]
  "HAVE_AS_TLS && !TARGET_MIPS16"

and make the second alternative use a sequence like:

        move    $1,$3
        rdhwr   $3,$29
        move    %0,$3
        move    $3,$1

which is _usually_ going to be a win in cases where it avoids a further
rdhwr instruction at runtime.  But it would lose otherwise.

This too should be safe, either on its own or in combination with
the first approach.

But longer-term, I think we need to do something less hacky that
still allows the optimisations.  I'm just not sure what yet. ;)

>> The second is probably the most conservative approach, since explicit
>> uses of $3 can occur through normal calls.  The patch below does this.
>> 
>> I admit I haven't verified any of this yet, so sorry if I'm off-ball.
>> But does the patch fix things?
>
> Looks like it does.  I didn't run full regression testsuite, but glibc 
> now builds.

Great, thanks.  I'll think a bit more about this before installing anything.
You needn't wait before applying your patches.

Richard
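
To make the trade-off concrete, consider a hedged example (the variable
and function names are invented).  Accesses to __thread variables in the
local-exec and initial-exec TLS models read the thread pointer, which on
most MIPS processors means the emulated rdhwr sequence discussed above.
If tls_get_tp_<mode> can write to a pseudo register, passes such as GCSE
are free to reuse a single thread-pointer read across the accesses below;
with the destination hard-wired to $3, that reuse is much harder to
express:

__thread int counter;
__thread int limit;

int
bump (int n)
{
  /* All of the TLS accesses here need the thread pointer; with a pseudo
     destination the compiler can materialise it once and share it.  */
  if (counter + n < limit)
    counter += n;
  return counter;
}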

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][4/5] Scheduling and tuning
  2008-06-14  8:32             ` Richard Sandiford
@ 2008-06-15 17:28               ` Maxim Kuvyrkov
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-15 17:28 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> Richard Sandiford wrote:
>>> FWIW, this is PR 35802.  'Fraid I still haven't had chance to look at it,
>>> what with IRA, recog-related stuff and reviews.
>> ...
>>
>>> We have two choices:
>>>
>>>   - always use pseudo registers, rather than introducing uses of $3
>>>     from the outset
>>>
>>>   - force the destination of tls_get_tp_<mode> to be $3 only.
>> I don't see how the first approach can be any better than the second.  We will 
>> allocate register $3 for all those pseudos in the end.
> 
> The first approach is better if it works because some passes can only
> optimise things that can have pseudo destinations.  E.g. after the patch,
> gcse won't be able to optimise these patterns.
> 
> The ICE currently only occurs when a pass has made such a replacement,
> presumably because it thought that the change was an improvement.
> I imagine that the patch I posted prevents something that we thought
> was an optimisation in your testcase.
> 
> That's the big drawback of the second approach.  These instructions
> are emulated on the vast majority of processors, so we're losing
> the ability to optimise very expensive instructions.
> 
> But like I say, I'm not convinced the first approach is going to avoid
> the ICE in all cases.  We have a PR against a release branch, so in the
> first instance, I think we need something that is safe over something
> that leads to better optimisation.
> 
> A third alternative is to use ugly workarounds like:
> 
> (define_insn "tls_get_tp_<mode>"
>   [(set (match_operand:P 0 "register_operand" "=v,???d")
> 	(unspec:P [(const_int 0)]
> 		  UNSPEC_TLS_GET_TP))]
>   "HAVE_AS_TLS && !TARGET_MIPS16"
> 
> and make the second alternative use a sequence like:
> 
>         move    $1,$3
>         rdhwr   $3,$29
>         move    %0,$3
>         move    $3,$1
> 
> which is _usually_ going to be a win in cases where it avoids a further
> rdhwr instruction at runtime.  But it would lose otherwise.
> 
> This too should be safe, either on its own or in combination with
> the first approach.
> 
> But longer-term, I think we need to do something less hacky that
> still allows the optimisations.  I'm just not sure what yet. ;)

Oh, I can't suggest any better alternative than those you outlined 
above.  Shout if you need any help testing the eventual fix.

>>> The second is probably the most conservative approach, since explicit
>>> uses of $3 can occur through normal calls.  The patch below does this.
>>>
>>> I admit I haven't verified any of this yet, so sorry if I'm off-ball.
>>> But does the patch fix things?
>> Looks like it does.  I didn't run full regression testsuite, but glibc 
>> now builds.
> 
> Great, thanks.  I'll think a bit more about this before installing anything.
> You needn't wait before applying your patches.

I've applied the scheduling patch today.  Thanks for the review.

--
Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-05-25 12:02   ` Richard Sandiford
@ 2008-06-16 18:55     ` Maxim Kuvyrkov
  2008-06-16 21:59       ` Richard Sandiford
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-16 18:55 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 1750 bytes --]

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> This patch adds support for -march=native and -mtune=native options.
>>
>> File driver-st.c contains a routine that checks "cpu model" line in 
>> /proc/cpuinfo and appends a proper option to compiler command line.
>>
>> This patch also specifies layout for Loongson multilibs.
> 
> OK, the obvious question here is: should the -march=native support
> be specific to mips*-st*-linux* configurations, or should it apply
> to all mips*-linux* configurations?  I can imagine you discussed
> this internally; if you did, why settle on the former?

The rationale was to avoid testing on all the different MIPSes out there.

> My gut feeling was that -march=native ought to be supported for all
> mips*-linux* configurations, and that mips*-st*-linux* ought simply to
> specify a particular selection of multilibs (and associated multilib
> layout).  Thus driver-st.c would be called something more generic and
> would be included by config/mips/linux.h.  We could then add other
> names to the list as the need arises.

Fixed thusly.

> 
> I think it would be worth adding a comment saying that, if we can't
> detect a known processor, we simply discard the -march or -mtune option.
> This is in contrast to x86, where we force a lowest common denominator.
> (For the record, I agree the behaviour you've got makes sense.)
> 
> You need to document the new option.
> 
> The implementation itself looks fine, thanks.

With the help of Daniel Jacobowitz I updated the patch to be less 
Loongson-specific and to provide -march/mtune=native for all processors (at 
the moment, however, only Loongson 2E/2F CPUs can be detected).  Does 
the attached patch look OK to you?


Thanks,

Maxim

[-- Attachment #2: fsf-ls2ef-5-configure.ChangeLog --]
[-- Type: text/plain, Size: 596 bytes --]

2008-06-16  Daniel Jacobowitz  <dan@codesourcery.com>
	    Kazu Hirata  <kazu@codesourcery.com>
	    Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
	* config.host: Use driver-native.o and mips/x-native for mips*-linux*.
	* config/mips/linux.h (host_detect_local_cpu): Declare, add to
	EXTRA_SPEC_FUNCTIONS.
	(MARCH_MTUNE_NATIVE_SPECS): New macro.
	(DRIVER_SELF_SPECS): Adjust.
	* config/mips/linux64.h (DRIVER_SELF_SPECS): Update.
	* config/mips/st.h, config/mips/t-st: New.
	* config/mips/driver-native.c, config/mips/x-native: New.

[-- Attachment #3: fsf-ls2ef-5-configure.patch --]
[-- Type: text/plain, Size: 7522 bytes --]

--- gcc/config.gcc	(/local/gcc-4)	(revision 613)
+++ gcc/config.gcc	(/local/gcc-5)	(revision 613)
@@ -1523,6 +1523,12 @@ mips64*-*-linux*)
 	tm_file="dbxelf.h elfos.h svr4.h linux.h ${tm_file} mips/linux.h mips/linux64.h"
 	tmake_file="${tmake_file} mips/t-linux64"
 	tm_defines="${tm_defines} MIPS_ABI_DEFAULT=ABI_N32"
+	case ${target} in
+		mips64el-st-linux-gnu)
+			tm_file="${tm_file} mips/st.h"
+			tmake_file="${tmake_file} mips/t-st"
+			;;
+	esac
 	gnu_ld=yes
 	gas=yes
 	test x$with_llsc != x || with_llsc=yes
--- gcc/config.host	(/local/gcc-4)	(revision 613)
+++ gcc/config.host	(/local/gcc-5)	(revision 613)
@@ -104,6 +104,14 @@ case ${host} in
 	;;
     esac
     ;;
+  mips*-*-linux*)
+    case ${target} in
+      mips*-*-linux*)
+	host_extra_gcc_objs="driver-native.o"
+	host_xmake_file="${host_xmake_file} mips/x-native"
+      ;;
+    esac
+    ;;
 esac
 
 case ${host} in
--- gcc/config/mips/linux.h	(/local/gcc-4)	(revision 613)
+++ gcc/config/mips/linux.h	(/local/gcc-5)	(revision 613)
@@ -149,3 +149,23 @@ along with GCC; see the file COPYING3.  
 #else
 #define NO_SHARED_SPECS
 #endif
+
+/* -march=native handling only makes sense with compiler running on
+   a MIPS chip.  */
+#if defined(__mips__)
+extern const char *host_detect_local_cpu (int argc, const char **argv);
+# define EXTRA_SPEC_FUNCTIONS \
+  { "local_cpu_detect", host_detect_local_cpu },
+
+#define MARCH_MTUNE_NATIVE_SPECS				\
+  " %{march=native:%<march=native %:local_cpu_detect(arch)"	\
+  " %{!mtune=*:%<mtune=native %:local_cpu_detect(tune)}}"	\
+  " %{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"
+
+#undef DRIVER_SELF_SPECS
+#define DRIVER_SELF_SPECS			\
+  NO_SHARED_SPECS				\
+  MARCH_MTUNE_NATIVE_SPECS
+#else
+#define MARCH_MTUNE_NATIVE_SPECS
+#endif
--- gcc/config/mips/driver-native.c	(/local/gcc-4)	(revision 613)
+++ gcc/config/mips/driver-native.c	(/local/gcc-5)	(revision 613)
@@ -0,0 +1,73 @@
+/* Subroutines for the gcc driver.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+
+/* This will be called by the spec parser in gcc.c when it sees
+   a %:local_cpu_detect(args) construct.  Currently it will be called
+   with either "arch" or "tune" as argument depending on if -march=native
+   or -mtune=native is to be substituted.
+
+   It returns a string containing new command line parameters to be
+   put in place of the above two options, depending on the CPU on which
+   the compiler is running.  E.g. "-march=loongson2f" on a Loongson 2F for
+   -march=native.  If the routine can't detect a known processor,
+   the -march or -mtune option is discarded.
+
+   ARGC and ARGV are set depending on the actual arguments given
+   in the spec.  */
+const char *
+host_detect_local_cpu (int argc, const char **argv)
+{
+  const char *cpu = NULL;
+  char buf[128];
+  FILE *f;
+  bool arch;
+
+  if (argc < 1)
+    return NULL;
+
+  arch = strcmp (argv[0], "arch") == 0;
+  if (!arch && strcmp (argv[0], "tune"))
+    return NULL;
+
+  f = fopen ("/proc/cpuinfo", "r");
+  if (f == NULL)
+    return NULL;
+
+  while (fgets (buf, sizeof (buf), f) != NULL)
+    if (strncmp (buf, "cpu model", sizeof ("cpu model") - 1) == 0)
+      {
+	if (strstr (buf, "Godson2 V0.2") != NULL
+	    || strstr (buf, "Loongson-2 V0.2") != NULL)
+	  cpu = "loongson2e";
+	else if (strstr (buf, "Godson2 V0.3") != NULL
+		 || strstr (buf, "Loongson-2 V0.3") != NULL)
+	  cpu = "loongson2f";
+	break;
+      }
+
+  fclose (f);
+
+  if (cpu == NULL)
+    return NULL;
+
+  return concat ("-m", argv[0], "=", cpu, NULL);
+}
--- gcc/config/mips/t-st	(/local/gcc-4)	(revision 613)
+++ gcc/config/mips/t-st	(/local/gcc-5)	(revision 613)
@@ -0,0 +1,14 @@
+MULTILIB_OPTIONS = march=loongson2e/march=loongson2f mabi=n32/mabi=32/mabi=64 
+MULTILIB_DIRNAMES = 2e 2f lib32 lib lib64
+
+MULTILIB_OSDIRNAMES  = march.loongson2e/mabi.n32=../lib32/2e
+MULTILIB_OSDIRNAMES += march.loongson2e/mabi.32=../lib/2e
+MULTILIB_OSDIRNAMES += march.loongson2e/mabi.64=../lib64/2e
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.n32=../lib32/2f
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.32=../lib/2f
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.64=../lib64/2f
+MULTILIB_OSDIRNAMES += mabi.n32=../lib32
+MULTILIB_OSDIRNAMES += mabi.32=../lib
+MULTILIB_OSDIRNAMES += mabi.64=../lib64
+
+EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
--- gcc/config/mips/x-native	(/local/gcc-4)	(revision 613)
+++ gcc/config/mips/x-native	(/local/gcc-5)	(revision 613)
@@ -0,0 +1,3 @@
+driver-native.o : $(srcdir)/config/mips/driver-native.c \
+  $(CONFIG_H) $(SYSTEM_H)
+	$(CC) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
--- gcc/config/mips/linux64.h	(/local/gcc-4)	(revision 613)
+++ gcc/config/mips/linux64.h	(/local/gcc-5)	(revision 613)
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  
 #undef DRIVER_SELF_SPECS
 #define DRIVER_SELF_SPECS \
 NO_SHARED_SPECS \
+MARCH_MTUNE_NATIVE_SPECS \
 " %{!EB:%{!EL:%(endian_spec)}}" \
 " %{!mabi=*: -mabi=n32}"
 
--- gcc/config/mips/st.h	(/local/gcc-4)	(revision 613)
+++ gcc/config/mips/st.h	(/local/gcc-5)	(revision 613)
@@ -0,0 +1,31 @@
+/* ST 2e / 2f GNU/Linux Configuration.
+   Copyright (C) 2008
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* The various C libraries each have their own subdirectory.  */
+#undef SYSROOT_SUFFIX_SPEC
+#define SYSROOT_SUFFIX_SPEC			\
+  "%{march=loongson2e:/2e ;			\
+     march=loongson2f:/2f}"
+
+#undef STARTFILE_PREFIX_SPEC
+#define STARTFILE_PREFIX_SPEC				\
+  "%{mabi=32: /usr/local/lib/ /lib/ /usr/lib/}		\
+   %{mabi=n32: /usr/local/lib32/ /lib32/ /usr/lib32/}	\
+   %{mabi=64: /usr/local/lib64/ /lib64/ /usr/lib64/}"



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-06-16 18:55     ` Maxim Kuvyrkov
@ 2008-06-16 21:59       ` Richard Sandiford
  2008-06-17 10:29         ` Maxim Kuvyrkov
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Sandiford @ 2008-06-16 21:59 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> 2008-06-16  Daniel Jacobowitz  <dan@codesourcery.com>
> 	    Kazu Hirata  <kazu@codesourcery.com>
> 	    Maxim Kuvyrkov  <maxim@codesourcery.com
>
> 	* config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
> 	* config.host: Use driver-native.o and mips/x-native for mips*-linux*.
> 	* config/mips/linux.h (host_detect_local_cpu): Declare, add to
> 	EXTRA_SPEC_FUNCTIONS.
> 	(MARCH_MTUNE_NATIVE_SPECS): New macro.
> 	(DRIVER_SELF_SPECS): Adjust.
> 	* config/mips/linux64.h (DRIVER_SELF_SPECS): Update.
> 	* config/mips/st.h, config/mips/t-st: New.
> 	* config/mips/driver-native.c, config/mips/x-native: New.

OK with the changes below.

> +/* -march=native handling only makes sense with compiler running on
> +   a MIPS chip.  */
> +#if defined(__mips__)
> +extern const char *host_detect_local_cpu (int argc, const char **argv);
> +# define EXTRA_SPEC_FUNCTIONS \
> +  { "local_cpu_detect", host_detect_local_cpu },
> +
> +#define MARCH_MTUNE_NATIVE_SPECS				\
> +  " %{march=native:%<march=native %:local_cpu_detect(arch)"	\
> +  " %{!mtune=*:%<mtune=native %:local_cpu_detect(tune)}}"	\
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Remove this; -march implies -mtune.  (Not just a nit; I was genuinely
confused at first, especially with the redundant "%<mtune=native"
inside "%{!mtune=*:...}".)

> +  " %{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"
> +
> +#undef DRIVER_SELF_SPECS
> +#define DRIVER_SELF_SPECS			\
> +  NO_SHARED_SPECS				\
> +  MARCH_MTUNE_NATIVE_SPECS
> +#else
> +#define MARCH_MTUNE_NATIVE_SPECS
> +#endif

This is going to get complicated if we ever need to add to it.
Let's just go with:

----------------------------------------------------------------------
#ifdef HAVE_AS_NO_SHARED
/* Default to -mno-shared for non-PIC.  */
#define NO_SHARED_SPECS \
  "%{mshared|mno-shared|fpic|fPIC|fpie|fPIE:;:-mno-shared}"
#else
#define NO_SHARED_SPECS ""
#endif

/* -march=native handling only makes sense with compiler running on
   a MIPS chip.  */
#if defined(__mips__)
extern const char *host_detect_local_cpu (int argc, const char **argv);
# define EXTRA_SPEC_FUNCTIONS \
  { "local_cpu_detect", host_detect_local_cpu },

#define MARCH_MTUNE_NATIVE_SPECS				\
  " %{march=native:%<march=native %:local_cpu_detect(arch)}"	\
  " %{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"
#else
#define MARCH_MTUNE_NATIVE_SPECS ""
#endif

#define BASE_DRIVER_SELF_SPECS \
  NO_SHARED_SPECS \
  MARCH_MTUNE_NATIVE_SPECS
#define DRIVER_SELF_SPECS BASE_DRIVER_SELF_SPECS
----------------------------------------------------------------------

Then in linux64.h:

----------------------------------------------------------------------
#undef DRIVER_SELF_SPECS
#define DRIVER_SELF_SPECS \
BASE_DRIVER_SELF_SPECS \
" %{!EB:%{!EL:%(endian_spec)}}" \
" %{!mabi=*: -mabi=n32}"
----------------------------------------------------------------------
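
To illustrate what the native specs buy us (a rough sketch; the exact
result depends on the host CPU): given

    gcc -march=native foo.c

on a Loongson 2F box, the driver calls local_cpu_detect, which reads
/proc/cpuinfo and rewrites the option to -march=loongson2f.  If the
processor isn't recognized, the option is simply dropped.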

We need to document the new options.  After:

----------------------------------------------------------------------
The special value @samp{from-abi} selects the
most compatible architecture for the selected ABI (that is,
@samp{mips1} for 32-bit ABIs and @samp{mips3} for 64-bit ABIs)@.
----------------------------------------------------------------------

let's add something like:

----------------------------------------------------------------------
Native Linux/GNU toolchains also support the value @samp{native},
which selects the best architecture option for the host processor.
@option{-march=native} has no effect if GCC does not recognize
the processor.
----------------------------------------------------------------------

Either add it as part of the same paragraph, or split a new paragraph
before "The special value"; whichever you think is best.

The -march=native support deserves its own webpage entry.  Let me
know if you'd rather not add it yourself.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-06-16 21:59       ` Richard Sandiford
@ 2008-06-17 10:29         ` Maxim Kuvyrkov
  2008-06-18 19:48           ` Richard Sandiford
  2008-07-08 20:52           ` David Daney
  0 siblings, 2 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-17 10:29 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 1248 bytes --]

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> 2008-06-16  Daniel Jacobowitz  <dan@codesourcery.com>
>> 	    Kazu Hirata  <kazu@codesourcery.com>
>> 	    Maxim Kuvyrkov  <maxim@codesourcery.com
>>
>> 	* config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
>> 	* config.host: Use driver-native.o and mips/x-native for mips*-linux*.
>> 	* config/mips/linux.h (host_detect_local_cpu): Declare, add to
>> 	EXTRA_SPEC_FUNCTIONS.
>> 	(MARCH_MTUNE_NATIVE_SPECS): New macro.
>> 	(DRIVER_SELF_SPECS): Adjust.
>> 	* config/mips/linux64.h (DRIVER_SELF_SPECS): Update.
>> 	* config/mips/st.h, config/mips/t-st: New.
>> 	* config/mips/driver-native.c, config/mips/x-native: New.
> 
> OK with the changes below.

The patch is fixed as per your changes.

...

> We need to document the new options.  After:

I did document the new options, but attached an earlier version of the 
patch without those changes, sorry.  Anyway, your wording sounds better 
so I used it instead.

...

> The -march=native support deserves its own webpage entry.  Let me
> know if you'd rather not add it yourself.

I don't think I have write access to the website repository.  Can you 
please add the necessary announcement?


Thanks,

Maxim

[-- Attachment #2: fsf-ls2ef-5-configure.ChangeLog --]
[-- Type: text/plain, Size: 679 bytes --]

2008-06-16  Daniel Jacobowitz  <dan@codesourcery.com>
	    Kazu Hirata  <kazu@codesourcery.com>
	    Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
	* config.host: Use driver-native.o and mips/x-native for mips*-linux*.
	* config/mips/linux.h (host_detect_local_cpu): Declare, add to
	EXTRA_SPEC_FUNCTIONS.
	(MARCH_MTUNE_NATIVE_SPECS): New macro.
	(DRIVER_SELF_SPECS): Adjust.
	* config/mips/linux64.h (DRIVER_SELF_SPECS): Update.
	* config/mips/st.h, config/mips/t-st: New.
	* config/mips/driver-native.c, config/mips/x-native: New.
	* doc/invoke.texi (MIPS): Document 'native' value for -march and
	-mtune options.

[-- Attachment #3: fsf-ls2ef-5-configure.patch --]
[-- Type: text/plain, Size: 8496 bytes --]

--- gcc/doc/invoke.texi	(/local/gcc-4)	(revision 616)
+++ gcc/doc/invoke.texi	(/local/gcc-5)	(revision 616)
@@ -11972,6 +11972,11 @@ The special value @samp{from-abi} select
 most compatible architecture for the selected ABI (that is,
 @samp{mips1} for 32-bit ABIs and @samp{mips3} for 64-bit ABIs)@.
 
+Native Linux/GNU toolchains also support the value @samp{native},
+which selects the best architecture option for the host processor.
+@option{-march=native} has no effect if GCC does not recognize
+the processor.
+
 In processor names, a final @samp{000} can be abbreviated as @samp{k}
 (for example, @samp{-march=r2k}).  Prefixes are optional, and
 @samp{vr} may be written @samp{r}.
--- gcc/config.gcc	(/local/gcc-4)	(revision 616)
+++ gcc/config.gcc	(/local/gcc-5)	(revision 616)
@@ -1523,6 +1523,12 @@ mips64*-*-linux*)
 	tm_file="dbxelf.h elfos.h svr4.h linux.h ${tm_file} mips/linux.h mips/linux64.h"
 	tmake_file="${tmake_file} mips/t-linux64"
 	tm_defines="${tm_defines} MIPS_ABI_DEFAULT=ABI_N32"
+	case ${target} in
+		mips64el-st-linux-gnu)
+			tm_file="${tm_file} mips/st.h"
+			tmake_file="${tmake_file} mips/t-st"
+			;;
+	esac
 	gnu_ld=yes
 	gas=yes
 	test x$with_llsc != x || with_llsc=yes
--- gcc/config.host	(/local/gcc-4)	(revision 616)
+++ gcc/config.host	(/local/gcc-5)	(revision 616)
@@ -104,6 +104,14 @@ case ${host} in
 	;;
     esac
     ;;
+  mips*-*-linux*)
+    case ${target} in
+      mips*-*-linux*)
+	host_extra_gcc_objs="driver-native.o"
+	host_xmake_file="${host_xmake_file} mips/x-native"
+      ;;
+    esac
+    ;;
 esac
 
 case ${host} in
--- gcc/config/mips/linux.h	(/local/gcc-4)	(revision 616)
+++ gcc/config/mips/linux.h	(/local/gcc-5)	(revision 616)
@@ -143,9 +143,27 @@ along with GCC; see the file COPYING3.  
 
 #ifdef HAVE_AS_NO_SHARED
 /* Default to -mno-shared for non-PIC.  */
-#define NO_SHARED_SPECS \
+# define NO_SHARED_SPECS \
   "%{mshared|mno-shared|fpic|fPIC|fpie|fPIE:;:-mno-shared}"
-#define DRIVER_SELF_SPECS NO_SHARED_SPECS
 #else
-#define NO_SHARED_SPECS
+# define NO_SHARED_SPECS ""
 #endif
+
+/* -march=native handling only makes sense with compiler running on
+   a MIPS chip.  */
+#if defined(__mips__)
+extern const char *host_detect_local_cpu (int argc, const char **argv);
+# define EXTRA_SPEC_FUNCTIONS \
+  { "local_cpu_detect", host_detect_local_cpu },
+
+# define MARCH_MTUNE_NATIVE_SPECS				\
+  " %{march=native:%<march=native %:local_cpu_detect(arch)"	\
+  " %{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"
+#else
+# define MARCH_MTUNE_NATIVE_SPECS ""
+#endif
+
+#define BASE_DRIVER_SELF_SPECS \
+  NO_SHARED_SPECS \
+  MARCH_MTUNE_NATIVE_SPECS
+#define DRIVER_SELF_SPECS BASE_DRIVER_SELF_SPECS
--- gcc/config/mips/driver-native.c	(/local/gcc-4)	(revision 616)
+++ gcc/config/mips/driver-native.c	(/local/gcc-5)	(revision 616)
@@ -0,0 +1,73 @@
+/* Subroutines for the gcc driver.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+
+/* This will be called by the spec parser in gcc.c when it sees
+   a %:local_cpu_detect(args) construct.  Currently it will be called
+   with either "arch" or "tune" as argument depending on if -march=native
+   or -mtune=native is to be substituted.
+
+   It returns a string containing new command line parameters to be
+   put at the place of the above two options, depending on what CPU
+   this is executed.  E.g. "-march=loongson2f" on a Loongson 2F for
+   -march=native.  If the routine can't detect a known processor,
+   the -march or -mtune option is discarded.
+
+   ARGC and ARGV are set depending on the actual arguments given
+   in the spec.  */
+const char *
+host_detect_local_cpu (int argc, const char **argv)
+{
+  const char *cpu = NULL;
+  char buf[128];
+  FILE *f;
+  bool arch;
+
+  if (argc < 1)
+    return NULL;
+
+  arch = strcmp (argv[0], "arch") == 0;
+  if (!arch && strcmp (argv[0], "tune"))
+    return NULL;
+
+  f = fopen ("/proc/cpuinfo", "r");
+  if (f == NULL)
+    return NULL;
+
+  while (fgets (buf, sizeof (buf), f) != NULL)
+    if (strncmp (buf, "cpu model", sizeof ("cpu model") - 1) == 0)
+      {
+	if (strstr (buf, "Godson2 V0.2") != NULL
+	    || strstr (buf, "Loongson-2 V0.2") != NULL)
+	  cpu = "loongson2e";
+	else if (strstr (buf, "Godson2 V0.3") != NULL
+		 || strstr (buf, "Loongson-2 V0.3") != NULL)
+	  cpu = "loongson2f";
+	break;
+      }
+
+  fclose (f);
+
+  if (cpu == NULL)
+    return NULL;
+
+  return concat ("-m", argv[0], "=", cpu, NULL);
+}
--- gcc/config/mips/t-st	(/local/gcc-4)	(revision 616)
+++ gcc/config/mips/t-st	(/local/gcc-5)	(revision 616)
@@ -0,0 +1,14 @@
+MULTILIB_OPTIONS = march=loongson2e/march=loongson2f mabi=n32/mabi=32/mabi=64 
+MULTILIB_DIRNAMES = 2e 2f lib32 lib lib64
+
+MULTILIB_OSDIRNAMES  = march.loongson2e/mabi.n32=../lib32/2e
+MULTILIB_OSDIRNAMES += march.loongson2e/mabi.32=../lib/2e
+MULTILIB_OSDIRNAMES += march.loongson2e/mabi.64=../lib64/2e
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.n32=../lib32/2f
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.32=../lib/2f
+MULTILIB_OSDIRNAMES += march.loongson2f/mabi.64=../lib64/2f
+MULTILIB_OSDIRNAMES += mabi.n32=../lib32
+MULTILIB_OSDIRNAMES += mabi.32=../lib
+MULTILIB_OSDIRNAMES += mabi.64=../lib64
+
+EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
--- gcc/config/mips/x-native	(/local/gcc-4)	(revision 616)
+++ gcc/config/mips/x-native	(/local/gcc-5)	(revision 616)
@@ -0,0 +1,3 @@
+driver-native.o : $(srcdir)/config/mips/driver-native.c \
+  $(CONFIG_H) $(SYSTEM_H)
+	$(CC) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
--- gcc/config/mips/linux64.h	(/local/gcc-4)	(revision 616)
+++ gcc/config/mips/linux64.h	(/local/gcc-5)	(revision 616)
@@ -22,7 +22,7 @@ along with GCC; see the file COPYING3.  
    in order to make the other specs easier to write.  */
 #undef DRIVER_SELF_SPECS
 #define DRIVER_SELF_SPECS \
-NO_SHARED_SPECS \
+BASE_DRIVER_SELF_SPECS \
 " %{!EB:%{!EL:%(endian_spec)}}" \
 " %{!mabi=*: -mabi=n32}"
 
--- gcc/config/mips/st.h	(/local/gcc-4)	(revision 616)
+++ gcc/config/mips/st.h	(/local/gcc-5)	(revision 616)
@@ -0,0 +1,31 @@
+/* ST 2e / 2f GNU/Linux Configuration.
+   Copyright (C) 2008
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* The various C libraries each have their own subdirectory.  */
+#undef SYSROOT_SUFFIX_SPEC
+#define SYSROOT_SUFFIX_SPEC			\
+  "%{march=loongson2e:/2e ;			\
+     march=loongson2f:/2f}"
+
+#undef STARTFILE_PREFIX_SPEC
+#define STARTFILE_PREFIX_SPEC				\
+  "%{mabi=32: /usr/local/lib/ /lib/ /usr/lib/}		\
+   %{mabi=n32: /usr/local/lib32/ /lib32/ /usr/lib32/}	\
+   %{mabi=64: /usr/local/lib64/ /lib64/ /usr/lib64/}"

Property changes on: 
___________________________________________________________________
Name: svk:merge
  7dca8dba-45c1-47dc-8958-1a7301c5ed47:/local-gcc/md-constraint:113709
  cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-1:597
  cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-2:598
 -cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-3:604
 +cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-3:599
 +cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-4:600
  cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-trunk:596
  f367781f-d768-471e-ba66-e306e17dff77:/local/gen-rework-20060122:110130


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-06-09 17:39       ` Richard Sandiford
@ 2008-06-17 19:52         ` Maxim Kuvyrkov
  2008-06-18 19:07         ` Maxim Kuvyrkov
  1 sibling, 0 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-17 19:52 UTC (permalink / raw)
  To: gcc-patches, Zhang Le, Eric Fisher, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 2115 bytes --]

Richard Sandiford wrote:

...

>> -(define_insn "*nmadd<mode>"
>> +(define_insn "*msub3<mode>"
>> +  [(set (match_operand:ANYF 0 "register_operand" "=f")
>> +	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
>> +			       (match_operand:ANYF 2 "register_operand" "f"))
>> +		    (match_operand:ANYF 3 "register_operand" "0")))]
>> +  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD
>> +   && !HONOR_NANS (<MODE>mode)"
>> +  "msub.<fmt>\t%0,%1,%2"
>> +  [(set_attr "type" "fmadd")
>> +   (set_attr "mode" "<UNITMODE>")])
>> +
> 
> I'd like a comment to say why we need !HONOR_NANS (<MODE>mode) for
> the 3-operand form but not the 4-operand form.
> 
> Like I say, here's an adjusted patch.  I've made all the changes
> except for the last one (because I don't know the reason why we
> need the HONOR_NANS check).
> 
> If the revised patch is OK with you, then it's OK to commit after testing.
> Please run the HONOR_NANS comment past me first, though.  (Alternatively,
> if we don't need the extra !HONOR_NANS condition, the patch is OK with
> that removed.)

Originally, I added !HONOR_NANS as a fix for a test that failed.  After 
further investigation it turned out that the kernel should have handled 
the situation, but failed to do so.  I have now removed the !HONOR_NANS 
check from the madd3 and msub3 patterns.

I've split this patch into two:

1. one that adds the [n]madd/[n]msub and mov[zn] instructions, and
2. one that adds the paired-single float instructions.

I believe (1) is approved, so I'll commit it tomorrow.  It was tested 
on Loongson 2E/2F boxes with a single regression, which is due to a 
kernel fault.

With (2) I ran into further problems during testing.  Apparently the 
c.<cond>.ps instructions that Loongson supports take only 2 arguments, 
while the respective instructions from the normal ISA have 3 arguments -- 
the third argument being one of the fccX registers (ISA_HAS_8CC).  
Loongson has only one FP status register, hence the support for the 
c.<cond>.ps instructions should be adjusted to handle not only the 
CCV2/CCV4 modes but also CC mode.  I need some time to fix that.
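
In other words (a minimal sketch, assuming the usual assembly syntax and
arbitrary register numbers):

    c.lt.ps  $fcc2,$f4,$f6    # MIPS64 / ISA_HAS_8CC form: selectable fccX
    c.lt.ps  $f4,$f6          # Loongson 2E/2F form: only the one FP condition bit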


Thanks,

Maxim

[-- Attachment #2: fsf-ls2ef-madd-movz.ChangeLog --]
[-- Type: text/plain, Size: 1057 bytes --]

2008-06-16  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* config/mips/mips.h (ISA_HAS_CONDMOVE): Slice ISA_HAS_FP_CONDMOVE
	from it.
	(ISA_HAS_FP_CONDMOVE): New macro.
	(ISA_HAS_FP_MADD4_MSUB4, ISA_HAS_FP_MADD3_MSUB3): New macros.
	(ISA_HAS_NMADD_NMSUB): Rename to ISA_HAS_NMADD4_NMSUB4.
	(ISA_HAS_NMADD3_NMSUB3): New macro.
	* config/mips/mips.c (mips_rtx_costs): Update.
	* config/mips/mips.md (MOVECC): Don't use FP conditional moves when
	compiling for ST Loongson 2E/2F.
	(madd<mode>): Rename to madd4<mode>.  Update.
	(madd3<mode>): New pattern.
	(msub<mode>): Rename to msub4<mode>.  Update.
	(msub3<mode>): New pattern.
	(nmadd<mode>): Rename to nmadd4<mode>.  Update.
	(nmadd3<mode>): New pattern.
	(nmadd<mode>_fastmath): Rename to nmadd4<mode>_fastmath.  Update.
	(nmadd3<mode>_fastmath): New pattern.
	(nmsub<mode>): Rename to nmsub4<mode>.  Update.
	(nmsub3<mode>): New pattern.
	(nmsub<mode>_fastmath): Rename to nmsub4<mode>_fastmath.  Update.
	(nmsub3<mode>_fastmath): New pattern.
	(mov<SCALARF:mode>_on_<MOVECC:mode>, mov<mode>cc): Update.

[-- Attachment #3: fsf-ls2ef-madd-movz.patch --]
[-- Type: text/plain, Size: 10653 bytes --]

--- gcc/config/mips/mips.md	(/local/gcc-3)	(revision 616)
+++ gcc/config/mips/mips.md	(/local/gcc-4)	(revision 616)
@@ -526,7 +526,8 @@
 
 ;; This mode iterator allows :MOVECC to be used anywhere that a
 ;; conditional-move-type condition is needed.
-(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT") (CC "TARGET_HARD_FLOAT")])
+(define_mode_iterator MOVECC [SI (DI "TARGET_64BIT")
+                              (CC "TARGET_HARD_FLOAT && !TARGET_LOONGSON_2EF")])
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
@@ -1904,33 +1905,53 @@
 
 ;; Floating point multiply accumulate instructions.
 
-(define_insn "*madd<mode>"
+(define_insn "*madd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "madd.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*msub<mode>"
+(define_insn "*madd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(plus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD"
+  "madd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*msub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			       (match_operand:ANYF 2 "register_operand" "f"))
 		    (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_FP4 && TARGET_FUSED_MADD"
+  "ISA_HAS_FP_MADD4_MSUB4 && TARGET_FUSED_MADD"
   "msub.<fmt>\t%0,%3,%1,%2"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>"
+(define_insn "*msub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			       (match_operand:ANYF 2 "register_operand" "f"))
+		    (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_FP_MADD3_MSUB3 && TARGET_FUSED_MADD"
+  "msub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (plus:ANYF
 		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1938,13 +1959,27 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmadd<mode>_fastmath"
+(define_insn "*nmadd3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (plus:ANYF
+		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
+			      (match_operand:ANYF 2 "register_operand" "f"))
+		   (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmadd4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
 		    (match_operand:ANYF 2 "register_operand" "f"))
 	 (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1952,13 +1987,27 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>"
+(define_insn "*nmadd3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
+		    (match_operand:ANYF 2 "register_operand" "f"))
+	 (match_operand:ANYF 3 "register_operand" "0")))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmadd.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(neg:ANYF (minus:ANYF
 		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 			      (match_operand:ANYF 3 "register_operand" "f"))
 		   (match_operand:ANYF 1 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -1966,20 +2015,48 @@
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
 
-(define_insn "*nmsub<mode>_fastmath"
+(define_insn "*nmsub3<mode>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(neg:ANYF (minus:ANYF
+		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+			      (match_operand:ANYF 3 "register_operand" "f"))
+		   (match_operand:ANYF 1 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
+(define_insn "*nmsub4<mode>_fastmath"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(minus:ANYF
 	 (match_operand:ANYF 1 "register_operand" "f")
 	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 		    (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
   "nmsub.<fmt>\t%0,%1,%2,%3"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
-\f
+
+(define_insn "*nmsub3<mode>_fastmath"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+	(minus:ANYF
+	 (match_operand:ANYF 1 "register_operand" "f")
+	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
+		    (match_operand:ANYF 3 "register_operand" "0"))))]
+  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+   && TARGET_FUSED_MADD
+   && !HONOR_SIGNED_ZEROS (<MODE>mode)
+   && !HONOR_NANS (<MODE>mode)"
+  "nmsub.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmadd")
+   (set_attr "mode" "<UNITMODE>")])
+
 ;;
 ;;  ....................
 ;;
@@ -6339,7 +6416,7 @@
 		 (const_int 0)])
 	 (match_operand:SCALARF 2 "register_operand" "f,0")
 	 (match_operand:SCALARF 3 "register_operand" "0,f")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
   "@
     mov%T4.<fmt>\t%0,%2,%1
     mov%t4.<fmt>\t%0,%3,%1"
@@ -6366,7 +6443,7 @@
 	(if_then_else:SCALARF (match_dup 5)
 			      (match_operand:SCALARF 2 "register_operand")
 			      (match_operand:SCALARF 3 "register_operand")))]
-  "ISA_HAS_CONDMOVE"
+  "ISA_HAS_FP_CONDMOVE"
 {
   mips_expand_conditional_move (operands);
   DONE;
--- gcc/config/mips/mips.c	(/local/gcc-3)	(revision 616)
+++ gcc/config/mips/mips.c	(/local/gcc-4)	(revision 616)
@@ -3286,7 +3286,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case MINUS:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && !HONOR_SIGNED_ZEROS (mode))
@@ -3337,7 +3337,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case NEG:
       if (float_mode_p
-	  && ISA_HAS_NMADD_NMSUB (mode)
+	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && HONOR_SIGNED_ZEROS (mode))
--- gcc/config/mips/mips.h	(/local/gcc-3)	(revision 616)
+++ gcc/config/mips/mips.h	(/local/gcc-4)	(revision 616)
@@ -745,14 +745,19 @@ enum mips_code_readable_setting {
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS16)
 
-/* ISA has the conditional move instructions introduced in mips4.  */
-#define ISA_HAS_CONDMOVE	((ISA_MIPS4				\
+/* ISA has the floating-point conditional move instructions introduced
+   in mips4.  */
+#define ISA_HAS_FP_CONDMOVE	((ISA_MIPS4				\
 				  || ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
 				  || ISA_MIPS64)			\
 				 && !TARGET_MIPS5500			\
 				 && !TARGET_MIPS16)
 
+/* ISA has the integer conditional move instructions introduced in mips4 and
+   ST Loongson 2E/2F.  */
+#define ISA_HAS_CONDMOVE        (ISA_HAS_FP_CONDMOVE || TARGET_LOONGSON_2EF)
+
 /* ISA has LDC1 and SDC1.  */
 #define ISA_HAS_LDC1_SDC1	(!ISA_MIPS1 && !TARGET_MIPS16)
 
@@ -787,14 +792,26 @@ enum mips_code_readable_setting {
 /* Integer multiply-accumulate instructions should be generated.  */
 #define GENERATE_MADD_MSUB      (ISA_HAS_MADD_MSUB && !TUNE_74K)
 
-/* ISA has floating-point nmadd and nmsub instructions for mode MODE.  */
-#define ISA_HAS_NMADD_NMSUB(MODE) \
+/* ISA has floating-point madd and msub instructions 'd = a * b [+-] c'.  */
+#define ISA_HAS_FP_MADD4_MSUB4  ISA_HAS_FP4
+
+/* ISA has floating-point madd and msub instructions 'c [+-]= a * b'.  */
+#define ISA_HAS_FP_MADD3_MSUB3  TARGET_LOONGSON_2EF
+
+/* ISA has floating-point nmadd and nmsub instructions
+   'd = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD4_NMSUB4(MODE)					\
 				((ISA_MIPS4				\
 				  || (ISA_MIPS32R2 && (MODE) == V2SFmode) \
 				  || ISA_MIPS64)			\
 				 && (!TARGET_MIPS5400 || TARGET_MAD)	\
 				 && !TARGET_MIPS16)
 
+/* ISA has floating-point nmadd and nmsub instructions
+   'c = -(a * b) [+-] c'.  */
+#define ISA_HAS_NMADD3_NMSUB3(MODE)					\
+                                TARGET_LOONGSON_2EF
+
 /* ISA has count leading zeroes/ones instruction (not implemented).  */
 #define ISA_HAS_CLZ_CLO		((ISA_MIPS32				\
 				  || ISA_MIPS32R2			\

Property changes on: 
___________________________________________________________________
Name: svk:merge
  7dca8dba-45c1-47dc-8958-1a7301c5ed47:/local-gcc/md-constraint:113709
  cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-1:597
  cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-2:598
 +cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-3:604
  cd855902-26a6-11dd-a899-33fab5efdf21:/local/gcc-trunk:596
  f367781f-d768-471e-ba66-e306e17dff77:/local/gen-rework-20060122:110130


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-06-09 17:39       ` Richard Sandiford
  2008-06-17 19:52         ` Maxim Kuvyrkov
@ 2008-06-18 19:07         ` Maxim Kuvyrkov
  2008-06-18 19:58           ` Richard Sandiford
  1 sibling, 1 reply; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-06-18 19:07 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> Richard Sandiford wrote:

...

Richard, I have a general question about the nmadd4 and nmadd4_fastmath 
patterns.  Why does the condition for the nmadd4 pattern contain 
HONOR_SIGNED_ZEROS?  I suspect we can still use the nmadd4 pattern even if 
HONOR_SIGNED_ZEROS is false.  With the current state of things, if 
HONOR_SIGNED_ZEROS is false, then an instruction matching the nmadd4 
pattern has to be transformed to match nmadd4_fastmath -- I doubt that 
happens in all arising cases.

Probably, I'm missing something here and I am curious what it is.


Thanks,

Maxim

> +(define_insn "*nmadd4<mode>"
>    [(set (match_operand:ANYF 0 "register_operand" "=f")
>  	(neg:ANYF (plus:ANYF
>  		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
>  			      (match_operand:ANYF 2 "register_operand" "f"))
>  		   (match_operand:ANYF 3 "register_operand" "f"))))]
> -  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
> +  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
>     && TARGET_FUSED_MADD
>     && HONOR_SIGNED_ZEROS (<MODE>mode)
>     && !HONOR_NANS (<MODE>mode)"
> @@ -1898,13 +1919,27 @@ (define_insn "*nmadd<mode>"
>    [(set_attr "type" "fmadd")
>     (set_attr "mode" "<UNITMODE>")])

...

  +(define_insn "*nmadd4<mode>_fastmath"
>    [(set (match_operand:ANYF 0 "register_operand" "=f")
>  	(minus:ANYF
>  	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
>  		    (match_operand:ANYF 2 "register_operand" "f"))
>  	 (match_operand:ANYF 3 "register_operand" "f")))]
> -  "ISA_HAS_NMADD_NMSUB (<MODE>mode)
> +  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
>     && TARGET_FUSED_MADD
>     && !HONOR_SIGNED_ZEROS (<MODE>mode)
>     && !HONOR_NANS (<MODE>mode)"
> @@ -1912,13 +1947,27 @@ (define_insn "*nmadd<mode>_fastmath"
>    [(set_attr "type" "fmadd")
>     (set_attr "mode" "<UNITMODE>")])


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-06-17 10:29         ` Maxim Kuvyrkov
@ 2008-06-18 19:48           ` Richard Sandiford
  2008-07-08 20:52           ` David Daney
  1 sibling, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-18 19:48 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>> The -march=native support deserves its own webpage entry.  Let me
>> know if you'd rather not add it yourself.
>
> I don't think I have write access to the website repository.  Can you 
> please add the necessary announcement?

FWIW, everyone with access to the SVN repository also has access to the
web pages.  See:

    http://gcc.gnu.org/cvs.html

I installed the patch below.

Richard


Index: htdocs/gcc-4.4/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.4/changes.html,v
retrieving revision 1.13
diff -u -p -r1.13 changes.html
--- htdocs/gcc-4.4/changes.html	15 Jun 2008 17:53:26 -0000	1.13
+++ htdocs/gcc-4.4/changes.html	18 Jun 2008 19:38:02 -0000
@@ -102,6 +102,10 @@
         <code>-march=xlr</code> and <code>-mtune=xlr</code> options.</li>
     <li>64-bit targets can now perform 128-bit multiplications inline,
         instead of relying on a <code>libgcc</code> function.</li>
+    <li>Native GNU/Linux toolchains now support <code>-march=native</code>
+        and <code>-mtune=native</code>, which select the host processor.
+        This currently only works for toolchains running on Loongson
+        processors.</li>
   </ul>
 
 <h2>Documentation improvements</h2>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][3/5] Miscellaneous instructions
  2008-06-18 19:07         ` Maxim Kuvyrkov
@ 2008-06-18 19:58           ` Richard Sandiford
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Sandiford @ 2008-06-18 19:58 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher

Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>> Richard Sandiford wrote:
>
> ...
>
> Richard, I have a general question about the nmadd4 and nmadd4_fastmath 
> patterns.  Why does the condition for the nmadd4 pattern contain 
> HONOR_SIGNED_ZEROS?  I suspect we can still use the nmadd4 pattern even if 
> HONOR_SIGNED_ZEROS is false.  With the current state of things, if 
> HONOR_SIGNED_ZEROS is false, then an instruction matching the nmadd4 
> pattern has to be transformed to match nmadd4_fastmath -- I doubt that 
> happens in all arising cases.
>
> Probably, I'm missing something here and I am curious what it is.

We have patterns for both HONOR_SIGNED_ZEROS and !HONOR_SIGNED_ZEROS,
but they need to match different things because HONOR_SIGNED_ZEROS
changes the canonical form of the rtl.  See:

    http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00911.html

for details.
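
In terms of the patterns above, the same source expression -(a * b + c)
ends up in two different shapes (a rough sketch):

    ;; HONOR_SIGNED_ZEROS:   (neg (plus (mult a b) c))     -> *nmadd4<mode>
    ;; !HONOR_SIGNED_ZEROS:  (minus (mult (neg a) b) c)    -> *nmadd4<mode>_fastmath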

Richard

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-06-17 10:29         ` Maxim Kuvyrkov
  2008-06-18 19:48           ` Richard Sandiford
@ 2008-07-08 20:52           ` David Daney
  2008-07-09 19:07             ` Maxim Kuvyrkov
  1 sibling, 1 reply; 66+ messages in thread
From: David Daney @ 2008-07-08 20:52 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Maxim Kuvyrkov wrote:
> Richard Sandiford wrote:
>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>> 2008-06-16  Daniel Jacobowitz  <dan@codesourcery.com>
>>>         Kazu Hirata  <kazu@codesourcery.com>
>>>         Maxim Kuvyrkov  <maxim@codesourcery.com
>>>
>>>     * config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
>>>     * config.host: Use driver-native.o and mips/x-native for 
>>> mips*-linux*.
>>>     * config/mips/linux.h (host_detect_local_cpu): Declare, add to
>>>     EXTRA_SPEC_FUNCTIONS.
>>>     (MARCH_MTUNE_NATIVE_SPECS): New macro.
>>>     (DRIVER_SELF_SPECS): Adjust.
>>>     * config/mips/linux64.h (DRIVER_SELF_SPECS): Update.
>>>     * config/mips/st.h, config/mips/t-st: New.
>>>     * config/mips/driver-native.c, config/mips/x-native: New.
>>
>> OK with the changes below.
> 
> The patch is fixed as per your changes.
> 

How was this tested?  I cannot bootstrap due to the following in stage2:

/home/ddaney/gccsvn/trunk-build/./prev-gcc/xgcc -B/home/ddaney/gccsvn/trunk-build/./prev-gcc/ -B/home/ddaney/gccsvn/trunk-install/mipsel-linux/bin/ -c  -g -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual -Wc++-compat -Wold-style-definition -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros                            -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../trunk/gcc -I../../trunk/gcc/. -I../../trunk/gcc/../include -I../../trunk/gcc/../libcpp/include -I/home/ddaney/mp/include -I/home/ddaney/mp/include -I../../trunk/gcc/../libdecnumber -I../../trunk/gcc/../libdecnumber/dpd -I../libdecnumber  -I. -I. -I../../trunk/gcc -I../../trunk/gcc/. -I../../trunk/gcc/../include -I../../trunk/gcc/../libcpp/include -I/home/ddaney/mp/include -I/home/ddaney/mp/include -I../../trunk/gcc/../libdecnumber -I../../trunk/gcc/../libdecnumber/dpd -I../libdecnumber ../../trunk/gcc/config/mips/driver-nat
ive.c
cc1: warnings being treated as errors
../../trunk/gcc/config/mips/driver-native.c:37: error: no previous prototype for 'host_detect_local_cpu'
make[3]: *** [driver-native.o] Error 1
make[3]: *** Waiting for unfinished jobs....
rm gcj-dbtool.pod gcov.pod fsf-funding.pod jcf-dump.pod jv-convert.pod grmic.pod gcj.pod gc-analyze.pod gfdl.pod cpp.pod gij.pod gfortran.pod gcc.pod
make[3]: Leaving directory `/home/ddaney/gccsvn/trunk-build/gcc'
.
.
.

This is with trunk revision 137587 configured as:
../trunk/configure --prefix=/home/ddaney/gccsvn/trunk-install --target=mipsel-linux --build=mipsel-linux --host=mipsel-linux --with-gmp=/home/ddaney/mp --with-mpfr=/home/ddaney/mp --with-arch=sb1 --disable-java-awt --without-x --enable-__cxa_atexit --enable-languages=all

David Daney

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-07-08 20:52           ` David Daney
@ 2008-07-09 19:07             ` Maxim Kuvyrkov
  2008-07-10  3:52               ` Eric Fisher
  2008-07-10 20:00               ` Mark Mitchell
  0 siblings, 2 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-07-09 19:07 UTC (permalink / raw)
  To: David Daney; +Cc: gcc-patches, Zhang Le, Eric Fisher, rdsandiford

David Daney wrote:
> Maxim Kuvyrkov wrote:
>> Richard Sandiford wrote:
>>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>>> 2008-06-16  Daniel Jacobowitz  <dan@codesourcery.com>
>>>>         Kazu Hirata  <kazu@codesourcery.com>
>>>>         Maxim Kuvyrkov  <maxim@codesourcery.com
>>>>
>>>>     * config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
>>>>     * config.host: Use driver-native.o and mips/x-native for 
>>>> mips*-linux*.
>>>>     * config/mips/linux.h (host_detect_local_cpu): Declare, add to
>>>>     EXTRA_SPEC_FUNCTIONS.
>>>>     (MARCH_MTUNE_NATIVE_SPECS): New macro.
>>>>     (DRIVER_SELF_SPECS): Adjust.
>>>>     * config/mips/linux64.h (DRIVER_SELF_SPECS): Update.
>>>>     * config/mips/st.h, config/mips/t-st: New.
>>>>     * config/mips/driver-native.c, config/mips/x-native: New.
>>>
>>> OK with the changes below.
>>
>> The patch is fixed as per your changes.
>>
> 
> How was this tested?  I cannot bootstrap due to the following in stage2:

Sorry for breaking the bootstrap.  The patch was cross-tested on 
mips64el-st-linux-gnu.  I remember testing the bootstrap, but, probably, 
that was for the original version of the patch.

FWIW, I think the fix you posted in the other thread is the proper one.


Thanks,

Maxim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-07-09 19:07             ` Maxim Kuvyrkov
@ 2008-07-10  3:52               ` Eric Fisher
  2008-07-10  5:16                 ` Maxim Kuvyrkov
  2008-07-10 20:00               ` Mark Mitchell
  1 sibling, 1 reply; 66+ messages in thread
From: Eric Fisher @ 2008-07-10  3:52 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: David Daney, gcc-patches, Zhang Le, rdsandiford

I got an error when building the cross tools, using the updated patches.

checking for suffix of object files... configure: error: cannot
compute suffix of object files: cannot compile
See `config.log' for more details.
make[1]: *** [configure-target-libgcc] Error 1


The configure line is:

../cs-gcc-4.3.0/configure --prefix=/home/xmj/install/cs-gcc-4.3.0
--enable-languages=c,c++,fortran --enable-shared
--enable-threads=posix --with-gmp=/home/xmj/install/gmp-4.2.2
--with-mpfr=/home/xmj/install/mpfr-2.3.1 --disable-multilib
--target=mips64el-linux --build=mipsel-linux --host=mipsel-linux

Details from mipsel-linux/libgcc/config.log are:

Reading specs from /home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc/specs
xgcc: braced spec body '%<march=native %:local_cpu_detect(arch)
%{mtune=native:%<mtune=native %:local_cpu_detect(tune)} %{!EB:
%{!EL:%(endian_spec)}} %{!mabi=*: -mabi=n32}' is invalid
configure:2374: $? = 1
configure:2376:
/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc/xgcc
-B/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc
/ -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/bin/
-B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/lib/ -isystem
/home/xmj
/install/cs-gcc-4.3.0/mips64el-linux/include -isystem
/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/sys-include -V
</dev/null
>&5
xgcc: '-V' must come at the start of the command line
configure:2379: $? = 1
configure:2398:
/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc/xgcc
-B/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc
/ -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/bin/
-B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/lib/ -isystem
/home/xmj
/install/cs-gcc-4.3.0/mips64el-linux/include -isystem
/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/sys-include -o conftest
-O
2 -g -g -O2     conftest.c  >&5
xgcc: braced spec body '%<march=native %:local_cpu_detect(arch)
%{mtune=native:%<mtune=native %:local_cpu_detect(tune)} %{!EB:
%{!EL:%(endian_spec)}} %{!mabi=*: -mabi=n32}' is invalid
configure:2401: $? = 1
configure:2567: checking for suffix of object files
configure:2588:
/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc/xgcc
-B/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc
/ -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/bin/
-B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/lib/ -isystem
/home/xmj
/install/cs-gcc-4.3.0/mips64el-linux/include -isystem
/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/sys-include -c -O2 -g -g
-
O2    conftest.c >&5
xgcc: braced spec body '%<march=native %:local_cpu_detect(arch)
%{mtune=native:%<mtune=native %:local_cpu_detect(tune)} %{!EB:
%{!EL:%(endian_spec)}} %{!mabi=*: -mabi=n32}' is invalid
configure:2591: $? = 1




2008/7/10 Maxim Kuvyrkov <maxim@codesourcery.com>:
> David Daney wrote:
>>
>> Maxim Kuvyrkov wrote:
>>>
>>> Richard Sandiford wrote:
>>>>
>>>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>>>>
>>>>> 2008-06-16  Daniel Jacobowitz  <dan@codesourcery.com>
>>>>>        Kazu Hirata  <kazu@codesourcery.com>
>>>>>        Maxim Kuvyrkov  <maxim@codesourcery.com
>>>>>
>>>>>    * config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
>>>>>    * config.host: Use driver-native.o and mips/x-native for
>>>>> mips*-linux*.
>>>>>    * config/mips/linux.h (host_detect_local_cpu): Declare, add to
>>>>>    EXTRA_SPEC_FUNCTIONS.
>>>>>    (MARCH_MTUNE_NATIVE_SPECS): New macro.
>>>>>    (DRIVER_SELF_SPECS): Adjust.
>>>>>    * config/mips/linux64.h (DRIVER_SELF_SPECS): Update.
>>>>>    * config/mips/st.h, config/mips/t-st: New.
>>>>>    * config/mips/driver-native.c, config/mips/x-native: New.
>>>>
>>>> OK with the changes below.
>>>
>>> The patch is fixed as per your changes.
>>>
>>
>> How was this tested?  I cannot bootstrap due to the following in stage2:
>
> Sorry for breaking the bootstrap.  The patch was cross-tested on
> mips64el-st-linux-gnu.  I remember testing the bootstrap, but, probably,
> that was for the original version of the patch.
>
> FWIW, I think the fix you posted in the other thread is the proper one.
>
>
> Thanks,
>
> Maxim
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-07-10  3:52               ` Eric Fisher
@ 2008-07-10  5:16                 ` Maxim Kuvyrkov
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Kuvyrkov @ 2008-07-10  5:16 UTC (permalink / raw)
  To: Eric Fisher; +Cc: David Daney, gcc-patches, Zhang Le, rdsandiford

Eric Fisher wrote:
> I got an error when building the cross tools, using the updated patches.

...

> Reading specs from /home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc/specs
> xgcc: braced spec body '%<march=native %:local_cpu_detect(arch)
> %{mtune=native:%<mtune=native %:local_cpu_detect(tune)} %{!EB:
> %{!EL:%(endian_spec)}} %{!mabi=*: -mabi=n32}' is invalid

The closing brace after " %{march=native:%<march=native 
%:local_cpu_detect(arch)" is missing.  Before committing the patch I 
retested it and fixed this typo, so the relevant piece of the patch 
should read:
+# define MARCH_MTUNE_NATIVE_SPECS				\
+  " %{march=native:%<march=native %:local_cpu_detect(arch)}"	\
+  " %{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"

This is the only difference between the patch posted in this thread and 
the patch committed in revision 136888.
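
A quick way to double-check an installed driver (just a suggestion) is to
run something like

    gcc -march=native -v -c foo.c

and look for -march=loongson2e or -march=loongson2f in the cc1 command
line that the driver prints.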


--
Maxim

> configure:2374: $? = 1
> configure:2376:
> /home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc/xgcc
> -B/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc
> / -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/bin/
> -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/lib/ -isystem
> /home/xmj
> /install/cs-gcc-4.3.0/mips64el-linux/include -isystem
> /home/xmj/install/cs-gcc-4.3.0/mips64el-linux/sys-include -V
> </dev/null
>> &5
> xgcc: '-V' must come at the start of the command line
> configure:2379: $? = 1
> configure:2398:
> /home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc/xgcc
> -B/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc
> / -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/bin/
> -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/lib/ -isystem
> /home/xmj
> /install/cs-gcc-4.3.0/mips64el-linux/include -isystem
> /home/xmj/install/cs-gcc-4.3.0/mips64el-linux/sys-include -o conftest
> -O
> 2 -g -g -O2     conftest.c  >&5
> xgcc: braced spec body '%<march=native %:local_cpu_detect(arch)
> %{mtune=native:%<mtune=native %:local_cpu_detect(tune)} %{!EB:
> %{!EL:%(endian_spec)}} %{!mabi=*: -mabi=n32}' is invalid
> configure:2401: $? = 1
> configure:2567: checking for suffix of object files
> configure:2588:
> /home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc/xgcc
> -B/home/xmj/loongson2e-tools/build-cs-gcc-4.3.0/./gcc
> / -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/bin/
> -B/home/xmj/install/cs-gcc-4.3.0/mips64el-linux/lib/ -isystem
> /home/xmj
> /install/cs-gcc-4.3.0/mips64el-linux/include -isystem
> /home/xmj/install/cs-gcc-4.3.0/mips64el-linux/sys-include -c -O2 -g -g
> -
> O2    conftest.c >&5
> xgcc: braced spec body '%<march=native %:local_cpu_detect(arch)
> %{mtune=native:%<mtune=native %:local_cpu_detect(tune)} %{!EB:
> %{!EL:%(endian_spec)}} %{!mabi=*: -mabi=n32}' is invalid
> configure:2591: $? = 1
> 
> 
> 
> 
> 2008/7/10 Maxim Kuvyrkov <maxim@codesourcery.com>:
>> David Daney wrote:
>>> Maxim Kuvyrkov wrote:
>>>> Richard Sandiford wrote:
>>>>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>>>>> 2008-06-16  Daniel Jacobowitz  <dan@codesourcery.com>
>>>>>>        Kazu Hirata  <kazu@codesourcery.com>
>>>>>>        Maxim Kuvyrkov  <maxim@codesourcery.com
>>>>>>
>>>>>>    * config.gcc (mips64el-st-linux-gnu): Use mips/st.h and mips/t-st.
>>>>>>    * config.host: Use driver-native.o and mips/x-native for
>>>>>> mips*-linux*.
>>>>>>    * config/mips/linux.h (host_detect_local_cpu): Declare, add to
>>>>>>    EXTRA_SPEC_FUNCTIONS.
>>>>>>    (MARCH_MTUNE_NATIVE_SPECS): New macro.
>>>>>>    (DRIVER_SELF_SPECS): Adjust.
>>>>>>    * config/mips/linux64.h (DRIVER_SELF_SPECS): Update.
>>>>>>    * config/mips/st.h, config/mips/t-st: New.
>>>>>>    * config/mips/driver-native.c, config/mips/x-native: New.
>>>>> OK with the changes below.
>>>> The patch is fixed as per your changes.
>>>>
>>> How was this tested?  I cannot bootstrap due to the following in stage2:
>> Sorry for breaking the bootstrap.  The patch was cross-tested on
>> mips64el-st-linux-gnu.  I remember testing the bootstrap, but, probably,
>> that was for the original version of the patch.
>>
>> FWIW, I think the fix you posted in the other thread is the proper one.
>>
>>
>> Thanks,
>>
>> Maxim
>>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-07-09 19:07             ` Maxim Kuvyrkov
  2008-07-10  3:52               ` Eric Fisher
@ 2008-07-10 20:00               ` Mark Mitchell
  2008-07-10 21:01                 ` David Daney
  1 sibling, 1 reply; 66+ messages in thread
From: Mark Mitchell @ 2008-07-10 20:00 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: David Daney, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Maxim Kuvyrkov wrote:

> Sorry for breaking the bootstrap.  The patch was cross-tested on 
> mips64el-st-linux-gnu.  I remember testing the bootstrap, but, probably, 
> that was for the original version of the patch.
> 
> FWIW, I think the fix you posted in the other thread is the proper one.

If this patch needs approval, please feel free to forward it to me for 
review.  (If it looks MIPS-y, I'll defer to the MIPS maintainers, but if 
it's just a generic fix, then I can expedite so that we don't leave the 
tree broken.)

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [MIPS][LS2][5/5] Support for native MIPS GCC
  2008-07-10 20:00               ` Mark Mitchell
@ 2008-07-10 21:01                 ` David Daney
  0 siblings, 0 replies; 66+ messages in thread
From: David Daney @ 2008-07-10 21:01 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Maxim Kuvyrkov, gcc-patches, Zhang Le, Eric Fisher, rdsandiford

Mark Mitchell wrote:
> Maxim Kuvyrkov wrote:
> 
>> Sorry for breaking the bootstrap.  The patch was cross-tested on 
>> mips64el-st-linux-gnu.  I remember testing the bootstrap, but, 
>> probably, that was for the original version of the patch.
>>
>> FWIW, I think the fix you posted in the other thread is the proper one.
> 
> If this patch needs approval, please feel free to forward it to me for 
> review.  (If it looks MIPS-y, I'll defer to the MIPS maintainers, but if 
> it's just a generic fix, then I can expedite so that we don't leave the 
> tree broken.)
> 

Thanks Mark,

Richard already approved the patch, and it has been committed.

David Daney

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2008-07-10 20:00 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-22 17:29 [MIPS] ST Loongson 2E/2F submission Maxim Kuvyrkov
2008-05-22 17:53 ` [MIPS][LS2][1/5] Generic support Maxim Kuvyrkov
2008-05-22 19:27   ` Richard Sandiford
2008-05-26 13:47     ` Maxim Kuvyrkov
2008-05-27 18:41       ` Richard Sandiford
2008-05-28 12:29         ` Maxim Kuvyrkov
2008-05-22 18:08 ` [MIPS][LS2][2/5] Vector intrinsics Maxim Kuvyrkov
2008-05-22 19:35   ` Richard Sandiford
2008-05-28 12:52     ` Maxim Kuvyrkov
2008-05-28 18:20       ` Richard Sandiford
2008-06-05 10:38     ` Maxim Kuvyrkov
2008-06-05 16:16       ` Richard Sandiford
2008-06-06  8:08         ` Ruan Beihong
2008-06-09 18:24           ` Maxim Kuvyrkov
2008-06-10  7:32             ` Richard Sandiford
2008-06-06 12:31         ` Maxim Kuvyrkov
2008-06-06 14:06           ` Richard Sandiford
2008-06-09 18:27             ` Maxim Kuvyrkov
2008-06-10 10:29               ` Richard Sandiford
2008-06-11 10:09                 ` Maxim Kuvyrkov
2008-06-11 10:23                   ` Richard Sandiford
2008-06-11 20:34                 ` Maxim Kuvyrkov
2008-06-12  8:16                   ` Richard Sandiford
2008-06-12  8:45                     ` Maxim Kuvyrkov
2008-06-12  9:03                       ` Richard Sandiford
2008-06-13 18:36                         ` Maxim Kuvyrkov
2008-06-14  8:20                           ` Richard Sandiford
2008-05-22 18:16 ` [MIPS][LS2][3/5] Miscellaneous instructions Maxim Kuvyrkov
2008-05-22 19:33   ` Richard Sandiford
2008-06-08 19:59     ` Maxim Kuvyrkov
2008-06-09 13:16       ` Maxim Kuvyrkov
2008-06-09 17:45         ` Richard Sandiford
2008-06-13  6:59           ` Richard Sandiford
2008-06-09 17:39       ` Richard Sandiford
2008-06-17 19:52         ` Maxim Kuvyrkov
2008-06-18 19:07         ` Maxim Kuvyrkov
2008-06-18 19:58           ` Richard Sandiford
2008-05-22 18:22 ` [MIPS][LS2][4/5] Scheduling and tuning Maxim Kuvyrkov
2008-05-23  3:07   ` Zhang Le
2008-05-23 13:17     ` Maxim Kuvyrkov
2008-05-25 11:57   ` Richard Sandiford
2008-06-12 13:45     ` Maxim Kuvyrkov
2008-06-12 17:49       ` Richard Sandiford
2008-06-12 18:04         ` Maxim Kuvyrkov
2008-06-12 18:53           ` Richard Sandiford
     [not found]       ` <48515794.7050007@codesourcery.com>
2008-06-12 17:21         ` Maxim Kuvyrkov
2008-06-12 18:43           ` Richard Sandiford
2008-06-12 18:06         ` Richard Sandiford
2008-06-13 18:17           ` Maxim Kuvyrkov
2008-06-14  8:32             ` Richard Sandiford
2008-06-15 17:28               ` Maxim Kuvyrkov
2008-05-22 18:29 ` [MIPS][LS2][5/5] Support for native MIPS GCC Maxim Kuvyrkov
2008-05-25 12:02   ` Richard Sandiford
2008-06-16 18:55     ` Maxim Kuvyrkov
2008-06-16 21:59       ` Richard Sandiford
2008-06-17 10:29         ` Maxim Kuvyrkov
2008-06-18 19:48           ` Richard Sandiford
2008-07-08 20:52           ` David Daney
2008-07-09 19:07             ` Maxim Kuvyrkov
2008-07-10  3:52               ` Eric Fisher
2008-07-10  5:16                 ` Maxim Kuvyrkov
2008-07-10 20:00               ` Mark Mitchell
2008-07-10 21:01                 ` David Daney
2008-05-22 19:25 ` [MIPS] ST Loongson 2E/2F submission Gerald Pfeifer
2008-05-23  4:46 ` Eric Fisher
2008-05-23  6:05   ` Zhang Le

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).