public inbox for gcc-patches@gcc.gnu.org
* [ARM] Refactor Neon Builtins infrastructure
@ 2014-11-12 17:09 James Greenhalgh
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
  2014-11-18  9:15 ` [ARM] Refactor Neon Builtins infrastructure Ramana Radhakrishnan
  0 siblings, 2 replies; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, paul, nickc

[-- Attachment #1: Type: text/plain, Size: 4615 bytes --]

Hi,

I was looking at the issues in the ARM back-end exposed by Marc
Glisse's patch in [1], and hoped to fix them by adapting the patch
recently committed by Tejas ([2]).

As I looked, I realised that the ARM target and the AArch64 target
now differ drastically in how their Advanced SIMD builtin
initialisation and expansion logic works. This is a growing
maintenance burden. This patch series is an attempt to start fixing
the problem.

From a high level, I see five problems with the ARM Neon builtin code.

First is the "magic number" interface, which gives builtins with signed
and unsigned, or saturating and non-saturating, variants an extra
parameter used to control which instruction is ultimately emitted. This
is problematic as it enforces that these intrinsics be implemented with
an UNSPEC pattern, we would like the flexibility to try to do a better job
of modeling these patterns.
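
For illustration, here are two calls lifted from the current
arm_neon.h (they also appear in the hunks attached to patch 1/8).
vhadd_s8 and vrhadd_u8 funnel into the same builtin, distinguished
only by the trailing magic word:

  /* Signed halving add: magic word 1.  */
  return (int8x8_t)__builtin_neon_vhaddv8qi (__a, __b, 1);
  /* Unsigned rounding halving add: magic word 4.  */
  return (uint8x8_t)__builtin_neon_vhaddv8qi ((int8x8_t) __a,
                                              (int8x8_t) __b, 4);

Because the variant is only resolved when the builtin is expanded, the
pattern behind __builtin_neon_vhaddv8qi has to remain a catch-all
UNSPEC.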

Second is that all the code lives in arm.c. This file is huge and
frightening. The least we could do is start to split it up!

Third is the complicated builtin initialisation code. If we collect
the common cases from the large switch in the initialisation function,
it is clear we can eliminate much of the existing code. In fact, we
have already solved the same problem in AArch64 ([3]), and we gain
nothing from keeping these interfaces separate.

Fourth is that we have no infrastructure to strongly type the functions
in arm_neon.h; instead, we cast between signed and unsigned vector
arguments as required. We need strong typing to avoid special-casing
some builtins we may want to vectorize (bswap and friends). Again, we
have already solved this in AArch64 ([4]).
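
The casts in question look like this (the example is the post-patch
form of vaddl_u8 from the hunks below; the casts themselves predate
this series, and can only go away once the qualifiers infrastructure
from the later patches is in place):

  __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
  vaddl_u8 (uint8x8_t __a, uint8x8_t __b)
  {
    /* The builtin is only registered with signed vector types, so the
       unsigned arguments and result must be cast.  */
    return (uint16x8_t)__builtin_neon_vaddluv8qi ((int8x8_t) __a,
                                                  (int8x8_t) __b);
  }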

Finally, there are the issues with type mangling Marc has seen.

This patch-set tries to fix those issues in order, and progresses as follows:

First the magic words:

  [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words"

Then moving code out to arm-builtins.c:

  [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h
  [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file

And then making the ARM back-end look like the AArch64 back-end and
fixing Marc's issue:

  [Patch ARM Refactor Builtins 4/8]  Refactor "VAR<n>" Macros
  [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in
    ARM.
  [Patch ARM Refactor Builtins 6/8] Add some tests for "poly" mangling
  [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when
    initialising builtins and fix type mangling
  [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin
    infrastructure

Clearly there is more we could do to start sharing code between the two
targets rather than duplicating it. For now, the benefit did not seem worth
the substantial churn that this would cause both back-ends.

I've bootstrapped each patch in this series in turn for both arm and
thumb on arm-none-linux-gnueabihf.

OK for trunk?

Thanks,
James

---
[1]: [c++] typeinfo for target types
     https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00618.html
[2]: [AArch64, Patch] Restructure arm_neon.h vector types's implementation
     https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00264.html
[3]: [AArch64] Refactor Advanced SIMD builtin initialisation.
     https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00532.html
[4]: [AArch64] AArch64 SIMD Builtins Better Type Correctness.
     https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02005.html

---
 gcc/config.gcc                               |    3 +-
 gcc/config/arm/arm-builtins.c                | 2925 ++++++++++++++++++++++++
 gcc/config/arm/arm-protos.h                  |  173 +-
 gcc/config/arm/arm-simd-builtin-types.def    |   48 +
 gcc/config/arm/arm.c                         | 3149 +-------------------------
 gcc/config/arm/arm_neon.h                    | 1743 +++++++-------
 gcc/config/arm/arm_neon_builtins.def         |  435 ++--
 gcc/config/arm/iterators.md                  |  167 ++
 gcc/config/arm/neon.md                       |  893 ++++----
 gcc/config/arm/t-arm                         |   11 +
 gcc/config/arm/unspecs.md                    |  109 +-
 gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C |   16 +
 gcc/testsuite/g++.dg/abi/mangle-neon.C       |    5 +
 gcc/testsuite/gcc.target/arm/pr51968.c       |    2 +-
 create mode 100644 gcc/config/arm/arm-builtins.c
 create mode 100644 gcc/config/arm/arm-simd-builtin-types.def
 create mode 100644 gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C
 14 files changed, 4992 insertions(+), 4687 deletions(-)


* [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words"
  2014-11-12 17:09 [ARM] Refactor Neon Builtins infrastructure James Greenhalgh
@ 2014-11-12 17:11 ` James Greenhalgh
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM James Greenhalgh
                     ` (7 more replies)
  2014-11-18  9:15 ` [ARM] Refactor Neon Builtins infrastructure Ramana Radhakrishnan
  1 sibling, 8 replies; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, nickc

[-- Attachment #1: Type: text/plain, Size: 34456 bytes --]


Hi,

As part of some wider cleanup I'd like to do to ARM's Neon Builtin
infrastructure, my first step will be to remove the "Magic Words" used
to decide which variant of an instruction should be emitted.

The "Magic Words" interface allows a single builtin
(say, __builtin_neon_shr_nv4hi) to cover signed, unsigned and rounding
variants through the use of an extra control parameter.

This patch removes that interface, defining individual builtins for each
variant and dropping the extra parameter.
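
Taking vaddl as a concrete example (the full hunks are in the attached
patch), the change looks like this in arm_neon.h:

  /* Before: one builtin, variant selected by the final magic word.  */
  return (int16x8_t)__builtin_neon_vaddlv8qi (__a, __b, 1);
  return (uint16x8_t)__builtin_neon_vaddlv8qi ((int8x8_t) __a,
                                               (int8x8_t) __b, 0);

  /* After: one builtin per variant, and no extra operand.  */
  return (int16x8_t)__builtin_neon_vaddlsv8qi (__a, __b);
  return (uint16x8_t)__builtin_neon_vaddluv8qi ((int8x8_t) __a,
                                                (int8x8_t) __b);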

There are several benefits to cleaning this up:

  * We can start to drop some of the UNSPEC operations without having to
    add additional expand patterns to map them.
  * The interface is confusing at first glance.
  * Having such a different interface to AArch64 doubles the amount of
    time it takes to grok the Neon Builtins infrastructure.

The drawbacks of changing this interface are:

  * Another big churn change for the ARM backend.
  * A series of new iterators, UNSPECs and builtin functions to cover the
    variants which were previously controlled by a "Magic Word".
  * Lots more patterns for genrecog to think about, potentially slowing
    down compilation, increasing bootstrap time, and increasing compiler
    binary size.

On balance, I think we should accept these drawbacks in return for the
future clean-ups they enable, but I expect this to be controversial.

This patch is naive and conservative. I don't make any effort to merge
patterns across iterators, nor any attempt to change UNSPECs to specified
tree codes. Future improvements in this area would be useful.

I've bootstrapped the patch for arm-none-linux-gnueabihf in isolation, and
in series.

OK for trunk?

Thanks,
James

---
gcc/testsuite/

2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/arm/pr51968.c (foo): Do not try to pass "Magic Word".

gcc/

2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/arm/arm.c (arm_expand_neon_builtin): Remove "Magic Word"
	parameter, rearrange switch statement accordingly.
	(arm_evpc_neon_vrev): Remove "Magic Word".
	* config/arm/unspecs.md (unspec): Split many UNSPECs into
	rounding or signed/unsigned variants.
	* config/arm/neon.md (vcond<mode><mode>): Remove "Magic Word" code.
	(vcondu<mode><mode>): Likewise.
	(neon_vadd): Remove "Magic Word" operand.
	(neon_vaddl): Remove "Magic Word" operand, convert to use
	signed/unsigned iterator.
	(neon_vaddw): Likewise.
	(neon_vhadd): Likewise, also iterate over "rounding" forms.
	(neon_vqadd): Remove "Magic Word" operand, convert to use
	signed/unsigned iterator.
	(neon_v<r>addhn): Remove "Magic Word" operand, convert to iterate
	over "rounding" forms.
	(neon_vmul): Remove "Magic Word" operand, iterate over
	polynomial/float instruction forms.
	(neon_vmla): Remove "Magic Word" operand.
	(neon_vfma): Likewise.
	(neon_vfms): Likewise.
	(neon_vmls): Likewise.
	(neon_vmlal): Remove "Magic Word" operand, iterate over
	signed/unsigned forms.
	(neon_vmlsl): Likewise.
	(neon_vqdmulh): Remove "Magic Word" operand, iterate over "rounding"
	forms.
	(neon_vqdmlal): Remove "Magic Word" operand, iterate over
	signed/unsigned forms.
	(neon_vqdmlsl): Likewise.
	(neon_vmull): Likewise.
	(neon_vqdmull): Remove "Magic Word" operand.
	(neon_vsub): Remove "Magic Word" operand.
	(neon_vsubl): Remove "Magic Word" operand, convert to use
	signed/unsigned iterator.
	(neon_vsubw): Likewise.
	(neon_vhsub): Likewise.
	(neon_vqsub): Likewise.
	(neon_v<r>subhn): Remove "Magic Word" operand, convert to iterate
	over "rounding" forms.
	(neon_vceq): Remove "Magic Word" operand.
	(neon_vcge): Likewise.
	(neon_vcgeu): Likewise.
	(neon_vcgt): Likewise.
	(neon_vcgtu): Likewise.
	(neon_vcle): Likewise.
	(neon_vclt): Likewise.
	(neon_vcage): Likewise.
	(neon_vcagt): Likewise.
	(neon_vabd): Remove "Magic Word" operand, iterate over
	signed/unsigned forms, and split out...
	(neon_vabdf): ...this as new.
	(neon_vabdl): Remove "Magic Word" operand, iterate over
	signed/unsigned forms.
	(neon_vaba): Likewise.
	(neon_vmax): Remove "Magic Word" operand, iterate over
	signed/unsigned and max/min forms, and split out...
	(neon_v<maxmin>f): ...this as new.
	(neon_vmin): Delete.
	(neon_vpadd): Remove "Magic Word" operand.
	(neon_vpaddl): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vpadal): Likewise.
	(neon_vpmax): Remove "Magic Word" operand, iterate over
	signed/unsigned and max/min forms, and split out...
	(neon_vp<maxmin>f): ...this as new.
	(neon_vpmin): Delete.
	(neon_vrecps): Remove "Magic Word" operand.
	(neon_vrsqrts): Likewise.
	(neon_vabs): Likewise.
	(neon_vqabs): Likewise.
	(neon_vneg): Likewise.
	(neon_vqneg): Likewise.
	(neon_vcls): Likewise.
	(neon_vcnt): Likewise.
	(neon_vrecpe): Likewise.
	(neon_vrsqrte): Likewise.
	(neon_vmvn): Likewise.
	(neon_vget_lane): Likewise.
	(neon_vget_laneu): New.
	(neon_vget_lanedi): Remove "Magic Word" operand.
	(neon_vget_lanev2di): Likewise.
	(neon_vcvt): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vcvt_n): Likewise.
	(neon_vmovn): Remove "Magic Word" operand.
	(neon_vqmovn): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vmovun): Remove "Magic Word" operand.
	(neon_vmovl): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vmul_lane): Remove "Magic Word" operand.
	(neon_vmull_lane): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vqdmull_lane): Remove "Magic Word" operand.
	(neon_vqdmulh_lane): Remove "Magic Word" operand, iterate over
	rounding variants.
	(neon_vmla_lane): Remove "Magic Word" operand.
	(neon_vmlal_lane): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vqdmlal_lane): Remove "Magic Word" operand.
	(neon_vmls_lane): Likewise.
	(neon_vmlsl_lane): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vqdmlsl_lane): Remove "Magic Word" operand.
	(neon_vmul_n): Remove "Magic Word" operand.
	(neon_vmull_n): Rename to...
	(neon_vmulls_n): ...this, remove "Magic Word" operand.
	(neon_vmullu_n): New.
	(neon_vqdmull_n): Remove "Magic Word" operand.
	(neon_vqdmulh_n): Likewise.
	(neon_vqrdmulh_n): New.
	(neon_vmla_n): Remove "Magic Word" operand.
	(neon_vmls_n): Likewise.
	(neon_vmlal_n): Rename to...
	(neon_vmlals_n): ...this, remove "Magic Word" operand.
	(neon_vmlalu_n): New.
	(neon_vqdmlal_n): Remove "Magic Word" operand.
	(neon_vmlsl_n): Rename to...
	(neon_vmlsls_n): ...this, remove "Magic Word" operand.
	(neon_vmlslu_n): New.
	(neon_vqdmlsl_n): Remove "Magic Word" operand.
	(neon_vrev64): Remove "Magic Word" operand.
	(neon_vrev32): Likewise.
	(neon_vrev16): Likewise.
	(neon_vshl): Remove "Magic Word" operand, iterate over
	signed/unsigned and "rounding" forms.
	(neon_vqshl): Likewise.
	(neon_vshr_n): Likewise.
	(neon_vshrn_n): Remove "Magic Word" operand, iterate over
	"rounding" forms.
	(neon_vqshrn_n): Remove "Magic Word" operand, iterate over
	signed/unsigned and "rounding" forms.
	(neon_vqshrun_n): Remove "Magic Word" operand, iterate over
	"rounding" forms.
	(neon_vshl_n): Remove "Magic Word" operand.
	(neon_vqshl_n): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vqshlu_n): Remove "Magic Word" operand.
	(neon_vshll_n): Remove "Magic Word" operand, iterate over
	signed/unsigned variants.
	(neon_vsra_n): Remove "Magic Word" operand, iterate over
	signed/unsigned and "rounding" forms.
	* config/arm/iterators.md (VPF): New.
	(VADDL): Likewise.
	(VADDW): Likewise.
	(VHADD): Likewise.
	(VQADD): Likewise.
	(VADDHN): Likewise.
	(VMLAL): Likewise.
	(VMLAL_LANE): Likewise.
	(VMLSL): Likewise.
	(VMLSL_LANE): Likewise.
	(VQDMULH): Likewise.
	(VQDMULH_LANE): Likewise.
	(VMULL): Likewise.
	(VMULL_LANE): Likewise.
	(VSUBL): Likewise.
	(VSUBW): Likewise.
	(VHSUB): Likewise.
	(VQSUB): Likewise.
	(VSUBHN): Likewise.
	(VABD): Likewise.
	(VABDL): Likewise.
	(VMAXMIN): Likewise.
	(VMAXMINF): Likewise.
	(VPADDL): Likewise.
	(VPADAL): Likewise.
	(VPMAXMIN): Likewise.
	(VPMAXMINF): Likewise.
	(VCVT_US): Likewise.
	(VCVT_US_N): Likewise.
	(VQMOVN): Likewise.
	(VMOVL): Likewise.
	(VSHL): Likewise.
	(VQSHL): Likewise.
	(VSHR_N): Likewise.
	(VSHRN_N): Likewise.
	(VQSHRN_N): Likewise.
	(VQSHRUN_N): Likewise.
	(VQSHL_N): Likewise.
	(VSHLL_N): Likewise.
	(VSRA_N): Likewise.
	(pf): Likewise.
	(sup): Likewise.
	(r): Likewise.
	(maxmin): Likewise.
	(shift_op): Likewise.
	* config/arm/arm_neon_builtins.def (vaddl): Split to...
	(vaddls): ...this and...
	(vaddlu): ...this.
	(vaddw): Split to...
	(vaddws): ...this and...
	(vaddwu): ...this.
	(vhadd): Split to...
	(vhadds): ...this and...
	(vhaddu): ...this and...
	(vrhadds): ...this and...
	(vrhaddu): ...this.
	(vqadd): Split to...
	(vqadds): ...this and...
	(vqaddu): ...this.
	(vaddhn): Split to itself and...
	(vraddhn): ...this.
	(vmul): Split to...
	(vmulf): ...this and...
	(vmulp): ...this.
	(vmlal): Split to...
	(vmlals): ...this and...
	(vmlalu): ...this.
	(vmlsl): Split to...
	(vmlsls): ...this and...
	(vmlslu): ...this.
	(vqdmulh): Split to itself and...
	(vqrdmulh): ...this.
	(vmull): Split to...
	(vmullp): ...this and...
	(vmulls): ...this and...
	(vmullu): ...this.
	(vmull_n): Split to...
	(vmulls_n): ...this and...
	(vmullu_n): ...this.
	(vmull_lane): Split to...
	(vmulls_lane): ...this and...
	(vmullu_lane): ...this.
	(vqdmulh_n): Split to itself and...
	(vqrdmulh_n): ...this.
	(vqdmulh_lane): Split to itself and...
	(vqrdmulh_lane): ...this.
	(vshl): Split to...
	(vshls): ...this and...
	(vshlu): ...this and...
	(vrshls): ...this and...
	(vrshlu): ...this.
	(vqshl): Split to...
	(vqshls): ...this and...
	(vqshlu): ...this and...
	(vqrshls): ...this and...
	(vqrshlu): ...this.
	(vshr_n): Split to...
	(vshrs_n): ...this and...
	(vshru_n): ...this and...
	(vrshrs_n): ...this and...
	(vrshru_n): ...this.
	(vshrn_n): Split to itself and...
	(vrshrn_n): ...this.
	(vqshrn_n): Split to...
	(vqshrns_n): ...this and...
	(vqshrnu_n): ...this and...
	(vqrshrns_n): ...this and...
	(vqrshrnu_n): ...this.
	(vqshrun_n): Split to itself and...
	(vqrshrun_n): ...this.
	(vqshl_n): Split to...
	(vqshl_s_n): ...this and...
	(vqshl_u_n): ...this.
	(vshll_n): Split to...
	(vshlls_n): ...this and...
	(vshllu_n): ...this.
	(vsra_n): Split to...
	(vsras_n): ...this and...
	(vsrau_n): ...this and...
	(vrsras_n): ...this and...
	(vrsrau_n): ...this.
	(vsubl): Split to...
	(vsubls): ...this and...
	(vsublu): ...this.
	(vsubw): Split to...
	(vsubws): ...this and...
	(vsubwu): ...this.
	(vqsub): Split to...
	(vqsubs): ...this and...
	(vqsubu): ...this.
	(vhsub): Split to...
	(vhsubs): ...this and...
	(vhsubu): ...this.
	(vsubhn): Split to itself and...
	(vrsubhn): ...this.
	(vabd): Split to...
	(vabds): ...this and...
	(vabdu): ...this and...
	(vabdf): ...this.
	(vabdl): Split to...
	(vabdls): ...this and...
	(vabdlu): ...this.
	(vaba): Split to...
	(vabas): ...this and...
	(vabau): ...this.
	(vabal): Split to...
	(vabals): ...this and...
	(vabalu): ...this.
	(vmax): Split to...
	(vmaxs): ...this and...
	(vmaxu): ...this and...
	(vmaxf): ...this.
	(vmin): Split to...
	(vmins): ...this and...
	(vminu): ...this and...
	(vminf): ...this.
	(vpmax): Split to...
	(vpmaxs): ...this and...
	(vpmaxu): ...this and...
	(vpmaxf): ...this.
	(vpmin): Split to...
	(vpmins): ...this and...
	(vpminu): ...this and...
	(vpminf): ...this.
	(vpaddl): Split to...
	(vpaddls): ...this and...
	(vpaddlu): ...this.
	(vpadal): Split to...
	(vpadals): ...this and...
	(vpadalu): ...this.
	(vget_laneu): New.
	(vqmovn): Split to...
	(vqmovns): ...this and...
	(vqmovnu): ...this.
	(vmovl): Split to...
	(vmovls): ...this and...
	(vmovlu): ...this.
	(vmlal_lane): Split to...
	(vmlals_lane): ...this and...
	(vmlalu_lane): ...this.
	(vmlsl_lane): Split to...
	(vmlsls_lane): ...this and...
	(vmlslu_lane): ...this.
	(vmlal_n): Split to...
	(vmlals_n): ...this and...
	(vmlalu_n): ...this.
	(vmlsl_n): Split to...
	(vmlsls_n): ...this and...
	(vmlslu_n): ...this.
	(vext): Make type "SHIFTINSERT".
	(vcvt): Split to...
	(vcvts): ...this and...
	(vcvtu): ...this.
	(vcvt_n): Split to...
	(vcvts_n): ...this and...
	(vcvtu_n): ...this.
	* config/arm/arm_neon.h (vaddl_s8): Remove "Magic Word".
	(vaddl_s16): Likewise.
	(vaddl_s32): Likewise.
	(vaddl_u8): Likewise.
	(vaddl_u16): Likewise.
	(vaddl_u32): Likewise.
	(vaddw_s8): Likewise.
	(vaddw_s16): Likewise.
	(vaddw_s32): Likewise.
	(vaddw_u8): Likewise.
	(vaddw_u16): Likewise.
	(vaddw_u32): Likewise.
	(vhadd_s8): Likewise.
	(vhadd_s16): Likewise.
	(vhadd_s32): Likewise.
	(vhadd_u8): Likewise.
	(vhadd_u16): Likewise.
	(vhadd_u32): Likewise.
	(vhaddq_s8): Likewise.
	(vhaddq_s16): Likewise.
	(vhaddq_s32): Likewise.
	(vhaddq_u8): Likewise.
	(vhaddq_u16): Likewise.
	(vrhadd_s8): Likewise.
	(vrhadd_s16): Likewise.
	(vrhadd_s32): Likewise.
	(vrhadd_u8): Likewise.
	(vrhadd_u16): Likewise.
	(vrhadd_u32): Likewise.
	(vrhaddq_s8): Likewise.
	(vrhaddq_s16): Likewise.
	(vrhaddq_s32): Likewise.
	(vrhaddq_u8): Likewise.
	(vrhaddq_u16): Likewise.
	(vrhaddq_u32): Likewise.
	(vqadd_s8): Likewise.
	(vqadd_s16): Likewise.
	(vqadd_s32): Likewise.
	(vqadd_s64): Likewise.
	(vqadd_u8): Likewise.
	(vqadd_u16): Likewise.
	(vqadd_u32): Likewise.
	(vqadd_u64): Likewise.
	(vqaddq_s8): Likewise.
	(vqaddq_s16): Likewise.
	(vqaddq_s32): Likewise.
	(vqaddq_s64): Likewise.
	(vqaddq_u8): Likewise.
	(vqaddq_u16): Likewise.
	(vqaddq_u32): Likewise.
	(vqaddq_u64): Likewise.
	(vaddhn_s16): Likewise.
	(vaddhn_s32): Likewise.
	(vaddhn_s64): Likewise.
	(vaddhn_u16): Likewise.
	(vaddhn_u32): Likewise.
	(vaddhn_u64): Likewise.
	(vraddhn_s16): Likewise.
	(vraddhn_s32): Likewise.
	(vraddhn_s64): Likewise.
	(vraddhn_u16): Likewise.
	(vraddhn_u32): Likewise.
	(vraddhn_u64): Likewise.
	(vmul_p8): Likewise.
	(vmulq_p8): Likewise.
	(vqdmulh_s16): Likewise.
	(vqdmulh_s32): Likewise.
	(vqdmulhq_s16): Likewise.
	(vqdmulhq_s32): Likewise.
	(vqrdmulh_s16): Likewise.
	(vqrdmulh_s32): Likewise.
	(vqrdmulhq_s16): Likewise.
	(vqrdmulhq_s32): Likewise.
	(vmull_s8): Likewise.
	(vmull_s16): Likewise.
	(vmull_s32): Likewise.
	(vmull_u8): Likewise.
	(vmull_u16): Likewise.
	(vmull_u32): Likewise.
	(vmull_p8): Likewise.
	(vqdmull_s16): Likewise.
	(vqdmull_s32): Likewise.
	(vmla_s8): Likewise.
	(vmla_s16): Likewise.
	(vmla_s32): Likewise.
	(vmla_f32): Likewise.
	(vmla_u8): Likewise.
	(vmla_u16): Likewise.
	(vmla_u32): Likewise.
	(vmlaq_s8): Likewise.
	(vmlaq_s16): Likewise.
	(vmlaq_s32): Likewise.
	(vmlaq_f32): Likewise.
	(vmlaq_u8): Likewise.
	(vmlaq_u16): Likewise.
	(vmlaq_u32): Likewise.
	(vmlal_s8): Likewise.
	(vmlal_s16): Likewise.
	(vmlal_s32): Likewise.
	(vmlal_u8): Likewise.
	(vmlal_u16): Likewise.
	(vmlal_u32): Likewise.
	(vqdmlal_s16): Likewise.
	(vqdmlal_s32): Likewise.
	(vmls_s8): Likewise.
	(vmls_s16): Likewise.
	(vmls_s32): Likewise.
	(vmls_f32): Likewise.
	(vmls_u8): Likewise.
	(vmls_u16): Likewise.
	(vmls_u32): Likewise.
	(vmlsq_s8): Likewise.
	(vmlsq_s16): Likewise.
	(vmlsq_s32): Likewise.
	(vmlsq_f32): Likewise.
	(vmlsq_u8): Likewise.
	(vmlsq_u16): Likewise.
	(vmlsq_u32): Likewise.
	(vmlsl_s8): Likewise.
	(vmlsl_s16): Likewise.
	(vmlsl_s32): Likewise.
	(vmlsl_u8): Likewise.
	(vmlsl_u16): Likewise.
	(vmlsl_u32): Likewise.
	(vqdmlsl_s16): Likewise.
	(vqdmlsl_s32): Likewise.
	(vfma_f32): Likewise.
	(vfmaq_f32): Likewise.
	(vfms_f32): Likewise.
	(vfmsq_f32): Likewise.
	(vsubl_s8): Likewise.
	(vsubl_s16): Likewise.
	(vsubl_s32): Likewise.
	(vsubl_u8): Likewise.
	(vsubl_u16): Likewise.
	(vsubl_u32): Likewise.
	(vsubw_s8): Likewise.
	(vsubw_s16): Likewise.
	(vsubw_s32): Likewise.
	(vsubw_u8): Likewise.
	(vsubw_u16): Likewise.
	(vsubw_u32): Likewise.
	(vhsub_s8): Likewise.
	(vhsub_s16): Likewise.
	(vhsub_s32): Likewise.
	(vhsub_u8): Likewise.
	(vhsub_u16): Likewise.
	(vhsub_u32): Likewise.
	(vhsubq_s8): Likewise.
	(vhsubq_s16): Likewise.
	(vhsubq_s32): Likewise.
	(vhsubq_u8): Likewise.
	(vhsubq_u16): Likewise.
	(vhsubq_u32): Likewise.
	(vqsub_s8): Likewise.
	(vqsub_s16): Likewise.
	(vqsub_s32): Likewise.
	(vqsub_s64): Likewise.
	(vqsub_u8): Likewise.
	(vqsub_u16): Likewise.
	(vqsub_u32): Likewise.
	(vqsub_u64): Likewise.
	(vqsubq_s8): Likewise.
	(vqsubq_s16): Likewise.
	(vqsubq_s32): Likewise.
	(vqsubq_s64): Likewise.
	(vqsubq_u8): Likewise.
	(vqsubq_u16): Likewise.
	(vqsubq_u32): Likewise.
	(vqsubq_u64): Likewise.
	(vsubhn_s16): Likewise.
	(vsubhn_s32): Likewise.
	(vsubhn_s64): Likewise.
	(vsubhn_u16): Likewise.
	(vsubhn_u32): Likewise.
	(vsubhn_u64): Likewise.
	(vrsubhn_s16): Likewise.
	(vrsubhn_s32): Likewise.
	(vrsubhn_s64): Likewise.
	(vrsubhn_u16): Likewise.
	(vrsubhn_u32): Likewise.
	(vrsubhn_u64): Likewise.
	(vceq_s8): Likewise.
	(vceq_s16): Likewise.
	(vceq_s32): Likewise.
	(vceq_f32): Likewise.
	(vceq_u8): Likewise.
	(vceq_u16): Likewise.
	(vceq_u32): Likewise.
	(vceq_p8): Likewise.
	(vceqq_s8): Likewise.
	(vceqq_s16): Likewise.
	(vceqq_s32): Likewise.
	(vceqq_f32): Likewise.
	(vceqq_u8): Likewise.
	(vceqq_u16): Likewise.
	(vceqq_u32): Likewise.
	(vceqq_p8): Likewise.
	(vcge_s8): Likewise.
	(vcge_s16): Likewise.
	(vcge_s32): Likewise.
	(vcge_f32): Likewise.
	(vcge_u8): Likewise.
	(vcge_u16): Likewise.
	(vcge_u32): Likewise.
	(vcgeq_s8): Likewise.
	(vcgeq_s16): Likewise.
	(vcgeq_s32): Likewise.
	(vcgeq_f32): Likewise.
	(vcgeq_u8): Likewise.
	(vcgeq_u16): Likewise.
	(vcgeq_u32): Likewise.
	(vcle_s8): Likewise.
	(vcle_s16): Likewise.
	(vcle_s32): Likewise.
	(vcle_f32): Likewise.
	(vcle_u8): Likewise.
	(vcle_u16): Likewise.
	(vcle_u32): Likewise.
	(vcleq_s8): Likewise.
	(vcleq_s16): Likewise.
	(vcleq_s32): Likewise.
	(vcleq_f32): Likewise.
	(vcleq_u8): Likewise.
	(vcleq_u16): Likewise.
	(vcleq_u32): Likewise.
	(vcgt_s8): Likewise.
	(vcgt_s16): Likewise.
	(vcgt_s32): Likewise.
	(vcgt_f32): Likewise.
	(vcgt_u8): Likewise.
	(vcgt_u16): Likewise.
	(vcgt_u32): Likewise.
	(vcgtq_s8): Likewise.
	(vcgtq_s16): Likewise.
	(vcgtq_s32): Likewise.
	(vcgtq_f32): Likewise.
	(vcgtq_u8): Likewise.
	(vcgtq_u16): Likewise.
	(vcgtq_u32): Likewise.
	(vclt_s8): Likewise.
	(vclt_s16): Likewise.
	(vclt_s32): Likewise.
	(vclt_f32): Likewise.
	(vclt_u8): Likewise.
	(vclt_u16): Likewise.
	(vclt_u32): Likewise.
	(vcltq_s8): Likewise.
	(vcltq_s16): Likewise.
	(vcltq_s32): Likewise.
	(vcltq_f32): Likewise.
	(vcltq_u8): Likewise.
	(vcltq_u16): Likewise.
	(vcltq_u32): Likewise.
	(vcage_f32): Likewise.
	(vcageq_f32): Likewise.
	(vcale_f32): Likewise.
	(vcaleq_f32): Likewise.
	(vcagt_f32): Likewise.
	(vcagtq_f32): Likewise.
	(vcalt_f32): Likewise.
	(vcaltq_f32): Likewise.
	(vtst_s8): Likewise.
	(vtst_s16): Likewise.
	(vtst_s32): Likewise.
	(vtst_u8): Likewise.
	(vtst_u16): Likewise.
	(vtst_u32): Likewise.
	(vtst_p8): Likewise.
	(vtstq_s8): Likewise.
	(vtstq_s16): Likewise.
	(vtstq_s32): Likewise.
	(vtstq_u8): Likewise.
	(vtstq_u16): Likewise.
	(vtstq_u32): Likewise.
	(vtstq_p8): Likewise.
	(vabd_s8): Likewise.
	(vabd_s16): Likewise.
	(vabd_s32): Likewise.
	(vabd_f32): Likewise.
	(vabd_u8): Likewise.
	(vabd_u16): Likewise.
	(vabd_u32): Likewise.
	(vabdq_s8): Likewise.
	(vabdq_s16): Likewise.
	(vabdq_s32): Likewise.
	(vabdq_f32): Likewise.
	(vabdq_u8): Likewise.
	(vabdq_u16): Likewise.
	(vabdq_u32): Likewise.
	(vabdl_s8): Likewise.
	(vabdl_s16): Likewise.
	(vabdl_s32): Likewise.
	(vabdl_u8): Likewise.
	(vabdl_u16): Likewise.
	(vabdl_u32): Likewise.
	(vaba_s8): Likewise.
	(vaba_s16): Likewise.
	(vaba_s32): Likewise.
	(vaba_u8): Likewise.
	(vaba_u16): Likewise.
	(vaba_u32): Likewise.
	(vabaq_s8): Likewise.
	(vabaq_s16): Likewise.
	(vabaq_s32): Likewise.
	(vabaq_u8): Likewise.
	(vabaq_u16): Likewise.
	(vabaq_u32): Likewise.
	(vabal_s8): Likewise.
	(vabal_s16): Likewise.
	(vabal_s32): Likewise.
	(vabal_u8): Likewise.
	(vabal_u16): Likewise.
	(vabal_u32): Likewise.
	(vmax_s8): Likewise.
	(vmax_s16): Likewise.
	(vmax_s32): Likewise.
	(vmax_f32): Likewise.
	(vmax_u8): Likewise.
	(vmax_u16): Likewise.
	(vmax_u32): Likewise.
	(vmaxq_s8): Likewise.
	(vmaxq_s16): Likewise.
	(vmaxq_s32): Likewise.
	(vmaxq_f32): Likewise.
	(vmaxq_u8): Likewise.
	(vmaxq_u16): Likewise.
	(vmaxq_u32): Likewise.
	(vmin_s8): Likewise.
	(vmin_s16): Likewise.
	(vmin_s32): Likewise.
	(vmin_f32): Likewise.
	(vmin_u8): Likewise.
	(vmin_u16): Likewise.
	(vmin_u32): Likewise.
	(vminq_s8): Likewise.
	(vminq_s16): Likewise.
	(vminq_s32): Likewise.
	(vminq_f32): Likewise.
	(vminq_u8): Likewise.
	(vminq_u16): Likewise.
	(vminq_u32): Likewise.
	(vpadd_s8): Likewise.
	(vpadd_s16): Likewise.
	(vpadd_s32): Likewise.
	(vpadd_f32): Likewise.
	(vpadd_u8): Likewise.
	(vpadd_u16): Likewise.
	(vpadd_u32): Likewise.
	(vpaddl_s8): Likewise.
	(vpaddl_s16): Likewise.
	(vpaddl_s32): Likewise.
	(vpaddl_u8): Likewise.
	(vpaddl_u16): Likewise.
	(vpaddl_u32): Likewise.
	(vpaddlq_s8): Likewise.
	(vpaddlq_s16): Likewise.
	(vpaddlq_s32): Likewise.
	(vpaddlq_u8): Likewise.
	(vpaddlq_u16): Likewise.
	(vpaddlq_u32): Likewise.
	(vpadal_s8): Likewise.
	(vpadal_s16): Likewise.
	(vpadal_s32): Likewise.
	(vpadal_u8): Likewise.
	(vpadal_u16): Likewise.
	(vpadal_u32): Likewise.
	(vpadalq_s8): Likewise.
	(vpadalq_s16): Likewise.
	(vpadalq_s32): Likewise.
	(vpadalq_u8): Likewise.
	(vpadalq_u16): Likewise.
	(vpadalq_u32): Likewise.
	(vpmax_s8): Likewise.
	(vpmax_s16): Likewise.
	(vpmax_s32): Likewise.
	(vpmax_f32): Likewise.
	(vpmax_u8): Likewise.
	(vpmax_u16): Likewise.
	(vpmax_u32): Likewise.
	(vpmin_s8): Likewise.
	(vpmin_s16): Likewise.
	(vpmin_s32): Likewise.
	(vpmin_f32): Likewise.
	(vpmin_u8): Likewise.
	(vpmin_u16): Likewise.
	(vpmin_u32): Likewise.
	(vrecps_f32): Likewise.
	(vrecpsq_f32): Likewise.
	(vrsqrts_f32): Likewise.
	(vrsqrtsq_f32): Likewise.
	(vshl_s8): Likewise.
	(vshl_s16): Likewise.
	(vshl_s32): Likewise.
	(vshl_s64): Likewise.
	(vshl_u8): Likewise.
	(vshl_u16): Likewise.
	(vshl_u32): Likewise.
	(vshl_u64): Likewise.
	(vshlq_s8): Likewise.
	(vshlq_s16): Likewise.
	(vshlq_s32): Likewise.
	(vshlq_s64): Likewise.
	(vshlq_u8): Likewise.
	(vshlq_u16): Likewise.
	(vshlq_u32): Likewise.
	(vshlq_u64): Likewise.
	(vrshl_s8): Likewise.
	(vrshl_s16): Likewise.
	(vrshl_s32): Likewise.
	(vrshl_s64): Likewise.
	(vrshl_u8): Likewise.
	(vrshl_u16): Likewise.
	(vrshl_u32): Likewise.
	(vrshl_u64): Likewise.
	(vrshlq_s8): Likewise.
	(vrshlq_s16): Likewise.
	(vrshlq_s32): Likewise.
	(vrshlq_s64): Likewise.
	(vrshlq_u8): Likewise.
	(vrshlq_u16): Likewise.
	(vrshlq_u32): Likewise.
	(vrshlq_u64): Likewise.
	(vqshl_s8): Likewise.
	(vqshl_s16): Likewise.
	(vqshl_s32): Likewise.
	(vqshl_s64): Likewise.
	(vqshl_u8): Likewise.
	(vqshl_u16): Likewise.
	(vqshl_u32): Likewise.
	(vqshl_u64): Likewise.
	(vqshlq_s8): Likewise.
	(vqshlq_s16): Likewise.
	(vqshlq_s32): Likewise.
	(vqshlq_s64): Likewise.
	(vqshlq_u8): Likewise.
	(vqshlq_u16): Likewise.
	(vqshlq_u32): Likewise.
	(vqshlq_u64): Likewise.
	(vqrshl_s8): Likewise.
	(vqrshl_s16): Likewise.
	(vqrshl_s32): Likewise.
	(vqrshl_s64): Likewise.
	(vqrshl_u8): Likewise.
	(vqrshl_u16): Likewise.
	(vqrshl_u32): Likewise.
	(vqrshl_u64): Likewise.
	(vqrshlq_s8): Likewise.
	(vqrshlq_s16): Likewise.
	(vqrshlq_s32): Likewise.
	(vqrshlq_s64): Likewise.
	(vqrshlq_u8): Likewise.
	(vqrshlq_u16): Likewise.
	(vqrshlq_u32): Likewise.
	(vqrshlq_u64): Likewise.
	(vshr_n_s8): Likewise.
	(vshr_n_s16): Likewise.
	(vshr_n_s32): Likewise.
	(vshr_n_s64): Likewise.
	(vshr_n_u8): Likewise.
	(vshr_n_u16): Likewise.
	(vshr_n_u32): Likewise.
	(vshr_n_u64): Likewise.
	(vshrq_n_s8): Likewise.
	(vshrq_n_s16): Likewise.
	(vshrq_n_s32): Likewise.
	(vshrq_n_s64): Likewise.
	(vshrq_n_u8): Likewise.
	(vshrq_n_u16): Likewise.
	(vshrq_n_u32): Likewise.
	(vshrq_n_u64): Likewise.
	(vrshr_n_s8): Likewise.
	(vrshr_n_s16): Likewise.
	(vrshr_n_s32): Likewise.
	(vrshr_n_s64): Likewise.
	(vrshr_n_u8): Likewise.
	(vrshr_n_u16): Likewise.
	(vrshr_n_u32): Likewise.
	(vrshr_n_u64): Likewise.
	(vrshrq_n_s8): Likewise.
	(vrshrq_n_s16): Likewise.
	(vrshrq_n_s32): Likewise.
	(vrshrq_n_s64): Likewise.
	(vrshrq_n_u8): Likewise.
	(vrshrq_n_u16): Likewise.
	(vrshrq_n_u32): Likewise.
	(vrshrq_n_u64): Likewise.
	(vshrn_n_s16): Likewise.
	(vshrn_n_s32): Likewise.
	(vshrn_n_s64): Likewise.
	(vshrn_n_u16): Likewise.
	(vshrn_n_u32): Likewise.
	(vshrn_n_u64): Likewise.
	(vrshrn_n_s16): Likewise.
	(vrshrn_n_s32): Likewise.
	(vrshrn_n_s64): Likewise.
	(vrshrn_n_u16): Likewise.
	(vrshrn_n_u32): Likewise.
	(vrshrn_n_u64): Likewise.
	(vqshrn_n_s16): Likewise.
	(vqshrn_n_s32): Likewise.
	(vqshrn_n_s64): Likewise.
	(vqshrn_n_u16): Likewise.
	(vqshrn_n_u32): Likewise.
	(vqshrn_n_u64): Likewise.
	(vqrshrn_n_s16): Likewise.
	(vqrshrn_n_s32): Likewise.
	(vqrshrn_n_s64): Likewise.
	(vqrshrn_n_u16): Likewise.
	(vqrshrn_n_u32): Likewise.
	(vqrshrn_n_u64): Likewise.
	(vqshrun_n_s16): Likewise.
	(vqshrun_n_s32): Likewise.
	(vqshrun_n_s64): Likewise.
	(vqrshrun_n_s16): Likewise.
	(vqrshrun_n_s32): Likewise.
	(vqrshrun_n_s64): Likewise.
	(vshl_n_s8): Likewise.
	(vshl_n_s16): Likewise.
	(vshl_n_s32): Likewise.
	(vshl_n_s64): Likewise.
	(vshl_n_u8): Likewise.
	(vshl_n_u16): Likewise.
	(vshl_n_u32): Likewise.
	(vshl_n_u64): Likewise.
	(vshlq_n_s8): Likewise.
	(vshlq_n_s16): Likewise.
	(vshlq_n_s32): Likewise.
	(vshlq_n_s64): Likewise.
	(vshlq_n_u8): Likewise.
	(vshlq_n_u16): Likewise.
	(vshlq_n_u32): Likewise.
	(vshlq_n_u64): Likewise.
	(vqshl_n_s8): Likewise.
	(vqshl_n_s16): Likewise.
	(vqshl_n_s32): Likewise.
	(vqshl_n_s64): Likewise.
	(vqshl_n_u8): Likewise.
	(vqshl_n_u16): Likewise.
	(vqshl_n_u32): Likewise.
	(vqshl_n_u64): Likewise.
	(vqshlq_n_s8): Likewise.
	(vqshlq_n_s16): Likewise.
	(vqshlq_n_s32): Likewise.
	(vqshlq_n_s64): Likewise.
	(vqshlq_n_u8): Likewise.
	(vqshlq_n_u16): Likewise.
	(vqshlq_n_u32): Likewise.
	(vqshlq_n_u64): Likewise.
	(vqshlu_n_s8): Likewise.
	(vqshlu_n_s16): Likewise.
	(vqshlu_n_s32): Likewise.
	(vqshlu_n_s64): Likewise.
	(vqshluq_n_s8): Likewise.
	(vqshluq_n_s16): Likewise.
	(vqshluq_n_s32): Likewise.
	(vqshluq_n_s64): Likewise.
	(vshll_n_s8): Likewise.
	(vshll_n_s16): Likewise.
	(vshll_n_s32): Likewise.
	(vshll_n_u8): Likewise.
	(vshll_n_u16): Likewise.
	(vshll_n_u32): Likewise.
	(vsra_n_s8): Likewise.
	(vsra_n_s16): Likewise.
	(vsra_n_s32): Likewise.
	(vsra_n_s64): Likewise.
	(vsra_n_u8): Likewise.
	(vsra_n_u16): Likewise.
	(vsra_n_u32): Likewise.
	(vsra_n_u64): Likewise.
	(vsraq_n_s8): Likewise.
	(vsraq_n_s16): Likewise.
	(vsraq_n_s32): Likewise.
	(vsraq_n_s64): Likewise.
	(vsraq_n_u8): Likewise.
	(vsraq_n_u16): Likewise.
	(vsraq_n_u32): Likewise.
	(vsraq_n_u64): Likewise.
	(vrsra_n_s8): Likewise.
	(vrsra_n_s16): Likewise.
	(vrsra_n_s32): Likewise.
	(vrsra_n_s64): Likewise.
	(vrsra_n_u8): Likewise.
	(vrsra_n_u16): Likewise.
	(vrsra_n_u32): Likewise.
	(vrsra_n_u64): Likewise.
	(vrsraq_n_s8): Likewise.
	(vrsraq_n_s16): Likewise.
	(vrsraq_n_s32): Likewise.
	(vrsraq_n_s64): Likewise.
	(vrsraq_n_u8): Likewise.
	(vrsraq_n_u16): Likewise.
	(vrsraq_n_u32): Likewise.
	(vrsraq_n_u64): Likewise.
	(vabs_s8): Likewise.
	(vabs_s16): Likewise.
	(vabs_s32): Likewise.
	(vabs_f32): Likewise.
	(vabsq_s8): Likewise.
	(vabsq_s16): Likewise.
	(vabsq_s32): Likewise.
	(vabsq_f32): Likewise.
	(vqabs_s8): Likewise.
	(vqabs_s16): Likewise.
	(vqabs_s32): Likewise.
	(vqabsq_s8): Likewise.
	(vqabsq_s16): Likewise.
	(vqabsq_s32): Likewise.
	(vneg_s8): Likewise.
	(vneg_s16): Likewise.
	(vneg_s32): Likewise.
	(vneg_f32): Likewise.
	(vnegq_s8): Likewise.
	(vnegq_s16): Likewise.
	(vnegq_s32): Likewise.
	(vnegq_f32): Likewise.
	(vqneg_s8): Likewise.
	(vqneg_s16): Likewise.
	(vqneg_s32): Likewise.
	(vqnegq_s8): Likewise.
	(vqnegq_s16): Likewise.
	(vqnegq_s32): Likewise.
	(vmvn_s8): Likewise.
	(vmvn_s16): Likewise.
	(vmvn_s32): Likewise.
	(vmvn_u8): Likewise.
	(vmvn_u16): Likewise.
	(vmvn_u32): Likewise.
	(vmvn_p8): Likewise.
	(vmvnq_s8): Likewise.
	(vmvnq_s16): Likewise.
	(vmvnq_s32): Likewise.
	(vmvnq_u8): Likewise.
	(vmvnq_u16): Likewise.
	(vmvnq_u32): Likewise.
	(vmvnq_p8): Likewise.
	(vcls_s8): Likewise.
	(vcls_s16): Likewise.
	(vcls_s32): Likewise.
	(vclsq_s8): Likewise.
	(vclsq_s16): Likewise.
	(vclsq_s32): Likewise.
	(vclz_s8): Likewise.
	(vclz_s16): Likewise.
	(vclz_s32): Likewise.
	(vclz_u8): Likewise.
	(vclz_u16): Likewise.
	(vclz_u32): Likewise.
	(vclzq_s8): Likewise.
	(vclzq_s16): Likewise.
	(vclzq_s32): Likewise.
	(vclzq_u8): Likewise.
	(vclzq_u16): Likewise.
	(vclzq_u32): Likewise.
	(vcnt_s8): Likewise.
	(vcnt_u8): Likewise.
	(vcnt_p8): Likewise.
	(vcntq_s8): Likewise.
	(vcntq_u8): Likewise.
	(vcntq_p8): Likewise.
	(vrecpe_f32): Likewise.
	(vrecpe_u32): Likewise.
	(vrecpeq_f32): Likewise.
	(vrecpeq_u32): Likewise.
	(vrsqrte_f32): Likewise.
	(vrsqrte_u32): Likewise.
	(vrsqrteq_f32): Likewise.
	(vrsqrteq_u32): Likewise.
	(vget_lane_s8): Likewise.
	(vget_lane_s16): Likewise.
	(vget_lane_s32): Likewise.
	(vget_lane_f32): Likewise.
	(vget_lane_u8): Likewise.
	(vget_lane_u16): Likewise.
	(vget_lane_u32): Likewise.
	(vget_lane_p8): Likewise.
	(vget_lane_p16): Likewise.
	(vget_lane_s64): Likewise.
	(vget_lane_u64): Likewise.
	(vgetq_lane_s8): Likewise.
	(vgetq_lane_s16): Likewise.
	(vgetq_lane_s32): Likewise.
	(vgetq_lane_f32): Likewise.
	(vgetq_lane_u8): Likewise.
	(vgetq_lane_u16): Likewise.
	(vgetq_lane_u32): Likewise.
	(vgetq_lane_p8): Likewise.
	(vgetq_lane_p16): Likewise.
	(vgetq_lane_s64): Likewise.
	(vgetq_lane_u64): Likewise.
	(vcvt_s32_f32): Likewise.
	(vcvt_f32_s32): Likewise.
	(vcvt_f32_u32): Likewise.
	(vcvt_u32_f32): Likewise.
	(vcvtq_s32_f32): Likewise.
	(vcvtq_f32_s32): Likewise.
	(vcvtq_f32_u32): Likewise.
	(vcvtq_u32_f32): Likewise.
	(vcvt_n_s32_f32): Likewise.
	(vcvt_n_f32_s32): Likewise.
	(vcvt_n_f32_u32): Likewise.
	(vcvt_n_u32_f32): Likewise.
	(vcvtq_n_s32_f32): Likewise.
	(vcvtq_n_f32_s32): Likewise.
	(vcvtq_n_f32_u32): Likewise.
	(vcvtq_n_u32_f32): Likewise.
	(vmovn_s16): Likewise.
	(vmovn_s32): Likewise.
	(vmovn_s64): Likewise.
	(vmovn_u16): Likewise.
	(vmovn_u32): Likewise.
	(vmovn_u64): Likewise.
	(vqmovn_s16): Likewise.
	(vqmovn_s32): Likewise.
	(vqmovn_s64): Likewise.
	(vqmovn_u16): Likewise.
	(vqmovn_u32): Likewise.
	(vqmovn_u64): Likewise.
	(vqmovun_s16): Likewise.
	(vqmovun_s32): Likewise.
	(vqmovun_s64): Likewise.
	(vmovl_s8): Likewise.
	(vmovl_s16): Likewise.
	(vmovl_s32): Likewise.
	(vmovl_u8): Likewise.
	(vmovl_u16): Likewise.
	(vmovl_u32): Likewise.
	(vmul_lane_s16): Likewise.
	(vmul_lane_s32): Likewise.
	(vmul_lane_f32): Likewise.
	(vmul_lane_u16): Likewise.
	(vmul_lane_u32): Likewise.
	(vmulq_lane_s16): Likewise.
	(vmulq_lane_s32): Likewise.
	(vmulq_lane_f32): Likewise.
	(vmulq_lane_u16): Likewise.
	(vmulq_lane_u32): Likewise.
	(vmla_lane_s16): Likewise.
	(vmla_lane_s32): Likewise.
	(vmla_lane_f32): Likewise.
	(vmla_lane_u16): Likewise.
	(vmla_lane_u32): Likewise.
	(vmlaq_lane_s16): Likewise.
	(vmlaq_lane_s32): Likewise.
	(vmlaq_lane_f32): Likewise.
	(vmlaq_lane_u16): Likewise.
	(vmlaq_lane_u32): Likewise.
	(vmlal_lane_s16): Likewise.
	(vmlal_lane_s32): Likewise.
	(vmlal_lane_u16): Likewise.
	(vmlal_lane_u32): Likewise.
	(vqdmlal_lane_s16): Likewise.
	(vqdmlal_lane_s32): Likewise.
	(vmls_lane_s16): Likewise.
	(vmls_lane_s32): Likewise.
	(vmls_lane_f32): Likewise.
	(vmls_lane_u16): Likewise.
	(vmls_lane_u32): Likewise.
	(vmlsq_lane_s16): Likewise.
	(vmlsq_lane_s32): Likewise.
	(vmlsq_lane_f32): Likewise.
	(vmlsq_lane_u16): Likewise.
	(vmlsq_lane_u32): Likewise.
	(vmlsl_lane_s16): Likewise.
	(vmlsl_lane_s32): Likewise.
	(vmlsl_lane_u16): Likewise.
	(vmlsl_lane_u32): Likewise.
	(vqdmlsl_lane_s16): Likewise.
	(vqdmlsl_lane_s32): Likewise.
	(vmull_lane_s16): Likewise.
	(vmull_lane_s32): Likewise.
	(vmull_lane_u16): Likewise.
	(vmull_lane_u32): Likewise.
	(vqdmull_lane_s16): Likewise.
	(vqdmull_lane_s32): Likewise.
	(vqdmulhq_lane_s16): Likewise.
	(vqdmulhq_lane_s32): Likewise.
	(vqdmulh_lane_s16): Likewise.
	(vqdmulh_lane_s32): Likewise.
	(vqrdmulhq_lane_s16): Likewise.
	(vqrdmulhq_lane_s32): Likewise.
	(vqrdmulh_lane_s16): Likewise.
	(vqrdmulh_lane_s32): Likewise.
	(vmul_n_s16): Likewise.
	(vmul_n_s32): Likewise.
	(vmul_n_f32): Likewise.
	(vmul_n_u16): Likewise.
	(vmul_n_u32): Likewise.
	(vmulq_n_s16): Likewise.
	(vmulq_n_s32): Likewise.
	(vmulq_n_f32): Likewise.
	(vmulq_n_u16): Likewise.
	(vmulq_n_u32): Likewise.
	(vmull_n_s16): Likewise.
	(vmull_n_s32): Likewise.
	(vmull_n_u16): Likewise.
	(vmull_n_u32): Likewise.
	(vqdmull_n_s16): Likewise.
	(vqdmull_n_s32): Likewise.
	(vqdmulhq_n_s16): Likewise.
	(vqdmulhq_n_s32): Likewise.
	(vqdmulh_n_s16): Likewise.
	(vqdmulh_n_s32): Likewise.
	(vqrdmulhq_n_s16): Likewise.
	(vqrdmulhq_n_s32): Likewise.
	(vqrdmulh_n_s16): Likewise.
	(vqrdmulh_n_s32): Likewise.
	(vmla_n_s16): Likewise.
	(vmla_n_s32): Likewise.
	(vmla_n_f32): Likewise.
	(vmla_n_u16): Likewise.
	(vmla_n_u32): Likewise.
	(vmlaq_n_s16): Likewise.
	(vmlaq_n_s32): Likewise.
	(vmlaq_n_f32): Likewise.
	(vmlaq_n_u16): Likewise.
	(vmlaq_n_u32): Likewise.
	(vmlal_n_s16): Likewise.
	(vmlal_n_s32): Likewise.
	(vmlal_n_u16): Likewise.
	(vmlal_n_u32): Likewise.
	(vqdmlal_n_s16): Likewise.
	(vqdmlal_n_s32): Likewise.
	(vmls_n_s16): Likewise.
	(vmls_n_s32): Likewise.
	(vmls_n_f32): Likewise.
	(vmls_n_u16): Likewise.
	(vmls_n_u32): Likewise.
	(vmlsq_n_s16): Likewise.
	(vmlsq_n_s32): Likewise.
	(vmlsq_n_f32): Likewise.
	(vmlsq_n_u16): Likewise.
	(vmlsq_n_u32): Likewise.
	(vmlsl_n_s16): Likewise.
	(vmlsl_n_s32): Likewise.
	(vmlsl_n_u16): Likewise.
	(vmlsl_n_u32): Likewise.
	(vqdmlsl_n_s16): Likewise.
	(vqdmlsl_n_s32): Likewise.

[-- Attachment #2: 0001-Refactor-Builtins-1-8-Remove-arm_neon.h-s-Magic-Word.patch --]
[-- Type: text/x-patch;  name=0001-Refactor-Builtins-1-8-Remove-arm_neon.h-s-Magic-Word.patch, Size: 346342 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3f2ddd4..35a3932 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25519,29 +25519,26 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
     case NEON_CONVERT:
     case NEON_DUPLANE:
       return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_STOP);
+        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
 
     case NEON_BINOP:
-    case NEON_SETLANE:
+    case NEON_LOGICBINOP:
     case NEON_SCALARMUL:
     case NEON_SCALARMULL:
     case NEON_SCALARMULH:
-    case NEON_SHIFTINSERT:
-    case NEON_LOGICBINOP:
       return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_STOP);
+        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
 
     case NEON_TERNOP:
       return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
         NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_CONSTANT, NEON_ARG_STOP);
+        NEON_ARG_STOP);
 
     case NEON_GETLANE:
     case NEON_FIXCONV:
     case NEON_SHIFTIMM:
       return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_CONSTANT,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
         NEON_ARG_STOP);
 
     case NEON_CREATE:
@@ -25567,24 +25564,26 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
     case NEON_LANEMUL:
     case NEON_LANEMULL:
     case NEON_LANEMULH:
+    case NEON_SETLANE:
+    case NEON_SHIFTINSERT:
       return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
         NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_CONSTANT, NEON_ARG_STOP);
+        NEON_ARG_STOP);
 
     case NEON_LANEMAC:
       return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
         NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_CONSTANT, NEON_ARG_CONSTANT, NEON_ARG_STOP);
+        NEON_ARG_CONSTANT, NEON_ARG_STOP);
 
     case NEON_SHIFTACC:
       return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
         NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_CONSTANT, NEON_ARG_STOP);
+        NEON_ARG_STOP);
 
     case NEON_SCALARMAC:
       return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
 	NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_CONSTANT, NEON_ARG_STOP);
+        NEON_ARG_STOP);
 
     case NEON_SELECT:
     case NEON_VTBX:
@@ -30986,7 +30985,7 @@ static bool
 arm_evpc_neon_vrev (struct expand_vec_perm_d *d)
 {
   unsigned int i, j, diff, nelt = d->nelt;
-  rtx (*gen)(rtx, rtx, rtx);
+  rtx (*gen)(rtx, rtx);
 
   if (!d->one_vector_p)
     return false;
@@ -31050,9 +31049,7 @@ arm_evpc_neon_vrev (struct expand_vec_perm_d *d)
   if (d->testing_p)
     return true;
 
-  /* ??? The third operand is an artifact of the builtin infrastructure
-     and is ignored by the actual instruction.  */
-  emit_insn (gen (d->target, d->op0, const0_rtx));
+  emit_insn (gen (d->target, d->op0));
   return true;
 }
 
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 47f6c5e..d27d970 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -477,7 +477,7 @@ vadd_f32 (float32x2_t __a, float32x2_t __b)
 #ifdef __FAST_MATH
   return __a + __b;
 #else
-  return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b, 3);
+  return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b);
 #endif
 }
 
@@ -541,7 +541,7 @@ vaddq_f32 (float32x4_t __a, float32x4_t __b)
 #ifdef __FAST_MATH
   return __a + __b;
 #else
-  return (float32x4_t) __builtin_neon_vaddv4sf (__a, __b, 3);
+  return (float32x4_t) __builtin_neon_vaddv4sf (__a, __b);
 #endif
 }
 
@@ -572,385 +572,385 @@ vaddq_u64 (uint64x2_t __a, uint64x2_t __b)
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vaddl_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vaddlv8qi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vaddlsv8qi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vaddl_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vaddlv4hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vaddlsv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vaddl_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vaddlv2si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vaddlsv2si (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vaddl_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vaddlv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vaddluv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vaddl_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vaddlv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vaddluv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vaddl_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vaddlv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vaddluv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vaddw_s8 (int16x8_t __a, int8x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vaddwv8qi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vaddwsv8qi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vaddw_s16 (int32x4_t __a, int16x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vaddwv4hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vaddwsv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vaddw_s32 (int64x2_t __a, int32x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vaddwv2si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vaddwsv2si (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vaddw_u8 (uint16x8_t __a, uint8x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vaddwv8qi ((int16x8_t) __a, (int8x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vaddwuv8qi ((int16x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vaddw_u16 (uint32x4_t __a, uint16x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vaddwv4hi ((int32x4_t) __a, (int16x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vaddwuv4hi ((int32x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vaddw_u32 (uint64x2_t __a, uint32x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vaddwv2si ((int64x2_t) __a, (int32x2_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vaddwuv2si ((int64x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vhadd_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vhaddv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vhaddsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vhadd_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vhaddv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vhaddsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vhadd_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vhaddv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vhaddsv2si (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vhadd_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vhaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vhadduv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vhadd_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vhaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vhadduv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vhadd_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vhaddv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vhadduv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vhaddq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vhaddv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vhaddsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vhaddq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vhaddv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vhaddsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vhaddq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vhaddv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vhaddsv4si (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vhaddq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vhaddv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vhadduv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vhaddq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vhaddv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vhadduv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vhaddq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vhaddv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vhadduv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vrhadd_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vhaddv8qi (__a, __b, 5);
+  return (int8x8_t)__builtin_neon_vrhaddsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vrhadd_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vhaddv4hi (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vrhaddsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vrhadd_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vhaddv2si (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vrhaddsv2si (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vrhadd_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vhaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 4);
+  return (uint8x8_t)__builtin_neon_vrhadduv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vrhadd_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vhaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 4);
+  return (uint16x4_t)__builtin_neon_vrhadduv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrhadd_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vhaddv2si ((int32x2_t) __a, (int32x2_t) __b, 4);
+  return (uint32x2_t)__builtin_neon_vrhadduv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vrhaddq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vhaddv16qi (__a, __b, 5);
+  return (int8x16_t)__builtin_neon_vrhaddsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vrhaddq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vhaddv8hi (__a, __b, 5);
+  return (int16x8_t)__builtin_neon_vrhaddsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vrhaddq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vhaddv4si (__a, __b, 5);
+  return (int32x4_t)__builtin_neon_vrhaddsv4si (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vrhaddq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vhaddv16qi ((int8x16_t) __a, (int8x16_t) __b, 4);
+  return (uint8x16_t)__builtin_neon_vrhadduv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vrhaddq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vhaddv8hi ((int16x8_t) __a, (int16x8_t) __b, 4);
+  return (uint16x8_t)__builtin_neon_vrhadduv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vrhaddq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vhaddv4si ((int32x4_t) __a, (int32x4_t) __b, 4);
+  return (uint32x4_t)__builtin_neon_vrhadduv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqadd_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vqaddv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vqaddsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqadd_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vqaddv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vqaddsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqadd_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vqaddv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vqaddsv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vqadd_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vqadddi (__a, __b, 1);
+  return (int64x1_t)__builtin_neon_vqaddsdi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqadd_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vqaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vqadduv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqadd_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vqaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vqadduv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqadd_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vqaddv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vqadduv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vqadd_u64 (uint64x1_t __a, uint64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vqadddi ((int64x1_t) __a, (int64x1_t) __b, 0);
+  return (uint64x1_t)__builtin_neon_vqaddudi ((int64x1_t) __a, (int64x1_t) __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vqaddq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vqaddv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vqaddsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqaddq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vqaddv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vqaddsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqaddq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqaddv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vqaddsv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqaddq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vqaddv2di (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vqaddsv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vqaddq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vqaddv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vqadduv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vqaddq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vqaddv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vqadduv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vqaddq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vqaddv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vqadduv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vqaddq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vqaddv2di ((int64x2_t) __a, (int64x2_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vqadduv2di ((int64x2_t) __a, (int64x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vaddhn_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vaddhnv8hi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vaddhnv8hi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vaddhn_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vaddhnv4si (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vaddhnv4si (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vaddhn_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vaddhnv2di (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vaddhnv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vaddhn_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vaddhnv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vaddhnv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vaddhn_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vaddhnv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vaddhnv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vaddhn_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vaddhnv2di ((int64x2_t) __a, (int64x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vaddhnv2di ((int64x2_t) __a, (int64x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vraddhn_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vaddhnv8hi (__a, __b, 5);
+  return (int8x8_t)__builtin_neon_vraddhnv8hi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vraddhn_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vaddhnv4si (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vraddhnv4si (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vraddhn_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vaddhnv2di (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vraddhnv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vraddhn_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vaddhnv8hi ((int16x8_t) __a, (int16x8_t) __b, 4);
+  return (uint8x8_t)__builtin_neon_vraddhnv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vraddhn_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vaddhnv4si ((int32x4_t) __a, (int32x4_t) __b, 4);
+  return (uint16x4_t)__builtin_neon_vraddhnv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vraddhn_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vaddhnv2di ((int64x2_t) __a, (int64x2_t) __b, 4);
+  return (uint32x2_t)__builtin_neon_vraddhnv2di ((int64x2_t) __a, (int64x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
@@ -977,7 +977,7 @@ vmul_f32 (float32x2_t __a, float32x2_t __b)
 #ifdef __FAST_MATH
   return __a * __b;
 #else
-  return (float32x2_t) __builtin_neon_vmulv2sf (__a, __b, 3);
+  return (float32x2_t) __builtin_neon_vmulfv2sf (__a, __b);
 #endif
 
 }
@@ -1024,7 +1024,7 @@ vmulq_f32 (float32x4_t __a, float32x4_t __b)
 #ifdef __FAST_MATH
   return __a * __b;
 #else
-  return (float32x4_t) __builtin_neon_vmulv4sf (__a, __b, 3);
+  return (float32x4_t) __builtin_neon_vmulfv4sf (__a, __b);
 #endif
 }
 
@@ -1049,386 +1049,386 @@ vmulq_u32 (uint32x4_t __a, uint32x4_t __b)
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vmul_p8 (poly8x8_t __a, poly8x8_t __b)
 {
-  return (poly8x8_t)__builtin_neon_vmulv8qi ((int8x8_t) __a, (int8x8_t) __b, 2);
+  return (poly8x8_t)__builtin_neon_vmulpv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vmulq_p8 (poly8x16_t __a, poly8x16_t __b)
 {
-  return (poly8x16_t)__builtin_neon_vmulv16qi ((int8x16_t) __a, (int8x16_t) __b, 2);
+  return (poly8x16_t)__builtin_neon_vmulpv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqdmulh_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vqdmulhv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vqdmulhv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqdmulh_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vqdmulhv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vqdmulhv2si (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqdmulhq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vqdmulhv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vqdmulhv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmulhq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqdmulhv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vqdmulhv4si (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqrdmulh_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vqdmulhv4hi (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vqrdmulhv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqrdmulh_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vqdmulhv2si (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vqrdmulhv2si (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqrdmulhq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vqdmulhv8hi (__a, __b, 5);
+  return (int16x8_t)__builtin_neon_vqrdmulhv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqdmulhv4si (__a, __b, 5);
+  return (int32x4_t)__builtin_neon_vqrdmulhv4si (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmull_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vmullv8qi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vmullsv8qi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmull_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vmullv4hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vmullsv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmull_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vmullv2si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vmullsv2si (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmull_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vmullv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vmulluv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmull_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vmullv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vmulluv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmull_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vmullv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vmulluv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vmull_p8 (poly8x8_t __a, poly8x8_t __b)
 {
-  return (poly16x8_t)__builtin_neon_vmullv8qi ((int8x8_t) __a, (int8x8_t) __b, 2);
+  return (poly16x8_t)__builtin_neon_vmullpv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmull_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqdmullv4hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vqdmullv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmull_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vqdmullv2si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vqdmullv2si (__a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vmla_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
 {
-  return (int8x8_t)__builtin_neon_vmlav8qi (__a, __b, __c, 1);
+  return (int8x8_t)__builtin_neon_vmlav8qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmla_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  return (int16x4_t)__builtin_neon_vmlav4hi (__a, __b, __c, 1);
+  return (int16x4_t)__builtin_neon_vmlav4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmla_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  return (int32x2_t)__builtin_neon_vmlav2si (__a, __b, __c, 1);
+  return (int32x2_t)__builtin_neon_vmlav2si (__a, __b, __c);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmla_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
 {
-  return (float32x2_t)__builtin_neon_vmlav2sf (__a, __b, __c, 3);
+  return (float32x2_t)__builtin_neon_vmlav2sf (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vmla_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
 {
-  return (uint8x8_t)__builtin_neon_vmlav8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0);
+  return (uint8x8_t)__builtin_neon_vmlav8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmla_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c)
 {
-  return (uint16x4_t)__builtin_neon_vmlav4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0);
+  return (uint16x4_t)__builtin_neon_vmlav4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmla_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c)
 {
-  return (uint32x2_t)__builtin_neon_vmlav2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0);
+  return (uint32x2_t)__builtin_neon_vmlav2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vmlaq_s8 (int8x16_t __a, int8x16_t __b, int8x16_t __c)
 {
-  return (int8x16_t)__builtin_neon_vmlav16qi (__a, __b, __c, 1);
+  return (int8x16_t)__builtin_neon_vmlav16qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlaq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
 {
-  return (int16x8_t)__builtin_neon_vmlav8hi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vmlav8hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlaq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
 {
-  return (int32x4_t)__builtin_neon_vmlav4si (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vmlav4si (__a, __b, __c);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmlaq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
 {
-  return (float32x4_t)__builtin_neon_vmlav4sf (__a, __b, __c, 3);
+  return (float32x4_t)__builtin_neon_vmlav4sf (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vmlaq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c)
 {
-  return (uint8x16_t)__builtin_neon_vmlav16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c, 0);
+  return (uint8x16_t)__builtin_neon_vmlav16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmlaq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c)
 {
-  return (uint16x8_t)__builtin_neon_vmlav8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c, 0);
+  return (uint16x8_t)__builtin_neon_vmlav8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlaq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vmlav4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c, 0);
+  return (uint32x4_t)__builtin_neon_vmlav4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlal_s8 (int16x8_t __a, int8x8_t __b, int8x8_t __c)
 {
-  return (int16x8_t)__builtin_neon_vmlalv8qi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vmlalsv8qi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  return (int32x4_t)__builtin_neon_vmlalv4hi (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vmlalsv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmlal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  return (int64x2_t)__builtin_neon_vmlalv2si (__a, __b, __c, 1);
+  return (int64x2_t)__builtin_neon_vmlalsv2si (__a, __b, __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmlal_u8 (uint16x8_t __a, uint8x8_t __b, uint8x8_t __c)
 {
-  return (uint16x8_t)__builtin_neon_vmlalv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0);
+  return (uint16x8_t)__builtin_neon_vmlaluv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlal_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vmlalv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0);
+  return (uint32x4_t)__builtin_neon_vmlaluv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmlal_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c)
 {
-  return (uint64x2_t)__builtin_neon_vmlalv2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0);
+  return (uint64x2_t)__builtin_neon_vmlaluv2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmlal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  return (int32x4_t)__builtin_neon_vqdmlalv4hi (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vqdmlalv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmlal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  return (int64x2_t)__builtin_neon_vqdmlalv2si (__a, __b, __c, 1);
+  return (int64x2_t)__builtin_neon_vqdmlalv2si (__a, __b, __c);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vmls_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
 {
-  return (int8x8_t)__builtin_neon_vmlsv8qi (__a, __b, __c, 1);
+  return (int8x8_t)__builtin_neon_vmlsv8qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmls_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  return (int16x4_t)__builtin_neon_vmlsv4hi (__a, __b, __c, 1);
+  return (int16x4_t)__builtin_neon_vmlsv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmls_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  return (int32x2_t)__builtin_neon_vmlsv2si (__a, __b, __c, 1);
+  return (int32x2_t)__builtin_neon_vmlsv2si (__a, __b, __c);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmls_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
 {
-  return (float32x2_t)__builtin_neon_vmlsv2sf (__a, __b, __c, 3);
+  return (float32x2_t)__builtin_neon_vmlsv2sf (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vmls_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
 {
-  return (uint8x8_t)__builtin_neon_vmlsv8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0);
+  return (uint8x8_t)__builtin_neon_vmlsv8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmls_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c)
 {
-  return (uint16x4_t)__builtin_neon_vmlsv4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0);
+  return (uint16x4_t)__builtin_neon_vmlsv4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmls_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c)
 {
-  return (uint32x2_t)__builtin_neon_vmlsv2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0);
+  return (uint32x2_t)__builtin_neon_vmlsv2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vmlsq_s8 (int8x16_t __a, int8x16_t __b, int8x16_t __c)
 {
-  return (int8x16_t)__builtin_neon_vmlsv16qi (__a, __b, __c, 1);
+  return (int8x16_t)__builtin_neon_vmlsv16qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlsq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
 {
-  return (int16x8_t)__builtin_neon_vmlsv8hi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vmlsv8hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlsq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
 {
-  return (int32x4_t)__builtin_neon_vmlsv4si (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vmlsv4si (__a, __b, __c);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmlsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
 {
-  return (float32x4_t)__builtin_neon_vmlsv4sf (__a, __b, __c, 3);
+  return (float32x4_t)__builtin_neon_vmlsv4sf (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vmlsq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c)
 {
-  return (uint8x16_t)__builtin_neon_vmlsv16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c, 0);
+  return (uint8x16_t)__builtin_neon_vmlsv16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmlsq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c)
 {
-  return (uint16x8_t)__builtin_neon_vmlsv8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c, 0);
+  return (uint16x8_t)__builtin_neon_vmlsv8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlsq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vmlsv4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c, 0);
+  return (uint32x4_t)__builtin_neon_vmlsv4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlsl_s8 (int16x8_t __a, int8x8_t __b, int8x8_t __c)
 {
-  return (int16x8_t)__builtin_neon_vmlslv8qi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vmlslsv8qi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlsl_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  return (int32x4_t)__builtin_neon_vmlslv4hi (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vmlslsv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmlsl_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  return (int64x2_t)__builtin_neon_vmlslv2si (__a, __b, __c, 1);
+  return (int64x2_t)__builtin_neon_vmlslsv2si (__a, __b, __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmlsl_u8 (uint16x8_t __a, uint8x8_t __b, uint8x8_t __c)
 {
-  return (uint16x8_t)__builtin_neon_vmlslv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0);
+  return (uint16x8_t)__builtin_neon_vmlsluv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlsl_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vmlslv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0);
+  return (uint32x4_t)__builtin_neon_vmlsluv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmlsl_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c)
 {
-  return (uint64x2_t)__builtin_neon_vmlslv2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0);
+  return (uint64x2_t)__builtin_neon_vmlsluv2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmlsl_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  return (int32x4_t)__builtin_neon_vqdmlslv4hi (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vqdmlslv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmlsl_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  return (int64x2_t)__builtin_neon_vqdmlslv2si (__a, __b, __c, 1);
+  return (int64x2_t)__builtin_neon_vqdmlslv2si (__a, __b, __c);
 }
 
 #ifdef __ARM_FEATURE_FMA
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vfma_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
 {
-  return (float32x2_t)__builtin_neon_vfmav2sf (__a, __b, __c, 3);
+  return (float32x2_t)__builtin_neon_vfmav2sf (__a, __b, __c);
 }
 
 #endif
@@ -1436,7 +1436,7 @@ vfma_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vfmaq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
 {
-  return (float32x4_t)__builtin_neon_vfmav4sf (__a, __b, __c, 3);
+  return (float32x4_t)__builtin_neon_vfmav4sf (__a, __b, __c);
 }
 
 #endif
@@ -1444,7 +1444,7 @@ vfmaq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
 {
-  return (float32x2_t)__builtin_neon_vfmsv2sf (__a, __b, __c, 3);
+  return (float32x2_t)__builtin_neon_vfmsv2sf (__a, __b, __c);
 }
 
 #endif
@@ -1452,7 +1452,7 @@ vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
 {
-  return (float32x4_t)__builtin_neon_vfmsv4sf (__a, __b, __c, 3);
+  return (float32x4_t)__builtin_neon_vfmsv4sf (__a, __b, __c);
 }
 
 #endif
@@ -1561,7 +1561,7 @@ vsub_f32 (float32x2_t __a, float32x2_t __b)
 #ifdef __FAST_MATH
   return __a - __b;
 #else
-  return (float32x2_t) __builtin_neon_vsubv2sf (__a, __b, 3);
+  return (float32x2_t) __builtin_neon_vsubv2sf (__a, __b);
 #endif
 }
 
@@ -1625,7 +1625,7 @@ vsubq_f32 (float32x4_t __a, float32x4_t __b)
 #ifdef __FAST_MATH
   return __a - __b;
 #else
-  return (float32x4_t) __builtin_neon_vsubv4sf (__a, __b, 3);
+  return (float32x4_t) __builtin_neon_vsubv4sf (__a, __b);
 #endif
 }
 
@@ -1656,2791 +1656,2791 @@ vsubq_u64 (uint64x2_t __a, uint64x2_t __b)
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vsubl_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vsublv8qi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vsublsv8qi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vsubl_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vsublv4hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vsublsv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vsubl_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vsublv2si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vsublsv2si (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vsubl_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vsublv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vsubluv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsubl_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vsublv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vsubluv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vsubl_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vsublv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vsubluv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vsubw_s8 (int16x8_t __a, int8x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vsubwv8qi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vsubwsv8qi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vsubw_s16 (int32x4_t __a, int16x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vsubwv4hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vsubwsv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vsubw_s32 (int64x2_t __a, int32x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vsubwv2si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vsubwsv2si (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vsubw_u8 (uint16x8_t __a, uint8x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vsubwv8qi ((int16x8_t) __a, (int8x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vsubwuv8qi ((int16x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsubw_u16 (uint32x4_t __a, uint16x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vsubwv4hi ((int32x4_t) __a, (int16x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vsubwuv4hi ((int32x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vsubw_u32 (uint64x2_t __a, uint32x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vsubwv2si ((int64x2_t) __a, (int32x2_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vsubwuv2si ((int64x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vhsub_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vhsubv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vhsubsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vhsub_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vhsubv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vhsubsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vhsub_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vhsubv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vhsubsv2si (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vhsub_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vhsubv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vhsubuv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vhsub_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vhsubv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vhsubuv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vhsub_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vhsubv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vhsubuv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vhsubq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vhsubv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vhsubsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vhsubq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vhsubv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vhsubsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vhsubq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vhsubv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vhsubsv4si (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vhsubq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vhsubv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vhsubuv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vhsubq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vhsubv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vhsubuv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vhsubq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vhsubv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vhsubuv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqsub_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vqsubv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vqsubsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqsub_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vqsubv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vqsubsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqsub_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vqsubv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vqsubsv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vqsub_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vqsubdi (__a, __b, 1);
+  return (int64x1_t)__builtin_neon_vqsubsdi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqsub_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vqsubv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vqsubuv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqsub_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vqsubv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vqsubuv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqsub_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vqsubv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vqsubuv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vqsub_u64 (uint64x1_t __a, uint64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vqsubdi ((int64x1_t) __a, (int64x1_t) __b, 0);
+  return (uint64x1_t)__builtin_neon_vqsubudi ((int64x1_t) __a, (int64x1_t) __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vqsubq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vqsubv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vqsubsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqsubq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vqsubv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vqsubsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqsubq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqsubv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vqsubsv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqsubq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vqsubv2di (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vqsubsv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vqsubq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vqsubv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vqsubuv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vqsubq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vqsubv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vqsubuv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vqsubq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vqsubv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vqsubuv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vqsubq_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vqsubv2di ((int64x2_t) __a, (int64x2_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vqsubuv2di ((int64x2_t) __a, (int64x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vsubhn_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vsubhnv8hi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vsubhnv8hi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vsubhn_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vsubhnv4si (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vsubhnv4si (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vsubhn_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vsubhnv2di (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vsubhnv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vsubhn_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vsubhnv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vsubhnv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vsubhn_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vsubhnv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vsubhnv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vsubhn_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vsubhnv2di ((int64x2_t) __a, (int64x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vsubhnv2di ((int64x2_t) __a, (int64x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vrsubhn_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vsubhnv8hi (__a, __b, 5);
+  return (int8x8_t)__builtin_neon_vrsubhnv8hi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vrsubhn_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vsubhnv4si (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vrsubhnv4si (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vrsubhn_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vsubhnv2di (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vrsubhnv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vrsubhn_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vsubhnv8hi ((int16x8_t) __a, (int16x8_t) __b, 4);
+  return (uint8x8_t)__builtin_neon_vrsubhnv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vrsubhn_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vsubhnv4si ((int32x4_t) __a, (int32x4_t) __b, 4);
+  return (uint16x4_t)__builtin_neon_vrsubhnv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrsubhn_u64 (uint64x2_t __a, uint64x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vsubhnv2di ((int64x2_t) __a, (int64x2_t) __b, 4);
+  return (uint32x2_t)__builtin_neon_vrsubhnv2di ((int64x2_t) __a, (int64x2_t) __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceq_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vceqv8qi (__a, __b, 1);
+  return (uint8x8_t)__builtin_neon_vceqv8qi (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vceq_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vceqv4hi (__a, __b, 1);
+  return (uint16x4_t)__builtin_neon_vceqv4hi (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceq_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vceqv2si (__a, __b, 1);
+  return (uint32x2_t)__builtin_neon_vceqv2si (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceq_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vceqv2sf (__a, __b, 3);
+  return (uint32x2_t)__builtin_neon_vceqv2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceq_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vceqv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vceqv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vceq_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vceqv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vceqv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vceq_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vceqv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vceqv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vceq_p8 (poly8x8_t __a, poly8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vceqv8qi ((int8x8_t) __a, (int8x8_t) __b, 2);
+  return (uint8x8_t)__builtin_neon_vceqv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vceqv16qi (__a, __b, 1);
+  return (uint8x16_t)__builtin_neon_vceqv16qi (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vceqq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vceqv8hi (__a, __b, 1);
+  return (uint16x8_t)__builtin_neon_vceqv8hi (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vceqv4si (__a, __b, 1);
+  return (uint32x4_t)__builtin_neon_vceqv4si (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vceqv4sf (__a, __b, 3);
+  return (uint32x4_t)__builtin_neon_vceqv4sf (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vceqv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vceqv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vceqq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vceqv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vceqv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vceqq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vceqv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vceqv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vceqq_p8 (poly8x16_t __a, poly8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vceqv16qi ((int8x16_t) __a, (int8x16_t) __b, 2);
+  return (uint8x16_t)__builtin_neon_vceqv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcge_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgev8qi (__a, __b, 1);
+  return (uint8x8_t)__builtin_neon_vcgev8qi (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcge_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgev4hi (__a, __b, 1);
+  return (uint16x4_t)__builtin_neon_vcgev4hi (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcge_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgev2si (__a, __b, 1);
+  return (uint32x2_t)__builtin_neon_vcgev2si (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcge_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgev2sf (__a, __b, 3);
+  return (uint32x2_t)__builtin_neon_vcgev2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcge_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgeuv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vcgeuv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcge_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgeuv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vcgeuv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcge_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgeuv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vcgeuv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgeq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgev16qi (__a, __b, 1);
+  return (uint8x16_t)__builtin_neon_vcgev16qi (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgeq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgev8hi (__a, __b, 1);
+  return (uint16x8_t)__builtin_neon_vcgev8hi (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgeq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgev4si (__a, __b, 1);
+  return (uint32x4_t)__builtin_neon_vcgev4si (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgeq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgev4sf (__a, __b, 3);
+  return (uint32x4_t)__builtin_neon_vcgev4sf (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgeq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgeuv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vcgeuv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgeq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgeuv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vcgeuv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgeq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgeuv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vcgeuv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcle_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgev8qi (__b, __a, 1);
+  return (uint8x8_t)__builtin_neon_vcgev8qi (__b, __a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcle_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgev4hi (__b, __a, 1);
+  return (uint16x4_t)__builtin_neon_vcgev4hi (__b, __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcle_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgev2si (__b, __a, 1);
+  return (uint32x2_t)__builtin_neon_vcgev2si (__b, __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcle_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgev2sf (__b, __a, 3);
+  return (uint32x2_t)__builtin_neon_vcgev2sf (__b, __a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcle_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgeuv8qi ((int8x8_t) __b, (int8x8_t) __a, 0);
+  return (uint8x8_t)__builtin_neon_vcgeuv8qi ((int8x8_t) __b, (int8x8_t) __a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcle_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgeuv4hi ((int16x4_t) __b, (int16x4_t) __a, 0);
+  return (uint16x4_t)__builtin_neon_vcgeuv4hi ((int16x4_t) __b, (int16x4_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcle_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgeuv2si ((int32x2_t) __b, (int32x2_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vcgeuv2si ((int32x2_t) __b, (int32x2_t) __a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcleq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgev16qi (__b, __a, 1);
+  return (uint8x16_t)__builtin_neon_vcgev16qi (__b, __a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcleq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgev8hi (__b, __a, 1);
+  return (uint16x8_t)__builtin_neon_vcgev8hi (__b, __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcleq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgev4si (__b, __a, 1);
+  return (uint32x4_t)__builtin_neon_vcgev4si (__b, __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcleq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgev4sf (__b, __a, 3);
+  return (uint32x4_t)__builtin_neon_vcgev4sf (__b, __a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcleq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgeuv16qi ((int8x16_t) __b, (int8x16_t) __a, 0);
+  return (uint8x16_t)__builtin_neon_vcgeuv16qi ((int8x16_t) __b, (int8x16_t) __a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcleq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgeuv8hi ((int16x8_t) __b, (int16x8_t) __a, 0);
+  return (uint16x8_t)__builtin_neon_vcgeuv8hi ((int16x8_t) __b, (int16x8_t) __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcleq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgeuv4si ((int32x4_t) __b, (int32x4_t) __a, 0);
+  return (uint32x4_t)__builtin_neon_vcgeuv4si ((int32x4_t) __b, (int32x4_t) __a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcgt_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgtv8qi (__a, __b, 1);
+  return (uint8x8_t)__builtin_neon_vcgtv8qi (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcgt_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgtv4hi (__a, __b, 1);
+  return (uint16x4_t)__builtin_neon_vcgtv4hi (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgt_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgtv2si (__a, __b, 1);
+  return (uint32x2_t)__builtin_neon_vcgtv2si (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgt_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgtv2sf (__a, __b, 3);
+  return (uint32x2_t)__builtin_neon_vcgtv2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcgt_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgtuv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vcgtuv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vcgt_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgtuv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vcgtuv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcgt_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgtuv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vcgtuv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgtq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgtv16qi (__a, __b, 1);
+  return (uint8x16_t)__builtin_neon_vcgtv16qi (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgtq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgtv8hi (__a, __b, 1);
+  return (uint16x8_t)__builtin_neon_vcgtv8hi (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgtq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgtv4si (__a, __b, 1);
+  return (uint32x4_t)__builtin_neon_vcgtv4si (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgtq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgtv4sf (__a, __b, 3);
+  return (uint32x4_t)__builtin_neon_vcgtv4sf (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgtq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgtuv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vcgtuv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcgtq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgtuv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vcgtuv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcgtq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgtuv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vcgtuv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vclt_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgtv8qi (__b, __a, 1);
+  return (uint8x8_t)__builtin_neon_vcgtv8qi (__b, __a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vclt_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgtv4hi (__b, __a, 1);
+  return (uint16x4_t)__builtin_neon_vcgtv4hi (__b, __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclt_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgtv2si (__b, __a, 1);
+  return (uint32x2_t)__builtin_neon_vcgtv2si (__b, __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclt_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgtv2sf (__b, __a, 3);
+  return (uint32x2_t)__builtin_neon_vcgtv2sf (__b, __a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vclt_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vcgtuv8qi ((int8x8_t) __b, (int8x8_t) __a, 0);
+  return (uint8x8_t)__builtin_neon_vcgtuv8qi ((int8x8_t) __b, (int8x8_t) __a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vclt_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vcgtuv4hi ((int16x4_t) __b, (int16x4_t) __a, 0);
+  return (uint16x4_t)__builtin_neon_vcgtuv4hi ((int16x4_t) __b, (int16x4_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclt_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcgtuv2si ((int32x2_t) __b, (int32x2_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vcgtuv2si ((int32x2_t) __b, (int32x2_t) __a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcltq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgtv16qi (__b, __a, 1);
+  return (uint8x16_t)__builtin_neon_vcgtv16qi (__b, __a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcltq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgtv8hi (__b, __a, 1);
+  return (uint16x8_t)__builtin_neon_vcgtv8hi (__b, __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcltq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgtv4si (__b, __a, 1);
+  return (uint32x4_t)__builtin_neon_vcgtv4si (__b, __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcltq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgtv4sf (__b, __a, 3);
+  return (uint32x4_t)__builtin_neon_vcgtv4sf (__b, __a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcltq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vcgtuv16qi ((int8x16_t) __b, (int8x16_t) __a, 0);
+  return (uint8x16_t)__builtin_neon_vcgtuv16qi ((int8x16_t) __b, (int8x16_t) __a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vcltq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vcgtuv8hi ((int16x8_t) __b, (int16x8_t) __a, 0);
+  return (uint16x8_t)__builtin_neon_vcgtuv8hi ((int16x8_t) __b, (int16x8_t) __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcltq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcgtuv4si ((int32x4_t) __b, (int32x4_t) __a, 0);
+  return (uint32x4_t)__builtin_neon_vcgtuv4si ((int32x4_t) __b, (int32x4_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcage_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcagev2sf (__a, __b, 3);
+  return (uint32x2_t)__builtin_neon_vcagev2sf (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcageq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcagev4sf (__a, __b, 3);
+  return (uint32x4_t)__builtin_neon_vcagev4sf (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcale_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcagev2sf (__b, __a, 3);
+  return (uint32x2_t)__builtin_neon_vcagev2sf (__b, __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcaleq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcagev4sf (__b, __a, 3);
+  return (uint32x4_t)__builtin_neon_vcagev4sf (__b, __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcagt_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcagtv2sf (__a, __b, 3);
+  return (uint32x2_t)__builtin_neon_vcagtv2sf (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcagtq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcagtv4sf (__a, __b, 3);
+  return (uint32x4_t)__builtin_neon_vcagtv4sf (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcalt_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vcagtv2sf (__b, __a, 3);
+  return (uint32x2_t)__builtin_neon_vcagtv2sf (__b, __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcaltq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vcagtv4sf (__b, __a, 3);
+  return (uint32x4_t)__builtin_neon_vcagtv4sf (__b, __a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vtst_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vtstv8qi (__a, __b, 1);
+  return (uint8x8_t)__builtin_neon_vtstv8qi (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vtst_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vtstv4hi (__a, __b, 1);
+  return (uint16x4_t)__builtin_neon_vtstv4hi (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vtst_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vtstv2si (__a, __b, 1);
+  return (uint32x2_t)__builtin_neon_vtstv2si (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vtst_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vtstv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vtstv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vtst_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vtstv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vtstv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vtst_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vtstv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vtstv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vtst_p8 (poly8x8_t __a, poly8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vtstv8qi ((int8x8_t) __a, (int8x8_t) __b, 2);
+  return (uint8x8_t)__builtin_neon_vtstv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vtstq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vtstv16qi (__a, __b, 1);
+  return (uint8x16_t)__builtin_neon_vtstv16qi (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vtstq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vtstv8hi (__a, __b, 1);
+  return (uint16x8_t)__builtin_neon_vtstv8hi (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vtstq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vtstv4si (__a, __b, 1);
+  return (uint32x4_t)__builtin_neon_vtstv4si (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vtstq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vtstv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vtstv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vtstq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vtstv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vtstv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vtstq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vtstv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vtstv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vtstq_p8 (poly8x16_t __a, poly8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vtstv16qi ((int8x16_t) __a, (int8x16_t) __b, 2);
+  return (uint8x16_t)__builtin_neon_vtstv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vabd_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vabdv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vabdsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vabd_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vabdv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vabdsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vabd_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vabdv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vabdsv2si (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vabd_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vabdv2sf (__a, __b, 3);
+  return (float32x2_t)__builtin_neon_vabdfv2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vabd_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vabdv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vabduv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vabd_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vabdv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vabduv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vabd_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vabdv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vabduv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vabdq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vabdv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vabdsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vabdq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vabdv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vabdsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vabdq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vabdv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vabdsv4si (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vabdq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (float32x4_t)__builtin_neon_vabdv4sf (__a, __b, 3);
+  return (float32x4_t)__builtin_neon_vabdfv4sf (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vabdq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vabdv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vabduv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vabdq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vabdv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vabduv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vabdq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vabdv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vabduv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vabdl_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vabdlv8qi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vabdlsv8qi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vabdl_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vabdlv4hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vabdlsv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vabdl_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vabdlv2si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vabdlsv2si (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vabdl_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vabdlv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vabdluv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vabdl_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vabdlv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vabdluv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vabdl_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vabdlv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vabdluv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vaba_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
 {
-  return (int8x8_t)__builtin_neon_vabav8qi (__a, __b, __c, 1);
+  return (int8x8_t)__builtin_neon_vabasv8qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vaba_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  return (int16x4_t)__builtin_neon_vabav4hi (__a, __b, __c, 1);
+  return (int16x4_t)__builtin_neon_vabasv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vaba_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  return (int32x2_t)__builtin_neon_vabav2si (__a, __b, __c, 1);
+  return (int32x2_t)__builtin_neon_vabasv2si (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vaba_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
 {
-  return (uint8x8_t)__builtin_neon_vabav8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0);
+  return (uint8x8_t)__builtin_neon_vabauv8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vaba_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c)
 {
-  return (uint16x4_t)__builtin_neon_vabav4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0);
+  return (uint16x4_t)__builtin_neon_vabauv4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vaba_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c)
 {
-  return (uint32x2_t)__builtin_neon_vabav2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0);
+  return (uint32x2_t)__builtin_neon_vabauv2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vabaq_s8 (int8x16_t __a, int8x16_t __b, int8x16_t __c)
 {
-  return (int8x16_t)__builtin_neon_vabav16qi (__a, __b, __c, 1);
+  return (int8x16_t)__builtin_neon_vabasv16qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vabaq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
 {
-  return (int16x8_t)__builtin_neon_vabav8hi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vabasv8hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vabaq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
 {
-  return (int32x4_t)__builtin_neon_vabav4si (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vabasv4si (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vabaq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c)
 {
-  return (uint8x16_t)__builtin_neon_vabav16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c, 0);
+  return (uint8x16_t)__builtin_neon_vabauv16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vabaq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c)
 {
-  return (uint16x8_t)__builtin_neon_vabav8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c, 0);
+  return (uint16x8_t)__builtin_neon_vabauv8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vabaq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vabav4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c, 0);
+  return (uint32x4_t)__builtin_neon_vabauv4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vabal_s8 (int16x8_t __a, int8x8_t __b, int8x8_t __c)
 {
-  return (int16x8_t)__builtin_neon_vabalv8qi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vabalsv8qi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vabal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  return (int32x4_t)__builtin_neon_vabalv4hi (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vabalsv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vabal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  return (int64x2_t)__builtin_neon_vabalv2si (__a, __b, __c, 1);
+  return (int64x2_t)__builtin_neon_vabalsv2si (__a, __b, __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vabal_u8 (uint16x8_t __a, uint8x8_t __b, uint8x8_t __c)
 {
-  return (uint16x8_t)__builtin_neon_vabalv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0);
+  return (uint16x8_t)__builtin_neon_vabaluv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vabal_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vabalv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0);
+  return (uint32x4_t)__builtin_neon_vabaluv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vabal_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c)
 {
-  return (uint64x2_t)__builtin_neon_vabalv2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0);
+  return (uint64x2_t)__builtin_neon_vabaluv2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vmax_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vmaxv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vmaxsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmax_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vmaxv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vmaxsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmax_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vmaxv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vmaxsv2si (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmax_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vmaxv2sf (__a, __b, 3);
+  return (float32x2_t)__builtin_neon_vmaxfv2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vmax_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vmaxv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vmaxuv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmax_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vmaxv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vmaxuv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmax_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vmaxv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vmaxuv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vmaxq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vmaxv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vmaxsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmaxq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vmaxv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vmaxsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmaxq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vmaxv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vmaxsv4si (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmaxq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (float32x4_t)__builtin_neon_vmaxv4sf (__a, __b, 3);
+  return (float32x4_t)__builtin_neon_vmaxfv4sf (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vmaxq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vmaxv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vmaxuv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmaxq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vmaxv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vmaxuv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmaxq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vmaxv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vmaxuv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vmin_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vminv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vminsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmin_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vminv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vminsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmin_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vminv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vminsv2si (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmin_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vminv2sf (__a, __b, 3);
+  return (float32x2_t)__builtin_neon_vminfv2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vmin_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vminv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vminuv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmin_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vminv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vminuv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmin_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vminv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vminuv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vminq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vminv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vminsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vminq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vminv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vminsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vminq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vminv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vminsv4si (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vminq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (float32x4_t)__builtin_neon_vminv4sf (__a, __b, 3);
+  return (float32x4_t)__builtin_neon_vminfv4sf (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vminq_u8 (uint8x16_t __a, uint8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vminv16qi ((int8x16_t) __a, (int8x16_t) __b, 0);
+  return (uint8x16_t)__builtin_neon_vminuv16qi ((int8x16_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vminq_u16 (uint16x8_t __a, uint16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vminv8hi ((int16x8_t) __a, (int16x8_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vminuv8hi ((int16x8_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vminq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vminv4si ((int32x4_t) __a, (int32x4_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vminuv4si ((int32x4_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vpadd_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vpaddv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vpaddv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vpadd_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vpaddv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vpaddv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vpadd_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vpaddv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vpaddv2si (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vpadd_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vpaddv2sf (__a, __b, 3);
+  return (float32x2_t)__builtin_neon_vpaddv2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vpadd_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vpaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vpaddv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vpadd_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vpaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vpaddv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vpadd_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vpaddv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vpaddv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vpaddl_s8 (int8x8_t __a)
 {
-  return (int16x4_t)__builtin_neon_vpaddlv8qi (__a, 1);
+  return (int16x4_t)__builtin_neon_vpaddlsv8qi (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vpaddl_s16 (int16x4_t __a)
 {
-  return (int32x2_t)__builtin_neon_vpaddlv4hi (__a, 1);
+  return (int32x2_t)__builtin_neon_vpaddlsv4hi (__a);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vpaddl_s32 (int32x2_t __a)
 {
-  return (int64x1_t)__builtin_neon_vpaddlv2si (__a, 1);
+  return (int64x1_t)__builtin_neon_vpaddlsv2si (__a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vpaddl_u8 (uint8x8_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vpaddlv8qi ((int8x8_t) __a, 0);
+  return (uint16x4_t)__builtin_neon_vpaddluv8qi ((int8x8_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vpaddl_u16 (uint16x4_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vpaddlv4hi ((int16x4_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vpaddluv4hi ((int16x4_t) __a);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vpaddl_u32 (uint32x2_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vpaddlv2si ((int32x2_t) __a, 0);
+  return (uint64x1_t)__builtin_neon_vpaddluv2si ((int32x2_t) __a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vpaddlq_s8 (int8x16_t __a)
 {
-  return (int16x8_t)__builtin_neon_vpaddlv16qi (__a, 1);
+  return (int16x8_t)__builtin_neon_vpaddlsv16qi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vpaddlq_s16 (int16x8_t __a)
 {
-  return (int32x4_t)__builtin_neon_vpaddlv8hi (__a, 1);
+  return (int32x4_t)__builtin_neon_vpaddlsv8hi (__a);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vpaddlq_s32 (int32x4_t __a)
 {
-  return (int64x2_t)__builtin_neon_vpaddlv4si (__a, 1);
+  return (int64x2_t)__builtin_neon_vpaddlsv4si (__a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vpaddlq_u8 (uint8x16_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vpaddlv16qi ((int8x16_t) __a, 0);
+  return (uint16x8_t)__builtin_neon_vpaddluv16qi ((int8x16_t) __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vpaddlq_u16 (uint16x8_t __a)
 {
-  return (uint32x4_t)__builtin_neon_vpaddlv8hi ((int16x8_t) __a, 0);
+  return (uint32x4_t)__builtin_neon_vpaddluv8hi ((int16x8_t) __a);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vpaddlq_u32 (uint32x4_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vpaddlv4si ((int32x4_t) __a, 0);
+  return (uint64x2_t)__builtin_neon_vpaddluv4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vpadal_s8 (int16x4_t __a, int8x8_t __b)
 {
-  return (int16x4_t)__builtin_neon_vpadalv8qi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vpadalsv8qi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vpadal_s16 (int32x2_t __a, int16x4_t __b)
 {
-  return (int32x2_t)__builtin_neon_vpadalv4hi (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vpadalsv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vpadal_s32 (int64x1_t __a, int32x2_t __b)
 {
-  return (int64x1_t)__builtin_neon_vpadalv2si (__a, __b, 1);
+  return (int64x1_t)__builtin_neon_vpadalsv2si (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vpadal_u8 (uint16x4_t __a, uint8x8_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vpadalv8qi ((int16x4_t) __a, (int8x8_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vpadaluv8qi ((int16x4_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vpadal_u16 (uint32x2_t __a, uint16x4_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vpadalv4hi ((int32x2_t) __a, (int16x4_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vpadaluv4hi ((int32x2_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vpadal_u32 (uint64x1_t __a, uint32x2_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vpadalv2si ((int64x1_t) __a, (int32x2_t) __b, 0);
+  return (uint64x1_t)__builtin_neon_vpadaluv2si ((int64x1_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vpadalq_s8 (int16x8_t __a, int8x16_t __b)
 {
-  return (int16x8_t)__builtin_neon_vpadalv16qi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vpadalsv16qi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vpadalq_s16 (int32x4_t __a, int16x8_t __b)
 {
-  return (int32x4_t)__builtin_neon_vpadalv8hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vpadalsv8hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vpadalq_s32 (int64x2_t __a, int32x4_t __b)
 {
-  return (int64x2_t)__builtin_neon_vpadalv4si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vpadalsv4si (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vpadalq_u8 (uint16x8_t __a, uint8x16_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vpadalv16qi ((int16x8_t) __a, (int8x16_t) __b, 0);
+  return (uint16x8_t)__builtin_neon_vpadaluv16qi ((int16x8_t) __a, (int8x16_t) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vpadalq_u16 (uint32x4_t __a, uint16x8_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vpadalv8hi ((int32x4_t) __a, (int16x8_t) __b, 0);
+  return (uint32x4_t)__builtin_neon_vpadaluv8hi ((int32x4_t) __a, (int16x8_t) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vpadalq_u32 (uint64x2_t __a, uint32x4_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vpadalv4si ((int64x2_t) __a, (int32x4_t) __b, 0);
+  return (uint64x2_t)__builtin_neon_vpadaluv4si ((int64x2_t) __a, (int32x4_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vpmax_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vpmaxv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vpmaxsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vpmax_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vpmaxv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vpmaxsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vpmax_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vpmaxv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vpmaxsv2si (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vpmax_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vpmaxv2sf (__a, __b, 3);
+  return (float32x2_t)__builtin_neon_vpmaxfv2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vpmax_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vpmaxv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vpmaxuv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vpmax_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vpmaxv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vpmaxuv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vpmax_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vpmaxv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vpmaxuv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vpmin_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vpminv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vpminsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vpmin_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vpminv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vpminsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vpmin_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vpminv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vpminsv2si (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vpmin_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vpminv2sf (__a, __b, 3);
+  return (float32x2_t)__builtin_neon_vpminfv2sf (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vpmin_u8 (uint8x8_t __a, uint8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vpminv8qi ((int8x8_t) __a, (int8x8_t) __b, 0);
+  return (uint8x8_t)__builtin_neon_vpminuv8qi ((int8x8_t) __a, (int8x8_t) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vpmin_u16 (uint16x4_t __a, uint16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vpminv4hi ((int16x4_t) __a, (int16x4_t) __b, 0);
+  return (uint16x4_t)__builtin_neon_vpminuv4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vpmin_u32 (uint32x2_t __a, uint32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vpminv2si ((int32x2_t) __a, (int32x2_t) __b, 0);
+  return (uint32x2_t)__builtin_neon_vpminuv2si ((int32x2_t) __a, (int32x2_t) __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vrecps_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vrecpsv2sf (__a, __b, 3);
+  return (float32x2_t)__builtin_neon_vrecpsv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrecpsq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (float32x4_t)__builtin_neon_vrecpsv4sf (__a, __b, 3);
+  return (float32x4_t)__builtin_neon_vrecpsv4sf (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vrsqrts_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return (float32x2_t)__builtin_neon_vrsqrtsv2sf (__a, __b, 3);
+  return (float32x2_t)__builtin_neon_vrsqrtsv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrsqrtsq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return (float32x4_t)__builtin_neon_vrsqrtsv4sf (__a, __b, 3);
+  return (float32x4_t)__builtin_neon_vrsqrtsv4sf (__a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vshl_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vshlv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vshlsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vshl_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vshlv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vshlsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vshl_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vshlv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vshlsv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vshl_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vshldi (__a, __b, 1);
+  return (int64x1_t)__builtin_neon_vshlsdi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vshl_u8 (uint8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vshlv8qi ((int8x8_t) __a, __b, 0);
+  return (uint8x8_t)__builtin_neon_vshluv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vshl_u16 (uint16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vshlv4hi ((int16x4_t) __a, __b, 0);
+  return (uint16x4_t)__builtin_neon_vshluv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vshl_u32 (uint32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vshlv2si ((int32x2_t) __a, __b, 0);
+  return (uint32x2_t)__builtin_neon_vshluv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vshl_u64 (uint64x1_t __a, int64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vshldi ((int64x1_t) __a, __b, 0);
+  return (uint64x1_t)__builtin_neon_vshludi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vshlq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vshlv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vshlsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vshlq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vshlv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vshlsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vshlq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vshlv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vshlsv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vshlq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vshlv2di (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vshlsv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vshlq_u8 (uint8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vshlv16qi ((int8x16_t) __a, __b, 0);
+  return (uint8x16_t)__builtin_neon_vshluv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vshlq_u16 (uint16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vshlv8hi ((int16x8_t) __a, __b, 0);
+  return (uint16x8_t)__builtin_neon_vshluv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vshlq_u32 (uint32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vshlv4si ((int32x4_t) __a, __b, 0);
+  return (uint32x4_t)__builtin_neon_vshluv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vshlq_u64 (uint64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vshlv2di ((int64x2_t) __a, __b, 0);
+  return (uint64x2_t)__builtin_neon_vshluv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vrshl_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vshlv8qi (__a, __b, 5);
+  return (int8x8_t)__builtin_neon_vrshlsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vrshl_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vshlv4hi (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vrshlsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vrshl_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vshlv2si (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vrshlsv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vrshl_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vshldi (__a, __b, 5);
+  return (int64x1_t)__builtin_neon_vrshlsdi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vrshl_u8 (uint8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vshlv8qi ((int8x8_t) __a, __b, 4);
+  return (uint8x8_t)__builtin_neon_vrshluv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vrshl_u16 (uint16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vshlv4hi ((int16x4_t) __a, __b, 4);
+  return (uint16x4_t)__builtin_neon_vrshluv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrshl_u32 (uint32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vshlv2si ((int32x2_t) __a, __b, 4);
+  return (uint32x2_t)__builtin_neon_vrshluv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vrshl_u64 (uint64x1_t __a, int64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vshldi ((int64x1_t) __a, __b, 4);
+  return (uint64x1_t)__builtin_neon_vrshludi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vrshlq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vshlv16qi (__a, __b, 5);
+  return (int8x16_t)__builtin_neon_vrshlsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vrshlq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vshlv8hi (__a, __b, 5);
+  return (int16x8_t)__builtin_neon_vrshlsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vrshlq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vshlv4si (__a, __b, 5);
+  return (int32x4_t)__builtin_neon_vrshlsv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vrshlq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vshlv2di (__a, __b, 5);
+  return (int64x2_t)__builtin_neon_vrshlsv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vrshlq_u8 (uint8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vshlv16qi ((int8x16_t) __a, __b, 4);
+  return (uint8x16_t)__builtin_neon_vrshluv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vrshlq_u16 (uint16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vshlv8hi ((int16x8_t) __a, __b, 4);
+  return (uint16x8_t)__builtin_neon_vrshluv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vrshlq_u32 (uint32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vshlv4si ((int32x4_t) __a, __b, 4);
+  return (uint32x4_t)__builtin_neon_vrshluv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vrshlq_u64 (uint64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vshlv2di ((int64x2_t) __a, __b, 4);
+  return (uint64x2_t)__builtin_neon_vrshluv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqshl_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vqshlv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vqshlsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqshl_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vqshlv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vqshlsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqshl_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vqshlv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vqshlsv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vqshl_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vqshldi (__a, __b, 1);
+  return (int64x1_t)__builtin_neon_vqshlsdi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqshl_u8 (uint8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vqshlv8qi ((int8x8_t) __a, __b, 0);
+  return (uint8x8_t)__builtin_neon_vqshluv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqshl_u16 (uint16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vqshlv4hi ((int16x4_t) __a, __b, 0);
+  return (uint16x4_t)__builtin_neon_vqshluv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqshl_u32 (uint32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vqshlv2si ((int32x2_t) __a, __b, 0);
+  return (uint32x2_t)__builtin_neon_vqshluv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vqshl_u64 (uint64x1_t __a, int64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vqshldi ((int64x1_t) __a, __b, 0);
+  return (uint64x1_t)__builtin_neon_vqshludi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vqshlq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vqshlv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vqshlsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqshlq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vqshlv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vqshlsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqshlq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqshlv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vqshlsv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqshlq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vqshlv2di (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vqshlsv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vqshlq_u8 (uint8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vqshlv16qi ((int8x16_t) __a, __b, 0);
+  return (uint8x16_t)__builtin_neon_vqshluv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vqshlq_u16 (uint16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vqshlv8hi ((int16x8_t) __a, __b, 0);
+  return (uint16x8_t)__builtin_neon_vqshluv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vqshlq_u32 (uint32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vqshlv4si ((int32x4_t) __a, __b, 0);
+  return (uint32x4_t)__builtin_neon_vqshluv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vqshlq_u64 (uint64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vqshlv2di ((int64x2_t) __a, __b, 0);
+  return (uint64x2_t)__builtin_neon_vqshluv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqrshl_s8 (int8x8_t __a, int8x8_t __b)
 {
-  return (int8x8_t)__builtin_neon_vqshlv8qi (__a, __b, 5);
+  return (int8x8_t)__builtin_neon_vqrshlsv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqrshl_s16 (int16x4_t __a, int16x4_t __b)
 {
-  return (int16x4_t)__builtin_neon_vqshlv4hi (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vqrshlsv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqrshl_s32 (int32x2_t __a, int32x2_t __b)
 {
-  return (int32x2_t)__builtin_neon_vqshlv2si (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vqrshlsv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vqrshl_s64 (int64x1_t __a, int64x1_t __b)
 {
-  return (int64x1_t)__builtin_neon_vqshldi (__a, __b, 5);
+  return (int64x1_t)__builtin_neon_vqrshlsdi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqrshl_u8 (uint8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t)__builtin_neon_vqshlv8qi ((int8x8_t) __a, __b, 4);
+  return (uint8x8_t)__builtin_neon_vqrshluv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqrshl_u16 (uint16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vqshlv4hi ((int16x4_t) __a, __b, 4);
+  return (uint16x4_t)__builtin_neon_vqrshluv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqrshl_u32 (uint32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vqshlv2si ((int32x2_t) __a, __b, 4);
+  return (uint32x2_t)__builtin_neon_vqrshluv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vqrshl_u64 (uint64x1_t __a, int64x1_t __b)
 {
-  return (uint64x1_t)__builtin_neon_vqshldi ((int64x1_t) __a, __b, 4);
+  return (uint64x1_t)__builtin_neon_vqrshludi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vqrshlq_s8 (int8x16_t __a, int8x16_t __b)
 {
-  return (int8x16_t)__builtin_neon_vqshlv16qi (__a, __b, 5);
+  return (int8x16_t)__builtin_neon_vqrshlsv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqrshlq_s16 (int16x8_t __a, int16x8_t __b)
 {
-  return (int16x8_t)__builtin_neon_vqshlv8hi (__a, __b, 5);
+  return (int16x8_t)__builtin_neon_vqrshlsv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqrshlq_s32 (int32x4_t __a, int32x4_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqshlv4si (__a, __b, 5);
+  return (int32x4_t)__builtin_neon_vqrshlsv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqrshlq_s64 (int64x2_t __a, int64x2_t __b)
 {
-  return (int64x2_t)__builtin_neon_vqshlv2di (__a, __b, 5);
+  return (int64x2_t)__builtin_neon_vqrshlsv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vqrshlq_u8 (uint8x16_t __a, int8x16_t __b)
 {
-  return (uint8x16_t)__builtin_neon_vqshlv16qi ((int8x16_t) __a, __b, 4);
+  return (uint8x16_t)__builtin_neon_vqrshluv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vqrshlq_u16 (uint16x8_t __a, int16x8_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vqshlv8hi ((int16x8_t) __a, __b, 4);
+  return (uint16x8_t)__builtin_neon_vqrshluv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vqrshlq_u32 (uint32x4_t __a, int32x4_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vqshlv4si ((int32x4_t) __a, __b, 4);
+  return (uint32x4_t)__builtin_neon_vqrshluv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vqrshlq_u64 (uint64x2_t __a, int64x2_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vqshlv2di ((int64x2_t) __a, __b, 4);
+  return (uint64x2_t)__builtin_neon_vqrshluv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vshr_n_s8 (int8x8_t __a, const int __b)
 {
-  return (int8x8_t)__builtin_neon_vshr_nv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vshrs_nv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vshr_n_s16 (int16x4_t __a, const int __b)
 {
-  return (int16x4_t)__builtin_neon_vshr_nv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vshrs_nv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vshr_n_s32 (int32x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vshr_nv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vshrs_nv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vshr_n_s64 (int64x1_t __a, const int __b)
 {
-  return (int64x1_t)__builtin_neon_vshr_ndi (__a, __b, 1);
+  return (int64x1_t)__builtin_neon_vshrs_ndi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vshr_n_u8 (uint8x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vshr_nv8qi ((int8x8_t) __a, __b, 0);
+  return (uint8x8_t)__builtin_neon_vshru_nv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vshr_n_u16 (uint16x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vshr_nv4hi ((int16x4_t) __a, __b, 0);
+  return (uint16x4_t)__builtin_neon_vshru_nv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vshr_n_u32 (uint32x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vshr_nv2si ((int32x2_t) __a, __b, 0);
+  return (uint32x2_t)__builtin_neon_vshru_nv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vshr_n_u64 (uint64x1_t __a, const int __b)
 {
-  return (uint64x1_t)__builtin_neon_vshr_ndi ((int64x1_t) __a, __b, 0);
+  return (uint64x1_t)__builtin_neon_vshru_ndi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vshrq_n_s8 (int8x16_t __a, const int __b)
 {
-  return (int8x16_t)__builtin_neon_vshr_nv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vshrs_nv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vshrq_n_s16 (int16x8_t __a, const int __b)
 {
-  return (int16x8_t)__builtin_neon_vshr_nv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vshrs_nv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vshrq_n_s32 (int32x4_t __a, const int __b)
 {
-  return (int32x4_t)__builtin_neon_vshr_nv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vshrs_nv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vshrq_n_s64 (int64x2_t __a, const int __b)
 {
-  return (int64x2_t)__builtin_neon_vshr_nv2di (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vshrs_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vshrq_n_u8 (uint8x16_t __a, const int __b)
 {
-  return (uint8x16_t)__builtin_neon_vshr_nv16qi ((int8x16_t) __a, __b, 0);
+  return (uint8x16_t)__builtin_neon_vshru_nv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vshrq_n_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint16x8_t)__builtin_neon_vshr_nv8hi ((int16x8_t) __a, __b, 0);
+  return (uint16x8_t)__builtin_neon_vshru_nv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vshrq_n_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint32x4_t)__builtin_neon_vshr_nv4si ((int32x4_t) __a, __b, 0);
+  return (uint32x4_t)__builtin_neon_vshru_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vshrq_n_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint64x2_t)__builtin_neon_vshr_nv2di ((int64x2_t) __a, __b, 0);
+  return (uint64x2_t)__builtin_neon_vshru_nv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vrshr_n_s8 (int8x8_t __a, const int __b)
 {
-  return (int8x8_t)__builtin_neon_vshr_nv8qi (__a, __b, 5);
+  return (int8x8_t)__builtin_neon_vrshrs_nv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vrshr_n_s16 (int16x4_t __a, const int __b)
 {
-  return (int16x4_t)__builtin_neon_vshr_nv4hi (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vrshrs_nv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vrshr_n_s32 (int32x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vshr_nv2si (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vrshrs_nv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vrshr_n_s64 (int64x1_t __a, const int __b)
 {
-  return (int64x1_t)__builtin_neon_vshr_ndi (__a, __b, 5);
+  return (int64x1_t)__builtin_neon_vrshrs_ndi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vrshr_n_u8 (uint8x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vshr_nv8qi ((int8x8_t) __a, __b, 4);
+  return (uint8x8_t)__builtin_neon_vrshru_nv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vrshr_n_u16 (uint16x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vshr_nv4hi ((int16x4_t) __a, __b, 4);
+  return (uint16x4_t)__builtin_neon_vrshru_nv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrshr_n_u32 (uint32x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vshr_nv2si ((int32x2_t) __a, __b, 4);
+  return (uint32x2_t)__builtin_neon_vrshru_nv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vrshr_n_u64 (uint64x1_t __a, const int __b)
 {
-  return (uint64x1_t)__builtin_neon_vshr_ndi ((int64x1_t) __a, __b, 4);
+  return (uint64x1_t)__builtin_neon_vrshru_ndi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vrshrq_n_s8 (int8x16_t __a, const int __b)
 {
-  return (int8x16_t)__builtin_neon_vshr_nv16qi (__a, __b, 5);
+  return (int8x16_t)__builtin_neon_vrshrs_nv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vrshrq_n_s16 (int16x8_t __a, const int __b)
 {
-  return (int16x8_t)__builtin_neon_vshr_nv8hi (__a, __b, 5);
+  return (int16x8_t)__builtin_neon_vrshrs_nv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vrshrq_n_s32 (int32x4_t __a, const int __b)
 {
-  return (int32x4_t)__builtin_neon_vshr_nv4si (__a, __b, 5);
+  return (int32x4_t)__builtin_neon_vrshrs_nv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vrshrq_n_s64 (int64x2_t __a, const int __b)
 {
-  return (int64x2_t)__builtin_neon_vshr_nv2di (__a, __b, 5);
+  return (int64x2_t)__builtin_neon_vrshrs_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vrshrq_n_u8 (uint8x16_t __a, const int __b)
 {
-  return (uint8x16_t)__builtin_neon_vshr_nv16qi ((int8x16_t) __a, __b, 4);
+  return (uint8x16_t)__builtin_neon_vrshru_nv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vrshrq_n_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint16x8_t)__builtin_neon_vshr_nv8hi ((int16x8_t) __a, __b, 4);
+  return (uint16x8_t)__builtin_neon_vrshru_nv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vrshrq_n_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint32x4_t)__builtin_neon_vshr_nv4si ((int32x4_t) __a, __b, 4);
+  return (uint32x4_t)__builtin_neon_vrshru_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vrshrq_n_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint64x2_t)__builtin_neon_vshr_nv2di ((int64x2_t) __a, __b, 4);
+  return (uint64x2_t)__builtin_neon_vrshru_nv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vshrn_n_s16 (int16x8_t __a, const int __b)
 {
-  return (int8x8_t)__builtin_neon_vshrn_nv8hi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vshrn_nv8hi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vshrn_n_s32 (int32x4_t __a, const int __b)
 {
-  return (int16x4_t)__builtin_neon_vshrn_nv4si (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vshrn_nv4si (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vshrn_n_s64 (int64x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vshrn_nv2di (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vshrn_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vshrn_n_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vshrn_nv8hi ((int16x8_t) __a, __b, 0);
+  return (uint8x8_t)__builtin_neon_vshrn_nv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vshrn_n_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vshrn_nv4si ((int32x4_t) __a, __b, 0);
+  return (uint16x4_t)__builtin_neon_vshrn_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vshrn_n_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vshrn_nv2di ((int64x2_t) __a, __b, 0);
+  return (uint32x2_t)__builtin_neon_vshrn_nv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vrshrn_n_s16 (int16x8_t __a, const int __b)
 {
-  return (int8x8_t)__builtin_neon_vshrn_nv8hi (__a, __b, 5);
+  return (int8x8_t)__builtin_neon_vrshrn_nv8hi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vrshrn_n_s32 (int32x4_t __a, const int __b)
 {
-  return (int16x4_t)__builtin_neon_vshrn_nv4si (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vrshrn_nv4si (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vrshrn_n_s64 (int64x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vshrn_nv2di (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vrshrn_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vrshrn_n_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vshrn_nv8hi ((int16x8_t) __a, __b, 4);
+  return (uint8x8_t)__builtin_neon_vrshrn_nv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vrshrn_n_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vshrn_nv4si ((int32x4_t) __a, __b, 4);
+  return (uint16x4_t)__builtin_neon_vrshrn_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrshrn_n_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vshrn_nv2di ((int64x2_t) __a, __b, 4);
+  return (uint32x2_t)__builtin_neon_vrshrn_nv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqshrn_n_s16 (int16x8_t __a, const int __b)
 {
-  return (int8x8_t)__builtin_neon_vqshrn_nv8hi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vqshrns_nv8hi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqshrn_n_s32 (int32x4_t __a, const int __b)
 {
-  return (int16x4_t)__builtin_neon_vqshrn_nv4si (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vqshrns_nv4si (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqshrn_n_s64 (int64x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vqshrn_nv2di (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vqshrns_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqshrn_n_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vqshrn_nv8hi ((int16x8_t) __a, __b, 0);
+  return (uint8x8_t)__builtin_neon_vqshrnu_nv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqshrn_n_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vqshrn_nv4si ((int32x4_t) __a, __b, 0);
+  return (uint16x4_t)__builtin_neon_vqshrnu_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqshrn_n_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vqshrn_nv2di ((int64x2_t) __a, __b, 0);
+  return (uint32x2_t)__builtin_neon_vqshrnu_nv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqrshrn_n_s16 (int16x8_t __a, const int __b)
 {
-  return (int8x8_t)__builtin_neon_vqshrn_nv8hi (__a, __b, 5);
+  return (int8x8_t)__builtin_neon_vqrshrns_nv8hi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqrshrn_n_s32 (int32x4_t __a, const int __b)
 {
-  return (int16x4_t)__builtin_neon_vqshrn_nv4si (__a, __b, 5);
+  return (int16x4_t)__builtin_neon_vqrshrns_nv4si (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqrshrn_n_s64 (int64x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vqshrn_nv2di (__a, __b, 5);
+  return (int32x2_t)__builtin_neon_vqrshrns_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqrshrn_n_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vqshrn_nv8hi ((int16x8_t) __a, __b, 4);
+  return (uint8x8_t)__builtin_neon_vqrshrnu_nv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqrshrn_n_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vqshrn_nv4si ((int32x4_t) __a, __b, 4);
+  return (uint16x4_t)__builtin_neon_vqrshrnu_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqrshrn_n_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vqshrn_nv2di ((int64x2_t) __a, __b, 4);
+  return (uint32x2_t)__builtin_neon_vqrshrnu_nv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqshrun_n_s16 (int16x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vqshrun_nv8hi (__a, __b, 1);
+  return (uint8x8_t)__builtin_neon_vqshrun_nv8hi (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqshrun_n_s32 (int32x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vqshrun_nv4si (__a, __b, 1);
+  return (uint16x4_t)__builtin_neon_vqshrun_nv4si (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqshrun_n_s64 (int64x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vqshrun_nv2di (__a, __b, 1);
+  return (uint32x2_t)__builtin_neon_vqshrun_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqrshrun_n_s16 (int16x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vqshrun_nv8hi (__a, __b, 5);
+  return (uint8x8_t)__builtin_neon_vqrshrun_nv8hi (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqrshrun_n_s32 (int32x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vqshrun_nv4si (__a, __b, 5);
+  return (uint16x4_t)__builtin_neon_vqrshrun_nv4si (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqrshrun_n_s64 (int64x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vqshrun_nv2di (__a, __b, 5);
+  return (uint32x2_t)__builtin_neon_vqrshrun_nv2di (__a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vshl_n_s8 (int8x8_t __a, const int __b)
 {
-  return (int8x8_t)__builtin_neon_vshl_nv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vshl_nv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vshl_n_s16 (int16x4_t __a, const int __b)
 {
-  return (int16x4_t)__builtin_neon_vshl_nv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vshl_nv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vshl_n_s32 (int32x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vshl_nv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vshl_nv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vshl_n_s64 (int64x1_t __a, const int __b)
 {
-  return (int64x1_t)__builtin_neon_vshl_ndi (__a, __b, 1);
+  return (int64x1_t)__builtin_neon_vshl_ndi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vshl_n_u8 (uint8x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vshl_nv8qi ((int8x8_t) __a, __b, 0);
+  return (uint8x8_t)__builtin_neon_vshl_nv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vshl_n_u16 (uint16x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vshl_nv4hi ((int16x4_t) __a, __b, 0);
+  return (uint16x4_t)__builtin_neon_vshl_nv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vshl_n_u32 (uint32x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vshl_nv2si ((int32x2_t) __a, __b, 0);
+  return (uint32x2_t)__builtin_neon_vshl_nv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vshl_n_u64 (uint64x1_t __a, const int __b)
 {
-  return (uint64x1_t)__builtin_neon_vshl_ndi ((int64x1_t) __a, __b, 0);
+  return (uint64x1_t)__builtin_neon_vshl_ndi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vshlq_n_s8 (int8x16_t __a, const int __b)
 {
-  return (int8x16_t)__builtin_neon_vshl_nv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vshl_nv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vshlq_n_s16 (int16x8_t __a, const int __b)
 {
-  return (int16x8_t)__builtin_neon_vshl_nv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vshl_nv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vshlq_n_s32 (int32x4_t __a, const int __b)
 {
-  return (int32x4_t)__builtin_neon_vshl_nv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vshl_nv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vshlq_n_s64 (int64x2_t __a, const int __b)
 {
-  return (int64x2_t)__builtin_neon_vshl_nv2di (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vshl_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vshlq_n_u8 (uint8x16_t __a, const int __b)
 {
-  return (uint8x16_t)__builtin_neon_vshl_nv16qi ((int8x16_t) __a, __b, 0);
+  return (uint8x16_t)__builtin_neon_vshl_nv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vshlq_n_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint16x8_t)__builtin_neon_vshl_nv8hi ((int16x8_t) __a, __b, 0);
+  return (uint16x8_t)__builtin_neon_vshl_nv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vshlq_n_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint32x4_t)__builtin_neon_vshl_nv4si ((int32x4_t) __a, __b, 0);
+  return (uint32x4_t)__builtin_neon_vshl_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vshlq_n_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint64x2_t)__builtin_neon_vshl_nv2di ((int64x2_t) __a, __b, 0);
+  return (uint64x2_t)__builtin_neon_vshl_nv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqshl_n_s8 (int8x8_t __a, const int __b)
 {
-  return (int8x8_t)__builtin_neon_vqshl_nv8qi (__a, __b, 1);
+  return (int8x8_t)__builtin_neon_vqshl_s_nv8qi (__a, __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqshl_n_s16 (int16x4_t __a, const int __b)
 {
-  return (int16x4_t)__builtin_neon_vqshl_nv4hi (__a, __b, 1);
+  return (int16x4_t)__builtin_neon_vqshl_s_nv4hi (__a, __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqshl_n_s32 (int32x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vqshl_nv2si (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vqshl_s_nv2si (__a, __b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vqshl_n_s64 (int64x1_t __a, const int __b)
 {
-  return (int64x1_t)__builtin_neon_vqshl_ndi (__a, __b, 1);
+  return (int64x1_t)__builtin_neon_vqshl_s_ndi (__a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqshl_n_u8 (uint8x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vqshl_nv8qi ((int8x8_t) __a, __b, 0);
+  return (uint8x8_t)__builtin_neon_vqshl_u_nv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqshl_n_u16 (uint16x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vqshl_nv4hi ((int16x4_t) __a, __b, 0);
+  return (uint16x4_t)__builtin_neon_vqshl_u_nv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqshl_n_u32 (uint32x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vqshl_nv2si ((int32x2_t) __a, __b, 0);
+  return (uint32x2_t)__builtin_neon_vqshl_u_nv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vqshl_n_u64 (uint64x1_t __a, const int __b)
 {
-  return (uint64x1_t)__builtin_neon_vqshl_ndi ((int64x1_t) __a, __b, 0);
+  return (uint64x1_t)__builtin_neon_vqshl_u_ndi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vqshlq_n_s8 (int8x16_t __a, const int __b)
 {
-  return (int8x16_t)__builtin_neon_vqshl_nv16qi (__a, __b, 1);
+  return (int8x16_t)__builtin_neon_vqshl_s_nv16qi (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqshlq_n_s16 (int16x8_t __a, const int __b)
 {
-  return (int16x8_t)__builtin_neon_vqshl_nv8hi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vqshl_s_nv8hi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqshlq_n_s32 (int32x4_t __a, const int __b)
 {
-  return (int32x4_t)__builtin_neon_vqshl_nv4si (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vqshl_s_nv4si (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqshlq_n_s64 (int64x2_t __a, const int __b)
 {
-  return (int64x2_t)__builtin_neon_vqshl_nv2di (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vqshl_s_nv2di (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vqshlq_n_u8 (uint8x16_t __a, const int __b)
 {
-  return (uint8x16_t)__builtin_neon_vqshl_nv16qi ((int8x16_t) __a, __b, 0);
+  return (uint8x16_t)__builtin_neon_vqshl_u_nv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vqshlq_n_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint16x8_t)__builtin_neon_vqshl_nv8hi ((int16x8_t) __a, __b, 0);
+  return (uint16x8_t)__builtin_neon_vqshl_u_nv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vqshlq_n_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint32x4_t)__builtin_neon_vqshl_nv4si ((int32x4_t) __a, __b, 0);
+  return (uint32x4_t)__builtin_neon_vqshl_u_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vqshlq_n_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint64x2_t)__builtin_neon_vqshl_nv2di ((int64x2_t) __a, __b, 0);
+  return (uint64x2_t)__builtin_neon_vqshl_u_nv2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqshlu_n_s8 (int8x8_t __a, const int __b)
 {
-  return (uint8x8_t)__builtin_neon_vqshlu_nv8qi (__a, __b, 1);
+  return (uint8x8_t)__builtin_neon_vqshlu_nv8qi (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqshlu_n_s16 (int16x4_t __a, const int __b)
 {
-  return (uint16x4_t)__builtin_neon_vqshlu_nv4hi (__a, __b, 1);
+  return (uint16x4_t)__builtin_neon_vqshlu_nv4hi (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqshlu_n_s32 (int32x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vqshlu_nv2si (__a, __b, 1);
+  return (uint32x2_t)__builtin_neon_vqshlu_nv2si (__a, __b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vqshlu_n_s64 (int64x1_t __a, const int __b)
 {
-  return (uint64x1_t)__builtin_neon_vqshlu_ndi (__a, __b, 1);
+  return (uint64x1_t)__builtin_neon_vqshlu_ndi (__a, __b);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vqshluq_n_s8 (int8x16_t __a, const int __b)
 {
-  return (uint8x16_t)__builtin_neon_vqshlu_nv16qi (__a, __b, 1);
+  return (uint8x16_t)__builtin_neon_vqshlu_nv16qi (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vqshluq_n_s16 (int16x8_t __a, const int __b)
 {
-  return (uint16x8_t)__builtin_neon_vqshlu_nv8hi (__a, __b, 1);
+  return (uint16x8_t)__builtin_neon_vqshlu_nv8hi (__a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vqshluq_n_s32 (int32x4_t __a, const int __b)
 {
-  return (uint32x4_t)__builtin_neon_vqshlu_nv4si (__a, __b, 1);
+  return (uint32x4_t)__builtin_neon_vqshlu_nv4si (__a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vqshluq_n_s64 (int64x2_t __a, const int __b)
 {
-  return (uint64x2_t)__builtin_neon_vqshlu_nv2di (__a, __b, 1);
+  return (uint64x2_t)__builtin_neon_vqshlu_nv2di (__a, __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vshll_n_s8 (int8x8_t __a, const int __b)
 {
-  return (int16x8_t)__builtin_neon_vshll_nv8qi (__a, __b, 1);
+  return (int16x8_t)__builtin_neon_vshlls_nv8qi (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vshll_n_s16 (int16x4_t __a, const int __b)
 {
-  return (int32x4_t)__builtin_neon_vshll_nv4hi (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vshlls_nv4hi (__a, __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vshll_n_s32 (int32x2_t __a, const int __b)
 {
-  return (int64x2_t)__builtin_neon_vshll_nv2si (__a, __b, 1);
+  return (int64x2_t)__builtin_neon_vshlls_nv2si (__a, __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vshll_n_u8 (uint8x8_t __a, const int __b)
 {
-  return (uint16x8_t)__builtin_neon_vshll_nv8qi ((int8x8_t) __a, __b, 0);
+  return (uint16x8_t)__builtin_neon_vshllu_nv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vshll_n_u16 (uint16x4_t __a, const int __b)
 {
-  return (uint32x4_t)__builtin_neon_vshll_nv4hi ((int16x4_t) __a, __b, 0);
+  return (uint32x4_t)__builtin_neon_vshllu_nv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vshll_n_u32 (uint32x2_t __a, const int __b)
 {
-  return (uint64x2_t)__builtin_neon_vshll_nv2si ((int32x2_t) __a, __b, 0);
+  return (uint64x2_t)__builtin_neon_vshllu_nv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vsra_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
 {
-  return (int8x8_t)__builtin_neon_vsra_nv8qi (__a, __b, __c, 1);
+  return (int8x8_t)__builtin_neon_vsras_nv8qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vsra_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
 {
-  return (int16x4_t)__builtin_neon_vsra_nv4hi (__a, __b, __c, 1);
+  return (int16x4_t)__builtin_neon_vsras_nv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vsra_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
 {
-  return (int32x2_t)__builtin_neon_vsra_nv2si (__a, __b, __c, 1);
+  return (int32x2_t)__builtin_neon_vsras_nv2si (__a, __b, __c);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vsra_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
 {
-  return (int64x1_t)__builtin_neon_vsra_ndi (__a, __b, __c, 1);
+  return (int64x1_t)__builtin_neon_vsras_ndi (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vsra_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
 {
-  return (uint8x8_t)__builtin_neon_vsra_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c, 0);
+  return (uint8x8_t)__builtin_neon_vsrau_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vsra_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
 {
-  return (uint16x4_t)__builtin_neon_vsra_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c, 0);
+  return (uint16x4_t)__builtin_neon_vsrau_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vsra_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
 {
-  return (uint32x2_t)__builtin_neon_vsra_nv2si ((int32x2_t) __a, (int32x2_t) __b, __c, 0);
+  return (uint32x2_t)__builtin_neon_vsrau_nv2si ((int32x2_t) __a, (int32x2_t) __b, __c);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vsra_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
 {
-  return (uint64x1_t)__builtin_neon_vsra_ndi ((int64x1_t) __a, (int64x1_t) __b, __c, 0);
+  return (uint64x1_t)__builtin_neon_vsrau_ndi ((int64x1_t) __a, (int64x1_t) __b, __c);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vsraq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
 {
-  return (int8x16_t)__builtin_neon_vsra_nv16qi (__a, __b, __c, 1);
+  return (int8x16_t)__builtin_neon_vsras_nv16qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vsraq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
 {
-  return (int16x8_t)__builtin_neon_vsra_nv8hi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vsras_nv8hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vsraq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
 {
-  return (int32x4_t)__builtin_neon_vsra_nv4si (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vsras_nv4si (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vsraq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
 {
-  return (int64x2_t)__builtin_neon_vsra_nv2di (__a, __b, __c, 1);
+  return (int64x2_t)__builtin_neon_vsras_nv2di (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vsraq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
 {
-  return (uint8x16_t)__builtin_neon_vsra_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c, 0);
+  return (uint8x16_t)__builtin_neon_vsrau_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vsraq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
 {
-  return (uint16x8_t)__builtin_neon_vsra_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c, 0);
+  return (uint16x8_t)__builtin_neon_vsrau_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsraq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
 {
-  return (uint32x4_t)__builtin_neon_vsra_nv4si ((int32x4_t) __a, (int32x4_t) __b, __c, 0);
+  return (uint32x4_t)__builtin_neon_vsrau_nv4si ((int32x4_t) __a, (int32x4_t) __b, __c);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
 {
-  return (uint64x2_t)__builtin_neon_vsra_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c, 0);
+  return (uint64x2_t)__builtin_neon_vsrau_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vrsra_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
 {
-  return (int8x8_t)__builtin_neon_vsra_nv8qi (__a, __b, __c, 5);
+  return (int8x8_t)__builtin_neon_vrsras_nv8qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vrsra_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
 {
-  return (int16x4_t)__builtin_neon_vsra_nv4hi (__a, __b, __c, 5);
+  return (int16x4_t)__builtin_neon_vrsras_nv4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vrsra_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
 {
-  return (int32x2_t)__builtin_neon_vsra_nv2si (__a, __b, __c, 5);
+  return (int32x2_t)__builtin_neon_vrsras_nv2si (__a, __b, __c);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vrsra_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
 {
-  return (int64x1_t)__builtin_neon_vsra_ndi (__a, __b, __c, 5);
+  return (int64x1_t)__builtin_neon_vrsras_ndi (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vrsra_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
 {
-  return (uint8x8_t)__builtin_neon_vsra_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c, 4);
+  return (uint8x8_t)__builtin_neon_vrsrau_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vrsra_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
 {
-  return (uint16x4_t)__builtin_neon_vsra_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c, 4);
+  return (uint16x4_t)__builtin_neon_vrsrau_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrsra_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
 {
-  return (uint32x2_t)__builtin_neon_vsra_nv2si ((int32x2_t) __a, (int32x2_t) __b, __c, 4);
+  return (uint32x2_t)__builtin_neon_vrsrau_nv2si ((int32x2_t) __a, (int32x2_t) __b, __c);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vrsra_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
 {
-  return (uint64x1_t)__builtin_neon_vsra_ndi ((int64x1_t) __a, (int64x1_t) __b, __c, 4);
+  return (uint64x1_t)__builtin_neon_vrsrau_ndi ((int64x1_t) __a, (int64x1_t) __b, __c);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vrsraq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
 {
-  return (int8x16_t)__builtin_neon_vsra_nv16qi (__a, __b, __c, 5);
+  return (int8x16_t)__builtin_neon_vrsras_nv16qi (__a, __b, __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vrsraq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
 {
-  return (int16x8_t)__builtin_neon_vsra_nv8hi (__a, __b, __c, 5);
+  return (int16x8_t)__builtin_neon_vrsras_nv8hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vrsraq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
 {
-  return (int32x4_t)__builtin_neon_vsra_nv4si (__a, __b, __c, 5);
+  return (int32x4_t)__builtin_neon_vrsras_nv4si (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vrsraq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
 {
-  return (int64x2_t)__builtin_neon_vsra_nv2di (__a, __b, __c, 5);
+  return (int64x2_t)__builtin_neon_vrsras_nv2di (__a, __b, __c);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vrsraq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
 {
-  return (uint8x16_t)__builtin_neon_vsra_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c, 4);
+  return (uint8x16_t)__builtin_neon_vrsrau_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vrsraq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
 {
-  return (uint16x8_t)__builtin_neon_vsra_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c, 4);
+  return (uint16x8_t)__builtin_neon_vrsrau_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vrsraq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
 {
-  return (uint32x4_t)__builtin_neon_vsra_nv4si ((int32x4_t) __a, (int32x4_t) __b, __c, 4);
+  return (uint32x4_t)__builtin_neon_vrsrau_nv4si ((int32x4_t) __a, (int32x4_t) __b, __c);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vrsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
 {
-  return (uint64x2_t)__builtin_neon_vsra_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c, 4);
+  return (uint64x2_t)__builtin_neon_vrsrau_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
 }
 
 #ifdef __ARM_FEATURE_CRYPTO
@@ -4718,577 +4718,577 @@ vsliq_n_p16 (poly16x8_t __a, poly16x8_t __b, const int __c)
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vabs_s8 (int8x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vabsv8qi (__a, 1);
+  return (int8x8_t)__builtin_neon_vabsv8qi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vabs_s16 (int16x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vabsv4hi (__a, 1);
+  return (int16x4_t)__builtin_neon_vabsv4hi (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vabs_s32 (int32x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vabsv2si (__a, 1);
+  return (int32x2_t)__builtin_neon_vabsv2si (__a);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vabs_f32 (float32x2_t __a)
 {
-  return (float32x2_t)__builtin_neon_vabsv2sf (__a, 3);
+  return (float32x2_t)__builtin_neon_vabsv2sf (__a);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vabsq_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vabsv16qi (__a, 1);
+  return (int8x16_t)__builtin_neon_vabsv16qi (__a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vabsq_s16 (int16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vabsv8hi (__a, 1);
+  return (int16x8_t)__builtin_neon_vabsv8hi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vabsq_s32 (int32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vabsv4si (__a, 1);
+  return (int32x4_t)__builtin_neon_vabsv4si (__a);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vabsq_f32 (float32x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vabsv4sf (__a, 3);
+  return (float32x4_t)__builtin_neon_vabsv4sf (__a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqabs_s8 (int8x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vqabsv8qi (__a, 1);
+  return (int8x8_t)__builtin_neon_vqabsv8qi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqabs_s16 (int16x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vqabsv4hi (__a, 1);
+  return (int16x4_t)__builtin_neon_vqabsv4hi (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqabs_s32 (int32x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vqabsv2si (__a, 1);
+  return (int32x2_t)__builtin_neon_vqabsv2si (__a);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vqabsq_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vqabsv16qi (__a, 1);
+  return (int8x16_t)__builtin_neon_vqabsv16qi (__a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqabsq_s16 (int16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vqabsv8hi (__a, 1);
+  return (int16x8_t)__builtin_neon_vqabsv8hi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqabsq_s32 (int32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vqabsv4si (__a, 1);
+  return (int32x4_t)__builtin_neon_vqabsv4si (__a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vneg_s8 (int8x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vnegv8qi (__a, 1);
+  return (int8x8_t)__builtin_neon_vnegv8qi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vneg_s16 (int16x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vnegv4hi (__a, 1);
+  return (int16x4_t)__builtin_neon_vnegv4hi (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vneg_s32 (int32x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vnegv2si (__a, 1);
+  return (int32x2_t)__builtin_neon_vnegv2si (__a);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vneg_f32 (float32x2_t __a)
 {
-  return (float32x2_t)__builtin_neon_vnegv2sf (__a, 3);
+  return (float32x2_t)__builtin_neon_vnegv2sf (__a);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vnegq_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vnegv16qi (__a, 1);
+  return (int8x16_t)__builtin_neon_vnegv16qi (__a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vnegq_s16 (int16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vnegv8hi (__a, 1);
+  return (int16x8_t)__builtin_neon_vnegv8hi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vnegq_s32 (int32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vnegv4si (__a, 1);
+  return (int32x4_t)__builtin_neon_vnegv4si (__a);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vnegq_f32 (float32x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vnegv4sf (__a, 3);
+  return (float32x4_t)__builtin_neon_vnegv4sf (__a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqneg_s8 (int8x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vqnegv8qi (__a, 1);
+  return (int8x8_t)__builtin_neon_vqnegv8qi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqneg_s16 (int16x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vqnegv4hi (__a, 1);
+  return (int16x4_t)__builtin_neon_vqnegv4hi (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqneg_s32 (int32x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vqnegv2si (__a, 1);
+  return (int32x2_t)__builtin_neon_vqnegv2si (__a);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vqnegq_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vqnegv16qi (__a, 1);
+  return (int8x16_t)__builtin_neon_vqnegv16qi (__a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqnegq_s16 (int16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vqnegv8hi (__a, 1);
+  return (int16x8_t)__builtin_neon_vqnegv8hi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqnegq_s32 (int32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vqnegv4si (__a, 1);
+  return (int32x4_t)__builtin_neon_vqnegv4si (__a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vmvn_s8 (int8x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vmvnv8qi (__a, 1);
+  return (int8x8_t)__builtin_neon_vmvnv8qi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmvn_s16 (int16x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vmvnv4hi (__a, 1);
+  return (int16x4_t)__builtin_neon_vmvnv4hi (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmvn_s32 (int32x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vmvnv2si (__a, 1);
+  return (int32x2_t)__builtin_neon_vmvnv2si (__a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vmvn_u8 (uint8x8_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vmvnv8qi ((int8x8_t) __a, 0);
+  return (uint8x8_t)__builtin_neon_vmvnv8qi ((int8x8_t) __a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmvn_u16 (uint16x4_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vmvnv4hi ((int16x4_t) __a, 0);
+  return (uint16x4_t)__builtin_neon_vmvnv4hi ((int16x4_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmvn_u32 (uint32x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vmvnv2si ((int32x2_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vmvnv2si ((int32x2_t) __a);
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vmvn_p8 (poly8x8_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vmvnv8qi ((int8x8_t) __a, 2);
+  return (poly8x8_t)__builtin_neon_vmvnv8qi ((int8x8_t) __a);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vmvnq_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vmvnv16qi (__a, 1);
+  return (int8x16_t)__builtin_neon_vmvnv16qi (__a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmvnq_s16 (int16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vmvnv8hi (__a, 1);
+  return (int16x8_t)__builtin_neon_vmvnv8hi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmvnq_s32 (int32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vmvnv4si (__a, 1);
+  return (int32x4_t)__builtin_neon_vmvnv4si (__a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vmvnq_u8 (uint8x16_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vmvnv16qi ((int8x16_t) __a, 0);
+  return (uint8x16_t)__builtin_neon_vmvnv16qi ((int8x16_t) __a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmvnq_u16 (uint16x8_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vmvnv8hi ((int16x8_t) __a, 0);
+  return (uint16x8_t)__builtin_neon_vmvnv8hi ((int16x8_t) __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmvnq_u32 (uint32x4_t __a)
 {
-  return (uint32x4_t)__builtin_neon_vmvnv4si ((int32x4_t) __a, 0);
+  return (uint32x4_t)__builtin_neon_vmvnv4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vmvnq_p8 (poly8x16_t __a)
 {
-  return (poly8x16_t)__builtin_neon_vmvnv16qi ((int8x16_t) __a, 2);
+  return (poly8x16_t)__builtin_neon_vmvnv16qi ((int8x16_t) __a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vcls_s8 (int8x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vclsv8qi (__a, 1);
+  return (int8x8_t)__builtin_neon_vclsv8qi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vcls_s16 (int16x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vclsv4hi (__a, 1);
+  return (int16x4_t)__builtin_neon_vclsv4hi (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vcls_s32 (int32x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vclsv2si (__a, 1);
+  return (int32x2_t)__builtin_neon_vclsv2si (__a);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vclsq_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vclsv16qi (__a, 1);
+  return (int8x16_t)__builtin_neon_vclsv16qi (__a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vclsq_s16 (int16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vclsv8hi (__a, 1);
+  return (int16x8_t)__builtin_neon_vclsv8hi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vclsq_s32 (int32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vclsv4si (__a, 1);
+  return (int32x4_t)__builtin_neon_vclsv4si (__a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vclz_s8 (int8x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vclzv8qi (__a, 1);
+  return (int8x8_t)__builtin_neon_vclzv8qi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vclz_s16 (int16x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vclzv4hi (__a, 1);
+  return (int16x4_t)__builtin_neon_vclzv4hi (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vclz_s32 (int32x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vclzv2si (__a, 1);
+  return (int32x2_t)__builtin_neon_vclzv2si (__a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vclz_u8 (uint8x8_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vclzv8qi ((int8x8_t) __a, 0);
+  return (uint8x8_t)__builtin_neon_vclzv8qi ((int8x8_t) __a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vclz_u16 (uint16x4_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vclzv4hi ((int16x4_t) __a, 0);
+  return (uint16x4_t)__builtin_neon_vclzv4hi ((int16x4_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vclz_u32 (uint32x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vclzv2si ((int32x2_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vclzv2si ((int32x2_t) __a);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vclzq_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vclzv16qi (__a, 1);
+  return (int8x16_t)__builtin_neon_vclzv16qi (__a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vclzq_s16 (int16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vclzv8hi (__a, 1);
+  return (int16x8_t)__builtin_neon_vclzv8hi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vclzq_s32 (int32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vclzv4si (__a, 1);
+  return (int32x4_t)__builtin_neon_vclzv4si (__a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vclzq_u8 (uint8x16_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vclzv16qi ((int8x16_t) __a, 0);
+  return (uint8x16_t)__builtin_neon_vclzv16qi ((int8x16_t) __a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vclzq_u16 (uint16x8_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vclzv8hi ((int16x8_t) __a, 0);
+  return (uint16x8_t)__builtin_neon_vclzv8hi ((int16x8_t) __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vclzq_u32 (uint32x4_t __a)
 {
-  return (uint32x4_t)__builtin_neon_vclzv4si ((int32x4_t) __a, 0);
+  return (uint32x4_t)__builtin_neon_vclzv4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vcnt_s8 (int8x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vcntv8qi (__a, 1);
+  return (int8x8_t)__builtin_neon_vcntv8qi (__a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcnt_u8 (uint8x8_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vcntv8qi ((int8x8_t) __a, 0);
+  return (uint8x8_t)__builtin_neon_vcntv8qi ((int8x8_t) __a);
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vcnt_p8 (poly8x8_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vcntv8qi ((int8x8_t) __a, 2);
+  return (poly8x8_t)__builtin_neon_vcntv8qi ((int8x8_t) __a);
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vcntq_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vcntv16qi (__a, 1);
+  return (int8x16_t)__builtin_neon_vcntv16qi (__a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcntq_u8 (uint8x16_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vcntv16qi ((int8x16_t) __a, 0);
+  return (uint8x16_t)__builtin_neon_vcntv16qi ((int8x16_t) __a);
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vcntq_p8 (poly8x16_t __a)
 {
-  return (poly8x16_t)__builtin_neon_vcntv16qi ((int8x16_t) __a, 2);
+  return (poly8x16_t)__builtin_neon_vcntv16qi ((int8x16_t) __a);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vrecpe_f32 (float32x2_t __a)
 {
-  return (float32x2_t)__builtin_neon_vrecpev2sf (__a, 3);
+  return (float32x2_t)__builtin_neon_vrecpev2sf (__a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrecpe_u32 (uint32x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vrecpev2si ((int32x2_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vrecpev2si ((int32x2_t) __a);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrecpeq_f32 (float32x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vrecpev4sf (__a, 3);
+  return (float32x4_t)__builtin_neon_vrecpev4sf (__a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vrecpeq_u32 (uint32x4_t __a)
 {
-  return (uint32x4_t)__builtin_neon_vrecpev4si ((int32x4_t) __a, 0);
+  return (uint32x4_t)__builtin_neon_vrecpev4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vrsqrte_f32 (float32x2_t __a)
 {
-  return (float32x2_t)__builtin_neon_vrsqrtev2sf (__a, 3);
+  return (float32x2_t)__builtin_neon_vrsqrtev2sf (__a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrsqrte_u32 (uint32x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vrsqrtev2si ((int32x2_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vrsqrtev2si ((int32x2_t) __a);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrsqrteq_f32 (float32x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vrsqrtev4sf (__a, 3);
+  return (float32x4_t)__builtin_neon_vrsqrtev4sf (__a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vrsqrteq_u32 (uint32x4_t __a)
 {
-  return (uint32x4_t)__builtin_neon_vrsqrtev4si ((int32x4_t) __a, 0);
+  return (uint32x4_t)__builtin_neon_vrsqrtev4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline int8_t __attribute__ ((__always_inline__))
 vget_lane_s8 (int8x8_t __a, const int __b)
 {
-  return (int8_t)__builtin_neon_vget_lanev8qi (__a, __b, 1);
+  return (int8_t)__builtin_neon_vget_lanev8qi (__a, __b);
 }
 
 __extension__ static __inline int16_t __attribute__ ((__always_inline__))
 vget_lane_s16 (int16x4_t __a, const int __b)
 {
-  return (int16_t)__builtin_neon_vget_lanev4hi (__a, __b, 1);
+  return (int16_t)__builtin_neon_vget_lanev4hi (__a, __b);
 }
 
 __extension__ static __inline int32_t __attribute__ ((__always_inline__))
 vget_lane_s32 (int32x2_t __a, const int __b)
 {
-  return (int32_t)__builtin_neon_vget_lanev2si (__a, __b, 1);
+  return (int32_t)__builtin_neon_vget_lanev2si (__a, __b);
 }
 
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
-  return (float32_t)__builtin_neon_vget_lanev2sf (__a, __b, 3);
+  return (float32_t)__builtin_neon_vget_lanev2sf (__a, __b);
 }
 
 __extension__ static __inline uint8_t __attribute__ ((__always_inline__))
 vget_lane_u8 (uint8x8_t __a, const int __b)
 {
-  return (uint8_t)__builtin_neon_vget_lanev8qi ((int8x8_t) __a, __b, 0);
+  return (uint8_t)__builtin_neon_vget_laneuv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline uint16_t __attribute__ ((__always_inline__))
 vget_lane_u16 (uint16x4_t __a, const int __b)
 {
-  return (uint16_t)__builtin_neon_vget_lanev4hi ((int16x4_t) __a, __b, 0);
+  return (uint16_t)__builtin_neon_vget_laneuv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 vget_lane_u32 (uint32x2_t __a, const int __b)
 {
-  return (uint32_t)__builtin_neon_vget_lanev2si ((int32x2_t) __a, __b, 0);
+  return (uint32_t)__builtin_neon_vget_laneuv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline poly8_t __attribute__ ((__always_inline__))
 vget_lane_p8 (poly8x8_t __a, const int __b)
 {
-  return (poly8_t)__builtin_neon_vget_lanev8qi ((int8x8_t) __a, __b, 2);
+  return (poly8_t)__builtin_neon_vget_laneuv8qi ((int8x8_t) __a, __b);
 }
 
 __extension__ static __inline poly16_t __attribute__ ((__always_inline__))
 vget_lane_p16 (poly16x4_t __a, const int __b)
 {
-  return (poly16_t)__builtin_neon_vget_lanev4hi ((int16x4_t) __a, __b, 2);
+  return (poly16_t)__builtin_neon_vget_laneuv4hi ((int16x4_t) __a, __b);
 }
 
 __extension__ static __inline int64_t __attribute__ ((__always_inline__))
 vget_lane_s64 (int64x1_t __a, const int __b)
 {
-  return (int64_t)__builtin_neon_vget_lanedi (__a, __b, 1);
+  return (int64_t)__builtin_neon_vget_lanedi (__a, __b);
 }
 
 __extension__ static __inline uint64_t __attribute__ ((__always_inline__))
 vget_lane_u64 (uint64x1_t __a, const int __b)
 {
-  return (uint64_t)__builtin_neon_vget_lanedi ((int64x1_t) __a, __b, 0);
+  return (uint64_t)__builtin_neon_vget_lanedi ((int64x1_t) __a, __b);
 }
 
 __extension__ static __inline int8_t __attribute__ ((__always_inline__))
 vgetq_lane_s8 (int8x16_t __a, const int __b)
 {
-  return (int8_t)__builtin_neon_vget_lanev16qi (__a, __b, 1);
+  return (int8_t)__builtin_neon_vget_lanev16qi (__a, __b);
 }
 
 __extension__ static __inline int16_t __attribute__ ((__always_inline__))
 vgetq_lane_s16 (int16x8_t __a, const int __b)
 {
-  return (int16_t)__builtin_neon_vget_lanev8hi (__a, __b, 1);
+  return (int16_t)__builtin_neon_vget_lanev8hi (__a, __b);
 }
 
 __extension__ static __inline int32_t __attribute__ ((__always_inline__))
 vgetq_lane_s32 (int32x4_t __a, const int __b)
 {
-  return (int32_t)__builtin_neon_vget_lanev4si (__a, __b, 1);
+  return (int32_t)__builtin_neon_vget_lanev4si (__a, __b);
 }
 
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vgetq_lane_f32 (float32x4_t __a, const int __b)
 {
-  return (float32_t)__builtin_neon_vget_lanev4sf (__a, __b, 3);
+  return (float32_t)__builtin_neon_vget_lanev4sf (__a, __b);
 }
 
 __extension__ static __inline uint8_t __attribute__ ((__always_inline__))
 vgetq_lane_u8 (uint8x16_t __a, const int __b)
 {
-  return (uint8_t)__builtin_neon_vget_lanev16qi ((int8x16_t) __a, __b, 0);
+  return (uint8_t)__builtin_neon_vget_laneuv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline uint16_t __attribute__ ((__always_inline__))
 vgetq_lane_u16 (uint16x8_t __a, const int __b)
 {
-  return (uint16_t)__builtin_neon_vget_lanev8hi ((int16x8_t) __a, __b, 0);
+  return (uint16_t)__builtin_neon_vget_laneuv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 vgetq_lane_u32 (uint32x4_t __a, const int __b)
 {
-  return (uint32_t)__builtin_neon_vget_lanev4si ((int32x4_t) __a, __b, 0);
+  return (uint32_t)__builtin_neon_vget_laneuv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline poly8_t __attribute__ ((__always_inline__))
 vgetq_lane_p8 (poly8x16_t __a, const int __b)
 {
-  return (poly8_t)__builtin_neon_vget_lanev16qi ((int8x16_t) __a, __b, 2);
+  return (poly8_t)__builtin_neon_vget_laneuv16qi ((int8x16_t) __a, __b);
 }
 
 __extension__ static __inline poly16_t __attribute__ ((__always_inline__))
 vgetq_lane_p16 (poly16x8_t __a, const int __b)
 {
-  return (poly16_t)__builtin_neon_vget_lanev8hi ((int16x8_t) __a, __b, 2);
+  return (poly16_t)__builtin_neon_vget_laneuv8hi ((int16x8_t) __a, __b);
 }
 
 __extension__ static __inline int64_t __attribute__ ((__always_inline__))
 vgetq_lane_s64 (int64x2_t __a, const int __b)
 {
-  return (int64_t)__builtin_neon_vget_lanev2di (__a, __b, 1);
+  return (int64_t)__builtin_neon_vget_lanev2di (__a, __b);
 }
 
 __extension__ static __inline uint64_t __attribute__ ((__always_inline__))
 vgetq_lane_u64 (uint64x2_t __a, const int __b)
 {
-  return (uint64_t)__builtin_neon_vget_lanev2di ((int64x2_t) __a, __b, 0);
+  return (uint64_t)__builtin_neon_vget_lanev2di ((int64x2_t) __a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
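To illustrate the effect at the source level (user code is unchanged
by this rename; only the builtin each intrinsic expands to differs),
here is a minimal sketch, assuming a NEON-enabled arm target:

  #include <arm_neon.h>

  uint8_t
  first_lane (uint8x8_t v)
  {
    /* Formerly __builtin_neon_vget_lanev8qi with a trailing 0
       selecting the unsigned form; now the dedicated
       __builtin_neon_vget_laneuv8qi.  */
    return vget_lane_u8 (v, 0);
  }
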
@@ -6150,49 +6150,49 @@ vget_low_u64 (uint64x2_t __a)
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vcvt_s32_f32 (float32x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vcvtv2sf (__a, 1);
+  return (int32x2_t)__builtin_neon_vcvtsv2sf (__a);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcvt_f32_s32 (int32x2_t __a)
 {
-  return (float32x2_t)__builtin_neon_vcvtv2si (__a, 1);
+  return (float32x2_t)__builtin_neon_vcvtsv2si (__a);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcvt_f32_u32 (uint32x2_t __a)
 {
-  return (float32x2_t)__builtin_neon_vcvtv2si ((int32x2_t) __a, 0);
+  return (float32x2_t)__builtin_neon_vcvtuv2si ((int32x2_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcvt_u32_f32 (float32x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vcvtv2sf (__a, 0);
+  return (uint32x2_t)__builtin_neon_vcvtuv2sf (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vcvtq_s32_f32 (float32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vcvtv4sf (__a, 1);
+  return (int32x4_t)__builtin_neon_vcvtsv4sf (__a);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcvtq_f32_s32 (int32x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vcvtv4si (__a, 1);
+  return (float32x4_t)__builtin_neon_vcvtsv4si (__a);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcvtq_f32_u32 (uint32x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vcvtv4si ((int32x4_t) __a, 0);
+  return (float32x4_t)__builtin_neon_vcvtuv4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcvtq_u32_f32 (float32x4_t __a)
 {
-  return (uint32x4_t)__builtin_neon_vcvtv4sf (__a, 0);
+  return (uint32x4_t)__builtin_neon_vcvtuv4sf (__a);
 }
 
 #if ((__ARM_FP & 0x2) != 0)
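The conversions follow the same scheme: signedness moves from a 0/1
selector argument into the builtin name itself.  A short sketch, again
assuming a NEON-enabled target:

  #include <arm_neon.h>

  uint32x2_t
  to_u32 (float32x2_t v)
  {
    /* Formerly __builtin_neon_vcvtv2sf (v, 0); now
       __builtin_neon_vcvtuv2sf (v), with the signed conversion going
       through the separate __builtin_neon_vcvtsv2sf.  */
    return vcvt_u32_f32 (v);
  }
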
@@ -6214,175 +6214,175 @@ vcvt_f32_f16 (float16x4_t __a)
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vcvt_n_s32_f32 (float32x2_t __a, const int __b)
 {
-  return (int32x2_t)__builtin_neon_vcvt_nv2sf (__a, __b, 1);
+  return (int32x2_t)__builtin_neon_vcvts_nv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcvt_n_f32_s32 (int32x2_t __a, const int __b)
 {
-  return (float32x2_t)__builtin_neon_vcvt_nv2si (__a, __b, 1);
+  return (float32x2_t)__builtin_neon_vcvts_nv2si (__a, __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcvt_n_f32_u32 (uint32x2_t __a, const int __b)
 {
-  return (float32x2_t)__builtin_neon_vcvt_nv2si ((int32x2_t) __a, __b, 0);
+  return (float32x2_t)__builtin_neon_vcvtu_nv2si ((int32x2_t) __a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vcvt_n_u32_f32 (float32x2_t __a, const int __b)
 {
-  return (uint32x2_t)__builtin_neon_vcvt_nv2sf (__a, __b, 0);
+  return (uint32x2_t)__builtin_neon_vcvtu_nv2sf (__a, __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vcvtq_n_s32_f32 (float32x4_t __a, const int __b)
 {
-  return (int32x4_t)__builtin_neon_vcvt_nv4sf (__a, __b, 1);
+  return (int32x4_t)__builtin_neon_vcvts_nv4sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcvtq_n_f32_s32 (int32x4_t __a, const int __b)
 {
-  return (float32x4_t)__builtin_neon_vcvt_nv4si (__a, __b, 1);
+  return (float32x4_t)__builtin_neon_vcvts_nv4si (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcvtq_n_f32_u32 (uint32x4_t __a, const int __b)
 {
-  return (float32x4_t)__builtin_neon_vcvt_nv4si ((int32x4_t) __a, __b, 0);
+  return (float32x4_t)__builtin_neon_vcvtu_nv4si ((int32x4_t) __a, __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcvtq_n_u32_f32 (float32x4_t __a, const int __b)
 {
-  return (uint32x4_t)__builtin_neon_vcvt_nv4sf (__a, __b, 0);
+  return (uint32x4_t)__builtin_neon_vcvtu_nv4sf (__a, __b);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vmovn_s16 (int16x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vmovnv8hi (__a, 1);
+  return (int8x8_t)__builtin_neon_vmovnv8hi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmovn_s32 (int32x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vmovnv4si (__a, 1);
+  return (int16x4_t)__builtin_neon_vmovnv4si (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmovn_s64 (int64x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vmovnv2di (__a, 1);
+  return (int32x2_t)__builtin_neon_vmovnv2di (__a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vmovn_u16 (uint16x8_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vmovnv8hi ((int16x8_t) __a, 0);
+  return (uint8x8_t)__builtin_neon_vmovnv8hi ((int16x8_t) __a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmovn_u32 (uint32x4_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vmovnv4si ((int32x4_t) __a, 0);
+  return (uint16x4_t)__builtin_neon_vmovnv4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmovn_u64 (uint64x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vmovnv2di ((int64x2_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vmovnv2di ((int64x2_t) __a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqmovn_s16 (int16x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vqmovnv8hi (__a, 1);
+  return (int8x8_t)__builtin_neon_vqmovnsv8hi (__a);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqmovn_s32 (int32x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vqmovnv4si (__a, 1);
+  return (int16x4_t)__builtin_neon_vqmovnsv4si (__a);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqmovn_s64 (int64x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vqmovnv2di (__a, 1);
+  return (int32x2_t)__builtin_neon_vqmovnsv2di (__a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqmovn_u16 (uint16x8_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vqmovnv8hi ((int16x8_t) __a, 0);
+  return (uint8x8_t)__builtin_neon_vqmovnuv8hi ((int16x8_t) __a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqmovn_u32 (uint32x4_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vqmovnv4si ((int32x4_t) __a, 0);
+  return (uint16x4_t)__builtin_neon_vqmovnuv4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqmovn_u64 (uint64x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vqmovnv2di ((int64x2_t) __a, 0);
+  return (uint32x2_t)__builtin_neon_vqmovnuv2di ((int64x2_t) __a);
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vqmovun_s16 (int16x8_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vqmovunv8hi (__a, 1);
+  return (uint8x8_t)__builtin_neon_vqmovunv8hi (__a);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vqmovun_s32 (int32x4_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vqmovunv4si (__a, 1);
+  return (uint16x4_t)__builtin_neon_vqmovunv4si (__a);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vqmovun_s64 (int64x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vqmovunv2di (__a, 1);
+  return (uint32x2_t)__builtin_neon_vqmovunv2di (__a);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmovl_s8 (int8x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vmovlv8qi (__a, 1);
+  return (int16x8_t)__builtin_neon_vmovlsv8qi (__a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmovl_s16 (int16x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vmovlv4hi (__a, 1);
+  return (int32x4_t)__builtin_neon_vmovlsv4hi (__a);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmovl_s32 (int32x2_t __a)
 {
-  return (int64x2_t)__builtin_neon_vmovlv2si (__a, 1);
+  return (int64x2_t)__builtin_neon_vmovlsv2si (__a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmovl_u8 (uint8x8_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vmovlv8qi ((int8x8_t) __a, 0);
+  return (uint16x8_t)__builtin_neon_vmovluv8qi ((int8x8_t) __a);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmovl_u16 (uint16x4_t __a)
 {
-  return (uint32x4_t)__builtin_neon_vmovlv4hi ((int16x4_t) __a, 0);
+  return (uint32x4_t)__builtin_neon_vmovluv4hi ((int16x4_t) __a);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmovl_u32 (uint32x2_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vmovlv2si ((int32x2_t) __a, 0);
+  return (uint64x2_t)__builtin_neon_vmovluv2si ((int32x2_t) __a);
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
@@ -6550,673 +6550,673 @@ vtbx4_p8 (poly8x8_t __a, poly8x8x4_t __b, uint8x8_t __c)
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmul_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
 {
-  return (int16x4_t)__builtin_neon_vmul_lanev4hi (__a, __b, __c, 1);
+  return (int16x4_t)__builtin_neon_vmul_lanev4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmul_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
 {
-  return (int32x2_t)__builtin_neon_vmul_lanev2si (__a, __b, __c, 1);
+  return (int32x2_t)__builtin_neon_vmul_lanev2si (__a, __b, __c);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmul_lane_f32 (float32x2_t __a, float32x2_t __b, const int __c)
 {
-  return (float32x2_t)__builtin_neon_vmul_lanev2sf (__a, __b, __c, 3);
+  return (float32x2_t)__builtin_neon_vmul_lanev2sf (__a, __b, __c);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmul_lane_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
 {
-  return (uint16x4_t)__builtin_neon_vmul_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, __c, 0);
+  return (uint16x4_t)__builtin_neon_vmul_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmul_lane_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
 {
-  return (uint32x2_t)__builtin_neon_vmul_lanev2si ((int32x2_t) __a, (int32x2_t) __b, __c, 0);
+  return (uint32x2_t)__builtin_neon_vmul_lanev2si ((int32x2_t) __a, (int32x2_t) __b, __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmulq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c)
 {
-  return (int16x8_t)__builtin_neon_vmul_lanev8hi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vmul_lanev8hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmulq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c)
 {
-  return (int32x4_t)__builtin_neon_vmul_lanev4si (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vmul_lanev4si (__a, __b, __c);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmulq_lane_f32 (float32x4_t __a, float32x2_t __b, const int __c)
 {
-  return (float32x4_t)__builtin_neon_vmul_lanev4sf (__a, __b, __c, 3);
+  return (float32x4_t)__builtin_neon_vmul_lanev4sf (__a, __b, __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmulq_lane_u16 (uint16x8_t __a, uint16x4_t __b, const int __c)
 {
-  return (uint16x8_t)__builtin_neon_vmul_lanev8hi ((int16x8_t) __a, (int16x4_t) __b, __c, 0);
+  return (uint16x8_t)__builtin_neon_vmul_lanev8hi ((int16x8_t) __a, (int16x4_t) __b, __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmulq_lane_u32 (uint32x4_t __a, uint32x2_t __b, const int __c)
 {
-  return (uint32x4_t)__builtin_neon_vmul_lanev4si ((int32x4_t) __a, (int32x2_t) __b, __c, 0);
+  return (uint32x4_t)__builtin_neon_vmul_lanev4si ((int32x4_t) __a, (int32x2_t) __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmla_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
 {
-  return (int16x4_t)__builtin_neon_vmla_lanev4hi (__a, __b, __c, __d, 1);
+  return (int16x4_t)__builtin_neon_vmla_lanev4hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmla_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
 {
-  return (int32x2_t)__builtin_neon_vmla_lanev2si (__a, __b, __c, __d, 1);
+  return (int32x2_t)__builtin_neon_vmla_lanev2si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmla_lane_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c, const int __d)
 {
-  return (float32x2_t)__builtin_neon_vmla_lanev2sf (__a, __b, __c, __d, 3);
+  return (float32x2_t)__builtin_neon_vmla_lanev2sf (__a, __b, __c, __d);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmla_lane_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d)
 {
-  return (uint16x4_t)__builtin_neon_vmla_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d, 0);
+  return (uint16x4_t)__builtin_neon_vmla_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmla_lane_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c, const int __d)
 {
-  return (uint32x2_t)__builtin_neon_vmla_lanev2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d, 0);
+  return (uint32x2_t)__builtin_neon_vmla_lanev2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlaq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
 {
-  return (int16x8_t)__builtin_neon_vmla_lanev8hi (__a, __b, __c, __d, 1);
+  return (int16x8_t)__builtin_neon_vmla_lanev8hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlaq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
 {
-  return (int32x4_t)__builtin_neon_vmla_lanev4si (__a, __b, __c, __d, 1);
+  return (int32x4_t)__builtin_neon_vmla_lanev4si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmlaq_lane_f32 (float32x4_t __a, float32x4_t __b, float32x2_t __c, const int __d)
 {
-  return (float32x4_t)__builtin_neon_vmla_lanev4sf (__a, __b, __c, __d, 3);
+  return (float32x4_t)__builtin_neon_vmla_lanev4sf (__a, __b, __c, __d);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmlaq_lane_u16 (uint16x8_t __a, uint16x8_t __b, uint16x4_t __c, const int __d)
 {
-  return (uint16x8_t)__builtin_neon_vmla_lanev8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x4_t) __c, __d, 0);
+  return (uint16x8_t)__builtin_neon_vmla_lanev8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x4_t) __c, __d);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlaq_lane_u32 (uint32x4_t __a, uint32x4_t __b, uint32x2_t __c, const int __d)
 {
-  return (uint32x4_t)__builtin_neon_vmla_lanev4si ((int32x4_t) __a, (int32x4_t) __b, (int32x2_t) __c, __d, 0);
+  return (uint32x4_t)__builtin_neon_vmla_lanev4si ((int32x4_t) __a, (int32x4_t) __b, (int32x2_t) __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlal_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
 {
-  return (int32x4_t)__builtin_neon_vmlal_lanev4hi (__a, __b, __c, __d, 1);
+  return (int32x4_t)__builtin_neon_vmlals_lanev4hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmlal_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
 {
-  return (int64x2_t)__builtin_neon_vmlal_lanev2si (__a, __b, __c, __d, 1);
+  return (int64x2_t)__builtin_neon_vmlals_lanev2si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlal_lane_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d)
 {
-  return (uint32x4_t)__builtin_neon_vmlal_lanev4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d, 0);
+  return (uint32x4_t)__builtin_neon_vmlalu_lanev4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmlal_lane_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c, const int __d)
 {
-  return (uint64x2_t)__builtin_neon_vmlal_lanev2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d, 0);
+  return (uint64x2_t)__builtin_neon_vmlalu_lanev2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmlal_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
 {
-  return (int32x4_t)__builtin_neon_vqdmlal_lanev4hi (__a, __b, __c, __d, 1);
+  return (int32x4_t)__builtin_neon_vqdmlal_lanev4hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmlal_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
 {
-  return (int64x2_t)__builtin_neon_vqdmlal_lanev2si (__a, __b, __c, __d, 1);
+  return (int64x2_t)__builtin_neon_vqdmlal_lanev2si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmls_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
 {
-  return (int16x4_t)__builtin_neon_vmls_lanev4hi (__a, __b, __c, __d, 1);
+  return (int16x4_t)__builtin_neon_vmls_lanev4hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmls_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
 {
-  return (int32x2_t)__builtin_neon_vmls_lanev2si (__a, __b, __c, __d, 1);
+  return (int32x2_t)__builtin_neon_vmls_lanev2si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmls_lane_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c, const int __d)
 {
-  return (float32x2_t)__builtin_neon_vmls_lanev2sf (__a, __b, __c, __d, 3);
+  return (float32x2_t)__builtin_neon_vmls_lanev2sf (__a, __b, __c, __d);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmls_lane_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d)
 {
-  return (uint16x4_t)__builtin_neon_vmls_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d, 0);
+  return (uint16x4_t)__builtin_neon_vmls_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmls_lane_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c, const int __d)
 {
-  return (uint32x2_t)__builtin_neon_vmls_lanev2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d, 0);
+  return (uint32x2_t)__builtin_neon_vmls_lanev2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlsq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
 {
-  return (int16x8_t)__builtin_neon_vmls_lanev8hi (__a, __b, __c, __d, 1);
+  return (int16x8_t)__builtin_neon_vmls_lanev8hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlsq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
 {
-  return (int32x4_t)__builtin_neon_vmls_lanev4si (__a, __b, __c, __d, 1);
+  return (int32x4_t)__builtin_neon_vmls_lanev4si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmlsq_lane_f32 (float32x4_t __a, float32x4_t __b, float32x2_t __c, const int __d)
 {
-  return (float32x4_t)__builtin_neon_vmls_lanev4sf (__a, __b, __c, __d, 3);
+  return (float32x4_t)__builtin_neon_vmls_lanev4sf (__a, __b, __c, __d);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmlsq_lane_u16 (uint16x8_t __a, uint16x8_t __b, uint16x4_t __c, const int __d)
 {
-  return (uint16x8_t)__builtin_neon_vmls_lanev8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x4_t) __c, __d, 0);
+  return (uint16x8_t)__builtin_neon_vmls_lanev8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x4_t) __c, __d);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlsq_lane_u32 (uint32x4_t __a, uint32x4_t __b, uint32x2_t __c, const int __d)
 {
-  return (uint32x4_t)__builtin_neon_vmls_lanev4si ((int32x4_t) __a, (int32x4_t) __b, (int32x2_t) __c, __d, 0);
+  return (uint32x4_t)__builtin_neon_vmls_lanev4si ((int32x4_t) __a, (int32x4_t) __b, (int32x2_t) __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlsl_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
 {
-  return (int32x4_t)__builtin_neon_vmlsl_lanev4hi (__a, __b, __c, __d, 1);
+  return (int32x4_t)__builtin_neon_vmlsls_lanev4hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmlsl_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
 {
-  return (int64x2_t)__builtin_neon_vmlsl_lanev2si (__a, __b, __c, __d, 1);
+  return (int64x2_t)__builtin_neon_vmlsls_lanev2si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlsl_lane_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d)
 {
-  return (uint32x4_t)__builtin_neon_vmlsl_lanev4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d, 0);
+  return (uint32x4_t)__builtin_neon_vmlslu_lanev4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmlsl_lane_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c, const int __d)
 {
-  return (uint64x2_t)__builtin_neon_vmlsl_lanev2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d, 0);
+  return (uint64x2_t)__builtin_neon_vmlslu_lanev2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmlsl_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
 {
-  return (int32x4_t)__builtin_neon_vqdmlsl_lanev4hi (__a, __b, __c, __d, 1);
+  return (int32x4_t)__builtin_neon_vqdmlsl_lanev4hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmlsl_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
 {
-  return (int64x2_t)__builtin_neon_vqdmlsl_lanev2si (__a, __b, __c, __d, 1);
+  return (int64x2_t)__builtin_neon_vqdmlsl_lanev2si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmull_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
 {
-  return (int32x4_t)__builtin_neon_vmull_lanev4hi (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vmulls_lanev4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmull_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
 {
-  return (int64x2_t)__builtin_neon_vmull_lanev2si (__a, __b, __c, 1);
+  return (int64x2_t)__builtin_neon_vmulls_lanev2si (__a, __b, __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmull_lane_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
 {
-  return (uint32x4_t)__builtin_neon_vmull_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, __c, 0);
+  return (uint32x4_t)__builtin_neon_vmullu_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmull_lane_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
 {
-  return (uint64x2_t)__builtin_neon_vmull_lanev2si ((int32x2_t) __a, (int32x2_t) __b, __c, 0);
+  return (uint64x2_t)__builtin_neon_vmullu_lanev2si ((int32x2_t) __a, (int32x2_t) __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmull_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
 {
-  return (int32x4_t)__builtin_neon_vqdmull_lanev4hi (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vqdmull_lanev4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmull_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
 {
-  return (int64x2_t)__builtin_neon_vqdmull_lanev2si (__a, __b, __c, 1);
+  return (int64x2_t)__builtin_neon_vqdmull_lanev2si (__a, __b, __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqdmulhq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c)
 {
-  return (int16x8_t)__builtin_neon_vqdmulh_lanev8hi (__a, __b, __c, 1);
+  return (int16x8_t)__builtin_neon_vqdmulh_lanev8hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmulhq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c)
 {
-  return (int32x4_t)__builtin_neon_vqdmulh_lanev4si (__a, __b, __c, 1);
+  return (int32x4_t)__builtin_neon_vqdmulh_lanev4si (__a, __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqdmulh_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
 {
-  return (int16x4_t)__builtin_neon_vqdmulh_lanev4hi (__a, __b, __c, 1);
+  return (int16x4_t)__builtin_neon_vqdmulh_lanev4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
 {
-  return (int32x2_t)__builtin_neon_vqdmulh_lanev2si (__a, __b, __c, 1);
+  return (int32x2_t)__builtin_neon_vqdmulh_lanev2si (__a, __b, __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqrdmulhq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c)
 {
-  return (int16x8_t)__builtin_neon_vqdmulh_lanev8hi (__a, __b, __c, 5);
+  return (int16x8_t)__builtin_neon_vqrdmulh_lanev8hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqrdmulhq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c)
 {
-  return (int32x4_t)__builtin_neon_vqdmulh_lanev4si (__a, __b, __c, 5);
+  return (int32x4_t)__builtin_neon_vqrdmulh_lanev4si (__a, __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqrdmulh_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
 {
-  return (int16x4_t)__builtin_neon_vqdmulh_lanev4hi (__a, __b, __c, 5);
+  return (int16x4_t)__builtin_neon_vqrdmulh_lanev4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
 {
-  return (int32x2_t)__builtin_neon_vqdmulh_lanev2si (__a, __b, __c, 5);
+  return (int32x2_t)__builtin_neon_vqrdmulh_lanev2si (__a, __b, __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmul_n_s16 (int16x4_t __a, int16_t __b)
 {
-  return (int16x4_t)__builtin_neon_vmul_nv4hi (__a, (__builtin_neon_hi) __b, 1);
+  return (int16x4_t)__builtin_neon_vmul_nv4hi (__a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmul_n_s32 (int32x2_t __a, int32_t __b)
 {
-  return (int32x2_t)__builtin_neon_vmul_nv2si (__a, (__builtin_neon_si) __b, 1);
+  return (int32x2_t)__builtin_neon_vmul_nv2si (__a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmul_n_f32 (float32x2_t __a, float32_t __b)
 {
-  return (float32x2_t)__builtin_neon_vmul_nv2sf (__a, (__builtin_neon_sf) __b, 3);
+  return (float32x2_t)__builtin_neon_vmul_nv2sf (__a, (__builtin_neon_sf) __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmul_n_u16 (uint16x4_t __a, uint16_t __b)
 {
-  return (uint16x4_t)__builtin_neon_vmul_nv4hi ((int16x4_t) __a, (__builtin_neon_hi) __b, 0);
+  return (uint16x4_t)__builtin_neon_vmul_nv4hi ((int16x4_t) __a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmul_n_u32 (uint32x2_t __a, uint32_t __b)
 {
-  return (uint32x2_t)__builtin_neon_vmul_nv2si ((int32x2_t) __a, (__builtin_neon_si) __b, 0);
+  return (uint32x2_t)__builtin_neon_vmul_nv2si ((int32x2_t) __a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmulq_n_s16 (int16x8_t __a, int16_t __b)
 {
-  return (int16x8_t)__builtin_neon_vmul_nv8hi (__a, (__builtin_neon_hi) __b, 1);
+  return (int16x8_t)__builtin_neon_vmul_nv8hi (__a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmulq_n_s32 (int32x4_t __a, int32_t __b)
 {
-  return (int32x4_t)__builtin_neon_vmul_nv4si (__a, (__builtin_neon_si) __b, 1);
+  return (int32x4_t)__builtin_neon_vmul_nv4si (__a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmulq_n_f32 (float32x4_t __a, float32_t __b)
 {
-  return (float32x4_t)__builtin_neon_vmul_nv4sf (__a, (__builtin_neon_sf) __b, 3);
+  return (float32x4_t)__builtin_neon_vmul_nv4sf (__a, (__builtin_neon_sf) __b);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmulq_n_u16 (uint16x8_t __a, uint16_t __b)
 {
-  return (uint16x8_t)__builtin_neon_vmul_nv8hi ((int16x8_t) __a, (__builtin_neon_hi) __b, 0);
+  return (uint16x8_t)__builtin_neon_vmul_nv8hi ((int16x8_t) __a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmulq_n_u32 (uint32x4_t __a, uint32_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vmul_nv4si ((int32x4_t) __a, (__builtin_neon_si) __b, 0);
+  return (uint32x4_t)__builtin_neon_vmul_nv4si ((int32x4_t) __a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmull_n_s16 (int16x4_t __a, int16_t __b)
 {
-  return (int32x4_t)__builtin_neon_vmull_nv4hi (__a, (__builtin_neon_hi) __b, 1);
+  return (int32x4_t)__builtin_neon_vmulls_nv4hi (__a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmull_n_s32 (int32x2_t __a, int32_t __b)
 {
-  return (int64x2_t)__builtin_neon_vmull_nv2si (__a, (__builtin_neon_si) __b, 1);
+  return (int64x2_t)__builtin_neon_vmulls_nv2si (__a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmull_n_u16 (uint16x4_t __a, uint16_t __b)
 {
-  return (uint32x4_t)__builtin_neon_vmull_nv4hi ((int16x4_t) __a, (__builtin_neon_hi) __b, 0);
+  return (uint32x4_t)__builtin_neon_vmullu_nv4hi ((int16x4_t) __a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmull_n_u32 (uint32x2_t __a, uint32_t __b)
 {
-  return (uint64x2_t)__builtin_neon_vmull_nv2si ((int32x2_t) __a, (__builtin_neon_si) __b, 0);
+  return (uint64x2_t)__builtin_neon_vmullu_nv2si ((int32x2_t) __a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmull_n_s16 (int16x4_t __a, int16_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqdmull_nv4hi (__a, (__builtin_neon_hi) __b, 1);
+  return (int32x4_t)__builtin_neon_vqdmull_nv4hi (__a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmull_n_s32 (int32x2_t __a, int32_t __b)
 {
-  return (int64x2_t)__builtin_neon_vqdmull_nv2si (__a, (__builtin_neon_si) __b, 1);
+  return (int64x2_t)__builtin_neon_vqdmull_nv2si (__a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqdmulhq_n_s16 (int16x8_t __a, int16_t __b)
 {
-  return (int16x8_t)__builtin_neon_vqdmulh_nv8hi (__a, (__builtin_neon_hi) __b, 1);
+  return (int16x8_t)__builtin_neon_vqdmulh_nv8hi (__a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmulhq_n_s32 (int32x4_t __a, int32_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqdmulh_nv4si (__a, (__builtin_neon_si) __b, 1);
+  return (int32x4_t)__builtin_neon_vqdmulh_nv4si (__a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqdmulh_n_s16 (int16x4_t __a, int16_t __b)
 {
-  return (int16x4_t)__builtin_neon_vqdmulh_nv4hi (__a, (__builtin_neon_hi) __b, 1);
+  return (int16x4_t)__builtin_neon_vqdmulh_nv4hi (__a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqdmulh_n_s32 (int32x2_t __a, int32_t __b)
 {
-  return (int32x2_t)__builtin_neon_vqdmulh_nv2si (__a, (__builtin_neon_si) __b, 1);
+  return (int32x2_t)__builtin_neon_vqdmulh_nv2si (__a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vqrdmulhq_n_s16 (int16x8_t __a, int16_t __b)
 {
-  return (int16x8_t)__builtin_neon_vqdmulh_nv8hi (__a, (__builtin_neon_hi) __b, 5);
+  return (int16x8_t)__builtin_neon_vqrdmulh_nv8hi (__a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqrdmulhq_n_s32 (int32x4_t __a, int32_t __b)
 {
-  return (int32x4_t)__builtin_neon_vqdmulh_nv4si (__a, (__builtin_neon_si) __b, 5);
+  return (int32x4_t)__builtin_neon_vqrdmulh_nv4si (__a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqrdmulh_n_s16 (int16x4_t __a, int16_t __b)
 {
-  return (int16x4_t)__builtin_neon_vqdmulh_nv4hi (__a, (__builtin_neon_hi) __b, 5);
+  return (int16x4_t)__builtin_neon_vqrdmulh_nv4hi (__a, (__builtin_neon_hi) __b);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vqrdmulh_n_s32 (int32x2_t __a, int32_t __b)
 {
-  return (int32x2_t)__builtin_neon_vqdmulh_nv2si (__a, (__builtin_neon_si) __b, 5);
+  return (int32x2_t)__builtin_neon_vqrdmulh_nv2si (__a, (__builtin_neon_si) __b);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmla_n_s16 (int16x4_t __a, int16x4_t __b, int16_t __c)
 {
-  return (int16x4_t)__builtin_neon_vmla_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1);
+  return (int16x4_t)__builtin_neon_vmla_nv4hi (__a, __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmla_n_s32 (int32x2_t __a, int32x2_t __b, int32_t __c)
 {
-  return (int32x2_t)__builtin_neon_vmla_nv2si (__a, __b, (__builtin_neon_si) __c, 1);
+  return (int32x2_t)__builtin_neon_vmla_nv2si (__a, __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmla_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
 {
-  return (float32x2_t)__builtin_neon_vmla_nv2sf (__a, __b, (__builtin_neon_sf) __c, 3);
+  return (float32x2_t)__builtin_neon_vmla_nv2sf (__a, __b, (__builtin_neon_sf) __c);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmla_n_u16 (uint16x4_t __a, uint16x4_t __b, uint16_t __c)
 {
-  return (uint16x4_t)__builtin_neon_vmla_nv4hi ((int16x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c, 0);
+  return (uint16x4_t)__builtin_neon_vmla_nv4hi ((int16x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmla_n_u32 (uint32x2_t __a, uint32x2_t __b, uint32_t __c)
 {
-  return (uint32x2_t)__builtin_neon_vmla_nv2si ((int32x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c, 0);
+  return (uint32x2_t)__builtin_neon_vmla_nv2si ((int32x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlaq_n_s16 (int16x8_t __a, int16x8_t __b, int16_t __c)
 {
-  return (int16x8_t)__builtin_neon_vmla_nv8hi (__a, __b, (__builtin_neon_hi) __c, 1);
+  return (int16x8_t)__builtin_neon_vmla_nv8hi (__a, __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlaq_n_s32 (int32x4_t __a, int32x4_t __b, int32_t __c)
 {
-  return (int32x4_t)__builtin_neon_vmla_nv4si (__a, __b, (__builtin_neon_si) __c, 1);
+  return (int32x4_t)__builtin_neon_vmla_nv4si (__a, __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmlaq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
 {
-  return (float32x4_t)__builtin_neon_vmla_nv4sf (__a, __b, (__builtin_neon_sf) __c, 3);
+  return (float32x4_t)__builtin_neon_vmla_nv4sf (__a, __b, (__builtin_neon_sf) __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmlaq_n_u16 (uint16x8_t __a, uint16x8_t __b, uint16_t __c)
 {
-  return (uint16x8_t)__builtin_neon_vmla_nv8hi ((int16x8_t) __a, (int16x8_t) __b, (__builtin_neon_hi) __c, 0);
+  return (uint16x8_t)__builtin_neon_vmla_nv8hi ((int16x8_t) __a, (int16x8_t) __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlaq_n_u32 (uint32x4_t __a, uint32x4_t __b, uint32_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vmla_nv4si ((int32x4_t) __a, (int32x4_t) __b, (__builtin_neon_si) __c, 0);
+  return (uint32x4_t)__builtin_neon_vmla_nv4si ((int32x4_t) __a, (int32x4_t) __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlal_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c)
 {
-  return (int32x4_t)__builtin_neon_vmlal_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1);
+  return (int32x4_t)__builtin_neon_vmlals_nv4hi (__a, __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmlal_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
 {
-  return (int64x2_t)__builtin_neon_vmlal_nv2si (__a, __b, (__builtin_neon_si) __c, 1);
+  return (int64x2_t)__builtin_neon_vmlals_nv2si (__a, __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlal_n_u16 (uint32x4_t __a, uint16x4_t __b, uint16_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vmlal_nv4hi ((int32x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c, 0);
+  return (uint32x4_t)__builtin_neon_vmlalu_nv4hi ((int32x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmlal_n_u32 (uint64x2_t __a, uint32x2_t __b, uint32_t __c)
 {
-  return (uint64x2_t)__builtin_neon_vmlal_nv2si ((int64x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c, 0);
+  return (uint64x2_t)__builtin_neon_vmlalu_nv2si ((int64x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmlal_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c)
 {
-  return (int32x4_t)__builtin_neon_vqdmlal_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1);
+  return (int32x4_t)__builtin_neon_vqdmlal_nv4hi (__a, __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmlal_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
 {
-  return (int64x2_t)__builtin_neon_vqdmlal_nv2si (__a, __b, (__builtin_neon_si) __c, 1);
+  return (int64x2_t)__builtin_neon_vqdmlal_nv2si (__a, __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmls_n_s16 (int16x4_t __a, int16x4_t __b, int16_t __c)
 {
-  return (int16x4_t)__builtin_neon_vmls_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1);
+  return (int16x4_t)__builtin_neon_vmls_nv4hi (__a, __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vmls_n_s32 (int32x2_t __a, int32x2_t __b, int32_t __c)
 {
-  return (int32x2_t)__builtin_neon_vmls_nv2si (__a, __b, (__builtin_neon_si) __c, 1);
+  return (int32x2_t)__builtin_neon_vmls_nv2si (__a, __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmls_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
 {
-  return (float32x2_t)__builtin_neon_vmls_nv2sf (__a, __b, (__builtin_neon_sf) __c, 3);
+  return (float32x2_t)__builtin_neon_vmls_nv2sf (__a, __b, (__builtin_neon_sf) __c);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vmls_n_u16 (uint16x4_t __a, uint16x4_t __b, uint16_t __c)
 {
-  return (uint16x4_t)__builtin_neon_vmls_nv4hi ((int16x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c, 0);
+  return (uint16x4_t)__builtin_neon_vmls_nv4hi ((int16x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vmls_n_u32 (uint32x2_t __a, uint32x2_t __b, uint32_t __c)
 {
-  return (uint32x2_t)__builtin_neon_vmls_nv2si ((int32x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c, 0);
+  return (uint32x2_t)__builtin_neon_vmls_nv2si ((int32x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlsq_n_s16 (int16x8_t __a, int16x8_t __b, int16_t __c)
 {
-  return (int16x8_t)__builtin_neon_vmls_nv8hi (__a, __b, (__builtin_neon_hi) __c, 1);
+  return (int16x8_t)__builtin_neon_vmls_nv8hi (__a, __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlsq_n_s32 (int32x4_t __a, int32x4_t __b, int32_t __c)
 {
-  return (int32x4_t)__builtin_neon_vmls_nv4si (__a, __b, (__builtin_neon_si) __c, 1);
+  return (int32x4_t)__builtin_neon_vmls_nv4si (__a, __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmlsq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
 {
-  return (float32x4_t)__builtin_neon_vmls_nv4sf (__a, __b, (__builtin_neon_sf) __c, 3);
+  return (float32x4_t)__builtin_neon_vmls_nv4sf (__a, __b, (__builtin_neon_sf) __c);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vmlsq_n_u16 (uint16x8_t __a, uint16x8_t __b, uint16_t __c)
 {
-  return (uint16x8_t)__builtin_neon_vmls_nv8hi ((int16x8_t) __a, (int16x8_t) __b, (__builtin_neon_hi) __c, 0);
+  return (uint16x8_t)__builtin_neon_vmls_nv8hi ((int16x8_t) __a, (int16x8_t) __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlsq_n_u32 (uint32x4_t __a, uint32x4_t __b, uint32_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vmls_nv4si ((int32x4_t) __a, (int32x4_t) __b, (__builtin_neon_si) __c, 0);
+  return (uint32x4_t)__builtin_neon_vmls_nv4si ((int32x4_t) __a, (int32x4_t) __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vmlsl_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c)
 {
-  return (int32x4_t)__builtin_neon_vmlsl_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1);
+  return (int32x4_t)__builtin_neon_vmlsls_nv4hi (__a, __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vmlsl_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
 {
-  return (int64x2_t)__builtin_neon_vmlsl_nv2si (__a, __b, (__builtin_neon_si) __c, 1);
+  return (int64x2_t)__builtin_neon_vmlsls_nv2si (__a, __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vmlsl_n_u16 (uint32x4_t __a, uint16x4_t __b, uint16_t __c)
 {
-  return (uint32x4_t)__builtin_neon_vmlsl_nv4hi ((int32x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c, 0);
+  return (uint32x4_t)__builtin_neon_vmlslu_nv4hi ((int32x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vmlsl_n_u32 (uint64x2_t __a, uint32x2_t __b, uint32_t __c)
 {
-  return (uint64x2_t)__builtin_neon_vmlsl_nv2si ((int64x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c, 0);
+  return (uint64x2_t)__builtin_neon_vmlslu_nv2si ((int64x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmlsl_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c)
 {
-  return (int32x4_t)__builtin_neon_vqdmlsl_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1);
+  return (int32x4_t)__builtin_neon_vqdmlsl_nv4hi (__a, __b, (__builtin_neon_hi) __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmlsl_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
 {
-  return (int64x2_t)__builtin_neon_vqdmlsl_nv2si (__a, __b, (__builtin_neon_si) __c, 1);
+  return (int64x2_t)__builtin_neon_vqdmlsl_nv2si (__a, __b, (__builtin_neon_si) __c);
 }
 
 #ifdef __ARM_FEATURE_CRYPTO
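The rounding forms likewise gain their own builtins rather than
sharing one with the non-rounding forms.  A minimal sketch, assuming a
NEON-enabled target:

  #include <arm_neon.h>

  int16x8_t
  scale_q15 (int16x8_t v, int16_t s)
  {
    /* Formerly __builtin_neon_vqdmulh_nv8hi (v, s, 5), with the 5
       selecting the rounding variant; now the dedicated
       __builtin_neon_vqrdmulh_nv8hi (v, s).  */
    return vqrdmulhq_n_s16 (v, s);
  }
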
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 229caca..5451524 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -19,46 +19,87 @@
    <http://www.gnu.org/licenses/>.  */
 
 VAR2 (BINOP, vadd, v2sf, v4sf),
-VAR3 (BINOP, vaddl, v8qi, v4hi, v2si),
-VAR3 (BINOP, vaddw, v8qi, v4hi, v2si),
-VAR6 (BINOP, vhadd, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR8 (BINOP, vqadd, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR3 (BINOP, vaddls, v8qi, v4hi, v2si),
+VAR3 (BINOP, vaddlu, v8qi, v4hi, v2si),
+VAR3 (BINOP, vaddws, v8qi, v4hi, v2si),
+VAR3 (BINOP, vaddwu, v8qi, v4hi, v2si),
+VAR6 (BINOP, vhaddu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vhadds, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vrhaddu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vrhadds, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR8 (BINOP, vqadds, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vqaddu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
 VAR3 (BINOP, vaddhn, v8hi, v4si, v2di),
-VAR8 (BINOP, vmul, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
+VAR3 (BINOP, vraddhn, v8hi, v4si, v2di),
+VAR2 (BINOP, vmulf, v2sf, v4sf),
+VAR2 (BINOP, vmulp, v8qi, v16qi),
 VAR8 (TERNOP, vmla, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR3 (TERNOP, vmlal, v8qi, v4hi, v2si),
+VAR3 (TERNOP, vmlals, v8qi, v4hi, v2si),
+VAR3 (TERNOP, vmlalu, v8qi, v4hi, v2si),
 VAR2 (TERNOP, vfma, v2sf, v4sf),
 VAR2 (TERNOP, vfms, v2sf, v4sf),
 VAR8 (TERNOP, vmls, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR3 (TERNOP, vmlsl, v8qi, v4hi, v2si),
+VAR3 (TERNOP, vmlsls, v8qi, v4hi, v2si),
+VAR3 (TERNOP, vmlslu, v8qi, v4hi, v2si),
 VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si),
+VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si),
 VAR2 (TERNOP, vqdmlal, v4hi, v2si),
 VAR2 (TERNOP, vqdmlsl, v4hi, v2si),
-VAR3 (BINOP, vmull, v8qi, v4hi, v2si),
-VAR2 (SCALARMULL, vmull_n, v4hi, v2si),
-VAR2 (LANEMULL, vmull_lane, v4hi, v2si),
+VAR3 (BINOP, vmullp, v8qi, v4hi, v2si),
+VAR3 (BINOP, vmulls, v8qi, v4hi, v2si),
+VAR3 (BINOP, vmullu, v8qi, v4hi, v2si),
+VAR2 (SCALARMULL, vmulls_n, v4hi, v2si),
+VAR2 (SCALARMULL, vmullu_n, v4hi, v2si),
+VAR2 (LANEMULL, vmulls_lane, v4hi, v2si),
+VAR2 (LANEMULL, vmullu_lane, v4hi, v2si),
 VAR2 (SCALARMULL, vqdmull_n, v4hi, v2si),
 VAR2 (LANEMULL, vqdmull_lane, v4hi, v2si),
 VAR4 (SCALARMULH, vqdmulh_n, v4hi, v2si, v8hi, v4si),
+VAR4 (SCALARMULH, vqrdmulh_n, v4hi, v2si, v8hi, v4si),
 VAR4 (LANEMULH, vqdmulh_lane, v4hi, v2si, v8hi, v4si),
+VAR4 (LANEMULH, vqrdmulh_lane, v4hi, v2si, v8hi, v4si),
 VAR2 (BINOP, vqdmull, v4hi, v2si),
-VAR8 (BINOP, vshl, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vqshl, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vshr_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vrshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vrshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vqshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vqshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vqrshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vqrshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTIMM, vshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTIMM, vshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTIMM, vrshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTIMM, vrshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
 VAR3 (SHIFTIMM, vshrn_n, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vqshrn_n, v8hi, v4si, v2di),
+VAR3 (SHIFTIMM, vrshrn_n, v8hi, v4si, v2di),
+VAR3 (SHIFTIMM, vqshrns_n, v8hi, v4si, v2di),
+VAR3 (SHIFTIMM, vqshrnu_n, v8hi, v4si, v2di),
+VAR3 (SHIFTIMM, vqrshrns_n, v8hi, v4si, v2di),
+VAR3 (SHIFTIMM, vqrshrnu_n, v8hi, v4si, v2di),
 VAR3 (SHIFTIMM, vqshrun_n, v8hi, v4si, v2di),
+VAR3 (SHIFTIMM, vqrshrun_n, v8hi, v4si, v2di),
 VAR8 (SHIFTIMM, vshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vqshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTIMM, vqshl_s_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTIMM, vqshl_u_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
 VAR8 (SHIFTIMM, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vshll_n, v8qi, v4hi, v2si),
-VAR8 (SHIFTACC, vsra_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR3 (SHIFTIMM, vshlls_n, v8qi, v4hi, v2si),
+VAR3 (SHIFTIMM, vshllu_n, v8qi, v4hi, v2si),
+VAR8 (SHIFTACC, vsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTACC, vsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTACC, vrsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (SHIFTACC, vrsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
 VAR2 (BINOP, vsub, v2sf, v4sf),
-VAR3 (BINOP, vsubl, v8qi, v4hi, v2si),
-VAR3 (BINOP, vsubw, v8qi, v4hi, v2si),
-VAR8 (BINOP, vqsub, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR6 (BINOP, vhsub, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR3 (BINOP, vsubls, v8qi, v4hi, v2si),
+VAR3 (BINOP, vsublu, v8qi, v4hi, v2si),
+VAR3 (BINOP, vsubws, v8qi, v4hi, v2si),
+VAR3 (BINOP, vsubwu, v8qi, v4hi, v2si),
+VAR8 (BINOP, vqsubs, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR8 (BINOP, vqsubu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
+VAR6 (BINOP, vhsubs, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vhsubu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
 VAR3 (BINOP, vsubhn, v8hi, v4si, v2di),
+VAR3 (BINOP, vrsubhn, v8hi, v4si, v2di),
 VAR8 (BINOP, vceq, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
 VAR8 (BINOP, vcge, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
 VAR6 (BINOP, vcgeu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
@@ -67,17 +108,36 @@ VAR6 (BINOP, vcgtu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
 VAR2 (BINOP, vcage, v2sf, v4sf),
 VAR2 (BINOP, vcagt, v2sf, v4sf),
 VAR6 (BINOP, vtst, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR8 (BINOP, vabd, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR3 (BINOP, vabdl, v8qi, v4hi, v2si),
-VAR6 (TERNOP, vaba, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR3 (TERNOP, vabal, v8qi, v4hi, v2si),
-VAR8 (BINOP, vmax, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR8 (BINOP, vmin, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
+VAR6 (BINOP, vabds, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vabdu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR2 (BINOP, vabdf, v2sf, v4sf),
+VAR3 (BINOP, vabdls, v8qi, v4hi, v2si),
+VAR3 (BINOP, vabdlu, v8qi, v4hi, v2si),
+
+VAR6 (TERNOP, vabas, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (TERNOP, vabau, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR3 (TERNOP, vabals, v8qi, v4hi, v2si),
+VAR3 (TERNOP, vabalu, v8qi, v4hi, v2si),
+
+VAR6 (BINOP, vmaxs, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vmaxu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR2 (BINOP, vmaxf, v2sf, v4sf),
+VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR2 (BINOP, vminf, v2sf, v4sf),
+
+VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si),
+VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si),
+VAR1 (BINOP, vpmaxf, v2sf),
+VAR3 (BINOP, vpmins, v8qi, v4hi, v2si),
+VAR3 (BINOP, vpminu, v8qi, v4hi, v2si),
+VAR1 (BINOP, vpminf, v2sf),
+
 VAR4 (BINOP, vpadd, v8qi, v4hi, v2si, v2sf),
-VAR6 (UNOP, vpaddl, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vpadal, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR4 (BINOP, vpmax, v8qi, v4hi, v2si, v2sf),
-VAR4 (BINOP, vpmin, v8qi, v4hi, v2si, v2sf),
+VAR6 (UNOP, vpaddls, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (UNOP, vpaddlu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vpadals, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR6 (BINOP, vpadalu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
 VAR2 (BINOP, vrecps, v2sf, v4sf),
 VAR2 (BINOP, vrsqrts, v2sf, v4sf),
 VAR8 (SHIFTINSERT, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
@@ -96,41 +156,50 @@ VAR6 (UNOP, vmvn, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
   /* FIXME: vget_lane supports more variants than this!  */
 VAR10 (GETLANE, vget_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
+VAR6 (GETLANE, vget_laneu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
 VAR10 (SETLANE, vset_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
 VAR5 (CREATE, vcreate, v8qi, v4hi, v2si, v2sf, di),
 VAR10 (DUP, vdup_n,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR10 (DUPLANE, vdup_lane,
+VAR10 (BINOP, vdup_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
 VAR5 (COMBINE, vcombine, v8qi, v4hi, v2si, v2sf, di),
 VAR5 (SPLIT, vget_high, v16qi, v8hi, v4si, v4sf, v2di),
 VAR5 (SPLIT, vget_low, v16qi, v8hi, v4si, v4sf, v2di),
 VAR3 (UNOP, vmovn, v8hi, v4si, v2di),
-VAR3 (UNOP, vqmovn, v8hi, v4si, v2di),
+VAR3 (UNOP, vqmovns, v8hi, v4si, v2di),
+VAR3 (UNOP, vqmovnu, v8hi, v4si, v2di),
 VAR3 (UNOP, vqmovun, v8hi, v4si, v2di),
-VAR3 (UNOP, vmovl, v8qi, v4hi, v2si),
+VAR3 (UNOP, vmovls, v8qi, v4hi, v2si),
+VAR3 (UNOP, vmovlu, v8qi, v4hi, v2si),
 VAR6 (LANEMUL, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
 VAR6 (LANEMAC, vmla_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR2 (LANEMAC, vmlal_lane, v4hi, v2si),
+VAR2 (LANEMAC, vmlals_lane, v4hi, v2si),
+VAR2 (LANEMAC, vmlalu_lane, v4hi, v2si),
 VAR2 (LANEMAC, vqdmlal_lane, v4hi, v2si),
 VAR6 (LANEMAC, vmls_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR2 (LANEMAC, vmlsl_lane, v4hi, v2si),
+VAR2 (LANEMAC, vmlsls_lane, v4hi, v2si),
+VAR2 (LANEMAC, vmlslu_lane, v4hi, v2si),
 VAR2 (LANEMAC, vqdmlsl_lane, v4hi, v2si),
 VAR6 (SCALARMUL, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
 VAR6 (SCALARMAC, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR2 (SCALARMAC, vmlal_n, v4hi, v2si),
+VAR2 (SCALARMAC, vmlals_n, v4hi, v2si),
+VAR2 (SCALARMAC, vmlalu_n, v4hi, v2si),
 VAR2 (SCALARMAC, vqdmlal_n, v4hi, v2si),
 VAR6 (SCALARMAC, vmls_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR2 (SCALARMAC, vmlsl_n, v4hi, v2si),
+VAR2 (SCALARMAC, vmlsls_n, v4hi, v2si),
+VAR2 (SCALARMAC, vmlslu_n, v4hi, v2si),
 VAR2 (SCALARMAC, vqdmlsl_n, v4hi, v2si),
-VAR10 (BINOP, vext,
+VAR10 (SHIFTINSERT, vext,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
 VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
 VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi),
 VAR2 (UNOP, vrev16, v8qi, v16qi),
-VAR4 (CONVERT, vcvt, v2si, v2sf, v4si, v4sf),
-VAR4 (FIXCONV, vcvt_n, v2si, v2sf, v4si, v4sf),
+VAR4 (CONVERT, vcvts, v2si, v2sf, v4si, v4sf),
+VAR4 (CONVERT, vcvtu, v2si, v2sf, v4si, v4sf),
+VAR4 (FIXCONV, vcvts_n, v2si, v2sf, v4si, v4sf),
+VAR4 (FIXCONV, vcvtu_n, v2si, v2sf, v4si, v4sf),
 VAR1 (FLOAT_WIDEN, vcvtv4sf, v4hf),
 VAR1 (FLOAT_NARROW, vcvtv4hf, v4sf),
 VAR10 (SELECT, vbsl,
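
To make the shape of the builtins.def change concrete: every entry that
used to take a "magic word" becomes one distinctly named builtin per
signed/unsigned/poly/float variant. The knock-on change to the arm_neon.h
wrappers looks like the sketch below; I'm assuming the usual <name><mode>
builtin naming here, so treat the exact spellings as illustrative.

  /* Before: one builtin, with a magic third operand selecting
     signedness at expand time.  */
  __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
  vaddl_s8 (int8x8_t __a, int8x8_t __b)
  {
    return (int16x8_t)__builtin_neon_vaddlv8qi (__a, __b, 1);
  }

  /* After: signedness is encoded in the builtin name instead.  */
  __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
  vaddl_s8 (int8x8_t __a, int8x8_t __b)
  {
    return (int16x8_t)__builtin_neon_vaddlsv8qi (__a, __b);
  }
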
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 021372a107a6deb9cf56abe092805b38ae22be05..bf0329a3c0f28e4282fbb5922f71816342cd60d9 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -169,6 +169,9 @@ (define_mode_iterator QADDSUB [V4QQ V2HQ
 
 (define_mode_iterator QMUL [HQ HA])
 
+;; Modes for polynomial or float values.
+(define_mode_iterator VPF [V8QI V16QI V2SF V4SF])
+
 ;;----------------------------------------------------------------------------
 ;; Code iterators
 ;;----------------------------------------------------------------------------
@@ -225,6 +228,92 @@ (define_int_iterator NEON_VRINT [UNSPEC_
 
 (define_int_iterator NEON_VCVT [UNSPEC_NVRINTP UNSPEC_NVRINTM UNSPEC_NVRINTA])
 
+(define_int_iterator VADDL [UNSPEC_VADDL_S UNSPEC_VADDL_U])
+
+(define_int_iterator VADDW [UNSPEC_VADDW_S UNSPEC_VADDW_U])
+
+(define_int_iterator VHADD [UNSPEC_VRHADD_S UNSPEC_VRHADD_U
+			    UNSPEC_VHADD_S UNSPEC_VHADD_U])
+
+(define_int_iterator VQADD [UNSPEC_VQADD_S UNSPEC_VQADD_U])
+
+(define_int_iterator VADDHN [UNSPEC_VADDHN UNSPEC_VRADDHN])
+
+(define_int_iterator VMLAL [UNSPEC_VMLAL_S UNSPEC_VMLAL_U])
+
+(define_int_iterator VMLAL_LANE [UNSPEC_VMLAL_S_LANE UNSPEC_VMLAL_U_LANE])
+
+(define_int_iterator VMLSL [UNSPEC_VMLSL_S UNSPEC_VMLSL_U])
+
+(define_int_iterator VMLSL_LANE [UNSPEC_VMLSL_S_LANE UNSPEC_VMLSL_U_LANE])
+
+(define_int_iterator VQDMULH [UNSPEC_VQDMULH UNSPEC_VQRDMULH])
+
+(define_int_iterator VQDMULH_LANE [UNSPEC_VQDMULH_LANE UNSPEC_VQRDMULH_LANE])
+
+(define_int_iterator VMULL [UNSPEC_VMULL_S UNSPEC_VMULL_U UNSPEC_VMULL_P])
+
+(define_int_iterator VMULL_LANE [UNSPEC_VMULL_S_LANE UNSPEC_VMULL_U_LANE])
+
+(define_int_iterator VSUBL [UNSPEC_VSUBL_S UNSPEC_VSUBL_U])
+
+(define_int_iterator VSUBW [UNSPEC_VSUBW_S UNSPEC_VSUBW_U])
+
+(define_int_iterator VHSUB [UNSPEC_VHSUB_S UNSPEC_VHSUB_U])
+
+(define_int_iterator VQSUB [UNSPEC_VQSUB_S UNSPEC_VQSUB_U])
+
+(define_int_iterator VSUBHN [UNSPEC_VSUBHN UNSPEC_VRSUBHN])
+
+(define_int_iterator VABD [UNSPEC_VABD_S UNSPEC_VABD_U])
+
+(define_int_iterator VABDL [UNSPEC_VABDL_S UNSPEC_VABDL_U])
+
+(define_int_iterator VMAXMIN [UNSPEC_VMAX UNSPEC_VMAX_U
+			      UNSPEC_VMIN UNSPEC_VMIN_U])
+
+(define_int_iterator VMAXMINF [UNSPEC_VMAX UNSPEC_VMIN])
+
+(define_int_iterator VPADDL [UNSPEC_VPADDL_S UNSPEC_VPADDL_U])
+
+(define_int_iterator VPADAL [UNSPEC_VPADAL_S UNSPEC_VPADAL_U])
+
+(define_int_iterator VPMAXMIN [UNSPEC_VPMAX UNSPEC_VPMAX_U
+			       UNSPEC_VPMIN UNSPEC_VPMIN_U])
+
+(define_int_iterator VPMAXMINF [UNSPEC_VPMAX UNSPEC_VPMIN])
+
+(define_int_iterator VCVT_US [UNSPEC_VCVT_S UNSPEC_VCVT_U])
+
+(define_int_iterator VCVT_US_N [UNSPEC_VCVT_S_N UNSPEC_VCVT_U_N])
+
+(define_int_iterator VQMOVN [UNSPEC_VQMOVN_S UNSPEC_VQMOVN_U])
+
+(define_int_iterator VMOVL [UNSPEC_VMOVL_S UNSPEC_VMOVL_U])
+
+(define_int_iterator VSHL [UNSPEC_VSHL_S UNSPEC_VSHL_U
+			   UNSPEC_VRSHL_S UNSPEC_VRSHL_U])
+
+(define_int_iterator VQSHL [UNSPEC_VQSHL_S UNSPEC_VQSHL_U
+			    UNSPEC_VQRSHL_S UNSPEC_VQRSHL_U])
+
+(define_int_iterator VSHR_N [UNSPEC_VSHR_S_N UNSPEC_VSHR_U_N
+			     UNSPEC_VRSHR_S_N UNSPEC_VRSHR_U_N])
+
+(define_int_iterator VSHRN_N [UNSPEC_VSHRN_N UNSPEC_VRSHRN_N])
+
+(define_int_iterator VQSHRN_N [UNSPEC_VQSHRN_S_N UNSPEC_VQSHRN_U_N
+			       UNSPEC_VQRSHRN_S_N UNSPEC_VQRSHRN_U_N])
+
+(define_int_iterator VQSHRUN_N [UNSPEC_VQSHRUN_N UNSPEC_VQRSHRUN_N])
+
+(define_int_iterator VQSHL_N [UNSPEC_VQSHL_S_N UNSPEC_VQSHL_U_N])
+
+(define_int_iterator VSHLL_N [UNSPEC_VSHLL_S_N UNSPEC_VSHLL_U_N])
+
+(define_int_iterator VSRA_N [UNSPEC_VSRA_S_N UNSPEC_VSRA_U_N
+			     UNSPEC_VRSRA_S_N UNSPEC_VRSRA_U_N])
+
 (define_int_iterator CRC [UNSPEC_CRC32B UNSPEC_CRC32H UNSPEC_CRC32W
                           UNSPEC_CRC32CB UNSPEC_CRC32CH UNSPEC_CRC32CW])
 
@@ -504,6 +593,8 @@ (define_mode_attr q [(V8QI "") (V16QI "_
                      (DI "")   (V2DI "_q")
                      (DF "")   (V2DF "_q")])
 
+(define_mode_attr pf [(V8QI "p") (V16QI "p") (V2SF "f") (V4SF "f")])
+
 ;;----------------------------------------------------------------------------
 ;; Code attributes
 ;;----------------------------------------------------------------------------
@@ -541,6 +632,82 @@ (define_code_attr shifttype [(ashiftrt "
 ;; Int attributes
 ;;----------------------------------------------------------------------------
 
+;; Mapping between vector UNSPEC operations and the signed ('s'),
+;; unsigned ('u'), poly ('p') or float ('f') nature of their data type.
+(define_int_attr sup [
+  (UNSPEC_VADDL_S "s") (UNSPEC_VADDL_U "u")
+  (UNSPEC_VADDW_S "s") (UNSPEC_VADDW_U "u")
+  (UNSPEC_VRHADD_S "s") (UNSPEC_VRHADD_U "u")
+  (UNSPEC_VHADD_S "s") (UNSPEC_VHADD_U "u")
+  (UNSPEC_VQADD_S "s") (UNSPEC_VQADD_U "u")
+  (UNSPEC_VMLAL_S "s") (UNSPEC_VMLAL_U "u")
+  (UNSPEC_VMLAL_S_LANE "s") (UNSPEC_VMLAL_U_LANE "u")
+  (UNSPEC_VMLSL_S "s") (UNSPEC_VMLSL_U "u")
+  (UNSPEC_VMLSL_S_LANE "s") (UNSPEC_VMLSL_U_LANE "u")
+  (UNSPEC_VMULL_S "s") (UNSPEC_VMULL_U "u") (UNSPEC_VMULL_P "p")
+  (UNSPEC_VMULL_S_LANE "s") (UNSPEC_VMULL_U_LANE "u")
+  (UNSPEC_VSUBL_S "s") (UNSPEC_VSUBL_U "u")
+  (UNSPEC_VSUBW_S "s") (UNSPEC_VSUBW_U "u")
+  (UNSPEC_VHSUB_S "s") (UNSPEC_VHSUB_U "u")
+  (UNSPEC_VQSUB_S "s") (UNSPEC_VQSUB_U "u")
+  (UNSPEC_VABD_S "s") (UNSPEC_VABD_U "u")
+  (UNSPEC_VABDL_S "s") (UNSPEC_VABDL_U "u")
+  (UNSPEC_VMAX "s") (UNSPEC_VMAX_U "u")
+  (UNSPEC_VMIN "s") (UNSPEC_VMIN_U "u")
+  (UNSPEC_VPADDL_S "s") (UNSPEC_VPADDL_U "u")
+  (UNSPEC_VPADAL_S "s") (UNSPEC_VPADAL_U "u")
+  (UNSPEC_VPMAX "s") (UNSPEC_VPMAX_U "u")
+  (UNSPEC_VPMIN "s") (UNSPEC_VPMIN_U "u")
+  (UNSPEC_VCVT_S "s") (UNSPEC_VCVT_U "u")
+  (UNSPEC_VCVT_S_N "s") (UNSPEC_VCVT_U_N "u")
+  (UNSPEC_VQMOVN_S "s") (UNSPEC_VQMOVN_U "u")
+  (UNSPEC_VMOVL_S "s") (UNSPEC_VMOVL_U "u")
+  (UNSPEC_VSHL_S "s") (UNSPEC_VSHL_U "u")
+  (UNSPEC_VRSHL_S "s") (UNSPEC_VRSHL_U "u")
+  (UNSPEC_VQSHL_S "s") (UNSPEC_VQSHL_U "u")
+  (UNSPEC_VQRSHL_S "s") (UNSPEC_VQRSHL_U "u")
+  (UNSPEC_VSHR_S_N "s") (UNSPEC_VSHR_U_N "u")
+  (UNSPEC_VRSHR_S_N "s") (UNSPEC_VRSHR_U_N "u")
+  (UNSPEC_VQSHRN_S_N "s") (UNSPEC_VQSHRN_U_N "u")
+  (UNSPEC_VQRSHRN_S_N "s") (UNSPEC_VQRSHRN_U_N "u")
+  (UNSPEC_VQSHL_S_N "s") (UNSPEC_VQSHL_U_N "u")
+  (UNSPEC_VSHLL_S_N "s") (UNSPEC_VSHLL_U_N "u")
+  (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
+  (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
+
+])
+
+(define_int_attr r [
+  (UNSPEC_VRHADD_S "r") (UNSPEC_VRHADD_U "r")
+  (UNSPEC_VHADD_S "") (UNSPEC_VHADD_U "")
+  (UNSPEC_VADDHN "") (UNSPEC_VRADDHN "r")
+  (UNSPEC_VQDMULH "") (UNSPEC_VQRDMULH "r")
+  (UNSPEC_VQDMULH_LANE "") (UNSPEC_VQRDMULH_LANE "r")
+  (UNSPEC_VSUBHN "") (UNSPEC_VRSUBHN "r")
+])
+
+(define_int_attr maxmin [
+  (UNSPEC_VMAX "max") (UNSPEC_VMAX_U "max")
+  (UNSPEC_VMIN "min") (UNSPEC_VMIN_U "min")
+  (UNSPEC_VPMAX "max") (UNSPEC_VPMAX_U "max")
+  (UNSPEC_VPMIN "min") (UNSPEC_VPMIN_U "min")
+])
+
+(define_int_attr shift_op [
+  (UNSPEC_VSHL_S "shl") (UNSPEC_VSHL_U "shl")
+  (UNSPEC_VRSHL_S "rshl") (UNSPEC_VRSHL_U "rshl")
+  (UNSPEC_VQSHL_S "qshl") (UNSPEC_VQSHL_U "qshl")
+  (UNSPEC_VQRSHL_S "qrshl") (UNSPEC_VQRSHL_U "qrshl")
+  (UNSPEC_VSHR_S_N "shr") (UNSPEC_VSHR_U_N "shr")
+  (UNSPEC_VRSHR_S_N "rshr") (UNSPEC_VRSHR_U_N "rshr")
+  (UNSPEC_VSHRN_N "shrn") (UNSPEC_VRSHRN_N "rshrn")
+  (UNSPEC_VQRSHRN_S_N "qrshrn") (UNSPEC_VQRSHRN_U_N "qrshrn")
+  (UNSPEC_VQSHRN_S_N "qshrn") (UNSPEC_VQSHRN_U_N "qshrn")
+  (UNSPEC_VQSHRUN_N "qshrun") (UNSPEC_VQRSHRUN_N "qrshrun")
+  (UNSPEC_VSRA_S_N "sra") (UNSPEC_VSRA_U_N "sra")
+  (UNSPEC_VRSRA_S_N "rsra") (UNSPEC_VRSRA_U_N "rsra")
+])
+
 ;; Standard names for floating point to integral rounding instructions.
 (define_int_attr vrint_pattern [(UNSPEC_VRINTZ "btrunc") (UNSPEC_VRINTP "ceil")
                          (UNSPEC_VRINTA "round") (UNSPEC_VRINTM "floor")
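
On the new iterators: each int iterator stands for a family of variants,
so one parameterised pattern now replaces a pattern-plus-flag. For
example, neon_vaddl<sup><mode> instantiates both neon_vaddlsv8qi and
neon_vaddluv8qi, with <sup> supplying the 's' or 'u' in the mnemonic and
<r> likewise selecting the rounding forms. At the intrinsics level the
pair served by that single pattern is sketched below (the instruction
comments reflect my reading of the patch rather than tested output):

  #include <arm_neon.h>

  int16x8_t widen_add_signed (int8x8_t a, int8x8_t b)
  {
    return vaddl_s8 (a, b);     /* neon_vaddlsv8qi: vaddl.s8  */
  }

  uint16x8_t widen_add_unsigned (uint8x8_t a, uint8x8_t b)
  {
    return vaddl_u8 (a, b);     /* neon_vaddluv8qi: vaddl.u8  */
  }
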
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index e7f5abe5aec135e8656b711d98539c685a4f7742..22318de6d7b1a951117909460afdd38c05cf7442 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1627,17 +1627,14 @@ (define_expand "vcond<mode><mode>"
 	  (match_operand:VDQW 2 "s_register_operand" "")))]
   "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
 {
-  HOST_WIDE_INT magic_word = (<MODE>mode == V2SFmode || <MODE>mode == V4SFmode)
-			     ? 3 : 1;
-  rtx magic_rtx = GEN_INT (magic_word);
   int inverse = 0;
   int use_zero_form = 0;
   int swap_bsl_operands = 0;
   rtx mask = gen_reg_rtx (<V_cmp_result>mode);
   rtx tmp = gen_reg_rtx (<V_cmp_result>mode);
 
-  rtx (*base_comparison) (rtx, rtx, rtx, rtx);
-  rtx (*complimentary_comparison) (rtx, rtx, rtx, rtx);
+  rtx (*base_comparison) (rtx, rtx, rtx);
+  rtx (*complimentary_comparison) (rtx, rtx, rtx);
 
   switch (GET_CODE (operands[3]))
     {
@@ -1724,9 +1721,9 @@ (define_expand "vcond<mode><mode>"
 	}
 
       if (!inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5], magic_rtx));
+	emit_insn (base_comparison (mask, operands[4], operands[5]));
       else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4], magic_rtx));
+	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
       break;
     case UNLT:
     case UNLE:
@@ -1746,9 +1743,9 @@ (define_expand "vcond<mode><mode>"
 	 a NE b -> !(a EQ b)  */
 
       if (inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5], magic_rtx));
+	emit_insn (base_comparison (mask, operands[4], operands[5]));
       else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4], magic_rtx));
+	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
 
       swap_bsl_operands = 1;
       break;
@@ -1757,8 +1754,8 @@ (define_expand "vcond<mode><mode>"
 	 true iff !(a != b && a ORDERED b), swapping the operands to BSL
 	 will then give us (a == b ||  a UNORDERED b) as intended.  */
 
-      emit_insn (gen_neon_vcgt<mode> (mask, operands[4], operands[5], magic_rtx));
-      emit_insn (gen_neon_vcgt<mode> (tmp, operands[5], operands[4], magic_rtx));
+      emit_insn (gen_neon_vcgt<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_neon_vcgt<mode> (tmp, operands[5], operands[4]));
       emit_insn (gen_ior<v_cmp_result>3 (mask, mask, tmp));
       swap_bsl_operands = 1;
       break;
@@ -1768,8 +1765,8 @@ (define_expand "vcond<mode><mode>"
      swap_bsl_operands = 1;
      /* Fall through.  */
     case ORDERED:
-      emit_insn (gen_neon_vcgt<mode> (tmp, operands[4], operands[5], magic_rtx));
-      emit_insn (gen_neon_vcge<mode> (mask, operands[5], operands[4], magic_rtx));
+      emit_insn (gen_neon_vcgt<mode> (tmp, operands[4], operands[5]));
+      emit_insn (gen_neon_vcge<mode> (mask, operands[5], operands[4]));
       emit_insn (gen_ior<v_cmp_result>3 (mask, mask, tmp));
       break;
     default:
@@ -1808,41 +1805,33 @@ (define_expand "vcondu<mode><mode>"
   switch (GET_CODE (operands[3]))
     {
     case GEU:
-      emit_insn (gen_neon_vcge<mode> (mask, operands[4], operands[5],
-				      const0_rtx));
+      emit_insn (gen_neon_vcgeu<mode> (mask, operands[4], operands[5]));
       break;
     
     case GTU:
-      emit_insn (gen_neon_vcgt<mode> (mask, operands[4], operands[5],
-				      const0_rtx));
+      emit_insn (gen_neon_vcgtu<mode> (mask, operands[4], operands[5]));
       break;
     
     case EQ:
-      emit_insn (gen_neon_vceq<mode> (mask, operands[4], operands[5],
-				      const0_rtx));
+      emit_insn (gen_neon_vceq<mode> (mask, operands[4], operands[5]));
       break;
     
     case LEU:
       if (immediate_zero)
-	emit_insn (gen_neon_vcle<mode> (mask, operands[4], operands[5],
-					const0_rtx));
+	emit_insn (gen_neon_vcle<mode> (mask, operands[4], operands[5]));
       else
-	emit_insn (gen_neon_vcge<mode> (mask, operands[5], operands[4],
-					const0_rtx));
+	emit_insn (gen_neon_vcgeu<mode> (mask, operands[5], operands[4]));
       break;
     
     case LTU:
       if (immediate_zero)
-        emit_insn (gen_neon_vclt<mode> (mask, operands[4], operands[5],
-					const0_rtx));
+        emit_insn (gen_neon_vclt<mode> (mask, operands[4], operands[5]));
       else
-	emit_insn (gen_neon_vcgt<mode> (mask, operands[5], operands[4],
-					const0_rtx));
+	emit_insn (gen_neon_vcgtu<mode> (mask, operands[5], operands[4]));
       break;
     
     case NE:
-      emit_insn (gen_neon_vceq<mode> (mask, operands[4], operands[5],
-				      const0_rtx));
+      emit_insn (gen_neon_vceq<mode> (mask, operands[4], operands[5]));
       inverse = 1;
       break;
     
@@ -1867,8 +1856,7 @@ (define_expand "vcondu<mode><mode>"
 (define_expand "neon_vadd<mode>"
   [(match_operand:VCVTF 0 "s_register_operand" "=w")
    (match_operand:VCVTF 1 "s_register_operand" "w")
-   (match_operand:VCVTF 2 "s_register_operand" "w")
-   (match_operand:SI 3 "immediate_operand" "i")]
+   (match_operand:VCVTF 2 "s_register_operand" "w")]
   "TARGET_NEON"
 {
   if (!<Is_float_mode> || flag_unsafe_math_optimizations)
@@ -1904,77 +1892,66 @@ (define_insn "neon_vadd<mode>_unspec"
                     (const_string "neon_add<q>")))]
 )
 
-; operand 3 represents in bits:
-;  bit 0: signed (vs unsigned).
-;  bit 1: rounding (vs none).
-
-(define_insn "neon_vaddl<mode>"
+(define_insn "neon_vaddl<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:VDI 1 "s_register_operand" "w")
-		           (match_operand:VDI 2 "s_register_operand" "w")
-                           (match_operand:SI 3 "immediate_operand" "i")]
-                          UNSPEC_VADDL))]
+		           (match_operand:VDI 2 "s_register_operand" "w")]
+                          VADDL))]
   "TARGET_NEON"
-  "vaddl.%T3%#<V_sz_elem>\t%q0, %P1, %P2"
+  "vaddl.<sup>%#<V_sz_elem>\t%q0, %P1, %P2"
   [(set_attr "type" "neon_add_long")]
 )
 
-(define_insn "neon_vaddw<mode>"
+(define_insn "neon_vaddw<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "w")
-		           (match_operand:VDI 2 "s_register_operand" "w")
-                           (match_operand:SI 3 "immediate_operand" "i")]
-                          UNSPEC_VADDW))]
+		           (match_operand:VDI 2 "s_register_operand" "w")]
+                          VADDW))]
   "TARGET_NEON"
-  "vaddw.%T3%#<V_sz_elem>\t%q0, %q1, %P2"
+  "vaddw.<sup>%#<V_sz_elem>\t%q0, %q1, %P2"
   [(set_attr "type" "neon_add_widen")]
 )
 
 ; vhadd and vrhadd.
 
-(define_insn "neon_vhadd<mode>"
+(define_insn "neon_v<r>hadd<sup><mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
-		       (match_operand:VDQIW 2 "s_register_operand" "w")
-		       (match_operand:SI 3 "immediate_operand" "i")]
-		      UNSPEC_VHADD))]
+		       (match_operand:VDQIW 2 "s_register_operand" "w")]
+		      VHADD))]
   "TARGET_NEON"
-  "v%O3hadd.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "v<r>hadd.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_add_halve_q")]
 )
 
-(define_insn "neon_vqadd<mode>"
+(define_insn "neon_vqadd<sup><mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
         (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w")
-		       (match_operand:VDQIX 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
-                     UNSPEC_VQADD))]
+		       (match_operand:VDQIX 2 "s_register_operand" "w")]
+                     VQADD))]
   "TARGET_NEON"
-  "vqadd.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "vqadd.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_qadd<q>")]
 )
 
-(define_insn "neon_vaddhn<mode>"
+(define_insn "neon_v<r>addhn<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
         (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")
-		            (match_operand:VN 2 "s_register_operand" "w")
-                            (match_operand:SI 3 "immediate_operand" "i")]
-                           UNSPEC_VADDHN))]
+		            (match_operand:VN 2 "s_register_operand" "w")]
+                           VADDHN))]
   "TARGET_NEON"
-  "v%O3addhn.<V_if_elem>\t%P0, %q1, %q2"
+  "v<r>addhn.<V_if_elem>\t%P0, %q1, %q2"
   [(set_attr "type" "neon_add_halve_narrow_q")]
 )
 
-;; We cannot replace this unspec with mul<mode>3 because of the odd 
-;; polynomial multiplication case that can specified by operand 3.
-(define_insn "neon_vmul<mode>"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=w")
-        (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "w")
-		      (match_operand:VDQW 2 "s_register_operand" "w")
-		      (match_operand:SI 3 "immediate_operand" "i")]
+;; Polynomial and float multiplication.
+(define_insn "neon_vmul<pf><mode>"
+  [(set (match_operand:VPF 0 "s_register_operand" "=w")
+        (unspec:VPF [(match_operand:VPF 1 "s_register_operand" "w")
+		      (match_operand:VPF 2 "s_register_operand" "w")]
 		     UNSPEC_VMUL))]
   "TARGET_NEON"
-  "vmul.%F3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "vmul.<pf>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
                     (const_string "neon_fp_mul_s<q>")
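
On why vmul keeps its unspec: the polynomial variant is a carry-less
multiply over GF(2)[x], not the arithmetic multiply a mul<mode>3
standard pattern would promise, so the pattern stays an unspec, now
restricted to the poly/float VPF modes. A small sketch of the
difference:

  #include <arm_neon.h>

  poly8x8_t poly_mul (poly8x8_t a, poly8x8_t b)
  {
    /* vmul.p8: per lane, 3 * 3 is (x + 1) * (x + 1) = x^2 + 1 = 5,
       not 9, so this must not match a generic multiply pattern.  */
    return vmul_p8 (a, b);
  }
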
@@ -1985,8 +1962,7 @@ (define_expand "neon_vmla<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "=w")
    (match_operand:VDQW 1 "s_register_operand" "0")
    (match_operand:VDQW 2 "s_register_operand" "w")
-   (match_operand:VDQW 3 "s_register_operand" "w")
-   (match_operand:SI 4 "immediate_operand" "i")]
+   (match_operand:VDQW 3 "s_register_operand" "w")]
   "TARGET_NEON"
 {
   if (!<Is_float_mode> || flag_unsafe_math_optimizations)
@@ -2002,8 +1978,7 @@ (define_expand "neon_vfma<VCVTF:mode>"
   [(match_operand:VCVTF 0 "s_register_operand")
    (match_operand:VCVTF 1 "s_register_operand")
    (match_operand:VCVTF 2 "s_register_operand")
-   (match_operand:VCVTF 3 "s_register_operand")
-   (match_operand:SI 4 "immediate_operand")]
+   (match_operand:VCVTF 3 "s_register_operand")]
   "TARGET_NEON && TARGET_FMA"
 {
   emit_insn (gen_fma<mode>4_intrinsic (operands[0], operands[2], operands[3],
@@ -2015,8 +1990,7 @@ (define_expand "neon_vfms<VCVTF:mode>"
   [(match_operand:VCVTF 0 "s_register_operand")
    (match_operand:VCVTF 1 "s_register_operand")
    (match_operand:VCVTF 2 "s_register_operand")
-   (match_operand:VCVTF 3 "s_register_operand")
-   (match_operand:SI 4 "immediate_operand")]
+   (match_operand:VCVTF 3 "s_register_operand")]
   "TARGET_NEON && TARGET_FMA"
 {
   emit_insn (gen_fmsub<mode>4_intrinsic (operands[0], operands[2], operands[3],
@@ -2040,15 +2014,14 @@ (define_insn "neon_vmla<mode>_unspec"
                     (const_string "neon_mla_<V_elem_ch><q>")))]
 )
 
-(define_insn "neon_vmlal<mode>"
+(define_insn "neon_vmlal<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0")
 		           (match_operand:VW 2 "s_register_operand" "w")
-		           (match_operand:VW 3 "s_register_operand" "w")
-                           (match_operand:SI 4 "immediate_operand" "i")]
-                          UNSPEC_VMLAL))]
+		           (match_operand:VW 3 "s_register_operand" "w")]
+                          VMLAL))]
   "TARGET_NEON"
-  "vmlal.%T4%#<V_sz_elem>\t%q0, %P2, %P3"
+  "vmlal.<sup>%#<V_sz_elem>\t%q0, %P2, %P3"
   [(set_attr "type" "neon_mla_<V_elem_ch>_long")]
 )
 
@@ -2056,8 +2029,7 @@ (define_expand "neon_vmls<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "=w")
    (match_operand:VDQW 1 "s_register_operand" "0")
    (match_operand:VDQW 2 "s_register_operand" "w")
-   (match_operand:VDQW 3 "s_register_operand" "w")
-   (match_operand:SI 4 "immediate_operand" "i")]
+   (match_operand:VDQW 3 "s_register_operand" "w")]
   "TARGET_NEON"
 {
   if (!<Is_float_mode> || flag_unsafe_math_optimizations)
@@ -2085,26 +2057,25 @@ (define_insn "neon_vmls<mode>_unspec"
                     (const_string "neon_mla_<V_elem_ch><q>")))]
 )
 
-(define_insn "neon_vmlsl<mode>"
+(define_insn "neon_vmlsl<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0")
 		           (match_operand:VW 2 "s_register_operand" "w")
-		           (match_operand:VW 3 "s_register_operand" "w")
-                           (match_operand:SI 4 "immediate_operand" "i")]
-                          UNSPEC_VMLSL))]
+		           (match_operand:VW 3 "s_register_operand" "w")]
+                          VMLSL))]
   "TARGET_NEON"
-  "vmlsl.%T4%#<V_sz_elem>\t%q0, %P2, %P3"
+  "vmlsl.<sup>%#<V_sz_elem>\t%q0, %P2, %P3"
   [(set_attr "type" "neon_mla_<V_elem_ch>_long")]
 )
 
-(define_insn "neon_vqdmulh<mode>"
+;; vqdmulh, vqrdmulh
+(define_insn "neon_vq<r>dmulh<mode>"
   [(set (match_operand:VMDQI 0 "s_register_operand" "=w")
         (unspec:VMDQI [(match_operand:VMDQI 1 "s_register_operand" "w")
-		       (match_operand:VMDQI 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
-                      UNSPEC_VQDMULH))]
+		       (match_operand:VMDQI 2 "s_register_operand" "w")]
+                      VQDMULH))]
   "TARGET_NEON"
-  "vq%O3dmulh.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "vq<r>dmulh.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_sat_mul_<V_elem_ch><q>")]
 )
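
The <r> attribute folds each rounding/non-rounding pair into one
pattern in the same way, so neon_vq<r>dmulh<mode> above now serves both
of the following (a sketch, same caveats as before):

  #include <arm_neon.h>

  int16x4_t mul_high (int16x4_t a, int16x4_t b)
  {
    return vqdmulh_s16 (a, b);     /* vqdmulh.s16  */
  }

  int16x4_t round_mul_high (int16x4_t a, int16x4_t b)
  {
    return vqrdmulh_s16 (a, b);    /* vqrdmulh.s16 */
  }
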
 
@@ -2112,8 +2083,7 @@ (define_insn "neon_vqdmlal<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0")
 		           (match_operand:VMDI 2 "s_register_operand" "w")
-		           (match_operand:VMDI 3 "s_register_operand" "w")
-                           (match_operand:SI 4 "immediate_operand" "i")]
+		           (match_operand:VMDI 3 "s_register_operand" "w")]
                           UNSPEC_VQDMLAL))]
   "TARGET_NEON"
   "vqdmlal.<V_s_elem>\t%q0, %P2, %P3"
@@ -2124,30 +2094,27 @@ (define_insn "neon_vqdmlsl<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0")
 		           (match_operand:VMDI 2 "s_register_operand" "w")
-		           (match_operand:VMDI 3 "s_register_operand" "w")
-                           (match_operand:SI 4 "immediate_operand" "i")]
+		           (match_operand:VMDI 3 "s_register_operand" "w")]
                           UNSPEC_VQDMLSL))]
   "TARGET_NEON"
   "vqdmlsl.<V_s_elem>\t%q0, %P2, %P3"
   [(set_attr "type" "neon_sat_mla_<V_elem_ch>_long")]
 )
 
-(define_insn "neon_vmull<mode>"
+(define_insn "neon_vmull<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w")
-		           (match_operand:VW 2 "s_register_operand" "w")
-                           (match_operand:SI 3 "immediate_operand" "i")]
-                          UNSPEC_VMULL))]
+		           (match_operand:VW 2 "s_register_operand" "w")]
+                          VMULL))]
   "TARGET_NEON"
-  "vmull.%T3%#<V_sz_elem>\t%q0, %P1, %P2"
+  "vmull.<sup>%#<V_sz_elem>\t%q0, %P1, %P2"
   [(set_attr "type" "neon_mul_<V_elem_ch>_long")]
 )
 
 (define_insn "neon_vqdmull<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
-		           (match_operand:VMDI 2 "s_register_operand" "w")
-                           (match_operand:SI 3 "immediate_operand" "i")]
+		           (match_operand:VMDI 2 "s_register_operand" "w")]
                           UNSPEC_VQDMULL))]
   "TARGET_NEON"
   "vqdmull.<V_s_elem>\t%q0, %P1, %P2"
@@ -2157,8 +2124,7 @@ (define_insn "neon_vqdmull<mode>"
 (define_expand "neon_vsub<mode>"
   [(match_operand:VCVTF 0 "s_register_operand" "=w")
    (match_operand:VCVTF 1 "s_register_operand" "w")
-   (match_operand:VCVTF 2 "s_register_operand" "w")
-   (match_operand:SI 3 "immediate_operand" "i")]
+   (match_operand:VCVTF 2 "s_register_operand" "w")]
   "TARGET_NEON"
 {
   if (!<Is_float_mode> || flag_unsafe_math_optimizations)
@@ -2184,58 +2150,53 @@ (define_insn "neon_vsub<mode>_unspec"
                     (const_string "neon_sub<q>")))]
 )
 
-(define_insn "neon_vsubl<mode>"
+(define_insn "neon_vsubl<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:VDI 1 "s_register_operand" "w")
-		           (match_operand:VDI 2 "s_register_operand" "w")
-                           (match_operand:SI 3 "immediate_operand" "i")]
-                          UNSPEC_VSUBL))]
+		           (match_operand:VDI 2 "s_register_operand" "w")]
+                          VSUBL))]
   "TARGET_NEON"
-  "vsubl.%T3%#<V_sz_elem>\t%q0, %P1, %P2"
+  "vsubl.<sup>%#<V_sz_elem>\t%q0, %P1, %P2"
   [(set_attr "type" "neon_sub_long")]
 )
 
-(define_insn "neon_vsubw<mode>"
+(define_insn "neon_vsubw<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "w")
-		           (match_operand:VDI 2 "s_register_operand" "w")
-                           (match_operand:SI 3 "immediate_operand" "i")]
-			  UNSPEC_VSUBW))]
+		           (match_operand:VDI 2 "s_register_operand" "w")]
+			  VSUBW))]
   "TARGET_NEON"
-  "vsubw.%T3%#<V_sz_elem>\t%q0, %q1, %P2"
+  "vsubw.<sup>%#<V_sz_elem>\t%q0, %q1, %P2"
   [(set_attr "type" "neon_sub_widen")]
 )
 
-(define_insn "neon_vqsub<mode>"
+(define_insn "neon_vqsub<sup><mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
         (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w")
-		       (match_operand:VDQIX 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
-		      UNSPEC_VQSUB))]
+		       (match_operand:VDQIX 2 "s_register_operand" "w")]
+		      VQSUB))]
   "TARGET_NEON"
-  "vqsub.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "vqsub.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_qsub<q>")]
 )
 
-(define_insn "neon_vhsub<mode>"
+(define_insn "neon_vhsub<sup><mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
-		       (match_operand:VDQIW 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
-		      UNSPEC_VHSUB))]
+		       (match_operand:VDQIW 2 "s_register_operand" "w")]
+		      VHSUB))]
   "TARGET_NEON"
-  "vhsub.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "vhsub.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_sub_halve<q>")]
 )
 
-(define_insn "neon_vsubhn<mode>"
+(define_insn "neon_v<r>subhn<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
         (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")
-		            (match_operand:VN 2 "s_register_operand" "w")
-                            (match_operand:SI 3 "immediate_operand" "i")]
-                           UNSPEC_VSUBHN))]
+		            (match_operand:VN 2 "s_register_operand" "w")]
+                           VSUBHN))]
   "TARGET_NEON"
-  "v%O3subhn.<V_if_elem>\t%P0, %q1, %q2"
+  "v<r>subhn.<V_if_elem>\t%P0, %q1, %q2"
   [(set_attr "type" "neon_sub_halve_narrow_q")]
 )
 
@@ -2243,8 +2204,7 @@ (define_insn "neon_vceq<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
         (unspec:<V_cmp_result>
 	  [(match_operand:VDQW 1 "s_register_operand" "w,w")
-	   (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")
-	   (match_operand:SI 3 "immediate_operand" "i,i")]
+	   (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")]
           UNSPEC_VCEQ))]
   "TARGET_NEON"
   "@
@@ -2262,13 +2222,12 @@ (define_insn "neon_vcge<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
         (unspec:<V_cmp_result>
 	  [(match_operand:VDQW 1 "s_register_operand" "w,w")
-	   (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")
-	   (match_operand:SI 3 "immediate_operand" "i,i")]
+	   (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")]
           UNSPEC_VCGE))]
   "TARGET_NEON"
   "@
-  vcge.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2
-  vcge.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, #0"
+  vcge.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2
+  vcge.<V_s_elem>\t%<V_reg>0, %<V_reg>1, #0"
   [(set (attr "type")
      (if_then_else (match_test "<Is_float_mode>")
                    (const_string "neon_fp_compare_s<q>")
@@ -2281,11 +2240,10 @@ (define_insn "neon_vcgeu<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (unspec:<V_cmp_result>
 	  [(match_operand:VDQIW 1 "s_register_operand" "w")
-	   (match_operand:VDQIW 2 "s_register_operand" "w")
-           (match_operand:SI 3 "immediate_operand" "i")]
+	   (match_operand:VDQIW 2 "s_register_operand" "w")]
           UNSPEC_VCGEU))]
   "TARGET_NEON"
-  "vcge.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "vcge.u%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_compare<q>")]
 )
 
@@ -2293,13 +2251,12 @@ (define_insn "neon_vcgt<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
         (unspec:<V_cmp_result>
 	  [(match_operand:VDQW 1 "s_register_operand" "w,w")
-	   (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")
-           (match_operand:SI 3 "immediate_operand" "i,i")]
+	   (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")]
           UNSPEC_VCGT))]
   "TARGET_NEON"
   "@
-  vcgt.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2
-  vcgt.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, #0"
+  vcgt.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2
+  vcgt.<V_s_elem>\t%<V_reg>0, %<V_reg>1, #0"
   [(set (attr "type")
      (if_then_else (match_test "<Is_float_mode>")
                    (const_string "neon_fp_compare_s<q>")
@@ -2312,11 +2269,10 @@ (define_insn "neon_vcgtu<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (unspec:<V_cmp_result>
 	  [(match_operand:VDQIW 1 "s_register_operand" "w")
-	   (match_operand:VDQIW 2 "s_register_operand" "w")
-           (match_operand:SI 3 "immediate_operand" "i")]
+	   (match_operand:VDQIW 2 "s_register_operand" "w")]
           UNSPEC_VCGTU))]
   "TARGET_NEON"
-  "vcgt.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "vcgt.u%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_compare<q>")]
 )
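
On the comparison patterns: the unsigned forms are now reached through
their own expanders (gen_neon_vcgeu and gen_neon_vcgtu, as used in the
vcondu change above) rather than through a flag operand on the signed
pattern. The intrinsic-level pair this separates, as a sketch:

  #include <arm_neon.h>

  uint8x16_t cmp_ge_signed (int8x16_t a, int8x16_t b)
  {
    return vcgeq_s8 (a, b);     /* neon_vcge: vcge.s8  */
  }

  uint8x16_t cmp_ge_unsigned (uint8x16_t a, uint8x16_t b)
  {
    return vcgeq_u8 (a, b);     /* neon_vcgeu: vcge.u8 */
  }
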
 
@@ -2327,11 +2283,10 @@ (define_insn "neon_vcle<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (unspec:<V_cmp_result>
 	  [(match_operand:VDQW 1 "s_register_operand" "w")
-	   (match_operand:VDQW 2 "zero_operand" "Dz")
-	   (match_operand:SI 3 "immediate_operand" "i")]
+	   (match_operand:VDQW 2 "zero_operand" "Dz")]
           UNSPEC_VCLE))]
   "TARGET_NEON"
-  "vcle.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, #0"
+  "vcle.<V_s_elem>\t%<V_reg>0, %<V_reg>1, #0"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
                     (const_string "neon_fp_compare_s<q>")
@@ -2344,11 +2299,10 @@ (define_insn "neon_vclt<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (unspec:<V_cmp_result>
 	  [(match_operand:VDQW 1 "s_register_operand" "w")
-	   (match_operand:VDQW 2 "zero_operand" "Dz")
-	   (match_operand:SI 3 "immediate_operand" "i")]
+	   (match_operand:VDQW 2 "zero_operand" "Dz")]
           UNSPEC_VCLT))]
   "TARGET_NEON"
-  "vclt.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, #0"
+  "vclt.<V_s_elem>\t%<V_reg>0, %<V_reg>1, #0"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
                     (const_string "neon_fp_compare_s<q>")
@@ -2360,8 +2314,7 @@ (define_insn "neon_vclt<mode>"
 (define_insn "neon_vcage<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (unspec:<V_cmp_result> [(match_operand:VCVTF 1 "s_register_operand" "w")
-		                (match_operand:VCVTF 2 "s_register_operand" "w")
-                                (match_operand:SI 3 "immediate_operand" "i")]
+		                (match_operand:VCVTF 2 "s_register_operand" "w")]
                                UNSPEC_VCAGE))]
   "TARGET_NEON"
   "vacge.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
@@ -2371,8 +2324,7 @@ (define_insn "neon_vcage<mode>"
 (define_insn "neon_vcagt<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (unspec:<V_cmp_result> [(match_operand:VCVTF 1 "s_register_operand" "w")
-		                (match_operand:VCVTF 2 "s_register_operand" "w")
-                                (match_operand:SI 3 "immediate_operand" "i")]
+		                (match_operand:VCVTF 2 "s_register_operand" "w")]
                                UNSPEC_VCAGT))]
   "TARGET_NEON"
   "vacgt.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
@@ -2382,96 +2334,89 @@ (define_insn "neon_vcagt<mode>"
 (define_insn "neon_vtst<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
-		       (match_operand:VDQIW 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
+		       (match_operand:VDQIW 2 "s_register_operand" "w")]
 		      UNSPEC_VTST))]
   "TARGET_NEON"
   "vtst.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_tst<q>")]
 )
 
-(define_insn "neon_vabd<mode>"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=w")
-        (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "w")
-		      (match_operand:VDQW 2 "s_register_operand" "w")
-		      (match_operand:SI 3 "immediate_operand" "i")]
-		     UNSPEC_VABD))]
+(define_insn "neon_vabd<sup><mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+        (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
+		      (match_operand:VDQIW 2 "s_register_operand" "w")]
+		     VABD))]
   "TARGET_NEON"
-  "vabd.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set (attr "type")
-     (if_then_else (match_test "<Is_float_mode>")
-                   (const_string "neon_fp_abd_s<q>")
-                   (const_string "neon_abd<q>")))]
+  "vabd.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_abd<q>")]
+)
+
+(define_insn "neon_vabdf<mode>"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+        (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+		      (match_operand:VCVTF 2 "s_register_operand" "w")]
+		     UNSPEC_VABD_F))]
+  "TARGET_NEON"
+  "vabd.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_abd_s<q>")]
 )
 
-(define_insn "neon_vabdl<mode>"
+(define_insn "neon_vabdl<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w")
-		           (match_operand:VW 2 "s_register_operand" "w")
-                           (match_operand:SI 3 "immediate_operand" "i")]
-                          UNSPEC_VABDL))]
+		           (match_operand:VW 2 "s_register_operand" "w")]
+                          VABDL))]
   "TARGET_NEON"
-  "vabdl.%T3%#<V_sz_elem>\t%q0, %P1, %P2"
+  "vabdl.<sup>%#<V_sz_elem>\t%q0, %P1, %P2"
   [(set_attr "type" "neon_abd_long")]
 )
 
-(define_insn "neon_vaba<mode>"
+(define_insn "neon_vaba<sup><mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (plus:VDQIW (unspec:VDQIW [(match_operand:VDQIW 2 "s_register_operand" "w")
-		                   (match_operand:VDQIW 3 "s_register_operand" "w")
-                                   (match_operand:SI 4 "immediate_operand" "i")]
-		                  UNSPEC_VABD)
+		                   (match_operand:VDQIW 3 "s_register_operand" "w")]
+		                  VABD)
 		    (match_operand:VDQIW 1 "s_register_operand" "0")))]
   "TARGET_NEON"
-  "vaba.%T4%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+  "vaba.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
   [(set_attr "type" "neon_arith_acc<q>")]
 )
 
-(define_insn "neon_vabal<mode>"
+(define_insn "neon_vabal<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (plus:<V_widen> (unspec:<V_widen> [(match_operand:VW 2 "s_register_operand" "w")
-                                           (match_operand:VW 3 "s_register_operand" "w")
-                                           (match_operand:SI 4 "immediate_operand" "i")]
-					   UNSPEC_VABDL)
+                                           (match_operand:VW 3 "s_register_operand" "w")]
+					   VABDL)
 			 (match_operand:<V_widen> 1 "s_register_operand" "0")))]
   "TARGET_NEON"
-  "vabal.%T4%#<V_sz_elem>\t%q0, %P2, %P3"
+  "vabal.<sup>%#<V_sz_elem>\t%q0, %P2, %P3"
   [(set_attr "type" "neon_arith_acc<q>")]
 )
 
-(define_insn "neon_vmax<mode>"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=w")
-        (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "w")
-		      (match_operand:VDQW 2 "s_register_operand" "w")
-		      (match_operand:SI 3 "immediate_operand" "i")]
-                     UNSPEC_VMAX))]
+(define_insn "neon_v<maxmin><sup><mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+        (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
+		      (match_operand:VDQIW 2 "s_register_operand" "w")]
+                     VMAXMIN))]
   "TARGET_NEON"
-  "vmax.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set (attr "type")
-    (if_then_else (match_test "<Is_float_mode>")
-                  (const_string "neon_fp_minmax_s<q>")
-                  (const_string "neon_minmax<q>")))]
+  "v<maxmin>.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_minmax<q>")]
 )
 
-(define_insn "neon_vmin<mode>"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=w")
-        (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "w")
-		      (match_operand:VDQW 2 "s_register_operand" "w")
-		      (match_operand:SI 3 "immediate_operand" "i")]
-                     UNSPEC_VMIN))]
+(define_insn "neon_v<maxmin>f<mode>"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+        (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+		      (match_operand:VCVTF 2 "s_register_operand" "w")]
+                     VMAXMINF))]
   "TARGET_NEON"
-  "vmin.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set (attr "type")
-    (if_then_else (match_test "<Is_float_mode>")
-                  (const_string "neon_fp_minmax_s<q>")
-                  (const_string "neon_minmax<q>")))]
+  "v<maxmin>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
 (define_expand "neon_vpadd<mode>"
   [(match_operand:VD 0 "s_register_operand" "=w")
    (match_operand:VD 1 "s_register_operand" "w")
-   (match_operand:VD 2 "s_register_operand" "w")
-   (match_operand:SI 3 "immediate_operand" "i")]
+   (match_operand:VD 2 "s_register_operand" "w")]
   "TARGET_NEON"
 {
   emit_insn (gen_neon_vpadd_internal<mode> (operands[0], operands[1],
@@ -2479,60 +2424,49 @@ (define_expand "neon_vpadd<mode>"
   DONE;
 })
 
-(define_insn "neon_vpaddl<mode>"
+(define_insn "neon_vpaddl<sup><mode>"
   [(set (match_operand:<V_double_width> 0 "s_register_operand" "=w")
-        (unspec:<V_double_width> [(match_operand:VDQIW 1 "s_register_operand" "w")
-                                  (match_operand:SI 2 "immediate_operand" "i")]
-                                 UNSPEC_VPADDL))]
+        (unspec:<V_double_width> [(match_operand:VDQIW 1 "s_register_operand" "w")]
+                                 VPADDL))]
   "TARGET_NEON"
-  "vpaddl.%T2%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1"
+  "vpaddl.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1"
   [(set_attr "type" "neon_reduc_add_long")]
 )
 
-(define_insn "neon_vpadal<mode>"
+(define_insn "neon_vpadal<sup><mode>"
   [(set (match_operand:<V_double_width> 0 "s_register_operand" "=w")
         (unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0")
-                                  (match_operand:VDQIW 2 "s_register_operand" "w")
-                                  (match_operand:SI 3 "immediate_operand" "i")]
-                                 UNSPEC_VPADAL))]
+                                  (match_operand:VDQIW 2 "s_register_operand" "w")]
+                                 VPADAL))]
   "TARGET_NEON"
-  "vpadal.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
+  "vpadal.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
   [(set_attr "type" "neon_reduc_add_acc")]
 )
 
-(define_insn "neon_vpmax<mode>"
-  [(set (match_operand:VD 0 "s_register_operand" "=w")
-        (unspec:VD [(match_operand:VD 1 "s_register_operand" "w")
-		    (match_operand:VD 2 "s_register_operand" "w")
-                    (match_operand:SI 3 "immediate_operand" "i")]
-                   UNSPEC_VPMAX))]
+(define_insn "neon_vp<maxmin><sup><mode>"
+  [(set (match_operand:VDI 0 "s_register_operand" "=w")
+        (unspec:VDI [(match_operand:VDI 1 "s_register_operand" "w")
+		    (match_operand:VDI 2 "s_register_operand" "w")]
+                   VPMAXMIN))]
   "TARGET_NEON"
-  "vpmax.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set (attr "type")
-    (if_then_else (match_test "<Is_float_mode>")
-                  (const_string "neon_fp_reduc_minmax_s<q>")
-                  (const_string "neon_reduc_minmax<q>")))]
+  "vp<maxmin>.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_reduc_minmax<q>")]
 )
 
-(define_insn "neon_vpmin<mode>"
-  [(set (match_operand:VD 0 "s_register_operand" "=w")
-        (unspec:VD [(match_operand:VD 1 "s_register_operand" "w")
-		    (match_operand:VD 2 "s_register_operand" "w")
-                    (match_operand:SI 3 "immediate_operand" "i")]
-                   UNSPEC_VPMIN))]
+(define_insn "neon_vp<maxmin>f<mode>"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+        (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+		    (match_operand:VCVTF 2 "s_register_operand" "w")]
+                   VPMAXMINF))]
   "TARGET_NEON"
-  "vpmin.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set (attr "type")
-    (if_then_else (match_test "<Is_float_mode>")
-                  (const_string "neon_fp_reduc_minmax_s<q>")
-                  (const_string "neon_reduc_minmax<q>")))]
+  "vp<maxmin>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_reduc_minmax_s<q>")]
 )
 
 (define_insn "neon_vrecps<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
-		       (match_operand:VCVTF 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
+		       (match_operand:VCVTF 2 "s_register_operand" "w")]
                       UNSPEC_VRECPS))]
   "TARGET_NEON"
   "vrecps.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
@@ -2542,8 +2476,7 @@ (define_insn "neon_vrecps<mode>"
 (define_insn "neon_vrsqrts<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
-		       (match_operand:VCVTF 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
+		       (match_operand:VCVTF 2 "s_register_operand" "w")]
                       UNSPEC_VRSQRTS))]
   "TARGET_NEON"
   "vrsqrts.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
@@ -2552,8 +2485,7 @@ (define_insn "neon_vrsqrts<mode>"
 
 (define_expand "neon_vabs<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "")
-   (match_operand:VDQW 1 "s_register_operand" "")
-   (match_operand:SI 2 "immediate_operand" "")]
+   (match_operand:VDQW 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   emit_insn (gen_abs<mode>2 (operands[0], operands[1]));
@@ -2562,8 +2494,7 @@ (define_expand "neon_vabs<mode>"
 
 (define_insn "neon_vqabs<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
-	(unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
-		       (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")]
 		      UNSPEC_VQABS))]
   "TARGET_NEON"
   "vqabs.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
@@ -2580,8 +2511,7 @@ (define_insn "neon_bswap<mode>"
 
 (define_expand "neon_vneg<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "")
-   (match_operand:VDQW 1 "s_register_operand" "")
-   (match_operand:SI 2 "immediate_operand" "")]
+   (match_operand:VDQW 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   emit_insn (gen_neg<mode>2 (operands[0], operands[1]));
@@ -2617,8 +2547,7 @@ (define_expand "neon_copysignf<mode>"
 
 (define_insn "neon_vqneg<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
-	(unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
-		       (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")]
 		      UNSPEC_VQNEG))]
   "TARGET_NEON"
   "vqneg.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
@@ -2627,8 +2556,7 @@ (define_insn "neon_vqneg<mode>"
 
 (define_insn "neon_vcls<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
-	(unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
-		       (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")]
 		      UNSPEC_VCLS))]
   "TARGET_NEON"
   "vcls.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
@@ -2645,8 +2573,7 @@ (define_insn "clz<mode>2"
 
 (define_expand "neon_vclz<mode>"
   [(match_operand:VDQIW 0 "s_register_operand" "")
-   (match_operand:VDQIW 1 "s_register_operand" "")
-   (match_operand:SI 2 "immediate_operand" "")]
+   (match_operand:VDQIW 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   emit_insn (gen_clz<mode>2 (operands[0], operands[1]));
@@ -2663,8 +2590,7 @@ (define_insn "popcount<mode>2"
 
 (define_expand "neon_vcnt<mode>"
   [(match_operand:VE 0 "s_register_operand" "=w")
-   (match_operand:VE 1 "s_register_operand" "w")
-   (match_operand:SI 2 "immediate_operand" "i")]
+   (match_operand:VE 1 "s_register_operand" "w")]
   "TARGET_NEON"
 {
   emit_insn (gen_popcount<mode>2 (operands[0], operands[1]));
@@ -2673,8 +2599,7 @@ (define_expand "neon_vcnt<mode>"
 
 (define_insn "neon_vrecpe<mode>"
   [(set (match_operand:V32 0 "s_register_operand" "=w")
-	(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")
-                     (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")]
                     UNSPEC_VRECPE))]
   "TARGET_NEON"
   "vrecpe.<V_u_elem>\t%<V_reg>0, %<V_reg>1"
@@ -2683,8 +2608,7 @@ (define_insn "neon_vrecpe<mode>"
 
 (define_insn "neon_vrsqrte<mode>"
   [(set (match_operand:V32 0 "s_register_operand" "=w")
-	(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")
-                     (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")]
                     UNSPEC_VRSQRTE))]
   "TARGET_NEON"
   "vrsqrte.<V_u_elem>\t%<V_reg>0, %<V_reg>1"
@@ -2693,8 +2617,7 @@ (define_insn "neon_vrsqrte<mode>"
 
 (define_expand "neon_vmvn<mode>"
   [(match_operand:VDQIW 0 "s_register_operand" "")
-   (match_operand:VDQIW 1 "s_register_operand" "")
-   (match_operand:SI 2 "immediate_operand" "")]
+   (match_operand:VDQIW 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[1]));
@@ -2796,13 +2719,9 @@ (define_insn "neon_vget_lane<mode>_zext_
 (define_expand "neon_vget_lane<mode>"
   [(match_operand:<V_ext> 0 "s_register_operand" "")
    (match_operand:VDQW 1 "s_register_operand" "")
-   (match_operand:SI 2 "immediate_operand" "")
-   (match_operand:SI 3 "immediate_operand" "")]
+   (match_operand:SI 2 "immediate_operand" "")]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT magic = INTVAL (operands[3]);
-  rtx insn;
-
   neon_lane_bounds (operands[2], 0, GET_MODE_NUNITS (<MODE>mode));
 
   if (BYTES_BIG_ENDIAN)
@@ -2819,29 +2738,50 @@ (define_expand "neon_vget_lane<mode>"
       operands[2] = GEN_INT (elt);
     }
 
-  if ((magic & 3) == 3 || GET_MODE_BITSIZE (GET_MODE_INNER (<MODE>mode)) == 32)
-    insn = gen_vec_extract<mode> (operands[0], operands[1], operands[2]);
+  if (GET_MODE_BITSIZE (GET_MODE_INNER (<MODE>mode)) == 32)
+    emit_insn (gen_vec_extract<mode> (operands[0], operands[1], operands[2]));
   else
+    emit_insn (gen_neon_vget_lane<mode>_sext_internal (operands[0],
+						       operands[1],
+						       operands[2]));
+  DONE;
+})
+
+(define_expand "neon_vget_laneu<mode>"
+  [(match_operand:<V_ext> 0 "s_register_operand" "")
+   (match_operand:VDQIW 1 "s_register_operand" "")
+   (match_operand:SI 2 "immediate_operand" "")]
+  "TARGET_NEON"
+{
+  neon_lane_bounds (operands[2], 0, GET_MODE_NUNITS (<MODE>mode));
+
+  if (BYTES_BIG_ENDIAN)
     {
-      if ((magic & 1) != 0)
-	insn = gen_neon_vget_lane<mode>_sext_internal (operands[0], operands[1],
-						       operands[2]);
-      else
-	insn = gen_neon_vget_lane<mode>_zext_internal (operands[0], operands[1],
-						       operands[2]);
+      /* The intrinsics are defined in terms of a model where the
+	 element ordering in memory is vldm order, whereas the generic
+	 RTL is defined in terms of a model where the element ordering
+	 in memory is array order.  Convert the lane number to conform
+	 to this model.  */
+      unsigned int elt = INTVAL (operands[2]);
+      unsigned int reg_nelts
+	= 64 / GET_MODE_BITSIZE (GET_MODE_INNER (<MODE>mode));
+      elt ^= reg_nelts - 1;
+      operands[2] = GEN_INT (elt);
     }
-  emit_insn (insn);
+
+  if (GET_MODE_BITSIZE (GET_MODE_INNER (<MODE>mode)) == 32)
+    emit_insn (gen_vec_extract<mode> (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vget_lane<mode>_zext_internal (operands[0],
+						       operands[1],
+						       operands[2]));
   DONE;
 })
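
On the big-endian lane renumbering duplicated above: it is just a
reversal of the lane index within each 64-bit register. A standalone
sketch of the arithmetic:

  /* reg_nelts is the number of elements in one 64-bit register, so
     XOR-ing with (reg_nelts - 1) reverses lane order per d-register:
     v8qi maps 0<->7, v4hi maps 0<->3, v16qi maps 0<->7 and 8<->15.  */
  unsigned int flip_lane (unsigned int elt, unsigned int elem_bits)
  {
    unsigned int reg_nelts = 64 / elem_bits;
    return elt ^ (reg_nelts - 1);
  }
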
 
-; Operand 3 (info word) is ignored because it does nothing useful with 64-bit
-; elements.
-
 (define_expand "neon_vget_lanedi"
   [(match_operand:DI 0 "s_register_operand" "=r")
    (match_operand:DI 1 "s_register_operand" "w")
-   (match_operand:SI 2 "immediate_operand" "i")
-   (match_operand:SI 3 "immediate_operand" "i")]
+   (match_operand:SI 2 "immediate_operand" "")]
   "TARGET_NEON"
 {
   neon_lane_bounds (operands[2], 0, 1);
@@ -2852,8 +2792,7 @@ (define_expand "neon_vget_lanedi"
 (define_expand "neon_vget_lanev2di"
   [(match_operand:DI 0 "s_register_operand" "")
    (match_operand:V2DI 1 "s_register_operand" "")
-   (match_operand:SI 2 "immediate_operand" "")
-   (match_operand:SI 3 "immediate_operand" "")]
+   (match_operand:SI 2 "immediate_operand" "")]
   "TARGET_NEON"
 {
   switch (INTVAL (operands[2]))
@@ -3110,23 +3049,21 @@ (define_insn "fixuns_trunc<mode><V_cvtto
   [(set_attr "type" "neon_fp_to_int_<V_elem_ch><q>")]
 )
 
-(define_insn "neon_vcvt<mode>"
+(define_insn "neon_vcvt<sup><mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
-	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
-			   (match_operand:SI 2 "immediate_operand" "i")]
-			  UNSPEC_VCVT))]
+	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")]
+			  VCVT_US))]
   "TARGET_NEON"
-  "vcvt.%T2%#32.f32\t%<V_reg>0, %<V_reg>1"
+  "vcvt.<sup>%#32.f32\t%<V_reg>0, %<V_reg>1"
   [(set_attr "type" "neon_fp_to_int_<V_elem_ch><q>")]
 )
 
-(define_insn "neon_vcvt<mode>"
+(define_insn "neon_vcvt<sup><mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
-	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
-			   (match_operand:SI 2 "immediate_operand" "i")]
-			  UNSPEC_VCVT))]
+	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")]
+			  VCVT_US))]
   "TARGET_NEON"
-  "vcvt.f32.%T2%#32\t%<V_reg>0, %<V_reg>1"
+  "vcvt.f32.<sup>%#32\t%<V_reg>0, %<V_reg>1"
   [(set_attr "type" "neon_int_to_fp_<V_elem_ch><q>")]
 )
 
@@ -3148,71 +3085,65 @@ (define_insn "neon_vcvtv4hfv4sf"
   [(set_attr "type" "neon_fp_cvt_narrow_s_q")]
 )
 
-(define_insn "neon_vcvt_n<mode>"
+(define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
-			   (match_operand:SI 2 "immediate_operand" "i")
-                           (match_operand:SI 3 "immediate_operand" "i")]
-			  UNSPEC_VCVT_N))]
+			   (match_operand:SI 2 "immediate_operand" "i")]
+			  VCVT_US_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[2], 1, 33);
-  return "vcvt.%T3%#32.f32\t%<V_reg>0, %<V_reg>1, %2";
+  return "vcvt.<sup>%#32.f32\t%<V_reg>0, %<V_reg>1, %2";
 }
   [(set_attr "type" "neon_fp_to_int_<V_elem_ch><q>")]
 )
 
-(define_insn "neon_vcvt_n<mode>"
+(define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
-			   (match_operand:SI 2 "immediate_operand" "i")
-                           (match_operand:SI 3 "immediate_operand" "i")]
-			  UNSPEC_VCVT_N))]
+			   (match_operand:SI 2 "immediate_operand" "i")]
+			  VCVT_US_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[2], 1, 33);
-  return "vcvt.f32.%T3%#32\t%<V_reg>0, %<V_reg>1, %2";
+  return "vcvt.f32.<sup>%#32\t%<V_reg>0, %<V_reg>1, %2";
 }
   [(set_attr "type" "neon_int_to_fp_<V_elem_ch><q>")]
 )
 
 (define_insn "neon_vmovn<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
-	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")
-			    (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
                            UNSPEC_VMOVN))]
   "TARGET_NEON"
   "vmovn.<V_if_elem>\t%P0, %q1"
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
 
-(define_insn "neon_vqmovn<mode>"
+(define_insn "neon_vqmovn<sup><mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
-	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")
-			    (match_operand:SI 2 "immediate_operand" "i")]
-                           UNSPEC_VQMOVN))]
+	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
+                           VQMOVN))]
   "TARGET_NEON"
-  "vqmovn.%T2%#<V_sz_elem>\t%P0, %q1"
+  "vqmovn.<sup>%#<V_sz_elem>\t%P0, %q1"
   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
 )
 
 (define_insn "neon_vqmovun<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
-	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")
-			    (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
                            UNSPEC_VQMOVUN))]
   "TARGET_NEON"
   "vqmovun.<V_s_elem>\t%P0, %q1"
   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
 )
 
-(define_insn "neon_vmovl<mode>"
+(define_insn "neon_vmovl<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
-	(unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w")
-			   (match_operand:SI 2 "immediate_operand" "i")]
-                          UNSPEC_VMOVL))]
+	(unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w")]
+                          VMOVL))]
   "TARGET_NEON"
-  "vmovl.%T2%#<V_sz_elem>\t%q0, %P1"
+  "vmovl.<sup>%#<V_sz_elem>\t%q0, %P1"
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
@@ -3221,8 +3152,7 @@ (define_insn "neon_vmul_lane<mode>"
 	(unspec:VMD [(match_operand:VMD 1 "s_register_operand" "w")
 		     (match_operand:VMD 2 "s_register_operand"
                                         "<scalar_mul_constraint>")
-                     (match_operand:SI 3 "immediate_operand" "i")
-                     (match_operand:SI 4 "immediate_operand" "i")]
+                     (match_operand:SI 3 "immediate_operand" "i")]
                     UNSPEC_VMUL_LANE))]
   "TARGET_NEON"
 {
@@ -3240,8 +3170,7 @@ (define_insn "neon_vmul_lane<mode>"
 	(unspec:VMQ [(match_operand:VMQ 1 "s_register_operand" "w")
 		     (match_operand:<V_HALF> 2 "s_register_operand"
                                              "<scalar_mul_constraint>")
-                     (match_operand:SI 3 "immediate_operand" "i")
-                     (match_operand:SI 4 "immediate_operand" "i")]
+                     (match_operand:SI 3 "immediate_operand" "i")]
                     UNSPEC_VMUL_LANE))]
   "TARGET_NEON"
 {
@@ -3254,18 +3183,17 @@ (define_insn "neon_vmul_lane<mode>"
                    (const_string "neon_mul_<V_elem_ch>_scalar<q>")))]
 )
 
-(define_insn "neon_vmull_lane<mode>"
+(define_insn "neon_vmull<sup>_lane<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
 		           (match_operand:VMDI 2 "s_register_operand"
 					       "<scalar_mul_constraint>")
-                           (match_operand:SI 3 "immediate_operand" "i")
-                           (match_operand:SI 4 "immediate_operand" "i")]
-                          UNSPEC_VMULL_LANE))]
+                           (match_operand:SI 3 "immediate_operand" "i")]
+                          VMULL_LANE))]
   "TARGET_NEON"
 {
   neon_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<MODE>mode));
-  return "vmull.%T4%#<V_sz_elem>\t%q0, %P1, %P2[%c3]";
+  return "vmull.<sup>%#<V_sz_elem>\t%q0, %P1, %P2[%c3]";
 }
   [(set_attr "type" "neon_mul_<V_elem_ch>_scalar_long")]
 )
@@ -3275,8 +3203,7 @@ (define_insn "neon_vqdmull_lane<mode>"
 	(unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
 		           (match_operand:VMDI 2 "s_register_operand"
 					       "<scalar_mul_constraint>")
-                           (match_operand:SI 3 "immediate_operand" "i")
-                           (match_operand:SI 4 "immediate_operand" "i")]
+                           (match_operand:SI 3 "immediate_operand" "i")]
                           UNSPEC_VQDMULL_LANE))]
   "TARGET_NEON"
 {
@@ -3286,34 +3213,32 @@ (define_insn "neon_vqdmull_lane<mode>"
   [(set_attr "type" "neon_sat_mul_<V_elem_ch>_scalar_long")]
 )
 
-(define_insn "neon_vqdmulh_lane<mode>"
+(define_insn "neon_vq<r>dmulh_lane<mode>"
   [(set (match_operand:VMQI 0 "s_register_operand" "=w")
 	(unspec:VMQI [(match_operand:VMQI 1 "s_register_operand" "w")
 		      (match_operand:<V_HALF> 2 "s_register_operand"
 					      "<scalar_mul_constraint>")
-                      (match_operand:SI 3 "immediate_operand" "i")
-                      (match_operand:SI 4 "immediate_operand" "i")]
-                      UNSPEC_VQDMULH_LANE))]
+                      (match_operand:SI 3 "immediate_operand" "i")]
+                      VQDMULH_LANE))]
   "TARGET_NEON"
 {
   neon_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<MODE>mode));
-  return "vq%O4dmulh.%T4%#<V_sz_elem>\t%q0, %q1, %P2[%c3]";
+  return "vq<r>dmulh.<V_s_elem>\t%q0, %q1, %P2[%c3]";
 }
   [(set_attr "type" "neon_sat_mul_<V_elem_ch>_scalar_q")]
 )
 
-(define_insn "neon_vqdmulh_lane<mode>"
+(define_insn "neon_vq<r>dmulh_lane<mode>"
   [(set (match_operand:VMDI 0 "s_register_operand" "=w")
 	(unspec:VMDI [(match_operand:VMDI 1 "s_register_operand" "w")
 		      (match_operand:VMDI 2 "s_register_operand"
 					  "<scalar_mul_constraint>")
-                      (match_operand:SI 3 "immediate_operand" "i")
-                      (match_operand:SI 4 "immediate_operand" "i")]
-                      UNSPEC_VQDMULH_LANE))]
+                      (match_operand:SI 3 "immediate_operand" "i")]
+                      VQDMULH_LANE))]
   "TARGET_NEON"
 {
   neon_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<MODE>mode));
-  return "vq%O4dmulh.%T4%#<V_sz_elem>\t%P0, %P1, %P2[%c3]";
+  return "vq<r>dmulh.<V_s_elem>\t%P0, %P1, %P2[%c3]";
 }
   [(set_attr "type" "neon_sat_mul_<V_elem_ch>_scalar_q")]
 )
@@ -3324,8 +3249,7 @@ (define_insn "neon_vmla_lane<mode>"
 		     (match_operand:VMD 2 "s_register_operand" "w")
                      (match_operand:VMD 3 "s_register_operand"
 					"<scalar_mul_constraint>")
-                     (match_operand:SI 4 "immediate_operand" "i")
-                     (match_operand:SI 5 "immediate_operand" "i")]
+                     (match_operand:SI 4 "immediate_operand" "i")]
                      UNSPEC_VMLA_LANE))]
   "TARGET_NEON"
 {
@@ -3344,8 +3268,7 @@ (define_insn "neon_vmla_lane<mode>"
 		     (match_operand:VMQ 2 "s_register_operand" "w")
                      (match_operand:<V_HALF> 3 "s_register_operand"
 					     "<scalar_mul_constraint>")
-                     (match_operand:SI 4 "immediate_operand" "i")
-                     (match_operand:SI 5 "immediate_operand" "i")]
+                     (match_operand:SI 4 "immediate_operand" "i")]
                      UNSPEC_VMLA_LANE))]
   "TARGET_NEON"
 {
@@ -3358,19 +3281,18 @@ (define_insn "neon_vmla_lane<mode>"
                    (const_string "neon_mla_<V_elem_ch>_scalar<q>")))]
 )
 
-(define_insn "neon_vmlal_lane<mode>"
+(define_insn "neon_vmlal<sup>_lane<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0")
 			   (match_operand:VMDI 2 "s_register_operand" "w")
                            (match_operand:VMDI 3 "s_register_operand"
 					       "<scalar_mul_constraint>")
-                           (match_operand:SI 4 "immediate_operand" "i")
-                           (match_operand:SI 5 "immediate_operand" "i")]
-                          UNSPEC_VMLAL_LANE))]
+                           (match_operand:SI 4 "immediate_operand" "i")]
+                          VMLAL_LANE))]
   "TARGET_NEON"
 {
   neon_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<MODE>mode));
-  return "vmlal.%T5%#<V_sz_elem>\t%q0, %P2, %P3[%c4]";
+  return "vmlal.<sup>%#<V_sz_elem>\t%q0, %P2, %P3[%c4]";
 }
   [(set_attr "type" "neon_mla_<V_elem_ch>_scalar_long")]
 )
@@ -3381,8 +3303,7 @@ (define_insn "neon_vqdmlal_lane<mode>"
 			   (match_operand:VMDI 2 "s_register_operand" "w")
                            (match_operand:VMDI 3 "s_register_operand"
 					       "<scalar_mul_constraint>")
-                           (match_operand:SI 4 "immediate_operand" "i")
-                           (match_operand:SI 5 "immediate_operand" "i")]
+                           (match_operand:SI 4 "immediate_operand" "i")]
                           UNSPEC_VQDMLAL_LANE))]
   "TARGET_NEON"
 {
@@ -3398,8 +3319,7 @@ (define_insn "neon_vmls_lane<mode>"
 		     (match_operand:VMD 2 "s_register_operand" "w")
                      (match_operand:VMD 3 "s_register_operand"
 					"<scalar_mul_constraint>")
-                     (match_operand:SI 4 "immediate_operand" "i")
-                     (match_operand:SI 5 "immediate_operand" "i")]
+                     (match_operand:SI 4 "immediate_operand" "i")]
                     UNSPEC_VMLS_LANE))]
   "TARGET_NEON"
 {
@@ -3418,8 +3338,7 @@ (define_insn "neon_vmls_lane<mode>"
 		     (match_operand:VMQ 2 "s_register_operand" "w")
                      (match_operand:<V_HALF> 3 "s_register_operand"
 					     "<scalar_mul_constraint>")
-                     (match_operand:SI 4 "immediate_operand" "i")
-                     (match_operand:SI 5 "immediate_operand" "i")]
+                     (match_operand:SI 4 "immediate_operand" "i")]
                     UNSPEC_VMLS_LANE))]
   "TARGET_NEON"
 {
@@ -3432,19 +3351,18 @@ (define_insn "neon_vmls_lane<mode>"
                    (const_string "neon_mla_<V_elem_ch>_scalar<q>")))]
 )
 
-(define_insn "neon_vmlsl_lane<mode>"
+(define_insn "neon_vmlsl<sup>_lane<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0")
 			   (match_operand:VMDI 2 "s_register_operand" "w")
                            (match_operand:VMDI 3 "s_register_operand"
 					       "<scalar_mul_constraint>")
-                           (match_operand:SI 4 "immediate_operand" "i")
-                           (match_operand:SI 5 "immediate_operand" "i")]
-                          UNSPEC_VMLSL_LANE))]
+                           (match_operand:SI 4 "immediate_operand" "i")]
+                          VMLSL_LANE))]
   "TARGET_NEON"
 {
   neon_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<MODE>mode));
-  return "vmlsl.%T5%#<V_sz_elem>\t%q0, %P2, %P3[%c4]";
+  return "vmlsl.<sup>%#<V_sz_elem>\t%q0, %P2, %P3[%c4]";
 }
   [(set_attr "type" "neon_mla_<V_elem_ch>_scalar_long")]
 )
@@ -3455,8 +3373,7 @@ (define_insn "neon_vqdmlsl_lane<mode>"
 			   (match_operand:VMDI 2 "s_register_operand" "w")
                            (match_operand:VMDI 3 "s_register_operand"
 					       "<scalar_mul_constraint>")
-                           (match_operand:SI 4 "immediate_operand" "i")
-                           (match_operand:SI 5 "immediate_operand" "i")]
+                           (match_operand:SI 4 "immediate_operand" "i")]
                           UNSPEC_VQDMLSL_LANE))]
   "TARGET_NEON"
 {
@@ -3476,84 +3393,117 @@ (define_insn "neon_vqdmlsl_lane<mode>"
 (define_expand "neon_vmul_n<mode>"
   [(match_operand:VMD 0 "s_register_operand" "")
    (match_operand:VMD 1 "s_register_operand" "")
-   (match_operand:<V_elem> 2 "s_register_operand" "")
-   (match_operand:SI 3 "immediate_operand" "")]
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx));
   emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp,
-				       const0_rtx, const0_rtx));
+				       const0_rtx));
   DONE;
 })
 
 (define_expand "neon_vmul_n<mode>"
   [(match_operand:VMQ 0 "s_register_operand" "")
    (match_operand:VMQ 1 "s_register_operand" "")
-   (match_operand:<V_elem> 2 "s_register_operand" "")
-   (match_operand:SI 3 "immediate_operand" "")]
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<V_HALF>mode);
   emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[2], tmp, const0_rtx));
   emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp,
-				       const0_rtx, const0_rtx));
+				       const0_rtx));
+  DONE;
+})
+
+(define_expand "neon_vmulls_n<mode>"
+  [(match_operand:<V_widen> 0 "s_register_operand" "")
+   (match_operand:VMDI 1 "s_register_operand" "")
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
+  "TARGET_NEON"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx));
+  emit_insn (gen_neon_vmulls_lane<mode> (operands[0], operands[1], tmp,
+					 const0_rtx));
   DONE;
 })
 
-(define_expand "neon_vmull_n<mode>"
+(define_expand "neon_vmullu_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:VMDI 1 "s_register_operand" "")
-   (match_operand:<V_elem> 2 "s_register_operand" "")
-   (match_operand:SI 3 "immediate_operand" "")]
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx));
-  emit_insn (gen_neon_vmull_lane<mode> (operands[0], operands[1], tmp,
-				        const0_rtx, operands[3]));
+  emit_insn (gen_neon_vmullu_lane<mode> (operands[0], operands[1], tmp,
+					 const0_rtx));
   DONE;
 })
 
 (define_expand "neon_vqdmull_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:VMDI 1 "s_register_operand" "")
-   (match_operand:<V_elem> 2 "s_register_operand" "")
-   (match_operand:SI 3 "immediate_operand" "")]
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx));
   emit_insn (gen_neon_vqdmull_lane<mode> (operands[0], operands[1], tmp,
-				          const0_rtx, const0_rtx));
+				          const0_rtx));
   DONE;
 })
 
 (define_expand "neon_vqdmulh_n<mode>"
   [(match_operand:VMDI 0 "s_register_operand" "")
    (match_operand:VMDI 1 "s_register_operand" "")
-   (match_operand:<V_elem> 2 "s_register_operand" "")
-   (match_operand:SI 3 "immediate_operand" "")]
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx));
   emit_insn (gen_neon_vqdmulh_lane<mode> (operands[0], operands[1], tmp,
-				          const0_rtx, operands[3]));
+				          const0_rtx));
+  DONE;
+})
+
+(define_expand "neon_vqrdmulh_n<mode>"
+  [(match_operand:VMDI 0 "s_register_operand" "")
+   (match_operand:VMDI 1 "s_register_operand" "")
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
+  "TARGET_NEON"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx));
+  emit_insn (gen_neon_vqrdmulh_lane<mode> (operands[0], operands[1], tmp,
+				          const0_rtx));
   DONE;
 })
 
 (define_expand "neon_vqdmulh_n<mode>"
   [(match_operand:VMQI 0 "s_register_operand" "")
    (match_operand:VMQI 1 "s_register_operand" "")
-   (match_operand:<V_elem> 2 "s_register_operand" "")
-   (match_operand:SI 3 "immediate_operand" "")]
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<V_HALF>mode);
   emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[2], tmp, const0_rtx));
   emit_insn (gen_neon_vqdmulh_lane<mode> (operands[0], operands[1], tmp,
-				          const0_rtx, operands[3]));
+					  const0_rtx));
+  DONE;
+})
+
+(define_expand "neon_vqrdmulh_n<mode>"
+  [(match_operand:VMQI 0 "s_register_operand" "")
+   (match_operand:VMQI 1 "s_register_operand" "")
+   (match_operand:<V_elem> 2 "s_register_operand" "")]
+  "TARGET_NEON"
+{
+  rtx tmp = gen_reg_rtx (<V_HALF>mode);
+  emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[2], tmp, const0_rtx));
+  emit_insn (gen_neon_vqrdmulh_lane<mode> (operands[0], operands[1], tmp,
+					   const0_rtx));
   DONE;
 })
 
@@ -3561,14 +3511,13 @@ (define_expand "neon_vmla_n<mode>"
   [(match_operand:VMD 0 "s_register_operand" "")
    (match_operand:VMD 1 "s_register_operand" "")
    (match_operand:VMD 2 "s_register_operand" "")
-   (match_operand:<V_elem> 3 "s_register_operand" "")
-   (match_operand:SI 4 "immediate_operand" "")]
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx));
   emit_insn (gen_neon_vmla_lane<mode> (operands[0], operands[1], operands[2],
-				       tmp, const0_rtx, operands[4]));
+				       tmp, const0_rtx));
   DONE;
 })
 
@@ -3576,29 +3525,41 @@ (define_expand "neon_vmla_n<mode>"
   [(match_operand:VMQ 0 "s_register_operand" "")
    (match_operand:VMQ 1 "s_register_operand" "")
    (match_operand:VMQ 2 "s_register_operand" "")
-   (match_operand:<V_elem> 3 "s_register_operand" "")
-   (match_operand:SI 4 "immediate_operand" "")]
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<V_HALF>mode);
   emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[3], tmp, const0_rtx));
   emit_insn (gen_neon_vmla_lane<mode> (operands[0], operands[1], operands[2],
-				       tmp, const0_rtx, operands[4]));
+				       tmp, const0_rtx));
   DONE;
 })
 
-(define_expand "neon_vmlal_n<mode>"
+(define_expand "neon_vmlals_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:<V_widen> 1 "s_register_operand" "")
    (match_operand:VMDI 2 "s_register_operand" "")
-   (match_operand:<V_elem> 3 "s_register_operand" "")
-   (match_operand:SI 4 "immediate_operand" "")]
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx));
-  emit_insn (gen_neon_vmlal_lane<mode> (operands[0], operands[1], operands[2],
-					tmp, const0_rtx, operands[4]));
+  emit_insn (gen_neon_vmlals_lane<mode> (operands[0], operands[1], operands[2],
+					 tmp, const0_rtx));
+  DONE;
+})
+
+(define_expand "neon_vmlalu_n<mode>"
+  [(match_operand:<V_widen> 0 "s_register_operand" "")
+   (match_operand:<V_widen> 1 "s_register_operand" "")
+   (match_operand:VMDI 2 "s_register_operand" "")
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
+  "TARGET_NEON"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx));
+  emit_insn (gen_neon_vmlalu_lane<mode> (operands[0], operands[1], operands[2],
+					 tmp, const0_rtx));
   DONE;
 })
 
@@ -3606,14 +3567,13 @@ (define_expand "neon_vqdmlal_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:<V_widen> 1 "s_register_operand" "")
    (match_operand:VMDI 2 "s_register_operand" "")
-   (match_operand:<V_elem> 3 "s_register_operand" "")
-   (match_operand:SI 4 "immediate_operand" "")]
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx));
   emit_insn (gen_neon_vqdmlal_lane<mode> (operands[0], operands[1], operands[2],
-					  tmp, const0_rtx, operands[4]));
+					  tmp, const0_rtx));
   DONE;
 })
 
@@ -3621,14 +3581,13 @@ (define_expand "neon_vmls_n<mode>"
   [(match_operand:VMD 0 "s_register_operand" "")
    (match_operand:VMD 1 "s_register_operand" "")
    (match_operand:VMD 2 "s_register_operand" "")
-   (match_operand:<V_elem> 3 "s_register_operand" "")
-   (match_operand:SI 4 "immediate_operand" "")]
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx));
   emit_insn (gen_neon_vmls_lane<mode> (operands[0], operands[1], operands[2],
-				       tmp, const0_rtx, operands[4]));
+				       tmp, const0_rtx));
   DONE;
 })
 
@@ -3636,29 +3595,41 @@ (define_expand "neon_vmls_n<mode>"
   [(match_operand:VMQ 0 "s_register_operand" "")
    (match_operand:VMQ 1 "s_register_operand" "")
    (match_operand:VMQ 2 "s_register_operand" "")
-   (match_operand:<V_elem> 3 "s_register_operand" "")
-   (match_operand:SI 4 "immediate_operand" "")]
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<V_HALF>mode);
   emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[3], tmp, const0_rtx));
   emit_insn (gen_neon_vmls_lane<mode> (operands[0], operands[1], operands[2],
-				       tmp, const0_rtx, operands[4]));
+				       tmp, const0_rtx));
   DONE;
 })
 
-(define_expand "neon_vmlsl_n<mode>"
+(define_expand "neon_vmlsls_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:<V_widen> 1 "s_register_operand" "")
    (match_operand:VMDI 2 "s_register_operand" "")
-   (match_operand:<V_elem> 3 "s_register_operand" "")
-   (match_operand:SI 4 "immediate_operand" "")]
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx));
-  emit_insn (gen_neon_vmlsl_lane<mode> (operands[0], operands[1], operands[2],
-					tmp, const0_rtx, operands[4]));
+  emit_insn (gen_neon_vmlsls_lane<mode> (operands[0], operands[1], operands[2],
+					tmp, const0_rtx));
+  DONE;
+})
+
+(define_expand "neon_vmlslu_n<mode>"
+  [(match_operand:<V_widen> 0 "s_register_operand" "")
+   (match_operand:<V_widen> 1 "s_register_operand" "")
+   (match_operand:VMDI 2 "s_register_operand" "")
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
+  "TARGET_NEON"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx));
+  emit_insn (gen_neon_vmlslu_lane<mode> (operands[0], operands[1], operands[2],
+					tmp, const0_rtx));
   DONE;
 })
 
@@ -3666,14 +3637,13 @@ (define_expand "neon_vqdmlsl_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:<V_widen> 1 "s_register_operand" "")
    (match_operand:VMDI 2 "s_register_operand" "")
-   (match_operand:<V_elem> 3 "s_register_operand" "")
-   (match_operand:SI 4 "immediate_operand" "")]
+   (match_operand:<V_elem> 3 "s_register_operand" "")]
   "TARGET_NEON"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx));
   emit_insn (gen_neon_vqdmlsl_lane<mode> (operands[0], operands[1], operands[2],
-					  tmp, const0_rtx, operands[4]));
+					  tmp, const0_rtx));
   DONE;
 })
 
@@ -3693,8 +3663,7 @@ (define_insn "neon_vext<mode>"
 
 (define_insn "neon_vrev64<mode>"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
-	(unspec:VDQ [(match_operand:VDQ 1 "s_register_operand" "w")
-		     (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:VDQ [(match_operand:VDQ 1 "s_register_operand" "w")]
                     UNSPEC_VREV64))]
   "TARGET_NEON"
   "vrev64.<V_sz_elem>\t%<V_reg>0, %<V_reg>1"
@@ -3703,8 +3672,7 @@ (define_insn "neon_vrev64<mode>"
 
 (define_insn "neon_vrev32<mode>"
   [(set (match_operand:VX 0 "s_register_operand" "=w")
-	(unspec:VX [(match_operand:VX 1 "s_register_operand" "w")
-		    (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:VX [(match_operand:VX 1 "s_register_operand" "w")]
                    UNSPEC_VREV32))]
   "TARGET_NEON"
   "vrev32.<V_sz_elem>\t%<V_reg>0, %<V_reg>1"
@@ -3713,8 +3681,7 @@ (define_insn "neon_vrev32<mode>"
 
 (define_insn "neon_vrev16<mode>"
   [(set (match_operand:VE 0 "s_register_operand" "=w")
-	(unspec:VE [(match_operand:VE 1 "s_register_operand" "w")
-		    (match_operand:SI 2 "immediate_operand" "i")]
+	(unspec:VE [(match_operand:VE 1 "s_register_operand" "w")]
                    UNSPEC_VREV16))]
   "TARGET_NEON"
   "vrev16.<V_sz_elem>\t%<V_reg>0, %<V_reg>1"
@@ -3755,80 +3722,80 @@ (define_expand "neon_vbsl<mode>"
   operands[1] = gen_lowpart (<MODE>mode, operands[1]);
 })
 
-(define_insn "neon_vshl<mode>"
+;; vshl, vrshl
+(define_insn "neon_v<shift_op><sup><mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
 	(unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w")
-		       (match_operand:VDQIX 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
-                      UNSPEC_VSHL))]
+		       (match_operand:VDQIX 2 "s_register_operand" "w")]
+                      VSHL))]
   "TARGET_NEON"
-  "v%O3shl.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "v<shift_op>.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_shift_imm<q>")]
 )
 
-(define_insn "neon_vqshl<mode>"
+;; vqshl, vqrshl
+(define_insn "neon_v<shift_op><sup><mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
 	(unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w")
-		       (match_operand:VDQIX 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")]
-                      UNSPEC_VQSHL))]
+		       (match_operand:VDQIX 2 "s_register_operand" "w")]
+                      VQSHL))]
   "TARGET_NEON"
-  "vq%O3shl.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  "v<shift_op>.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "type" "neon_sat_shift_imm<q>")]
 )
 
-(define_insn "neon_vshr_n<mode>"
+;; vshr_n, vrshr_n
+(define_insn "neon_v<shift_op><sup>_n<mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
 	(unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w")
-		       (match_operand:SI 2 "immediate_operand" "i")
-                       (match_operand:SI 3 "immediate_operand" "i")]
-                      UNSPEC_VSHR_N))]
+		       (match_operand:SI 2 "immediate_operand" "i")]
+                      VSHR_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[2], 1, neon_element_bits (<MODE>mode) + 1);
-  return "v%O3shr.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %2";
+  return "v<shift_op>.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %2";
 }
   [(set_attr "type" "neon_shift_imm<q>")]
 )
 
-(define_insn "neon_vshrn_n<mode>"
+;; vshrn_n, vrshrn_n
+(define_insn "neon_v<shift_op>_n<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")
-			    (match_operand:SI 2 "immediate_operand" "i")
-			    (match_operand:SI 3 "immediate_operand" "i")]
-                           UNSPEC_VSHRN_N))]
+			    (match_operand:SI 2 "immediate_operand" "i")]
+                           VSHRN_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[2], 1, neon_element_bits (<MODE>mode) / 2 + 1);
-  return "v%O3shrn.<V_if_elem>\t%P0, %q1, %2";
+  return "v<shift_op>.<V_if_elem>\t%P0, %q1, %2";
 }
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
 
-(define_insn "neon_vqshrn_n<mode>"
+;; vqshrn_n, vqrshrn_n
+(define_insn "neon_v<shift_op><sup>_n<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")
-			    (match_operand:SI 2 "immediate_operand" "i")
-			    (match_operand:SI 3 "immediate_operand" "i")]
-                           UNSPEC_VQSHRN_N))]
+			    (match_operand:SI 2 "immediate_operand" "i")]
+                           VQSHRN_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[2], 1, neon_element_bits (<MODE>mode) / 2 + 1);
-  return "vq%O3shrn.%T3%#<V_sz_elem>\t%P0, %q1, %2";
+  return "v<shift_op>.<sup>%#<V_sz_elem>\t%P0, %q1, %2";
 }
   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
 )
 
-(define_insn "neon_vqshrun_n<mode>"
+;; vqshrun_n, vqrshrun_n
+(define_insn "neon_v<shift_op>_n<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")
-			    (match_operand:SI 2 "immediate_operand" "i")
-			    (match_operand:SI 3 "immediate_operand" "i")]
-                           UNSPEC_VQSHRUN_N))]
+			    (match_operand:SI 2 "immediate_operand" "i")]
+                           VQSHRUN_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[2], 1, neon_element_bits (<MODE>mode) / 2 + 1);
-  return "vq%O3shrun.%T3%#<V_sz_elem>\t%P0, %q1, %2";
+  return "v<shift_op>.<V_s_elem>\t%P0, %q1, %2";
 }
   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
 )
@@ -3836,8 +3803,7 @@ (define_insn "neon_vqshrun_n<mode>"
 (define_insn "neon_vshl_n<mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
 	(unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w")
-		       (match_operand:SI 2 "immediate_operand" "i")
-                       (match_operand:SI 3 "immediate_operand" "i")]
+		       (match_operand:SI 2 "immediate_operand" "i")]
                       UNSPEC_VSHL_N))]
   "TARGET_NEON"
 {
@@ -3847,16 +3813,15 @@ (define_insn "neon_vshl_n<mode>"
   [(set_attr "type" "neon_shift_imm<q>")]
 )
 
-(define_insn "neon_vqshl_n<mode>"
+(define_insn "neon_vqshl_<sup>_n<mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
 	(unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w")
-		       (match_operand:SI 2 "immediate_operand" "i")
-                       (match_operand:SI 3 "immediate_operand" "i")]
-                      UNSPEC_VQSHL_N))]
+		       (match_operand:SI 2 "immediate_operand" "i")]
+                      VQSHL_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[2], 0, neon_element_bits (<MODE>mode));
-  return "vqshl.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %2";
+  return "vqshl.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %2";
 }
   [(set_attr "type" "neon_sat_shift_imm<q>")]
 )
@@ -3864,43 +3829,41 @@ (define_insn "neon_vqshl_n<mode>"
 (define_insn "neon_vqshlu_n<mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
 	(unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w")
-		       (match_operand:SI 2 "immediate_operand" "i")
-                       (match_operand:SI 3 "immediate_operand" "i")]
+		       (match_operand:SI 2 "immediate_operand" "i")]
                       UNSPEC_VQSHLU_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[2], 0, neon_element_bits (<MODE>mode));
-  return "vqshlu.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %2";
+  return "vqshlu.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %2";
 }
   [(set_attr "type" "neon_sat_shift_imm<q>")]
 )
 
-(define_insn "neon_vshll_n<mode>"
+(define_insn "neon_vshll<sup>_n<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w")
-			   (match_operand:SI 2 "immediate_operand" "i")
-			   (match_operand:SI 3 "immediate_operand" "i")]
-			  UNSPEC_VSHLL_N))]
+			   (match_operand:SI 2 "immediate_operand" "i")]
+			  VSHLL_N))]
   "TARGET_NEON"
 {
   /* The boundaries are: 0 < imm <= size.  */
   neon_const_bounds (operands[2], 0, neon_element_bits (<MODE>mode) + 1);
-  return "vshll.%T3%#<V_sz_elem>\t%q0, %P1, %2";
+  return "vshll.<sup>%#<V_sz_elem>\t%q0, %P1, %2";
 }
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_insn "neon_vsra_n<mode>"
+;; vsra_n, vrsra_n
+(define_insn "neon_v<shift_op><sup>_n<mode>"
   [(set (match_operand:VDQIX 0 "s_register_operand" "=w")
 	(unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "0")
 		       (match_operand:VDQIX 2 "s_register_operand" "w")
-                       (match_operand:SI 3 "immediate_operand" "i")
-                       (match_operand:SI 4 "immediate_operand" "i")]
-                      UNSPEC_VSRA_N))]
+                       (match_operand:SI 3 "immediate_operand" "i")]
+                      VSRA_N))]
   "TARGET_NEON"
 {
   neon_const_bounds (operands[3], 1, neon_element_bits (<MODE>mode) + 1);
-  return "v%O4sra.%T4%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %3";
+  return "v<shift_op>.<sup>%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %3";
 }
   [(set_attr "type" "neon_shift_acc<q>")]
 )
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 147cb802d41cfa9451ca03eeac0ed1d9b6da2053..fcee30b62c35b4392748f15038b8c555b444bb55 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -173,12 +173,18 @@ (define_c_enum "unspec" [
   UNSPEC_SHA256SU1
   UNSPEC_VMULLP64
   UNSPEC_LOAD_COUNT
-  UNSPEC_VABD
-  UNSPEC_VABDL
+  UNSPEC_VABD_F
+  UNSPEC_VABD_S
+  UNSPEC_VABD_U
+  UNSPEC_VABDL_S
+  UNSPEC_VABDL_U
   UNSPEC_VADD
   UNSPEC_VADDHN
-  UNSPEC_VADDL
-  UNSPEC_VADDW
+  UNSPEC_VRADDHN
+  UNSPEC_VADDL_S
+  UNSPEC_VADDL_U
+  UNSPEC_VADDW_S
+  UNSPEC_VADDW_U
   UNSPEC_VBSL
   UNSPEC_VCAGE
   UNSPEC_VCAGT
@@ -190,10 +196,17 @@ (define_c_enum "unspec" [
   UNSPEC_VCLS
   UNSPEC_VCONCAT
   UNSPEC_VCVT
-  UNSPEC_VCVT_N
+  UNSPEC_VCVT_S
+  UNSPEC_VCVT_U
+  UNSPEC_VCVT_S_N
+  UNSPEC_VCVT_U_N
   UNSPEC_VEXT
-  UNSPEC_VHADD
-  UNSPEC_VHSUB
+  UNSPEC_VHADD_S
+  UNSPEC_VHADD_U
+  UNSPEC_VRHADD_S
+  UNSPEC_VRHADD_U
+  UNSPEC_VHSUB_S
+  UNSPEC_VHSUB_U
   UNSPEC_VLD1
   UNSPEC_VLD1_LANE
   UNSPEC_VLD2
@@ -210,49 +223,77 @@ (define_c_enum "unspec" [
   UNSPEC_VLD4_DUP
   UNSPEC_VLD4_LANE
   UNSPEC_VMAX
+  UNSPEC_VMAX_U
   UNSPEC_VMIN
+  UNSPEC_VMIN_U
   UNSPEC_VMLA
-  UNSPEC_VMLAL
   UNSPEC_VMLA_LANE
-  UNSPEC_VMLAL_LANE
+  UNSPEC_VMLAL_S
+  UNSPEC_VMLAL_U
+  UNSPEC_VMLAL_S_LANE
+  UNSPEC_VMLAL_U_LANE
   UNSPEC_VMLS
-  UNSPEC_VMLSL
   UNSPEC_VMLS_LANE
+  UNSPEC_VMLSL_S
+  UNSPEC_VMLSL_U
+  UNSPEC_VMLSL_S_LANE
+  UNSPEC_VMLSL_U_LANE
   UNSPEC_VMLSL_LANE
-  UNSPEC_VMOVL
+  UNSPEC_VMOVL_S
+  UNSPEC_VMOVL_U
   UNSPEC_VMOVN
   UNSPEC_VMUL
-  UNSPEC_VMULL
+  UNSPEC_VMULL_P
+  UNSPEC_VMULL_S
+  UNSPEC_VMULL_U
   UNSPEC_VMUL_LANE
-  UNSPEC_VMULL_LANE
-  UNSPEC_VPADAL
+  UNSPEC_VMULL_S_LANE
+  UNSPEC_VMULL_U_LANE
+  UNSPEC_VPADAL_S
+  UNSPEC_VPADAL_U
   UNSPEC_VPADD
-  UNSPEC_VPADDL
+  UNSPEC_VPADDL_S
+  UNSPEC_VPADDL_U
   UNSPEC_VPMAX
+  UNSPEC_VPMAX_U
   UNSPEC_VPMIN
+  UNSPEC_VPMIN_U
   UNSPEC_VPSMAX
   UNSPEC_VPSMIN
   UNSPEC_VPUMAX
   UNSPEC_VPUMIN
   UNSPEC_VQABS
-  UNSPEC_VQADD
+  UNSPEC_VQADD_S
+  UNSPEC_VQADD_U
   UNSPEC_VQDMLAL
   UNSPEC_VQDMLAL_LANE
   UNSPEC_VQDMLSL
   UNSPEC_VQDMLSL_LANE
   UNSPEC_VQDMULH
   UNSPEC_VQDMULH_LANE
+  UNSPEC_VQRDMULH
+  UNSPEC_VQRDMULH_LANE
   UNSPEC_VQDMULL
   UNSPEC_VQDMULL_LANE
-  UNSPEC_VQMOVN
+  UNSPEC_VQMOVN_S
+  UNSPEC_VQMOVN_U
   UNSPEC_VQMOVUN
   UNSPEC_VQNEG
-  UNSPEC_VQSHL
-  UNSPEC_VQSHL_N
+  UNSPEC_VQSHL_S
+  UNSPEC_VQSHL_U
+  UNSPEC_VQRSHL_S
+  UNSPEC_VQRSHL_U
+  UNSPEC_VQSHL_S_N
+  UNSPEC_VQSHL_U_N
   UNSPEC_VQSHLU_N
-  UNSPEC_VQSHRN_N
+  UNSPEC_VQSHRN_S_N
+  UNSPEC_VQSHRN_U_N
+  UNSPEC_VQRSHRN_S_N
+  UNSPEC_VQRSHRN_U_N
   UNSPEC_VQSHRUN_N
-  UNSPEC_VQSUB
+  UNSPEC_VQRSHRUN_N
+  UNSPEC_VQSUB_S
+  UNSPEC_VQSUB_U
   UNSPEC_VRECPE
   UNSPEC_VRECPS
   UNSPEC_VREV16
@@ -260,13 +301,24 @@ (define_c_enum "unspec" [
   UNSPEC_VREV64
   UNSPEC_VRSQRTE
   UNSPEC_VRSQRTS
-  UNSPEC_VSHL
-  UNSPEC_VSHLL_N
+  UNSPEC_VSHL_S
+  UNSPEC_VSHL_U
+  UNSPEC_VRSHL_S
+  UNSPEC_VRSHL_U
+  UNSPEC_VSHLL_S_N
+  UNSPEC_VSHLL_U_N
   UNSPEC_VSHL_N
-  UNSPEC_VSHR_N
+  UNSPEC_VSHR_S_N
+  UNSPEC_VSHR_U_N
+  UNSPEC_VRSHR_S_N
+  UNSPEC_VRSHR_U_N
   UNSPEC_VSHRN_N
+  UNSPEC_VRSHRN_N
   UNSPEC_VSLI
-  UNSPEC_VSRA_N
+  UNSPEC_VSRA_S_N
+  UNSPEC_VSRA_U_N
+  UNSPEC_VRSRA_S_N
+  UNSPEC_VRSRA_U_N
   UNSPEC_VSRI
   UNSPEC_VST1
   UNSPEC_VST1_LANE
@@ -283,8 +335,11 @@ (define_c_enum "unspec" [
   UNSPEC_VSTRUCTDUMMY
   UNSPEC_VSUB
   UNSPEC_VSUBHN
-  UNSPEC_VSUBL
-  UNSPEC_VSUBW
+  UNSPEC_VRSUBHN
+  UNSPEC_VSUBL_S
+  UNSPEC_VSUBL_U
+  UNSPEC_VSUBW_S
+  UNSPEC_VSUBW_U
   UNSPEC_VTBL
   UNSPEC_VTBX
   UNSPEC_VTRN1
diff --git a/gcc/testsuite/gcc.target/arm/pr51968.c b/gcc/testsuite/gcc.target/arm/pr51968.c
index 6cf802b..99bdb96 100644
--- a/gcc/testsuite/gcc.target/arm/pr51968.c
+++ b/gcc/testsuite/gcc.target/arm/pr51968.c
@@ -24,7 +24,7 @@ foo (int8x8_t z, int8x8_t x, int16x8_t b, int8x8_t n)
       int8x16_t g;
       int8x8_t h, j, k;
       struct T m;
-      j = __builtin_neon_vqmovunv8hi (b, 1);
+      j = __builtin_neon_vqmovunv8hi (b);
       g = __builtin_neon_vcombinev8qi (j, h);
       k = __builtin_neon_vget_lowv16qi (g);
       __builtin_neon_vuzpv8qi (&m.val[0], k, n);


* [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM James Greenhalgh
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 4/8] Refactor "VAR<n>" Macros James Greenhalgh
@ 2014-11-12 17:11   ` James Greenhalgh
  2014-11-18  9:16     ` Ramana Radhakrishnan
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling James Greenhalgh
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, nickc

[-- Attachment #1: Type: text/plain, Size: 1990 bytes --]


Hi,

If we want to move all the code relating to "builtin" initialisation and
expansion to a common file, we must share the processor flags with that
common file.

This patch pulls those definitions out to config/arm/arm-protos.h.
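
As a rough sketch of the payoff (the helper function below is
illustrative only, and not part of this patch), any file that
includes arm-protos.h can then test the capability bits directly:

  /* Hypothetical example: insn_flags and the FL_* bit values are
     now visible through arm-protos.h, so a common file such as
     arm-builtins.c could guard Neon builtin setup itself.  */
  static bool
  neon_builtins_available_p (void)
  {
    return (insn_flags & FL_NEON) != 0;
  }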

Bootstrapped and regression tested both in series and in isolation, with
no issues.

OK?

Thanks,
James

---
2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/arm/t-arm (arm.o): Include arm-protos.h in the recipe.
	* config/arm/arm.c (FL_CO_PROC): Move to arm-protos.h.
	(FL_ARCH3M): Likewise.
	(FL_MODE26): Likewise.
	(FL_MODE32): Likewise.
	(FL_ARCH4): Likewise.
	(FL_ARCH5): Likewise.
	(FL_THUMB): Likewise.
	(FL_LDSCHED): Likewise.
	(FL_STRONG): Likewise.
	(FL_ARCH5E): Likewise.
	(FL_XSCALE): Likewise.
	(FL_ARCH6): Likewise.
	(FL_VFPV2): Likewise.
	(FL_WBUF): Likewise.
	(FL_ARCH6K): Likewise.
	(FL_THUMB2): Likewise.
	(FL_NOTM): Likewise.
	(FL_THUMB_DIV): Likewise.
	(FL_VFPV3): Likewise.
	(FL_NEON): Likewise.
	(FL_ARCH7EM): Likewise.
	(FL_ARCH7): Likewise.
	(FL_ARM_DIV): Likewise.
	(FL_ARCH8): Likewise.
	(FL_CRC32): Likewise.
	(FL_SMALLMUL): Likewise.
	(FL_IWMMXT): Likewise.
	(FL_IWMMXT2): Likewise.
	(FL_TUNE): Likewise.
	(FL_FOR_ARCH2): Likewise.
	(FL_FOR_ARCH3): Likewise.
	(FL_FOR_ARCH3M): Likewise.
	(FL_FOR_ARCH4): Likewise.
	(FL_FOR_ARCH4T): Likewise.
	(FL_FOR_ARCH5): Likewise.
	(FL_FOR_ARCH5T): Likewise.
	(FL_FOR_ARCH5E): Likewise.
	(FL_FOR_ARCH5TE): Likewise.
	(FL_FOR_ARCH5TEJ): Likewise.
	(FL_FOR_ARCH6): Likewise.
	(FL_FOR_ARCH6J): Likewise.
	(FL_FOR_ARCH6K): Likewise.
	(FL_FOR_ARCH6Z): Likewise.
	(FL_FOR_ARCH6ZK): Likewise.
	(FL_FOR_ARCH6T2): Likewise.
	(FL_FOR_ARCH6M): Likewise.
	(FL_FOR_ARCH7): Likewise.
	(FL_FOR_ARCH7A): Likewise.
	(FL_FOR_ARCH7VE): Likewise.
	(FL_FOR_ARCH7R): Likewise.
	(FL_FOR_ARCH7M): Likewise.
	(FL_FOR_ARCH7EM): Likewise.
	(FL_FOR_ARCH8A): Likewise.
	* config/arm/arm-protos.h: Take definitions moved from arm.c.

[-- Attachment #2: 0002-Patch-ARM-Refactor-Builtins-2-8-Move-Processor-flags.patch --]
[-- Type: text/x-patch;  name=0002-Patch-ARM-Refactor-Builtins-2-8-Move-Processor-flags.patch, Size: 11823 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index a37aa80..aa9b1cb 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -306,4 +306,167 @@ extern const char *arm_rewrite_selected_cpu (const char *name);
 
 extern bool arm_is_constant_pool_ref (rtx);
 
+/* Flags used to identify the presence of processor capabilities.  */
+
+/* Bit values used to identify processor capabilities.  */
+#define FL_CO_PROC    (1 << 0)        /* Has external co-processor bus */
+#define FL_ARCH3M     (1 << 1)        /* Extended multiply */
+#define FL_MODE26     (1 << 2)        /* 26-bit mode support */
+#define FL_MODE32     (1 << 3)        /* 32-bit mode support */
+#define FL_ARCH4      (1 << 4)        /* Architecture rel 4 */
+#define FL_ARCH5      (1 << 5)        /* Architecture rel 5 */
+#define FL_THUMB      (1 << 6)        /* Thumb aware */
+#define FL_LDSCHED    (1 << 7)	      /* Load scheduling necessary */
+#define FL_STRONG     (1 << 8)	      /* StrongARM */
+#define FL_ARCH5E     (1 << 9)        /* DSP extensions to v5 */
+#define FL_XSCALE     (1 << 10)	      /* XScale */
+/* spare	      (1 << 11)	*/
+#define FL_ARCH6      (1 << 12)       /* Architecture rel 6.  Adds
+					 media instructions.  */
+#define FL_VFPV2      (1 << 13)       /* Vector Floating Point V2.  */
+#define FL_WBUF	      (1 << 14)	      /* Schedule for write buffer ops.
+					 Note: ARM6 & 7 derivatives only.  */
+#define FL_ARCH6K     (1 << 15)       /* Architecture rel 6 K extensions.  */
+#define FL_THUMB2     (1 << 16)	      /* Thumb-2.  */
+#define FL_NOTM	      (1 << 17)	      /* Instructions not present in the 'M'
+					 profile.  */
+#define FL_THUMB_DIV  (1 << 18)	      /* Hardware divide (Thumb mode).  */
+#define FL_VFPV3      (1 << 19)       /* Vector Floating Point V3.  */
+#define FL_NEON       (1 << 20)       /* Neon instructions.  */
+#define FL_ARCH7EM    (1 << 21)	      /* Instructions present in the ARMv7E-M
+					 architecture.  */
+#define FL_ARCH7      (1 << 22)       /* Architecture 7.  */
+#define FL_ARM_DIV    (1 << 23)	      /* Hardware divide (ARM mode).  */
+#define FL_ARCH8      (1 << 24)       /* Architecture 8.  */
+#define FL_CRC32      (1 << 25)	      /* ARMv8 CRC32 instructions.  */
+
+#define FL_SMALLMUL   (1 << 26)       /* Small multiply supported.  */
+
+#define FL_IWMMXT     (1 << 29)	      /* XScale v2 or "Intel Wireless MMX technology".  */
+#define FL_IWMMXT2    (1 << 30)       /* "Intel Wireless MMX2 technology".  */
+
+/* Flags that only effect tuning, not available instructions.  */
+#define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
+			 | FL_CO_PROC)
+
+#define FL_FOR_ARCH2	FL_NOTM
+#define FL_FOR_ARCH3	(FL_FOR_ARCH2 | FL_MODE32)
+#define FL_FOR_ARCH3M	(FL_FOR_ARCH3 | FL_ARCH3M)
+#define FL_FOR_ARCH4	(FL_FOR_ARCH3M | FL_ARCH4)
+#define FL_FOR_ARCH4T	(FL_FOR_ARCH4 | FL_THUMB)
+#define FL_FOR_ARCH5	(FL_FOR_ARCH4 | FL_ARCH5)
+#define FL_FOR_ARCH5T	(FL_FOR_ARCH5 | FL_THUMB)
+#define FL_FOR_ARCH5E	(FL_FOR_ARCH5 | FL_ARCH5E)
+#define FL_FOR_ARCH5TE	(FL_FOR_ARCH5E | FL_THUMB)
+#define FL_FOR_ARCH5TEJ	FL_FOR_ARCH5TE
+#define FL_FOR_ARCH6	(FL_FOR_ARCH5TE | FL_ARCH6)
+#define FL_FOR_ARCH6J	FL_FOR_ARCH6
+#define FL_FOR_ARCH6K	(FL_FOR_ARCH6 | FL_ARCH6K)
+#define FL_FOR_ARCH6Z	FL_FOR_ARCH6
+#define FL_FOR_ARCH6ZK	FL_FOR_ARCH6K
+#define FL_FOR_ARCH6T2	(FL_FOR_ARCH6 | FL_THUMB2)
+#define FL_FOR_ARCH6M	(FL_FOR_ARCH6 & ~FL_NOTM)
+#define FL_FOR_ARCH7	((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
+#define FL_FOR_ARCH7A	(FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
+#define FL_FOR_ARCH7VE	(FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
+#define FL_FOR_ARCH7R	(FL_FOR_ARCH7A | FL_THUMB_DIV)
+#define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
+#define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
+#define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+
+/* The bits in this mask specify which
+   instructions we are allowed to generate.  */
+extern unsigned long insn_flags;
+
+/* The bits in this mask specify which instruction scheduling options should
+   be used.  */
+extern unsigned long tune_flags;
+
+/* Nonzero if this chip supports the ARM Architecture 3M extensions.  */
+extern int arm_arch3m;
+
+/* Nonzero if this chip supports the ARM Architecture 4 extensions.  */
+extern int arm_arch4;
+
+/* Nonzero if this chip supports the ARM Architecture 4t extensions.  */
+extern int arm_arch4t;
+
+/* Nonzero if this chip supports the ARM Architecture 5 extensions.  */
+extern int arm_arch5;
+
+/* Nonzero if this chip supports the ARM Architecture 5E extensions.  */
+extern int arm_arch5e;
+
+/* Nonzero if this chip supports the ARM Architecture 6 extensions.  */
+extern int arm_arch6;
+
+/* Nonzero if this chip supports the ARM 6K extensions.  */
+extern int arm_arch6k;
+
+/* Nonzero if instructions present in ARMv6-M can be used.  */
+extern int arm_arch6m;
+
+/* Nonzero if this chip supports the ARM 7 extensions.  */
+extern int arm_arch7;
+
+/* Nonzero if instructions not present in the 'M' profile can be used.  */
+extern int arm_arch_notm;
+
+/* Nonzero if instructions present in ARMv7E-M can be used.  */
+extern int arm_arch7em;
+
+/* Nonzero if instructions present in ARMv8 can be used.  */
+extern int arm_arch8;
+
+/* Nonzero if this chip can benefit from load scheduling.  */
+extern int arm_ld_sched;
+
+/* Nonzero if this chip is a StrongARM.  */
+extern int arm_tune_strongarm;
+
+/* Nonzero if this chip supports Intel Wireless MMX technology.  */
+extern int arm_arch_iwmmxt;
+
+/* Nonzero if this chip supports Intel Wireless MMX2 technology.  */
+extern int arm_arch_iwmmxt2;
+
+/* Nonzero if this chip is an XScale.  */
+extern int arm_arch_xscale;
+
+/* Nonzero if tuning for XScale  */
+extern int arm_tune_xscale;
+
+/* Nonzero if we want to tune for stores that access the write-buffer.
+   This typically means an ARM6 or ARM7 with MMU or MPU.  */
+extern int arm_tune_wbuf;
+
+/* Nonzero if tuning for Cortex-A9.  */
+extern int arm_tune_cortex_a9;
+
+/* Nonzero if generating Thumb instructions.  */
+extern int thumb_code;
+
+/* Nonzero if generating Thumb-1 instructions.  */
+extern int thumb1_code;
+
+/* Nonzero if we should define __THUMB_INTERWORK__ in the
+   preprocessor.
+   XXX This is a bit of a hack, it's intended to help work around
+   problems in GLD which doesn't understand that armv5t code is
+   interworking clean.  */
+extern int arm_cpp_interwork;
+
+/* Nonzero if chip supports Thumb 2.  */
+extern int arm_arch_thumb2;
+
+/* Nonzero if chip supports integer division instruction.  */
+extern int arm_arch_arm_hwdiv;
+extern int arm_arch_thumb_hwdiv;
+
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+extern int prefer_neon_for_64bits;
+
+
+
 #endif /* ! GCC_ARM_PROTOS_H */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 35a3932..e338e05 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -739,79 +739,13 @@ const struct arm_fpu_desc *arm_fpu_desc;
 rtx thumb_call_via_label[14];
 static int thumb_call_reg_needed;
 
-/* Bit values used to identify processor capabilities.  */
-#define FL_CO_PROC    (1 << 0)        /* Has external co-processor bus */
-#define FL_ARCH3M     (1 << 1)        /* Extended multiply */
-#define FL_MODE26     (1 << 2)        /* 26-bit mode support */
-#define FL_MODE32     (1 << 3)        /* 32-bit mode support */
-#define FL_ARCH4      (1 << 4)        /* Architecture rel 4 */
-#define FL_ARCH5      (1 << 5)        /* Architecture rel 5 */
-#define FL_THUMB      (1 << 6)        /* Thumb aware */
-#define FL_LDSCHED    (1 << 7)	      /* Load scheduling necessary */
-#define FL_STRONG     (1 << 8)	      /* StrongARM */
-#define FL_ARCH5E     (1 << 9)        /* DSP extensions to v5 */
-#define FL_XSCALE     (1 << 10)	      /* XScale */
-/* spare	      (1 << 11)	*/
-#define FL_ARCH6      (1 << 12)       /* Architecture rel 6.  Adds
-					 media instructions.  */
-#define FL_VFPV2      (1 << 13)       /* Vector Floating Point V2.  */
-#define FL_WBUF	      (1 << 14)	      /* Schedule for write buffer ops.
-					 Note: ARM6 & 7 derivatives only.  */
-#define FL_ARCH6K     (1 << 15)       /* Architecture rel 6 K extensions.  */
-#define FL_THUMB2     (1 << 16)	      /* Thumb-2.  */
-#define FL_NOTM	      (1 << 17)	      /* Instructions not present in the 'M'
-					 profile.  */
-#define FL_THUMB_DIV  (1 << 18)	      /* Hardware divide (Thumb mode).  */
-#define FL_VFPV3      (1 << 19)       /* Vector Floating Point V3.  */
-#define FL_NEON       (1 << 20)       /* Neon instructions.  */
-#define FL_ARCH7EM    (1 << 21)	      /* Instructions present in the ARMv7E-M
-					 architecture.  */
-#define FL_ARCH7      (1 << 22)       /* Architecture 7.  */
-#define FL_ARM_DIV    (1 << 23)	      /* Hardware divide (ARM mode).  */
-#define FL_ARCH8      (1 << 24)       /* Architecture 8.  */
-#define FL_CRC32      (1 << 25)	      /* ARMv8 CRC32 instructions.  */
-
-#define FL_SMALLMUL   (1 << 26)       /* Small multiply supported.  */
-
-#define FL_IWMMXT     (1 << 29)	      /* XScale v2 or "Intel Wireless MMX technology".  */
-#define FL_IWMMXT2    (1 << 30)       /* "Intel Wireless MMX2 technology".  */
-
-/* Flags that only effect tuning, not available instructions.  */
-#define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
-			 | FL_CO_PROC)
-
-#define FL_FOR_ARCH2	FL_NOTM
-#define FL_FOR_ARCH3	(FL_FOR_ARCH2 | FL_MODE32)
-#define FL_FOR_ARCH3M	(FL_FOR_ARCH3 | FL_ARCH3M)
-#define FL_FOR_ARCH4	(FL_FOR_ARCH3M | FL_ARCH4)
-#define FL_FOR_ARCH4T	(FL_FOR_ARCH4 | FL_THUMB)
-#define FL_FOR_ARCH5	(FL_FOR_ARCH4 | FL_ARCH5)
-#define FL_FOR_ARCH5T	(FL_FOR_ARCH5 | FL_THUMB)
-#define FL_FOR_ARCH5E	(FL_FOR_ARCH5 | FL_ARCH5E)
-#define FL_FOR_ARCH5TE	(FL_FOR_ARCH5E | FL_THUMB)
-#define FL_FOR_ARCH5TEJ	FL_FOR_ARCH5TE
-#define FL_FOR_ARCH6	(FL_FOR_ARCH5TE | FL_ARCH6)
-#define FL_FOR_ARCH6J	FL_FOR_ARCH6
-#define FL_FOR_ARCH6K	(FL_FOR_ARCH6 | FL_ARCH6K)
-#define FL_FOR_ARCH6Z	FL_FOR_ARCH6
-#define FL_FOR_ARCH6ZK	FL_FOR_ARCH6K
-#define FL_FOR_ARCH6T2	(FL_FOR_ARCH6 | FL_THUMB2)
-#define FL_FOR_ARCH6M	(FL_FOR_ARCH6 & ~FL_NOTM)
-#define FL_FOR_ARCH7	((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
-#define FL_FOR_ARCH7A	(FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
-#define FL_FOR_ARCH7VE	(FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
-#define FL_FOR_ARCH7R	(FL_FOR_ARCH7A | FL_THUMB_DIV)
-#define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
-#define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
-#define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
-
 /* The bits in this mask specify which
    instructions we are allowed to generate.  */
-static unsigned long insn_flags = 0;
+unsigned long insn_flags = 0;
 
 /* The bits in this mask specify which instruction scheduling options should
    be used.  */
-static unsigned long tune_flags = 0;
+unsigned long tune_flags = 0;
 
 /* The highest ARM architecture version supported by the
    target.  */
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 99bd696..25236a4 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -92,6 +92,7 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   $(TARGET_H) $(TARGET_DEF_H) debug.h langhooks.h $(DF_H) \
   intl.h libfuncs.h $(PARAMS_H) $(OPTS_H) $(srcdir)/config/arm/arm-cores.def \
   $(srcdir)/config/arm/arm-arches.def $(srcdir)/config/arm/arm-fpus.def \
+  $(srcdir)/config/arm/arm-protos.h \
   $(srcdir)/config/arm/arm_neon_builtins.def
 
 arm-c.o: $(srcdir)/config/arm/arm-c.c $(CONFIG_H) $(SYSTEM_H) \


* [Patch ARM Refactor Builtins 4/8]  Refactor "VAR<n>" Macros
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM James Greenhalgh
@ 2014-11-12 17:11   ` James Greenhalgh
  2014-11-18  9:17     ` Ramana Radhakrishnan
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h James Greenhalgh
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, nickc

[-- Attachment #1: Type: text/plain, Size: 600 bytes --]


Hi,

Each of these macros can be defined with VAR1 as the base case and a
"recursive" case written in terms of VAR<n-1>. At the moment, the body of
VAR1 is instead duplicated into each macro.

This patch makes that change.
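
To illustrate the scheme (a simplified sketch; VAR4 through VAR10 chain
downwards in the same way), once VAR1 is the only macro with a body,
re-expanding arm_neon_builtins.def for a new purpose, such as the
enumerator list, just means redefining VAR1:

  /* VAR1 is the sole base case; each VARn simply delegates.  */
  #define VAR1(T, N, X) ARM_BUILTIN_NEON_##N##X,
  #define VAR2(T, N, A, B) VAR1 (T, N, A) VAR1 (T, N, B)
  #define VAR3(T, N, A, B, C) VAR2 (T, N, A, B) VAR1 (T, N, C)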

Regression tested on arm-none-linux-gnueabihf with no issues.

OK?

Thanks,
James

---
gcc/

2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/arm/arm-builtins.c (VAR1): Add a comma.
	(VAR2): Rewrite in terms of VAR1.
	(VAR3-10): Likewise.
	(arm_builtins): Remove leading comma before ARM_BUILTIN_MAX.
	* config/arm/arm_neon_builtins.def: Remove trailing commas.

[-- Attachment #2: 0004-Patch-ARM-Refactor-Builtins-4-8-Refactor-VAR-n-Macro.patch --]
[-- Type: text/x-patch;  name=0004-Patch-ARM-Refactor-Builtins-4-8-Refactor-VAR-n-Macro.patch, Size: 28401 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index e387b60..ef86a31 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -134,34 +134,34 @@ typedef struct {
 #define CF(N,X) CODE_FOR_neon_##N##X
 
 #define VAR1(T, N, A) \
-  {#N, NEON_##T, UP (A), CF (N, A), 0}
+  {#N, NEON_##T, UP (A), CF (N, A), 0},
 #define VAR2(T, N, A, B) \
-  VAR1 (T, N, A), \
-  {#N, NEON_##T, UP (B), CF (N, B), 0}
+  VAR1 (T, N, A) \
+  VAR1 (T, N, B)
 #define VAR3(T, N, A, B, C) \
-  VAR2 (T, N, A, B), \
-  {#N, NEON_##T, UP (C), CF (N, C), 0}
+  VAR2 (T, N, A, B) \
+  VAR1 (T, N, C)
 #define VAR4(T, N, A, B, C, D) \
-  VAR3 (T, N, A, B, C), \
-  {#N, NEON_##T, UP (D), CF (N, D), 0}
+  VAR3 (T, N, A, B, C) \
+  VAR1 (T, N, D)
 #define VAR5(T, N, A, B, C, D, E) \
-  VAR4 (T, N, A, B, C, D), \
-  {#N, NEON_##T, UP (E), CF (N, E), 0}
+  VAR4 (T, N, A, B, C, D) \
+  VAR1 (T, N, E)
 #define VAR6(T, N, A, B, C, D, E, F) \
-  VAR5 (T, N, A, B, C, D, E), \
-  {#N, NEON_##T, UP (F), CF (N, F), 0}
+  VAR5 (T, N, A, B, C, D, E) \
+  VAR1 (T, N, F)
 #define VAR7(T, N, A, B, C, D, E, F, G) \
-  VAR6 (T, N, A, B, C, D, E, F), \
-  {#N, NEON_##T, UP (G), CF (N, G), 0}
+  VAR6 (T, N, A, B, C, D, E, F) \
+  VAR1 (T, N, G)
 #define VAR8(T, N, A, B, C, D, E, F, G, H) \
-  VAR7 (T, N, A, B, C, D, E, F, G), \
-  {#N, NEON_##T, UP (H), CF (N, H), 0}
+  VAR7 (T, N, A, B, C, D, E, F, G) \
+  VAR1 (T, N, H)
 #define VAR9(T, N, A, B, C, D, E, F, G, H, I) \
-  VAR8 (T, N, A, B, C, D, E, F, G, H), \
-  {#N, NEON_##T, UP (I), CF (N, I), 0}
+  VAR8 (T, N, A, B, C, D, E, F, G, H) \
+  VAR1 (T, N, I)
 #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
-  VAR9 (T, N, A, B, C, D, E, F, G, H, I), \
-  {#N, NEON_##T, UP (J), CF (N, J), 0}
+  VAR9 (T, N, A, B, C, D, E, F, G, H, I) \
+  VAR1 (T, N, J)
 
 /* The NEON builtin data can be found in arm_neon_builtins.def.
    The mode entries in the following table correspond to the "key" type of the
@@ -179,46 +179,10 @@ static neon_builtin_datum neon_builtin_data[] =
 
 #undef CF
 #undef VAR1
-#undef VAR2
-#undef VAR3
-#undef VAR4
-#undef VAR5
-#undef VAR6
-#undef VAR7
-#undef VAR8
-#undef VAR9
-#undef VAR10
 
-#define CF(N,X) ARM_BUILTIN_NEON_##N##X
-#define VAR1(T, N, A) \
-  CF (N, A)
-#define VAR2(T, N, A, B) \
-  VAR1 (T, N, A), \
-  CF (N, B)
-#define VAR3(T, N, A, B, C) \
-  VAR2 (T, N, A, B), \
-  CF (N, C)
-#define VAR4(T, N, A, B, C, D) \
-  VAR3 (T, N, A, B, C), \
-  CF (N, D)
-#define VAR5(T, N, A, B, C, D, E) \
-  VAR4 (T, N, A, B, C, D), \
-  CF (N, E)
-#define VAR6(T, N, A, B, C, D, E, F) \
-  VAR5 (T, N, A, B, C, D, E), \
-  CF (N, F)
-#define VAR7(T, N, A, B, C, D, E, F, G) \
-  VAR6 (T, N, A, B, C, D, E, F), \
-  CF (N, G)
-#define VAR8(T, N, A, B, C, D, E, F, G, H) \
-  VAR7 (T, N, A, B, C, D, E, F, G), \
-  CF (N, H)
-#define VAR9(T, N, A, B, C, D, E, F, G, H, I) \
-  VAR8 (T, N, A, B, C, D, E, F, G, H), \
-  CF (N, I)
-#define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
-  VAR9 (T, N, A, B, C, D, E, F, G, H, I), \
-  CF (N, J)
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_NEON_##N##X,
+
 enum arm_builtins
 {
   ARM_BUILTIN_GETWCGR0,
@@ -496,7 +460,7 @@ enum arm_builtins
 
 #include "arm_neon_builtins.def"
 
-  ,ARM_BUILTIN_MAX
+  ARM_BUILTIN_MAX
 };
 
 #define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 5451524..88f0788 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -18,264 +18,264 @@
    along with GCC; see the file COPYING3.  If not see
    <http://www.gnu.org/licenses/>.  */
 
-VAR2 (BINOP, vadd, v2sf, v4sf),
-VAR3 (BINOP, vaddls, v8qi, v4hi, v2si),
-VAR3 (BINOP, vaddlu, v8qi, v4hi, v2si),
-VAR3 (BINOP, vaddws, v8qi, v4hi, v2si),
-VAR3 (BINOP, vaddwu, v8qi, v4hi, v2si),
-VAR6 (BINOP, vhaddu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vhadds, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vrhaddu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vrhadds, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR8 (BINOP, vqadds, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vqaddu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR3 (BINOP, vaddhn, v8hi, v4si, v2di),
-VAR3 (BINOP, vraddhn, v8hi, v4si, v2di),
-VAR2 (BINOP, vmulf, v2sf, v4sf),
-VAR2 (BINOP, vmulp, v8qi, v16qi),
-VAR8 (TERNOP, vmla, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR3 (TERNOP, vmlals, v8qi, v4hi, v2si),
-VAR3 (TERNOP, vmlalu, v8qi, v4hi, v2si),
-VAR2 (TERNOP, vfma, v2sf, v4sf),
-VAR2 (TERNOP, vfms, v2sf, v4sf),
-VAR8 (TERNOP, vmls, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR3 (TERNOP, vmlsls, v8qi, v4hi, v2si),
-VAR3 (TERNOP, vmlslu, v8qi, v4hi, v2si),
-VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si),
-VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si),
-VAR2 (TERNOP, vqdmlal, v4hi, v2si),
-VAR2 (TERNOP, vqdmlsl, v4hi, v2si),
-VAR3 (BINOP, vmullp, v8qi, v4hi, v2si),
-VAR3 (BINOP, vmulls, v8qi, v4hi, v2si),
-VAR3 (BINOP, vmullu, v8qi, v4hi, v2si),
-VAR2 (SCALARMULL, vmulls_n, v4hi, v2si),
-VAR2 (SCALARMULL, vmullu_n, v4hi, v2si),
-VAR2 (LANEMULL, vmulls_lane, v4hi, v2si),
-VAR2 (LANEMULL, vmullu_lane, v4hi, v2si),
-VAR2 (SCALARMULL, vqdmull_n, v4hi, v2si),
-VAR2 (LANEMULL, vqdmull_lane, v4hi, v2si),
-VAR4 (SCALARMULH, vqdmulh_n, v4hi, v2si, v8hi, v4si),
-VAR4 (SCALARMULH, vqrdmulh_n, v4hi, v2si, v8hi, v4si),
-VAR4 (LANEMULH, vqdmulh_lane, v4hi, v2si, v8hi, v4si),
-VAR4 (LANEMULH, vqrdmulh_lane, v4hi, v2si, v8hi, v4si),
-VAR2 (BINOP, vqdmull, v4hi, v2si),
-VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vrshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vrshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vqshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vqshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vqrshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vqrshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vrshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vrshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vshrn_n, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vrshrn_n, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vqshrns_n, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vqshrnu_n, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vqrshrns_n, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vqrshrnu_n, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vqshrun_n, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vqrshrun_n, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vqshl_s_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vqshl_u_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTIMM, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR3 (SHIFTIMM, vshlls_n, v8qi, v4hi, v2si),
-VAR3 (SHIFTIMM, vshllu_n, v8qi, v4hi, v2si),
-VAR8 (SHIFTACC, vsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTACC, vsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTACC, vrsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTACC, vrsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR2 (BINOP, vsub, v2sf, v4sf),
-VAR3 (BINOP, vsubls, v8qi, v4hi, v2si),
-VAR3 (BINOP, vsublu, v8qi, v4hi, v2si),
-VAR3 (BINOP, vsubws, v8qi, v4hi, v2si),
-VAR3 (BINOP, vsubwu, v8qi, v4hi, v2si),
-VAR8 (BINOP, vqsubs, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (BINOP, vqsubu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR6 (BINOP, vhsubs, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vhsubu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR3 (BINOP, vsubhn, v8hi, v4si, v2di),
-VAR3 (BINOP, vrsubhn, v8hi, v4si, v2di),
-VAR8 (BINOP, vceq, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR8 (BINOP, vcge, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR6 (BINOP, vcgeu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR8 (BINOP, vcgt, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR6 (BINOP, vcgtu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR2 (BINOP, vcage, v2sf, v4sf),
-VAR2 (BINOP, vcagt, v2sf, v4sf),
-VAR6 (BINOP, vtst, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vabds, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vabdu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR2 (BINOP, vabdf, v2sf, v4sf),
-VAR3 (BINOP, vabdls, v8qi, v4hi, v2si),
-VAR3 (BINOP, vabdlu, v8qi, v4hi, v2si),
+VAR2 (BINOP, vadd, v2sf, v4sf)
+VAR3 (BINOP, vaddls, v8qi, v4hi, v2si)
+VAR3 (BINOP, vaddlu, v8qi, v4hi, v2si)
+VAR3 (BINOP, vaddws, v8qi, v4hi, v2si)
+VAR3 (BINOP, vaddwu, v8qi, v4hi, v2si)
+VAR6 (BINOP, vhaddu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vhadds, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vrhaddu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vrhadds, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR8 (BINOP, vqadds, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vqaddu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR3 (BINOP, vaddhn, v8hi, v4si, v2di)
+VAR3 (BINOP, vraddhn, v8hi, v4si, v2di)
+VAR2 (BINOP, vmulf, v2sf, v4sf)
+VAR2 (BINOP, vmulp, v8qi, v16qi)
+VAR8 (TERNOP, vmla, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR3 (TERNOP, vmlals, v8qi, v4hi, v2si)
+VAR3 (TERNOP, vmlalu, v8qi, v4hi, v2si)
+VAR2 (TERNOP, vfma, v2sf, v4sf)
+VAR2 (TERNOP, vfms, v2sf, v4sf)
+VAR8 (TERNOP, vmls, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR3 (TERNOP, vmlsls, v8qi, v4hi, v2si)
+VAR3 (TERNOP, vmlslu, v8qi, v4hi, v2si)
+VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si)
+VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si)
+VAR2 (TERNOP, vqdmlal, v4hi, v2si)
+VAR2 (TERNOP, vqdmlsl, v4hi, v2si)
+VAR3 (BINOP, vmullp, v8qi, v4hi, v2si)
+VAR3 (BINOP, vmulls, v8qi, v4hi, v2si)
+VAR3 (BINOP, vmullu, v8qi, v4hi, v2si)
+VAR2 (SCALARMULL, vmulls_n, v4hi, v2si)
+VAR2 (SCALARMULL, vmullu_n, v4hi, v2si)
+VAR2 (LANEMULL, vmulls_lane, v4hi, v2si)
+VAR2 (LANEMULL, vmullu_lane, v4hi, v2si)
+VAR2 (SCALARMULL, vqdmull_n, v4hi, v2si)
+VAR2 (LANEMULL, vqdmull_lane, v4hi, v2si)
+VAR4 (SCALARMULH, vqdmulh_n, v4hi, v2si, v8hi, v4si)
+VAR4 (SCALARMULH, vqrdmulh_n, v4hi, v2si, v8hi, v4si)
+VAR4 (LANEMULH, vqdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (LANEMULH, vqrdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR2 (BINOP, vqdmull, v4hi, v2si)
+VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vrshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vrshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vqshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vqshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vqrshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vqrshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTIMM, vshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTIMM, vshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTIMM, vrshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTIMM, vrshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vshrn_n, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vrshrn_n, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vqshrns_n, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vqshrnu_n, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vqrshrns_n, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vqrshrnu_n, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vqshrun_n, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vqrshrun_n, v8hi, v4si, v2di)
+VAR8 (SHIFTIMM, vshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTIMM, vqshl_s_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTIMM, vqshl_u_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTIMM, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR3 (SHIFTIMM, vshlls_n, v8qi, v4hi, v2si)
+VAR3 (SHIFTIMM, vshllu_n, v8qi, v4hi, v2si)
+VAR8 (SHIFTACC, vsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTACC, vsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTACC, vrsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTACC, vrsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR2 (BINOP, vsub, v2sf, v4sf)
+VAR3 (BINOP, vsubls, v8qi, v4hi, v2si)
+VAR3 (BINOP, vsublu, v8qi, v4hi, v2si)
+VAR3 (BINOP, vsubws, v8qi, v4hi, v2si)
+VAR3 (BINOP, vsubwu, v8qi, v4hi, v2si)
+VAR8 (BINOP, vqsubs, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (BINOP, vqsubu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR6 (BINOP, vhsubs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vhsubu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR3 (BINOP, vsubhn, v8hi, v4si, v2di)
+VAR3 (BINOP, vrsubhn, v8hi, v4si, v2di)
+VAR8 (BINOP, vceq, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR8 (BINOP, vcge, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR6 (BINOP, vcgeu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR8 (BINOP, vcgt, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR6 (BINOP, vcgtu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR2 (BINOP, vcage, v2sf, v4sf)
+VAR2 (BINOP, vcagt, v2sf, v4sf)
+VAR6 (BINOP, vtst, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vabds, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vabdu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR2 (BINOP, vabdf, v2sf, v4sf)
+VAR3 (BINOP, vabdls, v8qi, v4hi, v2si)
+VAR3 (BINOP, vabdlu, v8qi, v4hi, v2si)
 
-VAR6 (TERNOP, vabas, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (TERNOP, vabau, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR3 (TERNOP, vabals, v8qi, v4hi, v2si),
-VAR3 (TERNOP, vabalu, v8qi, v4hi, v2si),
+VAR6 (TERNOP, vabas, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (TERNOP, vabau, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR3 (TERNOP, vabals, v8qi, v4hi, v2si)
+VAR3 (TERNOP, vabalu, v8qi, v4hi, v2si)
 
-VAR6 (BINOP, vmaxs, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vmaxu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR2 (BINOP, vmaxf, v2sf, v4sf),
-VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR2 (BINOP, vminf, v2sf, v4sf),
+VAR6 (BINOP, vmaxs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vmaxu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR2 (BINOP, vmaxf, v2sf, v4sf)
+VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR2 (BINOP, vminf, v2sf, v4sf)
 
-VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si),
-VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si),
-VAR1 (BINOP, vpmaxf, v2sf),
-VAR3 (BINOP, vpmins, v8qi, v4hi, v2si),
-VAR3 (BINOP, vpminu, v8qi, v4hi, v2si),
-VAR1 (BINOP, vpminf, v2sf),
+VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si)
+VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si)
+VAR1 (BINOP, vpmaxf, v2sf)
+VAR3 (BINOP, vpmins, v8qi, v4hi, v2si)
+VAR3 (BINOP, vpminu, v8qi, v4hi, v2si)
+VAR1 (BINOP, vpminf, v2sf)
 
-VAR4 (BINOP, vpadd, v8qi, v4hi, v2si, v2sf),
-VAR6 (UNOP, vpaddls, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (UNOP, vpaddlu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vpadals, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (BINOP, vpadalu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR2 (BINOP, vrecps, v2sf, v4sf),
-VAR2 (BINOP, vrsqrts, v2sf, v4sf),
-VAR8 (SHIFTINSERT, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (SHIFTINSERT, vsli_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR8 (UNOP, vabs, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR6 (UNOP, vqabs, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR8 (UNOP, vneg, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR6 (UNOP, vqneg, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (UNOP, vcls, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR6 (UNOP, vclz, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
-VAR5 (BSWAP, bswap, v4hi, v8hi, v2si, v4si, v2di),
-VAR2 (UNOP, vcnt, v8qi, v16qi),
-VAR4 (UNOP, vrecpe, v2si, v2sf, v4si, v4sf),
-VAR4 (UNOP, vrsqrte, v2si, v2sf, v4si, v4sf),
-VAR6 (UNOP, vmvn, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+VAR4 (BINOP, vpadd, v8qi, v4hi, v2si, v2sf)
+VAR6 (UNOP, vpaddls, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (UNOP, vpaddlu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vpadals, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (BINOP, vpadalu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR2 (BINOP, vrecps, v2sf, v4sf)
+VAR2 (BINOP, vrsqrts, v2sf, v4sf)
+VAR8 (SHIFTINSERT, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SHIFTINSERT, vsli_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (UNOP, vabs, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR6 (UNOP, vqabs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR8 (UNOP, vneg, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR6 (UNOP, vqneg, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (UNOP, vcls, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR6 (UNOP, vclz, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR5 (BSWAP, bswap, v4hi, v8hi, v2si, v4si, v2di)
+VAR2 (UNOP, vcnt, v8qi, v16qi)
+VAR4 (UNOP, vrecpe, v2si, v2sf, v4si, v4sf)
+VAR4 (UNOP, vrsqrte, v2si, v2sf, v4si, v4sf)
+VAR6 (UNOP, vmvn, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
   /* FIXME: vget_lane supports more variants than this!  */
 VAR10 (GETLANE, vget_lane,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR6 (GETLANE, vget_laneu, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
+	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR6 (GETLANE, vget_laneu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR10 (SETLANE, vset_lane,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR5 (CREATE, vcreate, v8qi, v4hi, v2si, v2sf, di),
+	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR5 (CREATE, vcreate, v8qi, v4hi, v2si, v2sf, di)
 VAR10 (DUP, vdup_n,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
+	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (BINOP, vdup_lane,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR5 (COMBINE, vcombine, v8qi, v4hi, v2si, v2sf, di),
-VAR5 (SPLIT, vget_high, v16qi, v8hi, v4si, v4sf, v2di),
-VAR5 (SPLIT, vget_low, v16qi, v8hi, v4si, v4sf, v2di),
-VAR3 (UNOP, vmovn, v8hi, v4si, v2di),
-VAR3 (UNOP, vqmovns, v8hi, v4si, v2di),
-VAR3 (UNOP, vqmovnu, v8hi, v4si, v2di),
-VAR3 (UNOP, vqmovun, v8hi, v4si, v2di),
-VAR3 (UNOP, vmovls, v8qi, v4hi, v2si),
-VAR3 (UNOP, vmovlu, v8qi, v4hi, v2si),
-VAR6 (LANEMUL, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR6 (LANEMAC, vmla_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR2 (LANEMAC, vmlals_lane, v4hi, v2si),
-VAR2 (LANEMAC, vmlalu_lane, v4hi, v2si),
-VAR2 (LANEMAC, vqdmlal_lane, v4hi, v2si),
-VAR6 (LANEMAC, vmls_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR2 (LANEMAC, vmlsls_lane, v4hi, v2si),
-VAR2 (LANEMAC, vmlslu_lane, v4hi, v2si),
-VAR2 (LANEMAC, vqdmlsl_lane, v4hi, v2si),
-VAR6 (SCALARMUL, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR6 (SCALARMAC, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR2 (SCALARMAC, vmlals_n, v4hi, v2si),
-VAR2 (SCALARMAC, vmlalu_n, v4hi, v2si),
-VAR2 (SCALARMAC, vqdmlal_n, v4hi, v2si),
-VAR6 (SCALARMAC, vmls_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR2 (SCALARMAC, vmlsls_n, v4hi, v2si),
-VAR2 (SCALARMAC, vmlslu_n, v4hi, v2si),
-VAR2 (SCALARMAC, vqdmlsl_n, v4hi, v2si),
+	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR5 (COMBINE, vcombine, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (SPLIT, vget_high, v16qi, v8hi, v4si, v4sf, v2di)
+VAR5 (SPLIT, vget_low, v16qi, v8hi, v4si, v4sf, v2di)
+VAR3 (UNOP, vmovn, v8hi, v4si, v2di)
+VAR3 (UNOP, vqmovns, v8hi, v4si, v2di)
+VAR3 (UNOP, vqmovnu, v8hi, v4si, v2di)
+VAR3 (UNOP, vqmovun, v8hi, v4si, v2di)
+VAR3 (UNOP, vmovls, v8qi, v4hi, v2si)
+VAR3 (UNOP, vmovlu, v8qi, v4hi, v2si)
+VAR6 (LANEMUL, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR6 (LANEMAC, vmla_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR2 (LANEMAC, vmlals_lane, v4hi, v2si)
+VAR2 (LANEMAC, vmlalu_lane, v4hi, v2si)
+VAR2 (LANEMAC, vqdmlal_lane, v4hi, v2si)
+VAR6 (LANEMAC, vmls_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR2 (LANEMAC, vmlsls_lane, v4hi, v2si)
+VAR2 (LANEMAC, vmlslu_lane, v4hi, v2si)
+VAR2 (LANEMAC, vqdmlsl_lane, v4hi, v2si)
+VAR6 (SCALARMUL, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR6 (SCALARMAC, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR2 (SCALARMAC, vmlals_n, v4hi, v2si)
+VAR2 (SCALARMAC, vmlalu_n, v4hi, v2si)
+VAR2 (SCALARMAC, vqdmlal_n, v4hi, v2si)
+VAR6 (SCALARMAC, vmls_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR2 (SCALARMAC, vmlsls_n, v4hi, v2si)
+VAR2 (SCALARMAC, vmlslu_n, v4hi, v2si)
+VAR2 (SCALARMAC, vqdmlsl_n, v4hi, v2si)
 VAR10 (SHIFTINSERT, vext,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf),
-VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi),
-VAR2 (UNOP, vrev16, v8qi, v16qi),
-VAR4 (CONVERT, vcvts, v2si, v2sf, v4si, v4sf),
-VAR4 (CONVERT, vcvtu, v2si, v2sf, v4si, v4sf),
-VAR4 (FIXCONV, vcvts_n, v2si, v2sf, v4si, v4sf),
-VAR4 (FIXCONV, vcvtu_n, v2si, v2sf, v4si, v4sf),
-VAR1 (FLOAT_WIDEN, vcvtv4sf, v4hf),
-VAR1 (FLOAT_NARROW, vcvtv4hf, v4sf),
+	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi)
+VAR2 (UNOP, vrev16, v8qi, v16qi)
+VAR4 (CONVERT, vcvts, v2si, v2sf, v4si, v4sf)
+VAR4 (CONVERT, vcvtu, v2si, v2sf, v4si, v4sf)
+VAR4 (FIXCONV, vcvts_n, v2si, v2sf, v4si, v4sf)
+VAR4 (FIXCONV, vcvtu_n, v2si, v2sf, v4si, v4sf)
+VAR1 (FLOAT_WIDEN, vcvtv4sf, v4hf)
+VAR1 (FLOAT_NARROW, vcvtv4hf, v4sf)
 VAR10 (SELECT, vbsl,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR2 (COPYSIGNF, copysignf, v2sf, v4sf),
-VAR2 (RINT, vrintn, v2sf, v4sf),
-VAR2 (RINT, vrinta, v2sf, v4sf),
-VAR2 (RINT, vrintp, v2sf, v4sf),
-VAR2 (RINT, vrintm, v2sf, v4sf),
-VAR2 (RINT, vrintz, v2sf, v4sf),
-VAR2 (RINT, vrintx, v2sf, v4sf),
-VAR1 (RINT, vcvtav2sf, v2si),
-VAR1 (RINT, vcvtav4sf, v4si),
-VAR1 (RINT, vcvtauv2sf, v2si),
-VAR1 (RINT, vcvtauv4sf, v4si),
-VAR1 (RINT, vcvtpv2sf, v2si),
-VAR1 (RINT, vcvtpv4sf, v4si),
-VAR1 (RINT, vcvtpuv2sf, v2si),
-VAR1 (RINT, vcvtpuv4sf, v4si),
-VAR1 (RINT, vcvtmv2sf, v2si),
-VAR1 (RINT, vcvtmv4sf, v4si),
-VAR1 (RINT, vcvtmuv2sf, v2si),
-VAR1 (RINT, vcvtmuv4sf, v4si),
-VAR1 (VTBL, vtbl1, v8qi),
-VAR1 (VTBL, vtbl2, v8qi),
-VAR1 (VTBL, vtbl3, v8qi),
-VAR1 (VTBL, vtbl4, v8qi),
-VAR1 (VTBX, vtbx1, v8qi),
-VAR1 (VTBX, vtbx2, v8qi),
-VAR1 (VTBX, vtbx3, v8qi),
-VAR1 (VTBX, vtbx4, v8qi),
-VAR5 (REINTERP, vreinterpretv8qi, v8qi, v4hi, v2si, v2sf, di),
-VAR5 (REINTERP, vreinterpretv4hi, v8qi, v4hi, v2si, v2sf, di),
-VAR5 (REINTERP, vreinterpretv2si, v8qi, v4hi, v2si, v2sf, di),
-VAR5 (REINTERP, vreinterpretv2sf, v8qi, v4hi, v2si, v2sf, di),
-VAR5 (REINTERP, vreinterpretdi, v8qi, v4hi, v2si, v2sf, di),
-VAR6 (REINTERP, vreinterpretv16qi, v16qi, v8hi, v4si, v4sf, v2di, ti),
-VAR6 (REINTERP, vreinterpretv8hi, v16qi, v8hi, v4si, v4sf, v2di, ti),
-VAR6 (REINTERP, vreinterpretv4si, v16qi, v8hi, v4si, v4sf, v2di, ti),
-VAR6 (REINTERP, vreinterpretv4sf, v16qi, v8hi, v4si, v4sf, v2di, ti),
-VAR6 (REINTERP, vreinterpretv2di, v16qi, v8hi, v4si, v4sf, v2di, ti),
-VAR6 (REINTERP, vreinterpretti, v16qi, v8hi, v4si, v4sf, v2di, ti),
+	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR2 (COPYSIGNF, copysignf, v2sf, v4sf)
+VAR2 (RINT, vrintn, v2sf, v4sf)
+VAR2 (RINT, vrinta, v2sf, v4sf)
+VAR2 (RINT, vrintp, v2sf, v4sf)
+VAR2 (RINT, vrintm, v2sf, v4sf)
+VAR2 (RINT, vrintz, v2sf, v4sf)
+VAR2 (RINT, vrintx, v2sf, v4sf)
+VAR1 (RINT, vcvtav2sf, v2si)
+VAR1 (RINT, vcvtav4sf, v4si)
+VAR1 (RINT, vcvtauv2sf, v2si)
+VAR1 (RINT, vcvtauv4sf, v4si)
+VAR1 (RINT, vcvtpv2sf, v2si)
+VAR1 (RINT, vcvtpv4sf, v4si)
+VAR1 (RINT, vcvtpuv2sf, v2si)
+VAR1 (RINT, vcvtpuv4sf, v4si)
+VAR1 (RINT, vcvtmv2sf, v2si)
+VAR1 (RINT, vcvtmv4sf, v4si)
+VAR1 (RINT, vcvtmuv2sf, v2si)
+VAR1 (RINT, vcvtmuv4sf, v4si)
+VAR1 (VTBL, vtbl1, v8qi)
+VAR1 (VTBL, vtbl2, v8qi)
+VAR1 (VTBL, vtbl3, v8qi)
+VAR1 (VTBL, vtbl4, v8qi)
+VAR1 (VTBX, vtbx1, v8qi)
+VAR1 (VTBX, vtbx2, v8qi)
+VAR1 (VTBX, vtbx3, v8qi)
+VAR1 (VTBX, vtbx4, v8qi)
+VAR5 (REINTERP, vreinterpretv8qi, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (REINTERP, vreinterpretv4hi, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (REINTERP, vreinterpretv2si, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (REINTERP, vreinterpretv2sf, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (REINTERP, vreinterpretdi, v8qi, v4hi, v2si, v2sf, di)
+VAR6 (REINTERP, vreinterpretv16qi, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (REINTERP, vreinterpretv8hi, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (REINTERP, vreinterpretv4si, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (REINTERP, vreinterpretv4sf, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (REINTERP, vreinterpretv2di, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (REINTERP, vreinterpretti, v16qi, v8hi, v4si, v4sf, v2di, ti)
 VAR10 (LOAD1, vld1,
-         v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
+        v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (LOAD1LANE, vld1_lane,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (LOAD1, vld1_dup,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (STORE1, vst1,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (STORE1LANE, vst1_lane,
-	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR9 (LOADSTRUCT,
-	vld2, v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf),
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR9 (LOADSTRUCT, vld2,
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
 VAR7 (LOADSTRUCTLANE, vld2_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR5 (LOADSTRUCT, vld2_dup, v8qi, v4hi, v2si, v2sf, di),
+	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR5 (LOADSTRUCT, vld2_dup, v8qi, v4hi, v2si, v2sf, di)
 VAR9 (STORESTRUCT, vst2,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf),
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
 VAR7 (STORESTRUCTLANE, vst2_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR9 (LOADSTRUCT,
-	vld3, v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf),
+	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR9 (LOADSTRUCT, vld3,
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
 VAR7 (LOADSTRUCTLANE, vld3_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR5 (LOADSTRUCT, vld3_dup, v8qi, v4hi, v2si, v2sf, di),
+	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR5 (LOADSTRUCT, vld3_dup, v8qi, v4hi, v2si, v2sf, di)
 VAR9 (STORESTRUCT, vst3,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf),
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
 VAR7 (STORESTRUCTLANE, vst3_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
+	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR9 (LOADSTRUCT, vld4,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf),
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
 VAR7 (LOADSTRUCTLANE, vld4_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR5 (LOADSTRUCT, vld4_dup, v8qi, v4hi, v2si, v2sf, di),
+	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR5 (LOADSTRUCT, vld4_dup, v8qi, v4hi, v2si, v2sf, di)
 VAR9 (STORESTRUCT, vst4,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf),
+	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
 VAR7 (STORESTRUCTLANE, vst4_lane,
 	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
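
For reference (an illustrative expansion, not part of the patch
itself): with the new VAR1 definition above, including this file into
enum arm_builtins expands an entry such as VAR2 (BINOP, vadd, v2sf,
v4sf) to the pair of enumerators:

  ARM_BUILTIN_NEON_vaddv2sf,
  ARM_BUILTIN_NEON_vaddv4sf,

so the .def file remains the single source of truth for both the
builtin table and the enum values.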


* [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM.
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
@ 2014-11-12 17:11   ` James Greenhalgh
  2014-11-18  9:18     ` Ramana Radhakrishnan
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 4/8] Refactor "VAR<n>" Macros James Greenhalgh
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, nickc

[-- Attachment #1: Type: text/plain, Size: 2486 bytes --]


Hi,

Now we have everything we need to start keeping track of the correct
"qualifiers" for each Neon builtin class in the ARM back-end.

Some of the ARM Neon itypes are redundant when mapped to the qualifiers
framework. For now, don't change these; we will clean them up in patch
8 of the series.
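
As an illustration (this expansion is mine, not part of the patch
text), the first entry that VAR2 (BINOP, vadd, v2sf, v4sf) generates
in neon_builtin_data now carries its qualifiers:

  {"vadd", NEON_BINOP, UP (v2sf), CF (vadd, v2sf), 0, BINOP_QUALIFIERS},

where CF (vadd, v2sf) is CODE_FOR_neon_vaddv2sf, and BINOP_QUALIFIERS
resolves to arm_binop_qualifiers, i.e. { qualifier_none,
qualifier_none, qualifier_maybe_immediate }.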

Bootstrapped on arm-none-linux-gnueabihf with no issues.

OK?

Thanks,
James

---
gcc/

2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/arm/arm-builtins.c (arm_type_qualifiers): New.
	(neon_itype): Add new types corresponding to the types used in
	qualifier names.
	(arm_unop_qualifiers): New.
	(arm_bswap_qualifiers): Likewise.
	(arm_binop_qualifiers): Likewise.
	(arm_ternop_qualifiers): Likewise.
	(arm_getlane_qualifiers): Likewise.
	(arm_lanemac_qualifiers): Likewise.
	(arm_setlane_qualifiers): Likewise.
	(arm_combine_qualifiers): Likewise.
	(arm_load1_qualifiers): Likewise.
	(arm_load1_lane_qualifiers): Likewise.
	(arm_store1_qualifiers): Likewise.
	(arm_storestruct_lane_qualifiers): Likewise.
	(UNOP_QUALIFIERS): Likewise.
	(DUP_QUALIFIERS): Likewise.
	(SPLIT_QUALIFIERS): Likewise.
	(CONVERT_QUALIFIERS): Likewise.
	(FLOAT_WIDEN_QUALIFIERS): Likewise.
	(FLOAT_NARROW_QUALIFIERS): Likewise.
	(RINT_QUALIFIERS): Likewise.
	(COPYSIGNF_QUALIFIERS): Likewise.
	(CREATE_QUALIFIERS): Likewise.
	(REINTERP_QUALIFIERS): Likewise.
	(BSWAP_QUALIFIERS): Likewise.
	(BINOP_QUALIFIERS): Likewise.
	(FIXCONV_QUALIFIERS): Likewise.
	(SCALARMUL_QUALIFIERS): Likewise.
	(SCALARMULL_QUALIFIERS): Likewise.
	(SCALARMULH_QUALIFIERS): Likewise.
	(TERNOP_QUALIFIERS): Likewise.
	(SELECT_QUALIFIERS): Likewise.
	(VTBX_QUALIFIERS): Likewise.
	(GETLANE_QUALIFIERS): Likewise.
	(SHIFTIMM_QUALIFIERS): Likewise.
	(LANEMAC_QUALIFIERS): Likewise.
	(SCALARMAC_QUALIFIERS): Likewise.
	(SETLANE_QUALIFIERS): Likewise.
	(SHIFTINSERT_QUALIFIERS): Likewise.
	(SHIFTACC_QUALIFIERS): Likewise.
	(LANEMUL_QUALIFIERS): Likewise.
	(LANEMULL_QUALIFIERS): Likewise.
	(LANEMULH_QUALIFIERS): Likewise.
	(COMBINE_QUALIFIERS): Likewise.
	(VTBL_QUALIFIERS): Likewise.
	(LOAD1_QUALIFIERS): Likewise.
	(LOADSTRUCT_QUALIFIERS): Likewise.
	(LOAD1LANE_QUALIFIERS): Likewise.
	(LOADSTRUCTLANE_QUALIFIERS): Likewise.
	(STORE1_QUALIFIERS): Likewise.
	(STORESTRUCT_QUALIFIERS): Likewise.
	(STORE1LANE_QUALIFIERS): Likewise.
	(STORESTRUCTLANE_QUALIFIERS): Likewise.
	(neon_builtin_datum): Keep track of qualifiers.
	(VAR1): Likewise.

[-- Attachment #2: 0005-Patch-ARM-Refactor-Builtins-5-8-Start-keeping-track-.patch --]
[-- Type: text/x-patch;  name=0005-Patch-ARM-Refactor-Builtins-5-8-Start-keeping-track-.patch, Size: 6774 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index ef86a31..4ea6581 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -35,6 +35,148 @@
 #include "ggc.h"
 #include "arm-protos.h"
 
+#define SIMD_MAX_BUILTIN_ARGS 5
+
+enum arm_type_qualifiers
+{
+  /* T foo.  */
+  qualifier_none = 0x0,
+  /* unsigned T foo.  */
+  qualifier_unsigned = 0x1, /* 1 << 0  */
+  /* const T foo.  */
+  qualifier_const = 0x2, /* 1 << 1  */
+  /* T *foo.  */
+  qualifier_pointer = 0x4, /* 1 << 2  */
+  /* Used when expanding arguments if an operand could
+     be an immediate.  */
+  qualifier_immediate = 0x8, /* 1 << 3  */
+  qualifier_maybe_immediate = 0x10, /* 1 << 4  */
+  /* void foo (...).  */
+  qualifier_void = 0x20, /* 1 << 5  */
+  /* Some patterns may have internal operands, this qualifier is an
+     instruction to the initialisation code to skip this operand.  */
+  qualifier_internal = 0x40, /* 1 << 6  */
+  /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum
+     rather than using the type of the operand.  */
+  qualifier_map_mode = 0x80, /* 1 << 7  */
+  /* qualifier_pointer | qualifier_map_mode  */
+  qualifier_pointer_map_mode = 0x84,
+  /* qualifier_const_pointer | qualifier_map_mode  */
+  qualifier_const_pointer_map_mode = 0x86,
+  /* Polynomial types.  */
+  qualifier_poly = 0x100
+};
+
+/*  The qualifier_internal allows generation of a unary builtin from
+    a pattern with a third pseudo-operand such as a match_scratch.
+    T (T).  */
+static enum arm_type_qualifiers
+arm_unop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_internal };
+#define CONVERT_QUALIFIERS (arm_unop_qualifiers)
+#define COPYSIGNF_QUALIFIERS (arm_unop_qualifiers)
+#define CREATE_QUALIFIERS (arm_unop_qualifiers)
+#define DUP_QUALIFIERS (arm_unop_qualifiers)
+#define FLOAT_WIDEN_QUALIFIERS (arm_unop_qualifiers)
+#define FLOAT_NARROW_QUALIFIERS (arm_unop_qualifiers)
+#define REINTERP_QUALIFIERS (arm_unop_qualifiers)
+#define RINT_QUALIFIERS (arm_unop_qualifiers)
+#define SPLIT_QUALIFIERS (arm_unop_qualifiers)
+#define UNOP_QUALIFIERS (arm_unop_qualifiers)
+
+/* unsigned T (unsigned T).  */
+static enum arm_type_qualifiers
+arm_bswap_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned };
+#define BSWAP_QUALIFIERS (arm_bswap_qualifiers)
+
+/* T (T, T [maybe_immediate]).  */
+static enum arm_type_qualifiers
+arm_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_maybe_immediate };
+#define BINOP_QUALIFIERS (arm_binop_qualifiers)
+#define FIXCONV_QUALIFIERS (arm_binop_qualifiers)
+#define SCALARMUL_QUALIFIERS (arm_binop_qualifiers)
+#define SCALARMULL_QUALIFIERS (arm_binop_qualifiers)
+#define SCALARMULH_QUALIFIERS (arm_binop_qualifiers)
+
+/* T (T, T, T).  */
+static enum arm_type_qualifiers
+arm_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
+#define TERNOP_QUALIFIERS (arm_ternop_qualifiers)
+#define SELECT_QUALIFIERS (arm_ternop_qualifiers)
+#define VTBX_QUALIFIERS (arm_ternop_qualifiers)
+
+/* T (T, immediate).  */
+static enum arm_type_qualifiers
+arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_immediate };
+#define GETLANE_QUALIFIERS (arm_getlane_qualifiers)
+#define SHIFTIMM_QUALIFIERS (arm_getlane_qualifiers)
+
+/* T (T, T, T, immediate).  */
+static enum arm_type_qualifiers
+arm_lanemac_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+      qualifier_none, qualifier_immediate };
+#define LANEMAC_QUALIFIERS (arm_lanemac_qualifiers)
+#define SCALARMAC_QUALIFIERS (arm_lanemac_qualifiers)
+
+/* T (T, T, immediate).  */
+static enum arm_type_qualifiers
+arm_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate };
+#define LANEMUL_QUALIFIERS (arm_setlane_qualifiers)
+#define LANEMULH_QUALIFIERS (arm_setlane_qualifiers)
+#define LANEMULL_QUALIFIERS (arm_setlane_qualifiers)
+#define SETLANE_QUALIFIERS (arm_setlane_qualifiers)
+#define SHIFTACC_QUALIFIERS (arm_setlane_qualifiers)
+#define SHIFTINSERT_QUALIFIERS (arm_setlane_qualifiers)
+
+/* T (T, T).  */
+static enum arm_type_qualifiers
+arm_combine_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none };
+#define COMBINE_QUALIFIERS (arm_combine_qualifiers)
+#define VTBL_QUALIFIERS (arm_combine_qualifiers)
+
+/* T ([T element type] *).  */
+static enum arm_type_qualifiers
+arm_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_const_pointer_map_mode };
+#define LOAD1_QUALIFIERS (arm_load1_qualifiers)
+#define LOADSTRUCT_QUALIFIERS (arm_load1_qualifiers)
+
+/* T ([T element type] *, T, immediate).  */
+static enum arm_type_qualifiers
+arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_const_pointer_map_mode,
+      qualifier_none, qualifier_immediate };
+#define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers)
+#define LOADSTRUCTLANE_QUALIFIERS (arm_load1_lane_qualifiers)
+
+/* The first argument (return type) of a store should be void type,
+   which we represent with qualifier_void.  Their first operand will be
+   a DImode pointer to the location to store to, so we must use
+   qualifier_map_mode | qualifier_pointer to build a pointer to the
+   element type of the vector.
+
+   void ([T element type] *, T).  */
+static enum arm_type_qualifiers
+arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer_map_mode, qualifier_none };
+#define STORE1_QUALIFIERS (arm_store1_qualifiers)
+#define STORESTRUCT_QUALIFIERS (arm_store1_qualifiers)
+
+   /* void ([T element type] *, T, immediate).  */
+static enum arm_type_qualifiers
+arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer_map_mode,
+      qualifier_none, qualifier_immediate };
+#define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers)
+#define STORESTRUCTLANE_QUALIFIERS (arm_storestruct_lane_qualifiers)
+
 typedef enum {
   T_V8QI,
   T_V4HI,
@@ -129,12 +271,13 @@ typedef struct {
   const neon_builtin_type_mode mode;
   const enum insn_code code;
   unsigned int fcode;
+  enum arm_type_qualifiers *qualifiers;
 } neon_builtin_datum;
 
 #define CF(N,X) CODE_FOR_neon_##N##X
 
 #define VAR1(T, N, A) \
-  {#N, NEON_##T, UP (A), CF (N, A), 0},
+  {#N, NEON_##T, UP (A), CF (N, A), 0, T##_QUALIFIERS},
 #define VAR2(T, N, A, B) \
   VAR1 (T, N, A) \
   VAR1 (T, N, B)


* [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
                     ` (2 preceding siblings ...)
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h James Greenhalgh
@ 2014-11-12 17:12   ` James Greenhalgh
  2014-11-18  9:30     ` Ramana Radhakrishnan
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file James Greenhalgh
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, nickc

[-- Attachment #1: Type: text/plain, Size: 2111 bytes --]


Hi,

This patch wires up builtin initialisation similarly to the AArch64
back-end, making use of the "qualifiers" arrays to decide on types for
each builtin we hope to initialise.

We could take an old snapshot of the qualifiers code from AArch64, but
as our end goal is to pull in the type mangling changes, we may as well
do that now. In order to preserve the old mangling rules after this
patch, we must wire all of these types up.

Together, this becomes a fairly simple side-port of the logic for
Advanced SIMD builtins from the AArch64 target.
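
As a rough sketch (using the helpers added below; this is not verbatim
from the patch), building the function type for a BINOP entry keyed on
V4SFmode proceeds along these lines:

  /* Sketch: unsigned_p and poly_p are both false, so the lookup
     resolves V4SFmode with qualifier_none to the internal
     float32x4_t type for each slot.  */
  tree ret = arm_simd_builtin_type (V4SFmode, false, false);
  tree arg = arm_simd_builtin_type (V4SFmode, false, false);
  tree ftype = build_function_type_list (ret, arg, arg, NULL_TREE);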

Bootstrapped on arm-none-linux-gnueabihf with no issues.

OK?

Thanks,
James

---
gcc/

2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/arm/arm-builtins.c (arm_scalar_builtin_types): New.
	(enum arm_simd_type): Likewise.
	(struct arm_simd_type_info): Likewise.
	(arm_mangle_builtin_scalar_type): Likewise.
	(arm_mangle_builtin_vector_type): Likewise.
	(arm_mangle_builtin_type): Likewise.
	(arm_simd_builtin_std_type): Likewise.
	(arm_lookup_simd_builtin_type): Likewise.
	(arm_simd_builtin_type): Likewise.
	(arm_init_simd_builtin_types): Likewise.
	(arm_init_simd_builtin_scalar_types): Likewise.
	(arm_init_neon_builtins): Rewrite using qualifiers.
	* config/arm/arm-protos.h (arm_mangle_builtin_type): New.
	* config/arm/arm-simd-builtin-types.def: New file.
	* config/arm/t-arm (arm-builtins.o): Depend on it.
	* config/arm/arm.c (arm_mangle_type): Call arm_mangle_builtin_type.
	* config/arm/arm_neon.h (int8x8_t): Use new internal type.
	(int16x4_t): Likewise.
	(int32x2_t): Likewise.
	(float16x4_t): Likewise.
	(float32x2_t): Likewise.
	(poly8x8_t): Likewise.
	(poly16x4_t): Likewise.
	(uint8x8_t): Likewise.
	(uint16x4_t): Likewise.
	(uint32x2_t): Likewise.
	(int8x16_t): Likewise.
	(int16x8_t): Likewise.
	(int32x4_t): Likewise.
	(int64x2_t): Likewise.
	(float32x4_t): Likewise.
	(poly8x16_t): Likewise.
	(poly16x8_t): Likewise.
	(uint8x16_t): Likewise.
	(uint16x8_t): Likewise.
	(uint32x4_t): Likewise.
	(uint64x2_t): Likewise.

[-- Attachment #2: 0007-Patch-ARM-Refactor-Builtins-7-8-Use-qualifiers-array.patch --]
[-- Type: text/x-patch;  name=0007-Patch-ARM-Refactor-Builtins-7-8-Use-qualifiers-array.patch, Size: 48696 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 4ea6581..6f3183e 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -625,598 +625,552 @@ static GTY(()) tree arm_builtin_decls[ARM_BUILTIN_MAX];
 #define NUM_DREG_TYPES 5
 #define NUM_QREG_TYPES 6
 
+/* Internal scalar builtin types.  These types are used to support
+   neon intrinsic builtins.  They are _not_ user-visible types.  Therefore
+   the mangling for these types are implementation defined.  */
+const char *arm_scalar_builtin_types[] = {
+  "__builtin_neon_qi",
+  "__builtin_neon_hi",
+  "__builtin_neon_si",
+  "__builtin_neon_sf",
+  "__builtin_neon_di",
+  "__builtin_neon_df",
+  "__builtin_neon_ti",
+  "__builtin_neon_uqi",
+  "__builtin_neon_uhi",
+  "__builtin_neon_usi",
+  "__builtin_neon_udi",
+  "__builtin_neon_ei",
+  "__builtin_neon_oi",
+  "__builtin_neon_ci",
+  "__builtin_neon_xi",
+  NULL
+};
+
+#define ENTRY(E, M, Q, S, T, G) E,
+enum arm_simd_type
+{
+#include "arm-simd-builtin-types.def"
+  __TYPE_FINAL
+};
+#undef ENTRY
+
+struct arm_simd_type_info
+{
+  enum arm_simd_type type;
+
+  /* Internal type name.  */
+  const char *name;
+
+  /* Internal type name(mangled).  The mangled names conform to the
+     AAPCS (see "Procedure Call Standard for the ARM Architecture",
+     Appendix A).  To qualify for emission with the mangled names defined in
+     that document, a vector type must not only be of the correct mode but also
+     be of the correct internal Neon vector type (e.g. __simd64_int8_t);
+     these types are registered by arm_init_simd_builtin_types ().  In other
+     words, vector types defined in other ways e.g. via vector_size attribute
+     will get default mangled names.  */
+  const char *mangle;
+
+  /* Internal type.  */
+  tree itype;
+
+  /* Element type.  */
+  tree eltype;
+
+  /* Machine mode the internal type maps to.  */
+  machine_mode mode;
+
+  /* Qualifiers.  */
+  enum arm_type_qualifiers q;
+};
+
+#define ENTRY(E, M, Q, S, T, G)		\
+  {E,					\
+   "__simd" #S "_" #T "_t",		\
+   #G "__simd" #S "_" #T "_t",		\
+   NULL_TREE, NULL_TREE, M##mode, qualifier_##Q},
+static struct arm_simd_type_info arm_simd_types [] = {
+#include "arm-simd-builtin-types.def"
+};
+#undef ENTRY
+
+static tree arm_simd_floatHF_type_node = NULL_TREE;
+static tree arm_simd_intOI_type_node = NULL_TREE;
+static tree arm_simd_intEI_type_node = NULL_TREE;
+static tree arm_simd_intCI_type_node = NULL_TREE;
+static tree arm_simd_intXI_type_node = NULL_TREE;
+static tree arm_simd_polyQI_type_node = NULL_TREE;
+static tree arm_simd_polyHI_type_node = NULL_TREE;
+static tree arm_simd_polyDI_type_node = NULL_TREE;
+static tree arm_simd_polyTI_type_node = NULL_TREE;
+
+static const char *
+arm_mangle_builtin_scalar_type (const_tree type)
+{
+  int i = 0;
+
+  while (arm_scalar_builtin_types[i] != NULL)
+    {
+      const char *name = arm_scalar_builtin_types[i];
+
+      if (TREE_CODE (TYPE_NAME (type)) == TYPE_DECL
+	  && DECL_NAME (TYPE_NAME (type))
+	  && !strcmp (IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))), name))
+	return arm_scalar_builtin_types[i];
+      i++;
+    }
+  return NULL;
+}
+
+static const char *
+arm_mangle_builtin_vector_type (const_tree type)
+{
+  int i;
+  int nelts = sizeof (arm_simd_types) / sizeof (arm_simd_types[0]);
+
+  for (i = 0; i < nelts; i++)
+    if (arm_simd_types[i].mode ==  TYPE_MODE (type)
+	&& TYPE_NAME (type)
+	&& TREE_CODE (TYPE_NAME (type)) == TYPE_DECL
+	&& DECL_NAME (TYPE_NAME (type))
+	&& !strcmp
+	     (IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))),
+	      arm_simd_types[i].name))
+      return arm_simd_types[i].mangle;
+
+  return NULL;
+}
+
+const char *
+arm_mangle_builtin_type (const_tree type)
+{
+  const char *mangle;
+  /* Walk through all the AArch64 builtins types tables to filter out the
+     incoming type.  */
+  if ((mangle = arm_mangle_builtin_vector_type (type))
+      || (mangle = arm_mangle_builtin_scalar_type (type)))
+    return mangle;
+
+  return NULL;
+}
+
+static tree
+arm_simd_builtin_std_type (enum machine_mode mode,
+			   enum arm_type_qualifiers q)
+{
+#define QUAL_TYPE(M)  \
+  ((q == qualifier_none) ? int##M##_type_node : unsigned_int##M##_type_node);
+  switch (mode)
+    {
+    case QImode:
+      return QUAL_TYPE (QI);
+    case HImode:
+      return QUAL_TYPE (HI);
+    case SImode:
+      return QUAL_TYPE (SI);
+    case DImode:
+      return QUAL_TYPE (DI);
+    case TImode:
+      return QUAL_TYPE (TI);
+    case OImode:
+      return arm_simd_intOI_type_node;
+    case EImode:
+      return arm_simd_intEI_type_node;
+    case CImode:
+      return arm_simd_intCI_type_node;
+    case XImode:
+      return arm_simd_intXI_type_node;
+    case HFmode:
+      return arm_simd_floatHF_type_node;
+    case SFmode:
+      return float_type_node;
+    case DFmode:
+      return double_type_node;
+    default:
+      gcc_unreachable ();
+    }
+#undef QUAL_TYPE
+}
+
+static tree
+arm_lookup_simd_builtin_type (enum machine_mode mode,
+			      enum arm_type_qualifiers q)
+{
+  int i;
+  int nelts = sizeof (arm_simd_types) / sizeof (arm_simd_types[0]);
+
+  /* Non-poly scalar modes map to standard types not in the table.  */
+  if (q != qualifier_poly && !VECTOR_MODE_P (mode))
+    return arm_simd_builtin_std_type (mode, q);
+
+  for (i = 0; i < nelts; i++)
+    if (arm_simd_types[i].mode == mode
+	&& arm_simd_types[i].q == q)
+      return arm_simd_types[i].itype;
+
+  /* Note that we won't have caught the underlying type for poly64x2_t
+     in the above table.  This gets default mangling.  */
+
+  return NULL_TREE;
+}
+
+static tree
+arm_simd_builtin_type (enum machine_mode mode,
+			   bool unsigned_p, bool poly_p)
+{
+  if (poly_p)
+    return arm_lookup_simd_builtin_type (mode, qualifier_poly);
+  else if (unsigned_p)
+    return arm_lookup_simd_builtin_type (mode, qualifier_unsigned);
+  else
+    return arm_lookup_simd_builtin_type (mode, qualifier_none);
+}
+
 static void
-arm_init_neon_builtins (void)
+arm_init_simd_builtin_types (void)
 {
-  unsigned int i, fcode;
-  tree decl;
-
-  tree neon_intQI_type_node;
-  tree neon_intHI_type_node;
-  tree neon_floatHF_type_node;
-  tree neon_polyQI_type_node;
-  tree neon_polyHI_type_node;
-  tree neon_intSI_type_node;
-  tree neon_intDI_type_node;
-  tree neon_intUTI_type_node;
-  tree neon_float_type_node;
-
-  tree intQI_pointer_node;
-  tree intHI_pointer_node;
-  tree intSI_pointer_node;
-  tree intDI_pointer_node;
-  tree float_pointer_node;
-
-  tree const_intQI_node;
-  tree const_intHI_node;
-  tree const_intSI_node;
-  tree const_intDI_node;
-  tree const_float_node;
-
-  tree const_intQI_pointer_node;
-  tree const_intHI_pointer_node;
-  tree const_intSI_pointer_node;
-  tree const_intDI_pointer_node;
-  tree const_float_pointer_node;
-
-  tree V8QI_type_node;
-  tree V4HI_type_node;
-  tree V4UHI_type_node;
-  tree V4HF_type_node;
-  tree V2SI_type_node;
-  tree V2USI_type_node;
-  tree V2SF_type_node;
-  tree V16QI_type_node;
-  tree V8HI_type_node;
-  tree V8UHI_type_node;
-  tree V4SI_type_node;
-  tree V4USI_type_node;
-  tree V4SF_type_node;
-  tree V2DI_type_node;
-  tree V2UDI_type_node;
-
-  tree intUQI_type_node;
-  tree intUHI_type_node;
-  tree intUSI_type_node;
-  tree intUDI_type_node;
-
-  tree intEI_type_node;
-  tree intOI_type_node;
-  tree intCI_type_node;
-  tree intXI_type_node;
-
-  tree reinterp_ftype_dreg[NUM_DREG_TYPES][NUM_DREG_TYPES];
-  tree reinterp_ftype_qreg[NUM_QREG_TYPES][NUM_QREG_TYPES];
-  tree dreg_types[NUM_DREG_TYPES], qreg_types[NUM_QREG_TYPES];
-
-  /* Create distinguished type nodes for NEON vector element types,
-     and pointers to values of such types, so we can detect them later.  */
-  neon_intQI_type_node = make_signed_type (GET_MODE_PRECISION (QImode));
-  neon_intHI_type_node = make_signed_type (GET_MODE_PRECISION (HImode));
-  neon_polyQI_type_node = make_signed_type (GET_MODE_PRECISION (QImode));
-  neon_polyHI_type_node = make_signed_type (GET_MODE_PRECISION (HImode));
-  neon_intSI_type_node = make_signed_type (GET_MODE_PRECISION (SImode));
-  neon_intDI_type_node = make_signed_type (GET_MODE_PRECISION (DImode));
-  neon_float_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (neon_float_type_node) = FLOAT_TYPE_SIZE;
-  layout_type (neon_float_type_node);
-  neon_floatHF_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (neon_floatHF_type_node) = GET_MODE_PRECISION (HFmode);
-  layout_type (neon_floatHF_type_node);
-
-  /* Define typedefs which exactly correspond to the modes we are basing vector
-     types on.  If you change these names you'll need to change
-     the table used by arm_mangle_type too.  */
-  (*lang_hooks.types.register_builtin_type) (neon_intQI_type_node,
+  int i;
+  int nelts = sizeof (arm_simd_types) / sizeof (arm_simd_types[0]);
+  tree tdecl;
+
+  /* Initialize the HFmode scalar type.  */
+  arm_simd_floatHF_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (arm_simd_floatHF_type_node) = GET_MODE_PRECISION (HFmode);
+  layout_type (arm_simd_floatHF_type_node);
+  (*lang_hooks.types.register_builtin_type) (arm_simd_floatHF_type_node,
+					     "__builtin_neon_hf");
+
+  /* Poly types are a world of their own.  In order to maintain legacy
+     ABI, they get initialized using the old interface, and don't get
+     an entry in our mangling table, consequently, they get default
+     mangling.  As a further gotcha, poly8_t and poly16_t are signed
+     types, poly64_t and poly128_t are unsigned types.  */
+  arm_simd_polyQI_type_node
+    = build_distinct_type_copy (intQI_type_node);
+  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
+					     "__builtin_neon_poly8");
+  arm_simd_polyHI_type_node
+    = build_distinct_type_copy (intHI_type_node);
+  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
+					     "__builtin_neon_poly16");
+  arm_simd_polyDI_type_node
+    = build_distinct_type_copy (unsigned_intDI_type_node);
+  (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
+					     "__builtin_neon_poly64");
+  arm_simd_polyTI_type_node
+    = build_distinct_type_copy (unsigned_intTI_type_node);
+  (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,
+					     "__builtin_neon_poly128");
+
+  /* Init all the element types built by the front-end.  */
+  arm_simd_types[Int8x8_t].eltype = intQI_type_node;
+  arm_simd_types[Int8x16_t].eltype = intQI_type_node;
+  arm_simd_types[Int16x4_t].eltype = intHI_type_node;
+  arm_simd_types[Int16x8_t].eltype = intHI_type_node;
+  arm_simd_types[Int32x2_t].eltype = intSI_type_node;
+  arm_simd_types[Int32x4_t].eltype = intSI_type_node;
+  arm_simd_types[Int64x2_t].eltype = intDI_type_node;
+  arm_simd_types[Uint8x8_t].eltype = unsigned_intQI_type_node;
+  arm_simd_types[Uint8x16_t].eltype = unsigned_intQI_type_node;
+  arm_simd_types[Uint16x4_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Uint16x8_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Uint32x2_t].eltype = unsigned_intSI_type_node;
+  arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;
+  arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;
+
+  /* Init poly vector element types with scalar poly types.  */
+  arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
+  arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
+  arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
+  arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
+  /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
+     mangling.  */
+
+  /* Continue with standard types.  */
+  arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
+  arm_simd_types[Float32x2_t].eltype = float_type_node;
+  arm_simd_types[Float32x4_t].eltype = float_type_node;
+
+  for (i = 0; i < nelts; i++)
+    {
+      tree eltype = arm_simd_types[i].eltype;
+      enum machine_mode mode = arm_simd_types[i].mode;
+
+      if (arm_simd_types[i].itype == NULL)
+	arm_simd_types[i].itype =
+	  build_distinct_type_copy
+	    (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
+
+      tdecl = add_builtin_type (arm_simd_types[i].name,
+				arm_simd_types[i].itype);
+      TYPE_NAME (arm_simd_types[i].itype) = tdecl;
+      SET_TYPE_STRUCTURAL_EQUALITY (arm_simd_types[i].itype);
+    }
+
+#define AARCH_BUILD_SIGNED_TYPE(mode)  \
+  make_signed_type (GET_MODE_PRECISION (mode));
+  arm_simd_intOI_type_node = AARCH_BUILD_SIGNED_TYPE (OImode);
+  arm_simd_intEI_type_node = AARCH_BUILD_SIGNED_TYPE (EImode);
+  arm_simd_intCI_type_node = AARCH_BUILD_SIGNED_TYPE (CImode);
+  arm_simd_intXI_type_node = AARCH_BUILD_SIGNED_TYPE (XImode);
+#undef AARCH_BUILD_SIGNED_TYPE
+
+  tdecl = add_builtin_type
+	    ("__builtin_neon_ei" , arm_simd_intEI_type_node);
+  TYPE_NAME (arm_simd_intEI_type_node) = tdecl;
+  tdecl = add_builtin_type
+	    ("__builtin_neon_oi" , arm_simd_intOI_type_node);
+  TYPE_NAME (arm_simd_intOI_type_node) = tdecl;
+  tdecl = add_builtin_type
+	    ("__builtin_neon_ci" , arm_simd_intCI_type_node);
+  TYPE_NAME (arm_simd_intCI_type_node) = tdecl;
+  tdecl = add_builtin_type
+	    ("__builtin_neon_xi" , arm_simd_intXI_type_node);
+  TYPE_NAME (arm_simd_intXI_type_node) = tdecl;
+}
+
+static void
+arm_init_simd_builtin_scalar_types (void)
+{
+  /* Define typedefs for all the standard scalar types.  */
+  (*lang_hooks.types.register_builtin_type) (intQI_type_node,
 					     "__builtin_neon_qi");
-  (*lang_hooks.types.register_builtin_type) (neon_intHI_type_node,
+  (*lang_hooks.types.register_builtin_type) (intHI_type_node,
 					     "__builtin_neon_hi");
-  (*lang_hooks.types.register_builtin_type) (neon_floatHF_type_node,
-					     "__builtin_neon_hf");
-  (*lang_hooks.types.register_builtin_type) (neon_intSI_type_node,
+  (*lang_hooks.types.register_builtin_type) (intSI_type_node,
 					     "__builtin_neon_si");
-  (*lang_hooks.types.register_builtin_type) (neon_float_type_node,
+  (*lang_hooks.types.register_builtin_type) (float_type_node,
 					     "__builtin_neon_sf");
-  (*lang_hooks.types.register_builtin_type) (neon_intDI_type_node,
+  (*lang_hooks.types.register_builtin_type) (intDI_type_node,
 					     "__builtin_neon_di");
-  (*lang_hooks.types.register_builtin_type) (neon_polyQI_type_node,
-					     "__builtin_neon_poly8");
-  (*lang_hooks.types.register_builtin_type) (neon_polyHI_type_node,
-					     "__builtin_neon_poly16");
-
-  intQI_pointer_node = build_pointer_type (neon_intQI_type_node);
-  intHI_pointer_node = build_pointer_type (neon_intHI_type_node);
-  intSI_pointer_node = build_pointer_type (neon_intSI_type_node);
-  intDI_pointer_node = build_pointer_type (neon_intDI_type_node);
-  float_pointer_node = build_pointer_type (neon_float_type_node);
-
-  /* Next create constant-qualified versions of the above types.  */
-  const_intQI_node = build_qualified_type (neon_intQI_type_node,
-					   TYPE_QUAL_CONST);
-  const_intHI_node = build_qualified_type (neon_intHI_type_node,
-					   TYPE_QUAL_CONST);
-  const_intSI_node = build_qualified_type (neon_intSI_type_node,
-					   TYPE_QUAL_CONST);
-  const_intDI_node = build_qualified_type (neon_intDI_type_node,
-					   TYPE_QUAL_CONST);
-  const_float_node = build_qualified_type (neon_float_type_node,
-					   TYPE_QUAL_CONST);
-
-  const_intQI_pointer_node = build_pointer_type (const_intQI_node);
-  const_intHI_pointer_node = build_pointer_type (const_intHI_node);
-  const_intSI_pointer_node = build_pointer_type (const_intSI_node);
-  const_intDI_pointer_node = build_pointer_type (const_intDI_node);
-  const_float_pointer_node = build_pointer_type (const_float_node);
+  (*lang_hooks.types.register_builtin_type) (double_type_node,
+					     "__builtin_neon_df");
+  (*lang_hooks.types.register_builtin_type) (intTI_type_node,
+					     "__builtin_neon_ti");
 
   /* Unsigned integer types for various mode sizes.  */
-  intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode));
-  intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode));
-  intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode));
-  intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode));
-  neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode));
-  /* Now create vector types based on our NEON element types.  */
-  /* 64-bit vectors.  */
-  V8QI_type_node =
-    build_vector_type_for_mode (neon_intQI_type_node, V8QImode);
-  V4HI_type_node =
-    build_vector_type_for_mode (neon_intHI_type_node, V4HImode);
-  V4UHI_type_node =
-    build_vector_type_for_mode (intUHI_type_node, V4HImode);
-  V4HF_type_node =
-    build_vector_type_for_mode (neon_floatHF_type_node, V4HFmode);
-  V2SI_type_node =
-    build_vector_type_for_mode (neon_intSI_type_node, V2SImode);
-  V2USI_type_node =
-    build_vector_type_for_mode (intUSI_type_node, V2SImode);
-  V2SF_type_node =
-    build_vector_type_for_mode (neon_float_type_node, V2SFmode);
-  /* 128-bit vectors.  */
-  V16QI_type_node =
-    build_vector_type_for_mode (neon_intQI_type_node, V16QImode);
-  V8HI_type_node =
-    build_vector_type_for_mode (neon_intHI_type_node, V8HImode);
-  V8UHI_type_node =
-    build_vector_type_for_mode (intUHI_type_node, V8HImode);
-  V4SI_type_node =
-    build_vector_type_for_mode (neon_intSI_type_node, V4SImode);
-  V4USI_type_node =
-    build_vector_type_for_mode (intUSI_type_node, V4SImode);
-  V4SF_type_node =
-    build_vector_type_for_mode (neon_float_type_node, V4SFmode);
-  V2DI_type_node =
-    build_vector_type_for_mode (neon_intDI_type_node, V2DImode);
-  V2UDI_type_node =
-    build_vector_type_for_mode (intUDI_type_node, V2DImode);
-
-
-  (*lang_hooks.types.register_builtin_type) (intUQI_type_node,
+  (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
 					     "__builtin_neon_uqi");
-  (*lang_hooks.types.register_builtin_type) (intUHI_type_node,
+  (*lang_hooks.types.register_builtin_type) (unsigned_intHI_type_node,
 					     "__builtin_neon_uhi");
-  (*lang_hooks.types.register_builtin_type) (intUSI_type_node,
+  (*lang_hooks.types.register_builtin_type) (unsigned_intSI_type_node,
 					     "__builtin_neon_usi");
-  (*lang_hooks.types.register_builtin_type) (intUDI_type_node,
+  (*lang_hooks.types.register_builtin_type) (unsigned_intDI_type_node,
 					     "__builtin_neon_udi");
-  (*lang_hooks.types.register_builtin_type) (intUDI_type_node,
-					     "__builtin_neon_poly64");
-  (*lang_hooks.types.register_builtin_type) (neon_intUTI_type_node,
-					     "__builtin_neon_poly128");
-
-  /* Opaque integer types for structures of vectors.  */
-  intEI_type_node = make_signed_type (GET_MODE_PRECISION (EImode));
-  intOI_type_node = make_signed_type (GET_MODE_PRECISION (OImode));
-  intCI_type_node = make_signed_type (GET_MODE_PRECISION (CImode));
-  intXI_type_node = make_signed_type (GET_MODE_PRECISION (XImode));
+  (*lang_hooks.types.register_builtin_type) (unsigned_intTI_type_node,
+					     "__builtin_neon_uti");
+}
 
-  (*lang_hooks.types.register_builtin_type) (intTI_type_node,
-					     "__builtin_neon_ti");
-  (*lang_hooks.types.register_builtin_type) (intEI_type_node,
-					     "__builtin_neon_ei");
-  (*lang_hooks.types.register_builtin_type) (intOI_type_node,
-					     "__builtin_neon_oi");
-  (*lang_hooks.types.register_builtin_type) (intCI_type_node,
-					     "__builtin_neon_ci");
-  (*lang_hooks.types.register_builtin_type) (intXI_type_node,
-					     "__builtin_neon_xi");
+static void
+arm_init_neon_builtins (void)
+{
+  unsigned int i, fcode = ARM_BUILTIN_NEON_BASE;
 
-  if (TARGET_CRYPTO && TARGET_HARD_FLOAT)
-  {
-
-    tree V16UQI_type_node =
-      build_vector_type_for_mode (intUQI_type_node, V16QImode);
-
-    tree v16uqi_ftype_v16uqi
-      = build_function_type_list (V16UQI_type_node, V16UQI_type_node, NULL_TREE);
-
-    tree v16uqi_ftype_v16uqi_v16uqi
-      = build_function_type_list (V16UQI_type_node, V16UQI_type_node,
-                                  V16UQI_type_node, NULL_TREE);
-
-    tree v4usi_ftype_v4usi
-      = build_function_type_list (V4USI_type_node, V4USI_type_node, NULL_TREE);
-
-    tree v4usi_ftype_v4usi_v4usi
-      = build_function_type_list (V4USI_type_node, V4USI_type_node,
-                                  V4USI_type_node, NULL_TREE);
-
-    tree v4usi_ftype_v4usi_v4usi_v4usi
-      = build_function_type_list (V4USI_type_node, V4USI_type_node,
-                                  V4USI_type_node, V4USI_type_node, NULL_TREE);
-
-    tree uti_ftype_udi_udi
-      = build_function_type_list (neon_intUTI_type_node, intUDI_type_node,
-                                  intUDI_type_node, NULL_TREE);
-
-    #undef CRYPTO1
-    #undef CRYPTO2
-    #undef CRYPTO3
-    #undef C
-    #undef N
-    #undef CF
-    #undef FT1
-    #undef FT2
-    #undef FT3
-
-    #define C(U) \
-      ARM_BUILTIN_CRYPTO_##U
-    #define N(L) \
-      "__builtin_arm_crypto_"#L
-    #define FT1(R, A) \
-      R##_ftype_##A
-    #define FT2(R, A1, A2) \
-      R##_ftype_##A1##_##A2
-    #define FT3(R, A1, A2, A3) \
-      R##_ftype_##A1##_##A2##_##A3
-    #define CRYPTO1(L, U, R, A) \
-      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT1 (R, A), \
-                                                       C (U), BUILT_IN_MD, \
-                                                       NULL, NULL_TREE);
-    #define CRYPTO2(L, U, R, A1, A2) \
-      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT2 (R, A1, A2), \
-                                                       C (U), BUILT_IN_MD, \
-                                                       NULL, NULL_TREE);
-
-    #define CRYPTO3(L, U, R, A1, A2, A3) \
-      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT3 (R, A1, A2, A3), \
-                                                       C (U), BUILT_IN_MD, \
-                                                       NULL, NULL_TREE);
-    #include "crypto.def"
-
-    #undef CRYPTO1
-    #undef CRYPTO2
-    #undef CRYPTO3
-    #undef C
-    #undef N
-    #undef FT1
-    #undef FT2
-    #undef FT3
-  }
-  dreg_types[0] = V8QI_type_node;
-  dreg_types[1] = V4HI_type_node;
-  dreg_types[2] = V2SI_type_node;
-  dreg_types[3] = V2SF_type_node;
-  dreg_types[4] = neon_intDI_type_node;
-
-  qreg_types[0] = V16QI_type_node;
-  qreg_types[1] = V8HI_type_node;
-  qreg_types[2] = V4SI_type_node;
-  qreg_types[3] = V4SF_type_node;
-  qreg_types[4] = V2DI_type_node;
-  qreg_types[5] = neon_intUTI_type_node;
-
-  for (i = 0; i < NUM_QREG_TYPES; i++)
-    {
-      int j;
-      for (j = 0; j < NUM_QREG_TYPES; j++)
-        {
-          if (i < NUM_DREG_TYPES && j < NUM_DREG_TYPES)
-            reinterp_ftype_dreg[i][j]
-              = build_function_type_list (dreg_types[i], dreg_types[j], NULL);
+  arm_init_simd_builtin_types ();
 
-          reinterp_ftype_qreg[i][j]
-            = build_function_type_list (qreg_types[i], qreg_types[j], NULL);
-        }
-    }
+  /* Strong-typing hasn't been implemented for all AdvSIMD builtin intrinsics.
+     Therefore we need to preserve the old __builtin scalar types.  They can be
+     removed once all the intrinsics become strongly typed using the qualifier
+     system.  */
+  arm_init_simd_builtin_scalar_types ();
 
-  for (i = 0, fcode = ARM_BUILTIN_NEON_BASE;
-       i < ARRAY_SIZE (neon_builtin_data);
-       i++, fcode++)
+  for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
     {
+      bool print_type_signature_p = false;
+      char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
       neon_builtin_datum *d = &neon_builtin_data[i];
+      const char *const modenames[] =
+	{
+	  "v8qi", "v4hi", "v4hf", "v2si", "v2sf", "di",
+	  "v16qi", "v8hi", "v4si", "v4sf", "v2di",
+	  "ti", "ei", "oi"
+	};
+      const enum machine_mode modes[] =
+	{
+	  V8QImode, V4HImode, V4HFmode, V2SImode, V2SFmode, DImode,
+	  V16QImode, V8HImode, V4SImode, V4SFmode, V2DImode,
+	  TImode, EImode, OImode
+	};
 
-      const char* const modenames[] = {
-	"v8qi", "v4hi", "v4hf", "v2si", "v2sf", "di",
-	"v16qi", "v8hi", "v4si", "v4sf", "v2di",
-	"ti", "ei", "oi"
-      };
       char namebuf[60];
       tree ftype = NULL;
-      int is_load = 0, is_store = 0;
+      tree fndecl = NULL;
 
       gcc_assert (ARRAY_SIZE (modenames) == T_MAX);
 
       d->fcode = fcode;
 
-      switch (d->itype)
+      /* We must track two variables here.  op_num is
+	 the operand number as in the RTL pattern.  This is
+	 required to access the mode (e.g. V4SF mode) of the
+	 argument, from which the base type can be derived.
+	 arg_num is an index into the qualifiers data, which
+	 gives qualifiers to the type (e.g. const unsigned).
+	 The reason these two variables may differ by one is the
+	 void return type.  While all return types take the 0th entry
+	 in the qualifiers array, there is no operand for them in the
+	 RTL pattern.  */
+      int op_num = insn_data[d->code].n_operands - 1;
+      int arg_num = d->qualifiers[0] & qualifier_void
+		      ? op_num + 1
+		      : op_num;
+      tree return_type = void_type_node, args = void_list_node;
+      tree eltype;
+
+      /* Build a function type directly from the insn_data for this
+	 builtin.  The build_function_type () function takes care of
+	 removing duplicates for us.  */
+      for (; op_num >= 0; arg_num--, op_num--)
 	{
-	case NEON_LOAD1:
-	case NEON_LOAD1LANE:
-	case NEON_LOADSTRUCT:
-	case NEON_LOADSTRUCTLANE:
-	  is_load = 1;
-	  /* Fall through.  */
-	case NEON_STORE1:
-	case NEON_STORE1LANE:
-	case NEON_STORESTRUCT:
-	case NEON_STORESTRUCTLANE:
-	  if (!is_load)
-	    is_store = 1;
-	  /* Fall through.  */
-	case NEON_UNOP:
-	case NEON_RINT:
-	case NEON_BINOP:
-	case NEON_LOGICBINOP:
-	case NEON_SHIFTINSERT:
-	case NEON_TERNOP:
-	case NEON_GETLANE:
-	case NEON_SETLANE:
-	case NEON_CREATE:
-	case NEON_DUP:
-	case NEON_DUPLANE:
-	case NEON_SHIFTIMM:
-	case NEON_SHIFTACC:
-	case NEON_COMBINE:
-	case NEON_SPLIT:
-	case NEON_CONVERT:
-	case NEON_FIXCONV:
-	case NEON_LANEMUL:
-	case NEON_LANEMULL:
-	case NEON_LANEMULH:
-	case NEON_LANEMAC:
-	case NEON_SCALARMUL:
-	case NEON_SCALARMULL:
-	case NEON_SCALARMULH:
-	case NEON_SCALARMAC:
-	case NEON_SELECT:
-	case NEON_VTBL:
-	case NEON_VTBX:
-	  {
-	    int k;
-	    tree return_type = void_type_node, args = void_list_node;
-
-	    /* Build a function type directly from the insn_data for
-	       this builtin.  The build_function_type() function takes
-	       care of removing duplicates for us.  */
-	    for (k = insn_data[d->code].n_generator_args - 1; k >= 0; k--)
-	      {
-		tree eltype;
-
-		if (is_load && k == 1)
-		  {
-		    /* Neon load patterns always have the memory
-		       operand in the operand 1 position.  */
-		    gcc_assert (insn_data[d->code].operand[k].predicate
-				== neon_struct_operand);
-
-		    switch (d->mode)
-		      {
-		      case T_V8QI:
-		      case T_V16QI:
-			eltype = const_intQI_pointer_node;
-			break;
-
-		      case T_V4HI:
-		      case T_V8HI:
-			eltype = const_intHI_pointer_node;
-			break;
-
-		      case T_V2SI:
-		      case T_V4SI:
-			eltype = const_intSI_pointer_node;
-			break;
-
-		      case T_V2SF:
-		      case T_V4SF:
-			eltype = const_float_pointer_node;
-			break;
-
-		      case T_DI:
-		      case T_V2DI:
-			eltype = const_intDI_pointer_node;
-			break;
-
-		      default: gcc_unreachable ();
-		      }
-		  }
-		else if (is_store && k == 0)
-		  {
-		    /* Similarly, Neon store patterns use operand 0 as
-		       the memory location to store to.  */
-		    gcc_assert (insn_data[d->code].operand[k].predicate
-				== neon_struct_operand);
-
-		    switch (d->mode)
-		      {
-		      case T_V8QI:
-		      case T_V16QI:
-			eltype = intQI_pointer_node;
-			break;
-
-		      case T_V4HI:
-		      case T_V8HI:
-			eltype = intHI_pointer_node;
-			break;
-
-		      case T_V2SI:
-		      case T_V4SI:
-			eltype = intSI_pointer_node;
-			break;
-
-		      case T_V2SF:
-		      case T_V4SF:
-			eltype = float_pointer_node;
-			break;
-
-		      case T_DI:
-		      case T_V2DI:
-			eltype = intDI_pointer_node;
-			break;
-
-		      default: gcc_unreachable ();
-		      }
-		  }
-		else
-		  {
-		    switch (insn_data[d->code].operand[k].mode)
-		      {
-		      case VOIDmode: eltype = void_type_node; break;
-			/* Scalars.  */
-		      case QImode: eltype = neon_intQI_type_node; break;
-		      case HImode: eltype = neon_intHI_type_node; break;
-		      case SImode: eltype = neon_intSI_type_node; break;
-		      case SFmode: eltype = neon_float_type_node; break;
-		      case DImode: eltype = neon_intDI_type_node; break;
-		      case TImode: eltype = intTI_type_node; break;
-		      case EImode: eltype = intEI_type_node; break;
-		      case OImode: eltype = intOI_type_node; break;
-		      case CImode: eltype = intCI_type_node; break;
-		      case XImode: eltype = intXI_type_node; break;
-			/* 64-bit vectors.  */
-		      case V8QImode: eltype = V8QI_type_node; break;
-		      case V4HImode: eltype = V4HI_type_node; break;
-		      case V2SImode: eltype = V2SI_type_node; break;
-		      case V2SFmode: eltype = V2SF_type_node; break;
-			/* 128-bit vectors.  */
-		      case V16QImode: eltype = V16QI_type_node; break;
-		      case V8HImode: eltype = V8HI_type_node; break;
-		      case V4SImode: eltype = V4SI_type_node; break;
-		      case V4SFmode: eltype = V4SF_type_node; break;
-		      case V2DImode: eltype = V2DI_type_node; break;
-		      default: gcc_unreachable ();
-		      }
-		  }
-
-		if (k == 0 && !is_store)
-		  return_type = eltype;
-		else
-		  args = tree_cons (NULL_TREE, eltype, args);
-	      }
-
-	    ftype = build_function_type (return_type, args);
-	  }
-	  break;
-
-	case NEON_REINTERP:
-	  {
-	    /* We iterate over NUM_DREG_TYPES doubleword types,
-	       then NUM_QREG_TYPES quadword  types.
-	       V4HF is not a type used in reinterpret, so we translate
-	       d->mode to the correct index in reinterp_ftype_dreg.  */
-	    bool qreg_p
-	      = GET_MODE_SIZE (insn_data[d->code].operand[0].mode) > 8;
-	    int rhs = (d->mode - ((!qreg_p && (d->mode > T_V4HF)) ? 1 : 0))
-	              % NUM_QREG_TYPES;
-	    switch (insn_data[d->code].operand[0].mode)
-	      {
-	      case V8QImode: ftype = reinterp_ftype_dreg[0][rhs]; break;
-	      case V4HImode: ftype = reinterp_ftype_dreg[1][rhs]; break;
-	      case V2SImode: ftype = reinterp_ftype_dreg[2][rhs]; break;
-	      case V2SFmode: ftype = reinterp_ftype_dreg[3][rhs]; break;
-	      case DImode: ftype = reinterp_ftype_dreg[4][rhs]; break;
-	      case V16QImode: ftype = reinterp_ftype_qreg[0][rhs]; break;
-	      case V8HImode: ftype = reinterp_ftype_qreg[1][rhs]; break;
-	      case V4SImode: ftype = reinterp_ftype_qreg[2][rhs]; break;
-	      case V4SFmode: ftype = reinterp_ftype_qreg[3][rhs]; break;
-	      case V2DImode: ftype = reinterp_ftype_qreg[4][rhs]; break;
-	      case TImode: ftype = reinterp_ftype_qreg[5][rhs]; break;
-	      default: gcc_unreachable ();
-	      }
-	  }
-	  break;
-	case NEON_FLOAT_WIDEN:
-	  {
-	    tree eltype = NULL_TREE;
-	    tree return_type = NULL_TREE;
+	  machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
+	  enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
 
-	    switch (insn_data[d->code].operand[1].mode)
+	  if (qualifiers & qualifier_unsigned)
 	    {
-	      case V4HFmode:
-	        eltype = V4HF_type_node;
-	        return_type = V4SF_type_node;
-	        break;
-	      default: gcc_unreachable ();
+	      type_signature[arg_num] = 'u';
+	      print_type_signature_p = true;
 	    }
-	    ftype = build_function_type_list (return_type, eltype, NULL);
-	    break;
-	  }
-	case NEON_FLOAT_NARROW:
-	  {
-	    tree eltype = NULL_TREE;
-	    tree return_type = NULL_TREE;
-
-	    switch (insn_data[d->code].operand[1].mode)
+	  else if (qualifiers & qualifier_poly)
 	    {
-	      case V4SFmode:
-	        eltype = V4SF_type_node;
-	        return_type = V4HF_type_node;
-	        break;
-	      default: gcc_unreachable ();
+	      type_signature[arg_num] = 'p';
+	      print_type_signature_p = true;
 	    }
-	    ftype = build_function_type_list (return_type, eltype, NULL);
-	    break;
-	  }
-	case NEON_BSWAP:
-	{
-	    tree eltype = NULL_TREE;
-	    switch (insn_data[d->code].operand[1].mode)
-	    {
-	      case V4HImode:
-	        eltype = V4UHI_type_node;
-	        break;
-	      case V8HImode:
-	        eltype = V8UHI_type_node;
-	        break;
-	      case V2SImode:
-	        eltype = V2USI_type_node;
-	        break;
-	      case V4SImode:
-	        eltype = V4USI_type_node;
-	        break;
-	      case V2DImode:
-	        eltype = V2UDI_type_node;
-	        break;
-	      default: gcc_unreachable ();
-	    }
-	    ftype = build_function_type_list (eltype, eltype, NULL);
-	    break;
-	}
-	case NEON_COPYSIGNF:
-	  {
-	    tree eltype = NULL_TREE;
-	    switch (insn_data[d->code].operand[1].mode)
-	      {
-	      case V2SFmode:
-		eltype = V2SF_type_node;
-		break;
-	      case V4SFmode:
-		eltype = V4SF_type_node;
-		break;
-	      default: gcc_unreachable ();
-	      }
-	    ftype = build_function_type_list (eltype, eltype, NULL);
-	    break;
-	  }
-	default:
-	  gcc_unreachable ();
+	  else
+	    type_signature[arg_num] = 's';
+
+	  /* Skip an internal operand for vget_{low, high}.  */
+	  if (qualifiers & qualifier_internal)
+	    continue;
+
+	  /* Some builtins have different user-facing types
+	     for certain arguments, encoded in d->mode.  */
+	  if (qualifiers & qualifier_map_mode)
+	      op_mode = modes[d->mode];
+
+	  /* For pointers, we want a pointer to the basic type
+	     of the vector.  */
+	  if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
+	    op_mode = GET_MODE_INNER (op_mode);
+
+	  eltype = arm_simd_builtin_type
+		     (op_mode,
+		      (qualifiers & qualifier_unsigned) != 0,
+		      (qualifiers & qualifier_poly) != 0);
+	  gcc_assert (eltype != NULL);
+
+	  /* Add qualifiers.  */
+	  if (qualifiers & qualifier_const)
+	    eltype = build_qualified_type (eltype, TYPE_QUAL_CONST);
+
+	  if (qualifiers & qualifier_pointer)
+	      eltype = build_pointer_type (eltype);
+
+	  /* If we have reached arg_num == 0, we are at a non-void
+	     return type.  Otherwise, we are still processing
+	     arguments.  */
+	  if (arg_num == 0)
+	    return_type = eltype;
+	  else
+	    args = tree_cons (NULL_TREE, eltype, args);
 	}
 
+      ftype = build_function_type (return_type, args);
+
       gcc_assert (ftype != NULL);
 
-      sprintf (namebuf, "__builtin_neon_%s%s", d->name, modenames[d->mode]);
+      if (print_type_signature_p)
+	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s%s_%s",
+		  d->name, modenames[d->mode], type_signature);
+      else
+	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s%s",
+		  d->name, modenames[d->mode]);
+
+      fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
+				     NULL, NULL_TREE);
+      arm_builtin_decls[fcode] = fndecl;
+    }
 
-      decl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD, NULL,
-				   NULL_TREE);
-      arm_builtin_decls[fcode] = decl;
+  if (TARGET_CRYPTO && TARGET_HARD_FLOAT)
+    {
+      tree V16UQI_type_node = arm_simd_builtin_type (V16QImode,
+						       true,
+						       false);
+
+      tree V4USI_type_node = arm_simd_builtin_type (V4SImode,
+						      true,
+						      false);
+
+      tree v16uqi_ftype_v16uqi
+	= build_function_type_list (V16UQI_type_node, V16UQI_type_node,
+				    NULL_TREE);
+
+      tree v16uqi_ftype_v16uqi_v16uqi
+	= build_function_type_list (V16UQI_type_node, V16UQI_type_node,
+				    V16UQI_type_node, NULL_TREE);
+
+      tree v4usi_ftype_v4usi
+	= build_function_type_list (V4USI_type_node, V4USI_type_node,
+				    NULL_TREE);
+
+      tree v4usi_ftype_v4usi_v4usi
+	= build_function_type_list (V4USI_type_node, V4USI_type_node,
+				    V4USI_type_node, NULL_TREE);
+
+      tree v4usi_ftype_v4usi_v4usi_v4usi
+	= build_function_type_list (V4USI_type_node, V4USI_type_node,
+				    V4USI_type_node, V4USI_type_node,
+				    NULL_TREE);
+
+      tree uti_ftype_udi_udi
+	= build_function_type_list (unsigned_intTI_type_node,
+				    unsigned_intDI_type_node,
+				    unsigned_intDI_type_node,
+				    NULL_TREE);
+
+      #undef CRYPTO1
+      #undef CRYPTO2
+      #undef CRYPTO3
+      #undef C
+      #undef N
+      #undef CF
+      #undef FT1
+      #undef FT2
+      #undef FT3
+
+      #define C(U) \
+	ARM_BUILTIN_CRYPTO_##U
+      #define N(L) \
+	"__builtin_arm_crypto_"#L
+      #define FT1(R, A) \
+	R##_ftype_##A
+      #define FT2(R, A1, A2) \
+	R##_ftype_##A1##_##A2
+      #define FT3(R, A1, A2, A3) \
+        R##_ftype_##A1##_##A2##_##A3
+      #define CRYPTO1(L, U, R, A) \
+	arm_builtin_decls[C (U)] \
+	  = add_builtin_function (N (L), FT1 (R, A), \
+				  C (U), BUILT_IN_MD, NULL, NULL_TREE);
+      #define CRYPTO2(L, U, R, A1, A2)  \
+	arm_builtin_decls[C (U)]	\
+	  = add_builtin_function (N (L), FT2 (R, A1, A2), \
+				  C (U), BUILT_IN_MD, NULL, NULL_TREE);
+
+      #define CRYPTO3(L, U, R, A1, A2, A3) \
+	arm_builtin_decls[C (U)]	   \
+	  = add_builtin_function (N (L), FT3 (R, A1, A2, A3), \
+				  C (U), BUILT_IN_MD, NULL, NULL_TREE);
+      #include "crypto.def"
+
+      #undef CRYPTO1
+      #undef CRYPTO2
+      #undef CRYPTO3
+      #undef C
+      #undef N
+      #undef FT1
+      #undef FT2
+      #undef FT3
     }
 }
 
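As an aside, since the initialisation loop above is dense: the following
is a minimal, self-contained sketch of the qualifiers idea, with
hypothetical names throughout (toy_qualifier, print_type,
__builtin_toy_load) -- the real code builds GCC tree nodes rather than
strings.  The 'u'/'p'/'s' characters collected into type_signature above
likewise become the name suffix that distinguishes, say, the unsigned
variant of a builtin.

  #include <stdio.h>

  /* Toy qualifier bits, loosely modelled on the arm_type_qualifiers
     values used above.  */
  enum toy_qualifier
  {
    Q_NONE = 0, Q_UNSIGNED = 1, Q_CONST = 2, Q_POINTER = 4
  };

  /* Stand-in for arm_simd_builtin_type plus build_qualified_type and
     build_pointer_type: render one operand's type as text.  */
  static void
  print_type (const char *base, int quals)
  {
    printf ("%s%s%s%s",
            (quals & Q_CONST) ? "const " : "",
            (quals & Q_UNSIGNED) ? "u" : "",
            base,
            (quals & Q_POINTER) ? " *" : "");
  }

  int
  main (void)
  {
    /* Qualifier row for a hypothetical load builtin: entry 0 describes
       the return type, entries 1..n the arguments, mirroring the
       qualifiers arrays the loop above walks.  */
    const char *bases[] = { "int32x4_t", "int32_t" };
    int quals[] = { Q_NONE, Q_CONST | Q_POINTER };

    print_type (bases[0], quals[0]);
    printf (" __builtin_toy_load (");
    print_type (bases[1], quals[1]);
    printf (");\n");  /* int32x4_t __builtin_toy_load (const int32_t *); */
    return 0;
  }
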
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d9149ce..20cfa9f 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -225,6 +225,7 @@ extern void arm_pr_long_calls_off (struct cpp_reader *);
 extern void arm_lang_object_attributes_init(void);
 
 extern const char *arm_mangle_type (const_tree);
+extern const char *arm_mangle_builtin_type (const_tree);
 
 extern void arm_order_regs_for_local_alloc (void);
 
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
new file mode 100644
index 0000000..7360e26
--- /dev/null
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -0,0 +1,48 @@
+/* Builtin AdvSIMD types.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+  ENTRY (Int8x8_t, V8QI, none, 64, int8, 15)
+  ENTRY (Int16x4_t, V4HI, none, 64, int16, 16)
+  ENTRY (Int32x2_t, V2SI, none, 64, int32, 16)
+
+  ENTRY (Int8x16_t, V16QI, none, 128, int8, 16)
+  ENTRY (Int16x8_t, V8HI, none, 128, int16, 17)
+  ENTRY (Int32x4_t, V4SI, none, 128, int32, 17)
+  ENTRY (Int64x2_t, V2DI, none, 128, int64, 17)
+
+  ENTRY (Uint8x8_t, V8QI, unsigned, 64, uint8, 16)
+  ENTRY (Uint16x4_t, V4HI, unsigned, 64, uint16, 17)
+  ENTRY (Uint32x2_t, V2SI, unsigned, 64, uint32, 17)
+
+  ENTRY (Uint8x16_t, V16QI, unsigned, 128, uint8, 17)
+  ENTRY (Uint16x8_t, V8HI, unsigned, 128, uint16, 18)
+  ENTRY (Uint32x4_t, V4SI, unsigned, 128, uint32, 18)
+  ENTRY (Uint64x2_t, V2DI, unsigned, 128, uint64, 18)
+
+  ENTRY (Poly8x8_t, V8QI, poly, 64, poly8, 16)
+  ENTRY (Poly16x4_t, V4HI, poly, 64, poly16, 17)
+
+  ENTRY (Poly8x16_t, V16QI, poly, 128, poly8, 17)
+  ENTRY (Poly16x8_t, V8HI, poly, 128, poly16, 18)
+
+  ENTRY (Float16x4_t, V4HF, none, 64, float16, 18)
+  ENTRY (Float32x2_t, V2SF, none, 64, float32, 18)
+  ENTRY (Float32x4_t, V4SF, none, 128, float32, 19)
+
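The new .def file follows the usual GCC "X-macro" idiom: the same ENTRY
lines are re-expanded under different definitions of ENTRY.  As a
minimal sketch of one hypothetical consumer (the real expansions in
arm-builtins.c build type nodes instead), assuming the field order is
(element type, mode, qualifier, bit width, base name, mangled-name
length):

  #define ENTRY(E, M, Q, B, G, S)  "__simd" #B "_" #G "_t",
  static const char *arm_simd_type_names[] = {
  #include "arm-simd-builtin-types.def"
  };
  #undef ENTRY

  /* Expands to { "__simd64_int8_t", "__simd64_int16_t", ... }.  Note
     that each final field (15, 16, ...) is the strlen of the generated
     name, i.e. the length prefix the C++ mangler needs.  */
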
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d4157a6..156ca1f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27063,50 +27063,9 @@ arm_issue_rate (void)
     }
 }
 
-/* A table and a function to perform ARM-specific name mangling for
-   NEON vector types in order to conform to the AAPCS (see "Procedure
-   Call Standard for the ARM Architecture", Appendix A).  To qualify
-   for emission with the mangled names defined in that document, a
-   vector type must not only be of the correct mode but also be
-   composed of NEON vector element types (e.g. __builtin_neon_qi).  */
-typedef struct
-{
-  machine_mode mode;
-  const char *element_type_name;
-  const char *aapcs_name;
-} arm_mangle_map_entry;
-
-static arm_mangle_map_entry arm_mangle_map[] = {
-  /* 64-bit containerized types.  */
-  { V8QImode,  "__builtin_neon_qi",     "15__simd64_int8_t" },
-  { V8QImode,  "__builtin_neon_uqi",    "16__simd64_uint8_t" },
-  { V4HImode,  "__builtin_neon_hi",     "16__simd64_int16_t" },
-  { V4HImode,  "__builtin_neon_uhi",    "17__simd64_uint16_t" },
-  { V4HFmode,  "__builtin_neon_hf",     "18__simd64_float16_t" },
-  { V2SImode,  "__builtin_neon_si",     "16__simd64_int32_t" },
-  { V2SImode,  "__builtin_neon_usi",    "17__simd64_uint32_t" },
-  { V2SFmode,  "__builtin_neon_sf",     "18__simd64_float32_t" },
-  { V8QImode,  "__builtin_neon_poly8",  "16__simd64_poly8_t" },
-  { V4HImode,  "__builtin_neon_poly16", "17__simd64_poly16_t" },
-
-  /* 128-bit containerized types.  */
-  { V16QImode, "__builtin_neon_qi",     "16__simd128_int8_t" },
-  { V16QImode, "__builtin_neon_uqi",    "17__simd128_uint8_t" },
-  { V8HImode,  "__builtin_neon_hi",     "17__simd128_int16_t" },
-  { V8HImode,  "__builtin_neon_uhi",    "18__simd128_uint16_t" },
-  { V4SImode,  "__builtin_neon_si",     "17__simd128_int32_t" },
-  { V4SImode,  "__builtin_neon_usi",    "18__simd128_uint32_t" },
-  { V4SFmode,  "__builtin_neon_sf",     "19__simd128_float32_t" },
-  { V16QImode, "__builtin_neon_poly8",  "17__simd128_poly8_t" },
-  { V8HImode,  "__builtin_neon_poly16", "18__simd128_poly16_t" },
-  { VOIDmode, NULL, NULL }
-};
-
 const char *
 arm_mangle_type (const_tree type)
 {
-  arm_mangle_map_entry *pos = arm_mangle_map;
-
   /* The ARM ABI documents (10th October 2008) say that "__va_list"
     has to be mangled as if it is in the "std" namespace.  */
   if (TARGET_AAPCS_BASED
@@ -27117,26 +27076,12 @@ arm_mangle_type (const_tree type)
   if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
     return "Dh";
 
-  if (TREE_CODE (type) != VECTOR_TYPE)
-    return NULL;
-
-  /* Check the mode of the vector type, and the name of the vector
-     element type, against the table.  */
-  while (pos->mode != VOIDmode)
-    {
-      tree elt_type = TREE_TYPE (type);
-
-      if (pos->mode == TYPE_MODE (type)
-	  && TREE_CODE (TYPE_NAME (elt_type)) == TYPE_DECL
-	  && !strcmp (IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (elt_type))),
-		      pos->element_type_name))
-        return pos->aapcs_name;
-
-      pos++;
-    }
+  /* Try mangling as a Neon type; TYPE_NAME is non-NULL if this is a
+     builtin type.  */
+  if (TYPE_NAME (type) != NULL)
+    return arm_mangle_builtin_type (type);
 
-  /* Use the default mangling for unrecognized (possibly user-defined)
-     vector types.  */
+  /* Use the default mangling.  */
   return NULL;
 }
 
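With the table gone, the AAPCS names now come from the builtin type
information registered by arm-builtins.c.  The observable mangling is
unchanged; for example (expected name taken from the deleted table
above), in a translation unit compiled as C++ with Neon enabled:

  /* f mangles as _Z1f17__simd128_int32_t, per Appendix A of the AAPCS
     ("17__simd128_int32_t" is the entry the hunk above removes for
     V4SImode).  */
  #include <arm_neon.h>
  int32x4_t f (int32x4_t x) { return x; }
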
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index d27d970..e58b772 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -37,37 +37,42 @@ extern "C" {
 
 #include <stdint.h>
 
-typedef __builtin_neon_qi int8x8_t	__attribute__ ((__vector_size__ (8)));
-typedef __builtin_neon_hi int16x4_t	__attribute__ ((__vector_size__ (8)));
-typedef __builtin_neon_si int32x2_t	__attribute__ ((__vector_size__ (8)));
+typedef __simd64_int8_t int8x8_t;
+typedef __simd64_int16_t int16x4_t;
+typedef __simd64_int32_t int32x2_t;
 typedef __builtin_neon_di int64x1_t;
-typedef __builtin_neon_hf float16x4_t	__attribute__ ((__vector_size__ (8)));
-typedef __builtin_neon_sf float32x2_t	__attribute__ ((__vector_size__ (8)));
-typedef __builtin_neon_poly8 poly8x8_t	__attribute__ ((__vector_size__ (8)));
-typedef __builtin_neon_poly16 poly16x4_t	__attribute__ ((__vector_size__ (8)));
+typedef __simd64_float16_t float16x4_t;
+typedef __simd64_float32_t float32x2_t;
+typedef __simd64_poly8_t poly8x8_t;
+typedef __simd64_poly16_t poly16x4_t;
 #ifdef __ARM_FEATURE_CRYPTO
 typedef __builtin_neon_poly64 poly64x1_t;
 #endif
-typedef __builtin_neon_uqi uint8x8_t	__attribute__ ((__vector_size__ (8)));
-typedef __builtin_neon_uhi uint16x4_t	__attribute__ ((__vector_size__ (8)));
-typedef __builtin_neon_usi uint32x2_t	__attribute__ ((__vector_size__ (8)));
+typedef __simd64_uint8_t uint8x8_t;
+typedef __simd64_uint16_t uint16x4_t;
+typedef __simd64_uint32_t uint32x2_t;
 typedef __builtin_neon_udi uint64x1_t;
-typedef __builtin_neon_qi int8x16_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_hi int16x8_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_si int32x4_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_di int64x2_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_sf float32x4_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_poly8 poly8x16_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_poly16 poly16x8_t	__attribute__ ((__vector_size__ (16)));
+
+typedef __simd128_int8_t int8x16_t;
+typedef __simd128_int16_t int16x8_t;
+typedef __simd128_int32_t int32x4_t;
+typedef __simd128_int64_t int64x2_t;
+typedef __simd128_float32_t float32x4_t;
+typedef __simd128_poly8_t poly8x16_t;
+typedef __simd128_poly16_t poly16x8_t;
 #ifdef __ARM_FEATURE_CRYPTO
-typedef __builtin_neon_poly64 poly64x2_t	__attribute__ ((__vector_size__ (16)));
+typedef __builtin_neon_poly64 poly64x2_t __attribute__ ((__vector_size__ (16)));
 #endif
-typedef __builtin_neon_uqi uint8x16_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_uhi uint16x8_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_usi uint32x4_t	__attribute__ ((__vector_size__ (16)));
-typedef __builtin_neon_udi uint64x2_t	__attribute__ ((__vector_size__ (16)));
+
+typedef __simd128_uint8_t uint8x16_t;
+typedef __simd128_uint16_t uint16x8_t;
+typedef __simd128_uint32_t uint32x4_t;
+typedef __simd128_uint64_t uint64x2_t;
 
 typedef float float32_t;
+
+/* The Poly types are user-visible and live in their own world;
+   keep them that way.  */
 typedef __builtin_neon_poly8 poly8_t;
 typedef __builtin_neon_poly16 poly16_t;
 #ifdef __ARM_FEATURE_CRYPTO
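Note the user-visible type names and sizes are unchanged by this hunk;
existing sources keep compiling, and only the underlying builtin type
(and hence the C++ mangling) changes.  For instance, code like:

  /* int8x8_t keeps its name and 8-byte size, so intrinsics calls are
     source-compatible with the old vector_size typedefs.  */
  #include <arm_neon.h>
  int8x8_t
  double_it (int8x8_t x)
  {
    return vadd_s8 (x, x);
  }

is unaffected.
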
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 98a1d3b..d82a123 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -100,7 +100,8 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(RTL_H) $(TREE_H) expr.h $(TM_P_H) $(RECOG_H) langhooks.h \
   $(DIAGNOSTIC_CORE_H) $(OPTABS_H) \
   $(srcdir)/config/arm/arm-protos.h \
-  $(srcdir)/config/arm/arm_neon_builtins.def
+  $(srcdir)/config/arm/arm_neon_builtins.def \
+  $(srcdir)/config/arm/arm-simd-builtin-types.def
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/arm/arm-builtins.c
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
                     ` (3 preceding siblings ...)
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling James Greenhalgh
@ 2014-11-12 17:12   ` James Greenhalgh
  2014-11-18  9:17     ` Ramana Radhakrishnan
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 6/8] Add some tests for "poly" mangling James Greenhalgh
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, nickc

[-- Attachment #1: Type: text/plain, Size: 1957 bytes --]


Hi,

The config/arm/arm.c file has always seemed a worrying size to me.

This patch pulls out the builtin-related code into its own file. I think
this will be a good idea as we move forward. It seems a more sensible
separation of concerns. There are no functional changes here.

Bootstrapped and regression tested on arm-none-linux-gnueabi, with
no issues.

OK?

Thanks,
James

---
2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* config.gcc (extra_objs): Add arm-builtins.o for arm*-*-*.
	(target_gtfiles): Add config/arm/arm-builtins.c for arm*-*-*.
	* config/arm/arm-builtins.c: New.
	* config/arm/t-arm (arm-builtins.o): New.
	* config/arm/arm-protos.h (arm_expand_builtin): New.
	(arm_builtin_decl): Likewise.
	(arm_init_builtins): Likewise.
	(arm_atomic_assign_expand_fenv): Likewise.
	* config/arm/arm.c (arm_atomic_assign_expand_fenv): Remove prototype.
	(arm_init_builtins): Likewise.
	(arm_init_iwmmxt_builtins): Likewise.
	(safe_vector_operand): Likewise.
	(arm_expand_binop_builtin): Likewise.
	(arm_expand_unop_builtin): Likewise.
	(arm_expand_builtin): Likewise.
	(arm_builtin_decl): Likewise.
	(insn_flags): Remove static.
	(tune_flags): Likewise.
	(enum arm_builtins): Move to config/arm/arm-builtins.c.
	(arm_init_neon_builtins): Likewise.
	(struct builtin_description): Likewise.
	(arm_init_iwmmxt_builtins): Likewise.
	(arm_init_fp16_builtins): Likewise.
	(arm_init_crc32_builtins): Likewise.
	(arm_init_builtins): Likewise.
	(arm_builtin_decl): Likewise.
	(safe_vector_operand): Likewise.
	(arm_expand_ternop_builtin): Likewise.
	(arm_expand_binop_builtin): Likewise.
	(arm_expand_unop_builtin): Likewise.
	(neon_dereference_pointer): Likewise.
	(arm_expand_neon_args): Likewise.
	(arm_expand_neon_builtin): Likewise.
	(neon_split_vcombine): Likewise.
	(arm_expand_builtin): Likewise.
	(arm_builtin_vectorized_function): Likewise.
	(arm_atomic_assign_expand_fenv): Likewise.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0003-Patch-ARM-Refactor-Builtins-3-8-Pull-builtins-code-t.patch --]
[-- Type: text/x-patch;  name=0003-Patch-ARM-Refactor-Builtins-3-8-Pull-builtins-code-t.patch, Size: 218057 bytes --]

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2284b9e..566f2bb 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -318,12 +318,13 @@ arc*-*-*)
 	;;
 arm*-*-*)
 	cpu_type=arm
-	extra_objs="aarch-common.o"
+	extra_objs="arm-builtins.o aarch-common.o"
 	extra_headers="mmintrin.h arm_neon.h arm_acle.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
 	extra_options="${extra_options} arm/arm-tables.opt"
+	target_gtfiles="\$(srcdir)/config/arm/arm-builtins.c"
 	;;
 avr-*-*)
 	cpu_type=avr
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
new file mode 100644
index 0000000..e387b60
--- /dev/null
+++ b/gcc/config/arm/arm-builtins.c
@@ -0,0 +1,3036 @@
+/* Description of builtins used by the ARM backend.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "stor-layout.h"
+#include "expr.h"
+#include "tm_p.h"
+#include "recog.h"
+#include "langhooks.h"
+#include "diagnostic-core.h"
+#include "optabs.h"
+#include "gimple-expr.h"
+#include "target.h"
+#include "ggc.h"
+#include "arm-protos.h"
+
+typedef enum {
+  T_V8QI,
+  T_V4HI,
+  T_V4HF,
+  T_V2SI,
+  T_V2SF,
+  T_DI,
+  T_V16QI,
+  T_V8HI,
+  T_V4SI,
+  T_V4SF,
+  T_V2DI,
+  T_TI,
+  T_EI,
+  T_OI,
+  T_MAX		/* Size of enum.  Keep last.  */
+} neon_builtin_type_mode;
+
+#define TYPE_MODE_BIT(X) (1 << (X))
+
+#define TB_DREG (TYPE_MODE_BIT (T_V8QI) | TYPE_MODE_BIT (T_V4HI)	\
+		 | TYPE_MODE_BIT (T_V4HF) | TYPE_MODE_BIT (T_V2SI)	\
+		 | TYPE_MODE_BIT (T_V2SF) | TYPE_MODE_BIT (T_DI))
+#define TB_QREG (TYPE_MODE_BIT (T_V16QI) | TYPE_MODE_BIT (T_V8HI)	\
+		 | TYPE_MODE_BIT (T_V4SI) | TYPE_MODE_BIT (T_V4SF)	\
+		 | TYPE_MODE_BIT (T_V2DI) | TYPE_MODE_BIT (T_TI))
+
+#define v8qi_UP  T_V8QI
+#define v4hi_UP  T_V4HI
+#define v4hf_UP  T_V4HF
+#define v2si_UP  T_V2SI
+#define v2sf_UP  T_V2SF
+#define di_UP    T_DI
+#define v16qi_UP T_V16QI
+#define v8hi_UP  T_V8HI
+#define v4si_UP  T_V4SI
+#define v4sf_UP  T_V4SF
+#define v2di_UP  T_V2DI
+#define ti_UP	 T_TI
+#define ei_UP	 T_EI
+#define oi_UP	 T_OI
+
+#define UP(X) X##_UP
+
+typedef enum {
+  NEON_BINOP,
+  NEON_TERNOP,
+  NEON_UNOP,
+  NEON_BSWAP,
+  NEON_GETLANE,
+  NEON_SETLANE,
+  NEON_CREATE,
+  NEON_RINT,
+  NEON_COPYSIGNF,
+  NEON_DUP,
+  NEON_DUPLANE,
+  NEON_COMBINE,
+  NEON_SPLIT,
+  NEON_LANEMUL,
+  NEON_LANEMULL,
+  NEON_LANEMULH,
+  NEON_LANEMAC,
+  NEON_SCALARMUL,
+  NEON_SCALARMULL,
+  NEON_SCALARMULH,
+  NEON_SCALARMAC,
+  NEON_CONVERT,
+  NEON_FLOAT_WIDEN,
+  NEON_FLOAT_NARROW,
+  NEON_FIXCONV,
+  NEON_SELECT,
+  NEON_REINTERP,
+  NEON_VTBL,
+  NEON_VTBX,
+  NEON_LOAD1,
+  NEON_LOAD1LANE,
+  NEON_STORE1,
+  NEON_STORE1LANE,
+  NEON_LOADSTRUCT,
+  NEON_LOADSTRUCTLANE,
+  NEON_STORESTRUCT,
+  NEON_STORESTRUCTLANE,
+  NEON_LOGICBINOP,
+  NEON_SHIFTINSERT,
+  NEON_SHIFTIMM,
+  NEON_SHIFTACC
+} neon_itype;
+
+typedef struct {
+  const char *name;
+  const neon_itype itype;
+  const neon_builtin_type_mode mode;
+  const enum insn_code code;
+  unsigned int fcode;
+} neon_builtin_datum;
+
+#define CF(N,X) CODE_FOR_neon_##N##X
+
+#define VAR1(T, N, A) \
+  {#N, NEON_##T, UP (A), CF (N, A), 0}
+#define VAR2(T, N, A, B) \
+  VAR1 (T, N, A), \
+  {#N, NEON_##T, UP (B), CF (N, B), 0}
+#define VAR3(T, N, A, B, C) \
+  VAR2 (T, N, A, B), \
+  {#N, NEON_##T, UP (C), CF (N, C), 0}
+#define VAR4(T, N, A, B, C, D) \
+  VAR3 (T, N, A, B, C), \
+  {#N, NEON_##T, UP (D), CF (N, D), 0}
+#define VAR5(T, N, A, B, C, D, E) \
+  VAR4 (T, N, A, B, C, D), \
+  {#N, NEON_##T, UP (E), CF (N, E), 0}
+#define VAR6(T, N, A, B, C, D, E, F) \
+  VAR5 (T, N, A, B, C, D, E), \
+  {#N, NEON_##T, UP (F), CF (N, F), 0}
+#define VAR7(T, N, A, B, C, D, E, F, G) \
+  VAR6 (T, N, A, B, C, D, E, F), \
+  {#N, NEON_##T, UP (G), CF (N, G), 0}
+#define VAR8(T, N, A, B, C, D, E, F, G, H) \
+  VAR7 (T, N, A, B, C, D, E, F, G), \
+  {#N, NEON_##T, UP (H), CF (N, H), 0}
+#define VAR9(T, N, A, B, C, D, E, F, G, H, I) \
+  VAR8 (T, N, A, B, C, D, E, F, G, H), \
+  {#N, NEON_##T, UP (I), CF (N, I), 0}
+#define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
+  VAR9 (T, N, A, B, C, D, E, F, G, H, I), \
+  {#N, NEON_##T, UP (J), CF (N, J), 0}
+
+/* The NEON builtin data can be found in arm_neon_builtins.def.
+   The mode entries in the following table correspond to the "key" type of the
+   instruction variant, i.e. equivalent to that which would be specified after
+   the assembler mnemonic, which usually refers to the last vector operand.
+   (Signed/unsigned/polynomial types are not differentiated between though, and
+   are all mapped onto the same mode for a given element size.) The modes
+   listed per instruction should be the same as those defined for that
+   instruction's pattern in neon.md.  */
+
+static neon_builtin_datum neon_builtin_data[] =
+{
+#include "arm_neon_builtins.def"
+};
+
+#undef CF
+#undef VAR1
+#undef VAR2
+#undef VAR3
+#undef VAR4
+#undef VAR5
+#undef VAR6
+#undef VAR7
+#undef VAR8
+#undef VAR9
+#undef VAR10
+
+#define CF(N,X) ARM_BUILTIN_NEON_##N##X
+#define VAR1(T, N, A) \
+  CF (N, A)
+#define VAR2(T, N, A, B) \
+  VAR1 (T, N, A), \
+  CF (N, B)
+#define VAR3(T, N, A, B, C) \
+  VAR2 (T, N, A, B), \
+  CF (N, C)
+#define VAR4(T, N, A, B, C, D) \
+  VAR3 (T, N, A, B, C), \
+  CF (N, D)
+#define VAR5(T, N, A, B, C, D, E) \
+  VAR4 (T, N, A, B, C, D), \
+  CF (N, E)
+#define VAR6(T, N, A, B, C, D, E, F) \
+  VAR5 (T, N, A, B, C, D, E), \
+  CF (N, F)
+#define VAR7(T, N, A, B, C, D, E, F, G) \
+  VAR6 (T, N, A, B, C, D, E, F), \
+  CF (N, G)
+#define VAR8(T, N, A, B, C, D, E, F, G, H) \
+  VAR7 (T, N, A, B, C, D, E, F, G), \
+  CF (N, H)
+#define VAR9(T, N, A, B, C, D, E, F, G, H, I) \
+  VAR8 (T, N, A, B, C, D, E, F, G, H), \
+  CF (N, I)
+#define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
+  VAR9 (T, N, A, B, C, D, E, F, G, H, I), \
+  CF (N, J)
+enum arm_builtins
+{
+  ARM_BUILTIN_GETWCGR0,
+  ARM_BUILTIN_GETWCGR1,
+  ARM_BUILTIN_GETWCGR2,
+  ARM_BUILTIN_GETWCGR3,
+
+  ARM_BUILTIN_SETWCGR0,
+  ARM_BUILTIN_SETWCGR1,
+  ARM_BUILTIN_SETWCGR2,
+  ARM_BUILTIN_SETWCGR3,
+
+  ARM_BUILTIN_WZERO,
+
+  ARM_BUILTIN_WAVG2BR,
+  ARM_BUILTIN_WAVG2HR,
+  ARM_BUILTIN_WAVG2B,
+  ARM_BUILTIN_WAVG2H,
+
+  ARM_BUILTIN_WACCB,
+  ARM_BUILTIN_WACCH,
+  ARM_BUILTIN_WACCW,
+
+  ARM_BUILTIN_WMACS,
+  ARM_BUILTIN_WMACSZ,
+  ARM_BUILTIN_WMACU,
+  ARM_BUILTIN_WMACUZ,
+
+  ARM_BUILTIN_WSADB,
+  ARM_BUILTIN_WSADBZ,
+  ARM_BUILTIN_WSADH,
+  ARM_BUILTIN_WSADHZ,
+
+  ARM_BUILTIN_WALIGNI,
+  ARM_BUILTIN_WALIGNR0,
+  ARM_BUILTIN_WALIGNR1,
+  ARM_BUILTIN_WALIGNR2,
+  ARM_BUILTIN_WALIGNR3,
+
+  ARM_BUILTIN_TMIA,
+  ARM_BUILTIN_TMIAPH,
+  ARM_BUILTIN_TMIABB,
+  ARM_BUILTIN_TMIABT,
+  ARM_BUILTIN_TMIATB,
+  ARM_BUILTIN_TMIATT,
+
+  ARM_BUILTIN_TMOVMSKB,
+  ARM_BUILTIN_TMOVMSKH,
+  ARM_BUILTIN_TMOVMSKW,
+
+  ARM_BUILTIN_TBCSTB,
+  ARM_BUILTIN_TBCSTH,
+  ARM_BUILTIN_TBCSTW,
+
+  ARM_BUILTIN_WMADDS,
+  ARM_BUILTIN_WMADDU,
+
+  ARM_BUILTIN_WPACKHSS,
+  ARM_BUILTIN_WPACKWSS,
+  ARM_BUILTIN_WPACKDSS,
+  ARM_BUILTIN_WPACKHUS,
+  ARM_BUILTIN_WPACKWUS,
+  ARM_BUILTIN_WPACKDUS,
+
+  ARM_BUILTIN_WADDB,
+  ARM_BUILTIN_WADDH,
+  ARM_BUILTIN_WADDW,
+  ARM_BUILTIN_WADDSSB,
+  ARM_BUILTIN_WADDSSH,
+  ARM_BUILTIN_WADDSSW,
+  ARM_BUILTIN_WADDUSB,
+  ARM_BUILTIN_WADDUSH,
+  ARM_BUILTIN_WADDUSW,
+  ARM_BUILTIN_WSUBB,
+  ARM_BUILTIN_WSUBH,
+  ARM_BUILTIN_WSUBW,
+  ARM_BUILTIN_WSUBSSB,
+  ARM_BUILTIN_WSUBSSH,
+  ARM_BUILTIN_WSUBSSW,
+  ARM_BUILTIN_WSUBUSB,
+  ARM_BUILTIN_WSUBUSH,
+  ARM_BUILTIN_WSUBUSW,
+
+  ARM_BUILTIN_WAND,
+  ARM_BUILTIN_WANDN,
+  ARM_BUILTIN_WOR,
+  ARM_BUILTIN_WXOR,
+
+  ARM_BUILTIN_WCMPEQB,
+  ARM_BUILTIN_WCMPEQH,
+  ARM_BUILTIN_WCMPEQW,
+  ARM_BUILTIN_WCMPGTUB,
+  ARM_BUILTIN_WCMPGTUH,
+  ARM_BUILTIN_WCMPGTUW,
+  ARM_BUILTIN_WCMPGTSB,
+  ARM_BUILTIN_WCMPGTSH,
+  ARM_BUILTIN_WCMPGTSW,
+
+  ARM_BUILTIN_TEXTRMSB,
+  ARM_BUILTIN_TEXTRMSH,
+  ARM_BUILTIN_TEXTRMSW,
+  ARM_BUILTIN_TEXTRMUB,
+  ARM_BUILTIN_TEXTRMUH,
+  ARM_BUILTIN_TEXTRMUW,
+  ARM_BUILTIN_TINSRB,
+  ARM_BUILTIN_TINSRH,
+  ARM_BUILTIN_TINSRW,
+
+  ARM_BUILTIN_WMAXSW,
+  ARM_BUILTIN_WMAXSH,
+  ARM_BUILTIN_WMAXSB,
+  ARM_BUILTIN_WMAXUW,
+  ARM_BUILTIN_WMAXUH,
+  ARM_BUILTIN_WMAXUB,
+  ARM_BUILTIN_WMINSW,
+  ARM_BUILTIN_WMINSH,
+  ARM_BUILTIN_WMINSB,
+  ARM_BUILTIN_WMINUW,
+  ARM_BUILTIN_WMINUH,
+  ARM_BUILTIN_WMINUB,
+
+  ARM_BUILTIN_WMULUM,
+  ARM_BUILTIN_WMULSM,
+  ARM_BUILTIN_WMULUL,
+
+  ARM_BUILTIN_PSADBH,
+  ARM_BUILTIN_WSHUFH,
+
+  ARM_BUILTIN_WSLLH,
+  ARM_BUILTIN_WSLLW,
+  ARM_BUILTIN_WSLLD,
+  ARM_BUILTIN_WSRAH,
+  ARM_BUILTIN_WSRAW,
+  ARM_BUILTIN_WSRAD,
+  ARM_BUILTIN_WSRLH,
+  ARM_BUILTIN_WSRLW,
+  ARM_BUILTIN_WSRLD,
+  ARM_BUILTIN_WRORH,
+  ARM_BUILTIN_WRORW,
+  ARM_BUILTIN_WRORD,
+  ARM_BUILTIN_WSLLHI,
+  ARM_BUILTIN_WSLLWI,
+  ARM_BUILTIN_WSLLDI,
+  ARM_BUILTIN_WSRAHI,
+  ARM_BUILTIN_WSRAWI,
+  ARM_BUILTIN_WSRADI,
+  ARM_BUILTIN_WSRLHI,
+  ARM_BUILTIN_WSRLWI,
+  ARM_BUILTIN_WSRLDI,
+  ARM_BUILTIN_WRORHI,
+  ARM_BUILTIN_WRORWI,
+  ARM_BUILTIN_WRORDI,
+
+  ARM_BUILTIN_WUNPCKIHB,
+  ARM_BUILTIN_WUNPCKIHH,
+  ARM_BUILTIN_WUNPCKIHW,
+  ARM_BUILTIN_WUNPCKILB,
+  ARM_BUILTIN_WUNPCKILH,
+  ARM_BUILTIN_WUNPCKILW,
+
+  ARM_BUILTIN_WUNPCKEHSB,
+  ARM_BUILTIN_WUNPCKEHSH,
+  ARM_BUILTIN_WUNPCKEHSW,
+  ARM_BUILTIN_WUNPCKEHUB,
+  ARM_BUILTIN_WUNPCKEHUH,
+  ARM_BUILTIN_WUNPCKEHUW,
+  ARM_BUILTIN_WUNPCKELSB,
+  ARM_BUILTIN_WUNPCKELSH,
+  ARM_BUILTIN_WUNPCKELSW,
+  ARM_BUILTIN_WUNPCKELUB,
+  ARM_BUILTIN_WUNPCKELUH,
+  ARM_BUILTIN_WUNPCKELUW,
+
+  ARM_BUILTIN_WABSB,
+  ARM_BUILTIN_WABSH,
+  ARM_BUILTIN_WABSW,
+
+  ARM_BUILTIN_WADDSUBHX,
+  ARM_BUILTIN_WSUBADDHX,
+
+  ARM_BUILTIN_WABSDIFFB,
+  ARM_BUILTIN_WABSDIFFH,
+  ARM_BUILTIN_WABSDIFFW,
+
+  ARM_BUILTIN_WADDCH,
+  ARM_BUILTIN_WADDCW,
+
+  ARM_BUILTIN_WAVG4,
+  ARM_BUILTIN_WAVG4R,
+
+  ARM_BUILTIN_WMADDSX,
+  ARM_BUILTIN_WMADDUX,
+
+  ARM_BUILTIN_WMADDSN,
+  ARM_BUILTIN_WMADDUN,
+
+  ARM_BUILTIN_WMULWSM,
+  ARM_BUILTIN_WMULWUM,
+
+  ARM_BUILTIN_WMULWSMR,
+  ARM_BUILTIN_WMULWUMR,
+
+  ARM_BUILTIN_WMULWL,
+
+  ARM_BUILTIN_WMULSMR,
+  ARM_BUILTIN_WMULUMR,
+
+  ARM_BUILTIN_WQMULM,
+  ARM_BUILTIN_WQMULMR,
+
+  ARM_BUILTIN_WQMULWM,
+  ARM_BUILTIN_WQMULWMR,
+
+  ARM_BUILTIN_WADDBHUSM,
+  ARM_BUILTIN_WADDBHUSL,
+
+  ARM_BUILTIN_WQMIABB,
+  ARM_BUILTIN_WQMIABT,
+  ARM_BUILTIN_WQMIATB,
+  ARM_BUILTIN_WQMIATT,
+
+  ARM_BUILTIN_WQMIABBN,
+  ARM_BUILTIN_WQMIABTN,
+  ARM_BUILTIN_WQMIATBN,
+  ARM_BUILTIN_WQMIATTN,
+
+  ARM_BUILTIN_WMIABB,
+  ARM_BUILTIN_WMIABT,
+  ARM_BUILTIN_WMIATB,
+  ARM_BUILTIN_WMIATT,
+
+  ARM_BUILTIN_WMIABBN,
+  ARM_BUILTIN_WMIABTN,
+  ARM_BUILTIN_WMIATBN,
+  ARM_BUILTIN_WMIATTN,
+
+  ARM_BUILTIN_WMIAWBB,
+  ARM_BUILTIN_WMIAWBT,
+  ARM_BUILTIN_WMIAWTB,
+  ARM_BUILTIN_WMIAWTT,
+
+  ARM_BUILTIN_WMIAWBBN,
+  ARM_BUILTIN_WMIAWBTN,
+  ARM_BUILTIN_WMIAWTBN,
+  ARM_BUILTIN_WMIAWTTN,
+
+  ARM_BUILTIN_WMERGE,
+
+  ARM_BUILTIN_CRC32B,
+  ARM_BUILTIN_CRC32H,
+  ARM_BUILTIN_CRC32W,
+  ARM_BUILTIN_CRC32CB,
+  ARM_BUILTIN_CRC32CH,
+  ARM_BUILTIN_CRC32CW,
+
+  ARM_BUILTIN_GET_FPSCR,
+  ARM_BUILTIN_SET_FPSCR,
+
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+
+#define CRYPTO1(L, U, M1, M2) \
+  ARM_BUILTIN_CRYPTO_##U,
+#define CRYPTO2(L, U, M1, M2, M3) \
+  ARM_BUILTIN_CRYPTO_##U,
+#define CRYPTO3(L, U, M1, M2, M3, M4) \
+  ARM_BUILTIN_CRYPTO_##U,
+
+#include "crypto.def"
+
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+
+#include "arm_neon_builtins.def"
+
+  ,ARM_BUILTIN_MAX
+};
+
+#define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
+
+#undef CF
+#undef VAR1
+#undef VAR2
+#undef VAR3
+#undef VAR4
+#undef VAR5
+#undef VAR6
+#undef VAR7
+#undef VAR8
+#undef VAR9
+#undef VAR10
+
+static GTY(()) tree arm_builtin_decls[ARM_BUILTIN_MAX];
+
+#define NUM_DREG_TYPES 5
+#define NUM_QREG_TYPES 6
+
+static void
+arm_init_neon_builtins (void)
+{
+  unsigned int i, fcode;
+  tree decl;
+
+  tree neon_intQI_type_node;
+  tree neon_intHI_type_node;
+  tree neon_floatHF_type_node;
+  tree neon_polyQI_type_node;
+  tree neon_polyHI_type_node;
+  tree neon_intSI_type_node;
+  tree neon_intDI_type_node;
+  tree neon_intUTI_type_node;
+  tree neon_float_type_node;
+
+  tree intQI_pointer_node;
+  tree intHI_pointer_node;
+  tree intSI_pointer_node;
+  tree intDI_pointer_node;
+  tree float_pointer_node;
+
+  tree const_intQI_node;
+  tree const_intHI_node;
+  tree const_intSI_node;
+  tree const_intDI_node;
+  tree const_float_node;
+
+  tree const_intQI_pointer_node;
+  tree const_intHI_pointer_node;
+  tree const_intSI_pointer_node;
+  tree const_intDI_pointer_node;
+  tree const_float_pointer_node;
+
+  tree V8QI_type_node;
+  tree V4HI_type_node;
+  tree V4UHI_type_node;
+  tree V4HF_type_node;
+  tree V2SI_type_node;
+  tree V2USI_type_node;
+  tree V2SF_type_node;
+  tree V16QI_type_node;
+  tree V8HI_type_node;
+  tree V8UHI_type_node;
+  tree V4SI_type_node;
+  tree V4USI_type_node;
+  tree V4SF_type_node;
+  tree V2DI_type_node;
+  tree V2UDI_type_node;
+
+  tree intUQI_type_node;
+  tree intUHI_type_node;
+  tree intUSI_type_node;
+  tree intUDI_type_node;
+
+  tree intEI_type_node;
+  tree intOI_type_node;
+  tree intCI_type_node;
+  tree intXI_type_node;
+
+  tree reinterp_ftype_dreg[NUM_DREG_TYPES][NUM_DREG_TYPES];
+  tree reinterp_ftype_qreg[NUM_QREG_TYPES][NUM_QREG_TYPES];
+  tree dreg_types[NUM_DREG_TYPES], qreg_types[NUM_QREG_TYPES];
+
+  /* Create distinguished type nodes for NEON vector element types,
+     and pointers to values of such types, so we can detect them later.  */
+  neon_intQI_type_node = make_signed_type (GET_MODE_PRECISION (QImode));
+  neon_intHI_type_node = make_signed_type (GET_MODE_PRECISION (HImode));
+  neon_polyQI_type_node = make_signed_type (GET_MODE_PRECISION (QImode));
+  neon_polyHI_type_node = make_signed_type (GET_MODE_PRECISION (HImode));
+  neon_intSI_type_node = make_signed_type (GET_MODE_PRECISION (SImode));
+  neon_intDI_type_node = make_signed_type (GET_MODE_PRECISION (DImode));
+  neon_float_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (neon_float_type_node) = FLOAT_TYPE_SIZE;
+  layout_type (neon_float_type_node);
+  neon_floatHF_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (neon_floatHF_type_node) = GET_MODE_PRECISION (HFmode);
+  layout_type (neon_floatHF_type_node);
+
+  /* Define typedefs which exactly correspond to the modes we are basing vector
+     types on.  If you change these names you'll need to change
+     the table used by arm_mangle_type too.  */
+  (*lang_hooks.types.register_builtin_type) (neon_intQI_type_node,
+					     "__builtin_neon_qi");
+  (*lang_hooks.types.register_builtin_type) (neon_intHI_type_node,
+					     "__builtin_neon_hi");
+  (*lang_hooks.types.register_builtin_type) (neon_floatHF_type_node,
+					     "__builtin_neon_hf");
+  (*lang_hooks.types.register_builtin_type) (neon_intSI_type_node,
+					     "__builtin_neon_si");
+  (*lang_hooks.types.register_builtin_type) (neon_float_type_node,
+					     "__builtin_neon_sf");
+  (*lang_hooks.types.register_builtin_type) (neon_intDI_type_node,
+					     "__builtin_neon_di");
+  (*lang_hooks.types.register_builtin_type) (neon_polyQI_type_node,
+					     "__builtin_neon_poly8");
+  (*lang_hooks.types.register_builtin_type) (neon_polyHI_type_node,
+					     "__builtin_neon_poly16");
+
+  intQI_pointer_node = build_pointer_type (neon_intQI_type_node);
+  intHI_pointer_node = build_pointer_type (neon_intHI_type_node);
+  intSI_pointer_node = build_pointer_type (neon_intSI_type_node);
+  intDI_pointer_node = build_pointer_type (neon_intDI_type_node);
+  float_pointer_node = build_pointer_type (neon_float_type_node);
+
+  /* Next create constant-qualified versions of the above types.  */
+  const_intQI_node = build_qualified_type (neon_intQI_type_node,
+					   TYPE_QUAL_CONST);
+  const_intHI_node = build_qualified_type (neon_intHI_type_node,
+					   TYPE_QUAL_CONST);
+  const_intSI_node = build_qualified_type (neon_intSI_type_node,
+					   TYPE_QUAL_CONST);
+  const_intDI_node = build_qualified_type (neon_intDI_type_node,
+					   TYPE_QUAL_CONST);
+  const_float_node = build_qualified_type (neon_float_type_node,
+					   TYPE_QUAL_CONST);
+
+  const_intQI_pointer_node = build_pointer_type (const_intQI_node);
+  const_intHI_pointer_node = build_pointer_type (const_intHI_node);
+  const_intSI_pointer_node = build_pointer_type (const_intSI_node);
+  const_intDI_pointer_node = build_pointer_type (const_intDI_node);
+  const_float_pointer_node = build_pointer_type (const_float_node);
+
+  /* Unsigned integer types for various mode sizes.  */
+  intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode));
+  intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode));
+  intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode));
+  intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode));
+  neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode));
+  /* Now create vector types based on our NEON element types.  */
+  /* 64-bit vectors.  */
+  V8QI_type_node =
+    build_vector_type_for_mode (neon_intQI_type_node, V8QImode);
+  V4HI_type_node =
+    build_vector_type_for_mode (neon_intHI_type_node, V4HImode);
+  V4UHI_type_node =
+    build_vector_type_for_mode (intUHI_type_node, V4HImode);
+  V4HF_type_node =
+    build_vector_type_for_mode (neon_floatHF_type_node, V4HFmode);
+  V2SI_type_node =
+    build_vector_type_for_mode (neon_intSI_type_node, V2SImode);
+  V2USI_type_node =
+    build_vector_type_for_mode (intUSI_type_node, V2SImode);
+  V2SF_type_node =
+    build_vector_type_for_mode (neon_float_type_node, V2SFmode);
+  /* 128-bit vectors.  */
+  V16QI_type_node =
+    build_vector_type_for_mode (neon_intQI_type_node, V16QImode);
+  V8HI_type_node =
+    build_vector_type_for_mode (neon_intHI_type_node, V8HImode);
+  V8UHI_type_node =
+    build_vector_type_for_mode (intUHI_type_node, V8HImode);
+  V4SI_type_node =
+    build_vector_type_for_mode (neon_intSI_type_node, V4SImode);
+  V4USI_type_node =
+    build_vector_type_for_mode (intUSI_type_node, V4SImode);
+  V4SF_type_node =
+    build_vector_type_for_mode (neon_float_type_node, V4SFmode);
+  V2DI_type_node =
+    build_vector_type_for_mode (neon_intDI_type_node, V2DImode);
+  V2UDI_type_node =
+    build_vector_type_for_mode (intUDI_type_node, V2DImode);
+
+
+  (*lang_hooks.types.register_builtin_type) (intUQI_type_node,
+					     "__builtin_neon_uqi");
+  (*lang_hooks.types.register_builtin_type) (intUHI_type_node,
+					     "__builtin_neon_uhi");
+  (*lang_hooks.types.register_builtin_type) (intUSI_type_node,
+					     "__builtin_neon_usi");
+  (*lang_hooks.types.register_builtin_type) (intUDI_type_node,
+					     "__builtin_neon_udi");
+  (*lang_hooks.types.register_builtin_type) (intUDI_type_node,
+					     "__builtin_neon_poly64");
+  (*lang_hooks.types.register_builtin_type) (neon_intUTI_type_node,
+					     "__builtin_neon_poly128");
+
+  /* Opaque integer types for structures of vectors.  */
+  intEI_type_node = make_signed_type (GET_MODE_PRECISION (EImode));
+  intOI_type_node = make_signed_type (GET_MODE_PRECISION (OImode));
+  intCI_type_node = make_signed_type (GET_MODE_PRECISION (CImode));
+  intXI_type_node = make_signed_type (GET_MODE_PRECISION (XImode));
+
+  (*lang_hooks.types.register_builtin_type) (intTI_type_node,
+					     "__builtin_neon_ti");
+  (*lang_hooks.types.register_builtin_type) (intEI_type_node,
+					     "__builtin_neon_ei");
+  (*lang_hooks.types.register_builtin_type) (intOI_type_node,
+					     "__builtin_neon_oi");
+  (*lang_hooks.types.register_builtin_type) (intCI_type_node,
+					     "__builtin_neon_ci");
+  (*lang_hooks.types.register_builtin_type) (intXI_type_node,
+					     "__builtin_neon_xi");
+
+  if (TARGET_CRYPTO && TARGET_HARD_FLOAT)
+  {
+
+    tree V16UQI_type_node =
+      build_vector_type_for_mode (intUQI_type_node, V16QImode);
+
+    tree v16uqi_ftype_v16uqi
+      = build_function_type_list (V16UQI_type_node, V16UQI_type_node, NULL_TREE);
+
+    tree v16uqi_ftype_v16uqi_v16uqi
+      = build_function_type_list (V16UQI_type_node, V16UQI_type_node,
+                                  V16UQI_type_node, NULL_TREE);
+
+    tree v4usi_ftype_v4usi
+      = build_function_type_list (V4USI_type_node, V4USI_type_node, NULL_TREE);
+
+    tree v4usi_ftype_v4usi_v4usi
+      = build_function_type_list (V4USI_type_node, V4USI_type_node,
+                                  V4USI_type_node, NULL_TREE);
+
+    tree v4usi_ftype_v4usi_v4usi_v4usi
+      = build_function_type_list (V4USI_type_node, V4USI_type_node,
+                                  V4USI_type_node, V4USI_type_node, NULL_TREE);
+
+    tree uti_ftype_udi_udi
+      = build_function_type_list (neon_intUTI_type_node, intUDI_type_node,
+                                  intUDI_type_node, NULL_TREE);
+
+    #undef CRYPTO1
+    #undef CRYPTO2
+    #undef CRYPTO3
+    #undef C
+    #undef N
+    #undef CF
+    #undef FT1
+    #undef FT2
+    #undef FT3
+
+    #define C(U) \
+      ARM_BUILTIN_CRYPTO_##U
+    #define N(L) \
+      "__builtin_arm_crypto_"#L
+    #define FT1(R, A) \
+      R##_ftype_##A
+    #define FT2(R, A1, A2) \
+      R##_ftype_##A1##_##A2
+    #define FT3(R, A1, A2, A3) \
+      R##_ftype_##A1##_##A2##_##A3
+    #define CRYPTO1(L, U, R, A) \
+      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT1 (R, A), \
+                                                       C (U), BUILT_IN_MD, \
+                                                       NULL, NULL_TREE);
+    #define CRYPTO2(L, U, R, A1, A2) \
+      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT2 (R, A1, A2), \
+                                                       C (U), BUILT_IN_MD, \
+                                                       NULL, NULL_TREE);
+
+    #define CRYPTO3(L, U, R, A1, A2, A3) \
+      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT3 (R, A1, A2, A3), \
+                                                       C (U), BUILT_IN_MD, \
+                                                       NULL, NULL_TREE);
+    #include "crypto.def"
+
+    #undef CRYPTO1
+    #undef CRYPTO2
+    #undef CRYPTO3
+    #undef C
+    #undef N
+    #undef FT1
+    #undef FT2
+    #undef FT3
+  }
+  dreg_types[0] = V8QI_type_node;
+  dreg_types[1] = V4HI_type_node;
+  dreg_types[2] = V2SI_type_node;
+  dreg_types[3] = V2SF_type_node;
+  dreg_types[4] = neon_intDI_type_node;
+
+  qreg_types[0] = V16QI_type_node;
+  qreg_types[1] = V8HI_type_node;
+  qreg_types[2] = V4SI_type_node;
+  qreg_types[3] = V4SF_type_node;
+  qreg_types[4] = V2DI_type_node;
+  qreg_types[5] = neon_intUTI_type_node;
+
+  for (i = 0; i < NUM_QREG_TYPES; i++)
+    {
+      int j;
+      for (j = 0; j < NUM_QREG_TYPES; j++)
+        {
+          if (i < NUM_DREG_TYPES && j < NUM_DREG_TYPES)
+            reinterp_ftype_dreg[i][j]
+              = build_function_type_list (dreg_types[i], dreg_types[j], NULL);
+
+          reinterp_ftype_qreg[i][j]
+            = build_function_type_list (qreg_types[i], qreg_types[j], NULL);
+        }
+    }
+
+  for (i = 0, fcode = ARM_BUILTIN_NEON_BASE;
+       i < ARRAY_SIZE (neon_builtin_data);
+       i++, fcode++)
+    {
+      neon_builtin_datum *d = &neon_builtin_data[i];
+
+      const char* const modenames[] = {
+	"v8qi", "v4hi", "v4hf", "v2si", "v2sf", "di",
+	"v16qi", "v8hi", "v4si", "v4sf", "v2di",
+	"ti", "ei", "oi"
+      };
+      char namebuf[60];
+      tree ftype = NULL;
+      int is_load = 0, is_store = 0;
+
+      gcc_assert (ARRAY_SIZE (modenames) == T_MAX);
+
+      d->fcode = fcode;
+
+      switch (d->itype)
+	{
+	case NEON_LOAD1:
+	case NEON_LOAD1LANE:
+	case NEON_LOADSTRUCT:
+	case NEON_LOADSTRUCTLANE:
+	  is_load = 1;
+	  /* Fall through.  */
+	case NEON_STORE1:
+	case NEON_STORE1LANE:
+	case NEON_STORESTRUCT:
+	case NEON_STORESTRUCTLANE:
+	  if (!is_load)
+	    is_store = 1;
+	  /* Fall through.  */
+	case NEON_UNOP:
+	case NEON_RINT:
+	case NEON_BINOP:
+	case NEON_LOGICBINOP:
+	case NEON_SHIFTINSERT:
+	case NEON_TERNOP:
+	case NEON_GETLANE:
+	case NEON_SETLANE:
+	case NEON_CREATE:
+	case NEON_DUP:
+	case NEON_DUPLANE:
+	case NEON_SHIFTIMM:
+	case NEON_SHIFTACC:
+	case NEON_COMBINE:
+	case NEON_SPLIT:
+	case NEON_CONVERT:
+	case NEON_FIXCONV:
+	case NEON_LANEMUL:
+	case NEON_LANEMULL:
+	case NEON_LANEMULH:
+	case NEON_LANEMAC:
+	case NEON_SCALARMUL:
+	case NEON_SCALARMULL:
+	case NEON_SCALARMULH:
+	case NEON_SCALARMAC:
+	case NEON_SELECT:
+	case NEON_VTBL:
+	case NEON_VTBX:
+	  {
+	    int k;
+	    tree return_type = void_type_node, args = void_list_node;
+
+	    /* Build a function type directly from the insn_data for
+	       this builtin.  The build_function_type() function takes
+	       care of removing duplicates for us.  */
+	    for (k = insn_data[d->code].n_generator_args - 1; k >= 0; k--)
+	      {
+		tree eltype;
+
+		if (is_load && k == 1)
+		  {
+		    /* Neon load patterns always have the memory
+		       operand in the operand 1 position.  */
+		    gcc_assert (insn_data[d->code].operand[k].predicate
+				== neon_struct_operand);
+
+		    switch (d->mode)
+		      {
+		      case T_V8QI:
+		      case T_V16QI:
+			eltype = const_intQI_pointer_node;
+			break;
+
+		      case T_V4HI:
+		      case T_V8HI:
+			eltype = const_intHI_pointer_node;
+			break;
+
+		      case T_V2SI:
+		      case T_V4SI:
+			eltype = const_intSI_pointer_node;
+			break;
+
+		      case T_V2SF:
+		      case T_V4SF:
+			eltype = const_float_pointer_node;
+			break;
+
+		      case T_DI:
+		      case T_V2DI:
+			eltype = const_intDI_pointer_node;
+			break;
+
+		      default: gcc_unreachable ();
+		      }
+		  }
+		else if (is_store && k == 0)
+		  {
+		    /* Similarly, Neon store patterns use operand 0 as
+		       the memory location to store to.  */
+		    gcc_assert (insn_data[d->code].operand[k].predicate
+				== neon_struct_operand);
+
+		    switch (d->mode)
+		      {
+		      case T_V8QI:
+		      case T_V16QI:
+			eltype = intQI_pointer_node;
+			break;
+
+		      case T_V4HI:
+		      case T_V8HI:
+			eltype = intHI_pointer_node;
+			break;
+
+		      case T_V2SI:
+		      case T_V4SI:
+			eltype = intSI_pointer_node;
+			break;
+
+		      case T_V2SF:
+		      case T_V4SF:
+			eltype = float_pointer_node;
+			break;
+
+		      case T_DI:
+		      case T_V2DI:
+			eltype = intDI_pointer_node;
+			break;
+
+		      default: gcc_unreachable ();
+		      }
+		  }
+		else
+		  {
+		    switch (insn_data[d->code].operand[k].mode)
+		      {
+		      case VOIDmode: eltype = void_type_node; break;
+			/* Scalars.  */
+		      case QImode: eltype = neon_intQI_type_node; break;
+		      case HImode: eltype = neon_intHI_type_node; break;
+		      case SImode: eltype = neon_intSI_type_node; break;
+		      case SFmode: eltype = neon_float_type_node; break;
+		      case DImode: eltype = neon_intDI_type_node; break;
+		      case TImode: eltype = intTI_type_node; break;
+		      case EImode: eltype = intEI_type_node; break;
+		      case OImode: eltype = intOI_type_node; break;
+		      case CImode: eltype = intCI_type_node; break;
+		      case XImode: eltype = intXI_type_node; break;
+			/* 64-bit vectors.  */
+		      case V8QImode: eltype = V8QI_type_node; break;
+		      case V4HImode: eltype = V4HI_type_node; break;
+		      case V2SImode: eltype = V2SI_type_node; break;
+		      case V2SFmode: eltype = V2SF_type_node; break;
+			/* 128-bit vectors.  */
+		      case V16QImode: eltype = V16QI_type_node; break;
+		      case V8HImode: eltype = V8HI_type_node; break;
+		      case V4SImode: eltype = V4SI_type_node; break;
+		      case V4SFmode: eltype = V4SF_type_node; break;
+		      case V2DImode: eltype = V2DI_type_node; break;
+		      default: gcc_unreachable ();
+		      }
+		  }
+
+		if (k == 0 && !is_store)
+		  return_type = eltype;
+		else
+		  args = tree_cons (NULL_TREE, eltype, args);
+	      }
+
+	    ftype = build_function_type (return_type, args);
+	  }
+	  break;
+
+	case NEON_REINTERP:
+	  {
+	    /* We iterate over NUM_DREG_TYPES doubleword types,
+	       then NUM_QREG_TYPES quadword types.
+	       V4HF is not a type used in reinterpret, so we translate
+	       d->mode to the correct index in reinterp_ftype_dreg.  */
+	    bool qreg_p
+	      = GET_MODE_SIZE (insn_data[d->code].operand[0].mode) > 8;
+	    int rhs = (d->mode - ((!qreg_p && (d->mode > T_V4HF)) ? 1 : 0))
+	              % NUM_QREG_TYPES;
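+	    /* Illustrative, assuming NUM_QREG_TYPES == 6 and the T_* values
+	       follow the modenames order above: T_DI (5) is a doubleword
+	       mode above T_V4HF, so rhs == (5 - 1) % 6 == 4; T_V16QI (6)
+	       gives rhs == 6 % 6 == 0 in the quadword table.  */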
+	    switch (insn_data[d->code].operand[0].mode)
+	      {
+	      case V8QImode: ftype = reinterp_ftype_dreg[0][rhs]; break;
+	      case V4HImode: ftype = reinterp_ftype_dreg[1][rhs]; break;
+	      case V2SImode: ftype = reinterp_ftype_dreg[2][rhs]; break;
+	      case V2SFmode: ftype = reinterp_ftype_dreg[3][rhs]; break;
+	      case DImode: ftype = reinterp_ftype_dreg[4][rhs]; break;
+	      case V16QImode: ftype = reinterp_ftype_qreg[0][rhs]; break;
+	      case V8HImode: ftype = reinterp_ftype_qreg[1][rhs]; break;
+	      case V4SImode: ftype = reinterp_ftype_qreg[2][rhs]; break;
+	      case V4SFmode: ftype = reinterp_ftype_qreg[3][rhs]; break;
+	      case V2DImode: ftype = reinterp_ftype_qreg[4][rhs]; break;
+	      case TImode: ftype = reinterp_ftype_qreg[5][rhs]; break;
+	      default: gcc_unreachable ();
+	      }
+	  }
+	  break;
+	case NEON_FLOAT_WIDEN:
+	  {
+	    tree eltype = NULL_TREE;
+	    tree return_type = NULL_TREE;
+
+	    switch (insn_data[d->code].operand[1].mode)
+	    {
+	      case V4HFmode:
+	        eltype = V4HF_type_node;
+	        return_type = V4SF_type_node;
+	        break;
+	      default: gcc_unreachable ();
+	    }
+	    ftype = build_function_type_list (return_type, eltype, NULL);
+	    break;
+	  }
+	case NEON_FLOAT_NARROW:
+	  {
+	    tree eltype = NULL_TREE;
+	    tree return_type = NULL_TREE;
+
+	    switch (insn_data[d->code].operand[1].mode)
+	    {
+	      case V4SFmode:
+	        eltype = V4SF_type_node;
+	        return_type = V4HF_type_node;
+	        break;
+	      default: gcc_unreachable ();
+	    }
+	    ftype = build_function_type_list (return_type, eltype, NULL);
+	    break;
+	  }
+	case NEON_BSWAP:
+	  {
+	    tree eltype = NULL_TREE;
+	    switch (insn_data[d->code].operand[1].mode)
+	    {
+	      case V4HImode:
+	        eltype = V4UHI_type_node;
+	        break;
+	      case V8HImode:
+	        eltype = V8UHI_type_node;
+	        break;
+	      case V2SImode:
+	        eltype = V2USI_type_node;
+	        break;
+	      case V4SImode:
+	        eltype = V4USI_type_node;
+	        break;
+	      case V2DImode:
+	        eltype = V2UDI_type_node;
+	        break;
+	      default: gcc_unreachable ();
+	    }
+	    ftype = build_function_type_list (eltype, eltype, NULL);
+	    break;
+	  }
+	case NEON_COPYSIGNF:
+	  {
+	    tree eltype = NULL_TREE;
+	    switch (insn_data[d->code].operand[1].mode)
+	      {
+	      case V2SFmode:
+		eltype = V2SF_type_node;
+		break;
+	      case V4SFmode:
+		eltype = V4SF_type_node;
+		break;
+	      default: gcc_unreachable ();
+	      }
+	    ftype = build_function_type_list (eltype, eltype, NULL);
+	    break;
+	  }
+	default:
+	  gcc_unreachable ();
+	}
+
+      gcc_assert (ftype != NULL);
+
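+      /* E.g. an entry named "vadd" with mode T_V8QI would be registered
+         as "__builtin_neon_vaddv8qi" (an illustrative name; the real set
+         comes from neon_builtin_data).  */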
+      sprintf (namebuf, "__builtin_neon_%s%s", d->name, modenames[d->mode]);
+
+      decl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD, NULL,
+				   NULL_TREE);
+      arm_builtin_decls[fcode] = decl;
+    }
+}
+
+#undef NUM_DREG_TYPES
+#undef NUM_QREG_TYPES
+
+#define def_mbuiltin(MASK, NAME, TYPE, CODE)				\
+  do									\
+    {									\
+      if ((MASK) & insn_flags)						\
+	{								\
+	  tree bdecl;							\
+	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
+					BUILT_IN_MD, NULL, NULL_TREE);	\
+	  arm_builtin_decls[CODE] = bdecl;				\
+	}								\
+    }									\
+  while (0)
+
+struct builtin_description
+{
+  const unsigned int       mask;
+  const enum insn_code     icode;
+  const char * const       name;
+  const enum arm_builtins  code;
+  const enum rtx_code      comparison;
+  const unsigned int       flag;
+};
+
+static const struct builtin_description bdesc_2arg[] =
+{
+#define IWMMXT_BUILTIN(code, string, builtin) \
+  { FL_IWMMXT, CODE_FOR_##code, "__builtin_arm_" string, \
+    ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+
+#define IWMMXT2_BUILTIN(code, string, builtin) \
+  { FL_IWMMXT2, CODE_FOR_##code, "__builtin_arm_" string, \
+    ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+
+  IWMMXT_BUILTIN (addv8qi3, "waddb", WADDB)
+  IWMMXT_BUILTIN (addv4hi3, "waddh", WADDH)
+  IWMMXT_BUILTIN (addv2si3, "waddw", WADDW)
+  IWMMXT_BUILTIN (subv8qi3, "wsubb", WSUBB)
+  IWMMXT_BUILTIN (subv4hi3, "wsubh", WSUBH)
+  IWMMXT_BUILTIN (subv2si3, "wsubw", WSUBW)
+  IWMMXT_BUILTIN (ssaddv8qi3, "waddbss", WADDSSB)
+  IWMMXT_BUILTIN (ssaddv4hi3, "waddhss", WADDSSH)
+  IWMMXT_BUILTIN (ssaddv2si3, "waddwss", WADDSSW)
+  IWMMXT_BUILTIN (sssubv8qi3, "wsubbss", WSUBSSB)
+  IWMMXT_BUILTIN (sssubv4hi3, "wsubhss", WSUBSSH)
+  IWMMXT_BUILTIN (sssubv2si3, "wsubwss", WSUBSSW)
+  IWMMXT_BUILTIN (usaddv8qi3, "waddbus", WADDUSB)
+  IWMMXT_BUILTIN (usaddv4hi3, "waddhus", WADDUSH)
+  IWMMXT_BUILTIN (usaddv2si3, "waddwus", WADDUSW)
+  IWMMXT_BUILTIN (ussubv8qi3, "wsubbus", WSUBUSB)
+  IWMMXT_BUILTIN (ussubv4hi3, "wsubhus", WSUBUSH)
+  IWMMXT_BUILTIN (ussubv2si3, "wsubwus", WSUBUSW)
+  IWMMXT_BUILTIN (mulv4hi3, "wmulul", WMULUL)
+  IWMMXT_BUILTIN (smulv4hi3_highpart, "wmulsm", WMULSM)
+  IWMMXT_BUILTIN (umulv4hi3_highpart, "wmulum", WMULUM)
+  IWMMXT_BUILTIN (eqv8qi3, "wcmpeqb", WCMPEQB)
+  IWMMXT_BUILTIN (eqv4hi3, "wcmpeqh", WCMPEQH)
+  IWMMXT_BUILTIN (eqv2si3, "wcmpeqw", WCMPEQW)
+  IWMMXT_BUILTIN (gtuv8qi3, "wcmpgtub", WCMPGTUB)
+  IWMMXT_BUILTIN (gtuv4hi3, "wcmpgtuh", WCMPGTUH)
+  IWMMXT_BUILTIN (gtuv2si3, "wcmpgtuw", WCMPGTUW)
+  IWMMXT_BUILTIN (gtv8qi3, "wcmpgtsb", WCMPGTSB)
+  IWMMXT_BUILTIN (gtv4hi3, "wcmpgtsh", WCMPGTSH)
+  IWMMXT_BUILTIN (gtv2si3, "wcmpgtsw", WCMPGTSW)
+  IWMMXT_BUILTIN (umaxv8qi3, "wmaxub", WMAXUB)
+  IWMMXT_BUILTIN (smaxv8qi3, "wmaxsb", WMAXSB)
+  IWMMXT_BUILTIN (umaxv4hi3, "wmaxuh", WMAXUH)
+  IWMMXT_BUILTIN (smaxv4hi3, "wmaxsh", WMAXSH)
+  IWMMXT_BUILTIN (umaxv2si3, "wmaxuw", WMAXUW)
+  IWMMXT_BUILTIN (smaxv2si3, "wmaxsw", WMAXSW)
+  IWMMXT_BUILTIN (uminv8qi3, "wminub", WMINUB)
+  IWMMXT_BUILTIN (sminv8qi3, "wminsb", WMINSB)
+  IWMMXT_BUILTIN (uminv4hi3, "wminuh", WMINUH)
+  IWMMXT_BUILTIN (sminv4hi3, "wminsh", WMINSH)
+  IWMMXT_BUILTIN (uminv2si3, "wminuw", WMINUW)
+  IWMMXT_BUILTIN (sminv2si3, "wminsw", WMINSW)
+  IWMMXT_BUILTIN (iwmmxt_anddi3, "wand", WAND)
+  IWMMXT_BUILTIN (iwmmxt_nanddi3, "wandn", WANDN)
+  IWMMXT_BUILTIN (iwmmxt_iordi3, "wor", WOR)
+  IWMMXT_BUILTIN (iwmmxt_xordi3, "wxor", WXOR)
+  IWMMXT_BUILTIN (iwmmxt_uavgv8qi3, "wavg2b", WAVG2B)
+  IWMMXT_BUILTIN (iwmmxt_uavgv4hi3, "wavg2h", WAVG2H)
+  IWMMXT_BUILTIN (iwmmxt_uavgrndv8qi3, "wavg2br", WAVG2BR)
+  IWMMXT_BUILTIN (iwmmxt_uavgrndv4hi3, "wavg2hr", WAVG2HR)
+  IWMMXT_BUILTIN (iwmmxt_wunpckilb, "wunpckilb", WUNPCKILB)
+  IWMMXT_BUILTIN (iwmmxt_wunpckilh, "wunpckilh", WUNPCKILH)
+  IWMMXT_BUILTIN (iwmmxt_wunpckilw, "wunpckilw", WUNPCKILW)
+  IWMMXT_BUILTIN (iwmmxt_wunpckihb, "wunpckihb", WUNPCKIHB)
+  IWMMXT_BUILTIN (iwmmxt_wunpckihh, "wunpckihh", WUNPCKIHH)
+  IWMMXT_BUILTIN (iwmmxt_wunpckihw, "wunpckihw", WUNPCKIHW)
+  IWMMXT2_BUILTIN (iwmmxt_waddsubhx, "waddsubhx", WADDSUBHX)
+  IWMMXT2_BUILTIN (iwmmxt_wsubaddhx, "wsubaddhx", WSUBADDHX)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffb, "wabsdiffb", WABSDIFFB)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffh, "wabsdiffh", WABSDIFFH)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffw, "wabsdiffw", WABSDIFFW)
+  IWMMXT2_BUILTIN (iwmmxt_avg4, "wavg4", WAVG4)
+  IWMMXT2_BUILTIN (iwmmxt_avg4r, "wavg4r", WAVG4R)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwsm, "wmulwsm", WMULWSM)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwum, "wmulwum", WMULWUM)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwsmr, "wmulwsmr", WMULWSMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwumr, "wmulwumr", WMULWUMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwl, "wmulwl", WMULWL)
+  IWMMXT2_BUILTIN (iwmmxt_wmulsmr, "wmulsmr", WMULSMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulumr, "wmulumr", WMULUMR)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulm, "wqmulm", WQMULM)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulmr, "wqmulmr", WQMULMR)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulwm, "wqmulwm", WQMULWM)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulwmr, "wqmulwmr", WQMULWMR)
+  IWMMXT_BUILTIN (iwmmxt_walignr0, "walignr0", WALIGNR0)
+  IWMMXT_BUILTIN (iwmmxt_walignr1, "walignr1", WALIGNR1)
+  IWMMXT_BUILTIN (iwmmxt_walignr2, "walignr2", WALIGNR2)
+  IWMMXT_BUILTIN (iwmmxt_walignr3, "walignr3", WALIGNR3)
+
+#define IWMMXT_BUILTIN2(code, builtin) \
+  { FL_IWMMXT, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+
+#define IWMMXT2_BUILTIN2(code, builtin) \
+  { FL_IWMMXT2, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+
+  IWMMXT2_BUILTIN2 (iwmmxt_waddbhusm, WADDBHUSM)
+  IWMMXT2_BUILTIN2 (iwmmxt_waddbhusl, WADDBHUSL)
+  IWMMXT_BUILTIN2 (iwmmxt_wpackhss, WPACKHSS)
+  IWMMXT_BUILTIN2 (iwmmxt_wpackwss, WPACKWSS)
+  IWMMXT_BUILTIN2 (iwmmxt_wpackdss, WPACKDSS)
+  IWMMXT_BUILTIN2 (iwmmxt_wpackhus, WPACKHUS)
+  IWMMXT_BUILTIN2 (iwmmxt_wpackwus, WPACKWUS)
+  IWMMXT_BUILTIN2 (iwmmxt_wpackdus, WPACKDUS)
+  IWMMXT_BUILTIN2 (iwmmxt_wmacuz, WMACUZ)
+  IWMMXT_BUILTIN2 (iwmmxt_wmacsz, WMACSZ)
+
+#define FP_BUILTIN(L, U) \
+  {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
+   UNKNOWN, 0},
+
+  FP_BUILTIN (get_fpscr, GET_FPSCR)
+  FP_BUILTIN (set_fpscr, SET_FPSCR)
+#undef FP_BUILTIN
+
+#define CRC32_BUILTIN(L, U) \
+  {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
+   UNKNOWN, 0},
+   CRC32_BUILTIN (crc32b, CRC32B)
+   CRC32_BUILTIN (crc32h, CRC32H)
+   CRC32_BUILTIN (crc32w, CRC32W)
+   CRC32_BUILTIN (crc32cb, CRC32CB)
+   CRC32_BUILTIN (crc32ch, CRC32CH)
+   CRC32_BUILTIN (crc32cw, CRC32CW)
+#undef CRC32_BUILTIN
+
+#define CRYPTO_BUILTIN(L, U) \
+  {0, CODE_FOR_crypto_##L, "__builtin_arm_crypto_"#L, ARM_BUILTIN_CRYPTO_##U, \
+   UNKNOWN, 0},
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+#define CRYPTO2(L, U, R, A1, A2) CRYPTO_BUILTIN (L, U)
+#define CRYPTO1(L, U, R, A)
+#define CRYPTO3(L, U, R, A1, A2, A3)
+#include "crypto.def"
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+
+};
+
+static const struct builtin_description bdesc_1arg[] =
+{
+  IWMMXT_BUILTIN (iwmmxt_tmovmskb, "tmovmskb", TMOVMSKB)
+  IWMMXT_BUILTIN (iwmmxt_tmovmskh, "tmovmskh", TMOVMSKH)
+  IWMMXT_BUILTIN (iwmmxt_tmovmskw, "tmovmskw", TMOVMSKW)
+  IWMMXT_BUILTIN (iwmmxt_waccb, "waccb", WACCB)
+  IWMMXT_BUILTIN (iwmmxt_wacch, "wacch", WACCH)
+  IWMMXT_BUILTIN (iwmmxt_waccw, "waccw", WACCW)
+  IWMMXT_BUILTIN (iwmmxt_wunpckehub, "wunpckehub", WUNPCKEHUB)
+  IWMMXT_BUILTIN (iwmmxt_wunpckehuh, "wunpckehuh", WUNPCKEHUH)
+  IWMMXT_BUILTIN (iwmmxt_wunpckehuw, "wunpckehuw", WUNPCKEHUW)
+  IWMMXT_BUILTIN (iwmmxt_wunpckehsb, "wunpckehsb", WUNPCKEHSB)
+  IWMMXT_BUILTIN (iwmmxt_wunpckehsh, "wunpckehsh", WUNPCKEHSH)
+  IWMMXT_BUILTIN (iwmmxt_wunpckehsw, "wunpckehsw", WUNPCKEHSW)
+  IWMMXT_BUILTIN (iwmmxt_wunpckelub, "wunpckelub", WUNPCKELUB)
+  IWMMXT_BUILTIN (iwmmxt_wunpckeluh, "wunpckeluh", WUNPCKELUH)
+  IWMMXT_BUILTIN (iwmmxt_wunpckeluw, "wunpckeluw", WUNPCKELUW)
+  IWMMXT_BUILTIN (iwmmxt_wunpckelsb, "wunpckelsb", WUNPCKELSB)
+  IWMMXT_BUILTIN (iwmmxt_wunpckelsh, "wunpckelsh", WUNPCKELSH)
+  IWMMXT_BUILTIN (iwmmxt_wunpckelsw, "wunpckelsw", WUNPCKELSW)
+  IWMMXT2_BUILTIN (iwmmxt_wabsv8qi3, "wabsb", WABSB)
+  IWMMXT2_BUILTIN (iwmmxt_wabsv4hi3, "wabsh", WABSH)
+  IWMMXT2_BUILTIN (iwmmxt_wabsv2si3, "wabsw", WABSW)
+  IWMMXT_BUILTIN (tbcstv8qi, "tbcstb", TBCSTB)
+  IWMMXT_BUILTIN (tbcstv4hi, "tbcsth", TBCSTH)
+  IWMMXT_BUILTIN (tbcstv2si, "tbcstw", TBCSTW)
+
+#define CRYPTO1(L, U, R, A) CRYPTO_BUILTIN (L, U)
+#define CRYPTO2(L, U, R, A1, A2)
+#define CRYPTO3(L, U, R, A1, A2, A3)
+#include "crypto.def"
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+};
+
+static const struct builtin_description bdesc_3arg[] =
+{
+#define CRYPTO3(L, U, R, A1, A2, A3) CRYPTO_BUILTIN (L, U)
+#define CRYPTO1(L, U, R, A)
+#define CRYPTO2(L, U, R, A1, A2)
+#include "crypto.def"
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+};
+#undef CRYPTO_BUILTIN
+
+/* Set up all the iWMMXt builtins.  This is not called if
+   TARGET_IWMMXT is zero.  */
+
+static void
+arm_init_iwmmxt_builtins (void)
+{
+  const struct builtin_description * d;
+  size_t i;
+
+  tree V2SI_type_node = build_vector_type_for_mode (intSI_type_node, V2SImode);
+  tree V4HI_type_node = build_vector_type_for_mode (intHI_type_node, V4HImode);
+  tree V8QI_type_node = build_vector_type_for_mode (intQI_type_node, V8QImode);
+
+  tree v8qi_ftype_v8qi_v8qi_int
+    = build_function_type_list (V8QI_type_node,
+				V8QI_type_node, V8QI_type_node,
+				integer_type_node, NULL_TREE);
+  tree v4hi_ftype_v4hi_int
+    = build_function_type_list (V4HI_type_node,
+				V4HI_type_node, integer_type_node, NULL_TREE);
+  tree v2si_ftype_v2si_int
+    = build_function_type_list (V2SI_type_node,
+				V2SI_type_node, integer_type_node, NULL_TREE);
+  tree v2si_ftype_di_di
+    = build_function_type_list (V2SI_type_node,
+				long_long_integer_type_node,
+				long_long_integer_type_node,
+				NULL_TREE);
+  tree di_ftype_di_int
+    = build_function_type_list (long_long_integer_type_node,
+				long_long_integer_type_node,
+				integer_type_node, NULL_TREE);
+  tree di_ftype_di_int_int
+    = build_function_type_list (long_long_integer_type_node,
+				long_long_integer_type_node,
+				integer_type_node,
+				integer_type_node, NULL_TREE);
+  tree int_ftype_v8qi
+    = build_function_type_list (integer_type_node,
+				V8QI_type_node, NULL_TREE);
+  tree int_ftype_v4hi
+    = build_function_type_list (integer_type_node,
+				V4HI_type_node, NULL_TREE);
+  tree int_ftype_v2si
+    = build_function_type_list (integer_type_node,
+				V2SI_type_node, NULL_TREE);
+  tree int_ftype_v8qi_int
+    = build_function_type_list (integer_type_node,
+				V8QI_type_node, integer_type_node, NULL_TREE);
+  tree int_ftype_v4hi_int
+    = build_function_type_list (integer_type_node,
+				V4HI_type_node, integer_type_node, NULL_TREE);
+  tree int_ftype_v2si_int
+    = build_function_type_list (integer_type_node,
+				V2SI_type_node, integer_type_node, NULL_TREE);
+  tree v8qi_ftype_v8qi_int_int
+    = build_function_type_list (V8QI_type_node,
+				V8QI_type_node, integer_type_node,
+				integer_type_node, NULL_TREE);
+  tree v4hi_ftype_v4hi_int_int
+    = build_function_type_list (V4HI_type_node,
+				V4HI_type_node, integer_type_node,
+				integer_type_node, NULL_TREE);
+  tree v2si_ftype_v2si_int_int
+    = build_function_type_list (V2SI_type_node,
+				V2SI_type_node, integer_type_node,
+				integer_type_node, NULL_TREE);
+  /* Miscellaneous.  */
+  tree v8qi_ftype_v4hi_v4hi
+    = build_function_type_list (V8QI_type_node,
+				V4HI_type_node, V4HI_type_node, NULL_TREE);
+  tree v4hi_ftype_v2si_v2si
+    = build_function_type_list (V4HI_type_node,
+				V2SI_type_node, V2SI_type_node, NULL_TREE);
+  tree v8qi_ftype_v4hi_v8qi
+    = build_function_type_list (V8QI_type_node,
+	                        V4HI_type_node, V8QI_type_node, NULL_TREE);
+  tree v2si_ftype_v4hi_v4hi
+    = build_function_type_list (V2SI_type_node,
+				V4HI_type_node, V4HI_type_node, NULL_TREE);
+  tree v2si_ftype_v8qi_v8qi
+    = build_function_type_list (V2SI_type_node,
+				V8QI_type_node, V8QI_type_node, NULL_TREE);
+  tree v4hi_ftype_v4hi_di
+    = build_function_type_list (V4HI_type_node,
+				V4HI_type_node, long_long_integer_type_node,
+				NULL_TREE);
+  tree v2si_ftype_v2si_di
+    = build_function_type_list (V2SI_type_node,
+				V2SI_type_node, long_long_integer_type_node,
+				NULL_TREE);
+  tree di_ftype_void
+    = build_function_type_list (long_long_unsigned_type_node, NULL_TREE);
+  tree int_ftype_void
+    = build_function_type_list (integer_type_node, NULL_TREE);
+  tree di_ftype_v8qi
+    = build_function_type_list (long_long_integer_type_node,
+				V8QI_type_node, NULL_TREE);
+  tree di_ftype_v4hi
+    = build_function_type_list (long_long_integer_type_node,
+				V4HI_type_node, NULL_TREE);
+  tree di_ftype_v2si
+    = build_function_type_list (long_long_integer_type_node,
+				V2SI_type_node, NULL_TREE);
+  tree v2si_ftype_v4hi
+    = build_function_type_list (V2SI_type_node,
+				V4HI_type_node, NULL_TREE);
+  tree v4hi_ftype_v8qi
+    = build_function_type_list (V4HI_type_node,
+				V8QI_type_node, NULL_TREE);
+  tree v8qi_ftype_v8qi
+    = build_function_type_list (V8QI_type_node,
+	                        V8QI_type_node, NULL_TREE);
+  tree v4hi_ftype_v4hi
+    = build_function_type_list (V4HI_type_node,
+	                        V4HI_type_node, NULL_TREE);
+  tree v2si_ftype_v2si
+    = build_function_type_list (V2SI_type_node,
+	                        V2SI_type_node, NULL_TREE);
+
+  tree di_ftype_di_v4hi_v4hi
+    = build_function_type_list (long_long_unsigned_type_node,
+				long_long_unsigned_type_node,
+				V4HI_type_node, V4HI_type_node,
+				NULL_TREE);
+
+  tree di_ftype_v4hi_v4hi
+    = build_function_type_list (long_long_unsigned_type_node,
+				V4HI_type_node, V4HI_type_node,
+				NULL_TREE);
+
+  tree v2si_ftype_v2si_v4hi_v4hi
+    = build_function_type_list (V2SI_type_node,
+                                V2SI_type_node, V4HI_type_node,
+                                V4HI_type_node, NULL_TREE);
+
+  tree v2si_ftype_v2si_v8qi_v8qi
+    = build_function_type_list (V2SI_type_node,
+                                V2SI_type_node, V8QI_type_node,
+                                V8QI_type_node, NULL_TREE);
+
+  tree di_ftype_di_v2si_v2si
+     = build_function_type_list (long_long_unsigned_type_node,
+                                 long_long_unsigned_type_node,
+                                 V2SI_type_node, V2SI_type_node,
+                                 NULL_TREE);
+
+  tree di_ftype_di_di_int
+    = build_function_type_list (long_long_unsigned_type_node,
+                                long_long_unsigned_type_node,
+                                long_long_unsigned_type_node,
+                                integer_type_node, NULL_TREE);
+
+  tree void_ftype_int
+    = build_function_type_list (void_type_node,
+                                integer_type_node, NULL_TREE);
+
+  tree v8qi_ftype_char
+    = build_function_type_list (V8QI_type_node,
+                                signed_char_type_node, NULL_TREE);
+
+  tree v4hi_ftype_short
+    = build_function_type_list (V4HI_type_node,
+                                short_integer_type_node, NULL_TREE);
+
+  tree v2si_ftype_int
+    = build_function_type_list (V2SI_type_node,
+                                integer_type_node, NULL_TREE);
+
+  /* Normal vector binops.  */
+  tree v8qi_ftype_v8qi_v8qi
+    = build_function_type_list (V8QI_type_node,
+				V8QI_type_node, V8QI_type_node, NULL_TREE);
+  tree v4hi_ftype_v4hi_v4hi
+    = build_function_type_list (V4HI_type_node,
+				V4HI_type_node, V4HI_type_node, NULL_TREE);
+  tree v2si_ftype_v2si_v2si
+    = build_function_type_list (V2SI_type_node,
+				V2SI_type_node, V2SI_type_node, NULL_TREE);
+  tree di_ftype_di_di
+    = build_function_type_list (long_long_unsigned_type_node,
+				long_long_unsigned_type_node,
+				long_long_unsigned_type_node,
+				NULL_TREE);
+
+  /* Add all builtins that are more or less simple operations on two
+     operands.  */
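+  /* E.g. the "waddb" entry above (addv8qi3) has V8QImode operands, so it
+     gets the type v8qi_ftype_v8qi_v8qi.  */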
+  for (i = 0, d = bdesc_2arg; i < ARRAY_SIZE (bdesc_2arg); i++, d++)
+    {
+      /* Use one of the operands; the target can have a different mode for
+	 mask-generating compares.  */
+      machine_mode mode;
+      tree type;
+
+      if (d->name == 0 || !(d->mask == FL_IWMMXT || d->mask == FL_IWMMXT2))
+	continue;
+
+      mode = insn_data[d->icode].operand[1].mode;
+
+      switch (mode)
+	{
+	case V8QImode:
+	  type = v8qi_ftype_v8qi_v8qi;
+	  break;
+	case V4HImode:
+	  type = v4hi_ftype_v4hi_v4hi;
+	  break;
+	case V2SImode:
+	  type = v2si_ftype_v2si_v2si;
+	  break;
+	case DImode:
+	  type = di_ftype_di_di;
+	  break;
+
+	default:
+	  gcc_unreachable ();
+	}
+
+      def_mbuiltin (d->mask, d->name, type, d->code);
+    }
+
+  /* Add the remaining MMX insns with somewhat more complicated types.  */
+#define iwmmx_mbuiltin(NAME, TYPE, CODE)			\
+  def_mbuiltin (FL_IWMMXT, "__builtin_arm_" NAME, (TYPE),	\
+		ARM_BUILTIN_ ## CODE)
+
+#define iwmmx2_mbuiltin(NAME, TYPE, CODE)			\
+  def_mbuiltin (FL_IWMMXT2, "__builtin_arm_" NAME, (TYPE),	\
+		ARM_BUILTIN_ ## CODE)
+
+  iwmmx_mbuiltin ("wzero", di_ftype_void, WZERO);
+  iwmmx_mbuiltin ("setwcgr0", void_ftype_int, SETWCGR0);
+  iwmmx_mbuiltin ("setwcgr1", void_ftype_int, SETWCGR1);
+  iwmmx_mbuiltin ("setwcgr2", void_ftype_int, SETWCGR2);
+  iwmmx_mbuiltin ("setwcgr3", void_ftype_int, SETWCGR3);
+  iwmmx_mbuiltin ("getwcgr0", int_ftype_void, GETWCGR0);
+  iwmmx_mbuiltin ("getwcgr1", int_ftype_void, GETWCGR1);
+  iwmmx_mbuiltin ("getwcgr2", int_ftype_void, GETWCGR2);
+  iwmmx_mbuiltin ("getwcgr3", int_ftype_void, GETWCGR3);
+
+  iwmmx_mbuiltin ("wsllh", v4hi_ftype_v4hi_di, WSLLH);
+  iwmmx_mbuiltin ("wsllw", v2si_ftype_v2si_di, WSLLW);
+  iwmmx_mbuiltin ("wslld", di_ftype_di_di, WSLLD);
+  iwmmx_mbuiltin ("wsllhi", v4hi_ftype_v4hi_int, WSLLHI);
+  iwmmx_mbuiltin ("wsllwi", v2si_ftype_v2si_int, WSLLWI);
+  iwmmx_mbuiltin ("wslldi", di_ftype_di_int, WSLLDI);
+
+  iwmmx_mbuiltin ("wsrlh", v4hi_ftype_v4hi_di, WSRLH);
+  iwmmx_mbuiltin ("wsrlw", v2si_ftype_v2si_di, WSRLW);
+  iwmmx_mbuiltin ("wsrld", di_ftype_di_di, WSRLD);
+  iwmmx_mbuiltin ("wsrlhi", v4hi_ftype_v4hi_int, WSRLHI);
+  iwmmx_mbuiltin ("wsrlwi", v2si_ftype_v2si_int, WSRLWI);
+  iwmmx_mbuiltin ("wsrldi", di_ftype_di_int, WSRLDI);
+
+  iwmmx_mbuiltin ("wsrah", v4hi_ftype_v4hi_di, WSRAH);
+  iwmmx_mbuiltin ("wsraw", v2si_ftype_v2si_di, WSRAW);
+  iwmmx_mbuiltin ("wsrad", di_ftype_di_di, WSRAD);
+  iwmmx_mbuiltin ("wsrahi", v4hi_ftype_v4hi_int, WSRAHI);
+  iwmmx_mbuiltin ("wsrawi", v2si_ftype_v2si_int, WSRAWI);
+  iwmmx_mbuiltin ("wsradi", di_ftype_di_int, WSRADI);
+
+  iwmmx_mbuiltin ("wrorh", v4hi_ftype_v4hi_di, WRORH);
+  iwmmx_mbuiltin ("wrorw", v2si_ftype_v2si_di, WRORW);
+  iwmmx_mbuiltin ("wrord", di_ftype_di_di, WRORD);
+  iwmmx_mbuiltin ("wrorhi", v4hi_ftype_v4hi_int, WRORHI);
+  iwmmx_mbuiltin ("wrorwi", v2si_ftype_v2si_int, WRORWI);
+  iwmmx_mbuiltin ("wrordi", di_ftype_di_int, WRORDI);
+
+  iwmmx_mbuiltin ("wshufh", v4hi_ftype_v4hi_int, WSHUFH);
+
+  iwmmx_mbuiltin ("wsadb", v2si_ftype_v2si_v8qi_v8qi, WSADB);
+  iwmmx_mbuiltin ("wsadh", v2si_ftype_v2si_v4hi_v4hi, WSADH);
+  iwmmx_mbuiltin ("wmadds", v2si_ftype_v4hi_v4hi, WMADDS);
+  iwmmx2_mbuiltin ("wmaddsx", v2si_ftype_v4hi_v4hi, WMADDSX);
+  iwmmx2_mbuiltin ("wmaddsn", v2si_ftype_v4hi_v4hi, WMADDSN);
+  iwmmx_mbuiltin ("wmaddu", v2si_ftype_v4hi_v4hi, WMADDU);
+  iwmmx2_mbuiltin ("wmaddux", v2si_ftype_v4hi_v4hi, WMADDUX);
+  iwmmx2_mbuiltin ("wmaddun", v2si_ftype_v4hi_v4hi, WMADDUN);
+  iwmmx_mbuiltin ("wsadbz", v2si_ftype_v8qi_v8qi, WSADBZ);
+  iwmmx_mbuiltin ("wsadhz", v2si_ftype_v4hi_v4hi, WSADHZ);
+
+  iwmmx_mbuiltin ("textrmsb", int_ftype_v8qi_int, TEXTRMSB);
+  iwmmx_mbuiltin ("textrmsh", int_ftype_v4hi_int, TEXTRMSH);
+  iwmmx_mbuiltin ("textrmsw", int_ftype_v2si_int, TEXTRMSW);
+  iwmmx_mbuiltin ("textrmub", int_ftype_v8qi_int, TEXTRMUB);
+  iwmmx_mbuiltin ("textrmuh", int_ftype_v4hi_int, TEXTRMUH);
+  iwmmx_mbuiltin ("textrmuw", int_ftype_v2si_int, TEXTRMUW);
+  iwmmx_mbuiltin ("tinsrb", v8qi_ftype_v8qi_int_int, TINSRB);
+  iwmmx_mbuiltin ("tinsrh", v4hi_ftype_v4hi_int_int, TINSRH);
+  iwmmx_mbuiltin ("tinsrw", v2si_ftype_v2si_int_int, TINSRW);
+
+  iwmmx_mbuiltin ("waccb", di_ftype_v8qi, WACCB);
+  iwmmx_mbuiltin ("wacch", di_ftype_v4hi, WACCH);
+  iwmmx_mbuiltin ("waccw", di_ftype_v2si, WACCW);
+
+  iwmmx_mbuiltin ("tmovmskb", int_ftype_v8qi, TMOVMSKB);
+  iwmmx_mbuiltin ("tmovmskh", int_ftype_v4hi, TMOVMSKH);
+  iwmmx_mbuiltin ("tmovmskw", int_ftype_v2si, TMOVMSKW);
+
+  iwmmx2_mbuiltin ("waddbhusm", v8qi_ftype_v4hi_v8qi, WADDBHUSM);
+  iwmmx2_mbuiltin ("waddbhusl", v8qi_ftype_v4hi_v8qi, WADDBHUSL);
+
+  iwmmx_mbuiltin ("wpackhss", v8qi_ftype_v4hi_v4hi, WPACKHSS);
+  iwmmx_mbuiltin ("wpackhus", v8qi_ftype_v4hi_v4hi, WPACKHUS);
+  iwmmx_mbuiltin ("wpackwus", v4hi_ftype_v2si_v2si, WPACKWUS);
+  iwmmx_mbuiltin ("wpackwss", v4hi_ftype_v2si_v2si, WPACKWSS);
+  iwmmx_mbuiltin ("wpackdus", v2si_ftype_di_di, WPACKDUS);
+  iwmmx_mbuiltin ("wpackdss", v2si_ftype_di_di, WPACKDSS);
+
+  iwmmx_mbuiltin ("wunpckehub", v4hi_ftype_v8qi, WUNPCKEHUB);
+  iwmmx_mbuiltin ("wunpckehuh", v2si_ftype_v4hi, WUNPCKEHUH);
+  iwmmx_mbuiltin ("wunpckehuw", di_ftype_v2si, WUNPCKEHUW);
+  iwmmx_mbuiltin ("wunpckehsb", v4hi_ftype_v8qi, WUNPCKEHSB);
+  iwmmx_mbuiltin ("wunpckehsh", v2si_ftype_v4hi, WUNPCKEHSH);
+  iwmmx_mbuiltin ("wunpckehsw", di_ftype_v2si, WUNPCKEHSW);
+  iwmmx_mbuiltin ("wunpckelub", v4hi_ftype_v8qi, WUNPCKELUB);
+  iwmmx_mbuiltin ("wunpckeluh", v2si_ftype_v4hi, WUNPCKELUH);
+  iwmmx_mbuiltin ("wunpckeluw", di_ftype_v2si, WUNPCKELUW);
+  iwmmx_mbuiltin ("wunpckelsb", v4hi_ftype_v8qi, WUNPCKELSB);
+  iwmmx_mbuiltin ("wunpckelsh", v2si_ftype_v4hi, WUNPCKELSH);
+  iwmmx_mbuiltin ("wunpckelsw", di_ftype_v2si, WUNPCKELSW);
+
+  iwmmx_mbuiltin ("wmacs", di_ftype_di_v4hi_v4hi, WMACS);
+  iwmmx_mbuiltin ("wmacsz", di_ftype_v4hi_v4hi, WMACSZ);
+  iwmmx_mbuiltin ("wmacu", di_ftype_di_v4hi_v4hi, WMACU);
+  iwmmx_mbuiltin ("wmacuz", di_ftype_v4hi_v4hi, WMACUZ);
+
+  iwmmx_mbuiltin ("walign", v8qi_ftype_v8qi_v8qi_int, WALIGNI);
+  iwmmx_mbuiltin ("tmia", di_ftype_di_int_int, TMIA);
+  iwmmx_mbuiltin ("tmiaph", di_ftype_di_int_int, TMIAPH);
+  iwmmx_mbuiltin ("tmiabb", di_ftype_di_int_int, TMIABB);
+  iwmmx_mbuiltin ("tmiabt", di_ftype_di_int_int, TMIABT);
+  iwmmx_mbuiltin ("tmiatb", di_ftype_di_int_int, TMIATB);
+  iwmmx_mbuiltin ("tmiatt", di_ftype_di_int_int, TMIATT);
+
+  iwmmx2_mbuiltin ("wabsb", v8qi_ftype_v8qi, WABSB);
+  iwmmx2_mbuiltin ("wabsh", v4hi_ftype_v4hi, WABSH);
+  iwmmx2_mbuiltin ("wabsw", v2si_ftype_v2si, WABSW);
+
+  iwmmx2_mbuiltin ("wqmiabb", v2si_ftype_v2si_v4hi_v4hi, WQMIABB);
+  iwmmx2_mbuiltin ("wqmiabt", v2si_ftype_v2si_v4hi_v4hi, WQMIABT);
+  iwmmx2_mbuiltin ("wqmiatb", v2si_ftype_v2si_v4hi_v4hi, WQMIATB);
+  iwmmx2_mbuiltin ("wqmiatt", v2si_ftype_v2si_v4hi_v4hi, WQMIATT);
+
+  iwmmx2_mbuiltin ("wqmiabbn", v2si_ftype_v2si_v4hi_v4hi, WQMIABBN);
+  iwmmx2_mbuiltin ("wqmiabtn", v2si_ftype_v2si_v4hi_v4hi, WQMIABTN);
+  iwmmx2_mbuiltin ("wqmiatbn", v2si_ftype_v2si_v4hi_v4hi, WQMIATBN);
+  iwmmx2_mbuiltin ("wqmiattn", v2si_ftype_v2si_v4hi_v4hi, WQMIATTN);
+
+  iwmmx2_mbuiltin ("wmiabb", di_ftype_di_v4hi_v4hi, WMIABB);
+  iwmmx2_mbuiltin ("wmiabt", di_ftype_di_v4hi_v4hi, WMIABT);
+  iwmmx2_mbuiltin ("wmiatb", di_ftype_di_v4hi_v4hi, WMIATB);
+  iwmmx2_mbuiltin ("wmiatt", di_ftype_di_v4hi_v4hi, WMIATT);
+
+  iwmmx2_mbuiltin ("wmiabbn", di_ftype_di_v4hi_v4hi, WMIABBN);
+  iwmmx2_mbuiltin ("wmiabtn", di_ftype_di_v4hi_v4hi, WMIABTN);
+  iwmmx2_mbuiltin ("wmiatbn", di_ftype_di_v4hi_v4hi, WMIATBN);
+  iwmmx2_mbuiltin ("wmiattn", di_ftype_di_v4hi_v4hi, WMIATTN);
+
+  iwmmx2_mbuiltin ("wmiawbb", di_ftype_di_v2si_v2si, WMIAWBB);
+  iwmmx2_mbuiltin ("wmiawbt", di_ftype_di_v2si_v2si, WMIAWBT);
+  iwmmx2_mbuiltin ("wmiawtb", di_ftype_di_v2si_v2si, WMIAWTB);
+  iwmmx2_mbuiltin ("wmiawtt", di_ftype_di_v2si_v2si, WMIAWTT);
+
+  iwmmx2_mbuiltin ("wmiawbbn", di_ftype_di_v2si_v2si, WMIAWBBN);
+  iwmmx2_mbuiltin ("wmiawbtn", di_ftype_di_v2si_v2si, WMIAWBTN);
+  iwmmx2_mbuiltin ("wmiawtbn", di_ftype_di_v2si_v2si, WMIAWTBN);
+  iwmmx2_mbuiltin ("wmiawttn", di_ftype_di_v2si_v2si, WMIAWTTN);
+
+  iwmmx2_mbuiltin ("wmerge", di_ftype_di_di_int, WMERGE);
+
+  iwmmx_mbuiltin ("tbcstb", v8qi_ftype_char, TBCSTB);
+  iwmmx_mbuiltin ("tbcsth", v4hi_ftype_short, TBCSTH);
+  iwmmx_mbuiltin ("tbcstw", v2si_ftype_int, TBCSTW);
+
+#undef iwmmx_mbuiltin
+#undef iwmmx2_mbuiltin
+}
+
+static void
+arm_init_fp16_builtins (void)
+{
+  tree fp16_type = make_node (REAL_TYPE);
+  TYPE_PRECISION (fp16_type) = 16;
+  layout_type (fp16_type);
+  (*lang_hooks.types.register_builtin_type) (fp16_type, "__fp16");
+}
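+
+/* After this, user code can declare storage-only half-precision values
+   directly, e.g. "__fp16 h = 1.0f;" (a sketch; exact semantics depend on
+   the selected arm_fp16_format).  */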
+
+static void
+arm_init_crc32_builtins ()
+{
+  tree si_ftype_si_qi
+    = build_function_type_list (unsigned_intSI_type_node,
+                                unsigned_intSI_type_node,
+                                unsigned_intQI_type_node, NULL_TREE);
+  tree si_ftype_si_hi
+    = build_function_type_list (unsigned_intSI_type_node,
+                                unsigned_intSI_type_node,
+                                unsigned_intHI_type_node, NULL_TREE);
+  tree si_ftype_si_si
+    = build_function_type_list (unsigned_intSI_type_node,
+                                unsigned_intSI_type_node,
+                                unsigned_intSI_type_node, NULL_TREE);
+
+  arm_builtin_decls[ARM_BUILTIN_CRC32B]
+    = add_builtin_function ("__builtin_arm_crc32b", si_ftype_si_qi,
+                            ARM_BUILTIN_CRC32B, BUILT_IN_MD, NULL, NULL_TREE);
+  arm_builtin_decls[ARM_BUILTIN_CRC32H]
+    = add_builtin_function ("__builtin_arm_crc32h", si_ftype_si_hi,
+                            ARM_BUILTIN_CRC32H, BUILT_IN_MD, NULL, NULL_TREE);
+  arm_builtin_decls[ARM_BUILTIN_CRC32W]
+    = add_builtin_function ("__builtin_arm_crc32w", si_ftype_si_si,
+                            ARM_BUILTIN_CRC32W, BUILT_IN_MD, NULL, NULL_TREE);
+  arm_builtin_decls[ARM_BUILTIN_CRC32CB]
+    = add_builtin_function ("__builtin_arm_crc32cb", si_ftype_si_qi,
+                            ARM_BUILTIN_CRC32CB, BUILT_IN_MD, NULL, NULL_TREE);
+  arm_builtin_decls[ARM_BUILTIN_CRC32CH]
+    = add_builtin_function ("__builtin_arm_crc32ch", si_ftype_si_hi,
+                            ARM_BUILTIN_CRC32CH, BUILT_IN_MD, NULL, NULL_TREE);
+  arm_builtin_decls[ARM_BUILTIN_CRC32CW]
+    = add_builtin_function ("__builtin_arm_crc32cw", si_ftype_si_si,
+                            ARM_BUILTIN_CRC32CW, BUILT_IN_MD, NULL, NULL_TREE);
+}
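+
+/* Usage sketch for the CRC32 builtins above (not part of the patch):
+
+     unsigned int crc = __builtin_arm_crc32b (crc_in, byte);
+
+   folds one more byte into a running CRC; the halfword and word variants
+   follow the si_ftype_si_* types accordingly.  */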
+
+void
+arm_init_builtins (void)
+{
+  if (TARGET_REALLY_IWMMXT)
+    arm_init_iwmmxt_builtins ();
+
+  if (TARGET_NEON)
+    arm_init_neon_builtins ();
+
+  if (arm_fp16_format)
+    arm_init_fp16_builtins ();
+
+  if (TARGET_CRC32)
+    arm_init_crc32_builtins ();
+
+  if (TARGET_VFP && TARGET_HARD_FLOAT)
+    {
+      tree ftype_set_fpscr
+	= build_function_type_list (void_type_node, unsigned_type_node, NULL);
+      tree ftype_get_fpscr
+	= build_function_type_list (unsigned_type_node, NULL);
+
+      arm_builtin_decls[ARM_BUILTIN_GET_FPSCR]
+	= add_builtin_function ("__builtin_arm_ldfscr", ftype_get_fpscr,
+				ARM_BUILTIN_GET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
+      arm_builtin_decls[ARM_BUILTIN_SET_FPSCR]
+	= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
+				ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
+    }
+}
+
+/* Return the ARM builtin for CODE.  */
+
+tree
+arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+{
+  if (code >= ARM_BUILTIN_MAX)
+    return error_mark_node;
+
+  return arm_builtin_decls[code];
+}
+
+/* Errors in the source file can cause expand_expr to return const0_rtx
+   where we expect a vector.  To avoid crashing, use one of the vector
+   clear instructions.  */
+
+static rtx
+safe_vector_operand (rtx x, machine_mode mode)
+{
+  if (x != const0_rtx)
+    return x;
+  x = gen_reg_rtx (mode);
+
+  emit_insn (gen_iwmmxt_clrdi (mode == DImode ? x
+			       : gen_rtx_SUBREG (DImode, x, 0)));
+  return x;
+}
+
+/* Function to expand ternary builtins.  */
+static rtx
+arm_expand_ternop_builtin (enum insn_code icode,
+                           tree exp, rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  tree arg2 = CALL_EXPR_ARG (exp, 2);
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = NULL_RTX;
+
+  /* The sha1c, sha1p, sha1m crypto builtins require a different vec_select
+     lane operand depending on endianness.  */
+  bool builtin_sha1cpm_p = false;
+
+  if (insn_data[icode].n_operands == 5)
+    {
+      gcc_assert (icode == CODE_FOR_crypto_sha1c
+                  || icode == CODE_FOR_crypto_sha1p
+                  || icode == CODE_FOR_crypto_sha1m);
+      builtin_sha1cpm_p = true;
+    }
+  machine_mode tmode = insn_data[icode].operand[0].mode;
+  machine_mode mode0 = insn_data[icode].operand[1].mode;
+  machine_mode mode1 = insn_data[icode].operand[2].mode;
+  machine_mode mode2 = insn_data[icode].operand[3].mode;
+
+  if (VECTOR_MODE_P (mode0))
+    op0 = safe_vector_operand (op0, mode0);
+  if (VECTOR_MODE_P (mode1))
+    op1 = safe_vector_operand (op1, mode1);
+  if (VECTOR_MODE_P (mode2))
+    op2 = safe_vector_operand (op2, mode2);
+
+  if (! target
+      || GET_MODE (target) != tmode
+      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+    target = gen_reg_rtx (tmode);
+
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+	      && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode)
+	      && (GET_MODE (op2) == mode2 || GET_MODE (op2) == VOIDmode));
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    op0 = copy_to_mode_reg (mode0, op0);
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    op1 = copy_to_mode_reg (mode1, op1);
+  if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
+    op2 = copy_to_mode_reg (mode2, op2);
+  if (builtin_sha1cpm_p)
+    op3 = GEN_INT (TARGET_BIG_END ? 1 : 0);
+
+  if (builtin_sha1cpm_p)
+    pat = GEN_FCN (icode) (target, op0, op1, op2, op3);
+  else
+    pat = GEN_FCN (icode) (target, op0, op1, op2);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
+/* Subroutine of arm_expand_builtin to take care of binop insns.  */
+
+static rtx
+arm_expand_binop_builtin (enum insn_code icode,
+			  tree exp, rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  machine_mode tmode = insn_data[icode].operand[0].mode;
+  machine_mode mode0 = insn_data[icode].operand[1].mode;
+  machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+  if (VECTOR_MODE_P (mode0))
+    op0 = safe_vector_operand (op0, mode0);
+  if (VECTOR_MODE_P (mode1))
+    op1 = safe_vector_operand (op1, mode1);
+
+  if (! target
+      || GET_MODE (target) != tmode
+      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+    target = gen_reg_rtx (tmode);
+
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+	      && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    op0 = copy_to_mode_reg (mode0, op0);
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    op1 = copy_to_mode_reg (mode1, op1);
+
+  pat = GEN_FCN (icode) (target, op0, op1);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
+/* Subroutine of arm_expand_builtin to take care of unop insns.  */
+
+static rtx
+arm_expand_unop_builtin (enum insn_code icode,
+			 tree exp, rtx target, int do_load)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = NULL_RTX;
+  machine_mode tmode = insn_data[icode].operand[0].mode;
+  machine_mode mode0 = insn_data[icode].operand[1].mode;
+  bool builtin_sha1h_p = false;
+
+  if (insn_data[icode].n_operands == 3)
+    {
+      gcc_assert (icode == CODE_FOR_crypto_sha1h);
+      builtin_sha1h_p = true;
+    }
+
+  if (! target
+      || GET_MODE (target) != tmode
+      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+    target = gen_reg_rtx (tmode);
+  if (do_load)
+    op0 = gen_rtx_MEM (mode0, copy_to_mode_reg (Pmode, op0));
+  else
+    {
+      if (VECTOR_MODE_P (mode0))
+	op0 = safe_vector_operand (op0, mode0);
+
+      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+	op0 = copy_to_mode_reg (mode0, op0);
+    }
+  if (builtin_sha1h_p)
+    op1 = GEN_INT (TARGET_BIG_END ? 1 : 0);
+
+  if (builtin_sha1h_p)
+    pat = GEN_FCN (icode) (target, op0, op1);
+  else
+    pat = GEN_FCN (icode) (target, op0);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
+typedef enum {
+  NEON_ARG_COPY_TO_REG,
+  NEON_ARG_CONSTANT,
+  NEON_ARG_MEMORY,
+  NEON_ARG_STOP
+} builtin_arg;
+
+#define NEON_MAX_BUILTIN_ARGS 5
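+
+/* Calls to arm_expand_neon_args below pass one builtin_arg code per operand
+   and terminate the list with NEON_ARG_STOP, e.g. (illustrative):
+
+     arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+			   NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
+			   NEON_ARG_STOP);  */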
+
+/* EXP is a pointer argument to a Neon load or store intrinsic.  Derive
+   and return an expression for the accessed memory.
+
+   The intrinsic function operates on a block of registers that has
+   mode REG_MODE.  This block contains vectors of type TYPE_MODE.  The
+   function references the memory at EXP of type TYPE and in mode
+   MEM_MODE; this mode may be BLKmode if no more suitable mode is
+   available.  */
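+
+/* Illustrative example (values assumed, not taken from this patch): for a
+   two-vector load of int32x2_t, REG_MODE is TImode (16 bytes) and each
+   vector is 8 bytes, so NVECTORS == 2; with MEM_MODE == REG_MODE the
+   access covers 16 / 4 == 4 int elements.  */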
+
+static tree
+neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
+			  machine_mode reg_mode,
+			  neon_builtin_type_mode type_mode)
+{
+  HOST_WIDE_INT reg_size, vector_size, nvectors, nelems;
+  tree elem_type, upper_bound, array_type;
+
+  /* Work out the size of the register block in bytes.  */
+  reg_size = GET_MODE_SIZE (reg_mode);
+
+  /* Work out the size of each vector in bytes.  */
+  gcc_assert (TYPE_MODE_BIT (type_mode) & (TB_DREG | TB_QREG));
+  vector_size = (TYPE_MODE_BIT (type_mode) & TB_QREG ? 16 : 8);
+
+  /* Work out how many vectors there are.  */
+  gcc_assert (reg_size % vector_size == 0);
+  nvectors = reg_size / vector_size;
+
+  /* Work out the type of each element.  */
+  gcc_assert (POINTER_TYPE_P (type));
+  elem_type = TREE_TYPE (type);
+
+  /* Work out how many elements are being loaded or stored.
+     MEM_MODE == REG_MODE implies a one-to-one mapping between register
+     and memory elements; anything else implies a lane load or store.  */
+  if (mem_mode == reg_mode)
+    nelems = vector_size * nvectors / int_size_in_bytes (elem_type);
+  else
+    nelems = nvectors;
+
+  /* Create a type that describes the full access.  */
+  upper_bound = build_int_cst (size_type_node, nelems - 1);
+  array_type = build_array_type (elem_type, build_index_type (upper_bound));
+
+  /* Dereference EXP using that type.  */
+  return fold_build2 (MEM_REF, array_type, exp,
+		      build_int_cst (build_pointer_type (array_type), 0));
+}
+
+/* Expand the arguments of a Neon builtin, as described by the trailing
+   list of builtin_arg codes terminated by NEON_ARG_STOP, and emit the
+   resulting instruction.  */
+static rtx
+arm_expand_neon_args (rtx target, int icode, int have_retval,
+		      neon_builtin_type_mode type_mode,
+		      tree exp, int fcode, ...)
+{
+  va_list ap;
+  rtx pat;
+  tree arg[NEON_MAX_BUILTIN_ARGS];
+  rtx op[NEON_MAX_BUILTIN_ARGS];
+  tree arg_type;
+  tree formals;
+  machine_mode tmode = insn_data[icode].operand[0].mode;
+  machine_mode mode[NEON_MAX_BUILTIN_ARGS];
+  machine_mode other_mode;
+  int argc = 0;
+  int opno;
+
+  if (have_retval
+      && (!target
+	  || GET_MODE (target) != tmode
+	  || !(*insn_data[icode].operand[0].predicate) (target, tmode)))
+    target = gen_reg_rtx (tmode);
+
+  va_start (ap, fcode);
+
+  formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode]));
+
+  for (;;)
+    {
+      builtin_arg thisarg = (builtin_arg) va_arg (ap, int);
+
+      if (thisarg == NEON_ARG_STOP)
+        break;
+      else
+        {
+          opno = argc + have_retval;
+          mode[argc] = insn_data[icode].operand[opno].mode;
+          arg[argc] = CALL_EXPR_ARG (exp, argc);
+	  arg_type = TREE_VALUE (formals);
+          if (thisarg == NEON_ARG_MEMORY)
+            {
+              other_mode = insn_data[icode].operand[1 - opno].mode;
+              arg[argc] = neon_dereference_pointer (arg[argc], arg_type,
+						    mode[argc], other_mode,
+						    type_mode);
+            }
+
+	  /* Use EXPAND_MEMORY for NEON_ARG_MEMORY to ensure that a MEM_P
+	     is returned.  */
+	  op[argc] = expand_expr (arg[argc], NULL_RTX, VOIDmode,
+				  (thisarg == NEON_ARG_MEMORY
+				   ? EXPAND_MEMORY : EXPAND_NORMAL));
+
+          switch (thisarg)
+            {
+            case NEON_ARG_COPY_TO_REG:
+              /*gcc_assert (GET_MODE (op[argc]) == mode[argc]);*/
+              if (!(*insn_data[icode].operand[opno].predicate)
+                     (op[argc], mode[argc]))
+                op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
+              break;
+
+            case NEON_ARG_CONSTANT:
+              /* FIXME: This error message is somewhat unhelpful.  */
+              if (!(*insn_data[icode].operand[opno].predicate)
+                    (op[argc], mode[argc]))
+		error ("argument must be a constant");
+              break;
+
+            case NEON_ARG_MEMORY:
+	      /* Check if expand failed.  */
+	      if (op[argc] == const0_rtx)
+		return 0;
+	      gcc_assert (MEM_P (op[argc]));
+	      PUT_MODE (op[argc], mode[argc]);
+	      /* ??? arm_neon.h uses the same built-in functions for signed
+		 and unsigned accesses, casting where necessary.  This isn't
+		 alias safe.  */
+	      set_mem_alias_set (op[argc], 0);
+	      if (!(*insn_data[icode].operand[opno].predicate)
+                    (op[argc], mode[argc]))
+		op[argc] = (replace_equiv_address
+			    (op[argc], force_reg (Pmode, XEXP (op[argc], 0))));
+              break;
+
+            case NEON_ARG_STOP:
+              gcc_unreachable ();
+            }
+
+          argc++;
+	  formals = TREE_CHAIN (formals);
+        }
+    }
+
+  va_end (ap);
+
+  if (have_retval)
+    switch (argc)
+      {
+      case 1:
+	pat = GEN_FCN (icode) (target, op[0]);
+	break;
+
+      case 2:
+	pat = GEN_FCN (icode) (target, op[0], op[1]);
+	break;
+
+      case 3:
+	pat = GEN_FCN (icode) (target, op[0], op[1], op[2]);
+	break;
+
+      case 4:
+	pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3]);
+	break;
+
+      case 5:
+	pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4]);
+	break;
+
+      default:
+	gcc_unreachable ();
+      }
+  else
+    switch (argc)
+      {
+      case 1:
+	pat = GEN_FCN (icode) (op[0]);
+	break;
+
+      case 2:
+	pat = GEN_FCN (icode) (op[0], op[1]);
+	break;
+
+      case 3:
+	pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+	break;
+
+      case 4:
+	pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+	break;
+
+      case 5:
+	pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+        break;
+
+      default:
+	gcc_unreachable ();
+      }
+
+  if (!pat)
+    return 0;
+
+  emit_insn (pat);
+
+  return target;
+}
+
+/* Expand a Neon builtin.  These are "special" because they don't have
+   symbolic constants defined per instruction or per instruction variant.
+   Instead, the required info is looked up in the table neon_builtin_data.  */
+static rtx
+arm_expand_neon_builtin (int fcode, tree exp, rtx target)
+{
+  neon_builtin_datum *d = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE];
+  neon_itype itype = d->itype;
+  enum insn_code icode = d->code;
+  neon_builtin_type_mode type_mode = d->mode;
+
+  switch (itype)
+    {
+    case NEON_UNOP:
+    case NEON_CONVERT:
+    case NEON_DUPLANE:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
+
+    case NEON_BINOP:
+    case NEON_LOGICBINOP:
+    case NEON_SCALARMUL:
+    case NEON_SCALARMULL:
+    case NEON_SCALARMULH:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
+
+    case NEON_TERNOP:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
+        NEON_ARG_STOP);
+
+    case NEON_GETLANE:
+    case NEON_FIXCONV:
+    case NEON_SHIFTIMM:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
+        NEON_ARG_STOP);
+
+    case NEON_CREATE:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
+
+    case NEON_DUP:
+    case NEON_RINT:
+    case NEON_SPLIT:
+    case NEON_FLOAT_WIDEN:
+    case NEON_FLOAT_NARROW:
+    case NEON_BSWAP:
+    case NEON_REINTERP:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
+
+    case NEON_COPYSIGNF:
+    case NEON_COMBINE:
+    case NEON_VTBL:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
+
+    case NEON_LANEMUL:
+    case NEON_LANEMULL:
+    case NEON_LANEMULH:
+    case NEON_SETLANE:
+    case NEON_SHIFTINSERT:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
+        NEON_ARG_STOP);
+
+    case NEON_LANEMAC:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
+        NEON_ARG_CONSTANT, NEON_ARG_STOP);
+
+    case NEON_SHIFTACC:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
+        NEON_ARG_STOP);
+
+    case NEON_SCALARMAC:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+	NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
+        NEON_ARG_STOP);
+
+    case NEON_SELECT:
+    case NEON_VTBX:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+	NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
+        NEON_ARG_STOP);
+
+    case NEON_LOAD1:
+    case NEON_LOADSTRUCT:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+	NEON_ARG_MEMORY, NEON_ARG_STOP);
+
+    case NEON_LOAD1LANE:
+    case NEON_LOADSTRUCTLANE:
+      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
+	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
+	NEON_ARG_STOP);
+
+    case NEON_STORE1:
+    case NEON_STORESTRUCT:
+      return arm_expand_neon_args (target, icode, 0, type_mode, exp, fcode,
+	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
+
+    case NEON_STORE1LANE:
+    case NEON_STORESTRUCTLANE:
+      return arm_expand_neon_args (target, icode, 0, type_mode, exp, fcode,
+	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
+	NEON_ARG_STOP);
+    }
+
+  gcc_unreachable ();
+}
+
+/* Expand an expression EXP that calls a built-in function,
+   with result going to TARGET if that's convenient
+   (and in mode MODE if that's convenient).
+   SUBTARGET may be used as the target for computing one of EXP's operands.
+   IGNORE is nonzero if the value is to be ignored.  */
+
+rtx
+arm_expand_builtin (tree exp,
+		    rtx target,
+		    rtx subtarget ATTRIBUTE_UNUSED,
+		    machine_mode mode ATTRIBUTE_UNUSED,
+		    int ignore ATTRIBUTE_UNUSED)
+{
+  const struct builtin_description * d;
+  enum insn_code    icode;
+  tree              fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  tree              arg0;
+  tree              arg1;
+  tree              arg2;
+  rtx               op0;
+  rtx               op1;
+  rtx               op2;
+  rtx               pat;
+  unsigned int      fcode = DECL_FUNCTION_CODE (fndecl);
+  size_t            i;
+  machine_mode tmode;
+  machine_mode mode0;
+  machine_mode mode1;
+  machine_mode mode2;
+  int opint;
+  int selector;
+  int mask;
+  int imm;
+
+  if (fcode >= ARM_BUILTIN_NEON_BASE)
+    return arm_expand_neon_builtin (fcode, exp, target);
+
+  switch (fcode)
+    {
+    case ARM_BUILTIN_GET_FPSCR:
+    case ARM_BUILTIN_SET_FPSCR:
+      if (fcode == ARM_BUILTIN_GET_FPSCR)
+	{
+	  icode = CODE_FOR_get_fpscr;
+	  target = gen_reg_rtx (SImode);
+	  pat = GEN_FCN (icode) (target);
+	}
+      else
+	{
+	  target = NULL_RTX;
+	  icode = CODE_FOR_set_fpscr;
+	  arg0 = CALL_EXPR_ARG (exp, 0);
+	  op0 = expand_normal (arg0);
+	  pat = GEN_FCN (icode) (op0);
+	}
+      emit_insn (pat);
+      return target;
+
+    case ARM_BUILTIN_TEXTRMSB:
+    case ARM_BUILTIN_TEXTRMUB:
+    case ARM_BUILTIN_TEXTRMSH:
+    case ARM_BUILTIN_TEXTRMUH:
+    case ARM_BUILTIN_TEXTRMSW:
+    case ARM_BUILTIN_TEXTRMUW:
+      icode = (fcode == ARM_BUILTIN_TEXTRMSB ? CODE_FOR_iwmmxt_textrmsb
+	       : fcode == ARM_BUILTIN_TEXTRMUB ? CODE_FOR_iwmmxt_textrmub
+	       : fcode == ARM_BUILTIN_TEXTRMSH ? CODE_FOR_iwmmxt_textrmsh
+	       : fcode == ARM_BUILTIN_TEXTRMUH ? CODE_FOR_iwmmxt_textrmuh
+	       : CODE_FOR_iwmmxt_textrmw);
+
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      arg1 = CALL_EXPR_ARG (exp, 1);
+      op0 = expand_normal (arg0);
+      op1 = expand_normal (arg1);
+      tmode = insn_data[icode].operand[0].mode;
+      mode0 = insn_data[icode].operand[1].mode;
+      mode1 = insn_data[icode].operand[2].mode;
+
+      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+	op0 = copy_to_mode_reg (mode0, op0);
+      if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+	{
+	  /* @@@ better error message */
+	  error ("selector must be an immediate");
+	  return gen_reg_rtx (tmode);
+	}
+
+      opint = INTVAL (op1);
+      if (fcode == ARM_BUILTIN_TEXTRMSB || fcode == ARM_BUILTIN_TEXTRMUB)
+	{
+	  if (opint > 7 || opint < 0)
+	    error ("selector must be in the range 0 to 7");
+	}
+      else if (fcode == ARM_BUILTIN_TEXTRMSH || fcode == ARM_BUILTIN_TEXTRMUH)
+	{
+	  if (opint > 3 || opint < 0)
+	    error ("selector must be in the range 0 to 3");
+	}
+      else /* ARM_BUILTIN_TEXTRMSW || ARM_BUILTIN_TEXTRMUW.  */
+	{
+	  if (opint > 1 || opint < 0)
+	    error ("selector must be in the range 0 to 1");
+	}
+
+      if (target == 0
+	  || GET_MODE (target) != tmode
+	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+	target = gen_reg_rtx (tmode);
+      pat = GEN_FCN (icode) (target, op0, op1);
+      if (! pat)
+	return 0;
+      emit_insn (pat);
+      return target;
+
+    case ARM_BUILTIN_WALIGNI:
+      /* If op2 is an immediate, use waligni; otherwise use walignr.  */
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      arg1 = CALL_EXPR_ARG (exp, 1);
+      arg2 = CALL_EXPR_ARG (exp, 2);
+      op0 = expand_normal (arg0);
+      op1 = expand_normal (arg1);
+      op2 = expand_normal (arg2);
+      if (CONST_INT_P (op2))
+        {
+	  icode = CODE_FOR_iwmmxt_waligni;
+          tmode = insn_data[icode].operand[0].mode;
+	  mode0 = insn_data[icode].operand[1].mode;
+	  mode1 = insn_data[icode].operand[2].mode;
+	  mode2 = insn_data[icode].operand[3].mode;
+          if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
+	    op0 = copy_to_mode_reg (mode0, op0);
+          if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
+	    op1 = copy_to_mode_reg (mode1, op1);
+          gcc_assert ((*insn_data[icode].operand[3].predicate) (op2, mode2));
+	  selector = INTVAL (op2);
+	  if (selector > 7 || selector < 0)
+	    error ("selector must be in the range 0 to 7");
+	}
+      else
+        {
+	  icode = CODE_FOR_iwmmxt_walignr;
+          tmode = insn_data[icode].operand[0].mode;
+	  mode0 = insn_data[icode].operand[1].mode;
+	  mode1 = insn_data[icode].operand[2].mode;
+	  mode2 = insn_data[icode].operand[3].mode;
+          if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
+	    op0 = copy_to_mode_reg (mode0, op0);
+          if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
+	    op1 = copy_to_mode_reg (mode1, op1);
+          if (!(*insn_data[icode].operand[3].predicate) (op2, mode2))
+	    op2 = copy_to_mode_reg (mode2, op2);
+	}
+      if (target == 0
+	  || GET_MODE (target) != tmode
+	  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+	target = gen_reg_rtx (tmode);
+      pat = GEN_FCN (icode) (target, op0, op1, op2);
+      if (!pat)
+	return 0;
+      emit_insn (pat);
+      return target;
+
+    case ARM_BUILTIN_TINSRB:
+    case ARM_BUILTIN_TINSRH:
+    case ARM_BUILTIN_TINSRW:
+    case ARM_BUILTIN_WMERGE:
+      icode = (fcode == ARM_BUILTIN_TINSRB ? CODE_FOR_iwmmxt_tinsrb
+	       : fcode == ARM_BUILTIN_TINSRH ? CODE_FOR_iwmmxt_tinsrh
+	       : fcode == ARM_BUILTIN_WMERGE ? CODE_FOR_iwmmxt_wmerge
+	       : CODE_FOR_iwmmxt_tinsrw);
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      arg1 = CALL_EXPR_ARG (exp, 1);
+      arg2 = CALL_EXPR_ARG (exp, 2);
+      op0 = expand_normal (arg0);
+      op1 = expand_normal (arg1);
+      op2 = expand_normal (arg2);
+      tmode = insn_data[icode].operand[0].mode;
+      mode0 = insn_data[icode].operand[1].mode;
+      mode1 = insn_data[icode].operand[2].mode;
+      mode2 = insn_data[icode].operand[3].mode;
+
+      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+	op0 = copy_to_mode_reg (mode0, op0);
+      if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+	op1 = copy_to_mode_reg (mode1, op1);
+      if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
+	{
+	  error ("selector must be an immediate");
+	  return const0_rtx;
+	}
+      if (icode == CODE_FOR_iwmmxt_wmerge)
+	{
+	  selector = INTVAL (op2);
+	  if (selector > 7 || selector < 0)
+	    error ("selector must be in the range 0 to 7");
+	}
+      if ((icode == CODE_FOR_iwmmxt_tinsrb)
+	  || (icode == CODE_FOR_iwmmxt_tinsrh)
+	  || (icode == CODE_FOR_iwmmxt_tinsrw))
+        {
+	  mask = 0x01;
+	  selector = INTVAL (op2);
+	  if (icode == CODE_FOR_iwmmxt_tinsrb && (selector < 0 || selector > 7))
+	    error ("selector must be in the range 0 to 7");
+	  else if (icode == CODE_FOR_iwmmxt_tinsrh && (selector < 0 || selector > 3))
+	    error ("selector must be in the range 0 to 3");
+	  else if (icode == CODE_FOR_iwmmxt_tinsrw && (selector < 0 || selector > 1))
+	    error ("selector must be in the range 0 to 1");
+	  mask <<= selector;
+	  op2 = GEN_INT (mask);
+	}
+      if (target == 0
+	  || GET_MODE (target) != tmode
+	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+	target = gen_reg_rtx (tmode);
+      pat = GEN_FCN (icode) (target, op0, op1, op2);
+      if (! pat)
+	return 0;
+      emit_insn (pat);
+      return target;
+
+    case ARM_BUILTIN_SETWCGR0:
+    case ARM_BUILTIN_SETWCGR1:
+    case ARM_BUILTIN_SETWCGR2:
+    case ARM_BUILTIN_SETWCGR3:
+      icode = (fcode == ARM_BUILTIN_SETWCGR0 ? CODE_FOR_iwmmxt_setwcgr0
+	       : fcode == ARM_BUILTIN_SETWCGR1 ? CODE_FOR_iwmmxt_setwcgr1
+	       : fcode == ARM_BUILTIN_SETWCGR2 ? CODE_FOR_iwmmxt_setwcgr2
+	       : CODE_FOR_iwmmxt_setwcgr3);
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      op0 = expand_normal (arg0);
+      mode0 = insn_data[icode].operand[0].mode;
+      if (!(*insn_data[icode].operand[0].predicate) (op0, mode0))
+        op0 = copy_to_mode_reg (mode0, op0);
+      pat = GEN_FCN (icode) (op0);
+      if (!pat)
+	return 0;
+      emit_insn (pat);
+      return 0;
+
+    case ARM_BUILTIN_GETWCGR0:
+    case ARM_BUILTIN_GETWCGR1:
+    case ARM_BUILTIN_GETWCGR2:
+    case ARM_BUILTIN_GETWCGR3:
+      icode = (fcode == ARM_BUILTIN_GETWCGR0 ? CODE_FOR_iwmmxt_getwcgr0
+	       : fcode == ARM_BUILTIN_GETWCGR1 ? CODE_FOR_iwmmxt_getwcgr1
+	       : fcode == ARM_BUILTIN_GETWCGR2 ? CODE_FOR_iwmmxt_getwcgr2
+	       : CODE_FOR_iwmmxt_getwcgr3);
+      tmode = insn_data[icode].operand[0].mode;
+      if (target == 0
+	  || GET_MODE (target) != tmode
+	  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+        target = gen_reg_rtx (tmode);
+      pat = GEN_FCN (icode) (target);
+      if (!pat)
+        return 0;
+      emit_insn (pat);
+      return target;
+
+    case ARM_BUILTIN_WSHUFH:
+      icode = CODE_FOR_iwmmxt_wshufh;
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      arg1 = CALL_EXPR_ARG (exp, 1);
+      op0 = expand_normal (arg0);
+      op1 = expand_normal (arg1);
+      tmode = insn_data[icode].operand[0].mode;
+      mode1 = insn_data[icode].operand[1].mode;
+      mode2 = insn_data[icode].operand[2].mode;
+
+      if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
+	op0 = copy_to_mode_reg (mode1, op0);
+      if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
+	{
+	  error ("mask must be an immediate");
+	  return const0_rtx;
+	}
+      selector = INTVAL (op1);
+      if (selector < 0 || selector > 255)
+	error ("the range of mask should be in 0 to 255");
+      if (target == 0
+	  || GET_MODE (target) != tmode
+	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+	target = gen_reg_rtx (tmode);
+      pat = GEN_FCN (icode) (target, op0, op1);
+      if (! pat)
+	return 0;
+      emit_insn (pat);
+      return target;
+
+    case ARM_BUILTIN_WMADDS:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmadds, exp, target);
+    case ARM_BUILTIN_WMADDSX:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddsx, exp, target);
+    case ARM_BUILTIN_WMADDSN:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddsn, exp, target);
+    case ARM_BUILTIN_WMADDU:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddu, exp, target);
+    case ARM_BUILTIN_WMADDUX:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddux, exp, target);
+    case ARM_BUILTIN_WMADDUN:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddun, exp, target);
+    case ARM_BUILTIN_WSADBZ:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadbz, exp, target);
+    case ARM_BUILTIN_WSADHZ:
+      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadhz, exp, target);
+
+      /* Several three-argument builtins.  */
+    case ARM_BUILTIN_WMACS:
+    case ARM_BUILTIN_WMACU:
+    case ARM_BUILTIN_TMIA:
+    case ARM_BUILTIN_TMIAPH:
+    case ARM_BUILTIN_TMIATT:
+    case ARM_BUILTIN_TMIATB:
+    case ARM_BUILTIN_TMIABT:
+    case ARM_BUILTIN_TMIABB:
+    case ARM_BUILTIN_WQMIABB:
+    case ARM_BUILTIN_WQMIABT:
+    case ARM_BUILTIN_WQMIATB:
+    case ARM_BUILTIN_WQMIATT:
+    case ARM_BUILTIN_WQMIABBN:
+    case ARM_BUILTIN_WQMIABTN:
+    case ARM_BUILTIN_WQMIATBN:
+    case ARM_BUILTIN_WQMIATTN:
+    case ARM_BUILTIN_WMIABB:
+    case ARM_BUILTIN_WMIABT:
+    case ARM_BUILTIN_WMIATB:
+    case ARM_BUILTIN_WMIATT:
+    case ARM_BUILTIN_WMIABBN:
+    case ARM_BUILTIN_WMIABTN:
+    case ARM_BUILTIN_WMIATBN:
+    case ARM_BUILTIN_WMIATTN:
+    case ARM_BUILTIN_WMIAWBB:
+    case ARM_BUILTIN_WMIAWBT:
+    case ARM_BUILTIN_WMIAWTB:
+    case ARM_BUILTIN_WMIAWTT:
+    case ARM_BUILTIN_WMIAWBBN:
+    case ARM_BUILTIN_WMIAWBTN:
+    case ARM_BUILTIN_WMIAWTBN:
+    case ARM_BUILTIN_WMIAWTTN:
+    case ARM_BUILTIN_WSADB:
+    case ARM_BUILTIN_WSADH:
+      icode = (fcode == ARM_BUILTIN_WMACS ? CODE_FOR_iwmmxt_wmacs
+	       : fcode == ARM_BUILTIN_WMACU ? CODE_FOR_iwmmxt_wmacu
+	       : fcode == ARM_BUILTIN_TMIA ? CODE_FOR_iwmmxt_tmia
+	       : fcode == ARM_BUILTIN_TMIAPH ? CODE_FOR_iwmmxt_tmiaph
+	       : fcode == ARM_BUILTIN_TMIABB ? CODE_FOR_iwmmxt_tmiabb
+	       : fcode == ARM_BUILTIN_TMIABT ? CODE_FOR_iwmmxt_tmiabt
+	       : fcode == ARM_BUILTIN_TMIATB ? CODE_FOR_iwmmxt_tmiatb
+	       : fcode == ARM_BUILTIN_TMIATT ? CODE_FOR_iwmmxt_tmiatt
+	       : fcode == ARM_BUILTIN_WQMIABB ? CODE_FOR_iwmmxt_wqmiabb
+	       : fcode == ARM_BUILTIN_WQMIABT ? CODE_FOR_iwmmxt_wqmiabt
+	       : fcode == ARM_BUILTIN_WQMIATB ? CODE_FOR_iwmmxt_wqmiatb
+	       : fcode == ARM_BUILTIN_WQMIATT ? CODE_FOR_iwmmxt_wqmiatt
+	       : fcode == ARM_BUILTIN_WQMIABBN ? CODE_FOR_iwmmxt_wqmiabbn
+	       : fcode == ARM_BUILTIN_WQMIABTN ? CODE_FOR_iwmmxt_wqmiabtn
+	       : fcode == ARM_BUILTIN_WQMIATBN ? CODE_FOR_iwmmxt_wqmiatbn
+	       : fcode == ARM_BUILTIN_WQMIATTN ? CODE_FOR_iwmmxt_wqmiattn
+	       : fcode == ARM_BUILTIN_WMIABB ? CODE_FOR_iwmmxt_wmiabb
+	       : fcode == ARM_BUILTIN_WMIABT ? CODE_FOR_iwmmxt_wmiabt
+	       : fcode == ARM_BUILTIN_WMIATB ? CODE_FOR_iwmmxt_wmiatb
+	       : fcode == ARM_BUILTIN_WMIATT ? CODE_FOR_iwmmxt_wmiatt
+	       : fcode == ARM_BUILTIN_WMIABBN ? CODE_FOR_iwmmxt_wmiabbn
+	       : fcode == ARM_BUILTIN_WMIABTN ? CODE_FOR_iwmmxt_wmiabtn
+	       : fcode == ARM_BUILTIN_WMIATBN ? CODE_FOR_iwmmxt_wmiatbn
+	       : fcode == ARM_BUILTIN_WMIATTN ? CODE_FOR_iwmmxt_wmiattn
+	       : fcode == ARM_BUILTIN_WMIAWBB ? CODE_FOR_iwmmxt_wmiawbb
+	       : fcode == ARM_BUILTIN_WMIAWBT ? CODE_FOR_iwmmxt_wmiawbt
+	       : fcode == ARM_BUILTIN_WMIAWTB ? CODE_FOR_iwmmxt_wmiawtb
+	       : fcode == ARM_BUILTIN_WMIAWTT ? CODE_FOR_iwmmxt_wmiawtt
+	       : fcode == ARM_BUILTIN_WMIAWBBN ? CODE_FOR_iwmmxt_wmiawbbn
+	       : fcode == ARM_BUILTIN_WMIAWBTN ? CODE_FOR_iwmmxt_wmiawbtn
+	       : fcode == ARM_BUILTIN_WMIAWTBN ? CODE_FOR_iwmmxt_wmiawtbn
+	       : fcode == ARM_BUILTIN_WMIAWTTN ? CODE_FOR_iwmmxt_wmiawttn
+	       : fcode == ARM_BUILTIN_WSADB ? CODE_FOR_iwmmxt_wsadb
+	       : CODE_FOR_iwmmxt_wsadh);
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      arg1 = CALL_EXPR_ARG (exp, 1);
+      arg2 = CALL_EXPR_ARG (exp, 2);
+      op0 = expand_normal (arg0);
+      op1 = expand_normal (arg1);
+      op2 = expand_normal (arg2);
+      tmode = insn_data[icode].operand[0].mode;
+      mode0 = insn_data[icode].operand[1].mode;
+      mode1 = insn_data[icode].operand[2].mode;
+      mode2 = insn_data[icode].operand[3].mode;
+
+      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+	op0 = copy_to_mode_reg (mode0, op0);
+      if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+	op1 = copy_to_mode_reg (mode1, op1);
+      if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
+	op2 = copy_to_mode_reg (mode2, op2);
+      if (target == 0
+	  || GET_MODE (target) != tmode
+	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+	target = gen_reg_rtx (tmode);
+      pat = GEN_FCN (icode) (target, op0, op1, op2);
+      if (! pat)
+	return 0;
+      emit_insn (pat);
+      return target;
+
+    case ARM_BUILTIN_WZERO:
+      target = gen_reg_rtx (DImode);
+      emit_insn (gen_iwmmxt_clrdi (target));
+      return target;
+
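+      /* Two families share this code: the "I"-suffixed builtins
+	 (e.g. _mm_srli_pi16) map to the *_iwmmxt shift patterns, while
+	 the unsuffixed ones (e.g. _mm_srl_pi16) take their count from a
+	 64-bit register and map to the *_di patterns.  The diagnostics
+	 below fire only when the count is a compile-time constant
+	 (VOIDmode).  */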
+    case ARM_BUILTIN_WSRLHI:
+    case ARM_BUILTIN_WSRLWI:
+    case ARM_BUILTIN_WSRLDI:
+    case ARM_BUILTIN_WSLLHI:
+    case ARM_BUILTIN_WSLLWI:
+    case ARM_BUILTIN_WSLLDI:
+    case ARM_BUILTIN_WSRAHI:
+    case ARM_BUILTIN_WSRAWI:
+    case ARM_BUILTIN_WSRADI:
+    case ARM_BUILTIN_WRORHI:
+    case ARM_BUILTIN_WRORWI:
+    case ARM_BUILTIN_WRORDI:
+    case ARM_BUILTIN_WSRLH:
+    case ARM_BUILTIN_WSRLW:
+    case ARM_BUILTIN_WSRLD:
+    case ARM_BUILTIN_WSLLH:
+    case ARM_BUILTIN_WSLLW:
+    case ARM_BUILTIN_WSLLD:
+    case ARM_BUILTIN_WSRAH:
+    case ARM_BUILTIN_WSRAW:
+    case ARM_BUILTIN_WSRAD:
+    case ARM_BUILTIN_WRORH:
+    case ARM_BUILTIN_WRORW:
+    case ARM_BUILTIN_WRORD:
+      icode = (fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+	       : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+	       : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
+	       : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
+	       : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
+	       : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+	       : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+	       : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+	       : fcode == ARM_BUILTIN_WSLLH  ? CODE_FOR_ashlv4hi3_di
+	       : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+	       : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+	       : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+	       : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+	       : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+	       : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
+	       : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
+	       : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
+	       : CODE_FOR_nothing);
+      arg1 = CALL_EXPR_ARG (exp, 1);
+      op1 = expand_normal (arg1);
+      if (GET_MODE (op1) == VOIDmode)
+	{
+	  imm = INTVAL (op1);
+	  if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORWI
+	       || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
+	      && (imm < 0 || imm > 32))
+	    {
+	      if (fcode == ARM_BUILTIN_WRORHI)
+		error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_rori_pi16 in code.");
+	      else if (fcode == ARM_BUILTIN_WRORWI)
+		error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_rori_pi32 in code.");
+	      else if (fcode == ARM_BUILTIN_WRORH)
+		error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_ror_pi16 in code.");
+	      else
+		error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_ror_pi32 in code.");
+	    }
+	  else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
+		   && (imm < 0 || imm > 64))
+	    {
+	      if (fcode == ARM_BUILTIN_WRORDI)
+		error ("the range of count should be in 0 to 64.  please check the intrinsic _mm_rori_si64 in code.");
+	      else
+		error ("the range of count should be in 0 to 64.  please check the intrinsic _mm_ror_si64 in code.");
+	    }
+	  else if (imm < 0)
+	    {
+	      if (fcode == ARM_BUILTIN_WSRLHI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srli_pi16 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRLWI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srli_pi32 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRLDI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srli_si64 in code.");
+	      else if (fcode == ARM_BUILTIN_WSLLHI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_slli_pi16 in code.");
+	      else if (fcode == ARM_BUILTIN_WSLLWI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_slli_pi32 in code.");
+	      else if (fcode == ARM_BUILTIN_WSLLDI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_slli_si64 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRAHI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srai_pi16 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRAWI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srai_pi32 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRADI)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srai_si64 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRLH)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srl_pi16 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRLW)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srl_pi32 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRLD)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_srl_si64 in code.");
+	      else if (fcode == ARM_BUILTIN_WSLLH)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_sll_pi16 in code.");
+	      else if (fcode == ARM_BUILTIN_WSLLW)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_sll_pi32 in code.");
+	      else if (fcode == ARM_BUILTIN_WSLLD)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_sll_si64 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRAH)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_sra_pi16 in code.");
+	      else if (fcode == ARM_BUILTIN_WSRAW)
+		error ("the count should be no less than 0.  please check the intrinsic _mm_sra_pi32 in code.");
+	      else
+		error ("the count should be no less than 0.  please check the intrinsic _mm_sra_si64 in code.");
+	    }
+	}
+      return arm_expand_binop_builtin (icode, exp, target);
+
+    default:
+      break;
+    }
+
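+  /* Anything not special-cased above is handled through the bdesc
+     tables by the generic one-, two- and three-operand expanders.  */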
+  for (i = 0, d = bdesc_2arg; i < ARRAY_SIZE (bdesc_2arg); i++, d++)
+    if (d->code == (const enum arm_builtins) fcode)
+      return arm_expand_binop_builtin (d->icode, exp, target);
+
+  for (i = 0, d = bdesc_1arg; i < ARRAY_SIZE (bdesc_1arg); i++, d++)
+    if (d->code == (const enum arm_builtins) fcode)
+      return arm_expand_unop_builtin (d->icode, exp, target, 0);
+
+  for (i = 0, d = bdesc_3arg; i < ARRAY_SIZE (bdesc_3arg); i++, d++)
+    if (d->code == (const enum arm_builtins) fcode)
+      return arm_expand_ternop_builtin (d->icode, exp, target);
+
+  /* @@@ Should really do something sensible here.  */
+  return NULL_RTX;
+}
+
+tree
+arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
+{
+  machine_mode in_mode, out_mode;
+  int in_n, out_n;
+  bool out_unsigned_p = TYPE_UNSIGNED (type_out);
+
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE)
+    return NULL_TREE;
+
+  out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+
+/* ARM_CHECK_BUILTIN_MODE and ARM_FIND_VRINT_VARIANT are used to find the
+   decl of the vectorized builtin for the appropriate vector mode.
+   NULL_TREE is returned if no such builtin is available.  */
+#undef ARM_CHECK_BUILTIN_MODE
+#define ARM_CHECK_BUILTIN_MODE(C)    \
+  (TARGET_NEON && TARGET_FPU_ARMV8   \
+   && flag_unsafe_math_optimizations \
+   && ARM_CHECK_BUILTIN_MODE_1 (C))
+
+#undef ARM_CHECK_BUILTIN_MODE_1
+#define ARM_CHECK_BUILTIN_MODE_1(C) \
+  (out_mode == SFmode && out_n == C \
+   && in_mode == SFmode && in_n == C)
+
+#undef ARM_FIND_VRINT_VARIANT
+#define ARM_FIND_VRINT_VARIANT(N) \
+  (ARM_CHECK_BUILTIN_MODE (2) \
+    ? arm_builtin_decl (ARM_BUILTIN_NEON_##N##v2sf, false) \
+    : (ARM_CHECK_BUILTIN_MODE (4) \
+      ? arm_builtin_decl (ARM_BUILTIN_NEON_##N##v4sf, false) \
+      : NULL_TREE))
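+
+/* For example, when vectorizing floorf to a V2SF result (out_mode ==
+   SFmode, out_n == 2) on an ARMv8 Neon target compiled with
+   -funsafe-math-optimizations, ARM_FIND_VRINT_VARIANT (vrintm) yields
+   the decl registered for ARM_BUILTIN_NEON_vrintmv2sf; a V4SF shape
+   selects the v4sf variant, and anything else gives NULL_TREE.  */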
+
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
+      switch (fn)
+        {
+          case BUILT_IN_FLOORF:
+            return ARM_FIND_VRINT_VARIANT (vrintm);
+          case BUILT_IN_CEILF:
+            return ARM_FIND_VRINT_VARIANT (vrintp);
+          case BUILT_IN_TRUNCF:
+            return ARM_FIND_VRINT_VARIANT (vrintz);
+          case BUILT_IN_ROUNDF:
+            return ARM_FIND_VRINT_VARIANT (vrinta);
+#undef ARM_CHECK_BUILTIN_MODE_1
+#define ARM_CHECK_BUILTIN_MODE_1(C) \
+  (out_mode == SImode && out_n == C \
+   && in_mode == SFmode && in_n == C)
+
+#define ARM_FIND_VCVT_VARIANT(N) \
+  (ARM_CHECK_BUILTIN_MODE (2) \
+   ? arm_builtin_decl (ARM_BUILTIN_NEON_##N##v2sfv2si, false) \
+   : (ARM_CHECK_BUILTIN_MODE (4) \
+     ? arm_builtin_decl (ARM_BUILTIN_NEON_##N##v4sfv4si, false) \
+     : NULL_TREE))
+
+#define ARM_FIND_VCVTU_VARIANT(N) \
+  (ARM_CHECK_BUILTIN_MODE (2) \
+   ? arm_builtin_decl (ARM_BUILTIN_NEON_##N##uv2sfv2si, false) \
+   : (ARM_CHECK_BUILTIN_MODE (4) \
+     ? arm_builtin_decl (ARM_BUILTIN_NEON_##N##uv4sfv4si, false) \
+     : NULL_TREE))
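+
+/* E.g. lroundf from V2SF to V2SI resolves to the decl for
+   ARM_BUILTIN_NEON_vcvtav2sfv2si, or ARM_BUILTIN_NEON_vcvtauv2sfv2si
+   when the scalar result type is unsigned.  */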
+          case BUILT_IN_LROUNDF:
+            return out_unsigned_p
+                     ? ARM_FIND_VCVTU_VARIANT (vcvta)
+                     : ARM_FIND_VCVT_VARIANT (vcvta);
+          case BUILT_IN_LCEILF:
+            return out_unsigned_p
+                     ? ARM_FIND_VCVTU_VARIANT (vcvtp)
+                     : ARM_FIND_VCVT_VARIANT (vcvtp);
+          case BUILT_IN_LFLOORF:
+            return out_unsigned_p
+                     ? ARM_FIND_VCVTU_VARIANT (vcvtm)
+                     : ARM_FIND_VCVT_VARIANT (vcvtm);
+#undef ARM_CHECK_BUILTIN_MODE
+#define ARM_CHECK_BUILTIN_MODE(C, N) \
+  (out_mode == N##mode && out_n == C \
+   && in_mode == N##mode && in_n == C)
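+          /* E.g. a bswap16 over four or eight 16-bit lanes maps to the
+             v4hi or v8hi builtin; any other shape returns NULL_TREE.  */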
+          case BUILT_IN_BSWAP16:
+            if (ARM_CHECK_BUILTIN_MODE (4, HI))
+              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4hi, false);
+            else if (ARM_CHECK_BUILTIN_MODE (8, HI))
+              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv8hi, false);
+            else
+              return NULL_TREE;
+          case BUILT_IN_BSWAP32:
+            if (ARM_CHECK_BUILTIN_MODE (2, SI))
+              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2si, false);
+            else if (ARM_CHECK_BUILTIN_MODE (4, SI))
+              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4si, false);
+            else
+              return NULL_TREE;
+          case BUILT_IN_BSWAP64:
+            if (ARM_CHECK_BUILTIN_MODE (2, DI))
+              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2di, false);
+            else
+              return NULL_TREE;
+	  case BUILT_IN_COPYSIGNF:
+	    if (ARM_CHECK_BUILTIN_MODE (2, SF))
+              return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv2sf, false);
+	    else if (ARM_CHECK_BUILTIN_MODE (4, SF))
+              return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv4sf, false);
+	    else
+	      return NULL_TREE;
+
+          default:
+            return NULL_TREE;
+        }
+    }
+  return NULL_TREE;
+}
+#undef ARM_FIND_VCVT_VARIANT
+#undef ARM_FIND_VCVTU_VARIANT
+#undef ARM_CHECK_BUILTIN_MODE
+#undef ARM_FIND_VRINT_VARIANT
+
+void
+arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
+{
+  const unsigned ARM_FE_INVALID = 1;
+  const unsigned ARM_FE_DIVBYZERO = 2;
+  const unsigned ARM_FE_OVERFLOW = 4;
+  const unsigned ARM_FE_UNDERFLOW = 8;
+  const unsigned ARM_FE_INEXACT = 16;
+  const unsigned HOST_WIDE_INT ARM_FE_ALL_EXCEPT = (ARM_FE_INVALID
+						    | ARM_FE_DIVBYZERO
+						    | ARM_FE_OVERFLOW
+						    | ARM_FE_UNDERFLOW
+						    | ARM_FE_INEXACT);
+  const unsigned HOST_WIDE_INT ARM_FE_EXCEPT_SHIFT = 8;
+  tree fenv_var, get_fpscr, set_fpscr, mask, ld_fenv, masked_fenv;
+  tree new_fenv_var, reload_fenv, restore_fnenv;
+  tree update_call, atomic_feraiseexcept, hold_fnclex;
+
+  if (!TARGET_VFP || !TARGET_HARD_FLOAT)
+    return;
+
+  /* Generate the equivalent of:
+       unsigned int fenv_var;
+       fenv_var = __builtin_arm_get_fpscr ();
+
+       unsigned int masked_fenv;
+       masked_fenv = fenv_var & mask;
+
+       __builtin_arm_set_fpscr (masked_fenv);  */
+
+  fenv_var = create_tmp_var (unsigned_type_node, NULL);
+  get_fpscr = arm_builtin_decls[ARM_BUILTIN_GET_FPSCR];
+  set_fpscr = arm_builtin_decls[ARM_BUILTIN_SET_FPSCR];
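+  /* The mask clears both the cumulative exception-status bits
+     (FPSCR[4:0]) and the corresponding trap-enable bits
+     (FPSCR[12:8]).  */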
+  mask = build_int_cst (unsigned_type_node,
+			~((ARM_FE_ALL_EXCEPT << ARM_FE_EXCEPT_SHIFT)
+			  | ARM_FE_ALL_EXCEPT));
+  ld_fenv = build2 (MODIFY_EXPR, unsigned_type_node,
+		    fenv_var, build_call_expr (get_fpscr, 0));
+  masked_fenv = build2 (BIT_AND_EXPR, unsigned_type_node, fenv_var, mask);
+  hold_fnclex = build_call_expr (set_fpscr, 1, masked_fenv);
+  *hold = build2 (COMPOUND_EXPR, void_type_node,
+		  build2 (COMPOUND_EXPR, void_type_node, masked_fenv, ld_fenv),
+		  hold_fnclex);
+
+  /* Store the value of masked_fenv to clear the exceptions:
+     __builtin_arm_set_fpscr (masked_fenv);  */
+
+  *clear = build_call_expr (set_fpscr, 1, masked_fenv);
+
+  /* Generate the equivalent of:
+       unsigned int new_fenv_var;
+       new_fenv_var = __builtin_arm_get_fpscr ();
+
+       __builtin_arm_set_fpscr (fenv_var);
+
+       __atomic_feraiseexcept (new_fenv_var);  */
+
+  new_fenv_var = create_tmp_var (unsigned_type_node, NULL);
+  reload_fenv = build2 (MODIFY_EXPR, unsigned_type_node, new_fenv_var,
+			build_call_expr (get_fpscr, 0));
+  restore_fnenv = build_call_expr (set_fpscr, 1, fenv_var);
+  atomic_feraiseexcept = builtin_decl_implicit (BUILT_IN_ATOMIC_FERAISEEXCEPT);
+  update_call = build_call_expr (atomic_feraiseexcept, 1,
+				 fold_convert (integer_type_node, new_fenv_var));
+  *update = build2 (COMPOUND_EXPR, void_type_node,
+		    build2 (COMPOUND_EXPR, void_type_node,
+			    reload_fenv, restore_fnenv), update_call);
+}
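+
+/* A rough sketch of how the middle end combines the three sequences
+   when expanding an atomic compound assignment such as
+   "_Atomic float x; x += f;" (see build_atomic_assign):
+
+     HOLD;				// save FPSCR, mask flags and traps
+     do
+       old = x, new = old + f;
+     while (!atomic_compare_exchange (&x, &old, new)
+	    && (CLEAR, 1));		// discard failed-iteration exceptions
+     UPDATE;				// restore FPSCR, raise new exceptions  */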
+
+#include "gt-arm-builtins.h"
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index aa9b1cb..d9149ce 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -41,7 +41,14 @@ extern HOST_WIDE_INT thumb_compute_initial_elimination_offset (unsigned int,
 							       unsigned int);
 extern unsigned int arm_dbx_register_number (unsigned int);
 extern void arm_output_fn_unwind (FILE *, bool);
-  
+
+extern rtx arm_expand_builtin (tree exp, rtx target, rtx subtarget
+			       ATTRIBUTE_UNUSED, machine_mode mode
+			       ATTRIBUTE_UNUSED, int ignore ATTRIBUTE_UNUSED);
+extern tree arm_builtin_decl (unsigned code, bool initialize_p
+			      ATTRIBUTE_UNUSED);
+extern void arm_init_builtins (void);
+extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear,
+					   tree *update);
 
 #ifdef RTX_CODE
 extern bool arm_vector_mode_supported_p (machine_mode);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e338e05..d4157a6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -117,7 +117,6 @@ static int thumb_far_jump_used_p (void);
 static bool thumb_force_lr_save (void);
 static unsigned arm_size_return_regs (void);
 static bool arm_assemble_integer (rtx, unsigned int, int);
-static void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
@@ -193,13 +192,6 @@ static bool arm_rtx_costs (rtx, int, int, int, int *, bool);
 static int arm_address_cost (rtx, machine_mode, addr_space_t, bool);
 static int arm_register_move_cost (machine_mode, reg_class_t, reg_class_t);
 static int arm_memory_move_cost (machine_mode, reg_class_t, bool);
-static void arm_init_builtins (void);
-static void arm_init_iwmmxt_builtins (void);
-static rtx safe_vector_operand (rtx, machine_mode);
-static rtx arm_expand_binop_builtin (enum insn_code, tree, rtx);
-static rtx arm_expand_unop_builtin (enum insn_code, tree, rtx, int);
-static rtx arm_expand_builtin (tree, rtx, rtx, machine_mode, int);
-static tree arm_builtin_decl (unsigned, bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
@@ -23225,1767 +23217,6 @@ arm_debugger_arg_offset (int value, rtx addr)
   return value;
 }
 \f
-typedef enum {
-  T_V8QI,
-  T_V4HI,
-  T_V4HF,
-  T_V2SI,
-  T_V2SF,
-  T_DI,
-  T_V16QI,
-  T_V8HI,
-  T_V4SI,
-  T_V4SF,
-  T_V2DI,
-  T_TI,
-  T_EI,
-  T_OI,
-  T_MAX		/* Size of enum.  Keep last.  */
-} neon_builtin_type_mode;
-
-#define TYPE_MODE_BIT(X) (1 << (X))
-
-#define TB_DREG (TYPE_MODE_BIT (T_V8QI) | TYPE_MODE_BIT (T_V4HI)	\
-		 | TYPE_MODE_BIT (T_V4HF) | TYPE_MODE_BIT (T_V2SI)	\
-		 | TYPE_MODE_BIT (T_V2SF) | TYPE_MODE_BIT (T_DI))
-#define TB_QREG (TYPE_MODE_BIT (T_V16QI) | TYPE_MODE_BIT (T_V8HI)	\
-		 | TYPE_MODE_BIT (T_V4SI) | TYPE_MODE_BIT (T_V4SF)	\
-		 | TYPE_MODE_BIT (T_V2DI) | TYPE_MODE_BIT (T_TI))
-
-#define v8qi_UP  T_V8QI
-#define v4hi_UP  T_V4HI
-#define v4hf_UP  T_V4HF
-#define v2si_UP  T_V2SI
-#define v2sf_UP  T_V2SF
-#define di_UP    T_DI
-#define v16qi_UP T_V16QI
-#define v8hi_UP  T_V8HI
-#define v4si_UP  T_V4SI
-#define v4sf_UP  T_V4SF
-#define v2di_UP  T_V2DI
-#define ti_UP	 T_TI
-#define ei_UP	 T_EI
-#define oi_UP	 T_OI
-
-#define UP(X) X##_UP
-
-typedef enum {
-  NEON_BINOP,
-  NEON_TERNOP,
-  NEON_UNOP,
-  NEON_BSWAP,
-  NEON_GETLANE,
-  NEON_SETLANE,
-  NEON_CREATE,
-  NEON_RINT,
-  NEON_COPYSIGNF,
-  NEON_DUP,
-  NEON_DUPLANE,
-  NEON_COMBINE,
-  NEON_SPLIT,
-  NEON_LANEMUL,
-  NEON_LANEMULL,
-  NEON_LANEMULH,
-  NEON_LANEMAC,
-  NEON_SCALARMUL,
-  NEON_SCALARMULL,
-  NEON_SCALARMULH,
-  NEON_SCALARMAC,
-  NEON_CONVERT,
-  NEON_FLOAT_WIDEN,
-  NEON_FLOAT_NARROW,
-  NEON_FIXCONV,
-  NEON_SELECT,
-  NEON_REINTERP,
-  NEON_VTBL,
-  NEON_VTBX,
-  NEON_LOAD1,
-  NEON_LOAD1LANE,
-  NEON_STORE1,
-  NEON_STORE1LANE,
-  NEON_LOADSTRUCT,
-  NEON_LOADSTRUCTLANE,
-  NEON_STORESTRUCT,
-  NEON_STORESTRUCTLANE,
-  NEON_LOGICBINOP,
-  NEON_SHIFTINSERT,
-  NEON_SHIFTIMM,
-  NEON_SHIFTACC
-} neon_itype;
-
-typedef struct {
-  const char *name;
-  const neon_itype itype;
-  const neon_builtin_type_mode mode;
-  const enum insn_code code;
-  unsigned int fcode;
-} neon_builtin_datum;
-
-#define CF(N,X) CODE_FOR_neon_##N##X
-
-#define VAR1(T, N, A) \
-  {#N, NEON_##T, UP (A), CF (N, A), 0}
-#define VAR2(T, N, A, B) \
-  VAR1 (T, N, A), \
-  {#N, NEON_##T, UP (B), CF (N, B), 0}
-#define VAR3(T, N, A, B, C) \
-  VAR2 (T, N, A, B), \
-  {#N, NEON_##T, UP (C), CF (N, C), 0}
-#define VAR4(T, N, A, B, C, D) \
-  VAR3 (T, N, A, B, C), \
-  {#N, NEON_##T, UP (D), CF (N, D), 0}
-#define VAR5(T, N, A, B, C, D, E) \
-  VAR4 (T, N, A, B, C, D), \
-  {#N, NEON_##T, UP (E), CF (N, E), 0}
-#define VAR6(T, N, A, B, C, D, E, F) \
-  VAR5 (T, N, A, B, C, D, E), \
-  {#N, NEON_##T, UP (F), CF (N, F), 0}
-#define VAR7(T, N, A, B, C, D, E, F, G) \
-  VAR6 (T, N, A, B, C, D, E, F), \
-  {#N, NEON_##T, UP (G), CF (N, G), 0}
-#define VAR8(T, N, A, B, C, D, E, F, G, H) \
-  VAR7 (T, N, A, B, C, D, E, F, G), \
-  {#N, NEON_##T, UP (H), CF (N, H), 0}
-#define VAR9(T, N, A, B, C, D, E, F, G, H, I) \
-  VAR8 (T, N, A, B, C, D, E, F, G, H), \
-  {#N, NEON_##T, UP (I), CF (N, I), 0}
-#define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
-  VAR9 (T, N, A, B, C, D, E, F, G, H, I), \
-  {#N, NEON_##T, UP (J), CF (N, J), 0}
-
-/* The NEON builtin data can be found in arm_neon_builtins.def.
-   The mode entries in the following table correspond to the "key" type of the
-   instruction variant, i.e. equivalent to that which would be specified after
-   the assembler mnemonic, which usually refers to the last vector operand.
-   (Signed/unsigned/polynomial types are not differentiated between though, and
-   are all mapped onto the same mode for a given element size.) The modes
-   listed per instruction should be the same as those defined for that
-   instruction's pattern in neon.md.  */
-
-static neon_builtin_datum neon_builtin_data[] =
-{
-#include "arm_neon_builtins.def"
-};
-
-#undef CF
-#undef VAR1
-#undef VAR2
-#undef VAR3
-#undef VAR4
-#undef VAR5
-#undef VAR6
-#undef VAR7
-#undef VAR8
-#undef VAR9
-#undef VAR10
-
-#define CF(N,X) ARM_BUILTIN_NEON_##N##X
-#define VAR1(T, N, A) \
-  CF (N, A)
-#define VAR2(T, N, A, B) \
-  VAR1 (T, N, A), \
-  CF (N, B)
-#define VAR3(T, N, A, B, C) \
-  VAR2 (T, N, A, B), \
-  CF (N, C)
-#define VAR4(T, N, A, B, C, D) \
-  VAR3 (T, N, A, B, C), \
-  CF (N, D)
-#define VAR5(T, N, A, B, C, D, E) \
-  VAR4 (T, N, A, B, C, D), \
-  CF (N, E)
-#define VAR6(T, N, A, B, C, D, E, F) \
-  VAR5 (T, N, A, B, C, D, E), \
-  CF (N, F)
-#define VAR7(T, N, A, B, C, D, E, F, G) \
-  VAR6 (T, N, A, B, C, D, E, F), \
-  CF (N, G)
-#define VAR8(T, N, A, B, C, D, E, F, G, H) \
-  VAR7 (T, N, A, B, C, D, E, F, G), \
-  CF (N, H)
-#define VAR9(T, N, A, B, C, D, E, F, G, H, I) \
-  VAR8 (T, N, A, B, C, D, E, F, G, H), \
-  CF (N, I)
-#define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
-  VAR9 (T, N, A, B, C, D, E, F, G, H, I), \
-  CF (N, J)
-enum arm_builtins
-{
-  ARM_BUILTIN_GETWCGR0,
-  ARM_BUILTIN_GETWCGR1,
-  ARM_BUILTIN_GETWCGR2,
-  ARM_BUILTIN_GETWCGR3,
-
-  ARM_BUILTIN_SETWCGR0,
-  ARM_BUILTIN_SETWCGR1,
-  ARM_BUILTIN_SETWCGR2,
-  ARM_BUILTIN_SETWCGR3,
-
-  ARM_BUILTIN_WZERO,
-
-  ARM_BUILTIN_WAVG2BR,
-  ARM_BUILTIN_WAVG2HR,
-  ARM_BUILTIN_WAVG2B,
-  ARM_BUILTIN_WAVG2H,
-
-  ARM_BUILTIN_WACCB,
-  ARM_BUILTIN_WACCH,
-  ARM_BUILTIN_WACCW,
-
-  ARM_BUILTIN_WMACS,
-  ARM_BUILTIN_WMACSZ,
-  ARM_BUILTIN_WMACU,
-  ARM_BUILTIN_WMACUZ,
-
-  ARM_BUILTIN_WSADB,
-  ARM_BUILTIN_WSADBZ,
-  ARM_BUILTIN_WSADH,
-  ARM_BUILTIN_WSADHZ,
-
-  ARM_BUILTIN_WALIGNI,
-  ARM_BUILTIN_WALIGNR0,
-  ARM_BUILTIN_WALIGNR1,
-  ARM_BUILTIN_WALIGNR2,
-  ARM_BUILTIN_WALIGNR3,
-
-  ARM_BUILTIN_TMIA,
-  ARM_BUILTIN_TMIAPH,
-  ARM_BUILTIN_TMIABB,
-  ARM_BUILTIN_TMIABT,
-  ARM_BUILTIN_TMIATB,
-  ARM_BUILTIN_TMIATT,
-
-  ARM_BUILTIN_TMOVMSKB,
-  ARM_BUILTIN_TMOVMSKH,
-  ARM_BUILTIN_TMOVMSKW,
-
-  ARM_BUILTIN_TBCSTB,
-  ARM_BUILTIN_TBCSTH,
-  ARM_BUILTIN_TBCSTW,
-
-  ARM_BUILTIN_WMADDS,
-  ARM_BUILTIN_WMADDU,
-
-  ARM_BUILTIN_WPACKHSS,
-  ARM_BUILTIN_WPACKWSS,
-  ARM_BUILTIN_WPACKDSS,
-  ARM_BUILTIN_WPACKHUS,
-  ARM_BUILTIN_WPACKWUS,
-  ARM_BUILTIN_WPACKDUS,
-
-  ARM_BUILTIN_WADDB,
-  ARM_BUILTIN_WADDH,
-  ARM_BUILTIN_WADDW,
-  ARM_BUILTIN_WADDSSB,
-  ARM_BUILTIN_WADDSSH,
-  ARM_BUILTIN_WADDSSW,
-  ARM_BUILTIN_WADDUSB,
-  ARM_BUILTIN_WADDUSH,
-  ARM_BUILTIN_WADDUSW,
-  ARM_BUILTIN_WSUBB,
-  ARM_BUILTIN_WSUBH,
-  ARM_BUILTIN_WSUBW,
-  ARM_BUILTIN_WSUBSSB,
-  ARM_BUILTIN_WSUBSSH,
-  ARM_BUILTIN_WSUBSSW,
-  ARM_BUILTIN_WSUBUSB,
-  ARM_BUILTIN_WSUBUSH,
-  ARM_BUILTIN_WSUBUSW,
-
-  ARM_BUILTIN_WAND,
-  ARM_BUILTIN_WANDN,
-  ARM_BUILTIN_WOR,
-  ARM_BUILTIN_WXOR,
-
-  ARM_BUILTIN_WCMPEQB,
-  ARM_BUILTIN_WCMPEQH,
-  ARM_BUILTIN_WCMPEQW,
-  ARM_BUILTIN_WCMPGTUB,
-  ARM_BUILTIN_WCMPGTUH,
-  ARM_BUILTIN_WCMPGTUW,
-  ARM_BUILTIN_WCMPGTSB,
-  ARM_BUILTIN_WCMPGTSH,
-  ARM_BUILTIN_WCMPGTSW,
-
-  ARM_BUILTIN_TEXTRMSB,
-  ARM_BUILTIN_TEXTRMSH,
-  ARM_BUILTIN_TEXTRMSW,
-  ARM_BUILTIN_TEXTRMUB,
-  ARM_BUILTIN_TEXTRMUH,
-  ARM_BUILTIN_TEXTRMUW,
-  ARM_BUILTIN_TINSRB,
-  ARM_BUILTIN_TINSRH,
-  ARM_BUILTIN_TINSRW,
-
-  ARM_BUILTIN_WMAXSW,
-  ARM_BUILTIN_WMAXSH,
-  ARM_BUILTIN_WMAXSB,
-  ARM_BUILTIN_WMAXUW,
-  ARM_BUILTIN_WMAXUH,
-  ARM_BUILTIN_WMAXUB,
-  ARM_BUILTIN_WMINSW,
-  ARM_BUILTIN_WMINSH,
-  ARM_BUILTIN_WMINSB,
-  ARM_BUILTIN_WMINUW,
-  ARM_BUILTIN_WMINUH,
-  ARM_BUILTIN_WMINUB,
-
-  ARM_BUILTIN_WMULUM,
-  ARM_BUILTIN_WMULSM,
-  ARM_BUILTIN_WMULUL,
-
-  ARM_BUILTIN_PSADBH,
-  ARM_BUILTIN_WSHUFH,
-
-  ARM_BUILTIN_WSLLH,
-  ARM_BUILTIN_WSLLW,
-  ARM_BUILTIN_WSLLD,
-  ARM_BUILTIN_WSRAH,
-  ARM_BUILTIN_WSRAW,
-  ARM_BUILTIN_WSRAD,
-  ARM_BUILTIN_WSRLH,
-  ARM_BUILTIN_WSRLW,
-  ARM_BUILTIN_WSRLD,
-  ARM_BUILTIN_WRORH,
-  ARM_BUILTIN_WRORW,
-  ARM_BUILTIN_WRORD,
-  ARM_BUILTIN_WSLLHI,
-  ARM_BUILTIN_WSLLWI,
-  ARM_BUILTIN_WSLLDI,
-  ARM_BUILTIN_WSRAHI,
-  ARM_BUILTIN_WSRAWI,
-  ARM_BUILTIN_WSRADI,
-  ARM_BUILTIN_WSRLHI,
-  ARM_BUILTIN_WSRLWI,
-  ARM_BUILTIN_WSRLDI,
-  ARM_BUILTIN_WRORHI,
-  ARM_BUILTIN_WRORWI,
-  ARM_BUILTIN_WRORDI,
-
-  ARM_BUILTIN_WUNPCKIHB,
-  ARM_BUILTIN_WUNPCKIHH,
-  ARM_BUILTIN_WUNPCKIHW,
-  ARM_BUILTIN_WUNPCKILB,
-  ARM_BUILTIN_WUNPCKILH,
-  ARM_BUILTIN_WUNPCKILW,
-
-  ARM_BUILTIN_WUNPCKEHSB,
-  ARM_BUILTIN_WUNPCKEHSH,
-  ARM_BUILTIN_WUNPCKEHSW,
-  ARM_BUILTIN_WUNPCKEHUB,
-  ARM_BUILTIN_WUNPCKEHUH,
-  ARM_BUILTIN_WUNPCKEHUW,
-  ARM_BUILTIN_WUNPCKELSB,
-  ARM_BUILTIN_WUNPCKELSH,
-  ARM_BUILTIN_WUNPCKELSW,
-  ARM_BUILTIN_WUNPCKELUB,
-  ARM_BUILTIN_WUNPCKELUH,
-  ARM_BUILTIN_WUNPCKELUW,
-
-  ARM_BUILTIN_WABSB,
-  ARM_BUILTIN_WABSH,
-  ARM_BUILTIN_WABSW,
-
-  ARM_BUILTIN_WADDSUBHX,
-  ARM_BUILTIN_WSUBADDHX,
-
-  ARM_BUILTIN_WABSDIFFB,
-  ARM_BUILTIN_WABSDIFFH,
-  ARM_BUILTIN_WABSDIFFW,
-
-  ARM_BUILTIN_WADDCH,
-  ARM_BUILTIN_WADDCW,
-
-  ARM_BUILTIN_WAVG4,
-  ARM_BUILTIN_WAVG4R,
-
-  ARM_BUILTIN_WMADDSX,
-  ARM_BUILTIN_WMADDUX,
-
-  ARM_BUILTIN_WMADDSN,
-  ARM_BUILTIN_WMADDUN,
-
-  ARM_BUILTIN_WMULWSM,
-  ARM_BUILTIN_WMULWUM,
-
-  ARM_BUILTIN_WMULWSMR,
-  ARM_BUILTIN_WMULWUMR,
-
-  ARM_BUILTIN_WMULWL,
-
-  ARM_BUILTIN_WMULSMR,
-  ARM_BUILTIN_WMULUMR,
-
-  ARM_BUILTIN_WQMULM,
-  ARM_BUILTIN_WQMULMR,
-
-  ARM_BUILTIN_WQMULWM,
-  ARM_BUILTIN_WQMULWMR,
-
-  ARM_BUILTIN_WADDBHUSM,
-  ARM_BUILTIN_WADDBHUSL,
-
-  ARM_BUILTIN_WQMIABB,
-  ARM_BUILTIN_WQMIABT,
-  ARM_BUILTIN_WQMIATB,
-  ARM_BUILTIN_WQMIATT,
-
-  ARM_BUILTIN_WQMIABBN,
-  ARM_BUILTIN_WQMIABTN,
-  ARM_BUILTIN_WQMIATBN,
-  ARM_BUILTIN_WQMIATTN,
-
-  ARM_BUILTIN_WMIABB,
-  ARM_BUILTIN_WMIABT,
-  ARM_BUILTIN_WMIATB,
-  ARM_BUILTIN_WMIATT,
-
-  ARM_BUILTIN_WMIABBN,
-  ARM_BUILTIN_WMIABTN,
-  ARM_BUILTIN_WMIATBN,
-  ARM_BUILTIN_WMIATTN,
-
-  ARM_BUILTIN_WMIAWBB,
-  ARM_BUILTIN_WMIAWBT,
-  ARM_BUILTIN_WMIAWTB,
-  ARM_BUILTIN_WMIAWTT,
-
-  ARM_BUILTIN_WMIAWBBN,
-  ARM_BUILTIN_WMIAWBTN,
-  ARM_BUILTIN_WMIAWTBN,
-  ARM_BUILTIN_WMIAWTTN,
-
-  ARM_BUILTIN_WMERGE,
-
-  ARM_BUILTIN_CRC32B,
-  ARM_BUILTIN_CRC32H,
-  ARM_BUILTIN_CRC32W,
-  ARM_BUILTIN_CRC32CB,
-  ARM_BUILTIN_CRC32CH,
-  ARM_BUILTIN_CRC32CW,
-
-  ARM_BUILTIN_GET_FPSCR,
-  ARM_BUILTIN_SET_FPSCR,
-
-#undef CRYPTO1
-#undef CRYPTO2
-#undef CRYPTO3
-
-#define CRYPTO1(L, U, M1, M2) \
-  ARM_BUILTIN_CRYPTO_##U,
-#define CRYPTO2(L, U, M1, M2, M3) \
-  ARM_BUILTIN_CRYPTO_##U,
-#define CRYPTO3(L, U, M1, M2, M3, M4) \
-  ARM_BUILTIN_CRYPTO_##U,
-
-#include "crypto.def"
-
-#undef CRYPTO1
-#undef CRYPTO2
-#undef CRYPTO3
-
-#include "arm_neon_builtins.def"
-
-  ,ARM_BUILTIN_MAX
-};
-
-#define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
-
-#undef CF
-#undef VAR1
-#undef VAR2
-#undef VAR3
-#undef VAR4
-#undef VAR5
-#undef VAR6
-#undef VAR7
-#undef VAR8
-#undef VAR9
-#undef VAR10
-
-static GTY(()) tree arm_builtin_decls[ARM_BUILTIN_MAX];
-
-#define NUM_DREG_TYPES 5
-#define NUM_QREG_TYPES 6
-
-static void
-arm_init_neon_builtins (void)
-{
-  unsigned int i, fcode;
-  tree decl;
-
-  tree neon_intQI_type_node;
-  tree neon_intHI_type_node;
-  tree neon_floatHF_type_node;
-  tree neon_polyQI_type_node;
-  tree neon_polyHI_type_node;
-  tree neon_intSI_type_node;
-  tree neon_intDI_type_node;
-  tree neon_intUTI_type_node;
-  tree neon_float_type_node;
-
-  tree intQI_pointer_node;
-  tree intHI_pointer_node;
-  tree intSI_pointer_node;
-  tree intDI_pointer_node;
-  tree float_pointer_node;
-
-  tree const_intQI_node;
-  tree const_intHI_node;
-  tree const_intSI_node;
-  tree const_intDI_node;
-  tree const_float_node;
-
-  tree const_intQI_pointer_node;
-  tree const_intHI_pointer_node;
-  tree const_intSI_pointer_node;
-  tree const_intDI_pointer_node;
-  tree const_float_pointer_node;
-
-  tree V8QI_type_node;
-  tree V4HI_type_node;
-  tree V4UHI_type_node;
-  tree V4HF_type_node;
-  tree V2SI_type_node;
-  tree V2USI_type_node;
-  tree V2SF_type_node;
-  tree V16QI_type_node;
-  tree V8HI_type_node;
-  tree V8UHI_type_node;
-  tree V4SI_type_node;
-  tree V4USI_type_node;
-  tree V4SF_type_node;
-  tree V2DI_type_node;
-  tree V2UDI_type_node;
-
-  tree intUQI_type_node;
-  tree intUHI_type_node;
-  tree intUSI_type_node;
-  tree intUDI_type_node;
-
-  tree intEI_type_node;
-  tree intOI_type_node;
-  tree intCI_type_node;
-  tree intXI_type_node;
-
-  tree reinterp_ftype_dreg[NUM_DREG_TYPES][NUM_DREG_TYPES];
-  tree reinterp_ftype_qreg[NUM_QREG_TYPES][NUM_QREG_TYPES];
-  tree dreg_types[NUM_DREG_TYPES], qreg_types[NUM_QREG_TYPES];
-
-  /* Create distinguished type nodes for NEON vector element types,
-     and pointers to values of such types, so we can detect them later.  */
-  neon_intQI_type_node = make_signed_type (GET_MODE_PRECISION (QImode));
-  neon_intHI_type_node = make_signed_type (GET_MODE_PRECISION (HImode));
-  neon_polyQI_type_node = make_signed_type (GET_MODE_PRECISION (QImode));
-  neon_polyHI_type_node = make_signed_type (GET_MODE_PRECISION (HImode));
-  neon_intSI_type_node = make_signed_type (GET_MODE_PRECISION (SImode));
-  neon_intDI_type_node = make_signed_type (GET_MODE_PRECISION (DImode));
-  neon_float_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (neon_float_type_node) = FLOAT_TYPE_SIZE;
-  layout_type (neon_float_type_node);
-  neon_floatHF_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (neon_floatHF_type_node) = GET_MODE_PRECISION (HFmode);
-  layout_type (neon_floatHF_type_node);
-
-  /* Define typedefs which exactly correspond to the modes we are basing vector
-     types on.  If you change these names you'll need to change
-     the table used by arm_mangle_type too.  */
-  (*lang_hooks.types.register_builtin_type) (neon_intQI_type_node,
-					     "__builtin_neon_qi");
-  (*lang_hooks.types.register_builtin_type) (neon_intHI_type_node,
-					     "__builtin_neon_hi");
-  (*lang_hooks.types.register_builtin_type) (neon_floatHF_type_node,
-					     "__builtin_neon_hf");
-  (*lang_hooks.types.register_builtin_type) (neon_intSI_type_node,
-					     "__builtin_neon_si");
-  (*lang_hooks.types.register_builtin_type) (neon_float_type_node,
-					     "__builtin_neon_sf");
-  (*lang_hooks.types.register_builtin_type) (neon_intDI_type_node,
-					     "__builtin_neon_di");
-  (*lang_hooks.types.register_builtin_type) (neon_polyQI_type_node,
-					     "__builtin_neon_poly8");
-  (*lang_hooks.types.register_builtin_type) (neon_polyHI_type_node,
-					     "__builtin_neon_poly16");
-
-  intQI_pointer_node = build_pointer_type (neon_intQI_type_node);
-  intHI_pointer_node = build_pointer_type (neon_intHI_type_node);
-  intSI_pointer_node = build_pointer_type (neon_intSI_type_node);
-  intDI_pointer_node = build_pointer_type (neon_intDI_type_node);
-  float_pointer_node = build_pointer_type (neon_float_type_node);
-
-  /* Next create constant-qualified versions of the above types.  */
-  const_intQI_node = build_qualified_type (neon_intQI_type_node,
-					   TYPE_QUAL_CONST);
-  const_intHI_node = build_qualified_type (neon_intHI_type_node,
-					   TYPE_QUAL_CONST);
-  const_intSI_node = build_qualified_type (neon_intSI_type_node,
-					   TYPE_QUAL_CONST);
-  const_intDI_node = build_qualified_type (neon_intDI_type_node,
-					   TYPE_QUAL_CONST);
-  const_float_node = build_qualified_type (neon_float_type_node,
-					   TYPE_QUAL_CONST);
-
-  const_intQI_pointer_node = build_pointer_type (const_intQI_node);
-  const_intHI_pointer_node = build_pointer_type (const_intHI_node);
-  const_intSI_pointer_node = build_pointer_type (const_intSI_node);
-  const_intDI_pointer_node = build_pointer_type (const_intDI_node);
-  const_float_pointer_node = build_pointer_type (const_float_node);
-
-  /* Unsigned integer types for various mode sizes.  */
-  intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode));
-  intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode));
-  intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode));
-  intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode));
-  neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode));
-  /* Now create vector types based on our NEON element types.  */
-  /* 64-bit vectors.  */
-  V8QI_type_node =
-    build_vector_type_for_mode (neon_intQI_type_node, V8QImode);
-  V4HI_type_node =
-    build_vector_type_for_mode (neon_intHI_type_node, V4HImode);
-  V4UHI_type_node =
-    build_vector_type_for_mode (intUHI_type_node, V4HImode);
-  V4HF_type_node =
-    build_vector_type_for_mode (neon_floatHF_type_node, V4HFmode);
-  V2SI_type_node =
-    build_vector_type_for_mode (neon_intSI_type_node, V2SImode);
-  V2USI_type_node =
-    build_vector_type_for_mode (intUSI_type_node, V2SImode);
-  V2SF_type_node =
-    build_vector_type_for_mode (neon_float_type_node, V2SFmode);
-  /* 128-bit vectors.  */
-  V16QI_type_node =
-    build_vector_type_for_mode (neon_intQI_type_node, V16QImode);
-  V8HI_type_node =
-    build_vector_type_for_mode (neon_intHI_type_node, V8HImode);
-  V8UHI_type_node =
-    build_vector_type_for_mode (intUHI_type_node, V8HImode);
-  V4SI_type_node =
-    build_vector_type_for_mode (neon_intSI_type_node, V4SImode);
-  V4USI_type_node =
-    build_vector_type_for_mode (intUSI_type_node, V4SImode);
-  V4SF_type_node =
-    build_vector_type_for_mode (neon_float_type_node, V4SFmode);
-  V2DI_type_node =
-    build_vector_type_for_mode (neon_intDI_type_node, V2DImode);
-  V2UDI_type_node =
-    build_vector_type_for_mode (intUDI_type_node, V2DImode);
-
-
-  (*lang_hooks.types.register_builtin_type) (intUQI_type_node,
-					     "__builtin_neon_uqi");
-  (*lang_hooks.types.register_builtin_type) (intUHI_type_node,
-					     "__builtin_neon_uhi");
-  (*lang_hooks.types.register_builtin_type) (intUSI_type_node,
-					     "__builtin_neon_usi");
-  (*lang_hooks.types.register_builtin_type) (intUDI_type_node,
-					     "__builtin_neon_udi");
-  (*lang_hooks.types.register_builtin_type) (intUDI_type_node,
-					     "__builtin_neon_poly64");
-  (*lang_hooks.types.register_builtin_type) (neon_intUTI_type_node,
-					     "__builtin_neon_poly128");
-
-  /* Opaque integer types for structures of vectors.  */
-  intEI_type_node = make_signed_type (GET_MODE_PRECISION (EImode));
-  intOI_type_node = make_signed_type (GET_MODE_PRECISION (OImode));
-  intCI_type_node = make_signed_type (GET_MODE_PRECISION (CImode));
-  intXI_type_node = make_signed_type (GET_MODE_PRECISION (XImode));
-
-  (*lang_hooks.types.register_builtin_type) (intTI_type_node,
-					     "__builtin_neon_ti");
-  (*lang_hooks.types.register_builtin_type) (intEI_type_node,
-					     "__builtin_neon_ei");
-  (*lang_hooks.types.register_builtin_type) (intOI_type_node,
-					     "__builtin_neon_oi");
-  (*lang_hooks.types.register_builtin_type) (intCI_type_node,
-					     "__builtin_neon_ci");
-  (*lang_hooks.types.register_builtin_type) (intXI_type_node,
-					     "__builtin_neon_xi");
-
-  if (TARGET_CRYPTO && TARGET_HARD_FLOAT)
-  {
-
-    tree V16UQI_type_node =
-      build_vector_type_for_mode (intUQI_type_node, V16QImode);
-
-    tree v16uqi_ftype_v16uqi
-      = build_function_type_list (V16UQI_type_node, V16UQI_type_node, NULL_TREE);
-
-    tree v16uqi_ftype_v16uqi_v16uqi
-      = build_function_type_list (V16UQI_type_node, V16UQI_type_node,
-                                  V16UQI_type_node, NULL_TREE);
-
-    tree v4usi_ftype_v4usi
-      = build_function_type_list (V4USI_type_node, V4USI_type_node, NULL_TREE);
-
-    tree v4usi_ftype_v4usi_v4usi
-      = build_function_type_list (V4USI_type_node, V4USI_type_node,
-                                  V4USI_type_node, NULL_TREE);
-
-    tree v4usi_ftype_v4usi_v4usi_v4usi
-      = build_function_type_list (V4USI_type_node, V4USI_type_node,
-                                  V4USI_type_node, V4USI_type_node, NULL_TREE);
-
-    tree uti_ftype_udi_udi
-      = build_function_type_list (neon_intUTI_type_node, intUDI_type_node,
-                                  intUDI_type_node, NULL_TREE);
-
-    #undef CRYPTO1
-    #undef CRYPTO2
-    #undef CRYPTO3
-    #undef C
-    #undef N
-    #undef CF
-    #undef FT1
-    #undef FT2
-    #undef FT3
-
-    #define C(U) \
-      ARM_BUILTIN_CRYPTO_##U
-    #define N(L) \
-      "__builtin_arm_crypto_"#L
-    #define FT1(R, A) \
-      R##_ftype_##A
-    #define FT2(R, A1, A2) \
-      R##_ftype_##A1##_##A2
-    #define FT3(R, A1, A2, A3) \
-      R##_ftype_##A1##_##A2##_##A3
-    #define CRYPTO1(L, U, R, A) \
-      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT1 (R, A), \
-                                                       C (U), BUILT_IN_MD, \
-                                                       NULL, NULL_TREE);
-    #define CRYPTO2(L, U, R, A1, A2) \
-      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT2 (R, A1, A2), \
-                                                       C (U), BUILT_IN_MD, \
-                                                       NULL, NULL_TREE);
-
-    #define CRYPTO3(L, U, R, A1, A2, A3) \
-      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT3 (R, A1, A2, A3), \
-                                                       C (U), BUILT_IN_MD, \
-                                                       NULL, NULL_TREE);
-    #include "crypto.def"
-
-    #undef CRYPTO1
-    #undef CRYPTO2
-    #undef CRYPTO3
-    #undef C
-    #undef N
-    #undef FT1
-    #undef FT2
-    #undef FT3
-  }
-  dreg_types[0] = V8QI_type_node;
-  dreg_types[1] = V4HI_type_node;
-  dreg_types[2] = V2SI_type_node;
-  dreg_types[3] = V2SF_type_node;
-  dreg_types[4] = neon_intDI_type_node;
-
-  qreg_types[0] = V16QI_type_node;
-  qreg_types[1] = V8HI_type_node;
-  qreg_types[2] = V4SI_type_node;
-  qreg_types[3] = V4SF_type_node;
-  qreg_types[4] = V2DI_type_node;
-  qreg_types[5] = neon_intUTI_type_node;
-
-  for (i = 0; i < NUM_QREG_TYPES; i++)
-    {
-      int j;
-      for (j = 0; j < NUM_QREG_TYPES; j++)
-        {
-          if (i < NUM_DREG_TYPES && j < NUM_DREG_TYPES)
-            reinterp_ftype_dreg[i][j]
-              = build_function_type_list (dreg_types[i], dreg_types[j], NULL);
-
-          reinterp_ftype_qreg[i][j]
-            = build_function_type_list (qreg_types[i], qreg_types[j], NULL);
-        }
-    }
-
-  for (i = 0, fcode = ARM_BUILTIN_NEON_BASE;
-       i < ARRAY_SIZE (neon_builtin_data);
-       i++, fcode++)
-    {
-      neon_builtin_datum *d = &neon_builtin_data[i];
-
-      const char* const modenames[] = {
-	"v8qi", "v4hi", "v4hf", "v2si", "v2sf", "di",
-	"v16qi", "v8hi", "v4si", "v4sf", "v2di",
-	"ti", "ei", "oi"
-      };
-      char namebuf[60];
-      tree ftype = NULL;
-      int is_load = 0, is_store = 0;
-
-      gcc_assert (ARRAY_SIZE (modenames) == T_MAX);
-
-      d->fcode = fcode;
-
-      switch (d->itype)
-	{
-	case NEON_LOAD1:
-	case NEON_LOAD1LANE:
-	case NEON_LOADSTRUCT:
-	case NEON_LOADSTRUCTLANE:
-	  is_load = 1;
-	  /* Fall through.  */
-	case NEON_STORE1:
-	case NEON_STORE1LANE:
-	case NEON_STORESTRUCT:
-	case NEON_STORESTRUCTLANE:
-	  if (!is_load)
-	    is_store = 1;
-	  /* Fall through.  */
-	case NEON_UNOP:
-	case NEON_RINT:
-	case NEON_BINOP:
-	case NEON_LOGICBINOP:
-	case NEON_SHIFTINSERT:
-	case NEON_TERNOP:
-	case NEON_GETLANE:
-	case NEON_SETLANE:
-	case NEON_CREATE:
-	case NEON_DUP:
-	case NEON_DUPLANE:
-	case NEON_SHIFTIMM:
-	case NEON_SHIFTACC:
-	case NEON_COMBINE:
-	case NEON_SPLIT:
-	case NEON_CONVERT:
-	case NEON_FIXCONV:
-	case NEON_LANEMUL:
-	case NEON_LANEMULL:
-	case NEON_LANEMULH:
-	case NEON_LANEMAC:
-	case NEON_SCALARMUL:
-	case NEON_SCALARMULL:
-	case NEON_SCALARMULH:
-	case NEON_SCALARMAC:
-	case NEON_SELECT:
-	case NEON_VTBL:
-	case NEON_VTBX:
-	  {
-	    int k;
-	    tree return_type = void_type_node, args = void_list_node;
-
-	    /* Build a function type directly from the insn_data for
-	       this builtin.  The build_function_type() function takes
-	       care of removing duplicates for us.  */
-	    for (k = insn_data[d->code].n_generator_args - 1; k >= 0; k--)
-	      {
-		tree eltype;
-
-		if (is_load && k == 1)
-		  {
-		    /* Neon load patterns always have the memory
-		       operand in the operand 1 position.  */
-		    gcc_assert (insn_data[d->code].operand[k].predicate
-				== neon_struct_operand);
-
-		    switch (d->mode)
-		      {
-		      case T_V8QI:
-		      case T_V16QI:
-			eltype = const_intQI_pointer_node;
-			break;
-
-		      case T_V4HI:
-		      case T_V8HI:
-			eltype = const_intHI_pointer_node;
-			break;
-
-		      case T_V2SI:
-		      case T_V4SI:
-			eltype = const_intSI_pointer_node;
-			break;
-
-		      case T_V2SF:
-		      case T_V4SF:
-			eltype = const_float_pointer_node;
-			break;
-
-		      case T_DI:
-		      case T_V2DI:
-			eltype = const_intDI_pointer_node;
-			break;
-
-		      default: gcc_unreachable ();
-		      }
-		  }
-		else if (is_store && k == 0)
-		  {
-		    /* Similarly, Neon store patterns use operand 0 as
-		       the memory location to store to.  */
-		    gcc_assert (insn_data[d->code].operand[k].predicate
-				== neon_struct_operand);
-
-		    switch (d->mode)
-		      {
-		      case T_V8QI:
-		      case T_V16QI:
-			eltype = intQI_pointer_node;
-			break;
-
-		      case T_V4HI:
-		      case T_V8HI:
-			eltype = intHI_pointer_node;
-			break;
-
-		      case T_V2SI:
-		      case T_V4SI:
-			eltype = intSI_pointer_node;
-			break;
-
-		      case T_V2SF:
-		      case T_V4SF:
-			eltype = float_pointer_node;
-			break;
-
-		      case T_DI:
-		      case T_V2DI:
-			eltype = intDI_pointer_node;
-			break;
-
-		      default: gcc_unreachable ();
-		      }
-		  }
-		else
-		  {
-		    switch (insn_data[d->code].operand[k].mode)
-		      {
-		      case VOIDmode: eltype = void_type_node; break;
-			/* Scalars.  */
-		      case QImode: eltype = neon_intQI_type_node; break;
-		      case HImode: eltype = neon_intHI_type_node; break;
-		      case SImode: eltype = neon_intSI_type_node; break;
-		      case SFmode: eltype = neon_float_type_node; break;
-		      case DImode: eltype = neon_intDI_type_node; break;
-		      case TImode: eltype = intTI_type_node; break;
-		      case EImode: eltype = intEI_type_node; break;
-		      case OImode: eltype = intOI_type_node; break;
-		      case CImode: eltype = intCI_type_node; break;
-		      case XImode: eltype = intXI_type_node; break;
-			/* 64-bit vectors.  */
-		      case V8QImode: eltype = V8QI_type_node; break;
-		      case V4HImode: eltype = V4HI_type_node; break;
-		      case V2SImode: eltype = V2SI_type_node; break;
-		      case V2SFmode: eltype = V2SF_type_node; break;
-			/* 128-bit vectors.  */
-		      case V16QImode: eltype = V16QI_type_node; break;
-		      case V8HImode: eltype = V8HI_type_node; break;
-		      case V4SImode: eltype = V4SI_type_node; break;
-		      case V4SFmode: eltype = V4SF_type_node; break;
-		      case V2DImode: eltype = V2DI_type_node; break;
-		      default: gcc_unreachable ();
-		      }
-		  }
-
-		if (k == 0 && !is_store)
-		  return_type = eltype;
-		else
-		  args = tree_cons (NULL_TREE, eltype, args);
-	      }
-
-	    ftype = build_function_type (return_type, args);
-	  }
-	  break;
-
-	case NEON_REINTERP:
-	  {
-	    /* We iterate over NUM_DREG_TYPES doubleword types,
-	       then NUM_QREG_TYPES quadword  types.
-	       V4HF is not a type used in reinterpret, so we translate
-	       d->mode to the correct index in reinterp_ftype_dreg.  */
-	    bool qreg_p
-	      = GET_MODE_SIZE (insn_data[d->code].operand[0].mode) > 8;
-	    int rhs = (d->mode - ((!qreg_p && (d->mode > T_V4HF)) ? 1 : 0))
-	              % NUM_QREG_TYPES;
-	    switch (insn_data[d->code].operand[0].mode)
-	      {
-	      case V8QImode: ftype = reinterp_ftype_dreg[0][rhs]; break;
-	      case V4HImode: ftype = reinterp_ftype_dreg[1][rhs]; break;
-	      case V2SImode: ftype = reinterp_ftype_dreg[2][rhs]; break;
-	      case V2SFmode: ftype = reinterp_ftype_dreg[3][rhs]; break;
-	      case DImode: ftype = reinterp_ftype_dreg[4][rhs]; break;
-	      case V16QImode: ftype = reinterp_ftype_qreg[0][rhs]; break;
-	      case V8HImode: ftype = reinterp_ftype_qreg[1][rhs]; break;
-	      case V4SImode: ftype = reinterp_ftype_qreg[2][rhs]; break;
-	      case V4SFmode: ftype = reinterp_ftype_qreg[3][rhs]; break;
-	      case V2DImode: ftype = reinterp_ftype_qreg[4][rhs]; break;
-	      case TImode: ftype = reinterp_ftype_qreg[5][rhs]; break;
-	      default: gcc_unreachable ();
-	      }
-	  }
-	  break;
-	case NEON_FLOAT_WIDEN:
-	  {
-	    tree eltype = NULL_TREE;
-	    tree return_type = NULL_TREE;
-
-	    switch (insn_data[d->code].operand[1].mode)
-	    {
-	      case V4HFmode:
-	        eltype = V4HF_type_node;
-	        return_type = V4SF_type_node;
-	        break;
-	      default: gcc_unreachable ();
-	    }
-	    ftype = build_function_type_list (return_type, eltype, NULL);
-	    break;
-	  }
-	case NEON_FLOAT_NARROW:
-	  {
-	    tree eltype = NULL_TREE;
-	    tree return_type = NULL_TREE;
-
-	    switch (insn_data[d->code].operand[1].mode)
-	    {
-	      case V4SFmode:
-	        eltype = V4SF_type_node;
-	        return_type = V4HF_type_node;
-	        break;
-	      default: gcc_unreachable ();
-	    }
-	    ftype = build_function_type_list (return_type, eltype, NULL);
-	    break;
-	  }
-	case NEON_BSWAP:
-	{
-	    tree eltype = NULL_TREE;
-	    switch (insn_data[d->code].operand[1].mode)
-	    {
-	      case V4HImode:
-	        eltype = V4UHI_type_node;
-	        break;
-	      case V8HImode:
-	        eltype = V8UHI_type_node;
-	        break;
-	      case V2SImode:
-	        eltype = V2USI_type_node;
-	        break;
-	      case V4SImode:
-	        eltype = V4USI_type_node;
-	        break;
-	      case V2DImode:
-	        eltype = V2UDI_type_node;
-	        break;
-	      default: gcc_unreachable ();
-	    }
-	    ftype = build_function_type_list (eltype, eltype, NULL);
-	    break;
-	}
-	case NEON_COPYSIGNF:
-	  {
-	    tree eltype = NULL_TREE;
-	    switch (insn_data[d->code].operand[1].mode)
-	      {
-	      case V2SFmode:
-		eltype = V2SF_type_node;
-		break;
-	      case V4SFmode:
-		eltype = V4SF_type_node;
-		break;
-	      default: gcc_unreachable ();
-	      }
-	    ftype = build_function_type_list (eltype, eltype, NULL);
-	    break;
-	  }
-	default:
-	  gcc_unreachable ();
-	}
-
-      gcc_assert (ftype != NULL);
-
-      sprintf (namebuf, "__builtin_neon_%s%s", d->name, modenames[d->mode]);
-
-      decl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD, NULL,
-				   NULL_TREE);
-      arm_builtin_decls[fcode] = decl;
-    }
-}
-
-#undef NUM_DREG_TYPES
-#undef NUM_QREG_TYPES
-
-#define def_mbuiltin(MASK, NAME, TYPE, CODE)				\
-  do									\
-    {									\
-      if ((MASK) & insn_flags)						\
-	{								\
-	  tree bdecl;							\
-	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
-					BUILT_IN_MD, NULL, NULL_TREE);	\
-	  arm_builtin_decls[CODE] = bdecl;				\
-	}								\
-    }									\
-  while (0)
-
-struct builtin_description
-{
-  const unsigned int       mask;
-  const enum insn_code     icode;
-  const char * const       name;
-  const enum arm_builtins  code;
-  const enum rtx_code      comparison;
-  const unsigned int       flag;
-};
-
-static const struct builtin_description bdesc_2arg[] =
-{
-#define IWMMXT_BUILTIN(code, string, builtin) \
-  { FL_IWMMXT, CODE_FOR_##code, "__builtin_arm_" string, \
-    ARM_BUILTIN_##builtin, UNKNOWN, 0 },
-
-#define IWMMXT2_BUILTIN(code, string, builtin) \
-  { FL_IWMMXT2, CODE_FOR_##code, "__builtin_arm_" string, \
-    ARM_BUILTIN_##builtin, UNKNOWN, 0 },
-
-  IWMMXT_BUILTIN (addv8qi3, "waddb", WADDB)
-  IWMMXT_BUILTIN (addv4hi3, "waddh", WADDH)
-  IWMMXT_BUILTIN (addv2si3, "waddw", WADDW)
-  IWMMXT_BUILTIN (subv8qi3, "wsubb", WSUBB)
-  IWMMXT_BUILTIN (subv4hi3, "wsubh", WSUBH)
-  IWMMXT_BUILTIN (subv2si3, "wsubw", WSUBW)
-  IWMMXT_BUILTIN (ssaddv8qi3, "waddbss", WADDSSB)
-  IWMMXT_BUILTIN (ssaddv4hi3, "waddhss", WADDSSH)
-  IWMMXT_BUILTIN (ssaddv2si3, "waddwss", WADDSSW)
-  IWMMXT_BUILTIN (sssubv8qi3, "wsubbss", WSUBSSB)
-  IWMMXT_BUILTIN (sssubv4hi3, "wsubhss", WSUBSSH)
-  IWMMXT_BUILTIN (sssubv2si3, "wsubwss", WSUBSSW)
-  IWMMXT_BUILTIN (usaddv8qi3, "waddbus", WADDUSB)
-  IWMMXT_BUILTIN (usaddv4hi3, "waddhus", WADDUSH)
-  IWMMXT_BUILTIN (usaddv2si3, "waddwus", WADDUSW)
-  IWMMXT_BUILTIN (ussubv8qi3, "wsubbus", WSUBUSB)
-  IWMMXT_BUILTIN (ussubv4hi3, "wsubhus", WSUBUSH)
-  IWMMXT_BUILTIN (ussubv2si3, "wsubwus", WSUBUSW)
-  IWMMXT_BUILTIN (mulv4hi3, "wmulul", WMULUL)
-  IWMMXT_BUILTIN (smulv4hi3_highpart, "wmulsm", WMULSM)
-  IWMMXT_BUILTIN (umulv4hi3_highpart, "wmulum", WMULUM)
-  IWMMXT_BUILTIN (eqv8qi3, "wcmpeqb", WCMPEQB)
-  IWMMXT_BUILTIN (eqv4hi3, "wcmpeqh", WCMPEQH)
-  IWMMXT_BUILTIN (eqv2si3, "wcmpeqw", WCMPEQW)
-  IWMMXT_BUILTIN (gtuv8qi3, "wcmpgtub", WCMPGTUB)
-  IWMMXT_BUILTIN (gtuv4hi3, "wcmpgtuh", WCMPGTUH)
-  IWMMXT_BUILTIN (gtuv2si3, "wcmpgtuw", WCMPGTUW)
-  IWMMXT_BUILTIN (gtv8qi3, "wcmpgtsb", WCMPGTSB)
-  IWMMXT_BUILTIN (gtv4hi3, "wcmpgtsh", WCMPGTSH)
-  IWMMXT_BUILTIN (gtv2si3, "wcmpgtsw", WCMPGTSW)
-  IWMMXT_BUILTIN (umaxv8qi3, "wmaxub", WMAXUB)
-  IWMMXT_BUILTIN (smaxv8qi3, "wmaxsb", WMAXSB)
-  IWMMXT_BUILTIN (umaxv4hi3, "wmaxuh", WMAXUH)
-  IWMMXT_BUILTIN (smaxv4hi3, "wmaxsh", WMAXSH)
-  IWMMXT_BUILTIN (umaxv2si3, "wmaxuw", WMAXUW)
-  IWMMXT_BUILTIN (smaxv2si3, "wmaxsw", WMAXSW)
-  IWMMXT_BUILTIN (uminv8qi3, "wminub", WMINUB)
-  IWMMXT_BUILTIN (sminv8qi3, "wminsb", WMINSB)
-  IWMMXT_BUILTIN (uminv4hi3, "wminuh", WMINUH)
-  IWMMXT_BUILTIN (sminv4hi3, "wminsh", WMINSH)
-  IWMMXT_BUILTIN (uminv2si3, "wminuw", WMINUW)
-  IWMMXT_BUILTIN (sminv2si3, "wminsw", WMINSW)
-  IWMMXT_BUILTIN (iwmmxt_anddi3, "wand", WAND)
-  IWMMXT_BUILTIN (iwmmxt_nanddi3, "wandn", WANDN)
-  IWMMXT_BUILTIN (iwmmxt_iordi3, "wor", WOR)
-  IWMMXT_BUILTIN (iwmmxt_xordi3, "wxor", WXOR)
-  IWMMXT_BUILTIN (iwmmxt_uavgv8qi3, "wavg2b", WAVG2B)
-  IWMMXT_BUILTIN (iwmmxt_uavgv4hi3, "wavg2h", WAVG2H)
-  IWMMXT_BUILTIN (iwmmxt_uavgrndv8qi3, "wavg2br", WAVG2BR)
-  IWMMXT_BUILTIN (iwmmxt_uavgrndv4hi3, "wavg2hr", WAVG2HR)
-  IWMMXT_BUILTIN (iwmmxt_wunpckilb, "wunpckilb", WUNPCKILB)
-  IWMMXT_BUILTIN (iwmmxt_wunpckilh, "wunpckilh", WUNPCKILH)
-  IWMMXT_BUILTIN (iwmmxt_wunpckilw, "wunpckilw", WUNPCKILW)
-  IWMMXT_BUILTIN (iwmmxt_wunpckihb, "wunpckihb", WUNPCKIHB)
-  IWMMXT_BUILTIN (iwmmxt_wunpckihh, "wunpckihh", WUNPCKIHH)
-  IWMMXT_BUILTIN (iwmmxt_wunpckihw, "wunpckihw", WUNPCKIHW)
-  IWMMXT2_BUILTIN (iwmmxt_waddsubhx, "waddsubhx", WADDSUBHX)
-  IWMMXT2_BUILTIN (iwmmxt_wsubaddhx, "wsubaddhx", WSUBADDHX)
-  IWMMXT2_BUILTIN (iwmmxt_wabsdiffb, "wabsdiffb", WABSDIFFB)
-  IWMMXT2_BUILTIN (iwmmxt_wabsdiffh, "wabsdiffh", WABSDIFFH)
-  IWMMXT2_BUILTIN (iwmmxt_wabsdiffw, "wabsdiffw", WABSDIFFW)
-  IWMMXT2_BUILTIN (iwmmxt_avg4, "wavg4", WAVG4)
-  IWMMXT2_BUILTIN (iwmmxt_avg4r, "wavg4r", WAVG4R)
-  IWMMXT2_BUILTIN (iwmmxt_wmulwsm, "wmulwsm", WMULWSM)
-  IWMMXT2_BUILTIN (iwmmxt_wmulwum, "wmulwum", WMULWUM)
-  IWMMXT2_BUILTIN (iwmmxt_wmulwsmr, "wmulwsmr", WMULWSMR)
-  IWMMXT2_BUILTIN (iwmmxt_wmulwumr, "wmulwumr", WMULWUMR)
-  IWMMXT2_BUILTIN (iwmmxt_wmulwl, "wmulwl", WMULWL)
-  IWMMXT2_BUILTIN (iwmmxt_wmulsmr, "wmulsmr", WMULSMR)
-  IWMMXT2_BUILTIN (iwmmxt_wmulumr, "wmulumr", WMULUMR)
-  IWMMXT2_BUILTIN (iwmmxt_wqmulm, "wqmulm", WQMULM)
-  IWMMXT2_BUILTIN (iwmmxt_wqmulmr, "wqmulmr", WQMULMR)
-  IWMMXT2_BUILTIN (iwmmxt_wqmulwm, "wqmulwm", WQMULWM)
-  IWMMXT2_BUILTIN (iwmmxt_wqmulwmr, "wqmulwmr", WQMULWMR)
-  IWMMXT_BUILTIN (iwmmxt_walignr0, "walignr0", WALIGNR0)
-  IWMMXT_BUILTIN (iwmmxt_walignr1, "walignr1", WALIGNR1)
-  IWMMXT_BUILTIN (iwmmxt_walignr2, "walignr2", WALIGNR2)
-  IWMMXT_BUILTIN (iwmmxt_walignr3, "walignr3", WALIGNR3)
-
-#define IWMMXT_BUILTIN2(code, builtin) \
-  { FL_IWMMXT, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
-
-#define IWMMXT2_BUILTIN2(code, builtin) \
-  { FL_IWMMXT2, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
-
-  IWMMXT2_BUILTIN2 (iwmmxt_waddbhusm, WADDBHUSM)
-  IWMMXT2_BUILTIN2 (iwmmxt_waddbhusl, WADDBHUSL)
-  IWMMXT_BUILTIN2 (iwmmxt_wpackhss, WPACKHSS)
-  IWMMXT_BUILTIN2 (iwmmxt_wpackwss, WPACKWSS)
-  IWMMXT_BUILTIN2 (iwmmxt_wpackdss, WPACKDSS)
-  IWMMXT_BUILTIN2 (iwmmxt_wpackhus, WPACKHUS)
-  IWMMXT_BUILTIN2 (iwmmxt_wpackwus, WPACKWUS)
-  IWMMXT_BUILTIN2 (iwmmxt_wpackdus, WPACKDUS)
-  IWMMXT_BUILTIN2 (iwmmxt_wmacuz, WMACUZ)
-  IWMMXT_BUILTIN2 (iwmmxt_wmacsz, WMACSZ)
-
-#define FP_BUILTIN(L, U) \
-  {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
-   UNKNOWN, 0},
-
-  FP_BUILTIN (get_fpscr, GET_FPSCR)
-  FP_BUILTIN (set_fpscr, SET_FPSCR)
-#undef FP_BUILTIN
-
-#define CRC32_BUILTIN(L, U) \
-  {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
-   UNKNOWN, 0},
-  CRC32_BUILTIN (crc32b, CRC32B)
-  CRC32_BUILTIN (crc32h, CRC32H)
-  CRC32_BUILTIN (crc32w, CRC32W)
-  CRC32_BUILTIN (crc32cb, CRC32CB)
-  CRC32_BUILTIN (crc32ch, CRC32CH)
-  CRC32_BUILTIN (crc32cw, CRC32CW)
-#undef CRC32_BUILTIN
-
-#define CRYPTO_BUILTIN(L, U) \
-  {0, CODE_FOR_crypto_##L, "__builtin_arm_crypto_"#L, ARM_BUILTIN_CRYPTO_##U, \
-   UNKNOWN, 0},
-#undef CRYPTO1
-#undef CRYPTO2
-#undef CRYPTO3
-#define CRYPTO2(L, U, R, A1, A2) CRYPTO_BUILTIN (L, U)
-#define CRYPTO1(L, U, R, A)
-#define CRYPTO3(L, U, R, A1, A2, A3)
-#include "crypto.def"
-#undef CRYPTO1
-#undef CRYPTO2
-#undef CRYPTO3
-
-};
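
Each IWMMXT_BUILTIN/IWMMXT2_BUILTIN invocation above is simply an
initialiser for a builtin_description.  As a sketch, the first entry
expands to

  { FL_IWMMXT, CODE_FOR_addv8qi3, "__builtin_arm_waddb",
    ARM_BUILTIN_WADDB, UNKNOWN, 0 },

which gives the initialisation and expansion code below everything it
needs to register and emit the builtin without per-builtin boilerplate.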
-
-static const struct builtin_description bdesc_1arg[] =
-{
-  IWMMXT_BUILTIN (iwmmxt_tmovmskb, "tmovmskb", TMOVMSKB)
-  IWMMXT_BUILTIN (iwmmxt_tmovmskh, "tmovmskh", TMOVMSKH)
-  IWMMXT_BUILTIN (iwmmxt_tmovmskw, "tmovmskw", TMOVMSKW)
-  IWMMXT_BUILTIN (iwmmxt_waccb, "waccb", WACCB)
-  IWMMXT_BUILTIN (iwmmxt_wacch, "wacch", WACCH)
-  IWMMXT_BUILTIN (iwmmxt_waccw, "waccw", WACCW)
-  IWMMXT_BUILTIN (iwmmxt_wunpckehub, "wunpckehub", WUNPCKEHUB)
-  IWMMXT_BUILTIN (iwmmxt_wunpckehuh, "wunpckehuh", WUNPCKEHUH)
-  IWMMXT_BUILTIN (iwmmxt_wunpckehuw, "wunpckehuw", WUNPCKEHUW)
-  IWMMXT_BUILTIN (iwmmxt_wunpckehsb, "wunpckehsb", WUNPCKEHSB)
-  IWMMXT_BUILTIN (iwmmxt_wunpckehsh, "wunpckehsh", WUNPCKEHSH)
-  IWMMXT_BUILTIN (iwmmxt_wunpckehsw, "wunpckehsw", WUNPCKEHSW)
-  IWMMXT_BUILTIN (iwmmxt_wunpckelub, "wunpckelub", WUNPCKELUB)
-  IWMMXT_BUILTIN (iwmmxt_wunpckeluh, "wunpckeluh", WUNPCKELUH)
-  IWMMXT_BUILTIN (iwmmxt_wunpckeluw, "wunpckeluw", WUNPCKELUW)
-  IWMMXT_BUILTIN (iwmmxt_wunpckelsb, "wunpckelsb", WUNPCKELSB)
-  IWMMXT_BUILTIN (iwmmxt_wunpckelsh, "wunpckelsh", WUNPCKELSH)
-  IWMMXT_BUILTIN (iwmmxt_wunpckelsw, "wunpckelsw", WUNPCKELSW)
-  IWMMXT2_BUILTIN (iwmmxt_wabsv8qi3, "wabsb", WABSB)
-  IWMMXT2_BUILTIN (iwmmxt_wabsv4hi3, "wabsh", WABSH)
-  IWMMXT2_BUILTIN (iwmmxt_wabsv2si3, "wabsw", WABSW)
-  IWMMXT_BUILTIN (tbcstv8qi, "tbcstb", TBCSTB)
-  IWMMXT_BUILTIN (tbcstv4hi, "tbcsth", TBCSTH)
-  IWMMXT_BUILTIN (tbcstv2si, "tbcstw", TBCSTW)
-
-#define CRYPTO1(L, U, R, A) CRYPTO_BUILTIN (L, U)
-#define CRYPTO2(L, U, R, A1, A2)
-#define CRYPTO3(L, U, R, A1, A2, A3)
-#include "crypto.def"
-#undef CRYPTO1
-#undef CRYPTO2
-#undef CRYPTO3
-};
-
-static const struct builtin_description bdesc_3arg[] =
-{
-#define CRYPTO3(L, U, R, A1, A2, A3) CRYPTO_BUILTIN (L, U)
-#define CRYPTO1(L, U, R, A)
-#define CRYPTO2(L, U, R, A1, A2)
-#include "crypto.def"
-#undef CRYPTO1
-#undef CRYPTO2
-#undef CRYPTO3
-};
-#undef CRYPTO_BUILTIN
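
Redefining CRYPTO1/CRYPTO2/CRYPTO3 before each #include of crypto.def
filters that file down to the entries of one arity.  As a hedged
example, assuming crypto.def carries a binary entry along the lines of
CRYPTO2 (aesd, AESD, ...), the bdesc_2arg include expands it to

  { 0, CODE_FOR_crypto_aesd, "__builtin_arm_crypto_aesd",
    ARM_BUILTIN_CRYPTO_AESD, UNKNOWN, 0 },

while the bdesc_1arg and bdesc_3arg includes expand it to nothing.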
-
-/* Set up all the iWMMXt builtins.  This is not called if
-   TARGET_IWMMXT is zero.  */
-
-static void
-arm_init_iwmmxt_builtins (void)
-{
-  const struct builtin_description * d;
-  size_t i;
-
-  tree V2SI_type_node = build_vector_type_for_mode (intSI_type_node, V2SImode);
-  tree V4HI_type_node = build_vector_type_for_mode (intHI_type_node, V4HImode);
-  tree V8QI_type_node = build_vector_type_for_mode (intQI_type_node, V8QImode);
-
-  tree v8qi_ftype_v8qi_v8qi_int
-    = build_function_type_list (V8QI_type_node,
-				V8QI_type_node, V8QI_type_node,
-				integer_type_node, NULL_TREE);
-  tree v4hi_ftype_v4hi_int
-    = build_function_type_list (V4HI_type_node,
-				V4HI_type_node, integer_type_node, NULL_TREE);
-  tree v2si_ftype_v2si_int
-    = build_function_type_list (V2SI_type_node,
-				V2SI_type_node, integer_type_node, NULL_TREE);
-  tree v2si_ftype_di_di
-    = build_function_type_list (V2SI_type_node,
-				long_long_integer_type_node,
-				long_long_integer_type_node,
-				NULL_TREE);
-  tree di_ftype_di_int
-    = build_function_type_list (long_long_integer_type_node,
-				long_long_integer_type_node,
-				integer_type_node, NULL_TREE);
-  tree di_ftype_di_int_int
-    = build_function_type_list (long_long_integer_type_node,
-				long_long_integer_type_node,
-				integer_type_node,
-				integer_type_node, NULL_TREE);
-  tree int_ftype_v8qi
-    = build_function_type_list (integer_type_node,
-				V8QI_type_node, NULL_TREE);
-  tree int_ftype_v4hi
-    = build_function_type_list (integer_type_node,
-				V4HI_type_node, NULL_TREE);
-  tree int_ftype_v2si
-    = build_function_type_list (integer_type_node,
-				V2SI_type_node, NULL_TREE);
-  tree int_ftype_v8qi_int
-    = build_function_type_list (integer_type_node,
-				V8QI_type_node, integer_type_node, NULL_TREE);
-  tree int_ftype_v4hi_int
-    = build_function_type_list (integer_type_node,
-				V4HI_type_node, integer_type_node, NULL_TREE);
-  tree int_ftype_v2si_int
-    = build_function_type_list (integer_type_node,
-				V2SI_type_node, integer_type_node, NULL_TREE);
-  tree v8qi_ftype_v8qi_int_int
-    = build_function_type_list (V8QI_type_node,
-				V8QI_type_node, integer_type_node,
-				integer_type_node, NULL_TREE);
-  tree v4hi_ftype_v4hi_int_int
-    = build_function_type_list (V4HI_type_node,
-				V4HI_type_node, integer_type_node,
-				integer_type_node, NULL_TREE);
-  tree v2si_ftype_v2si_int_int
-    = build_function_type_list (V2SI_type_node,
-				V2SI_type_node, integer_type_node,
-				integer_type_node, NULL_TREE);
-  /* Miscellaneous.  */
-  tree v8qi_ftype_v4hi_v4hi
-    = build_function_type_list (V8QI_type_node,
-				V4HI_type_node, V4HI_type_node, NULL_TREE);
-  tree v4hi_ftype_v2si_v2si
-    = build_function_type_list (V4HI_type_node,
-				V2SI_type_node, V2SI_type_node, NULL_TREE);
-  tree v8qi_ftype_v4hi_v8qi
-    = build_function_type_list (V8QI_type_node,
-	                        V4HI_type_node, V8QI_type_node, NULL_TREE);
-  tree v2si_ftype_v4hi_v4hi
-    = build_function_type_list (V2SI_type_node,
-				V4HI_type_node, V4HI_type_node, NULL_TREE);
-  tree v2si_ftype_v8qi_v8qi
-    = build_function_type_list (V2SI_type_node,
-				V8QI_type_node, V8QI_type_node, NULL_TREE);
-  tree v4hi_ftype_v4hi_di
-    = build_function_type_list (V4HI_type_node,
-				V4HI_type_node, long_long_integer_type_node,
-				NULL_TREE);
-  tree v2si_ftype_v2si_di
-    = build_function_type_list (V2SI_type_node,
-				V2SI_type_node, long_long_integer_type_node,
-				NULL_TREE);
-  tree di_ftype_void
-    = build_function_type_list (long_long_unsigned_type_node, NULL_TREE);
-  tree int_ftype_void
-    = build_function_type_list (integer_type_node, NULL_TREE);
-  tree di_ftype_v8qi
-    = build_function_type_list (long_long_integer_type_node,
-				V8QI_type_node, NULL_TREE);
-  tree di_ftype_v4hi
-    = build_function_type_list (long_long_integer_type_node,
-				V4HI_type_node, NULL_TREE);
-  tree di_ftype_v2si
-    = build_function_type_list (long_long_integer_type_node,
-				V2SI_type_node, NULL_TREE);
-  tree v2si_ftype_v4hi
-    = build_function_type_list (V2SI_type_node,
-				V4HI_type_node, NULL_TREE);
-  tree v4hi_ftype_v8qi
-    = build_function_type_list (V4HI_type_node,
-				V8QI_type_node, NULL_TREE);
-  tree v8qi_ftype_v8qi
-    = build_function_type_list (V8QI_type_node,
-	                        V8QI_type_node, NULL_TREE);
-  tree v4hi_ftype_v4hi
-    = build_function_type_list (V4HI_type_node,
-	                        V4HI_type_node, NULL_TREE);
-  tree v2si_ftype_v2si
-    = build_function_type_list (V2SI_type_node,
-	                        V2SI_type_node, NULL_TREE);
-
-  tree di_ftype_di_v4hi_v4hi
-    = build_function_type_list (long_long_unsigned_type_node,
-				long_long_unsigned_type_node,
-				V4HI_type_node, V4HI_type_node,
-				NULL_TREE);
-
-  tree di_ftype_v4hi_v4hi
-    = build_function_type_list (long_long_unsigned_type_node,
-				V4HI_type_node, V4HI_type_node,
-				NULL_TREE);
-
-  tree v2si_ftype_v2si_v4hi_v4hi
-    = build_function_type_list (V2SI_type_node,
-                                V2SI_type_node, V4HI_type_node,
-                                V4HI_type_node, NULL_TREE);
-
-  tree v2si_ftype_v2si_v8qi_v8qi
-    = build_function_type_list (V2SI_type_node,
-                                V2SI_type_node, V8QI_type_node,
-                                V8QI_type_node, NULL_TREE);
-
-  tree di_ftype_di_v2si_v2si
-    = build_function_type_list (long_long_unsigned_type_node,
-                                long_long_unsigned_type_node,
-                                V2SI_type_node, V2SI_type_node,
-                                NULL_TREE);
-
-  tree di_ftype_di_di_int
-    = build_function_type_list (long_long_unsigned_type_node,
-                                long_long_unsigned_type_node,
-                                long_long_unsigned_type_node,
-                                integer_type_node, NULL_TREE);
-
-  tree void_ftype_int
-    = build_function_type_list (void_type_node,
-                                integer_type_node, NULL_TREE);
-
-  tree v8qi_ftype_char
-    = build_function_type_list (V8QI_type_node,
-                                signed_char_type_node, NULL_TREE);
-
-  tree v4hi_ftype_short
-    = build_function_type_list (V4HI_type_node,
-                                short_integer_type_node, NULL_TREE);
-
-  tree v2si_ftype_int
-    = build_function_type_list (V2SI_type_node,
-                                integer_type_node, NULL_TREE);
-
-  /* Normal vector binops.  */
-  tree v8qi_ftype_v8qi_v8qi
-    = build_function_type_list (V8QI_type_node,
-				V8QI_type_node, V8QI_type_node, NULL_TREE);
-  tree v4hi_ftype_v4hi_v4hi
-    = build_function_type_list (V4HI_type_node,
-				V4HI_type_node, V4HI_type_node, NULL_TREE);
-  tree v2si_ftype_v2si_v2si
-    = build_function_type_list (V2SI_type_node,
-				V2SI_type_node, V2SI_type_node, NULL_TREE);
-  tree di_ftype_di_di
-    = build_function_type_list (long_long_unsigned_type_node,
-				long_long_unsigned_type_node,
-				long_long_unsigned_type_node,
-				NULL_TREE);
-
-  /* Add all builtins that are more or less simple operations on two
-     operands.  */
-  for (i = 0, d = bdesc_2arg; i < ARRAY_SIZE (bdesc_2arg); i++, d++)
-    {
-      /* Use one of the operands; the target can have a different mode for
-	 mask-generating compares.  */
-      machine_mode mode;
-      tree type;
-
-      if (d->name == 0 || !(d->mask == FL_IWMMXT || d->mask == FL_IWMMXT2))
-	continue;
-
-      mode = insn_data[d->icode].operand[1].mode;
-
-      switch (mode)
-	{
-	case V8QImode:
-	  type = v8qi_ftype_v8qi_v8qi;
-	  break;
-	case V4HImode:
-	  type = v4hi_ftype_v4hi_v4hi;
-	  break;
-	case V2SImode:
-	  type = v2si_ftype_v2si_v2si;
-	  break;
-	case DImode:
-	  type = di_ftype_di_di;
-	  break;
-
-	default:
-	  gcc_unreachable ();
-	}
-
-      def_mbuiltin (d->mask, d->name, type, d->code);
-    }
-
-  /* Add the remaining iWMMXt insns with somewhat more complicated types.  */
-#define iwmmx_mbuiltin(NAME, TYPE, CODE)			\
-  def_mbuiltin (FL_IWMMXT, "__builtin_arm_" NAME, (TYPE),	\
-		ARM_BUILTIN_ ## CODE)
-
-#define iwmmx2_mbuiltin(NAME, TYPE, CODE)                      \
-  def_mbuiltin (FL_IWMMXT2, "__builtin_arm_" NAME, (TYPE),     \
-               ARM_BUILTIN_ ## CODE)
-
-  iwmmx_mbuiltin ("wzero", di_ftype_void, WZERO);
-  iwmmx_mbuiltin ("setwcgr0", void_ftype_int, SETWCGR0);
-  iwmmx_mbuiltin ("setwcgr1", void_ftype_int, SETWCGR1);
-  iwmmx_mbuiltin ("setwcgr2", void_ftype_int, SETWCGR2);
-  iwmmx_mbuiltin ("setwcgr3", void_ftype_int, SETWCGR3);
-  iwmmx_mbuiltin ("getwcgr0", int_ftype_void, GETWCGR0);
-  iwmmx_mbuiltin ("getwcgr1", int_ftype_void, GETWCGR1);
-  iwmmx_mbuiltin ("getwcgr2", int_ftype_void, GETWCGR2);
-  iwmmx_mbuiltin ("getwcgr3", int_ftype_void, GETWCGR3);
-
-  iwmmx_mbuiltin ("wsllh", v4hi_ftype_v4hi_di, WSLLH);
-  iwmmx_mbuiltin ("wsllw", v2si_ftype_v2si_di, WSLLW);
-  iwmmx_mbuiltin ("wslld", di_ftype_di_di, WSLLD);
-  iwmmx_mbuiltin ("wsllhi", v4hi_ftype_v4hi_int, WSLLHI);
-  iwmmx_mbuiltin ("wsllwi", v2si_ftype_v2si_int, WSLLWI);
-  iwmmx_mbuiltin ("wslldi", di_ftype_di_int, WSLLDI);
-
-  iwmmx_mbuiltin ("wsrlh", v4hi_ftype_v4hi_di, WSRLH);
-  iwmmx_mbuiltin ("wsrlw", v2si_ftype_v2si_di, WSRLW);
-  iwmmx_mbuiltin ("wsrld", di_ftype_di_di, WSRLD);
-  iwmmx_mbuiltin ("wsrlhi", v4hi_ftype_v4hi_int, WSRLHI);
-  iwmmx_mbuiltin ("wsrlwi", v2si_ftype_v2si_int, WSRLWI);
-  iwmmx_mbuiltin ("wsrldi", di_ftype_di_int, WSRLDI);
-
-  iwmmx_mbuiltin ("wsrah", v4hi_ftype_v4hi_di, WSRAH);
-  iwmmx_mbuiltin ("wsraw", v2si_ftype_v2si_di, WSRAW);
-  iwmmx_mbuiltin ("wsrad", di_ftype_di_di, WSRAD);
-  iwmmx_mbuiltin ("wsrahi", v4hi_ftype_v4hi_int, WSRAHI);
-  iwmmx_mbuiltin ("wsrawi", v2si_ftype_v2si_int, WSRAWI);
-  iwmmx_mbuiltin ("wsradi", di_ftype_di_int, WSRADI);
-
-  iwmmx_mbuiltin ("wrorh", v4hi_ftype_v4hi_di, WRORH);
-  iwmmx_mbuiltin ("wrorw", v2si_ftype_v2si_di, WRORW);
-  iwmmx_mbuiltin ("wrord", di_ftype_di_di, WRORD);
-  iwmmx_mbuiltin ("wrorhi", v4hi_ftype_v4hi_int, WRORHI);
-  iwmmx_mbuiltin ("wrorwi", v2si_ftype_v2si_int, WRORWI);
-  iwmmx_mbuiltin ("wrordi", di_ftype_di_int, WRORDI);
-
-  iwmmx_mbuiltin ("wshufh", v4hi_ftype_v4hi_int, WSHUFH);
-
-  iwmmx_mbuiltin ("wsadb", v2si_ftype_v2si_v8qi_v8qi, WSADB);
-  iwmmx_mbuiltin ("wsadh", v2si_ftype_v2si_v4hi_v4hi, WSADH);
-  iwmmx_mbuiltin ("wmadds", v2si_ftype_v4hi_v4hi, WMADDS);
-  iwmmx2_mbuiltin ("wmaddsx", v2si_ftype_v4hi_v4hi, WMADDSX);
-  iwmmx2_mbuiltin ("wmaddsn", v2si_ftype_v4hi_v4hi, WMADDSN);
-  iwmmx_mbuiltin ("wmaddu", v2si_ftype_v4hi_v4hi, WMADDU);
-  iwmmx2_mbuiltin ("wmaddux", v2si_ftype_v4hi_v4hi, WMADDUX);
-  iwmmx2_mbuiltin ("wmaddun", v2si_ftype_v4hi_v4hi, WMADDUN);
-  iwmmx_mbuiltin ("wsadbz", v2si_ftype_v8qi_v8qi, WSADBZ);
-  iwmmx_mbuiltin ("wsadhz", v2si_ftype_v4hi_v4hi, WSADHZ);
-
-  iwmmx_mbuiltin ("textrmsb", int_ftype_v8qi_int, TEXTRMSB);
-  iwmmx_mbuiltin ("textrmsh", int_ftype_v4hi_int, TEXTRMSH);
-  iwmmx_mbuiltin ("textrmsw", int_ftype_v2si_int, TEXTRMSW);
-  iwmmx_mbuiltin ("textrmub", int_ftype_v8qi_int, TEXTRMUB);
-  iwmmx_mbuiltin ("textrmuh", int_ftype_v4hi_int, TEXTRMUH);
-  iwmmx_mbuiltin ("textrmuw", int_ftype_v2si_int, TEXTRMUW);
-  iwmmx_mbuiltin ("tinsrb", v8qi_ftype_v8qi_int_int, TINSRB);
-  iwmmx_mbuiltin ("tinsrh", v4hi_ftype_v4hi_int_int, TINSRH);
-  iwmmx_mbuiltin ("tinsrw", v2si_ftype_v2si_int_int, TINSRW);
-
-  iwmmx_mbuiltin ("waccb", di_ftype_v8qi, WACCB);
-  iwmmx_mbuiltin ("wacch", di_ftype_v4hi, WACCH);
-  iwmmx_mbuiltin ("waccw", di_ftype_v2si, WACCW);
-
-  iwmmx_mbuiltin ("tmovmskb", int_ftype_v8qi, TMOVMSKB);
-  iwmmx_mbuiltin ("tmovmskh", int_ftype_v4hi, TMOVMSKH);
-  iwmmx_mbuiltin ("tmovmskw", int_ftype_v2si, TMOVMSKW);
-
-  iwmmx2_mbuiltin ("waddbhusm", v8qi_ftype_v4hi_v8qi, WADDBHUSM);
-  iwmmx2_mbuiltin ("waddbhusl", v8qi_ftype_v4hi_v8qi, WADDBHUSL);
-
-  iwmmx_mbuiltin ("wpackhss", v8qi_ftype_v4hi_v4hi, WPACKHSS);
-  iwmmx_mbuiltin ("wpackhus", v8qi_ftype_v4hi_v4hi, WPACKHUS);
-  iwmmx_mbuiltin ("wpackwus", v4hi_ftype_v2si_v2si, WPACKWUS);
-  iwmmx_mbuiltin ("wpackwss", v4hi_ftype_v2si_v2si, WPACKWSS);
-  iwmmx_mbuiltin ("wpackdus", v2si_ftype_di_di, WPACKDUS);
-  iwmmx_mbuiltin ("wpackdss", v2si_ftype_di_di, WPACKDSS);
-
-  iwmmx_mbuiltin ("wunpckehub", v4hi_ftype_v8qi, WUNPCKEHUB);
-  iwmmx_mbuiltin ("wunpckehuh", v2si_ftype_v4hi, WUNPCKEHUH);
-  iwmmx_mbuiltin ("wunpckehuw", di_ftype_v2si, WUNPCKEHUW);
-  iwmmx_mbuiltin ("wunpckehsb", v4hi_ftype_v8qi, WUNPCKEHSB);
-  iwmmx_mbuiltin ("wunpckehsh", v2si_ftype_v4hi, WUNPCKEHSH);
-  iwmmx_mbuiltin ("wunpckehsw", di_ftype_v2si, WUNPCKEHSW);
-  iwmmx_mbuiltin ("wunpckelub", v4hi_ftype_v8qi, WUNPCKELUB);
-  iwmmx_mbuiltin ("wunpckeluh", v2si_ftype_v4hi, WUNPCKELUH);
-  iwmmx_mbuiltin ("wunpckeluw", di_ftype_v2si, WUNPCKELUW);
-  iwmmx_mbuiltin ("wunpckelsb", v4hi_ftype_v8qi, WUNPCKELSB);
-  iwmmx_mbuiltin ("wunpckelsh", v2si_ftype_v4hi, WUNPCKELSH);
-  iwmmx_mbuiltin ("wunpckelsw", di_ftype_v2si, WUNPCKELSW);
-
-  iwmmx_mbuiltin ("wmacs", di_ftype_di_v4hi_v4hi, WMACS);
-  iwmmx_mbuiltin ("wmacsz", di_ftype_v4hi_v4hi, WMACSZ);
-  iwmmx_mbuiltin ("wmacu", di_ftype_di_v4hi_v4hi, WMACU);
-  iwmmx_mbuiltin ("wmacuz", di_ftype_v4hi_v4hi, WMACUZ);
-
-  iwmmx_mbuiltin ("walign", v8qi_ftype_v8qi_v8qi_int, WALIGNI);
-  iwmmx_mbuiltin ("tmia", di_ftype_di_int_int, TMIA);
-  iwmmx_mbuiltin ("tmiaph", di_ftype_di_int_int, TMIAPH);
-  iwmmx_mbuiltin ("tmiabb", di_ftype_di_int_int, TMIABB);
-  iwmmx_mbuiltin ("tmiabt", di_ftype_di_int_int, TMIABT);
-  iwmmx_mbuiltin ("tmiatb", di_ftype_di_int_int, TMIATB);
-  iwmmx_mbuiltin ("tmiatt", di_ftype_di_int_int, TMIATT);
-
-  iwmmx2_mbuiltin ("wabsb", v8qi_ftype_v8qi, WABSB);
-  iwmmx2_mbuiltin ("wabsh", v4hi_ftype_v4hi, WABSH);
-  iwmmx2_mbuiltin ("wabsw", v2si_ftype_v2si, WABSW);
-
-  iwmmx2_mbuiltin ("wqmiabb", v2si_ftype_v2si_v4hi_v4hi, WQMIABB);
-  iwmmx2_mbuiltin ("wqmiabt", v2si_ftype_v2si_v4hi_v4hi, WQMIABT);
-  iwmmx2_mbuiltin ("wqmiatb", v2si_ftype_v2si_v4hi_v4hi, WQMIATB);
-  iwmmx2_mbuiltin ("wqmiatt", v2si_ftype_v2si_v4hi_v4hi, WQMIATT);
-
-  iwmmx2_mbuiltin ("wqmiabbn", v2si_ftype_v2si_v4hi_v4hi, WQMIABBN);
-  iwmmx2_mbuiltin ("wqmiabtn", v2si_ftype_v2si_v4hi_v4hi, WQMIABTN);
-  iwmmx2_mbuiltin ("wqmiatbn", v2si_ftype_v2si_v4hi_v4hi, WQMIATBN);
-  iwmmx2_mbuiltin ("wqmiattn", v2si_ftype_v2si_v4hi_v4hi, WQMIATTN);
-
-  iwmmx2_mbuiltin ("wmiabb", di_ftype_di_v4hi_v4hi, WMIABB);
-  iwmmx2_mbuiltin ("wmiabt", di_ftype_di_v4hi_v4hi, WMIABT);
-  iwmmx2_mbuiltin ("wmiatb", di_ftype_di_v4hi_v4hi, WMIATB);
-  iwmmx2_mbuiltin ("wmiatt", di_ftype_di_v4hi_v4hi, WMIATT);
-
-  iwmmx2_mbuiltin ("wmiabbn", di_ftype_di_v4hi_v4hi, WMIABBN);
-  iwmmx2_mbuiltin ("wmiabtn", di_ftype_di_v4hi_v4hi, WMIABTN);
-  iwmmx2_mbuiltin ("wmiatbn", di_ftype_di_v4hi_v4hi, WMIATBN);
-  iwmmx2_mbuiltin ("wmiattn", di_ftype_di_v4hi_v4hi, WMIATTN);
-
-  iwmmx2_mbuiltin ("wmiawbb", di_ftype_di_v2si_v2si, WMIAWBB);
-  iwmmx2_mbuiltin ("wmiawbt", di_ftype_di_v2si_v2si, WMIAWBT);
-  iwmmx2_mbuiltin ("wmiawtb", di_ftype_di_v2si_v2si, WMIAWTB);
-  iwmmx2_mbuiltin ("wmiawtt", di_ftype_di_v2si_v2si, WMIAWTT);
-
-  iwmmx2_mbuiltin ("wmiawbbn", di_ftype_di_v2si_v2si, WMIAWBBN);
-  iwmmx2_mbuiltin ("wmiawbtn", di_ftype_di_v2si_v2si, WMIAWBTN);
-  iwmmx2_mbuiltin ("wmiawtbn", di_ftype_di_v2si_v2si, WMIAWTBN);
-  iwmmx2_mbuiltin ("wmiawttn", di_ftype_di_v2si_v2si, WMIAWTTN);
-
-  iwmmx2_mbuiltin ("wmerge", di_ftype_di_di_int, WMERGE);
-
-  iwmmx_mbuiltin ("tbcstb", v8qi_ftype_char, TBCSTB);
-  iwmmx_mbuiltin ("tbcsth", v4hi_ftype_short, TBCSTH);
-  iwmmx_mbuiltin ("tbcstw", v2si_ftype_int, TBCSTW);
-
-#undef iwmmx_mbuiltin
-#undef iwmmx2_mbuiltin
-}
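
Once this has run, the builtins registered above are callable by name;
the iWMMXt intrinsics in mmintrin.h are thin wrappers around them.  A
minimal sketch, assuming an iWMMXt-enabled target:

  long long zero = __builtin_arm_wzero ();  /* di_ftype_void, WZERO.  */
  __builtin_arm_setwcgr0 (0);               /* void_ftype_int, SETWCGR0.  */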
-
-static void
-arm_init_fp16_builtins (void)
-{
-  tree fp16_type = make_node (REAL_TYPE);
-  TYPE_PRECISION (fp16_type) = 16;
-  layout_type (fp16_type);
-  (*lang_hooks.types.register_builtin_type) (fp16_type, "__fp16");
-}
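
This registers __fp16 as a storage-only half-precision type: values
are widened to float for arithmetic.  A minimal sketch, assuming the
code is compiled with e.g. -mfp16-format=ieee:

  __fp16 h = 1.0f;    /* Stored in 16 bits.  */
  float  f = h + h;   /* Operands promoted to float.  */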
-
-static void
-arm_init_crc32_builtins ()
-{
-  tree si_ftype_si_qi
-    = build_function_type_list (unsigned_intSI_type_node,
-                                unsigned_intSI_type_node,
-                                unsigned_intQI_type_node, NULL_TREE);
-  tree si_ftype_si_hi
-    = build_function_type_list (unsigned_intSI_type_node,
-                                unsigned_intSI_type_node,
-                                unsigned_intHI_type_node, NULL_TREE);
-  tree si_ftype_si_si
-    = build_function_type_list (unsigned_intSI_type_node,
-                                unsigned_intSI_type_node,
-                                unsigned_intSI_type_node, NULL_TREE);
-
-  arm_builtin_decls[ARM_BUILTIN_CRC32B]
-    = add_builtin_function ("__builtin_arm_crc32b", si_ftype_si_qi,
-                            ARM_BUILTIN_CRC32B, BUILT_IN_MD, NULL, NULL_TREE);
-  arm_builtin_decls[ARM_BUILTIN_CRC32H]
-    = add_builtin_function ("__builtin_arm_crc32h", si_ftype_si_hi,
-                            ARM_BUILTIN_CRC32H, BUILT_IN_MD, NULL, NULL_TREE);
-  arm_builtin_decls[ARM_BUILTIN_CRC32W]
-    = add_builtin_function ("__builtin_arm_crc32w", si_ftype_si_si,
-                            ARM_BUILTIN_CRC32W, BUILT_IN_MD, NULL, NULL_TREE);
-  arm_builtin_decls[ARM_BUILTIN_CRC32CB]
-    = add_builtin_function ("__builtin_arm_crc32cb", si_ftype_si_qi,
-                            ARM_BUILTIN_CRC32CB, BUILT_IN_MD, NULL, NULL_TREE);
-  arm_builtin_decls[ARM_BUILTIN_CRC32CH]
-    = add_builtin_function ("__builtin_arm_crc32ch", si_ftype_si_hi,
-                            ARM_BUILTIN_CRC32CH, BUILT_IN_MD, NULL, NULL_TREE);
-  arm_builtin_decls[ARM_BUILTIN_CRC32CW]
-    = add_builtin_function ("__builtin_arm_crc32cw", si_ftype_si_si,
-                            ARM_BUILTIN_CRC32CW, BUILT_IN_MD, NULL, NULL_TREE);
-}
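
Each CRC32 builtin takes the running CRC and the next chunk of input
and returns the updated CRC.  A minimal usage sketch, assuming a
CRC-capable target (e.g. -march=armv8-a+crc), with byte and word being
suitably typed locals:

  unsigned int crc = 0xffffffffu;
  crc = __builtin_arm_crc32b (crc, byte);  /* si_ftype_si_qi.  */
  crc = __builtin_arm_crc32w (crc, word);  /* si_ftype_si_si.  */

arm_acle.h exposes these as __crc32b, __crc32w and friends.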
-
-static void
-arm_init_builtins (void)
-{
-  if (TARGET_REALLY_IWMMXT)
-    arm_init_iwmmxt_builtins ();
-
-  if (TARGET_NEON)
-    arm_init_neon_builtins ();
-
-  if (arm_fp16_format)
-    arm_init_fp16_builtins ();
-
-  if (TARGET_CRC32)
-    arm_init_crc32_builtins ();
-
-  if (TARGET_VFP && TARGET_HARD_FLOAT)
-    {
-      tree ftype_set_fpscr
-	= build_function_type_list (void_type_node, unsigned_type_node, NULL);
-      tree ftype_get_fpscr
-	= build_function_type_list (unsigned_type_node, NULL);
-
-      arm_builtin_decls[ARM_BUILTIN_GET_FPSCR]
-	= add_builtin_function ("__builtin_arm_ldfscr", ftype_get_fpscr,
-				ARM_BUILTIN_GET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
-      arm_builtin_decls[ARM_BUILTIN_SET_FPSCR]
-	= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
-				ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
-    }
-}
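
The FPSCR pair gives direct access to the VFP status and control
register, used for instance when expanding code that must save and
restore the floating-point environment.  A minimal sketch:

  unsigned int fpscr = __builtin_arm_ldfscr ();
  __builtin_arm_stfscr (fpscr);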
-
-/* Return the ARM builtin for CODE.  */
-
-static tree
-arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
-{
-  if (code >= ARM_BUILTIN_MAX)
-    return error_mark_node;
-
-  return arm_builtin_decls[code];
-}
-
 /* Implement TARGET_INVALID_PARAMETER_TYPE.  */
 
 static const char *
@@ -25050,507 +23281,6 @@ arm_scalar_mode_supported_p (machine_mode mode)
     return default_scalar_mode_supported_p (mode);
 }
 
-/* Errors in the source file can cause expand_expr to return const0_rtx
-   where we expect a vector.  To avoid crashing, use one of the vector
-   clear instructions.  */
-
-static rtx
-safe_vector_operand (rtx x, machine_mode mode)
-{
-  if (x != const0_rtx)
-    return x;
-  x = gen_reg_rtx (mode);
-
-  emit_insn (gen_iwmmxt_clrdi (mode == DImode ? x
-			       : gen_rtx_SUBREG (DImode, x, 0)));
-  return x;
-}
-
-/* Function to expand ternary builtins.  */
-static rtx
-arm_expand_ternop_builtin (enum insn_code icode,
-                           tree exp, rtx target)
-{
-  rtx pat;
-  tree arg0 = CALL_EXPR_ARG (exp, 0);
-  tree arg1 = CALL_EXPR_ARG (exp, 1);
-  tree arg2 = CALL_EXPR_ARG (exp, 2);
-
-  rtx op0 = expand_normal (arg0);
-  rtx op1 = expand_normal (arg1);
-  rtx op2 = expand_normal (arg2);
-  rtx op3 = NULL_RTX;
-
-  /* The sha1c, sha1p, sha1m crypto builtins require a different vec_select
-     lane operand depending on endianness.  */
-  bool builtin_sha1cpm_p = false;
-
-  if (insn_data[icode].n_operands == 5)
-    {
-      gcc_assert (icode == CODE_FOR_crypto_sha1c
-                  || icode == CODE_FOR_crypto_sha1p
-                  || icode == CODE_FOR_crypto_sha1m);
-      builtin_sha1cpm_p = true;
-    }
-  machine_mode tmode = insn_data[icode].operand[0].mode;
-  machine_mode mode0 = insn_data[icode].operand[1].mode;
-  machine_mode mode1 = insn_data[icode].operand[2].mode;
-  machine_mode mode2 = insn_data[icode].operand[3].mode;
-
-  if (VECTOR_MODE_P (mode0))
-    op0 = safe_vector_operand (op0, mode0);
-  if (VECTOR_MODE_P (mode1))
-    op1 = safe_vector_operand (op1, mode1);
-  if (VECTOR_MODE_P (mode2))
-    op2 = safe_vector_operand (op2, mode2);
-
-  if (! target
-      || GET_MODE (target) != tmode
-      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-    target = gen_reg_rtx (tmode);
-
-  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
-	      && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode)
-	      && (GET_MODE (op2) == mode2 || GET_MODE (op2) == VOIDmode));
-
-  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
-    op0 = copy_to_mode_reg (mode0, op0);
-  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
-    op1 = copy_to_mode_reg (mode1, op1);
-  if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
-    op2 = copy_to_mode_reg (mode2, op2);
-  if (builtin_sha1cpm_p)
-    op3 = GEN_INT (TARGET_BIG_END ? 1 : 0);
-
-  if (builtin_sha1cpm_p)
-    pat = GEN_FCN (icode) (target, op0, op1, op2, op3);
-  else
-    pat = GEN_FCN (icode) (target, op0, op1, op2);
-  if (! pat)
-    return 0;
-  emit_insn (pat);
-  return target;
-}
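
The only ternary patterns that carry the extra fifth operand are the
SHA1 ones, whose lane selection depends on endianness.  As a hedged
sketch of how this expander is reached from arm_neon.h, with abcd, e
and wk being suitably typed locals:

  uint32x4_t r = vsha1cq_u32 (abcd, e, wk);
  /* -> __builtin_arm_crypto_sha1c -> CODE_FOR_crypto_sha1c.  */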
-
-/* Subroutine of arm_expand_builtin to take care of binop insns.  */
-
-static rtx
-arm_expand_binop_builtin (enum insn_code icode,
-			  tree exp, rtx target)
-{
-  rtx pat;
-  tree arg0 = CALL_EXPR_ARG (exp, 0);
-  tree arg1 = CALL_EXPR_ARG (exp, 1);
-  rtx op0 = expand_normal (arg0);
-  rtx op1 = expand_normal (arg1);
-  machine_mode tmode = insn_data[icode].operand[0].mode;
-  machine_mode mode0 = insn_data[icode].operand[1].mode;
-  machine_mode mode1 = insn_data[icode].operand[2].mode;
-
-  if (VECTOR_MODE_P (mode0))
-    op0 = safe_vector_operand (op0, mode0);
-  if (VECTOR_MODE_P (mode1))
-    op1 = safe_vector_operand (op1, mode1);
-
-  if (! target
-      || GET_MODE (target) != tmode
-      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-    target = gen_reg_rtx (tmode);
-
-  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
-	      && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
-
-  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
-    op0 = copy_to_mode_reg (mode0, op0);
-  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
-    op1 = copy_to_mode_reg (mode1, op1);
-
-  pat = GEN_FCN (icode) (target, op0, op1);
-  if (! pat)
-    return 0;
-  emit_insn (pat);
-  return target;
-}
-
-/* Subroutine of arm_expand_builtin to take care of unop insns.  */
-
-static rtx
-arm_expand_unop_builtin (enum insn_code icode,
-			 tree exp, rtx target, int do_load)
-{
-  rtx pat;
-  tree arg0 = CALL_EXPR_ARG (exp, 0);
-  rtx op0 = expand_normal (arg0);
-  rtx op1 = NULL_RTX;
-  machine_mode tmode = insn_data[icode].operand[0].mode;
-  machine_mode mode0 = insn_data[icode].operand[1].mode;
-  bool builtin_sha1h_p = false;
-
-  if (insn_data[icode].n_operands == 3)
-    {
-      gcc_assert (icode == CODE_FOR_crypto_sha1h);
-      builtin_sha1h_p = true;
-    }
-
-  if (! target
-      || GET_MODE (target) != tmode
-      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-    target = gen_reg_rtx (tmode);
-  if (do_load)
-    op0 = gen_rtx_MEM (mode0, copy_to_mode_reg (Pmode, op0));
-  else
-    {
-      if (VECTOR_MODE_P (mode0))
-	op0 = safe_vector_operand (op0, mode0);
-
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
-	op0 = copy_to_mode_reg (mode0, op0);
-    }
-  if (builtin_sha1h_p)
-    op1 = GEN_INT (TARGET_BIG_END ? 1 : 0);
-
-  if (builtin_sha1h_p)
-    pat = GEN_FCN (icode) (target, op0, op1);
-  else
-    pat = GEN_FCN (icode) (target, op0);
-  if (! pat)
-    return 0;
-  emit_insn (pat);
-  return target;
-}
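
sha1h is the analogous special case among the unops: its pattern takes
an extra endianness-dependent lane operand.  A hedged sketch, with x a
uint32_t local:

  uint32_t r = vsha1h_u32 (x);  /* -> __builtin_arm_crypto_sha1h.  */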
-
-typedef enum {
-  NEON_ARG_COPY_TO_REG,
-  NEON_ARG_CONSTANT,
-  NEON_ARG_MEMORY,
-  NEON_ARG_STOP
-} builtin_arg;
-
-#define NEON_MAX_BUILTIN_ARGS 5
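
Each Neon builtin describes its arguments with these tags, and the
expander walks the list until it sees NEON_ARG_STOP.  For example, a
lane-setting intrinsic (NEON_SETLANE in the dispatch further down) is
expanded with

  arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
			NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
			NEON_ARG_CONSTANT, NEON_ARG_STOP);

i.e. two register operands plus one immediate lane number.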
-
-/* EXP is a pointer argument to a Neon load or store intrinsic.  Derive
-   and return an expression for the accessed memory.
-
-   The intrinsic function operates on a block of registers that has
-   mode REG_MODE.  This block contains vectors of type TYPE_MODE.  The
-   function references the memory at EXP of type TYPE and in mode
-   MEM_MODE; this mode may be BLKmode if no more suitable mode is
-   available.  */
-
-static tree
-neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
-			  machine_mode reg_mode,
-			  neon_builtin_type_mode type_mode)
-{
-  HOST_WIDE_INT reg_size, vector_size, nvectors, nelems;
-  tree elem_type, upper_bound, array_type;
-
-  /* Work out the size of the register block in bytes.  */
-  reg_size = GET_MODE_SIZE (reg_mode);
-
-  /* Work out the size of each vector in bytes.  */
-  gcc_assert (TYPE_MODE_BIT (type_mode) & (TB_DREG | TB_QREG));
-  vector_size = (TYPE_MODE_BIT (type_mode) & TB_QREG ? 16 : 8);
-
-  /* Work out how many vectors there are.  */
-  gcc_assert (reg_size % vector_size == 0);
-  nvectors = reg_size / vector_size;
-
-  /* Work out the type of each element.  */
-  gcc_assert (POINTER_TYPE_P (type));
-  elem_type = TREE_TYPE (type);
-
-  /* Work out how many elements are being loaded or stored.
-     MEM_MODE == REG_MODE implies a one-to-one mapping between register
-     and memory elements; anything else implies a lane load or store.  */
-  if (mem_mode == reg_mode)
-    nelems = vector_size * nvectors / int_size_in_bytes (elem_type);
-  else
-    nelems = nvectors;
-
-  /* Create a type that describes the full access.  */
-  upper_bound = build_int_cst (size_type_node, nelems - 1);
-  array_type = build_array_type (elem_type, build_index_type (upper_bound));
-
-  /* Dereference EXP using that type.  */
-  return fold_build2 (MEM_REF, array_type, exp,
-		      build_int_cst (build_pointer_type (array_type), 0));
-}
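
As a worked example, consider a hypothetical vld2_u8-style load:
reg_mode covers 16 bytes and each vector is 8 bytes, so nvectors is 2;
with uint8_t elements and mem_mode == reg_mode, nelems is 16, and the
pointer is dereferenced as a uint8_t[16] array, giving the alias
machinery an accurate picture of the access.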
-
-/* Expand the arguments of a Neon builtin and emit the matching pattern.  */
-static rtx
-arm_expand_neon_args (rtx target, int icode, int have_retval,
-		      neon_builtin_type_mode type_mode,
-		      tree exp, int fcode, ...)
-{
-  va_list ap;
-  rtx pat;
-  tree arg[NEON_MAX_BUILTIN_ARGS];
-  rtx op[NEON_MAX_BUILTIN_ARGS];
-  tree arg_type;
-  tree formals;
-  machine_mode tmode = insn_data[icode].operand[0].mode;
-  machine_mode mode[NEON_MAX_BUILTIN_ARGS];
-  machine_mode other_mode;
-  int argc = 0;
-  int opno;
-
-  if (have_retval
-      && (!target
-	  || GET_MODE (target) != tmode
-	  || !(*insn_data[icode].operand[0].predicate) (target, tmode)))
-    target = gen_reg_rtx (tmode);
-
-  va_start (ap, fcode);
-
-  formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode]));
-
-  for (;;)
-    {
-      builtin_arg thisarg = (builtin_arg) va_arg (ap, int);
-
-      if (thisarg == NEON_ARG_STOP)
-        break;
-      else
-        {
-          opno = argc + have_retval;
-          mode[argc] = insn_data[icode].operand[opno].mode;
-          arg[argc] = CALL_EXPR_ARG (exp, argc);
-	  arg_type = TREE_VALUE (formals);
-          if (thisarg == NEON_ARG_MEMORY)
-            {
-              other_mode = insn_data[icode].operand[1 - opno].mode;
-              arg[argc] = neon_dereference_pointer (arg[argc], arg_type,
-						    mode[argc], other_mode,
-						    type_mode);
-            }
-
-	  /* Use EXPAND_MEMORY for NEON_ARG_MEMORY to ensure a MEM_P
-	     is returned.  */
-	  op[argc] = expand_expr (arg[argc], NULL_RTX, VOIDmode,
-				  (thisarg == NEON_ARG_MEMORY
-				   ? EXPAND_MEMORY : EXPAND_NORMAL));
-
-          switch (thisarg)
-            {
-            case NEON_ARG_COPY_TO_REG:
-              /*gcc_assert (GET_MODE (op[argc]) == mode[argc]);*/
-              if (!(*insn_data[icode].operand[opno].predicate)
-                     (op[argc], mode[argc]))
-                op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
-              break;
-
-            case NEON_ARG_CONSTANT:
-              /* FIXME: This error message is somewhat unhelpful.  */
-              if (!(*insn_data[icode].operand[opno].predicate)
-                    (op[argc], mode[argc]))
-		error ("argument must be a constant");
-              break;
-
-            case NEON_ARG_MEMORY:
-	      /* Check if expand failed.  */
-	      if (op[argc] == const0_rtx)
-		return 0;
-	      gcc_assert (MEM_P (op[argc]));
-	      PUT_MODE (op[argc], mode[argc]);
-	      /* ??? arm_neon.h uses the same built-in functions for signed
-		 and unsigned accesses, casting where necessary.  This isn't
-		 alias safe.  */
-	      set_mem_alias_set (op[argc], 0);
-	      if (!(*insn_data[icode].operand[opno].predicate)
-                    (op[argc], mode[argc]))
-		op[argc] = (replace_equiv_address
-			    (op[argc], force_reg (Pmode, XEXP (op[argc], 0))));
-              break;
-
-            case NEON_ARG_STOP:
-              gcc_unreachable ();
-            }
-
-          argc++;
-	  formals = TREE_CHAIN (formals);
-        }
-    }
-
-  va_end (ap);
-
-  if (have_retval)
-    switch (argc)
-      {
-      case 1:
-	pat = GEN_FCN (icode) (target, op[0]);
-	break;
-
-      case 2:
-	pat = GEN_FCN (icode) (target, op[0], op[1]);
-	break;
-
-      case 3:
-	pat = GEN_FCN (icode) (target, op[0], op[1], op[2]);
-	break;
-
-      case 4:
-	pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3]);
-	break;
-
-      case 5:
-	pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4]);
-	break;
-
-      default:
-	gcc_unreachable ();
-      }
-  else
-    switch (argc)
-      {
-      case 1:
-	pat = GEN_FCN (icode) (op[0]);
-	break;
-
-      case 2:
-	pat = GEN_FCN (icode) (op[0], op[1]);
-	break;
-
-      case 3:
-	pat = GEN_FCN (icode) (op[0], op[1], op[2]);
-	break;
-
-      case 4:
-	pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
-	break;
-
-      case 5:
-	pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
-        break;
-
-      default:
-	gcc_unreachable ();
-      }
-
-  if (!pat)
-    return 0;
-
-  emit_insn (pat);
-
-  return target;
-}
-
-/* Expand a Neon builtin.  These builtins are "special" because they don't
-   have symbolic constants defined per instruction or per instruction
-   variant.  Instead, the required information is looked up in the table
-   neon_builtin_data.  */
-static rtx
-arm_expand_neon_builtin (int fcode, tree exp, rtx target)
-{
-  neon_builtin_datum *d = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE];
-  neon_itype itype = d->itype;
-  enum insn_code icode = d->code;
-  neon_builtin_type_mode type_mode = d->mode;
-
-  switch (itype)
-    {
-    case NEON_UNOP:
-    case NEON_CONVERT:
-    case NEON_DUPLANE:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_BINOP:
-    case NEON_LOGICBINOP:
-    case NEON_SCALARMUL:
-    case NEON_SCALARMULL:
-    case NEON_SCALARMULH:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_TERNOP:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_STOP);
-
-    case NEON_GETLANE:
-    case NEON_FIXCONV:
-    case NEON_SHIFTIMM:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_STOP);
-
-    case NEON_CREATE:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_DUP:
-    case NEON_RINT:
-    case NEON_SPLIT:
-    case NEON_FLOAT_WIDEN:
-    case NEON_FLOAT_NARROW:
-    case NEON_BSWAP:
-    case NEON_REINTERP:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_COPYSIGNF:
-    case NEON_COMBINE:
-    case NEON_VTBL:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_LANEMUL:
-    case NEON_LANEMULL:
-    case NEON_LANEMULH:
-    case NEON_SETLANE:
-    case NEON_SHIFTINSERT:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_STOP);
-
-    case NEON_LANEMAC:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_CONSTANT, NEON_ARG_STOP);
-
-    case NEON_SHIFTACC:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_STOP);
-
-    case NEON_SCALARMAC:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-	NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_STOP);
-
-    case NEON_SELECT:
-    case NEON_VTBX:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-	NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_STOP);
-
-    case NEON_LOAD1:
-    case NEON_LOADSTRUCT:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-	NEON_ARG_MEMORY, NEON_ARG_STOP);
-
-    case NEON_LOAD1LANE:
-    case NEON_LOADSTRUCTLANE:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-	NEON_ARG_STOP);
-
-    case NEON_STORE1:
-    case NEON_STORESTRUCT:
-      return arm_expand_neon_args (target, icode, 0, type_mode, exp, fcode,
-	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_STORE1LANE:
-    case NEON_STORESTRUCTLANE:
-      return arm_expand_neon_args (target, icode, 0, type_mode, exp, fcode,
-	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-	NEON_ARG_STOP);
-    }
-
-  gcc_unreachable ();
-}
-
 /* Emit code to reinterpret one Neon type as another, without altering bits.  */
 void
 neon_reinterpret (rtx dest, rtx src)
@@ -25638,550 +23368,6 @@ neon_split_vcombine (rtx operands[3])
 	emit_move_insn (destlo, operands[1]);
     }
 }
-
-/* Expand an expression EXP that calls a built-in function,
-   with result going to TARGET if that's convenient
-   (and in mode MODE if that's convenient).
-   SUBTARGET may be used as the target for computing one of EXP's operands.
-   IGNORE is nonzero if the value is to be ignored.  */
-
-static rtx
-arm_expand_builtin (tree exp,
-		    rtx target,
-		    rtx subtarget ATTRIBUTE_UNUSED,
-		    machine_mode mode ATTRIBUTE_UNUSED,
-		    int ignore ATTRIBUTE_UNUSED)
-{
-  const struct builtin_description * d;
-  enum insn_code    icode;
-  tree              fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
-  tree              arg0;
-  tree              arg1;
-  tree              arg2;
-  rtx               op0;
-  rtx               op1;
-  rtx               op2;
-  rtx               pat;
-  unsigned int      fcode = DECL_FUNCTION_CODE (fndecl);
-  size_t            i;
-  machine_mode tmode;
-  machine_mode mode0;
-  machine_mode mode1;
-  machine_mode mode2;
-  int opint;
-  int selector;
-  int mask;
-  int imm;
-
-  if (fcode >= ARM_BUILTIN_NEON_BASE)
-    return arm_expand_neon_builtin (fcode, exp, target);
-
-  switch (fcode)
-    {
-    case ARM_BUILTIN_GET_FPSCR:
-    case ARM_BUILTIN_SET_FPSCR:
-      if (fcode == ARM_BUILTIN_GET_FPSCR)
-	{
-	  icode = CODE_FOR_get_fpscr;
-	  target = gen_reg_rtx (SImode);
-	  pat = GEN_FCN (icode) (target);
-	}
-      else
-	{
-	  target = NULL_RTX;
-	  icode = CODE_FOR_set_fpscr;
-	  arg0 = CALL_EXPR_ARG (exp, 0);
-	  op0 = expand_normal (arg0);
-	  pat = GEN_FCN (icode) (op0);
-	}
-      emit_insn (pat);
-      return target;
-
-    case ARM_BUILTIN_TEXTRMSB:
-    case ARM_BUILTIN_TEXTRMUB:
-    case ARM_BUILTIN_TEXTRMSH:
-    case ARM_BUILTIN_TEXTRMUH:
-    case ARM_BUILTIN_TEXTRMSW:
-    case ARM_BUILTIN_TEXTRMUW:
-      icode = (fcode == ARM_BUILTIN_TEXTRMSB ? CODE_FOR_iwmmxt_textrmsb
-	       : fcode == ARM_BUILTIN_TEXTRMUB ? CODE_FOR_iwmmxt_textrmub
-	       : fcode == ARM_BUILTIN_TEXTRMSH ? CODE_FOR_iwmmxt_textrmsh
-	       : fcode == ARM_BUILTIN_TEXTRMUH ? CODE_FOR_iwmmxt_textrmuh
-	       : CODE_FOR_iwmmxt_textrmw);
-
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      tmode = insn_data[icode].operand[0].mode;
-      mode0 = insn_data[icode].operand[1].mode;
-      mode1 = insn_data[icode].operand[2].mode;
-
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
-	op0 = copy_to_mode_reg (mode0, op0);
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
-	{
-	  /* @@@ better error message */
-	  error ("selector must be an immediate");
-	  return gen_reg_rtx (tmode);
-	}
-
-      opint = INTVAL (op1);
-      if (fcode == ARM_BUILTIN_TEXTRMSB || fcode == ARM_BUILTIN_TEXTRMUB)
-	{
-	  if (opint > 7 || opint < 0)
-	    error ("the selector must be in the range 0 to 7");
-	}
-      else if (fcode == ARM_BUILTIN_TEXTRMSH || fcode == ARM_BUILTIN_TEXTRMUH)
-	{
-	  if (opint > 3 || opint < 0)
-	    error ("the selector must be in the range 0 to 3");
-	}
-      else /* ARM_BUILTIN_TEXTRMSW || ARM_BUILTIN_TEXTRMUW.  */
-	{
-	  if (opint > 1 || opint < 0)
-	    error ("the selector must be in the range 0 to 1");
-	}
-
-      if (target == 0
-	  || GET_MODE (target) != tmode
-	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-	target = gen_reg_rtx (tmode);
-      pat = GEN_FCN (icode) (target, op0, op1);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
-
-    case ARM_BUILTIN_WALIGNI:
-      /* If op2 is an immediate, call waligni, else call walignr.  */
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      arg2 = CALL_EXPR_ARG (exp, 2);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      op2 = expand_normal (arg2);
-      if (CONST_INT_P (op2))
-        {
-	  icode = CODE_FOR_iwmmxt_waligni;
-          tmode = insn_data[icode].operand[0].mode;
-	  mode0 = insn_data[icode].operand[1].mode;
-	  mode1 = insn_data[icode].operand[2].mode;
-	  mode2 = insn_data[icode].operand[3].mode;
-          if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
-	    op0 = copy_to_mode_reg (mode0, op0);
-          if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
-	    op1 = copy_to_mode_reg (mode1, op1);
-          gcc_assert ((*insn_data[icode].operand[3].predicate) (op2, mode2));
-	  selector = INTVAL (op2);
-	  if (selector > 7 || selector < 0)
-	    error ("the selector must be in the range 0 to 7");
-	}
-      else
-        {
-	  icode = CODE_FOR_iwmmxt_walignr;
-          tmode = insn_data[icode].operand[0].mode;
-	  mode0 = insn_data[icode].operand[1].mode;
-	  mode1 = insn_data[icode].operand[2].mode;
-	  mode2 = insn_data[icode].operand[3].mode;
-          if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
-	    op0 = copy_to_mode_reg (mode0, op0);
-          if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
-	    op1 = copy_to_mode_reg (mode1, op1);
-          if (!(*insn_data[icode].operand[3].predicate) (op2, mode2))
-	    op2 = copy_to_mode_reg (mode2, op2);
-	}
-      if (target == 0
-	  || GET_MODE (target) != tmode
-	  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
-	target = gen_reg_rtx (tmode);
-      pat = GEN_FCN (icode) (target, op0, op1, op2);
-      if (!pat)
-	return 0;
-      emit_insn (pat);
-      return target;
-
-    case ARM_BUILTIN_TINSRB:
-    case ARM_BUILTIN_TINSRH:
-    case ARM_BUILTIN_TINSRW:
-    case ARM_BUILTIN_WMERGE:
-      icode = (fcode == ARM_BUILTIN_TINSRB ? CODE_FOR_iwmmxt_tinsrb
-	       : fcode == ARM_BUILTIN_TINSRH ? CODE_FOR_iwmmxt_tinsrh
-	       : fcode == ARM_BUILTIN_WMERGE ? CODE_FOR_iwmmxt_wmerge
-	       : CODE_FOR_iwmmxt_tinsrw);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      arg2 = CALL_EXPR_ARG (exp, 2);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      op2 = expand_normal (arg2);
-      tmode = insn_data[icode].operand[0].mode;
-      mode0 = insn_data[icode].operand[1].mode;
-      mode1 = insn_data[icode].operand[2].mode;
-      mode2 = insn_data[icode].operand[3].mode;
-
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
-	op0 = copy_to_mode_reg (mode0, op0);
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
-	op1 = copy_to_mode_reg (mode1, op1);
-      if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
-	{
-	  error ("selector must be an immediate");
-	  return const0_rtx;
-	}
-      if (icode == CODE_FOR_iwmmxt_wmerge)
-	{
-	  selector = INTVAL (op2);
-	  if (selector > 7 || selector < 0)
-	    error ("the selector must be in the range 0 to 7");
-	}
-      if ((icode == CODE_FOR_iwmmxt_tinsrb)
-	  || (icode == CODE_FOR_iwmmxt_tinsrh)
-	  || (icode == CODE_FOR_iwmmxt_tinsrw))
-        {
-	  mask = 0x01;
-	  selector = INTVAL (op2);
-	  if (icode == CODE_FOR_iwmmxt_tinsrb && (selector < 0 || selector > 7))
-	    error ("the selector must be in the range 0 to 7");
-	  else if (icode == CODE_FOR_iwmmxt_tinsrh && (selector < 0 || selector > 3))
-	    error ("the selector must be in the range 0 to 3");
-	  else if (icode == CODE_FOR_iwmmxt_tinsrw && (selector < 0 || selector > 1))
-	    error ("the selector must be in the range 0 to 1");
-	  mask <<= selector;
-	  op2 = GEN_INT (mask);
-	}
-      if (target == 0
-	  || GET_MODE (target) != tmode
-	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-	target = gen_reg_rtx (tmode);
-      pat = GEN_FCN (icode) (target, op0, op1, op2);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
-
-    case ARM_BUILTIN_SETWCGR0:
-    case ARM_BUILTIN_SETWCGR1:
-    case ARM_BUILTIN_SETWCGR2:
-    case ARM_BUILTIN_SETWCGR3:
-      icode = (fcode == ARM_BUILTIN_SETWCGR0 ? CODE_FOR_iwmmxt_setwcgr0
-	       : fcode == ARM_BUILTIN_SETWCGR1 ? CODE_FOR_iwmmxt_setwcgr1
-	       : fcode == ARM_BUILTIN_SETWCGR2 ? CODE_FOR_iwmmxt_setwcgr2
-	       : CODE_FOR_iwmmxt_setwcgr3);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      op0 = expand_normal (arg0);
-      mode0 = insn_data[icode].operand[0].mode;
-      if (!(*insn_data[icode].operand[0].predicate) (op0, mode0))
-        op0 = copy_to_mode_reg (mode0, op0);
-      pat = GEN_FCN (icode) (op0);
-      if (!pat)
-	return 0;
-      emit_insn (pat);
-      return 0;
-
-    case ARM_BUILTIN_GETWCGR0:
-    case ARM_BUILTIN_GETWCGR1:
-    case ARM_BUILTIN_GETWCGR2:
-    case ARM_BUILTIN_GETWCGR3:
-      icode = (fcode == ARM_BUILTIN_GETWCGR0 ? CODE_FOR_iwmmxt_getwcgr0
-	       : fcode == ARM_BUILTIN_GETWCGR1 ? CODE_FOR_iwmmxt_getwcgr1
-	       : fcode == ARM_BUILTIN_GETWCGR2 ? CODE_FOR_iwmmxt_getwcgr2
-	       : CODE_FOR_iwmmxt_getwcgr3);
-      tmode = insn_data[icode].operand[0].mode;
-      if (target == 0
-	  || GET_MODE (target) != tmode
-	  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
-        target = gen_reg_rtx (tmode);
-      pat = GEN_FCN (icode) (target);
-      if (!pat)
-        return 0;
-      emit_insn (pat);
-      return target;
-
-    case ARM_BUILTIN_WSHUFH:
-      icode = CODE_FOR_iwmmxt_wshufh;
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      tmode = insn_data[icode].operand[0].mode;
-      mode1 = insn_data[icode].operand[1].mode;
-      mode2 = insn_data[icode].operand[2].mode;
-
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-	op0 = copy_to_mode_reg (mode1, op0);
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
-	{
-	  error ("mask must be an immediate");
-	  return const0_rtx;
-	}
-      selector = INTVAL (op1);
-      if (selector < 0 || selector > 255)
-	error ("the mask must be in the range 0 to 255");
-      if (target == 0
-	  || GET_MODE (target) != tmode
-	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-	target = gen_reg_rtx (tmode);
-      pat = GEN_FCN (icode) (target, op0, op1);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
-
-    case ARM_BUILTIN_WMADDS:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmadds, exp, target);
-    case ARM_BUILTIN_WMADDSX:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddsx, exp, target);
-    case ARM_BUILTIN_WMADDSN:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddsn, exp, target);
-    case ARM_BUILTIN_WMADDU:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddu, exp, target);
-    case ARM_BUILTIN_WMADDUX:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddux, exp, target);
-    case ARM_BUILTIN_WMADDUN:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wmaddun, exp, target);
-    case ARM_BUILTIN_WSADBZ:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadbz, exp, target);
-    case ARM_BUILTIN_WSADHZ:
-      return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadhz, exp, target);
-
-      /* Several three-argument builtins.  */
-    case ARM_BUILTIN_WMACS:
-    case ARM_BUILTIN_WMACU:
-    case ARM_BUILTIN_TMIA:
-    case ARM_BUILTIN_TMIAPH:
-    case ARM_BUILTIN_TMIATT:
-    case ARM_BUILTIN_TMIATB:
-    case ARM_BUILTIN_TMIABT:
-    case ARM_BUILTIN_TMIABB:
-    case ARM_BUILTIN_WQMIABB:
-    case ARM_BUILTIN_WQMIABT:
-    case ARM_BUILTIN_WQMIATB:
-    case ARM_BUILTIN_WQMIATT:
-    case ARM_BUILTIN_WQMIABBN:
-    case ARM_BUILTIN_WQMIABTN:
-    case ARM_BUILTIN_WQMIATBN:
-    case ARM_BUILTIN_WQMIATTN:
-    case ARM_BUILTIN_WMIABB:
-    case ARM_BUILTIN_WMIABT:
-    case ARM_BUILTIN_WMIATB:
-    case ARM_BUILTIN_WMIATT:
-    case ARM_BUILTIN_WMIABBN:
-    case ARM_BUILTIN_WMIABTN:
-    case ARM_BUILTIN_WMIATBN:
-    case ARM_BUILTIN_WMIATTN:
-    case ARM_BUILTIN_WMIAWBB:
-    case ARM_BUILTIN_WMIAWBT:
-    case ARM_BUILTIN_WMIAWTB:
-    case ARM_BUILTIN_WMIAWTT:
-    case ARM_BUILTIN_WMIAWBBN:
-    case ARM_BUILTIN_WMIAWBTN:
-    case ARM_BUILTIN_WMIAWTBN:
-    case ARM_BUILTIN_WMIAWTTN:
-    case ARM_BUILTIN_WSADB:
-    case ARM_BUILTIN_WSADH:
-      icode = (fcode == ARM_BUILTIN_WMACS ? CODE_FOR_iwmmxt_wmacs
-	       : fcode == ARM_BUILTIN_WMACU ? CODE_FOR_iwmmxt_wmacu
-	       : fcode == ARM_BUILTIN_TMIA ? CODE_FOR_iwmmxt_tmia
-	       : fcode == ARM_BUILTIN_TMIAPH ? CODE_FOR_iwmmxt_tmiaph
-	       : fcode == ARM_BUILTIN_TMIABB ? CODE_FOR_iwmmxt_tmiabb
-	       : fcode == ARM_BUILTIN_TMIABT ? CODE_FOR_iwmmxt_tmiabt
-	       : fcode == ARM_BUILTIN_TMIATB ? CODE_FOR_iwmmxt_tmiatb
-	       : fcode == ARM_BUILTIN_TMIATT ? CODE_FOR_iwmmxt_tmiatt
-	       : fcode == ARM_BUILTIN_WQMIABB ? CODE_FOR_iwmmxt_wqmiabb
-	       : fcode == ARM_BUILTIN_WQMIABT ? CODE_FOR_iwmmxt_wqmiabt
-	       : fcode == ARM_BUILTIN_WQMIATB ? CODE_FOR_iwmmxt_wqmiatb
-	       : fcode == ARM_BUILTIN_WQMIATT ? CODE_FOR_iwmmxt_wqmiatt
-	       : fcode == ARM_BUILTIN_WQMIABBN ? CODE_FOR_iwmmxt_wqmiabbn
-	       : fcode == ARM_BUILTIN_WQMIABTN ? CODE_FOR_iwmmxt_wqmiabtn
-	       : fcode == ARM_BUILTIN_WQMIATBN ? CODE_FOR_iwmmxt_wqmiatbn
-	       : fcode == ARM_BUILTIN_WQMIATTN ? CODE_FOR_iwmmxt_wqmiattn
-	       : fcode == ARM_BUILTIN_WMIABB ? CODE_FOR_iwmmxt_wmiabb
-	       : fcode == ARM_BUILTIN_WMIABT ? CODE_FOR_iwmmxt_wmiabt
-	       : fcode == ARM_BUILTIN_WMIATB ? CODE_FOR_iwmmxt_wmiatb
-	       : fcode == ARM_BUILTIN_WMIATT ? CODE_FOR_iwmmxt_wmiatt
-	       : fcode == ARM_BUILTIN_WMIABBN ? CODE_FOR_iwmmxt_wmiabbn
-	       : fcode == ARM_BUILTIN_WMIABTN ? CODE_FOR_iwmmxt_wmiabtn
-	       : fcode == ARM_BUILTIN_WMIATBN ? CODE_FOR_iwmmxt_wmiatbn
-	       : fcode == ARM_BUILTIN_WMIATTN ? CODE_FOR_iwmmxt_wmiattn
-	       : fcode == ARM_BUILTIN_WMIAWBB ? CODE_FOR_iwmmxt_wmiawbb
-	       : fcode == ARM_BUILTIN_WMIAWBT ? CODE_FOR_iwmmxt_wmiawbt
-	       : fcode == ARM_BUILTIN_WMIAWTB ? CODE_FOR_iwmmxt_wmiawtb
-	       : fcode == ARM_BUILTIN_WMIAWTT ? CODE_FOR_iwmmxt_wmiawtt
-	       : fcode == ARM_BUILTIN_WMIAWBBN ? CODE_FOR_iwmmxt_wmiawbbn
-	       : fcode == ARM_BUILTIN_WMIAWBTN ? CODE_FOR_iwmmxt_wmiawbtn
-	       : fcode == ARM_BUILTIN_WMIAWTBN ? CODE_FOR_iwmmxt_wmiawtbn
-	       : fcode == ARM_BUILTIN_WMIAWTTN ? CODE_FOR_iwmmxt_wmiawttn
-	       : fcode == ARM_BUILTIN_WSADB ? CODE_FOR_iwmmxt_wsadb
-	       : CODE_FOR_iwmmxt_wsadh);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      arg2 = CALL_EXPR_ARG (exp, 2);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      op2 = expand_normal (arg2);
-      tmode = insn_data[icode].operand[0].mode;
-      mode0 = insn_data[icode].operand[1].mode;
-      mode1 = insn_data[icode].operand[2].mode;
-      mode2 = insn_data[icode].operand[3].mode;
-
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
-	op0 = copy_to_mode_reg (mode0, op0);
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
-	op1 = copy_to_mode_reg (mode1, op1);
-      if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
-	op2 = copy_to_mode_reg (mode2, op2);
-      if (target == 0
-	  || GET_MODE (target) != tmode
-	  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-	target = gen_reg_rtx (tmode);
-      pat = GEN_FCN (icode) (target, op0, op1, op2);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
-
-    case ARM_BUILTIN_WZERO:
-      target = gen_reg_rtx (DImode);
-      emit_insn (gen_iwmmxt_clrdi (target));
-      return target;
-
-    case ARM_BUILTIN_WSRLHI:
-    case ARM_BUILTIN_WSRLWI:
-    case ARM_BUILTIN_WSRLDI:
-    case ARM_BUILTIN_WSLLHI:
-    case ARM_BUILTIN_WSLLWI:
-    case ARM_BUILTIN_WSLLDI:
-    case ARM_BUILTIN_WSRAHI:
-    case ARM_BUILTIN_WSRAWI:
-    case ARM_BUILTIN_WSRADI:
-    case ARM_BUILTIN_WRORHI:
-    case ARM_BUILTIN_WRORWI:
-    case ARM_BUILTIN_WRORDI:
-    case ARM_BUILTIN_WSRLH:
-    case ARM_BUILTIN_WSRLW:
-    case ARM_BUILTIN_WSRLD:
-    case ARM_BUILTIN_WSLLH:
-    case ARM_BUILTIN_WSLLW:
-    case ARM_BUILTIN_WSLLD:
-    case ARM_BUILTIN_WSRAH:
-    case ARM_BUILTIN_WSRAW:
-    case ARM_BUILTIN_WSRAD:
-    case ARM_BUILTIN_WRORH:
-    case ARM_BUILTIN_WRORW:
-    case ARM_BUILTIN_WRORD:
-      icode = (fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
-	       : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
-	       : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
-	       : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
-	       : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
-	       : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
-	       : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
-	       : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
-	       : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
-	       : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
-	       : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
-	       : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
-	       : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
-	       : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
-	       : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
-	       : fcode == ARM_BUILTIN_WSLLH  ? CODE_FOR_ashlv4hi3_di
-	       : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
-	       : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
-	       : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
-	       : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
-	       : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
-	       : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
-	       : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
-	       : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
-	       : CODE_FOR_nothing);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op1 = expand_normal (arg1);
-      if (GET_MODE (op1) == VOIDmode)
-	{
-	  imm = INTVAL (op1);
-	  if ((fcode == ARM_BUILTIN_WRORHI || fcode == ARM_BUILTIN_WRORWI
-	       || fcode == ARM_BUILTIN_WRORH || fcode == ARM_BUILTIN_WRORW)
-	      && (imm < 0 || imm > 32))
-	    {
-	      if (fcode == ARM_BUILTIN_WRORHI)
-		error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_rori_pi16 in code.");
-	      else if (fcode == ARM_BUILTIN_WRORWI)
-		error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_rori_pi32 in code.");
-	      else if (fcode == ARM_BUILTIN_WRORH)
-		error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_ror_pi16 in code.");
-	      else
-		error ("the range of count should be in 0 to 32.  please check the intrinsic _mm_ror_pi32 in code.");
-	    }
-	  else if ((fcode == ARM_BUILTIN_WRORDI || fcode == ARM_BUILTIN_WRORD)
-		   && (imm < 0 || imm > 64))
-	    {
-	      if (fcode == ARM_BUILTIN_WRORDI)
-		error ("the range of count should be in 0 to 64.  please check the intrinsic _mm_rori_si64 in code.");
-	      else
-		error ("the range of count should be in 0 to 64.  please check the intrinsic _mm_ror_si64 in code.");
-	    }
-	  else if (imm < 0)
-	    {
-	      if (fcode == ARM_BUILTIN_WSRLHI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srli_pi16 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRLWI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srli_pi32 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRLDI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srli_si64 in code.");
-	      else if (fcode == ARM_BUILTIN_WSLLHI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_slli_pi16 in code.");
-	      else if (fcode == ARM_BUILTIN_WSLLWI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_slli_pi32 in code.");
-	      else if (fcode == ARM_BUILTIN_WSLLDI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_slli_si64 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRAHI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srai_pi16 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRAWI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srai_pi32 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRADI)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srai_si64 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRLH)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srl_pi16 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRLW)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srl_pi32 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRLD)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_srl_si64 in code.");
-	      else if (fcode == ARM_BUILTIN_WSLLH)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_sll_pi16 in code.");
-	      else if (fcode == ARM_BUILTIN_WSLLW)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_sll_pi32 in code.");
-	      else if (fcode == ARM_BUILTIN_WSLLD)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_sll_si64 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRAH)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_sra_pi16 in code.");
-	      else if (fcode == ARM_BUILTIN_WSRAW)
-		error ("the count should be no less than 0.  please check the intrinsic _mm_sra_pi32 in code.");
-	      else
-		error ("the count should be no less than 0.  please check the intrinsic _mm_sra_si64 in code.");
-	    }
-	}
-      return arm_expand_binop_builtin (icode, exp, target);
-
-    default:
-      break;
-    }
-
-  for (i = 0, d = bdesc_2arg; i < ARRAY_SIZE (bdesc_2arg); i++, d++)
-    if (d->code == (const enum arm_builtins) fcode)
-      return arm_expand_binop_builtin (d->icode, exp, target);
-
-  for (i = 0, d = bdesc_1arg; i < ARRAY_SIZE (bdesc_1arg); i++, d++)
-    if (d->code == (const enum arm_builtins) fcode)
-      return arm_expand_unop_builtin (d->icode, exp, target, 0);
-
-  for (i = 0, d = bdesc_3arg; i < ARRAY_SIZE (bdesc_3arg); i++, d++)
-    if (d->code == (const enum arm_builtins) fcode)
-      return arm_expand_ternop_builtin (d->icode, exp, target);
-
-  /* @@@ Should really do something sensible here.  */
-  return NULL_RTX;
-}
 \f
 /* Return the number (counting from 0) of
    the least significant set bit in MASK.  */
@@ -29996,130 +27182,6 @@ arm_have_conditional_execution (void)
   return !TARGET_THUMB1;
 }
 
-tree
-arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
-{
-  machine_mode in_mode, out_mode;
-  int in_n, out_n;
-  bool out_unsigned_p = TYPE_UNSIGNED (type_out);
-
-  if (TREE_CODE (type_out) != VECTOR_TYPE
-      || TREE_CODE (type_in) != VECTOR_TYPE)
-    return NULL_TREE;
-
-  out_mode = TYPE_MODE (TREE_TYPE (type_out));
-  out_n = TYPE_VECTOR_SUBPARTS (type_out);
-  in_mode = TYPE_MODE (TREE_TYPE (type_in));
-  in_n = TYPE_VECTOR_SUBPARTS (type_in);
-
-/* ARM_CHECK_BUILTIN_MODE and ARM_FIND_VRINT_VARIANT are used to find the
-   decl of the vectorized builtin for the appropriate vector mode.
-   NULL_TREE is returned if no such builtin is available.  */
-#undef ARM_CHECK_BUILTIN_MODE
-#define ARM_CHECK_BUILTIN_MODE(C)    \
-  (TARGET_NEON && TARGET_FPU_ARMV8   \
-   && flag_unsafe_math_optimizations \
-   && ARM_CHECK_BUILTIN_MODE_1 (C))
-
-#undef ARM_CHECK_BUILTIN_MODE_1
-#define ARM_CHECK_BUILTIN_MODE_1(C) \
-  (out_mode == SFmode && out_n == C \
-   && in_mode == SFmode && in_n == C)
-
-#undef ARM_FIND_VRINT_VARIANT
-#define ARM_FIND_VRINT_VARIANT(N) \
-  (ARM_CHECK_BUILTIN_MODE (2) \
-    ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v2sf, false) \
-    : (ARM_CHECK_BUILTIN_MODE (4) \
-      ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v4sf, false) \
-      : NULL_TREE))
-
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
-    {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
-      switch (fn)
-        {
-          case BUILT_IN_FLOORF:
-            return ARM_FIND_VRINT_VARIANT (vrintm);
-          case BUILT_IN_CEILF:
-            return ARM_FIND_VRINT_VARIANT (vrintp);
-          case BUILT_IN_TRUNCF:
-            return ARM_FIND_VRINT_VARIANT (vrintz);
-          case BUILT_IN_ROUNDF:
-            return ARM_FIND_VRINT_VARIANT (vrinta);
-#undef ARM_CHECK_BUILTIN_MODE_1
-#define ARM_CHECK_BUILTIN_MODE_1(C) \
-  (out_mode == SImode && out_n == C \
-   && in_mode == SFmode && in_n == C)
-
-#define ARM_FIND_VCVT_VARIANT(N) \
-  (ARM_CHECK_BUILTIN_MODE (2) \
-   ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v2sfv2si, false) \
-   : (ARM_CHECK_BUILTIN_MODE (4) \
-     ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v4sfv4si, false) \
-     : NULL_TREE))
-
-#define ARM_FIND_VCVTU_VARIANT(N) \
-  (ARM_CHECK_BUILTIN_MODE (2) \
-   ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##uv2sfv2si, false) \
-   : (ARM_CHECK_BUILTIN_MODE (4) \
-     ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##uv4sfv4si, false) \
-     : NULL_TREE))
-          case BUILT_IN_LROUNDF:
-            return out_unsigned_p
-                     ? ARM_FIND_VCVTU_VARIANT (vcvta)
-                     : ARM_FIND_VCVT_VARIANT (vcvta);
-          case BUILT_IN_LCEILF:
-            return out_unsigned_p
-                     ? ARM_FIND_VCVTU_VARIANT (vcvtp)
-                     : ARM_FIND_VCVT_VARIANT (vcvtp);
-          case BUILT_IN_LFLOORF:
-            return out_unsigned_p
-                     ? ARM_FIND_VCVTU_VARIANT (vcvtm)
-                     : ARM_FIND_VCVT_VARIANT (vcvtm);
-#undef ARM_CHECK_BUILTIN_MODE
-#define ARM_CHECK_BUILTIN_MODE(C, N) \
-  (out_mode == N##mode && out_n == C \
-   && in_mode == N##mode && in_n == C)
-          case BUILT_IN_BSWAP16:
-            if (ARM_CHECK_BUILTIN_MODE (4, HI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4hi, false);
-            else if (ARM_CHECK_BUILTIN_MODE (8, HI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv8hi, false);
-            else
-              return NULL_TREE;
-          case BUILT_IN_BSWAP32:
-            if (ARM_CHECK_BUILTIN_MODE (2, SI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2si, false);
-            else if (ARM_CHECK_BUILTIN_MODE (4, SI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4si, false);
-            else
-              return NULL_TREE;
-          case BUILT_IN_BSWAP64:
-            if (ARM_CHECK_BUILTIN_MODE (2, DI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2di, false);
-            else
-              return NULL_TREE;
-	  case BUILT_IN_COPYSIGNF:
-	    if (ARM_CHECK_BUILTIN_MODE (2, SF))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv2sf, false);
-	    else if (ARM_CHECK_BUILTIN_MODE (4, SF))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv4sf, false);
-	    else
-	      return NULL_TREE;
-
-          default:
-            return NULL_TREE;
-        }
-    }
-  return NULL_TREE;
-}
-#undef ARM_FIND_VCVT_VARIANT
-#undef ARM_FIND_VCVTU_VARIANT
-#undef ARM_CHECK_BUILTIN_MODE
-#undef ARM_FIND_VRINT_VARIANT
-
-
 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
 static HOST_WIDE_INT
 arm_vector_alignment (const_tree type)
@@ -32203,75 +29265,6 @@ arm_const_not_ok_for_debug_p (rtx p)
   return false;
 }
 
-static void
-arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
-{
-  const unsigned ARM_FE_INVALID = 1;
-  const unsigned ARM_FE_DIVBYZERO = 2;
-  const unsigned ARM_FE_OVERFLOW = 4;
-  const unsigned ARM_FE_UNDERFLOW = 8;
-  const unsigned ARM_FE_INEXACT = 16;
-  const unsigned HOST_WIDE_INT ARM_FE_ALL_EXCEPT = (ARM_FE_INVALID
-						    | ARM_FE_DIVBYZERO
-						    | ARM_FE_OVERFLOW
-						    | ARM_FE_UNDERFLOW
-						    | ARM_FE_INEXACT);
-  const unsigned HOST_WIDE_INT ARM_FE_EXCEPT_SHIFT = 8;
-  tree fenv_var, get_fpscr, set_fpscr, mask, ld_fenv, masked_fenv;
-  tree new_fenv_var, reload_fenv, restore_fnenv;
-  tree update_call, atomic_feraiseexcept, hold_fnclex;
-
-  if (!TARGET_VFP || !TARGET_HARD_FLOAT)
-    return;
-
-  /* Generate the equivalent of :
-       unsigned int fenv_var;
-       fenv_var = __builtin_arm_get_fpscr ();
-
-       unsigned int masked_fenv;
-       masked_fenv = fenv_var & mask;
-
-       __builtin_arm_set_fpscr (masked_fenv);  */
-
-  fenv_var = create_tmp_var (unsigned_type_node, NULL);
-  get_fpscr = arm_builtin_decls[ARM_BUILTIN_GET_FPSCR];
-  set_fpscr = arm_builtin_decls[ARM_BUILTIN_SET_FPSCR];
-  mask = build_int_cst (unsigned_type_node,
-			~((ARM_FE_ALL_EXCEPT << ARM_FE_EXCEPT_SHIFT)
-			  | ARM_FE_ALL_EXCEPT));
-  ld_fenv = build2 (MODIFY_EXPR, unsigned_type_node,
-		    fenv_var, build_call_expr (get_fpscr, 0));
-  masked_fenv = build2 (BIT_AND_EXPR, unsigned_type_node, fenv_var, mask);
-  hold_fnclex = build_call_expr (set_fpscr, 1, masked_fenv);
-  *hold = build2 (COMPOUND_EXPR, void_type_node,
-		  build2 (COMPOUND_EXPR, void_type_node, masked_fenv, ld_fenv),
-		  hold_fnclex);
-
-  /* Store the value of masked_fenv to clear the exceptions:
-     __builtin_arm_set_fpscr (masked_fenv);  */
-
-  *clear = build_call_expr (set_fpscr, 1, masked_fenv);
-
-  /* Generate the equivalent of :
-       unsigned int new_fenv_var;
-       new_fenv_var = __builtin_arm_get_fpscr ();
-
-       __builtin_arm_set_fpscr (fenv_var);
-
-       __atomic_feraiseexcept (new_fenv_var);  */
-
-  new_fenv_var = create_tmp_var (unsigned_type_node, NULL);
-  reload_fenv = build2 (MODIFY_EXPR, unsigned_type_node, new_fenv_var,
-			build_call_expr (get_fpscr, 0));
-  restore_fnenv = build_call_expr (set_fpscr, 1, fenv_var);
-  atomic_feraiseexcept = builtin_decl_implicit (BUILT_IN_ATOMIC_FERAISEEXCEPT);
-  update_call = build_call_expr (atomic_feraiseexcept, 1,
-				 fold_convert (integer_type_node, new_fenv_var));
-  *update = build2 (COMPOUND_EXPR, void_type_node,
-		    build2 (COMPOUND_EXPR, void_type_node,
-			    reload_fenv, restore_fnenv), update_call);
-}
-
 /* return TRUE if x is a reference to a value in a constant pool */
 extern bool
 arm_is_constant_pool_ref (rtx x)
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 25236a4..98a1d3b 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -95,6 +95,15 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   $(srcdir)/config/arm/arm-protos.h \
   $(srcdir)/config/arm/arm_neon_builtins.def
 
+arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
+  $(SYSTEM_H) coretypes.h $(TM_H) \
+  $(RTL_H) $(TREE_H) expr.h $(TM_P_H) $(RECOG_H) langhooks.h \
+  $(DIAGNOSTIC_CORE_H) $(OPTABS_H) \
+  $(srcdir)/config/arm/arm-protos.h \
+  $(srcdir)/config/arm/arm_neon_builtins.def
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+		$(srcdir)/config/arm/arm-builtins.c
+
 arm-c.o: $(srcdir)/config/arm/arm-c.c $(CONFIG_H) $(SYSTEM_H) \
     coretypes.h $(TM_H) $(TREE_H) output.h $(C_COMMON_H)
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \

* [Patch ARM Refactor Builtins 6/8] Add some tests for "poly" mangling
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
                     ` (4 preceding siblings ...)
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file James Greenhalgh
@ 2014-11-12 17:12   ` James Greenhalgh
  2014-11-18  9:21     ` Ramana Radhakrishnan
  2014-11-12 17:32   ` [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure James Greenhalgh
  2014-11-18  9:16   ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" Ramana Radhakrishnan
  7 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, nickc

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]


Hi,

The poly types end up going through the default mangler, but only
sometimes.

We don't want to change the mangling for poly types with the next patch in
this series, so add a test which should pass before and after.
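
For reference, under the default (Itanium C++ ABI) mangler a poly type
is encoded as its underlying integer type. A minimal illustration,
matching the symbols the new tests scan for ('y' is the ABI code for
unsigned long long, Dv2_y a vector of two of them):

  #include <arm_neon.h>

  void f0 (poly64_t a) {}    // mangles as _Z2f0y
  void f2 (poly64x2_t a) {}  // mangles as _Z2f2Dv2_y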

I've checked that the new tests pass at this stage of the patch series,
and bootstrapped on arm-none-linux-gnueabihf for good luck.

OK?

Thanks,
James

---
gcc/testsuite/

2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* g++.dg/abi/mangle-arm-crypto.C: New.
	* g++.dg/abi/mangle-neon.C (f19): New.
	(f20): Likewise.

[-- Attachment #2: 0006-Patch-ARM-Refactor-Builtins-6-8-Add-some-tests-for-p.patch --]
[-- Type: text/x-patch;  name=0006-Patch-ARM-Refactor-Builtins-6-8-Add-some-tests-for-p.patch, Size: 1613 bytes --]

diff --git a/gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C b/gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C
new file mode 100644
index 0000000..aae8847
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C
@@ -0,0 +1,16 @@
+// Test that ARM NEON types used by the Cryptography Extensions
+// have their names mangled correctly.
+
+// { dg-do compile }
+// { dg-require-effective-target arm_crypto_ok }
+// { dg-add-options arm_neon }
+
+#include <arm_neon.h>
+
+void f0 (poly64_t a) {}
+void f1 (poly128_t a) {}
+void f2 (poly64x2_t a) {}
+
+// { dg-final { scan-assembler "_Z2f0y:" } }
+// { dg-final { scan-assembler "_Z2f1o:" } }
+// { dg-final { scan-assembler "_Z2f2Dv2_y:" } }
diff --git a/gcc/testsuite/g++.dg/abi/mangle-neon.C b/gcc/testsuite/g++.dg/abi/mangle-neon.C
index af1fe49..9fabf4d 100644
--- a/gcc/testsuite/g++.dg/abi/mangle-neon.C
+++ b/gcc/testsuite/g++.dg/abi/mangle-neon.C
@@ -28,6 +28,9 @@ void f17 (poly16x8_t a) {}
 
 void f18 (int8x16_t, int8x16_t) {}
 
+void f19 (poly8_t a) {}
+void f20 (poly16_t a) {}
+
 // { dg-final { scan-assembler "_Z2f015__simd64_int8_t:" } }
 // { dg-final { scan-assembler "_Z2f116__simd64_int16_t:" } }
 // { dg-final { scan-assembler "_Z2f216__simd64_int32_t:" } }
@@ -47,3 +50,5 @@ void f18 (int8x16_t, int8x16_t) {}
 // { dg-final { scan-assembler "_Z3f1617__simd128_poly8_t:" } }
 // { dg-final { scan-assembler "_Z3f1718__simd128_poly16_t:" } }
 // { dg-final { scan-assembler "_Z3f1816__simd128_int8_tS_:" } }
+// { dg-final { scan-assembler "_Z3f19a:" } }
+// { dg-final { scan-assembler "_Z3f20s:" } }

* [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
                     ` (5 preceding siblings ...)
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 6/8] Add some tests for "poly" mangling James Greenhalgh
@ 2014-11-12 17:32   ` James Greenhalgh
  2014-11-18  9:38     ` Ramana Radhakrishnan
  2014-11-18  9:16   ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" Ramana Radhakrishnan
  7 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2014-11-12 17:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, ramana.radhakrishnan, nickc

[-- Attachment #1: Type: text/plain, Size: 2747 bytes --]


Hi,

This final patch removes the remaining data structures for which we no
longer have any use:

 * "_QUALIFIERS" macros which do not name a distinct pattern of
   arguments/return types.
 * The neon_builtin_type_mode enum is not needed; we can map directly to
   the machine_mode.
 * The neon_itype enum is not needed; the builtin expand functions can
   be rewritten to use the "qualifiers" data (see the sketch below).
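
In outline, the expand path can then classify each operand from the
qualifiers array alone. A simplified sketch of the loop in the patch
below, using its names (the qualifier_maybe_immediate case, which
first probes the insn predicate, is elided here):

  for (int k = 1; k < num_args; k++)
    {
      if (d->qualifiers[k] & qualifier_immediate)
	args[k] = NEON_ARG_CONSTANT;	 /* must be a constant  */
      else if (d->qualifiers[k] & qualifier_pointer)
	args[k] = NEON_ARG_MEMORY;	 /* address for a load/store  */
      else
	args[k] = NEON_ARG_COPY_TO_REG;	 /* ordinary value operand  */
    }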

This gives us reasonable parity between the builtin infrastructure for
the ARM and AArch64 targets. We could go further and start sharing some
of the logic between the two back-ends (and after that the builtin
definitions, and some of arm_neon.h, etc.), but I haven't done that here
as the immediate benefit is minimal.

Bootstrapped and regression tested with no issues.

OK?

Thanks,
James

---
gcc/

2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/arm/arm-builtins.c (CONVERT_QUALIFIERS): Delete.
	(COPYSIGNF_QUALIFIERS): Likewise.
	(CREATE_QUALIFIERS): Likewise.
	(DUP_QUALIFIERS): Likewise.
	(FLOAT_WIDEN_QUALIFIERS): Likewise.
	(FLOAT_NARROW_QUALIFIERS): Likewise.
	(REINTERP_QUALIFIERS): Likewise.
	(RINT_QUALIFIERS): Likewise.
	(SPLIT_QUALIFIERS): Likewise.
	(FIXCONV_QUALIFIERS): Likewise.
	(SCALARMUL_QUALIFIERS): Likewise.
	(SCALARMULL_QUALIFIERS): Likewise.
	(SCALARMULH_QUALIFIERS): Likewise.
	(SELECT_QUALIFIERS): Likewise.
	(VTBX_QUALIFIERS): Likewise.
	(SHIFTIMM_QUALIFIERS): Likewise.
	(SCALARMAC_QUALIFIERS): Likewise.
	(LANEMUL_QUALIFIERS): Likewise.
	(LANEMULH_QUALIFIERS): Likewise.
	(LANEMULL_QUALIFIERS): Likewise.
	(SHIFTACC_QUALIFIERS): Likewise.
	(SHIFTINSERT_QUALIFIERS): Likewise.
	(VTBL_QUALIFIERS): Likewise.
	(LOADSTRUCT_QUALIFIERS): Likewise.
	(LOADSTRUCTLANE_QUALIFIERS): Likewise.
	(STORESTRUCT_QUALIFIERS): Likewise.
	(STORESTRUCTLANE_QUALIFIERS): Likewise.
	(neon_builtin_type_mode): Delete.
	(v8qi_UP): Map to V8QImode.
	(v4hi_UP): Map to V4HImode.
	(v4hf_UP): Map to V4HFmode.
	(v2si_UP): Map to V2SImode.
	(v2sf_UP): Map to V2SFmode.
	(di_UP): Map to DImode.
	(v16qi_UP): Map to V16QImode.
	(v8hi_UP): Map to V8HImode.
	(v4si_UP): Map to V4SImode.
	(v4sf_UP): Map to V4SFmode.
	(v2di_UP): Map to V2DImode.
	(ti_UP): Map to TImode.
	(ei_UP): Map to EImode.
	(oi_UP): Map to OImode.
	(neon_itype): Delete.
	(neon_builtin_datum): Remove itype, make mode a machine_mode.
	(VAR1): Update accordingly.
	(arm_init_neon_builtins): Use machine_mode directly.
	(neon_dereference_pointer): Likewise.
	(arm_expand_neon_args): Use qualifiers to decide operand types.
	(arm_expand_neon_builtin): Likewise.
	* config/arm/arm_neon_builtins.def: Remap operation type for
	many builtins.

[-- Attachment #2: 0008-Patch-ARM-Refactor-Builtins-8-8-Neaten-up-the-ARM-Ne.patch --]
[-- Type: text/x-patch;  name=0008-Patch-ARM-Refactor-Builtins-8-8-Neaten-up-the-ARM-Ne.patch, Size: 35169 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 6f3183e..7787208 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -73,15 +73,6 @@ enum arm_type_qualifiers
 static enum arm_type_qualifiers
 arm_unop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_internal };
-#define CONVERT_QUALIFIERS (arm_unop_qualifiers)
-#define COPYSIGNF_QUALIFIERS (arm_unop_qualifiers)
-#define CREATE_QUALIFIERS (arm_unop_qualifiers)
-#define DUP_QUALIFIERS (arm_unop_qualifiers)
-#define FLOAT_WIDEN_QUALIFIERS (arm_unop_qualifiers)
-#define FLOAT_NARROW_QUALIFIERS (arm_unop_qualifiers)
-#define REINTERP_QUALIFIERS (arm_unop_qualifiers)
-#define RINT_QUALIFIERS (arm_unop_qualifiers)
-#define SPLIT_QUALIFIERS (arm_unop_qualifiers)
 #define UNOP_QUALIFIERS (arm_unop_qualifiers)
 
 /* unsigned T (unsigned T).  */
@@ -95,25 +86,18 @@ static enum arm_type_qualifiers
 arm_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_maybe_immediate };
 #define BINOP_QUALIFIERS (arm_binop_qualifiers)
-#define FIXCONV_QUALIFIERS (arm_binop_qualifiers)
-#define SCALARMUL_QUALIFIERS (arm_binop_qualifiers)
-#define SCALARMULL_QUALIFIERS (arm_binop_qualifiers)
-#define SCALARMULH_QUALIFIERS (arm_binop_qualifiers)
 
 /* T (T, T, T).  */
 static enum arm_type_qualifiers
 arm_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
 #define TERNOP_QUALIFIERS (arm_ternop_qualifiers)
-#define SELECT_QUALIFIERS (arm_ternop_qualifiers)
-#define VTBX_QUALIFIERS (arm_ternop_qualifiers)
 
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
 arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate };
 #define GETLANE_QUALIFIERS (arm_getlane_qualifiers)
-#define SHIFTIMM_QUALIFIERS (arm_getlane_qualifiers)
 
 /* T (T, T, T, immediate).  */
 static enum arm_type_qualifiers
@@ -121,32 +105,24 @@ arm_lanemac_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none,
       qualifier_none, qualifier_immediate };
 #define LANEMAC_QUALIFIERS (arm_lanemac_qualifiers)
-#define SCALARMAC_QUALIFIERS (arm_lanemac_qualifiers)
 
 /* T (T, T, immediate).  */
 static enum arm_type_qualifiers
 arm_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate };
-#define LANEMUL_QUALIFIERS (arm_setlane_qualifiers)
-#define LANEMULH_QUALIFIERS (arm_setlane_qualifiers)
-#define LANEMULL_QUALIFIERS (arm_setlane_qualifiers)
 #define SETLANE_QUALIFIERS (arm_setlane_qualifiers)
-#define SHIFTACC_QUALIFIERS (arm_setlane_qualifiers)
-#define SHIFTINSERT_QUALIFIERS (arm_setlane_qualifiers)
 
 /* T (T, T).  */
 static enum arm_type_qualifiers
 arm_combine_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none };
 #define COMBINE_QUALIFIERS (arm_combine_qualifiers)
-#define VTBL_QUALIFIERS (arm_combine_qualifiers)
 
 /* T ([T element type] *).  */
 static enum arm_type_qualifiers
 arm_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_const_pointer_map_mode };
 #define LOAD1_QUALIFIERS (arm_load1_qualifiers)
-#define LOADSTRUCT_QUALIFIERS (arm_load1_qualifiers)
 
 /* T ([T element type] *, T, immediate).  */
 static enum arm_type_qualifiers
@@ -154,7 +130,6 @@ arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_const_pointer_map_mode,
       qualifier_none, qualifier_immediate };
 #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers)
-#define LOADSTRUCTLANE_QUALIFIERS (arm_load1_lane_qualifiers)
 
 /* The first argument (return type) of a store should be void type,
    which we represent with qualifier_void.  Their first operand will be
@@ -167,7 +142,6 @@ static enum arm_type_qualifiers
 arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer_map_mode, qualifier_none };
 #define STORE1_QUALIFIERS (arm_store1_qualifiers)
-#define STORESTRUCT_QUALIFIERS (arm_store1_qualifiers)
 
    /* void ([T element type] *, T, immediate).  */
 static enum arm_type_qualifiers
@@ -175,100 +149,27 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer_map_mode,
       qualifier_none, qualifier_immediate };
 #define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers)
-#define STORESTRUCTLANE_QUALIFIERS (arm_storestruct_lane_qualifiers)
 
-typedef enum {
-  T_V8QI,
-  T_V4HI,
-  T_V4HF,
-  T_V2SI,
-  T_V2SF,
-  T_DI,
-  T_V16QI,
-  T_V8HI,
-  T_V4SI,
-  T_V4SF,
-  T_V2DI,
-  T_TI,
-  T_EI,
-  T_OI,
-  T_MAX		/* Size of enum.  Keep last.  */
-} neon_builtin_type_mode;
-
-#define TYPE_MODE_BIT(X) (1 << (X))
-
-#define TB_DREG (TYPE_MODE_BIT (T_V8QI) | TYPE_MODE_BIT (T_V4HI)	\
-		 | TYPE_MODE_BIT (T_V4HF) | TYPE_MODE_BIT (T_V2SI)	\
-		 | TYPE_MODE_BIT (T_V2SF) | TYPE_MODE_BIT (T_DI))
-#define TB_QREG (TYPE_MODE_BIT (T_V16QI) | TYPE_MODE_BIT (T_V8HI)	\
-		 | TYPE_MODE_BIT (T_V4SI) | TYPE_MODE_BIT (T_V4SF)	\
-		 | TYPE_MODE_BIT (T_V2DI) | TYPE_MODE_BIT (T_TI))
-
-#define v8qi_UP  T_V8QI
-#define v4hi_UP  T_V4HI
-#define v4hf_UP  T_V4HF
-#define v2si_UP  T_V2SI
-#define v2sf_UP  T_V2SF
-#define di_UP    T_DI
-#define v16qi_UP T_V16QI
-#define v8hi_UP  T_V8HI
-#define v4si_UP  T_V4SI
-#define v4sf_UP  T_V4SF
-#define v2di_UP  T_V2DI
-#define ti_UP	 T_TI
-#define ei_UP	 T_EI
-#define oi_UP	 T_OI
+#define v8qi_UP  V8QImode
+#define v4hi_UP  V4HImode
+#define v4hf_UP  V4HFmode
+#define v2si_UP  V2SImode
+#define v2sf_UP  V2SFmode
+#define di_UP    DImode
+#define v16qi_UP V16QImode
+#define v8hi_UP  V8HImode
+#define v4si_UP  V4SImode
+#define v4sf_UP  V4SFmode
+#define v2di_UP  V2DImode
+#define ti_UP	 TImode
+#define ei_UP	 EImode
+#define oi_UP	 OImode
 
 #define UP(X) X##_UP
 
-typedef enum {
-  NEON_BINOP,
-  NEON_TERNOP,
-  NEON_UNOP,
-  NEON_BSWAP,
-  NEON_GETLANE,
-  NEON_SETLANE,
-  NEON_CREATE,
-  NEON_RINT,
-  NEON_COPYSIGNF,
-  NEON_DUP,
-  NEON_DUPLANE,
-  NEON_COMBINE,
-  NEON_SPLIT,
-  NEON_LANEMUL,
-  NEON_LANEMULL,
-  NEON_LANEMULH,
-  NEON_LANEMAC,
-  NEON_SCALARMUL,
-  NEON_SCALARMULL,
-  NEON_SCALARMULH,
-  NEON_SCALARMAC,
-  NEON_CONVERT,
-  NEON_FLOAT_WIDEN,
-  NEON_FLOAT_NARROW,
-  NEON_FIXCONV,
-  NEON_SELECT,
-  NEON_REINTERP,
-  NEON_VTBL,
-  NEON_VTBX,
-  NEON_LOAD1,
-  NEON_LOAD1LANE,
-  NEON_STORE1,
-  NEON_STORE1LANE,
-  NEON_LOADSTRUCT,
-  NEON_LOADSTRUCTLANE,
-  NEON_STORESTRUCT,
-  NEON_STORESTRUCTLANE,
-  NEON_LOGICBINOP,
-  NEON_SHIFTINSERT,
-  NEON_SHIFTIMM,
-  NEON_SHIFTACC
-} neon_itype;
-
 typedef struct {
   const char *name;
-  const neon_itype itype;
-  const neon_builtin_type_mode mode;
+  machine_mode mode;
   const enum insn_code code;
   unsigned int fcode;
   enum arm_type_qualifiers *qualifiers;
@@ -277,7 +178,7 @@ typedef struct {
 #define CF(N,X) CODE_FOR_neon_##N##X
 
 #define VAR1(T, N, A) \
-  {#N, NEON_##T, UP (A), CF (N, A), 0, T##_QUALIFIERS},
+  {#N #A, UP (A), CF (N, A), 0, T##_QUALIFIERS},
 #define VAR2(T, N, A, B) \
   VAR1 (T, N, A) \
   VAR1 (T, N, B)
@@ -310,10 +211,8 @@ typedef struct {
    The mode entries in the following table correspond to the "key" type of the
    instruction variant, i.e. equivalent to that which would be specified after
    the assembler mnemonic, which usually refers to the last vector operand.
-   (Signed/unsigned/polynomial types are not differentiated between though, and
-   are all mapped onto the same mode for a given element size.) The modes
-   listed per instruction should be the same as those defined for that
-   instruction's pattern in neon.md.  */
+   The modes listed per instruction should be the same as those defined for
+   that instruction's pattern in neon.md.  */
 
 static neon_builtin_datum neon_builtin_data[] =
 {
@@ -980,25 +879,10 @@ arm_init_neon_builtins (void)
       bool print_type_signature_p = false;
       char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
       neon_builtin_datum *d = &neon_builtin_data[i];
-      const char *const modenames[] =
-	{
-	  "v8qi", "v4hi", "v4hf", "v2si", "v2sf", "di",
-	  "v16qi", "v8hi", "v4si", "v4sf", "v2di",
-	  "ti", "ei", "oi"
-	};
-      const enum machine_mode modes[] =
-	{
-	  V8QImode, V4HImode, V4HFmode, V2SImode, V2SFmode, DImode,
-	  V16QImode, V8HImode, V4SImode, V4SFmode, V2DImode,
-	  TImode, EImode, OImode
-	};
-
       char namebuf[60];
       tree ftype = NULL;
       tree fndecl = NULL;
 
-      gcc_assert (ARRAY_SIZE (modenames) == T_MAX);
-
       d->fcode = fcode;
 
       /* We must track two variables here.  op_num is
@@ -1046,7 +930,7 @@ arm_init_neon_builtins (void)
 	  /* Some builtins have different user-facing types
 	     for certain arguments, encoded in d->mode.  */
 	  if (qualifiers & qualifier_map_mode)
-	      op_mode = modes[d->mode];
+	      op_mode = d->mode;
 
 	  /* For pointers, we want a pointer to the basic type
 	     of the vector.  */
@@ -1080,11 +964,11 @@ arm_init_neon_builtins (void)
       gcc_assert (ftype != NULL);
 
       if (print_type_signature_p)
-	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s%s_%s",
-		  d->name, modenames[d->mode], type_signature);
+	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
+		  d->name, type_signature);
       else
-	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s%s",
-		  d->name, modenames[d->mode]);
+	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
+		  d->name);
 
       fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
 				     NULL, NULL_TREE);
@@ -2048,7 +1932,7 @@ typedef enum {
 static tree
 neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
 			  machine_mode reg_mode,
-			  neon_builtin_type_mode type_mode)
+			  machine_mode vector_mode)
 {
   HOST_WIDE_INT reg_size, vector_size, nvectors, nelems;
   tree elem_type, upper_bound, array_type;
@@ -2057,8 +1941,7 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
   reg_size = GET_MODE_SIZE (reg_mode);
 
   /* Work out the size of each vector in bytes.  */
-  gcc_assert (TYPE_MODE_BIT (type_mode) & (TB_DREG | TB_QREG));
-  vector_size = (TYPE_MODE_BIT (type_mode) & TB_QREG ? 16 : 8);
+  vector_size = GET_MODE_SIZE (vector_mode);
 
   /* Work out how many vectors there are.  */
   gcc_assert (reg_size % vector_size == 0);
@@ -2087,21 +1970,17 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
 
 /* Expand a Neon builtin.  */
 static rtx
-arm_expand_neon_args (rtx target, int icode, int have_retval,
-		      neon_builtin_type_mode type_mode,
-		      tree exp, int fcode, ...)
+arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
+		      int icode, int have_retval, tree exp, ...)
 {
   va_list ap;
   rtx pat;
-  tree arg[NEON_MAX_BUILTIN_ARGS];
-  rtx op[NEON_MAX_BUILTIN_ARGS];
-  tree arg_type;
-  tree formals;
+  tree arg[SIMD_MAX_BUILTIN_ARGS];
+  rtx op[SIMD_MAX_BUILTIN_ARGS];
   machine_mode tmode = insn_data[icode].operand[0].mode;
-  machine_mode mode[NEON_MAX_BUILTIN_ARGS];
-  machine_mode other_mode;
+  machine_mode mode[SIMD_MAX_BUILTIN_ARGS];
+  tree formals;
   int argc = 0;
-  int opno;
 
   if (have_retval
       && (!target
@@ -2109,7 +1988,7 @@ arm_expand_neon_args (rtx target, int icode, int have_retval,
 	  || !(*insn_data[icode].operand[0].predicate) (target, tmode)))
     target = gen_reg_rtx (tmode);
 
-  va_start (ap, fcode);
+  va_start (ap, exp);
 
   formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode]));
 
@@ -2118,19 +1997,20 @@ arm_expand_neon_args (rtx target, int icode, int have_retval,
       builtin_arg thisarg = (builtin_arg) va_arg (ap, int);
 
       if (thisarg == NEON_ARG_STOP)
-        break;
+	break;
       else
-        {
-          opno = argc + have_retval;
-          mode[argc] = insn_data[icode].operand[opno].mode;
-          arg[argc] = CALL_EXPR_ARG (exp, argc);
-	  arg_type = TREE_VALUE (formals);
+	{
+	  int opno = argc + have_retval;
+	  arg[argc] = CALL_EXPR_ARG (exp, argc);
+	  mode[argc] = insn_data[icode].operand[opno].mode;
           if (thisarg == NEON_ARG_MEMORY)
             {
-              other_mode = insn_data[icode].operand[1 - opno].mode;
-              arg[argc] = neon_dereference_pointer (arg[argc], arg_type,
+              machine_mode other_mode
+		= insn_data[icode].operand[1 - opno].mode;
+              arg[argc] = neon_dereference_pointer (arg[argc],
+						    TREE_VALUE (formals),
 						    mode[argc], other_mode,
-						    type_mode);
+						    map_mode);
             }
 
 	  /* Use EXPAND_MEMORY for NEON_ARG_MEMORY to ensure a MEM_P
@@ -2139,22 +2019,23 @@ arm_expand_neon_args (rtx target, int icode, int have_retval,
 				  (thisarg == NEON_ARG_MEMORY
 				   ? EXPAND_MEMORY : EXPAND_NORMAL));
 
-          switch (thisarg)
-            {
-            case NEON_ARG_COPY_TO_REG:
-              /*gcc_assert (GET_MODE (op[argc]) == mode[argc]);*/
-              if (!(*insn_data[icode].operand[opno].predicate)
-                     (op[argc], mode[argc]))
-                op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
-              break;
-
-            case NEON_ARG_CONSTANT:
-              /* FIXME: This error message is somewhat unhelpful.  */
-              if (!(*insn_data[icode].operand[opno].predicate)
-                    (op[argc], mode[argc]))
-		error ("argument must be a constant");
-              break;
+	  switch (thisarg)
+	    {
+	    case NEON_ARG_COPY_TO_REG:
+	      if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
+		op[argc] = convert_memory_address (Pmode, op[argc]);
+	      /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
+	      if (!(*insn_data[icode].operand[opno].predicate)
+		  (op[argc], mode[argc]))
+		op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
+	      break;
 
+	    case NEON_ARG_CONSTANT:
+	      if (!(*insn_data[icode].operand[opno].predicate)
+		  (op[argc], mode[argc]))
+		error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, "
+		       "expected %<const int%>", argc + 1);
+	      break;
             case NEON_ARG_MEMORY:
 	      /* Check if expand failed.  */
 	      if (op[argc] == const0_rtx)
@@ -2166,18 +2047,17 @@ arm_expand_neon_args (rtx target, int icode, int have_retval,
 		 alias safe.  */
 	      set_mem_alias_set (op[argc], 0);
 	      if (!(*insn_data[icode].operand[opno].predicate)
-                    (op[argc], mode[argc]))
+                   (op[argc], mode[argc]))
 		op[argc] = (replace_equiv_address
 			    (op[argc], force_reg (Pmode, XEXP (op[argc], 0))));
               break;
 
-            case NEON_ARG_STOP:
-              gcc_unreachable ();
-            }
+	    case NEON_ARG_STOP:
+	      gcc_unreachable ();
+	    }
 
-          argc++;
-	  formals = TREE_CHAIN (formals);
-        }
+	  argc++;
+	}
     }
 
   va_end (ap);
@@ -2229,7 +2109,7 @@ arm_expand_neon_args (rtx target, int icode, int have_retval,
 
       case 5:
 	pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
-        break;
+	break;
 
       default:
 	gcc_unreachable ();
@@ -2249,113 +2129,61 @@ arm_expand_neon_args (rtx target, int icode, int have_retval,
 static rtx
 arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 {
-  neon_builtin_datum *d = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE];
-  neon_itype itype = d->itype;
+  neon_builtin_datum *d =
+		&neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE];
   enum insn_code icode = d->code;
-  neon_builtin_type_mode type_mode = d->mode;
+  builtin_arg args[SIMD_MAX_BUILTIN_ARGS];
+  int num_args = insn_data[d->code].n_operands;
+  int is_void = 0;
+  int k;
+
+  is_void = !!(d->qualifiers[0] & qualifier_void);
 
-  switch (itype)
+  num_args += is_void;
+
+  for (k = 1; k < num_args; k++)
     {
-    case NEON_UNOP:
-    case NEON_CONVERT:
-    case NEON_DUPLANE:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_BINOP:
-    case NEON_LOGICBINOP:
-    case NEON_SCALARMUL:
-    case NEON_SCALARMULL:
-    case NEON_SCALARMULH:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_TERNOP:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_STOP);
-
-    case NEON_GETLANE:
-    case NEON_FIXCONV:
-    case NEON_SHIFTIMM:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_STOP);
-
-    case NEON_CREATE:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_DUP:
-    case NEON_RINT:
-    case NEON_SPLIT:
-    case NEON_FLOAT_WIDEN:
-    case NEON_FLOAT_NARROW:
-    case NEON_BSWAP:
-    case NEON_REINTERP:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_COPYSIGNF:
-    case NEON_COMBINE:
-    case NEON_VTBL:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_LANEMUL:
-    case NEON_LANEMULL:
-    case NEON_LANEMULH:
-    case NEON_SETLANE:
-    case NEON_SHIFTINSERT:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_STOP);
-
-    case NEON_LANEMAC:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_CONSTANT, NEON_ARG_STOP);
-
-    case NEON_SHIFTACC:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-        NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-        NEON_ARG_STOP);
-
-    case NEON_SCALARMAC:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-	NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_STOP);
-
-    case NEON_SELECT:
-    case NEON_VTBX:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-	NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG,
-        NEON_ARG_STOP);
-
-    case NEON_LOAD1:
-    case NEON_LOADSTRUCT:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-	NEON_ARG_MEMORY, NEON_ARG_STOP);
-
-    case NEON_LOAD1LANE:
-    case NEON_LOADSTRUCTLANE:
-      return arm_expand_neon_args (target, icode, 1, type_mode, exp, fcode,
-	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-	NEON_ARG_STOP);
-
-    case NEON_STORE1:
-    case NEON_STORESTRUCT:
-      return arm_expand_neon_args (target, icode, 0, type_mode, exp, fcode,
-	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP);
-
-    case NEON_STORE1LANE:
-    case NEON_STORESTRUCTLANE:
-      return arm_expand_neon_args (target, icode, 0, type_mode, exp, fcode,
-	NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT,
-	NEON_ARG_STOP);
+      /* We have four arrays of data, each indexed in a different fashion.
+	 qualifiers - element 0 always describes the function return type.
+	 operands - element 0 is either the operand for return value (if
+	   the function has a non-void return type) or the operand for the
+	   first argument.
+	 expr_args - element 0 always holds the first argument.
+	 args - element 0 is always used for the return type.  */
+      int qualifiers_k = k;
+      int operands_k = k - is_void;
+      int expr_args_k = k - 1;
+
+      if (d->qualifiers[qualifiers_k] & qualifier_immediate)
+	args[k] = NEON_ARG_CONSTANT;
+      else if (d->qualifiers[qualifiers_k] & qualifier_maybe_immediate)
+	{
+	  rtx arg
+	    = expand_normal (CALL_EXPR_ARG (exp,
+					    (expr_args_k)));
+	  /* Handle constants only if the predicate allows it.  */
+	  bool op_const_int_p =
+	    (CONST_INT_P (arg)
+	     && (*insn_data[icode].operand[operands_k].predicate)
+		(arg, insn_data[icode].operand[operands_k].mode));
+	  args[k] = op_const_int_p ? NEON_ARG_CONSTANT : NEON_ARG_COPY_TO_REG;
+	}
+      else if (d->qualifiers[qualifiers_k] & qualifier_pointer)
+	args[k] = NEON_ARG_MEMORY;
+      else
+	args[k] = NEON_ARG_COPY_TO_REG;
     }
-
-  gcc_unreachable ();
+  args[k] = NEON_ARG_STOP;
+
+  /* The interface to arm_expand_neon_args expects a 0 if
+     the function is void, and a 1 if it is not.  */
+  return arm_expand_neon_args
+	  (target, d->mode, fcode, icode, !is_void, exp,
+	   args[1],
+	   args[2],
+	   args[3],
+	   args[4],
+	   NEON_ARG_STOP);
 }
 
 /* Expand an expression EXP that calls a built-in function,
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 88f0788..b19dc23 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -48,16 +48,16 @@ VAR2 (TERNOP, vqdmlsl, v4hi, v2si)
 VAR3 (BINOP, vmullp, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmulls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmullu, v8qi, v4hi, v2si)
-VAR2 (SCALARMULL, vmulls_n, v4hi, v2si)
-VAR2 (SCALARMULL, vmullu_n, v4hi, v2si)
-VAR2 (LANEMULL, vmulls_lane, v4hi, v2si)
-VAR2 (LANEMULL, vmullu_lane, v4hi, v2si)
-VAR2 (SCALARMULL, vqdmull_n, v4hi, v2si)
-VAR2 (LANEMULL, vqdmull_lane, v4hi, v2si)
-VAR4 (SCALARMULH, vqdmulh_n, v4hi, v2si, v8hi, v4si)
-VAR4 (SCALARMULH, vqrdmulh_n, v4hi, v2si, v8hi, v4si)
-VAR4 (LANEMULH, vqdmulh_lane, v4hi, v2si, v8hi, v4si)
-VAR4 (LANEMULH, vqrdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR2 (BINOP, vmulls_n, v4hi, v2si)
+VAR2 (BINOP, vmullu_n, v4hi, v2si)
+VAR2 (SETLANE, vmulls_lane, v4hi, v2si)
+VAR2 (SETLANE, vmullu_lane, v4hi, v2si)
+VAR2 (BINOP, vqdmull_n, v4hi, v2si)
+VAR2 (SETLANE, vqdmull_lane, v4hi, v2si)
+VAR4 (BINOP, vqdmulh_n, v4hi, v2si, v8hi, v4si)
+VAR4 (BINOP, vqrdmulh_n, v4hi, v2si, v8hi, v4si)
+VAR4 (SETLANE, vqdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (SETLANE, vqrdmulh_lane, v4hi, v2si, v8hi, v4si)
 VAR2 (BINOP, vqdmull, v4hi, v2si)
 VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
@@ -67,28 +67,28 @@ VAR8 (BINOP, vqshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vqshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vqrshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vqrshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTIMM, vshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTIMM, vshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTIMM, vrshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTIMM, vrshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vshrn_n, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vrshrn_n, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vqshrns_n, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vqshrnu_n, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vqrshrns_n, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vqrshrnu_n, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vqshrun_n, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vqrshrun_n, v8hi, v4si, v2di)
-VAR8 (SHIFTIMM, vshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTIMM, vqshl_s_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTIMM, vqshl_u_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTIMM, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR3 (SHIFTIMM, vshlls_n, v8qi, v4hi, v2si)
-VAR3 (SHIFTIMM, vshllu_n, v8qi, v4hi, v2si)
-VAR8 (SHIFTACC, vsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTACC, vsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTACC, vrsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTACC, vrsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (GETLANE, vshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (GETLANE, vshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (GETLANE, vrshrs_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (GETLANE, vrshru_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR3 (GETLANE, vshrn_n, v8hi, v4si, v2di)
+VAR3 (GETLANE, vrshrn_n, v8hi, v4si, v2di)
+VAR3 (GETLANE, vqshrns_n, v8hi, v4si, v2di)
+VAR3 (GETLANE, vqshrnu_n, v8hi, v4si, v2di)
+VAR3 (GETLANE, vqrshrns_n, v8hi, v4si, v2di)
+VAR3 (GETLANE, vqrshrnu_n, v8hi, v4si, v2di)
+VAR3 (GETLANE, vqshrun_n, v8hi, v4si, v2di)
+VAR3 (GETLANE, vqrshrun_n, v8hi, v4si, v2di)
+VAR8 (GETLANE, vshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (GETLANE, vqshl_s_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (GETLANE, vqshl_u_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (GETLANE, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR3 (GETLANE, vshlls_n, v8qi, v4hi, v2si)
+VAR3 (GETLANE, vshllu_n, v8qi, v4hi, v2si)
+VAR8 (SETLANE, vsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SETLANE, vsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SETLANE, vrsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SETLANE, vrsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR2 (BINOP, vsub, v2sf, v4sf)
 VAR3 (BINOP, vsubls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vsublu, v8qi, v4hi, v2si)
@@ -140,8 +140,8 @@ VAR6 (BINOP, vpadals, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vpadalu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vrecps, v2sf, v4sf)
 VAR2 (BINOP, vrsqrts, v2sf, v4sf)
-VAR8 (SHIFTINSERT, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-VAR8 (SHIFTINSERT, vsli_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SETLANE, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+VAR8 (SETLANE, vsli_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (UNOP, vabs, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR6 (UNOP, vqabs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR8 (UNOP, vneg, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
@@ -159,21 +159,21 @@ VAR10 (GETLANE, vget_lane,
 VAR6 (GETLANE, vget_laneu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR10 (SETLANE, vset_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
-VAR5 (CREATE, vcreate, v8qi, v4hi, v2si, v2sf, di)
-VAR10 (DUP, vdup_n,
+VAR5 (UNOP, vcreate, v8qi, v4hi, v2si, v2sf, di)
+VAR10 (UNOP, vdup_n,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (BINOP, vdup_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR5 (COMBINE, vcombine, v8qi, v4hi, v2si, v2sf, di)
-VAR5 (SPLIT, vget_high, v16qi, v8hi, v4si, v4sf, v2di)
-VAR5 (SPLIT, vget_low, v16qi, v8hi, v4si, v4sf, v2di)
+VAR5 (UNOP, vget_high, v16qi, v8hi, v4si, v4sf, v2di)
+VAR5 (UNOP, vget_low, v16qi, v8hi, v4si, v4sf, v2di)
 VAR3 (UNOP, vmovn, v8hi, v4si, v2di)
 VAR3 (UNOP, vqmovns, v8hi, v4si, v2di)
 VAR3 (UNOP, vqmovnu, v8hi, v4si, v2di)
 VAR3 (UNOP, vqmovun, v8hi, v4si, v2di)
 VAR3 (UNOP, vmovls, v8qi, v4hi, v2si)
 VAR3 (UNOP, vmovlu, v8qi, v4hi, v2si)
-VAR6 (LANEMUL, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR6 (SETLANE, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR6 (LANEMAC, vmla_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR2 (LANEMAC, vmlals_lane, v4hi, v2si)
 VAR2 (LANEMAC, vmlalu_lane, v4hi, v2si)
@@ -182,66 +182,66 @@ VAR6 (LANEMAC, vmls_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR2 (LANEMAC, vmlsls_lane, v4hi, v2si)
 VAR2 (LANEMAC, vmlslu_lane, v4hi, v2si)
 VAR2 (LANEMAC, vqdmlsl_lane, v4hi, v2si)
-VAR6 (SCALARMUL, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR6 (SCALARMAC, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR2 (SCALARMAC, vmlals_n, v4hi, v2si)
-VAR2 (SCALARMAC, vmlalu_n, v4hi, v2si)
-VAR2 (SCALARMAC, vqdmlal_n, v4hi, v2si)
-VAR6 (SCALARMAC, vmls_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR2 (SCALARMAC, vmlsls_n, v4hi, v2si)
-VAR2 (SCALARMAC, vmlslu_n, v4hi, v2si)
-VAR2 (SCALARMAC, vqdmlsl_n, v4hi, v2si)
-VAR10 (SHIFTINSERT, vext,
+VAR6 (BINOP, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR6 (LANEMAC, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR2 (LANEMAC, vmlals_n, v4hi, v2si)
+VAR2 (LANEMAC, vmlalu_n, v4hi, v2si)
+VAR2 (LANEMAC, vqdmlal_n, v4hi, v2si)
+VAR6 (LANEMAC, vmls_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR2 (LANEMAC, vmlsls_n, v4hi, v2si)
+VAR2 (LANEMAC, vmlslu_n, v4hi, v2si)
+VAR2 (LANEMAC, vqdmlsl_n, v4hi, v2si)
+VAR10 (SETLANE, vext,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi)
 VAR2 (UNOP, vrev16, v8qi, v16qi)
-VAR4 (CONVERT, vcvts, v2si, v2sf, v4si, v4sf)
-VAR4 (CONVERT, vcvtu, v2si, v2sf, v4si, v4sf)
-VAR4 (FIXCONV, vcvts_n, v2si, v2sf, v4si, v4sf)
-VAR4 (FIXCONV, vcvtu_n, v2si, v2sf, v4si, v4sf)
-VAR1 (FLOAT_WIDEN, vcvtv4sf, v4hf)
-VAR1 (FLOAT_NARROW, vcvtv4hf, v4sf)
-VAR10 (SELECT, vbsl,
+VAR4 (UNOP, vcvts, v2si, v2sf, v4si, v4sf)
+VAR4 (UNOP, vcvtu, v2si, v2sf, v4si, v4sf)
+VAR4 (BINOP, vcvts_n, v2si, v2sf, v4si, v4sf)
+VAR4 (BINOP, vcvtu_n, v2si, v2sf, v4si, v4sf)
+VAR1 (UNOP, vcvtv4sf, v4hf)
+VAR1 (UNOP, vcvtv4hf, v4sf)
+VAR10 (TERNOP, vbsl,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
-VAR2 (COPYSIGNF, copysignf, v2sf, v4sf)
-VAR2 (RINT, vrintn, v2sf, v4sf)
-VAR2 (RINT, vrinta, v2sf, v4sf)
-VAR2 (RINT, vrintp, v2sf, v4sf)
-VAR2 (RINT, vrintm, v2sf, v4sf)
-VAR2 (RINT, vrintz, v2sf, v4sf)
-VAR2 (RINT, vrintx, v2sf, v4sf)
-VAR1 (RINT, vcvtav2sf, v2si)
-VAR1 (RINT, vcvtav4sf, v4si)
-VAR1 (RINT, vcvtauv2sf, v2si)
-VAR1 (RINT, vcvtauv4sf, v4si)
-VAR1 (RINT, vcvtpv2sf, v2si)
-VAR1 (RINT, vcvtpv4sf, v4si)
-VAR1 (RINT, vcvtpuv2sf, v2si)
-VAR1 (RINT, vcvtpuv4sf, v4si)
-VAR1 (RINT, vcvtmv2sf, v2si)
-VAR1 (RINT, vcvtmv4sf, v4si)
-VAR1 (RINT, vcvtmuv2sf, v2si)
-VAR1 (RINT, vcvtmuv4sf, v4si)
-VAR1 (VTBL, vtbl1, v8qi)
-VAR1 (VTBL, vtbl2, v8qi)
-VAR1 (VTBL, vtbl3, v8qi)
-VAR1 (VTBL, vtbl4, v8qi)
-VAR1 (VTBX, vtbx1, v8qi)
-VAR1 (VTBX, vtbx2, v8qi)
-VAR1 (VTBX, vtbx3, v8qi)
-VAR1 (VTBX, vtbx4, v8qi)
-VAR5 (REINTERP, vreinterpretv8qi, v8qi, v4hi, v2si, v2sf, di)
-VAR5 (REINTERP, vreinterpretv4hi, v8qi, v4hi, v2si, v2sf, di)
-VAR5 (REINTERP, vreinterpretv2si, v8qi, v4hi, v2si, v2sf, di)
-VAR5 (REINTERP, vreinterpretv2sf, v8qi, v4hi, v2si, v2sf, di)
-VAR5 (REINTERP, vreinterpretdi, v8qi, v4hi, v2si, v2sf, di)
-VAR6 (REINTERP, vreinterpretv16qi, v16qi, v8hi, v4si, v4sf, v2di, ti)
-VAR6 (REINTERP, vreinterpretv8hi, v16qi, v8hi, v4si, v4sf, v2di, ti)
-VAR6 (REINTERP, vreinterpretv4si, v16qi, v8hi, v4si, v4sf, v2di, ti)
-VAR6 (REINTERP, vreinterpretv4sf, v16qi, v8hi, v4si, v4sf, v2di, ti)
-VAR6 (REINTERP, vreinterpretv2di, v16qi, v8hi, v4si, v4sf, v2di, ti)
-VAR6 (REINTERP, vreinterpretti, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR2 (UNOP, copysignf, v2sf, v4sf)
+VAR2 (UNOP, vrintn, v2sf, v4sf)
+VAR2 (UNOP, vrinta, v2sf, v4sf)
+VAR2 (UNOP, vrintp, v2sf, v4sf)
+VAR2 (UNOP, vrintm, v2sf, v4sf)
+VAR2 (UNOP, vrintz, v2sf, v4sf)
+VAR2 (UNOP, vrintx, v2sf, v4sf)
+VAR1 (UNOP, vcvtav2sf, v2si)
+VAR1 (UNOP, vcvtav4sf, v4si)
+VAR1 (UNOP, vcvtauv2sf, v2si)
+VAR1 (UNOP, vcvtauv4sf, v4si)
+VAR1 (UNOP, vcvtpv2sf, v2si)
+VAR1 (UNOP, vcvtpv4sf, v4si)
+VAR1 (UNOP, vcvtpuv2sf, v2si)
+VAR1 (UNOP, vcvtpuv4sf, v4si)
+VAR1 (UNOP, vcvtmv2sf, v2si)
+VAR1 (UNOP, vcvtmv4sf, v4si)
+VAR1 (UNOP, vcvtmuv2sf, v2si)
+VAR1 (UNOP, vcvtmuv4sf, v4si)
+VAR1 (COMBINE, vtbl1, v8qi)
+VAR1 (COMBINE, vtbl2, v8qi)
+VAR1 (COMBINE, vtbl3, v8qi)
+VAR1 (COMBINE, vtbl4, v8qi)
+VAR1 (TERNOP, vtbx1, v8qi)
+VAR1 (TERNOP, vtbx2, v8qi)
+VAR1 (TERNOP, vtbx3, v8qi)
+VAR1 (TERNOP, vtbx4, v8qi)
+VAR5 (UNOP, vreinterpretv8qi, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (UNOP, vreinterpretv4hi, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (UNOP, vreinterpretv2si, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (UNOP, vreinterpretv2sf, v8qi, v4hi, v2si, v2sf, di)
+VAR5 (UNOP, vreinterpretdi, v8qi, v4hi, v2si, v2sf, di)
+VAR6 (UNOP, vreinterpretv16qi, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (UNOP, vreinterpretv8hi, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (UNOP, vreinterpretv4si, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (UNOP, vreinterpretv4sf, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (UNOP, vreinterpretv2di, v16qi, v8hi, v4si, v4sf, v2di, ti)
+VAR6 (UNOP, vreinterpretti, v16qi, v8hi, v4si, v4sf, v2di, ti)
 VAR10 (LOAD1, vld1,
         v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (LOAD1LANE, vld1_lane,
@@ -252,30 +252,30 @@ VAR10 (STORE1, vst1,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (STORE1LANE, vst1_lane,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
-VAR9 (LOADSTRUCT, vld2,
+VAR9 (LOAD1, vld2,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (LOADSTRUCTLANE, vld2_lane,
+VAR7 (LOAD1LANE, vld2_lane,
 	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR5 (LOADSTRUCT, vld2_dup, v8qi, v4hi, v2si, v2sf, di)
-VAR9 (STORESTRUCT, vst2,
+VAR5 (LOAD1, vld2_dup, v8qi, v4hi, v2si, v2sf, di)
+VAR9 (STORE1, vst2,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (STORESTRUCTLANE, vst2_lane,
+VAR7 (STORE1LANE, vst2_lane,
 	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR9 (LOADSTRUCT, vld3,
+VAR9 (LOAD1, vld3,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (LOADSTRUCTLANE, vld3_lane,
+VAR7 (LOAD1LANE, vld3_lane,
 	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR5 (LOADSTRUCT, vld3_dup, v8qi, v4hi, v2si, v2sf, di)
-VAR9 (STORESTRUCT, vst3,
+VAR5 (LOAD1, vld3_dup, v8qi, v4hi, v2si, v2sf, di)
+VAR9 (STORE1, vst3,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (STORESTRUCTLANE, vst3_lane,
+VAR7 (STORE1LANE, vst3_lane,
 	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR9 (LOADSTRUCT, vld4,
+VAR9 (LOAD1, vld4,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (LOADSTRUCTLANE, vld4_lane,
+VAR7 (LOAD1LANE, vld4_lane,
 	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR5 (LOADSTRUCT, vld4_dup, v8qi, v4hi, v2si, v2sf, di)
-VAR9 (STORESTRUCT, vst4,
+VAR5 (LOAD1, vld4_dup, v8qi, v4hi, v2si, v2sf, di)
+VAR9 (STORE1, vst4,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (STORESTRUCTLANE, vst4_lane,
+VAR7 (STORE1LANE, vst4_lane,
 	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)

* Re: [ARM] Refactor Neon Builtins infrastructure
  2014-11-12 17:09 [ARM] Refactor Neon Builtins infrastructure James Greenhalgh
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
@ 2014-11-18  9:15 ` Ramana Radhakrishnan
  1 sibling, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:15 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Nick Clifton, Alan Lawrence

On Wed, Nov 12, 2014 at 5:08 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> Hi,
>
> I was taking a look at fixing the issues in the ARM back-end exposed
> by Marc Glisse's patch in [1], and hoped to fix them by adapting the
> patch recently commited by Tejas ([2]).
>
> As I looked, I realised that the ARM target and the AArch64 target
> now differ drastically in how their Advanced SIMD builtin
> initialisation and expansion logic works. This is a growing
> maintenance burden. This patch series is an attempt to start fixing
> the problem.
>
> From a high level, I see five problems with the ARM Neon builtin code.
>
> First is the "magic number" interface, which gives builtins with signed
> and unsigned, or saturating and non-saturating, variants an extra
> parameter used to control which instruction is ultimately emitted. This
> is problematic as it enforces that these intrinsics be implemented with
> an UNSPEC pattern, we would like the flexibility to try to do a better job
> of modeling these patterns.
>
> Second, is that all the code lives in arm.c. This file is huge and
> frightening. The least we could do is start to split it up!
>
> Third, is the complicated builtin initialisation code. If we collect
> common cases together from the large switch in the initialisation function,
> it is clear we can eliminate much of the existing code. In fact, we have
> already solved the same problem in AArch64 ([3]), and we don't gain
> anything from having these interfaces separate.
>
> Fourth is that we don't have infrastructure to strongly type the functions
> in arm_neon.h - instead casting around between signed and unsigned vector
> arguments as required. We need this to avoid special casing some builtins
> we may want to vectorize (bswap and friends). Again we've solved this
> in AArch64 ([4]).
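
A sketch of the casting this forces (again in the old arm_neon.h style,
with an assumed magic-word value):

  __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
  vabd_u8 (uint8x8_t __a, uint8x8_t __b)
  {
    /* The builtin is typed on signed vectors, so the unsigned
       intrinsic has to cast both its arguments and its result.  */
    return (uint8x8_t) __builtin_neon_vabdv8qi ((int8x8_t) __a,
                                                (int8x8_t) __b, 0);
  }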
>
> Finally, there are the issues with type mangling Marc has seen.
>
> This patch-set tries to fix those issues in order, and progresses as follows:
>
> First the magic words:
>
>   [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words"
>
> Then moving code out to arm-builtins.c:
>
>   [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h
>   [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file
>
> And then making the ARM backend look like the AArch64 backend and fixing
> Marc's issue.
>
>   [Patch ARM Refactor Builtins 4/8]  Refactor "VAR<n>" Macros
>   [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in
>     ARM.
>   [Patch ARM Refactor Builtins 6/8] Add some tests for "poly" mangling
>   [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when
>     initialising builtins and fix type mangling
>   [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin
>     infrastructure
>
> Clearly there is more we could do to start sharing code between the two
> targets rather than duplicating it. For now, the benefit did not seem worth
> the substantial churn that this would cause both back-ends.
>
> I've bootstrapped each patch in this series in turn for both arm and
> thumb on arm-none-linux-gnueabihf.
>
> OK for trunk?

This is OK, thanks.



regards
Ramana

>
> Thanks,
> James
>
> ---
> [1]: [c++] typeinfo for target types
>      https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00618.html
> [2]: [AArch64, Patch] Restructure arm_neon.h vector types's implementation
>      https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00264.html
> [3]: [AArch64] Refactor Advanced SIMD builtin initialisation.
>      https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00532.html
> [4]: [AArch64] AArch64 SIMD Builtins Better Type Correctness.
>      https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02005.html
>
> ---
>  gcc/config.gcc                               |    3 +-
>  gcc/config/arm/arm-builtins.c                | 2925 ++++++++++++++++++++++++
>  gcc/config/arm/arm-protos.h                  |  173 +-
>  gcc/config/arm/arm-simd-builtin-types.def    |   48 +
>  gcc/config/arm/arm.c                         | 3149 +-------------------------
>  gcc/config/arm/arm_neon.h                    | 1743 +++++++-------
>  gcc/config/arm/arm_neon_builtins.def         |  435 ++--
>  gcc/config/arm/iterators.md                  |  167 ++
>  gcc/config/arm/neon.md                       |  893 ++++----
>  gcc/config/arm/t-arm                         |   11 +
>  gcc/config/arm/unspecs.md                    |  109 +-
>  gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C |   16 +
>  gcc/testsuite/g++.dg/abi/mangle-neon.C       |    5 +
>  gcc/testsuite/gcc.target/arm/pr51968.c       |    2 +-
>  create mode 100644 gcc/config/arm/arm-builtins.c
>  create mode 100644 gcc/config/arm/arm-simd-builtin-types.def
>  create mode 100644 gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C
>  14 files changed, 4992 insertions(+), 4687 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words"
  2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
                     ` (6 preceding siblings ...)
  2014-11-12 17:32   ` [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure James Greenhalgh
@ 2014-11-18  9:16   ` Ramana Radhakrishnan
  7 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:16 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: Richard Earnshaw, nickc



On 12/11/14 17:10, James Greenhalgh wrote:
>
> Hi,
>
> As part of some wider cleanup I'd like to do to ARM's Neon Builtin
> infrastructure, my first step will be to remove the "Magic Words" used
> to decide which variant of an instruction should be emitted.
>
> The "Magic Words" interface allows a single builtin
> (say, __builtin_neon_vshr_nv4hi) to cover signed, unsigned and rounding
> variants through the use of an extra control parameter.
>
> This patch removes that interface, defining individual builtins for each
> variant and dropping the extra parameter.
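
For illustration, one intrinsic's body changes shape roughly like this
(the per-variant builtin name follows the s/u suffix convention used in
the ChangeLog below; the magic-word value is assumed):

  /* Before: one builtin, trailing "Magic Word" selecting the signed form.  */
  int16x4_t __r = (int16x4_t) __builtin_neon_vshr_nv4hi (__a, 3, 1);

  /* After: a distinct builtin per variant, and the parameter is gone.  */
  int16x4_t __r = (int16x4_t) __builtin_neon_vshrs_nv4hi (__a, 3);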
>
> There are several benefits to cleaning this up:
>
>    * We can start to drop some of the UNSPEC operations without having to
>      add additional expand patterns to map them.
>    * The interface is confusing at first glance at the file.
>    * Having such a different interface to AArch64 doubles the amount of
>      time it takes to grok the Neon Builtins infrastructure.
>
> The drawbacks of changing this interface are:
>
>    * Another big churn change for the ARM backend.
>    * A series of new iterators, UNSPECs and builtin functions to cover the
>      variants which were previously controlled by a "Magic Word".
>    * Lots more patterns for genrecog to think about, potentially slowing
>      down compilation, increasing bootstrap time, and increasing compiler
>      binary size.
>
> On balance, I think we should accept these drawbacks in return for the future
> clean-ups they enable, but I expect this to be controversial.
>
> This patch is naive and conservative. I don't make any effort to merge
> patterns across iterators, nor any attempt to change UNSPECs to specified
> tree codes. Future improvements in this area would be useful.
>
> I've bootstrapped the patch for arm-none-linux-gnueabihf in isolation, and
> in series.
>
> OK for trunk?

Ok,

Ramana
>
> Thanks,
> James
>
> ---
> gcc/testsuite/
>
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
>          * gcc.target/arm/pr51968.c (foo): Do not try to pass "Magic Word".
>
> gcc/
>
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
>          * config/arm/arm.c (arm_expand_neon_builtin): Remove "Magic Word"
>          parameter, rearrange switch statement accordingly.
>          (arm_evpc_neon_vrev): Remove "Magic Word".
>          * config/arm/unspecs.md (unspec): Split many UNSPECs to
>          rounding, or signed/unsigned variants.
>          * config/arm/neon.md (vcond<mode><mode>): Remove "Magic Word" code.
>          (vcondu<mode><mode>): Likewise.
>          (neon_vadd): Remove "Magic Word" operand.
>          (neon_vaddl): Remove "Magic Word" operand, convert to use
>          signed/unsigned iterator.
>          (neon_vaddw): Likewise.
>          (neon_vhadd): Likewise, also iterate over "rounding" forms.
>          (neon_vqadd): Remove "Magic Word" operand, convert to use
>          signed/unsigned iterator.
>          (neon_v<r>addhn): Remove "Magic Word" operand, convert to iterate
>          over "rounding" forms.
>          (neon_vmul): Remove "Magic Word" operand, iterate over
>          polynomial/float instruction forms.
>          (neon_vmla): Remove "Magic Word" operand.
>          (neon_vfma): Likewise.
>          (neon_vfms): Likewise.
>          (neon_vmls): Likewise.
>          (neon_vmlal): Remove "Magic Word" operand, iterate over
>          signed/unsigned forms.
>          (neon_vmlsl): Likewise.
>          (neon_vqdmulh): Remove "Magic Word" operand, iterate over "rounding"
>          forms.
>          (neon_vqdmlal): Remove "Magic Word" operand, iterate over
>          signed/unsigned forms.
>          (neon_vqdmlsl): Likewise.
>          (neon_vmull): Likewise.
>          (neon_vqdmull): Remove "Magic Word" operand.
>          (neon_vsub): Remove "Magic Word" operand.
>          (neon_vsubl): Remove "Magic Word" operand, convert to use
>          signed/unsigned iterator.
>          (neon_vsubw): Likewise.
>          (neon_vhsub): Likewise.
>          (neon_vqsub): Likewise.
>          (neon_v<r>subhn): Remove "Magic Word" operand, convert to iterate
>          over "rounding" forms.
>          (neon_vceq): Remove "Magic Word" operand.
>          (neon_vcge): Likewise.
>          (neon_vcgeu): Likewise.
>          (neon_vcgt): Likewise.
>          (neon_vcgtu): Likewise.
>          (neon_vcle): Likewise.
>          (neon_vclt): Likewise.
>          (neon_vcage): Likewise.
>          (neon_vcagt): Likewise.
>          (neon_vabd): Remove "Magic Word" operand, iterate over
>          signed/unsigned forms, and split out...
>          (neon_vabdf): ...this as new.
>          (neon_vabdl): Remove "Magic Word" operand, iterate over
>          signed/unsigned forms.
>          (neon_vaba): Likewise.
>          (neon_vmax): Remove "Magic Word" operand, iterate over
>          signed/unsigned and max/min forms, and split out...
>          (neon_v<maxmin>f): ...this as new.
>          (neon_vmin): Delete.
>          (neon_vpadd): Remove "Magic Word" operand.
>          (neon_vpaddl): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vpadal): Likewise.
>          (neon_vpmax): Remove "Magic Word" operand, iterate over
>          signed/unsigned and max/min forms, and split out...
>          (neon_vp<maxmin>f): ...this as new.
>          (neon_vpmin): Delete.
>          (neon_vrecps): Remove "Magic Word" operand.
>          (neon_vrsqrts): Likewise.
>          (neon_vabs): Likewise.
>          (neon_vqabs): Likewise.
>          (neon_vneg): Likewise.
>          (neon_vqneg): Likewise.
>          (neon_vcls): Likewise.
>          (neon_vcnt): Likewise.
>          (neon_vrecpe): Likewise.
>          (neon_vrsqrte): Likewise.
>          (neon_vmvn): Likewise.
>          (neon_vget_lane): Likewise.
>          (neon_vget_laneu): New.
>          (neon_vget_lanedi): Remove "Magic Word" operand.
>          (neon_vget_lanev2di): Likewise.
>          (neon_vcvt): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vcvt_n): Likewise.
>          (neon_vmovn): Remove "Magic Word" operand.
>          (neon_vqmovn): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vmovun): Remove "Magic Word" operand.
>          (neon_vmovl): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vmul_lane): Remove "Magic Word" operand.
>          (neon_vmull_lane): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vqdmull_lane): Remove "Magic Word" operand.
>          (neon_vqdmulh_lane): Remove "Magic Word" operand, iterate over
>          rounding variants.
>          (neon_vmla_lane): Remove "Magic Word" operand.
>          (neon_vmlal_lane): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vqdmlal_lane): Remove "Magic Word" operand.
>          (neon_vmls_lane): Likewise.
>          (neon_vmlsl_lane): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vqdmlsl_lane): Remove "Magic Word" operand.
>          (neon_vmul_n): Remove "Magic Word" operand.
>          (neon_vmull_n): Rename to...
>          (neon_vmulls_n): ...this, remove "Magic Word" operand.
>          (neon_vmullu_n): New.
>          (neon_vqdmull_n): Remove "Magic Word" operand.
>          (neon_vqdmulh_n): Likewise.
>          (neon_vqrdmulh_n): New.
>          (neon_vmla_n): Remove "Magic Word" operand.
>          (neon_vmls_n): Likewise.
>          (neon_vmlal_n): Rename to...
>          (neon_vmlals_n): ...this, remove "Magic Word" operand.
>          (neon_vmlalu_n): New.
>          (neon_vqdmlal_n): Remove "Magic Word" operand.
>          (neon_vmlsl_n): Rename to...
>          (neon_vmlsls_n): ...this, remove "Magic Word" operand.
>          (neon_vmlslu_n): New.
>          (neon_vqdmlsl_n): Remove "Magic Word" operand.
>          (neon_vrev64): Remove "Magic Word" operand.
>          (neon_vrev32): Likewise.
>          (neon_vrev16): Likewise.
>          (neon_vshl): Remove "Magic Word" operand, iterate over
>          signed/unsigned and "rounding" forms.
>          (neon_vqshl): Likewise.
>          (neon_vshr_n): Likewise.
>          (neon_vshrn_n): Remove "Magic Word" operand, iterate over
>          "rounding" forms.
>          (neon_vqshrn_n): Remove "Magic Word" operand, iterate over
>          signed/unsigned and "rounding" forms.
>          (neon_vqshrun_n): Remove "Magic Word" operand, iterate over
>          "rounding" forms.
>          (neon_vshl_n): Remove "Magic Word" operand.
>          (neon_vqshl_n): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vqshlu_n): Remove "Magic Word" operand.
>          (neon_vshll_n): Remove "Magic Word" operand, iterate over
>          signed/unsigned variants.
>          (neon_vsra_n): Remove "Magic Word" operand, iterate over
>          signed/unsigned and "rounding" forms.
>          * config/arm/iterators.md (VPF): New.
>          (VADDL): Likewise.
>          (VADDW): Likewise.
>          (VHADD): Likewise.
>          (VQADD): Likewise.
>          (VADDHN): Likewise.
>          (VMLAL): Likewise.
>          (VMLAL_LANE): Likewise.
>          (VMLSL): Likewise.
>          (VMLSL_LANE): Likewise.
>          (VQDMULH): Likewise.
>          (VQDMULH_LANE): Likewise.
>          (VMULL): Likewise.
>          (VMULL_LANE): Likewise.
>          (VSUBL): Likewise.
>          (VSUBW): Likewise.
>          (VHSUB): Likewise.
>          (VQSUB): Likewise.
>          (VSUBHN): Likewise.
>          (VABD): Likewise.
>          (VABDL): Likewise.
>          (VMAXMIN): Likewise.
>          (VMAXMINF): Likewise.
>          (VPADDL): Likewise.
>          (VPADAL): Likewise.
>          (VPMAXMIN): Likewise.
>          (VPMAXMINF): Likewise.
>          (VCVT_US): Likewise.
>          (VCVT_US_N): Likewise.
>          (VQMOVN): Likewise.
>          (VMOVL): Likewise.
>          (VSHL): Likewise.
>          (VQSHL): Likewise.
>          (VSHR_N): Likewise.
>          (VSHRN_N): Likewise.
>          (VQSHRN_N): Likewise.
>          (VQSHRUN_N): Likewise.
>          (VQSHL_N): Likewise.
>          (VSHLL_N): Likewise.
>          (VSRA_N): Likewise.
>          (pf): Likewise.
>          (sup): Likewise.
>          (r): Likewise.
>          (maxmin): Likewise.
>          (shift_op): Likewise.
>          * config/arm/arm_neon_builtins.def (vaddl): Split to...
>          (vaddls): ...this and...
>          (vaddlu): ...this.
>          (vaddw): Split to...
>          (vaddws): ...this and...
>          (vaddwu): ...this.
>          (vhadd): Split to...
>          (vhadds): ...this and...
>          (vhaddu): ...this and...
>          (vrhadds): ...this and...
>          (vrhaddu): ...this.
>          (vqadd): Split to...
>          (vqadds): ...this and...
>          (vqaddu): ...this.
>          (vaddhn): Split to itself and...
>          (vraddhn): ...this.
>          (vmul): Split to...
>          (vmulf): ...this and...
>          (vmulp): ...this.
>          (vmlal): Split to...
>          (vmlals): ...this and...
>          (vmlalu): ...this.
>          (vmlsl): Split to...
>          (vmlsls): ...this and...
>          (vmlslu): ...this.
>          (vqdmulh): Split to itself and...
>          (vqrdmulh): ...this.
>          (vmull): Split to...
>          (vmullp): ...this and...
>          (vmulls): ...this and...
>          (vmullu): ...this.
>          (vmull_n): Split to...
>          (vmulls_n): ...this and...
>          (vmullu_n): ...this.
>          (vmull_lane): Split to...
>          (vmulls_lane): ...this and...
>          (vmullu_lane): ...this.
>          (vqdmulh_n): Split to itself and...
>          (vqrdmulh_n): ...this.
>          (vqdmulh_lane): Split to itself and...
>          (vqrdmulh_lane): ...this.
>          (vshl): Split to...
>          (vshls): ...this and...
>          (vshlu): ...this and...
>          (vrshls): ...this and...
>          (vrshlu): ...this.
>          (vqshl): Split to...
>          (vqshls): ...this and...
>          (vqshlu): ...this and...
>          (vqrshls): ...this and...
>          (vqrshlu): ...this.
>          (vshr_n): Split to...
>          (vshrs_n): ...this and...
>          (vshru_n): ...this and...
>          (vrshrs_n): ...this and...
>          (vrshru_n): ...this.
>          (vshrn_n): Split to itself and...
>          (vrshrn_n): ...this.
>          (vqshrn_n): Split to...
>          (vqshrns_n): ...this and...
>          (vqshrnu_n): ...this and...
>          (vqrshrns_n): ...this and...
>          (vqrshrnu_n): ...this.
>          (vqshrun_n): Split to itself and...
>          (vqrshrun_n): ...this.
>          (vqshl_n): Split to...
>          (vqshl_s_n): ...this and...
>          (vqshl_u_n): ...this.
>          (vshll_n): Split to...
>          (vshlls_n): ...this and...
>          (vshllu_n): ...this.
>          (vsra_n): Split to...
>          (vsras_n): ...this and...
>          (vsrau_n): ...this and...
>          (vrsras_n): ...this and...
>          (vrsrau_n): ...this.
>          (vsubl): Split to...
>          (vsubls): ...this and...
>          (vsublu): ...this.
>          (vsubw): Split to...
>          (vsubws): ...this and...
>          (vsubwu): ...this.
>          (vqsub): Split to...
>          (vqsubs): ...this and...
>          (vqsubu): ...this.
>          (vhsub): Split to...
>          (vhsubs): ...this and...
>          (vhsubu): ...this.
>          (vsubhn): Split to itself and...
>          (vrsubhn): ...this.
>          (vabd): Split to...
>          (vabds): ...this and...
>          (vabdu): ...this and...
>          (vabdf): ...this.
>          (vabdl): Split to...
>          (vabdls): ...this and...
>          (vabdlu): ...this.
>          (vaba): Split to...
>          (vabas): ...this and...
>          (vabau): ...this.
>          (vabal): Split to...
>          (vabals): ...this and...
>          (vabalu): ...this.
>          (vmax): Split to...
>          (vmaxs): ...this and...
>          (vmaxu): ...this and...
>          (vmaxf): ...this.
>          (vmin): Split to...
>          (vmins): ...this and...
>          (vminu): ...this and...
>          (vminf): ...this.
>          (vpmax): Split to...
>          (vpmaxs): ...this and...
>          (vpmaxu): ...this and...
>          (vpmaxf): ...this.
>          (vpmin): Split to...
>          (vpmins): ...this and...
>          (vpminu): ...this and...
>          (vpminf): ...this.
>          (vpaddl): Split to...
>          (vpaddls): ...this and...
>          (vpaddlu): ...this.
>          (vpadal): Split to...
>          (vpadals): ...this and...
>          (vpadalu): ...this.
>          (vget_laneu): New.
>          (vqmovn): Split to...
>          (vqmovns): ...this and...
>          (vqmovnu): ...this.
>          (vmovl): Split to...
>          (vmovls): ...this and...
>          (vmovlu): ...this.
>          (vmlal_lane): Split to...
>          (vmlals_lane): ...this and...
>          (vmlalu_lane): ...this.
>          (vmlsl_lane): Split to...
>          (vmlsls_lane): ...this and...
>          (vmlslu_lane): ...this.
>          (vmlal_n): Split to...
>          (vmlals_n): ...this and...
>          (vmlalu_n): ...this.
>          (vmlsl_n): Split to...
>          (vmlsls_n): ...this and...
>          (vmlslu_n): ...this.
>          (vext): Make type "SHIFTINSERT".
>          (vcvt): Split to...
>          (vcvts): ...this and...
>          (vcvtu): ...this.
>          (vcvt_n): Split to...
>          (vcvts_n): ...this and...
>          (vcvtu_n): ...this.
>          * config/arm/arm_neon.h (vaddl_s8): Remove "Magic Word".
>          (vaddl_s16): Likewise.
>          (vaddl_s32): Likewise.
>          (vaddl_u8): Likewise.
>          (vaddl_u16): Likewise.
>          (vaddl_u32): Likewise.
>          (vaddw_s8): Likewise.
>          (vaddw_s16): Likewise.
>          (vaddw_s32): Likewise.
>          (vaddw_u8): Likewise.
>          (vaddw_u16): Likewise.
>          (vaddw_u32): Likewise.
>          (vhadd_s8): Likewise.
>          (vhadd_s16): Likewise.
>          (vhadd_s32): Likewise.
>          (vhadd_u8): Likewise.
>          (vhadd_u16): Likewise.
>          (vhadd_u32): Likewise.
>          (vhaddq_s8): Likewise.
>          (vhaddq_s16): Likewise.
>          (vhaddq_s32): Likewise.
>          (vhaddq_u8): Likewise.
>          (vhaddq_u16): Likewise.
>          (vrhadd_s8): Likewise.
>          (vrhadd_s16): Likewise.
>          (vrhadd_s32): Likewise.
>          (vrhadd_u8): Likewise.
>          (vrhadd_u16): Likewise.
>          (vrhadd_u32): Likewise.
>          (vrhaddq_s8): Likewise.
>          (vrhaddq_s16): Likewise.
>          (vrhaddq_s32): Likewise.
>          (vrhaddq_u8): Likewise.
>          (vrhaddq_u16): Likewise.
>          (vrhaddq_u32): Likewise.
>          (vqadd_s8): Likewise.
>          (vqadd_s16): Likewise.
>          (vqadd_s32): Likewise.
>          (vqadd_s64): Likewise.
>          (vqadd_u8): Likewise.
>          (vqadd_u16): Likewise.
>          (vqadd_u32): Likewise.
>          (vqadd_u64): Likewise.
>          (vqaddq_s8): Likewise.
>          (vqaddq_s16): Likewise.
>          (vqaddq_s32): Likewise.
>          (vqaddq_s64): Likewise.
>          (vqaddq_u8): Likewise.
>          (vqaddq_u16): Likewise.
>          (vqaddq_u32): Likewise.
>          (vqaddq_u64): Likewise.
>          (vaddhn_s16): Likewise.
>          (vaddhn_s32): Likewise.
>          (vaddhn_s64): Likewise.
>          (vaddhn_u16): Likewise.
>          (vaddhn_u32): Likewise.
>          (vaddhn_u64): Likewise.
>          (vraddhn_s16): Likewise.
>          (vraddhn_s32): Likewise.
>          (vraddhn_s64): Likewise.
>          (vraddhn_u16): Likewise.
>          (vraddhn_u32): Likewise.
>          (vraddhn_u64): Likewise.
>          (vmul_p8): Likewise.
>          (vmulq_p8): Likewise.
>          (vqdmulh_s16): Likewise.
>          (vqdmulh_s32): Likewise.
>          (vqdmulhq_s16): Likewise.
>          (vqdmulhq_s32): Likewise.
>          (vqrdmulh_s16): Likewise.
>          (vqrdmulh_s32): Likewise.
>          (vqrdmulhq_s16): Likewise.
>          (vqrdmulhq_s32): Likewise.
>          (vmull_s8): Likewise.
>          (vmull_s16): Likewise.
>          (vmull_s32): Likewise.
>          (vmull_u8): Likewise.
>          (vmull_u16): Likewise.
>          (vmull_u32): Likewise.
>          (vmull_p8): Likewise.
>          (vqdmull_s16): Likewise.
>          (vqdmull_s32): Likewise.
>          (vmla_s8): Likewise.
>          (vmla_s16): Likewise.
>          (vmla_s32): Likewise.
>          (vmla_f32): Likewise.
>          (vmla_u8): Likewise.
>          (vmla_u16): Likewise.
>          (vmla_u32): Likewise.
>          (vmlaq_s8): Likewise.
>          (vmlaq_s16): Likewise.
>          (vmlaq_s32): Likewise.
>          (vmlaq_f32): Likewise.
>          (vmlaq_u8): Likewise.
>          (vmlaq_u16): Likewise.
>          (vmlaq_u32): Likewise.
>          (vmlal_s8): Likewise.
>          (vmlal_s16): Likewise.
>          (vmlal_s32): Likewise.
>          (vmlal_u8): Likewise.
>          (vmlal_u16): Likewise.
>          (vmlal_u32): Likewise.
>          (vqdmlal_s16): Likewise.
>          (vqdmlal_s32): Likewise.
>          (vmls_s8): Likewise.
>          (vmls_s16): Likewise.
>          (vmls_s32): Likewise.
>          (vmls_f32): Likewise.
>          (vmls_u8): Likewise.
>          (vmls_u16): Likewise.
>          (vmls_u32): Likewise.
>          (vmlsq_s8): Likewise.
>          (vmlsq_s16): Likewise.
>          (vmlsq_s32): Likewise.
>          (vmlsq_f32): Likewise.
>          (vmlsq_u8): Likewise.
>          (vmlsq_u16): Likewise.
>          (vmlsq_u32): Likewise.
>          (vmlsl_s8): Likewise.
>          (vmlsl_s16): Likewise.
>          (vmlsl_s32): Likewise.
>          (vmlsl_u8): Likewise.
>          (vmlsl_u16): Likewise.
>          (vmlsl_u32): Likewise.
>          (vqdmlsl_s16): Likewise.
>          (vqdmlsl_s32): Likewise.
>          (vfma_f32): Likewise.
>          (vfmaq_f32): Likewise.
>          (vfms_f32): Likewise.
>          (vfmsq_f32): Likewise.
>          (vsubl_s8): Likewise.
>          (vsubl_s16): Likewise.
>          (vsubl_s32): Likewise.
>          (vsubl_u8): Likewise.
>          (vsubl_u16): Likewise.
>          (vsubl_u32): Likewise.
>          (vsubw_s8): Likewise.
>          (vsubw_s16): Likewise.
>          (vsubw_s32): Likewise.
>          (vsubw_u8): Likewise.
>          (vsubw_u16): Likewise.
>          (vsubw_u32): Likewise.
>          (vhsub_s8): Likewise.
>          (vhsub_s16): Likewise.
>          (vhsub_s32): Likewise.
>          (vhsub_u8): Likewise.
>          (vhsub_u16): Likewise.
>          (vhsub_u32): Likewise.
>          (vhsubq_s8): Likewise.
>          (vhsubq_s16): Likewise.
>          (vhsubq_s32): Likewise.
>          (vhsubq_u8): Likewise.
>          (vhsubq_u16): Likewise.
>          (vhsubq_u32): Likewise.
>          (vqsub_s8): Likewise.
>          (vqsub_s16): Likewise.
>          (vqsub_s32): Likewise.
>          (vqsub_s64): Likewise.
>          (vqsub_u8): Likewise.
>          (vqsub_u16): Likewise.
>          (vqsub_u32): Likewise.
>          (vqsub_u64): Likewise.
>          (vqsubq_s8): Likewise.
>          (vqsubq_s16): Likewise.
>          (vqsubq_s32): Likewise.
>          (vqsubq_s64): Likewise.
>          (vqsubq_u8): Likewise.
>          (vqsubq_u16): Likewise.
>          (vqsubq_u32): Likewise.
>          (vqsubq_u64): Likewise.
>          (vsubhn_s16): Likewise.
>          (vsubhn_s32): Likewise.
>          (vsubhn_s64): Likewise.
>          (vsubhn_u16): Likewise.
>          (vsubhn_u32): Likewise.
>          (vsubhn_u64): Likewise.
>          (vrsubhn_s16): Likewise.
>          (vrsubhn_s32): Likewise.
>          (vrsubhn_s64): Likewise.
>          (vrsubhn_u16): Likewise.
>          (vrsubhn_u32): Likewise.
>          (vrsubhn_u64): Likewise.
>          (vceq_s8): Likewise.
>          (vceq_s16): Likewise.
>          (vceq_s32): Likewise.
>          (vceq_f32): Likewise.
>          (vceq_u8): Likewise.
>          (vceq_u16): Likewise.
>          (vceq_u32): Likewise.
>          (vceq_p8): Likewise.
>          (vceqq_s8): Likewise.
>          (vceqq_s16): Likewise.
>          (vceqq_s32): Likewise.
>          (vceqq_f32): Likewise.
>          (vceqq_u8): Likewise.
>          (vceqq_u16): Likewise.
>          (vceqq_u32): Likewise.
>          (vceqq_p8): Likewise.
>          (vcge_s8): Likewise.
>          (vcge_s16): Likewise.
>          (vcge_s32): Likewise.
>          (vcge_f32): Likewise.
>          (vcge_u8): Likewise.
>          (vcge_u16): Likewise.
>          (vcge_u32): Likewise.
>          (vcgeq_s8): Likewise.
>          (vcgeq_s16): Likewise.
>          (vcgeq_s32): Likewise.
>          (vcgeq_f32): Likewise.
>          (vcgeq_u8): Likewise.
>          (vcgeq_u16): Likewise.
>          (vcgeq_u32): Likewise.
>          (vcle_s8): Likewise.
>          (vcle_s16): Likewise.
>          (vcle_s32): Likewise.
>          (vcle_f32): Likewise.
>          (vcle_u8): Likewise.
>          (vcle_u16): Likewise.
>          (vcle_u32): Likewise.
>          (vcleq_s8): Likewise.
>          (vcleq_s16): Likewise.
>          (vcleq_s32): Likewise.
>          (vcleq_f32): Likewise.
>          (vcleq_u8): Likewise.
>          (vcleq_u16): Likewise.
>          (vcleq_u32): Likewise.
>          (vcgt_s8): Likewise.
>          (vcgt_s16): Likewise.
>          (vcgt_s32): Likewise.
>          (vcgt_f32): Likewise.
>          (vcgt_u8): Likewise.
>          (vcgt_u16): Likewise.
>          (vcgt_u32): Likewise.
>          (vcgtq_s8): Likewise.
>          (vcgtq_s16): Likewise.
>          (vcgtq_s32): Likewise.
>          (vcgtq_f32): Likewise.
>          (vcgtq_u8): Likewise.
>          (vcgtq_u16): Likewise.
>          (vcgtq_u32): Likewise.
>          (vclt_s8): Likewise.
>          (vclt_s16): Likewise.
>          (vclt_s32): Likewise.
>          (vclt_f32): Likewise.
>          (vclt_u8): Likewise.
>          (vclt_u16): Likewise.
>          (vclt_u32): Likewise.
>          (vcltq_s8): Likewise.
>          (vcltq_s16): Likewise.
>          (vcltq_s32): Likewise.
>          (vcltq_f32): Likewise.
>          (vcltq_u8): Likewise.
>          (vcltq_u16): Likewise.
>          (vcltq_u32): Likewise.
>          (vcage_f32): Likewise.
>          (vcageq_f32): Likewise.
>          (vcale_f32): Likewise.
>          (vcaleq_f32): Likewise.
>          (vcagt_f32): Likewise.
>          (vcagtq_f32): Likewise.
>          (vcalt_f32): Likewise.
>          (vcaltq_f32): Likewise.
>          (vtst_s8): Likewise.
>          (vtst_s16): Likewise.
>          (vtst_s32): Likewise.
>          (vtst_u8): Likewise.
>          (vtst_u16): Likewise.
>          (vtst_u32): Likewise.
>          (vtst_p8): Likewise.
>          (vtstq_s8): Likewise.
>          (vtstq_s16): Likewise.
>          (vtstq_s32): Likewise.
>          (vtstq_u8): Likewise.
>          (vtstq_u16): Likewise.
>          (vtstq_u32): Likewise.
>          (vtstq_p8): Likewise.
>          (vabd_s8): Likewise.
>          (vabd_s16): Likewise.
>          (vabd_s32): Likewise.
>          (vabd_f32): Likewise.
>          (vabd_u8): Likewise.
>          (vabd_u16): Likewise.
>          (vabd_u32): Likewise.
>          (vabdq_s8): Likewise.
>          (vabdq_s16): Likewise.
>          (vabdq_s32): Likewise.
>          (vabdq_f32): Likewise.
>          (vabdq_u8): Likewise.
>          (vabdq_u16): Likewise.
>          (vabdq_u32): Likewise.
>          (vabdl_s8): Likewise.
>          (vabdl_s16): Likewise.
>          (vabdl_s32): Likewise.
>          (vabdl_u8): Likewise.
>          (vabdl_u16): Likewise.
>          (vabdl_u32): Likewise.
>          (vaba_s8): Likewise.
>          (vaba_s16): Likewise.
>          (vaba_s32): Likewise.
>          (vaba_u8): Likewise.
>          (vaba_u16): Likewise.
>          (vaba_u32): Likewise.
>          (vabaq_s8): Likewise.
>          (vabaq_s16): Likewise.
>          (vabaq_s32): Likewise.
>          (vabaq_u8): Likewise.
>          (vabaq_u16): Likewise.
>          (vabaq_u32): Likewise.
>          (vabal_s8): Likewise.
>          (vabal_s16): Likewise.
>          (vabal_s32): Likewise.
>          (vabal_u8): Likewise.
>          (vabal_u16): Likewise.
>          (vabal_u32): Likewise.
>          (vmax_s8): Likewise.
>          (vmax_s16): Likewise.
>          (vmax_s32): Likewise.
>          (vmax_f32): Likewise.
>          (vmax_u8): Likewise.
>          (vmax_u16): Likewise.
>          (vmax_u32): Likewise.
>          (vmaxq_s8): Likewise.
>          (vmaxq_s16): Likewise.
>          (vmaxq_s32): Likewise.
>          (vmaxq_f32): Likewise.
>          (vmaxq_u8): Likewise.
>          (vmaxq_u16): Likewise.
>          (vmaxq_u32): Likewise.
>          (vmin_s8): Likewise.
>          (vmin_s16): Likewise.
>          (vmin_s32): Likewise.
>          (vmin_f32): Likewise.
>          (vmin_u8): Likewise.
>          (vmin_u16): Likewise.
>          (vmin_u32): Likewise.
>          (vminq_s8): Likewise.
>          (vminq_s16): Likewise.
>          (vminq_s32): Likewise.
>          (vminq_f32): Likewise.
>          (vminq_u8): Likewise.
>          (vminq_u16): Likewise.
>          (vminq_u32): Likewise.
>          (vpadd_s8): Likewise.
>          (vpadd_s16): Likewise.
>          (vpadd_s32): Likewise.
>          (vpadd_f32): Likewise.
>          (vpadd_u8): Likewise.
>          (vpadd_u16): Likewise.
>          (vpadd_u32): Likewise.
>          (vpaddl_s8): Likewise.
>          (vpaddl_s16): Likewise.
>          (vpaddl_s32): Likewise.
>          (vpaddl_u8): Likewise.
>          (vpaddl_u16): Likewise.
>          (vpaddl_u32): Likewise.
>          (vpaddlq_s8): Likewise.
>          (vpaddlq_s16): Likewise.
>          (vpaddlq_s32): Likewise.
>          (vpaddlq_u8): Likewise.
>          (vpaddlq_u16): Likewise.
>          (vpaddlq_u32): Likewise.
>          (vpadal_s8): Likewise.
>          (vpadal_s16): Likewise.
>          (vpadal_s32): Likewise.
>          (vpadal_u8): Likewise.
>          (vpadal_u16): Likewise.
>          (vpadal_u32): Likewise.
>          (vpadalq_s8): Likewise.
>          (vpadalq_s16): Likewise.
>          (vpadalq_s32): Likewise.
>          (vpadalq_u8): Likewise.
>          (vpadalq_u16): Likewise.
>          (vpadalq_u32): Likewise.
>          (vpmax_s8): Likewise.
>          (vpmax_s16): Likewise.
>          (vpmax_s32): Likewise.
>          (vpmax_f32): Likewise.
>          (vpmax_u8): Likewise.
>          (vpmax_u16): Likewise.
>          (vpmax_u32): Likewise.
>          (vpmin_s8): Likewise.
>          (vpmin_s16): Likewise.
>          (vpmin_s32): Likewise.
>          (vpmin_f32): Likewise.
>          (vpmin_u8): Likewise.
>          (vpmin_u16): Likewise.
>          (vpmin_u32): Likewise.
>          (vrecps_f32): Likewise.
>          (vrecpsq_f32): Likewise.
>          (vrsqrts_f32): Likewise.
>          (vrsqrtsq_f32): Likewise.
>          (vshl_s8): Likewise.
>          (vshl_s16): Likewise.
>          (vshl_s32): Likewise.
>          (vshl_s64): Likewise.
>          (vshl_u8): Likewise.
>          (vshl_u16): Likewise.
>          (vshl_u32): Likewise.
>          (vshl_u64): Likewise.
>          (vshlq_s8): Likewise.
>          (vshlq_s16): Likewise.
>          (vshlq_s32): Likewise.
>          (vshlq_s64): Likewise.
>          (vshlq_u8): Likewise.
>          (vshlq_u16): Likewise.
>          (vshlq_u32): Likewise.
>          (vshlq_u64): Likewise.
>          (vrshl_s8): Likewise.
>          (vrshl_s16): Likewise.
>          (vrshl_s32): Likewise.
>          (vrshl_s64): Likewise.
>          (vrshl_u8): Likewise.
>          (vrshl_u16): Likewise.
>          (vrshl_u32): Likewise.
>          (vrshl_u64): Likewise.
>          (vrshlq_s8): Likewise.
>          (vrshlq_s16): Likewise.
>          (vrshlq_s32): Likewise.
>          (vrshlq_s64): Likewise.
>          (vrshlq_u8): Likewise.
>          (vrshlq_u16): Likewise.
>          (vrshlq_u32): Likewise.
>          (vrshlq_u64): Likewise.
>          (vqshl_s8): Likewise.
>          (vqshl_s16): Likewise.
>          (vqshl_s32): Likewise.
>          (vqshl_s64): Likewise.
>          (vqshl_u8): Likewise.
>          (vqshl_u16): Likewise.
>          (vqshl_u32): Likewise.
>          (vqshl_u64): Likewise.
>          (vqshlq_s8): Likewise.
>          (vqshlq_s16): Likewise.
>          (vqshlq_s32): Likewise.
>          (vqshlq_s64): Likewise.
>          (vqshlq_u8): Likewise.
>          (vqshlq_u16): Likewise.
>          (vqshlq_u32): Likewise.
>          (vqshlq_u64): Likewise.
>          (vqrshl_s8): Likewise.
>          (vqrshl_s16): Likewise.
>          (vqrshl_s32): Likewise.
>          (vqrshl_s64): Likewise.
>          (vqrshl_u8): Likewise.
>          (vqrshl_u16): Likewise.
>          (vqrshl_u32): Likewise.
>          (vqrshl_u64): Likewise.
>          (vqrshlq_s8): Likewise.
>          (vqrshlq_s16): Likewise.
>          (vqrshlq_s32): Likewise.
>          (vqrshlq_s64): Likewise.
>          (vqrshlq_u8): Likewise.
>          (vqrshlq_u16): Likewise.
>          (vqrshlq_u32): Likewise.
>          (vqrshlq_u64): Likewise.
>          (vshr_n_s8): Likewise.
>          (vshr_n_s16): Likewise.
>          (vshr_n_s32): Likewise.
>          (vshr_n_s64): Likewise.
>          (vshr_n_u8): Likewise.
>          (vshr_n_u16): Likewise.
>          (vshr_n_u32): Likewise.
>          (vshr_n_u64): Likewise.
>          (vshrq_n_s8): Likewise.
>          (vshrq_n_s16): Likewise.
>          (vshrq_n_s32): Likewise.
>          (vshrq_n_s64): Likewise.
>          (vshrq_n_u8): Likewise.
>          (vshrq_n_u16): Likewise.
>          (vshrq_n_u32): Likewise.
>          (vshrq_n_u64): Likewise.
>          (vrshr_n_s8): Likewise.
>          (vrshr_n_s16): Likewise.
>          (vrshr_n_s32): Likewise.
>          (vrshr_n_s64): Likewise.
>          (vrshr_n_u8): Likewise.
>          (vrshr_n_u16): Likewise.
>          (vrshr_n_u32): Likewise.
>          (vrshr_n_u64): Likewise.
>          (vrshrq_n_s8): Likewise.
>          (vrshrq_n_s16): Likewise.
>          (vrshrq_n_s32): Likewise.
>          (vrshrq_n_s64): Likewise.
>          (vrshrq_n_u8): Likewise.
>          (vrshrq_n_u16): Likewise.
>          (vrshrq_n_u32): Likewise.
>          (vrshrq_n_u64): Likewise.
>          (vshrn_n_s16): Likewise.
>          (vshrn_n_s32): Likewise.
>          (vshrn_n_s64): Likewise.
>          (vshrn_n_u16): Likewise.
>          (vshrn_n_u32): Likewise.
>          (vshrn_n_u64): Likewise.
>          (vrshrn_n_s16): Likewise.
>          (vrshrn_n_s32): Likewise.
>          (vrshrn_n_s64): Likewise.
>          (vrshrn_n_u16): Likewise.
>          (vrshrn_n_u32): Likewise.
>          (vrshrn_n_u64): Likewise.
>          (vqshrn_n_s16): Likewise.
>          (vqshrn_n_s32): Likewise.
>          (vqshrn_n_s64): Likewise.
>          (vqshrn_n_u16): Likewise.
>          (vqshrn_n_u32): Likewise.
>          (vqshrn_n_u64): Likewise.
>          (vqrshrn_n_s16): Likewise.
>          (vqrshrn_n_s32): Likewise.
>          (vqrshrn_n_s64): Likewise.
>          (vqrshrn_n_u16): Likewise.
>          (vqrshrn_n_u32): Likewise.
>          (vqrshrn_n_u64): Likewise.
>          (vqshrun_n_s16): Likewise.
>          (vqshrun_n_s32): Likewise.
>          (vqshrun_n_s64): Likewise.
>          (vqrshrun_n_s16): Likewise.
>          (vqrshrun_n_s32): Likewise.
>          (vqrshrun_n_s64): Likewise.
>          (vshl_n_s8): Likewise.
>          (vshl_n_s16): Likewise.
>          (vshl_n_s32): Likewise.
>          (vshl_n_s64): Likewise.
>          (vshl_n_u8): Likewise.
>          (vshl_n_u16): Likewise.
>          (vshl_n_u32): Likewise.
>          (vshl_n_u64): Likewise.
>          (vshlq_n_s8): Likewise.
>          (vshlq_n_s16): Likewise.
>          (vshlq_n_s32): Likewise.
>          (vshlq_n_s64): Likewise.
>          (vshlq_n_u8): Likewise.
>          (vshlq_n_u16): Likewise.
>          (vshlq_n_u32): Likewise.
>          (vshlq_n_u64): Likewise.
>          (vqshl_n_s8): Likewise.
>          (vqshl_n_s16): Likewise.
>          (vqshl_n_s32): Likewise.
>          (vqshl_n_s64): Likewise.
>          (vqshl_n_u8): Likewise.
>          (vqshl_n_u16): Likewise.
>          (vqshl_n_u32): Likewise.
>          (vqshl_n_u64): Likewise.
>          (vqshlq_n_s8): Likewise.
>          (vqshlq_n_s16): Likewise.
>          (vqshlq_n_s32): Likewise.
>          (vqshlq_n_s64): Likewise.
>          (vqshlq_n_u8): Likewise.
>          (vqshlq_n_u16): Likewise.
>          (vqshlq_n_u32): Likewise.
>          (vqshlq_n_u64): Likewise.
>          (vqshlu_n_s8): Likewise.
>          (vqshlu_n_s16): Likewise.
>          (vqshlu_n_s32): Likewise.
>          (vqshlu_n_s64): Likewise.
>          (vqshluq_n_s8): Likewise.
>          (vqshluq_n_s16): Likewise.
>          (vqshluq_n_s32): Likewise.
>          (vqshluq_n_s64): Likewise.
>          (vshll_n_s8): Likewise.
>          (vshll_n_s16): Likewise.
>          (vshll_n_s32): Likewise.
>          (vshll_n_u8): Likewise.
>          (vshll_n_u16): Likewise.
>          (vshll_n_u32): Likewise.
>          (vsra_n_s8): Likewise.
>          (vsra_n_s16): Likewise.
>          (vsra_n_s32): Likewise.
>          (vsra_n_s64): Likewise.
>          (vsra_n_u8): Likewise.
>          (vsra_n_u16): Likewise.
>          (vsra_n_u32): Likewise.
>          (vsra_n_u64): Likewise.
>          (vsraq_n_s8): Likewise.
>          (vsraq_n_s16): Likewise.
>          (vsraq_n_s32): Likewise.
>          (vsraq_n_s64): Likewise.
>          (vsraq_n_u8): Likewise.
>          (vsraq_n_u16): Likewise.
>          (vsraq_n_u32): Likewise.
>          (vsraq_n_u64): Likewise.
>          (vrsra_n_s8): Likewise.
>          (vrsra_n_s16): Likewise.
>          (vrsra_n_s32): Likewise.
>          (vrsra_n_s64): Likewise.
>          (vrsra_n_u8): Likewise.
>          (vrsra_n_u16): Likewise.
>          (vrsra_n_u32): Likewise.
>          (vrsra_n_u64): Likewise.
>          (vrsraq_n_s8): Likewise.
>          (vrsraq_n_s16): Likewise.
>          (vrsraq_n_s32): Likewise.
>          (vrsraq_n_s64): Likewise.
>          (vrsraq_n_u8): Likewise.
>          (vrsraq_n_u16): Likewise.
>          (vrsraq_n_u32): Likewise.
>          (vrsraq_n_u64): Likewise.
>          (vabs_s8): Likewise.
>          (vabs_s16): Likewise.
>          (vabs_s32): Likewise.
>          (vabs_f32): Likewise.
>          (vabsq_s8): Likewise.
>          (vabsq_s16): Likewise.
>          (vabsq_s32): Likewise.
>          (vabsq_f32): Likewise.
>          (vqabs_s8): Likewise.
>          (vqabs_s16): Likewise.
>          (vqabs_s32): Likewise.
>          (vqabsq_s8): Likewise.
>          (vqabsq_s16): Likewise.
>          (vqabsq_s32): Likewise.
>          (vneg_s8): Likewise.
>          (vneg_s16): Likewise.
>          (vneg_s32): Likewise.
>          (vneg_f32): Likewise.
>          (vnegq_s8): Likewise.
>          (vnegq_s16): Likewise.
>          (vnegq_s32): Likewise.
>          (vnegq_f32): Likewise.
>          (vqneg_s8): Likewise.
>          (vqneg_s16): Likewise.
>          (vqneg_s32): Likewise.
>          (vqnegq_s8): Likewise.
>          (vqnegq_s16): Likewise.
>          (vqnegq_s32): Likewise.
>          (vmvn_s8): Likewise.
>          (vmvn_s16): Likewise.
>          (vmvn_s32): Likewise.
>          (vmvn_u8): Likewise.
>          (vmvn_u16): Likewise.
>          (vmvn_u32): Likewise.
>          (vmvn_p8): Likewise.
>          (vmvnq_s8): Likewise.
>          (vmvnq_s16): Likewise.
>          (vmvnq_s32): Likewise.
>          (vmvnq_u8): Likewise.
>          (vmvnq_u16): Likewise.
>          (vmvnq_u32): Likewise.
>          (vmvnq_p8): Likewise.
>          (vcls_s8): Likewise.
>          (vcls_s16): Likewise.
>          (vcls_s32): Likewise.
>          (vclsq_s8): Likewise.
>          (vclsq_s16): Likewise.
>          (vclsq_s32): Likewise.
>          (vclz_s8): Likewise.
>          (vclz_s16): Likewise.
>          (vclz_s32): Likewise.
>          (vclz_u8): Likewise.
>          (vclz_u16): Likewise.
>          (vclz_u32): Likewise.
>          (vclzq_s8): Likewise.
>          (vclzq_s16): Likewise.
>          (vclzq_s32): Likewise.
>          (vclzq_u8): Likewise.
>          (vclzq_u16): Likewise.
>          (vclzq_u32): Likewise.
>          (vcnt_s8): Likewise.
>          (vcnt_u8): Likewise.
>          (vcnt_p8): Likewise.
>          (vcntq_s8): Likewise.
>          (vcntq_u8): Likewise.
>          (vcntq_p8): Likewise.
>          (vrecpe_f32): Likewise.
>          (vrecpe_u32): Likewise.
>          (vrecpeq_f32): Likewise.
>          (vrecpeq_u32): Likewise.
>          (vrsqrte_f32): Likewise.
>          (vrsqrte_u32): Likewise.
>          (vrsqrteq_f32): Likewise.
>          (vrsqrteq_u32): Likewise.
>          (vget_lane_s8): Likewise.
>          (vget_lane_s16): Likewise.
>          (vget_lane_s32): Likewise.
>          (vget_lane_f32): Likewise.
>          (vget_lane_u8): Likewise.
>          (vget_lane_u16): Likewise.
>          (vget_lane_u32): Likewise.
>          (vget_lane_p8): Likewise.
>          (vget_lane_p16): Likewise.
>          (vget_lane_s64): Likewise.
>          (vget_lane_u64): Likewise.
>          (vgetq_lane_s8): Likewise.
>          (vgetq_lane_s16): Likewise.
>          (vgetq_lane_s32): Likewise.
>          (vgetq_lane_f32): Likewise.
>          (vgetq_lane_u8): Likewise.
>          (vgetq_lane_u16): Likewise.
>          (vgetq_lane_u32): Likewise.
>          (vgetq_lane_p8): Likewise.
>          (vgetq_lane_p16): Likewise.
>          (vgetq_lane_s64): Likewise.
>          (vgetq_lane_u64): Likewise.
>          (vcvt_s32_f32): Likewise.
>          (vcvt_f32_s32): Likewise.
>          (vcvt_f32_u32): Likewise.
>          (vcvt_u32_f32): Likewise.
>          (vcvtq_s32_f32): Likewise.
>          (vcvtq_f32_s32): Likewise.
>          (vcvtq_f32_u32): Likewise.
>          (vcvtq_u32_f32): Likewise.
>          (vcvt_n_s32_f32): Likewise.
>          (vcvt_n_f32_s32): Likewise.
>          (vcvt_n_f32_u32): Likewise.
>          (vcvt_n_u32_f32): Likewise.
>          (vcvtq_n_s32_f32): Likewise.
>          (vcvtq_n_f32_s32): Likewise.
>          (vcvtq_n_f32_u32): Likewise.
>          (vcvtq_n_u32_f32): Likewise.
>          (vmovn_s16): Likewise.
>          (vmovn_s32): Likewise.
>          (vmovn_s64): Likewise.
>          (vmovn_u16): Likewise.
>          (vmovn_u32): Likewise.
>          (vmovn_u64): Likewise.
>          (vqmovn_s16): Likewise.
>          (vqmovn_s32): Likewise.
>          (vqmovn_s64): Likewise.
>          (vqmovn_u16): Likewise.
>          (vqmovn_u32): Likewise.
>          (vqmovn_u64): Likewise.
>          (vqmovun_s16): Likewise.
>          (vqmovun_s32): Likewise.
>          (vqmovun_s64): Likewise.
>          (vmovl_s8): Likewise.
>          (vmovl_s16): Likewise.
>          (vmovl_s32): Likewise.
>          (vmovl_u8): Likewise.
>          (vmovl_u16): Likewise.
>          (vmovl_u32): Likewise.
>          (vmul_lane_s16): Likewise.
>          (vmul_lane_s32): Likewise.
>          (vmul_lane_f32): Likewise.
>          (vmul_lane_u16): Likewise.
>          (vmul_lane_u32): Likewise.
>          (vmulq_lane_s16): Likewise.
>          (vmulq_lane_s32): Likewise.
>          (vmulq_lane_f32): Likewise.
>          (vmulq_lane_u16): Likewise.
>          (vmulq_lane_u32): Likewise.
>          (vmla_lane_s16): Likewise.
>          (vmla_lane_s32): Likewise.
>          (vmla_lane_f32): Likewise.
>          (vmla_lane_u16): Likewise.
>          (vmla_lane_u32): Likewise.
>          (vmlaq_lane_s16): Likewise.
>          (vmlaq_lane_s32): Likewise.
>          (vmlaq_lane_f32): Likewise.
>          (vmlaq_lane_u16): Likewise.
>          (vmlaq_lane_u32): Likewise.
>          (vmlal_lane_s16): Likewise.
>          (vmlal_lane_s32): Likewise.
>          (vmlal_lane_u16): Likewise.
>          (vmlal_lane_u32): Likewise.
>          (vqdmlal_lane_s16): Likewise.
>          (vqdmlal_lane_s32): Likewise.
>          (vmls_lane_s16): Likewise.
>          (vmls_lane_s32): Likewise.
>          (vmls_lane_f32): Likewise.
>          (vmls_lane_u16): Likewise.
>          (vmls_lane_u32): Likewise.
>          (vmlsq_lane_s16): Likewise.
>          (vmlsq_lane_s32): Likewise.
>          (vmlsq_lane_f32): Likewise.
>          (vmlsq_lane_u16): Likewise.
>          (vmlsq_lane_u32): Likewise.
>          (vmlsl_lane_s16): Likewise.
>          (vmlsl_lane_s32): Likewise.
>          (vmlsl_lane_u16): Likewise.
>          (vmlsl_lane_u32): Likewise.
>          (vqdmlsl_lane_s16): Likewise.
>          (vqdmlsl_lane_s32): Likewise.
>          (vmull_lane_s16): Likewise.
>          (vmull_lane_s32): Likewise.
>          (vmull_lane_u16): Likewise.
>          (vmull_lane_u32): Likewise.
>          (vqdmull_lane_s16): Likewise.
>          (vqdmull_lane_s32): Likewise.
>          (vqdmulhq_lane_s16): Likewise.
>          (vqdmulhq_lane_s32): Likewise.
>          (vqdmulh_lane_s16): Likewise.
>          (vqdmulh_lane_s32): Likewise.
>          (vqrdmulhq_lane_s16): Likewise.
>          (vqrdmulhq_lane_s32): Likewise.
>          (vqrdmulh_lane_s16): Likewise.
>          (vqrdmulh_lane_s32): Likewise.
>          (vmul_n_s16): Likewise.
>          (vmul_n_s32): Likewise.
>          (vmul_n_f32): Likewise.
>          (vmul_n_u16): Likewise.
>          (vmul_n_u32): Likewise.
>          (vmulq_n_s16): Likewise.
>          (vmulq_n_s32): Likewise.
>          (vmulq_n_f32): Likewise.
>          (vmulq_n_u16): Likewise.
>          (vmulq_n_u32): Likewise.
>          (vmull_n_s16): Likewise.
>          (vmull_n_s32): Likewise.
>          (vmull_n_u16): Likewise.
>          (vmull_n_u32): Likewise.
>          (vqdmull_n_s16): Likewise.
>          (vqdmull_n_s32): Likewise.
>          (vqdmulhq_n_s16): Likewise.
>          (vqdmulhq_n_s32): Likewise.
>          (vqdmulh_n_s16): Likewise.
>          (vqdmulh_n_s32): Likewise.
>          (vqrdmulhq_n_s16): Likewise.
>          (vqrdmulhq_n_s32): Likewise.
>          (vqrdmulh_n_s16): Likewise.
>          (vqrdmulh_n_s32): Likewise.
>          (vmla_n_s16): Likewise.
>          (vmla_n_s32): Likewise.
>          (vmla_n_f32): Likewise.
>          (vmla_n_u16): Likewise.
>          (vmla_n_u32): Likewise.
>          (vmlaq_n_s16): Likewise.
>          (vmlaq_n_s32): Likewise.
>          (vmlaq_n_f32): Likewise.
>          (vmlaq_n_u16): Likewise.
>          (vmlaq_n_u32): Likewise.
>          (vmlal_n_s16): Likewise.
>          (vmlal_n_s32): Likewise.
>          (vmlal_n_u16): Likewise.
>          (vmlal_n_u32): Likewise.
>          (vqdmlal_n_s16): Likewise.
>          (vqdmlal_n_s32): Likewise.
>          (vmls_n_s16): Likewise.
>          (vmls_n_s32): Likewise.
>          (vmls_n_f32): Likewise.
>          (vmls_n_u16): Likewise.
>          (vmls_n_u32): Likewise.
>          (vmlsq_n_s16): Likewise.
>          (vmlsq_n_s32): Likewise.
>          (vmlsq_n_f32): Likewise.
>          (vmlsq_n_u16): Likewise.
>          (vmlsq_n_u32): Likewise.
>          (vmlsl_n_s16): Likewise.
>          (vmlsl_n_s32): Likewise.
>          (vmlsl_n_u16): Likewise.
>          (vmlsl_n_u32): Likewise.
>          (vqdmlsl_n_s16): Likewise.
>          (vqdmlsl_n_s32): Likewise.
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h James Greenhalgh
@ 2014-11-18  9:16     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:16 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: Richard Earnshaw, nickc



On 12/11/14 17:10, James Greenhalgh wrote:
>
> Hi,
>
> If we want to move all the code relating to "builtin" initialisation and
> expansion to a common file, we must share the processor flags with that
> common file.
>
> This patch pulls those definitions out to config/arm/arm-protos.h
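
A sketch of the shape of the moved definitions (bit assignments and the
composite masks here are illustrative):

  /* In config/arm/arm-protos.h after the move.  */
  #define FL_CO_PROC     (1 << 0)  /* Has external co-processor bus.  */
  #define FL_ARCH3M      (1 << 1)  /* Extended multiply.  */
  #define FL_MODE26      (1 << 2)  /* 26-bit mode support.  */
  #define FL_MODE32      (1 << 3)  /* 32-bit mode support.  */

  /* Per-architecture masks are then composed from the flags.  */
  #define FL_FOR_ARCH2   0
  #define FL_FOR_ARCH3   (FL_FOR_ARCH2 | FL_MODE32)
  #define FL_FOR_ARCH3M  (FL_FOR_ARCH3 | FL_ARCH3M)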
>
> Bootstrapped and regression tested in series, and in isolation with no
> issues.
>
> OK?
>
> Thanks,
> James
>
Ok.

Ramana
> ---
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* config/arm/t-arm (arm.o): Include arm-protos.h in the recipe.
> 	* config/arm/arm.c (FL_CO_PROC): Move to arm-protos.h.
> 	(FL_ARCH3M): Likewise.
> 	(FL_MODE26): Likewise.
> 	(FL_MODE32): Likewise.
> 	(FL_ARCH4): Likewise.
> 	(FL_ARCH5): Likewise.
> 	(FL_THUMB): Likewise.
> 	(FL_LDSCHED): Likewise.
> 	(FL_STRONG): Likewise.
> 	(FL_ARCH5E): Likewise.
> 	(FL_XSCALE): Likewise.
> 	(FL_ARCH6): Likewise.
> 	(FL_VFPV2): Likewise.
> 	(FL_WBUF): Likewise.
> 	(FL_ARCH6K): Likewise.
> 	(FL_THUMB2): Likewise.
> 	(FL_NOTM): Likewise.
> 	(FL_THUMB_DIV): Likewise.
> 	(FL_VFPV3): Likewise.
> 	(FL_NEON): Likewise.
> 	(FL_ARCH7EM): Likewise.
> 	(FL_ARCH7): Likewise.
> 	(FL_ARM_DIV): Likewise.
> 	(FL_ARCH8): Likewise.
> 	(FL_CRC32): Likewise.
> 	(FL_SMALLMUL): Likewise.
> 	(FL_IWMMXT): Likewise.
> 	(FL_IWMMXT2): Likewise.
> 	(FL_TUNE): Likewise.
> 	(FL_FOR_ARCH2): Likewise.
> 	(FL_FOR_ARCH3): Likewise.
> 	(FL_FOR_ARCH3M): Likewise.
> 	(FL_FOR_ARCH4): Likewise.
> 	(FL_FOR_ARCH4T): Likewise.
> 	(FL_FOR_ARCH5): Likewise.
> 	(FL_FOR_ARCH5T): Likewise.
> 	(FL_FOR_ARCH5E): Likewise.
> 	(FL_FOR_ARCH5TE): Likewise.
> 	(FL_FOR_ARCH5TEJ): Likewise.
> 	(FL_FOR_ARCH6): Likewise.
> 	(FL_FOR_ARCH6J): Likewise.
> 	(FL_FOR_ARCH6K): Likewise.
> 	(FL_FOR_ARCH6Z): Likewise.
> 	(FL_FOR_ARCH6ZK): Likewise.
> 	(FL_FOR_ARCH6T2): Likewise.
> 	(FL_FOR_ARCH6M): Likewise.
> 	(FL_FOR_ARCH7): Likewise.
> 	(FL_FOR_ARCH7A): Likewise.
> 	(FL_FOR_ARCH7VE): Likewise.
> 	(FL_FOR_ARCH7R): Likewise.
> 	(FL_FOR_ARCH7M): Likewise.
> 	(FL_FOR_ARCH7EM): Likewise.
> 	(FL_FOR_ARCH8A): Likewise.
> 	* config/arm/arm-protos.h: Take definitions moved from arm.c.
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch ARM Refactor Builtins 4/8]  Refactor "VAR<n>" Macros
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 4/8] Refactor "VAR<n>" Macros James Greenhalgh
@ 2014-11-18  9:17     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:17 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: Richard Earnshaw, nickc



On 12/11/14 17:10, James Greenhalgh wrote:
>
> Hi,
>
> These macros can always be defined as a base case of VAR1 and a "recursive"
> case of VAR<n-1>. At the moment, the body of VAR1 is duplicated in each
> macro.
>
> This patch makes that change.
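
A sketch of the scheme, with the table-entry body of VAR1 simplified:

  /* Base case: a single initialiser for the builtin table; the
     trailing comma means .def entries no longer need their own.  */
  #define VAR1(T, N, A) \
    {#N #A, CODE_FOR_neon_##N##A, T##_QUALIFIERS},

  /* Every other VAR<n> is VAR<n-1> plus one more VAR1.  */
  #define VAR2(T, N, A, B) \
    VAR1 (T, N, A) \
    VAR1 (T, N, B)
  #define VAR3(T, N, A, B, C) \
    VAR2 (T, N, A, B) \
    VAR1 (T, N, C)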
>
> Regression tested on arm-none-linux-gnueabihf with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* config/arm/arm-builtins.c (VAR1): Add a comma.
> 	(VAR2): Rewrite in terms of VAR1.
> 	(VAR3-10): Likewise.
> 	(arm_builtins): Remove leading comma before ARM_BUILTIN_MAX.
> 	* config/arm/arm_neon_builtins.def: Remove trailing commas.
>

OK.

Ramana

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file James Greenhalgh
@ 2014-11-18  9:17     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:17 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: Richard Earnshaw, nickc



On 12/11/14 17:10, James Greenhalgh wrote:
>
> Hi,
>
> The config/arm/arm.c file has always seemed a worrying size to me.
>
> This patch pulls the builtin-related code out into its own file. I think
> this will be a good idea as we move forward. It seems a more sensible
> separation of concerns. There are no functional changes here.
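
The interface left between the two files is small; a sketch of the
declarations the ChangeLog adds to config/arm/arm-protos.h (exact
signatures assumed):

  extern void arm_init_builtins (void);
  extern tree arm_builtin_decl (unsigned int code, bool initialize_p);
  extern rtx arm_expand_builtin (tree exp, rtx target, rtx subtarget,
                                 enum machine_mode mode, int ignore);
  extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear,
                                             tree *update);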
>
> Bootstrapped and regression tested on arm-none-linux-gnueabi, with
> no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* config.gcc (extra_objs): Add arm-builtins.o for arm*-*-*.
> 	(target_gtfiles): Add config/arm/arm-builtins.c for arm*-*-*.
> 	* config/arm/arm-builtins.c: New.
> 	* config/arm/t-arm (arm-builtins.o): New.
> 	* config/arm/arm-protos.h (arm_expand_builtin): New.
> 	(arm_builtin_decl): Likewise.
> 	(arm_init_builtins): Likewise.
> 	(arm_atomic_assign_expand_fenv): Likewise.
> 	* config/arm/arm.c (arm_atomic_assign_expand_fenv): Remove prototype.
> 	(arm_init_builtins): Likewise.
> 	(arm_init_iwmmxt_builtins): Likewise.
> 	(safe_vector_operand): Likewise.
> 	(arm_expand_binop_builtin): Likewise.
> 	(arm_expand_unop_builtin): Likewise.
> 	(arm_expand_builtin): Likewise.
> 	(arm_builtin_decl): Likewise.
> 	(insn_flags): Remove static.
> 	(tune_flags): Likewise.
> 	(enum arm_builtins): Move to config/arm/arm-builtins.c.
> 	(arm_init_neon_builtins): Likewise.
> 	(struct builtin_description): Likewise.
> 	(arm_init_iwmmxt_builtins): Likewise.
> 	(arm_init_fp16_builtins): Likewise.
> 	(arm_init_crc32_builtins): Likewise.
> 	(arm_init_builtins): Likewise.
> 	(arm_builtin_decl): Likewise.
> 	(safe_vector_operand): Likewise.
> 	(arm_expand_ternop_builtin): Likewise.
> 	(arm_expand_binop_builtin): Likewise.
> 	(arm_expand_unop_builtin): Likewise.
> 	(neon_dereference_pointer): Likewise.
> 	(arm_expand_neon_args): Likewise.
> 	(arm_expand_neon_builtin): Likewise.
> 	(neon_split_vcombine): Likewise.
> 	(arm_expand_builtin): Likewise.
> 	(arm_builtin_vectorized_function): Likewise.
> 	(arm_atomic_assign_expand_fenv): Likewise.
>

Ok.

Ramana

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM.
  2014-11-12 17:11   ` [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM James Greenhalgh
@ 2014-11-18  9:18     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:18 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: Richard Earnshaw, nickc



On 12/11/14 17:10, James Greenhalgh wrote:
>
> Hi,
>
> Now we have everything we need to start keeping track of the correct
> "qualifiers" for each Neon builtin class in the arm back-end.
>
> Some of the ARM Neon itypes are redundant when mapped to the qualifiers
> framework. For now, don't change these; we will clean them up in patch
> 8 of the series.
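
A sketch of the framework being adopted (modelled on the AArch64
version; identifier names and values here are assumed):

  enum arm_type_qualifiers
  {
    qualifier_none = 0x0,
    qualifier_unsigned = 0x1,
    qualifier_immediate = 0x2
  };

  /* A plain binary operation, T foo (T, T), on signed vectors.  */
  static enum arm_type_qualifiers
  arm_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
    = { qualifier_none, qualifier_none, qualifier_none };
  #define BINOP_QUALIFIERS (arm_binop_qualifiers)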
>
> Bootstrapped on arm-none-linux-gnueabihf with no issues.
>
> OK?
>

OK.

Ramana
> Thanks,
> James
>
> ---
> gcc/
>
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* config/arm/arm-builtins.c (arm_type_qualifiers): New.
> 	(neon_itype): Add new types corresponding to the types used in
> 	qualifiers names.
> 	(arm_unop_qualifiers): New.
> 	(arm_bswap_qualifiers): Likewise.
> 	(arm_binop_qualifiers): Likewise.
> 	(arm_ternop_qualifiers): Likewise.
> 	(arm_getlane_qualifiers): Likewise.
> 	(arm_lanemac_qualifiers): Likewise.
> 	(arm_setlane_qualifiers): Likewise.
> 	(arm_combine_qualifiers): Likewise.
> 	(arm_load1_qualifiers): Likewise.
> 	(arm_load1_lane_qualifiers): Likewise.
> 	(arm_store1_qualifiers): Likewise.
> 	(arm_storestruct_lane_qualifiers): Likewise.
> 	(UNOP_QUALIFIERS): Likewise.
> 	(DUP_QUALIFIERS): Likewise.
> 	(SPLIT_QUALIFIERS): Likewise.
> 	(CONVERT_QUALIFIERS): Likewise.
> 	(FLOAT_WIDEN_QUALIFIERS): Likewise.
> 	(FLOAT_NARROW_QUALIFIERS): Likewise.
> 	(RINT_QUALIFIERS): Likewise.
> 	(COPYSIGNF_QUALIFIERS): Likewise.
> 	(CREATE_QUALIFIERS): Likewise.
> 	(REINTERP_QUALIFIERS): Likewise.
> 	(BSWAP_QUALIFIERS): Likewise.
> 	(BINOP_QUALIFIERS): Likewise.
> 	(FIXCONV_QUALIFIERS): Likewise.
> 	(SCALARMUL_QUALIFIERS): Likewise.
> 	(SCALARMULL_QUALIFIERS): Likewise.
> 	(SCALARMULH_QUALIFIERS): Likewise.
> 	(TERNOP_QUALIFIERS): Likewise.
> 	(SELECT_QUALIFIERS): Likewise.
> 	(VTBX_QUALIFIERS): Likewise.
> 	(GETLANE_QUALIFIERS): Likewise.
> 	(SHIFTIMM_QUALIFIERS): Likewise.
> 	(LANEMAC_QUALIFIERS): Likewise.
> 	(SCALARMAC_QUALIFIERS): Likewise.
> 	(SETLANE_QUALIFIERS): Likewise.
> 	(SHIFTINSERT_QUALIFIERS): Likewise.
> 	(SHIFTACC_QUALIFIERS): Likewise.
> 	(LANEMUL_QUALIFIERS): Likewise.
> 	(LANEMULL_QUALIFIERS): Likewise.
> 	(LANEMULH_QUALIFIERS): Likewise.
> 	(COMBINE_QUALIFIERS): Likewise.
> 	(VTBL_QUALIFIERS): Likewise.
> 	(LOAD1_QUALIFIERS): Likewise.
> 	(LOADSTRUCT_QUALIFIERS): Likewise.
> 	(LOAD1LANE_QUALIFIERS): Likewise.
> 	(LOADSTRUCTLANE_QUALIFIERS): Likewise.
> 	(STORE1_QUALIFIERS): Likewise.
> 	(STORESTRUCT_QUALIFIERS): Likewise.
> 	(STORE1LANE_QUALIFIERS): Likewise.
> 	(STORESTRUCTLANE_QUALIFIERS): Likewise.
> 	(neon_builtin_datum): Keep track of qualifiers.
> 	(VAR1): Likewise.
>
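
A rough sketch of the qualifiers scheme being ported, modelled on the
AArch64 code the cover letter references; the names and bit values
here are illustrative.

    /* Each builtin signature is an array of qualifiers: the return
       type first, then one entry per argument.  */
    enum arm_type_qualifiers
    {
      qualifier_none = 0x0,       /* Plain (signed) element type.  */
      qualifier_unsigned = 0x1,   /* Unsigned element type.  */
      qualifier_immediate = 0x2,  /* Operand must be a constant.  */
      qualifier_pointer = 0x10    /* Operand is a pointer.  */
    };

    #define SIMD_MAX_BUILTIN_ARGS 5

    /* T (T, T): an ordinary binary operation such as vadd.  */
    static enum arm_type_qualifiers
    arm_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
      = { qualifier_none, qualifier_none, qualifier_none };
    #define BINOP_QUALIFIERS (arm_binop_qualifiers)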


* Re: [Patch ARM Refactor Builtins 6/8] Add some tests for "poly" mangling
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 6/8] Add some tests for "poly" mangling James Greenhalgh
@ 2014-11-18  9:21     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:21 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: Richard Earnshaw, nickc



On 12/11/14 17:10, James Greenhalgh wrote:
>
> Hi,
>
> The poly types end up going through the default mangler, but only
> sometimes.
>
> We don't want to change the mangling for poly types with the next patch in
> this series, so add a test which should pass before and after.
>
> I've checked that the new tests pass at this stage of the patch series,
> and bootstrapped on arm-none-linux-gnueabihf for good luck.
>
> OK?
>

OK.

Ramana
> Thanks,
> James
>
> ---
> gcc/testsuite/
>
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* g++.dg/abi/mangle-arm-crypto.C: New.
> 	* g++.dg/abi/mangle-neon.C (f19): New.
> 	(f20): Likewise.
>
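
The shape of such a mangling test, sketched below; the dg- directives,
type choices and function bodies are illustrative, and the expected
mangled strings the real test scans for are omitted here since they
come from the AAPCS.

    // { dg-do compile }
    // { dg-options "-mfpu=neon -mfloat-abi=softfp" }
    #include <arm_neon.h>

    /* One trivial definition per poly type: if the mangling of the
       argument type changes, the symbol name changes, and a
       scan-assembler test for the old name fails.  */
    poly8x16_t f19 (poly8x16_t a) { return a; }
    poly16x8_t f20 (poly16x8_t a) { return a; }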


* Re: [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling
  2014-11-12 17:12   ` [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling James Greenhalgh
@ 2014-11-18  9:30     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:30 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: Richard Earnshaw, nickc



On 12/11/14 17:10, James Greenhalgh wrote:
>
> Hi,
>
> This patch wires up builtin initialisation similar to the AArch64 backend,
> making use of the "qualifiers" arrays to decide on types for each builtin
> we hope to initialise.
>
> We could take an old snapshot of the qualifiers code from AArch64, but as
> our end goal is to pull in the type mangling changes, we may as well do
> that now. In order to preserve the old mangling rules after this patch, we
> must wire all of these types up.
>
> Together, this becomes a fairly simple side-port of the logic for
> Advanced SIMD builtins from the AArch64 target.
>
> Bootstrapped on arm-none-linux-gnueabihf with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---

OK.

Ramana

> gcc/
>
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* config/arm/arm-builtins.c (arm_scalar_builtin_types): New.
> 	(enum arm_simd_type): Likewise.
> 	(struct arm_simd_type_info): Likewise.
> 	(arm_mangle_builtin_scalar_type): Likewise.
> 	(arm_mangle_builtin_vector_type): Likewise.
> 	(arm_mangle_builtin_type): Likewise.
> 	(arm_simd_builtin_std_type): Likewise.
> 	(arm_lookup_simd_builtin_type): Likewise.
> 	(arm_simd_builtin_type): Likewise.
> 	(arm_init_simd_builtin_types): Likewise.
> 	(arm_init_simd_builtin_scalar_types): Likewise.
> 	(arm_init_neon_builtins): Rewrite using qualifiers.
> 	* config/arm/arm-protos.h (arm_mangle_builtin_type): New.
> 	* config/arm/arm-simd-builtin-types.def: New file.
> 	* config/arm/t-arm (arm-builtins.o): Depend on it.
> 	* config/arm/arm.c (arm_mangle_type): Call arm_mangle_builtin_type.
> 	* config/arm/arm_neon.h (int8x8_t): Use new internal type.
> 	(int16x4_t): Likewise.
> 	(int32x2_t): Likewise.
> 	(float16x4_t): Likewise.
> 	(float32x2_t): Likewise.
> 	(poly8x8_t): Likewise.
> 	(poly16x4_t): Likewise.
> 	(uint8x8_t): Likewise.
> 	(uint16x4_t): Likewise.
> 	(uint32x2_t): Likewise.
> 	(int8x16_t): Likewise.
> 	(int16x8_t): Likewise.
> 	(int32x4_t): Likewise.
> 	(int64x2_t): Likewise.
> 	(float32x4_t): Likewise.
> 	(poly8x16_t): Likewise.
> 	(poly16x8_t): Likewise.
> 	(uint8x16_t): Likewise.
> 	(uint16x8_t): Likewise.
> 	(uint32x4_t): Likewise.
> 	(uint64x2_t): Likewise.
>
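
In outline, the new mangling path looks like the sketch below
(simplified, with abbreviated fields; only one table row is shown, and
its type node is assumed to be filled in during initialisation).

    /* One row per internal Neon type, built from the entries in
       arm-simd-builtin-types.def.  */
    static struct arm_simd_type_info
    {
      tree itype;          /* Internal type node, set at init time.  */
      const char *name;    /* User-visible class-like name.  */
      const char *mangle;  /* ABI-mandated mangled form.  */
    } arm_simd_types[] = {
      { NULL_TREE, "__simd64_int8_t", "15__simd64_int8_t" },
      /* ... one entry per vector type ...  */
    };

    /* Called from arm_mangle_type: return the fixed mangling for one
       of our internal vector types, or NULL to fall back to the
       default mangler.  */
    static const char *
    arm_mangle_builtin_vector_type (const_tree type)
    {
      for (unsigned i = 0; i < ARRAY_SIZE (arm_simd_types); i++)
        if (arm_simd_types[i].itype == type)
          return arm_simd_types[i].mangle;
      return NULL;
    }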


* Re: [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure
  2014-11-12 17:32   ` [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure James Greenhalgh
@ 2014-11-18  9:38     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-18  9:38 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: Richard Earnshaw, nickc



On 12/11/14 17:10, James Greenhalgh wrote:
>
> Hi,
>
> This final patch clears up the remaining data structures for which we
> no longer have any use:
>
>   * "_QUALIFIERS" macros which do not name a distinct pattern of
>     arguments/return types.
>   * The neon_builtin_type_mode enum is not needed; we can map directly to
>     the machine_mode.
>   * The neon_itype enum is not needed; the builtin expand functions can
>     be rewritten to use the "qualifiers" data.
>
> This gives us reasonable parity between the builtin infrastructure for
> the ARM and AArch64 targets. We could go further and start sharing some
> of the logic between the two back-ends (and after that the builtin
> definitions, and some of arm_neon.h, etc.), but I haven't done that here
> as the immediate benefit is minimal.
>
> Bootstrapped and regression tested with no issues.
>
> OK?

OK.

Ramana
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2014-11-12  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* config/arm/arm-builtins.c (CONVERT_QUALIFIERS): Delete.
> 	(COPYSIGNF_QUALIFIERS): Likewise.
> 	(CREATE_QUALIFIERS): Likewise.
> 	(DUP_QUALIFIERS): Likewise.
> 	(FLOAT_WIDEN_QUALIFIERS): Likewise.
> 	(FLOAT_NARROW_QUALIFIERS): Likewise.
> 	(REINTERP_QUALIFIERS): Likewise.
> 	(RINT_QUALIFIERS): Likewise.
> 	(SPLIT_QUALIFIERS): Likewise.
> 	(FIXCONV_QUALIFIERS): Likewise.
> 	(SCALARMUL_QUALIFIERS): Likewise.
> 	(SCALARMULL_QUALIFIERS): Likewise.
> 	(SCALARMULH_QUALIFIERS): Likewise.
> 	(SELECT_QUALIFIERS): Likewise.
> 	(VTBX_QUALIFIERS): Likewise.
> 	(SHIFTIMM_QUALIFIERS): Likewise.
> 	(SCALARMAC_QUALIFIERS): Likewise.
> 	(LANEMUL_QUALIFIERS): Likewise.
> 	(LANEMULH_QUALIFIERS): Likewise.
> 	(LANEMULL_QUALIFIERS): Likewise.
> 	(SHIFTACC_QUALIFIERS): Likewise.
> 	(SHIFTINSERT_QUALIFIERS): Likewise.
> 	(VTBL_QUALIFIERS): Likewise.
> 	(LOADSTRUCT_QUALIFIERS): Likewise.
> 	(LOADSTRUCTLANE_QUALIFIERS): Likewise.
> 	(STORESTRUCT_QUALIFIERS): Likewise.
> 	(STORESTRUCTLANE_QUALIFIERS): Likewise.
> 	(neon_builtin_type_mode): Delete.
> 	(v8qi_UP): Map to V8QImode.
> 	(v4hi_UP): Map to V4HImode.
> 	(v4hf_UP): Map to V4HFmode.
> 	(v2si_UP): Map to V2SImode.
> 	(v2sf_UP): Map to V2SFmode.
> 	(di_UP): Map to DImode.
> 	(v16qi_UP): Map to V16QImode.
> 	(v8hi_UP): Map to V8HImode.
> 	(v4si_UP): Map to V4SImode.
> 	(v4sf_UP): Map to V4SFmode.
> 	(v2di_UP): Map to V2DImode.
> 	(ti_UP): Map to TImode.
> 	(ei_UP): Map to EImode.
> 	(oi_UP): Map to OImode.
> 	(neon_itype): Delete.
> 	(neon_builtin_datum): Remove itype, make mode a machine_mode.
> 	(VAR1): Update accordingly.
> 	(arm_init_neon_builtins): Use machine_mode directly.
> 	(neon_dereference_pointer): Likewise.
> 	(arm_expand_neon_args): Use qualifiers to decide operand types.
> 	(arm_expand_neon_builtin): Likewise.
> 	* config/arm/arm_neon_builtins.def: Remap operation type for
> 	many builtins.
>
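
In concrete terms, per the ChangeLog entries above (abbreviated here;
the UP helper is assumed to be the usual token-pasting macro):

    /* The _UP suffix macros used by arm_neon_builtins.def previously
       named members of the now-deleted neon_builtin_type_mode enum;
       after this patch they expand directly to machine_mode values.  */
    #define v8qi_UP  V8QImode
    #define v4hi_UP  V4HImode
    #define v4hf_UP  V4HFmode
    #define v2si_UP  V2SImode
    #define v2sf_UP  V2SFmode
    #define di_UP    DImode
    #define v16qi_UP V16QImode
    #define v8hi_UP  V8HImode
    #define v4si_UP  V4SImode
    #define v4sf_UP  V4SFmode
    #define v2di_UP  V2DImode
    #define ti_UP    TImode
    #define ei_UP    EImode
    #define oi_UP    OImode

    #define UP(X) X##_UP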


end of thread

Thread overview: 18+ messages
-- links below jump to the message on this page --
2014-11-12 17:09 [ARM] Refactor Neon Builtins infrastructure James Greenhalgh
2014-11-12 17:11 ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" James Greenhalgh
2014-11-12 17:11   ` [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM James Greenhalgh
2014-11-18  9:18     ` Ramana Radhakrishnan
2014-11-12 17:11   ` [Patch ARM Refactor Builtins 4/8] Refactor "VAR<n>" Macros James Greenhalgh
2014-11-18  9:17     ` Ramana Radhakrishnan
2014-11-12 17:11   ` [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h James Greenhalgh
2014-11-18  9:16     ` Ramana Radhakrishnan
2014-11-12 17:12   ` [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling James Greenhalgh
2014-11-18  9:30     ` Ramana Radhakrishnan
2014-11-12 17:12   ` [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file James Greenhalgh
2014-11-18  9:17     ` Ramana Radhakrishnan
2014-11-12 17:12   ` [Patch ARM Refactor Builtins 6/8] Add some tests for "poly" mangling James Greenhalgh
2014-11-18  9:21     ` Ramana Radhakrishnan
2014-11-12 17:32   ` [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure James Greenhalgh
2014-11-18  9:38     ` Ramana Radhakrishnan
2014-11-18  9:16   ` [Refactor Builtins: 1/8] Remove arm_neon.h's "Magic Words" Ramana Radhakrishnan
2014-11-18  9:15 ` [ARM] Refactor Neon Builtins infrastructure Ramana Radhakrishnan
