From: Ramana Radhakrishnan
To: Matt Turner
Cc: gcc-patches@gcc.gnu.org, Ramana Radhakrishnan, Richard Earnshaw, Nick Clifton, Paul Brook, Xinyu Qi
Subject: Re: [PATCH ARM iWMMXt 2/5] intrinsic head file change
Date: Wed, 06 Jun 2012 12:22:00 -0000
In-Reply-To: <1338264799-12374-3-git-send-email-mattst88@gmail.com>
References: <1338264799-12374-1-git-send-email-mattst88@gmail.com> <1338264799-12374-3-git-send-email-mattst88@gmail.com>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
X-SW-Source: 2012-06/txt/msg00366.txt.bz2

I've only had a brief look at this, so I'm pointing out some stylistic
issues I noticed; I'd like another set of eyes on this and the next patch.

On 29 May 2012 05:13, Matt Turner wrote:
> From: Xinyu Qi
>
>        gcc/
>        * config/arm/mmintrin.h: Use __IWMMXT__ to enable iWMMXt intrinsics.
>        Use __IWMMXT2__ to enable iWMMXt2 intrinsics.
>        Use C name-mangling for intrinsics.
>        (__v8qi): Redefine.
>        (_mm_cvtsi32_si64, _mm_andnot_si64, _mm_sad_pu8): Revise.
>        (_mm_sad_pu16, _mm_align_si64, _mm_setwcx, _mm_getwcx): Likewise.
>        (_m_from_int): Likewise.
>        (_mm_sada_pu8, _mm_sada_pu16): New intrinsic.
>        (_mm_alignr0_si64, _mm_alignr1_si64, _mm_alignr2_si64): Likewise.
>        (_mm_alignr3_si64, _mm_tandcb, _mm_tandch, _mm_tandcw): Likewise.
>        (_mm_textrcb, _mm_textrch, _mm_textrcw, _mm_torcb): Likewise.
>        (_mm_torch, _mm_torcw, _mm_tbcst_pi8, _mm_tbcst_pi16): Likewise.
>        (_mm_tbcst_pi32): Likewise.
>        (_mm_abs_pi8, _mm_abs_pi16, _mm_abs_pi32): New iWMMXt2 intrinsic.
>        (_mm_addsubhx_pi16, _mm_absdiff_pu8, _mm_absdiff_pu16): Likewise.
>        (_mm_absdiff_pu32, _mm_addc_pu16, _mm_addc_pu32): Likewise.
>        (_mm_avg4_pu8, _mm_avg4r_pu8, _mm_maddx_pi16, _mm_maddx_pu16): Likewise.
>        (_mm_msub_pi16, _mm_msub_pu16, _mm_mulhi_pi32): Likewise.
>        (_mm_mulhi_pu32, _mm_mulhir_pi16, _mm_mulhir_pi32): Likewise.
>        (_mm_mulhir_pu16, _mm_mulhir_pu32, _mm_mullo_pi32): Likewise.
>        (_mm_qmulm_pi16, _mm_qmulm_pi32, _mm_qmulmr_pi16): Likewise.
>        (_mm_qmulmr_pi32, _mm_subaddhx_pi16, _mm_addbhusl_pu8): Likewise.
>        (_mm_addbhusm_pu8, _mm_qmiabb_pi32, _mm_qmiabbn_pi32): Likewise.
>        (_mm_qmiabt_pi32, _mm_qmiabtn_pi32, _mm_qmiatb_pi32): Likewise.
>        (_mm_qmiatbn_pi32, _mm_qmiatt_pi32, _mm_qmiattn_pi32): Likewise.
>        (_mm_wmiabb_si64, _mm_wmiabbn_si64, _mm_wmiabt_si64): Likewise.
>        (_mm_wmiabtn_si64, _mm_wmiatb_si64, _mm_wmiatbn_si64): Likewise.
>        (_mm_wmiatt_si64, _mm_wmiattn_si64, _mm_wmiawbb_si64): Likewise.
>        (_mm_wmiawbbn_si64, _mm_wmiawbt_si64, _mm_wmiawbtn_si64): Likewise.
>        (_mm_wmiawtb_si64, _mm_wmiawtbn_si64, _mm_wmiawtt_si64): Likewise.
>        (_mm_wmiawttn_si64, _mm_merge_si64): Likewise.
>        (_mm_torvscb, _mm_torvsch, _mm_torvscw): Likewise.
>        (_m_to_int): New define.
> ---
>  gcc/config/arm/mmintrin.h |  649 ++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 614 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
> index 2cc500d..0fe551d 100644
> --- a/gcc/config/arm/mmintrin.h
> +++ b/gcc/config/arm/mmintrin.h
> @@ -24,16 +24,30 @@
>  #ifndef _MMINTRIN_H_INCLUDED
>  #define _MMINTRIN_H_INCLUDED
>
> +#ifndef __IWMMXT__
> +#error You must enable WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2) to use iWMMXt/iWMMXt2 intrinsics
> +#else
> +
> +#ifndef __IWMMXT2__
> +#warning You only enable iWMMXt intrinsics. Extended iWMMXt2 intrinsics available only if WMMX2 instructions enabled (e.g. -march=iwmmxt2)
> +#endif
> +

Extra newline.

> +
> +#if defined __cplusplus
> +extern "C" { /* Begin "C" */
> +/* Intrinsics use C name-mangling.  */
> +#endif /* __cplusplus */
> +
>  /* The data type intended for user use.  */
>  typedef unsigned long long __m64, __int64;
>
>  /* Internal data types for implementing the intrinsics.  */
>  typedef int __v2si __attribute__ ((vector_size (8)));
>  typedef short __v4hi __attribute__ ((vector_size (8)));
> -typedef char __v8qi __attribute__ ((vector_size (8)));
> +typedef signed char __v8qi __attribute__ ((vector_size (8)));
>
>  /* "Convert" __m64 and __int64 into each other.  */
> -static __inline __m64
> +static __inline __m64
>  _mm_cvtsi64_m64 (__int64 __i)
>  {
>    return __i;
> @@ -54,7 +68,7 @@ _mm_cvtsi64_si32 (__int64 __i)
>  static __inline __int64
>  _mm_cvtsi32_si64 (int __i)
>  {
> -  return __i;
> +  return (__i & 0xffffffff);
>  }
>
>  /* Pack the four 16-bit values from M1 into the lower four 8-bit values of
> @@ -603,7 +617,7 @@ _mm_and_si64 (__m64 __m1, __m64 __m2)
>  static __inline __m64
>  _mm_andnot_si64 (__m64 __m1, __m64 __m2)
>  {
> -  return __builtin_arm_wandn (__m1, __m2);
> +  return __builtin_arm_wandn (__m2, __m1);
>  }
>
>  /* Bit-wise inclusive OR the 64-bit values in M1 and M2.  */
> @@ -935,7 +949,13 @@ _mm_avg2_pu16 (__m64 __A, __m64 __B)
>  static __inline __m64
>  _mm_sad_pu8 (__m64 __A, __m64 __B)
>  {
> -  return (__m64) __builtin_arm_wsadb ((__v8qi)__A, (__v8qi)__B);
> +  return (__m64) __builtin_arm_wsadbz ((__v8qi)__A, (__v8qi)__B);
> +}
> +
> +static __inline __m64
> +_mm_sada_pu8 (__m64 __A, __m64 __B, __m64 __C)
> +{
> +  return (__m64) __builtin_arm_wsadb ((__v2si)__A, (__v8qi)__B, (__v8qi)__C);
>  }
>
>  /* Compute the sum of the absolute differences of the unsigned 16-bit
> @@ -944,9 +964,16 @@ _mm_sad_pu8 (__m64 __A, __m64 __B)
>  static __inline __m64
>  _mm_sad_pu16 (__m64 __A, __m64 __B)
>  {
> -  return (__m64) __builtin_arm_wsadh ((__v4hi)__A, (__v4hi)__B);
> +  return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
>  }
>
> +static __inline __m64
> +_mm_sada_pu16 (__m64 __A, __m64 __B, __m64 __C)
> +{
> +  return (__m64) __builtin_arm_wsadh ((__v2si)__A, (__v4hi)__B, (__v4hi)__C);
> +}
> +
> +
>  /* Compute the sum of the absolute differences of the unsigned 8-bit
>     values in A and B.  Return the value in the lower 16-bit word; the
>     upper words are cleared.  */
> @@ -965,11 +992,8 @@ _mm_sadz_pu16 (__m64 __A, __m64 __B)
>    return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
>  }
>
> -static __inline __m64
> -_mm_align_si64 (__m64 __A, __m64 __B, int __C)
> -{
> -  return (__m64) __builtin_arm_walign ((__v8qi)__A, (__v8qi)__B, __C);
> -}
> +#define _mm_align_si64(__A,__B, N) \
> +  (__m64) __builtin_arm_walign ((__v8qi) (__A),(__v8qi) (__B), (N))
>
>  /* Creates a 64-bit zero.  */
>  static __inline __m64
> @@ -987,42 +1011,76 @@ _mm_setwcx (const int __value, const int __regno)
>  {
>    switch (__regno)
>      {
> -    case 0:  __builtin_arm_setwcx (__value, 0); break;
> -    case 1:  __builtin_arm_setwcx (__value, 1); break;
> -    case 2:  __builtin_arm_setwcx (__value, 2); break;
> -    case 3:  __builtin_arm_setwcx (__value, 3); break;
> -    case 8:  __builtin_arm_setwcx (__value, 8); break;
> -    case 9:  __builtin_arm_setwcx (__value, 9); break;
> -    case 10: __builtin_arm_setwcx (__value, 10); break;
> -    case 11: __builtin_arm_setwcx (__value, 11); break;
> -    default: break;
> +    case 0:
> +      __asm __volatile ("tmcr wcid, %0" :: "r"(__value));
> +      break;
> +    case 1:
> +      __asm __volatile ("tmcr wcon, %0" :: "r"(__value));
> +      break;
> +    case 2:
> +      __asm __volatile ("tmcr wcssf, %0" :: "r"(__value));
> +      break;
> +    case 3:
> +      __asm __volatile ("tmcr wcasf, %0" :: "r"(__value));
> +      break;
> +    case 8:
> +      __builtin_arm_setwcgr0 (__value);
> +      break;
> +    case 9:
> +      __builtin_arm_setwcgr1 (__value);
> +      break;
> +    case 10:
> +      __builtin_arm_setwcgr2 (__value);
> +      break;
> +    case 11:
> +      __builtin_arm_setwcgr3 (__value);
> +      break;
> +    default:
> +      break;
>      }
>  }
>
>  static __inline int
>  _mm_getwcx (const int __regno)
>  {
> +  int __value;
>    switch (__regno)
>      {
> -    case 0:  return __builtin_arm_getwcx (0);
> -    case 1:  return __builtin_arm_getwcx (1);
> -    case 2:  return __builtin_arm_getwcx (2);
> -    case 3:  return __builtin_arm_getwcx (3);
> -    case 8:  return __builtin_arm_getwcx (8);
> -    case 9:  return __builtin_arm_getwcx (9);
> -    case 10: return __builtin_arm_getwcx (10);
> -    case 11: return __builtin_arm_getwcx (11);
> -    default: return 0;
> +    case 0:
> +      __asm __volatile ("tmrc %0, wcid" : "=r"(__value));
> +      break;
> +    case 1:
> +      __asm __volatile ("tmrc %0, wcon" : "=r"(__value));
> +      break;
> +    case 2:
> +      __asm __volatile ("tmrc %0, wcssf" : "=r"(__value));
> +      break;
> +    case 3:
> +      __asm __volatile ("tmrc %0, wcasf" : "=r"(__value));
> +      break;
> +    case 8:
> +      return __builtin_arm_getwcgr0 ();
> +    case 9:
> +      return __builtin_arm_getwcgr1 ();
> +    case 10:
> +      return __builtin_arm_getwcgr2 ();
> +    case 11:
> +      return __builtin_arm_getwcgr3 ();
> +    default:
> +      break;
>      }
> +  return __value;
>  }
>
>  /* Creates a vector of two 32-bit values; I0 is least significant.  */
>  static __inline __m64
>  _mm_set_pi32 (int __i1, int __i0)
>  {
> -  union {
> +  union
> +  {
>      __m64 __q;
> -    struct {
> +    struct
> +    {
>        unsigned int __i0;
>        unsigned int __i1;
>      } __s;
> @@ -1041,7 +1099,7 @@ _mm_set_pi16 (short __w3, short __w2, short __w1, short __w0)
>    unsigned int __i1 = (unsigned short)__w3 << 16 | (unsigned short)__w2;
>    unsigned int __i0 = (unsigned short)__w1 << 16 | (unsigned short)__w0;
>    return _mm_set_pi32 (__i1, __i0);
> -
> +

Extra newline again here.

>  }
>
>  /* Creates a vector of eight 8-bit values; B0 is least significant.  */
> @@ -1108,11 +1166,526 @@ _mm_set1_pi8 (char __b)
>    return _mm_set1_pi32 (__i);
>  }
>
> -/* Convert an integer to a __m64 object.  */
> +#ifdef __IWMMXT2__
> +static __inline __m64
> +_mm_abs_pi8 (__m64 m1)
> +{
> +  return (__m64) __builtin_arm_wabsb ((__v8qi)m1);
> +}
> +
> +static __inline __m64
> +_mm_abs_pi16 (__m64 m1)
> +{
> +  return (__m64) __builtin_arm_wabsh ((__v4hi)m1);
> +

And here.

> +}
> +
> +static __inline __m64
> +_mm_abs_pi32 (__m64 m1)
> +{
> +  return (__m64) __builtin_arm_wabsw ((__v2si)m1);
> +

and here.

> +}
> +
> +#define _mm_qmiabb_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiabb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiabbn_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiabbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiabt_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiabt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiabtn_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc=acc;\
> +   __m64 _m1=m1;\
> +   __m64 _m2=m2;\
> +   _acc = (__m64) __builtin_arm_wqmiabtn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiatb_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiatb ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiatbn_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiatbn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiatt_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiatt ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_qmiattn_pi32(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wqmiattn ((__v2si)_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiabb_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiabb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiabbn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiabbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiabt_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiabt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiabtn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiabtn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiatb_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiatb (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiatbn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiatbn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiatt_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiatt (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiattn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiattn (_acc, (__v4hi)_m1, (__v4hi)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawbb_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawbb (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawbbn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawbbn (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawbt_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawbt (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawbtn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawbtn (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawtb_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawtb (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawtbn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawtbn (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawtt_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawtt (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })
> +
> +#define _mm_wmiawttn_si64(acc, m1, m2) \
> +  ({\
> +   __m64 _acc = acc;\
> +   __m64 _m1 = m1;\
> +   __m64 _m2 = m2;\
> +   _acc = (__m64) __builtin_arm_wmiawttn (_acc, (__v2si)_m1, (__v2si)_m2);\
> +   _acc;\
> +   })

I assume someone knows why these are macros and not inline functions
like the others?

> +
> +/* The third arguments should be an immediate.  */

s/arguments/argument

> +#define _mm_merge_si64(a, b, n) \
> +  ({\
> +   __m64 result;\
> +   result = (__m64) __builtin_arm_wmerge ((__m64) (a), (__m64) (b), (n));\
> +   result;\
> +   })
> +#endif  /* __IWMMXT2__ */
> +
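For what it's worth, the immediate-operand requirement the patch itself
notes for _mm_merge_si64 could be one reason a statement-expression macro
is used there: inside a wrapper function the operand is an ordinary
parameter rather than a constant expression. A minimal sketch of the
difference, assuming only the __builtin_arm_wmerge usage shown above and
the header's __m64 type; the my_merge* names are illustrative, not part
of the patch:

/* Macro form: (n) is still a constant expression at the call site,
   so the builtin sees an immediate.  */
#define my_merge(a, b, n) \
  ({\
   __m64 __r = (__m64) __builtin_arm_wmerge ((__m64) (a), (__m64) (b), (n));\
   __r;\
   })

/* Function form: __n is an ordinary variable inside the body, so the
   builtin only receives an immediate if the call is inlined and __n is
   propagated back to a literal.  */
static __inline __m64
my_merge_fn (__m64 __a, __m64 __b, const int __n)
{
  return (__m64) __builtin_arm_wmerge (__a, __b, __n);
}

With this sketch, my_merge (a, b, 3) keeps the 3 visible to the builtin,
while my_merge_fn (a, b, 3) relies on the compiler folding __n back to a
constant. Whether that reasoning applies to the whole wmia/qmia group is
worth confirming with the patch authors.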