From mboxrd@z Thu Jan  1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 2038)
	id 9EB37385840B; Mon, 29 Nov 2021 15:51:48 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9EB37385840B
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Paul Clarke
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r12-5580] rs6000: Add Power10 optimization for most _mm_movemask*
X-Act-Checkin: gcc
X-Git-Author: Paul A. Clarke
X-Git-Refname: refs/heads/master
X-Git-Oldrev: e2194a8b39251497d770abf3fb6ee06de6072ed9
X-Git-Newrev: 85289ba36c2e62de84cc0232c954d9a74bda708a
Message-Id: <20211129155148.9EB37385840B@sourceware.org>
Date: Mon, 29 Nov 2021 15:51:48 +0000 (GMT)
X-BeenThere: gcc-cvs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-cvs mailing list
X-List-Received-Date: Mon, 29 Nov 2021 15:51:48 -0000

https://gcc.gnu.org/g:85289ba36c2e62de84cc0232c954d9a74bda708a

commit r12-5580-g85289ba36c2e62de84cc0232c954d9a74bda708a
Author: Paul A. Clarke
Date:   Thu Oct 21 11:21:01 2021 -0500

    rs6000: Add Power10 optimization for most _mm_movemask*

    Power10 ISA added `vextract*` instructions which are realized in the
    `vec_extractm` intrinsic.

    Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
    `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`
    is defined.

    2021-11-29  Paul A. Clarke

    gcc
            * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
            when _ARCH_PWR10.
            * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
            (_mm_movemask_epi8): Likewise.

Diff:
---
 gcc/config/rs6000/emmintrin.h | 8 ++++++++
 gcc/config/rs6000/xmmintrin.h | 4 ++++
 2 files changed, 12 insertions(+)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index 4125b12fa09..c4758be0e77 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -1233,6 +1233,9 @@ _mm_loadl_pd (__m128d __A, double const *__B)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movemask_pd (__m128d  __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((__v2du) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned int perm_mask =
     {
@@ -1252,6 +1255,7 @@ _mm_movemask_pd (__m128d  __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 
 #endif /* _ARCH_PWR8 */
@@ -2030,6 +2034,9 @@ _mm_min_epu8 (__m128i __A, __m128i __B)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movemask_epi8 (__m128i __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((__v16qu) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned char perm_mask =
     {
@@ -2046,6 +2053,7 @@ _mm_movemask_epi8 (__m128i __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 
 #endif /* _ARCH_PWR8 */
diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h
index ae1a33e8d95..4c093fd1d5a 100644
--- a/gcc/config/rs6000/xmmintrin.h
+++ b/gcc/config/rs6000/xmmintrin.h
@@ -1352,6 +1352,9 @@ _mm_storel_pi (__m64 *__P, __m128 __A)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movemask_ps (__m128  __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((vector unsigned int) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned int perm_mask =
     {
@@ -1371,6 +1374,7 @@ _mm_movemask_ps (__m128  __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 
 #endif /* _ARCH_PWR8 */
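
For readers checking either code path in the patch above, it may help to have the expected result spelled out in scalar form. The sketch below is not part of the commit: the `ref_movemask_*` names are hypothetical, the inputs are plain arrays instead of `__m128`/`__m128d`/`__m128i`, and it assumes the x86 convention that element i of the vector contributes its most significant bit to bit i of the returned mask.

```c
#include <stdint.h>

/* Hypothetical scalar references, not taken from the patch.  Each one
   gathers the most significant bit of every element into the low bits
   of an int, with element i mapped to result bit i (assumed x86
   movemask ordering).  */

/* 16 byte elements: reference for the _mm_movemask_epi8 result.  */
static int
ref_movemask_epi8 (const uint8_t bytes[16])
{
  int mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (bytes[i] >> 7) << i;
  return mask;
}

/* 4 word elements (raw float bit patterns): reference for
   _mm_movemask_ps.  */
static int
ref_movemask_ps (const uint32_t words[4])
{
  int mask = 0;
  for (int i = 0; i < 4; i++)
    mask |= (int) (words[i] >> 31) << i;
  return mask;
}

/* 2 doubleword elements (raw double bit patterns): reference for
   _mm_movemask_pd.  */
static int
ref_movemask_pd (const uint64_t dwords[2])
{
  int mask = 0;
  for (int i = 0; i < 2; i++)
    mask |= (int) (dwords[i] >> 63) << i;
  return mask;
}
```

A test could compare these against the header intrinsics on the same bit patterns, built once with and once without `-mcpu=power10`, so that both the `vec_extractm` path and the pre-existing fallback are exercised.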